THE NOTES OF "Concurrent programs wait faster" by Tony Hoare at MSR TechFest 2003
Extracted from ppt file by Oyvind Teig, Norway
See https://web.archive.org/web/20120313101852/http://research.microsoft.com/en-us/people/thoare/concurrentprogramswaitfaster.ppt
Also see https://www.teigfam.net/oyvind/home/technology/062-waiting-faster/

start ---

1 In the modern world, many programs spend much of their time waiting. Unfortunately, so do their users. To reduce both kinds of waiting time, it is necessary for the programs to wait faster. Wait times can be significantly reduced by waiting for two or more events concurrently - either waiting for both of them, or (more effectively) waiting for just one of them, whichever happens first.

2 I will explain how this is achieved by an example, taken from real life. Most days I go to work by bus. There are two bus routes that pass between my home and my office: a city route, number 9, and a country route, number 7. Let P stand for a decision to take the more interesting city route; and let Q stand for the decision to take the country route, nice in good weather. If I decide P, I find my average waiting time at the bus stop is twenty-eight minutes. If I decide Q, I have to wait on average the same time. The busses arrive independently at random, at widely varying intervals. Since the journey time is only ten minutes, I really have a strong motive to reduce my time spent waiting at the bus stop. There is one remarkable way that I can do this: suppose I do not decide in advance which bus to take, but just take the first bus that arrives; then I find that my average wait time goes down to just eleven minutes! I get to work much quicker, and I have done it, not by speeding up the bus, but simply by waiting twice as fast, or more! You can tell I am waiting faster, because I'm waiting less than half as long.
I will now spend a little time explaining this paradox of waiting times, and show how the same technique, applied inside a computer, can make concurrent programs wait faster.

3 Here is an example calculation of how long one has to wait for an event to occur. Since the times are highly variable, you need first to make a rough estimate of the probabilities involved. On this slide I have drawn a histogram of waiting times. The first column shows that there is a ninety percent chance that the wait time will be only one time unit (say a second), the second column shows a nine percent chance that it will be ten seconds, and the third column shows a one percent chance that the wait will be as much as a hundred seconds. Of course, real busses will arrive at many other intervals than these, but this kind of discrete approximation is good enough for most practical purposes. To calculate the average, multiply the height of each column by the value written at its base. Then add them up, as shown in the equation above the histogram. The overall average delay is 2.8 seconds. We will assume that the wait times for the three events P, Q, and R are independent random variables, all with this same distribution, so they all have this same average.

4 The whole point of waiting simultaneously for the first of two events is to wait faster. The multiple wait actually reduces the delay to the minimum of the delays of the two events. You might expect it to halve the delay; but for the particular probabilities that I have chosen in my example, the multiple wait actually reduces the average delay from 2.8 seconds to just under 1.1! The calculation is shown by the histogram on this slide. The paradox of waiting faster is explained because it is extremely unlikely that both events will be delayed by a hundred seconds - only one chance in ten thousand. So the contribution of this case (0.01 sec) to the average becomes negligible.
5 Of course, it is computer programs that we are really interested in, and programs do not wait for busses. To generalise our story, we will use the letters P, Q and R to stand for distinct and independent events that a computer program may have to wait for; for example, an input or an output from the user or from a remote site, or perhaps the acquisition of a lock or some other resource, or perhaps just an alarm call which occurs after a specified interval. In all these cases, the event that ends the wait is considered to be atomic, and either occurs completely or not at all; anything which is not atomic in this sense has to be regarded as more than one event. With a suitable implementation, a transaction or a remote procedure call can also be regarded as atomic in this sense.

6 These simple examples of atomic events are taken from the Win32 API. The first is an alarm call setting, which waits for almost exactly three seconds. The second line is a call of a primitive operation of input from the user's console. The third is a simple wait for the occurrence of an event with pre-assigned handle p. The second parameter of the wait gives a time-out after a specified interval. But in this example, I have disabled the time-out by making it infinite. This is for the sake of simplicity. In practice, I strongly recommend that all waits should be carefully protected by a time-out set to an appropriate duration.

7 This is one of the possible implementations of a wait for the first of two events, which have been given handles p and q; we assume these handles have been created and assigned before starting this fragment of code. The body of the method calls the Win32 function WaitForMultipleObjects, passing to it a temporary array of handles which is initialised to contain just these two events. The last parameter of the function is a time-out interval, set to INFINITE for simplicity.
The falsity of the third parameter indicates that the wait is just for the first of the two events rather than for both of them. If either event has an outstanding unacknowledged signal, the method returns immediately; otherwise it waits. When the wait terminates, the result indicates which of the two events has actually been signalled - giving the lower index if both have occurred. The conditional then deals with the selected event, thereby acknowledging the signal. If both events have occurred before the return, the one that has not been dealt with remains unacknowledged. Proper treatment of errors has been removed so that the code will fit on one slide. In this and subsequent examples, our implementation assumes that no event will be shared by two concurrent threads. If it is, a lock will be needed around the code to enforce exclusion.

8 Here is another implementation of waiting for the first of two events. In this case, the events are read operations on non-blocking Windows sockets. The sockets are first collected in a temporary collection, and the select function waits for the first read to be ready. When this happens, the newly ready socket (or sockets) is marked as ready; the function FD_ISSET then tests whether its first parameter is marked as ready, in which case the read operation is performed. Otherwise, the second parameter must be ready, and it is read instead. The reading automatically removes the ready marker. If both sockets are ready, the above implementation makes an arbitrary choice of the first parameter. It might equally well choose the second. It would be dangerous to care about this choice, because it is dependent on timing, which we usually cannot control. I will explain the problem of race conditions more fully later.

Further details of the API are given below.

int select(int nfds, fd_set* readfds, fd_set* writefds, fd_set* exceptfds, const struct timeval* timeout);

Parameters:
nfds [in] Ignored.
Included only for compatibility with Berkeley sockets.
readfds [in, out] Optional pointer to a set of sockets to be checked for readability.
writefds [in, out] Optional pointer to a set of sockets to be checked for writability.
exceptfds [in, out] Optional pointer to a set of sockets to be checked for errors.
timeout [in] Maximum time for select to wait, provided in the form of a TIMEVAL structure. Set the timeout parameter to NULL for blocking operation (i.e. an INFINITE WAIT).

Return Values:
The select function returns the total number of socket handles that are ready and contained in the fd_set structures, zero if the time limit expired, or SOCKET_ERROR if an error occurred. If the return value is SOCKET_ERROR, WSAGetLastError can be used to retrieve a specific error code.

The fd_set is a data structure indicating the set of socket handles that are being waited upon by the select call. This is initialised to the empty set with the macro call FD_ZERO(). A socket handle is added to the set with the macro call FD_SET(). When the select() function returns, the FD_ISSET() macro is used to check if a socket event is being signalled for a given socket handle. If a socket event has been signalled following a select call, that socket handle is implicitly removed from the fd_set and must be reset with FD_SET() if you wish to include it in the active socket handle set for a subsequent call to select(). Further details on the Winsock API can be found in the MSDN documentation at http://msdn.microsoft.com.

9 Here is another effective and efficient way of waiting concurrently -- wait for both of the events to occur, in either order. Let me illustrate it again by my analogy with busses. My two children go to two different schools, each served by a different bus line (in Britain we do not have special school buses). My wife takes both children to the bus stop in the morning, and waits there to make sure that both of them get on their proper busses.
Let us suppose that the wait times for the busses are the same as I have described before. How long does my wife have to wait at the stop? Obviously, she has to wait not for the first but for the second of the two busses to arrive. It turns out she has to wait 45 minutes on average, four times as long as me, when I wait for just the first of the busses.

10 Waiting for both of two events involves the larger of the two delays, and can be quite a lot worse than waiting for the smaller. When one of the busses suffers a long delay, the passenger who waits only for the first bus does not notice, but the passenger who waits for both of them suffers badly. And the probability of this is now quite high - nearly two percent. That is why the third column of the histogram makes the largest contribution (nearly two) to the average.

11 Returning to the topic of programs, the implementation of both is very similar to that of first, except that the third parameter of the waiting function is set to true rather than false; this indicates that all the events in the array of handles must occur before the wait terminates.

12 Here is the implementation using sockets. First it waits for the first of the two sockets to become ready, and next, if necessary, it waits for the remaining one. The first call on select marks the newly ready socket or sockets from temp, and the second call ignores all sockets in temp that are already ready. The implementation shown above is unnecessarily sequential. With care, a socket can be read as soon as it is known to be ready.

13 The worst of all ways of waiting is sequential, waiting for two events in a particular order. Consider again the example from real life. My wife was really happy when our daughter was old enough to go to the bus stop by herself, and to take her younger brother with her. The daughter took responsibility for putting the boy on a number seven bus, before she could catch her own number nine bus to her school.
That is why she had to wait for the two busses in the right order: first a number seven for her brother, then a number nine for herself. As a result, her average wait time increased yet further, to fifty-six minutes, the arithmetic sum of the wait times for the two busses separately. The problem with sequential waiting is that she had to miss every number-nine bus that passed while waiting for the first number-seven bus to arrive, as well as ignoring every number-seven bus that passed afterwards. Of course, all our calculations have assumed that all the delays are independently distributed, which is sometimes far from the case.

14 Returning to the topic of programs, the simplest implementation of then is just sequential composition, denoted in the C programming language by semicolon. We will see that, in certain contexts, a more complicated implementation is needed.

15 This slide shows an interaction effect between sequential composition and the first operator. The timing chart shows the bad case, when the first event P of a sequence occurs first, and the second event Q occurs last. As a result, the commitment to P is followed by the long wait for Q. It would have been better in this case to wait a bit longer for R, because the subsequent wait for S happens to be much shorter than the wait for Q. But even computers are not blessed with the foresight that would be needed to implement this strategy.

16 Here is an interesting interaction between the then operator and the both operator. The equation shown on this slide states exactly the serialisability property of atomic events or transactions. It says that the concurrent execution of two transactions is guaranteed to have the same effect as their serial execution in either order. The choice of order is not arbitrary; it is made when the first of the two events actually occurs. There is an important application of this law. It is well known that for small threads, the overhead of thread creation is too high.
It is then a good idea to transform a concurrent program involving the operator both into a purely sequential program, using then and first. The transformation is achieved by application of the algebraic law shown above. In future, we may hope that such optimisations will be made by an optimising compiler or some other programmer productivity tool.

17 I have mentioned that then is often simply implemented as sequential composition. But more care is needed when it appears in association with a wait for the first of two events. We need to represent the commitment to the choice that is made when the first of the two events has occurred. In the above program, the first event is either P or R. The function WaitForMultipleObjects indicates in its result which of the alternative events has terminated the wait. This is used to select the corresponding continuation, Q in one case and S in the other.

18 Here is the implementation of the same construction in terms of Windows sockets. I will not explain it further. The design patterns that I have shown in this lecture are the work of Dinan Gunawardena, my colleague at Microsoft Research in Cambridge. I am grateful for his co-operation in making them available to show you.

19 Here is a summary of our calculations. As mentioned before, the expected value of the sequential wait (row 5) is the sum of the average separate waits, 2.8 plus 2.8. More surprisingly, we get the same sum when we add the expected wait for the first event (row 3) to the expected wait for both events (row 4): 1.1 plus 4.5 is also 5.6. This is a useful check on the arithmetic. The explanation lies in an algebraic law which tells us that the sum of two numbers is equal to their minimum plus their maximum:

X + Y = (X min Y) + (X max Y)

X min Y is precisely the time spent waiting for the first of two events, and X max Y is exactly the time spent waiting for the second of them.
And averages distribute through sums, so the law applies to the averages of these two waiting times too.

20 The particular numerical results that I have calculated depend on my particular choice of example distribution of wait times, which has a high variance. But other distributions give similar results, though less spectacular. The Poisson distribution (with exponentially distributed intervals between arrivals) is a commonly used approximation for wait times for random events; it makes waiting for the first of two events three times better than waiting for both of them. For the uniform distribution, the wait for a single event is twice as good. But if the wait time is an exactly predictable constant (for example an alarm call, which is due in exactly 2.8 seconds), the variance is zero, and the difference between waiting for the first event and waiting for both of them disappears. For example, suppose a taxi always takes exactly 28 minutes to answer a call. Then there is no advantage in calling two taxis and taking the first that arrives. Note that in all cases the row sums to 5.6. I have given you the proof of this fact, and it is independent of the choice of any particular probability distribution for waiting times.

21 Let me end my story with a warning about non-determinism, of the kind that can arise from race conditions. Whenever you wait concurrently for more than one event, you lose control of which actual event will occur, because you cannot predict or control the timing. The events may even appear to occur at the same time. You have to be prepared for all eventualities. This is the inevitable price that must be paid for faster waiting. Let me illustrate the danger of non-determinism by my real-life example. One day, the lady who lives next door to me is recruited to my company, and we decide to travel to work together. Because I don't really care, I always allow her the choice of the two routes, through the city or through the country.
But after a while, I find that my hours at the office have been significantly reduced. Looking back, I find that my average wait time has been over 40 minutes, rather than just eleven. This seems to be because my colleague nearly always chooses the second of the two busses to arrive. Is this because she enjoys my company so much? Or is it because she doesn't want me to be promoted ahead of her? Well, that was a risk that I took when I relinquished control over the choice of the busses. When I say I don't care, I had really better make sure that I do not. By the way, all the characters in my little stories about the busses are entirely fictitious, including even myself! The only realism in my story is its description of the level of service offered by the public transport system in Britain.

22 I conclude with a list of bullet-points, relating what I have said to what you are probably doing already. Firstly, planning the waiting strategy of a program is just as important as planning its functionality, both for its performance and for its responsiveness to its users and its environment. So identify all points in your program where it may wait - I believe there will soon be tools to help you do this. Make sure that each wait waits for as many events as could possibly happen. Estimate typical and acceptable wait durations, and protect each wait by a time-out so as to deal with the extreme cases. For any non-trivial wait, try to release resources before the start, and acquire them again afterwards if necessary.

23 In order to use concurrent waiting safely, I suggest the following guidelines. Firstly, don't allow two threads to wait simultaneously for the same event. The resulting non-determinism could be dangerous.
Remember that during a wait there are many things that may change - obviously the real-time clock can move on, but also many other properties of the environment may change, including the values of any global variables that are not protected by locks. Everything that matters must be tested again after the wait. If you assume that a certain condition does not change during the wait, write this condition as an assertion (called an invariant) in the code, so that its violation can be detected during test. For safety, write the same assertion before and after each wait. In that way, if any other concurrent thread violates the invariant, it will be detected. Finally, I wish you every success in exploiting the possibilities of concurrent waiting in your own programs. May all your waits be both fast and safe.

--- end