Eventual concurrency

Started 12Sep2013, finished 19Sep2013. Updated 28Jan2014
This page is in group Technology.

Intro

In this blog page I will try to understand how so-called «single threaded» software architectures based on «event-loop concurrency» and callbacks (where almost all blocking is «evil») differ from light-weight processes that communicate using channels (where any needed non-blocking is built explicitly, or by using buffered channels). Updates:

  1. I think Joe Duffy at Microsoft describes a rather similar problem, under the name «continuation-passing or lightweight blocking coroutines» [09]
  2. Sept2014: My blog note Not so blocking after all puts the word «blocking» into context

Disclaimer

I am not a computer scientist. I am an embedded programmer of some 30+ years. But I can’t seem to stop writing, exposing what I painfully realise in the process: every time I try to read up on a subject and quietly open a door, there is a roar of knowledge in the new room. That is where I think you are, and I should really just listen. But still:

This blog is a follow-up from my blog Pike & Sutter: Concurrency vs. Concurrency as well as from the naïve Block play for blockers. Plus from some more academic papers, starting here, and a host of notes visible on the right side here. My two most recent papers may also be reached directly from the «PUBLICATIONS» menu above. They are about what I call XCHANs and FEATHERING (the urls point to their summary pages).

The problem

I am anthropomorphizing: the role-playing below describes software roles, as if the software were I, you, him/her, they or even friends. Or MYBOSS, ME, MYFRIEND and AFRIEND:

                   ____                              _________ 
                  | ME |1==========================>| AFRIEND |
                  |    |<---------------------------|         |\
                  |FREE|                            |         | \ 
 ________         |----|         __________         |   DO    |  s
| MYBOSS |2======>|    |2======>| MYFRIEND |        |         |  l
|        |<-------|    |<-------|          |        |   DO    |  o
|--------|        |----|        |    DO    |\       |         |  w
|  FREE  |3-->    |FREE|3-->    |          | \      |   DO    | /
|--------|        |----|        |    DO    | |      |         |/
| THANKS |1<==    |    |1<==    |          | /  1<==|   OK    | 
| THANKS |2<======|    |2<======|    OK    |/       |_________|
|________|        |____|        |__________|

When I ask AFRIEND for a favour it may take time to finish, and I want to be free to do something else in the meantime. Like answering MYBOSS, who asks me a favour: to ask another of my friends, MYFRIEND, for another favour. And I want to do something else in that meantime too.

The problem is that I don’t know how long any of my requests will take. I only know that I want to accept the requests that are asked of me as promptly as possible, since my requesters also want to do something else in the meantime – because they too won’t know how long I (and my friends) would take. It’s easy to lose track.

Concurrency

In this case I assume the worst: MYBOSS, ME, AFRIEND and MYFRIEND are all software sharing the same processor, even the same core. But AFRIEND or MYFRIEND may use a potentially slower medium to find the information or do the service. Let me further assume that I really ask the same friend to look up something for me, but the requests may differ (or they may be equal). The answer may be found on an external disk or in a cache, since somebody else just got the same reply. Surely I don’t want to hold a requester while waiting for a reply, since that could potentially make other requesters block.

So, blocking is ‘evil’. At application level. Requests and replies – events – must flow concurrently; they must feel simultaneous. It’s actually part of the specification. Or to be more precise: since my query could at any time be stored, and an attempt could be made to read it back even before the write has finished, any necessary waiting must be as short as possible, perhaps ‘unnoticeable’.

(However, if the request was to paint a wall, a new request would have to wait (block) until the first paint was dry. Or wait (block) for a reply from an interplanetary spaceflight. But these may be within the specification if the painter can only paint and there is only one wall, or the commander’s only job is to listen for the reply. At this level the specification states that they have nothing else to do. Others could go on, but not these.)

Also, I want any answer I receive from my server and then present back to my client to be the one that relates to the original query. And I want them not to cross, so that an older reply is not overtaken by a newer one. All this per session, from knock and opened door to closed door.
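
The role-play above can be sketched in Go (which appears later in this note). This is my own sketch with my own names, not any real system: each request carries its own reply channel, so the server stays free to accept the next request at once, and replies cannot cross sessions:

```go
package main

import "fmt"

// request carries the query together with a private reply channel,
// so every reply comes back to the session that asked for it and
// replies from different sessions cannot cross.
type request struct {
	query string
	reply chan string
}

// serve accepts each request immediately and farms the slow work
// out to a worker goroutine, so one slow favour never holds up the
// next requester.
func serve(in chan request) {
	for r := range in {
		r := r
		go func() {
			r.reply <- "done: " + r.query // the potentially slow favour
		}()
	}
}

// ask sends a request and waits only for its own reply.
func ask(in chan request, query string) string {
	r := request{query: query, reply: make(chan string)}
	in <- r
	return <-r.reply
}

func main() {
	in := make(chan request)
	go serve(in)
	fmt.Println(ask(in, "paint the wall"))
	fmt.Println(ask(in, "look up a word"))
}
```

The private reply channel is what keeps the sessions apart: MYBOSS and ME can never pick up each other’s answers.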

«Eventual concurrency». We can agree on that. But much of the above specification isn’t as obvious as you may think.

Process model

With object oriented programming it’s the object that is the basic entity. In Java, every object has a seed of concurrency, since every object’s base class, Object, has methods that enable it to run in a loop without spinning round forever. The process model includes code with this special kind of loop, which in the clean case will only have communication with other such processes as side effects. It could update a line of text in a display, but for another process to know what’s written there, it would be easier to query the process than to read the line from the unprotected display memory. During a millisecond the display line could contain half new and half old line text. However, a proper query to the object for the text may be queued up for access, run alone (synchronized) once the query has been accepted, and the full new line text filled in for the reply. It’s safe if it’s simple.
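
The display-line example can be sketched as a channel-served process (a sketch in Go; the names are mine):

```go
package main

import "fmt"

// line is a tiny "process" owning one display line. Other processes
// never read its memory directly; they query the owning goroutine
// over a channel, so nobody can observe a half-updated line.
func line(set chan string, get chan chan string) {
	text := ""
	for {
		select {
		case text = <-set: // a full new line replaces the old one
		case reply := <-get:
			reply <- text // always a complete line, never half and half
		}
	}
}

func main() {
	set := make(chan string)
	get := make(chan chan string)
	go line(set, get)

	set <- "hello, world"
	reply := make(chan string)
	get <- reply
	fmt.Println(<-reply)
}
```

Because the goroutine handles one message at a time, the query is queued up and served alone, exactly the «safe if it’s simple» behaviour described above.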

But a non-process object will also have its own «process model». A querier will often have sent a reference to a handling function to the server. When the server has finished, it finds the reference and does a callback to the handler with the result. But the handler also needs to deliver the reply somewhere, and it’s nothing new that such server functions must not step on each other’s toes. So, correct encapsulation is also important.

These servers may have persistent state or storage. It’s hardly a good idea to expose this to any other object, perhaps not even to a sub-class object(?). A process could contain a finite state machine (FSM), or be stateless, or anything in between. State machines may be «adequate» and «powerful», but stateless designs may be even more of the same. The quotes are from Rob Pike’s paper [01]; I’ll come back to it, since it is extraordinarily interesting (and a blow to event-loop programming).

It may not be smart to do the callback immediately; it could be stored for later use in some sort of queue. To get all this ticking there often is a scheduler or an event-loop. It may pick callback info, data messages or timeouts when one process has come to its temporary completion. This completion could also be blocking or synchronized.
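
The queue-plus-loop idea can be sketched in a few lines (Go here only as a convenient notation; the names are mine):

```go
package main

import "fmt"

// drain is a bare-bones event loop: callbacks are not run when they
// are registered, but queued, and the single-threaded loop picks
// them off one at a time. A long callback would block everything.
func drain(queue chan func() string) []string {
	var trace []string
	for cb := range queue {
		trace = append(trace, cb())
	}
	return trace
}

func main() {
	queue := make(chan func() string, 16) // pending callbacks
	queue <- func() string { return "first" }
	queue <- func() string { return "second" }
	close(queue)
	fmt.Println(drain(queue)) // runs in registration order
}
```

Note that the loop itself is sequential: the concurrency, if any, must come from whoever fills the queue.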

Also, how a process (thread, callback-driven function) waits for something, and how that mechanism scales, are important parts of the process model.

Memory footprint and context-switching time are also factors that group process models – from inexpensive to expensive.

Some process models

In the Linux world processes execute concurrently at any time and shared resources must be protected. Threads execute inside processes but not simultaneously, and scheduling of the threads is done at communication points; see [02] (Pike) for more.

In Go there are concurrent goroutines, which basically are scheduled at synchronization points, i.e. when a channel operation blocks. That is, provided only channels are used for communication between goroutines, not shared memory and locks. Goroutines are inexpensive, so you will see them used abundantly.
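
A small sketch of that abundance (the function name fanIn is my own): ten thousand goroutines, each blocking on a single channel send until the collector is ready, with every channel operation being a scheduling point:

```go
package main

import "fmt"

// fanIn starts n goroutines, each sending one value on a shared
// channel. Every channel operation is a synchronization point where
// the Go scheduler may switch goroutines; no locks are needed.
func fanIn(n int) int {
	ch := make(chan int)
	for i := 1; i <= n; i++ {
		go func(v int) { ch <- v }(i)
	}
	sum := 0
	for i := 0; i < n; i++ {
		sum += <-ch
	}
	return sum
}

func main() {
	fmt.Println(fanIn(10000)) // 10,000 goroutines are no problem: 50005000
}
```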

Event-loops vs. blocking processes

While I was reading up for this note, a thread at golang-nuts pointed me to another paper by Rob Pike: «A Concurrent Window System». If you are still curious, please read that paper, see [01]. Then you can come back here and read a listed summary, made of quotes from this anno 1989 paper. I have emphasized the event-loop vs. blocking processes view, not the window system itself. Emphasis (red text in the original) by me:

  1. «Traditional window systems offer their clients — the programs that call upon them for services — an interface consisting primarily of a graphics library and a stream of ‘events’: tokens representing key presses, mouse motion and so on. The clients of these systems tend to be written as state machines with transitions triggered by these events. Although this style of programming is adequate, it is uncomfortable; state machines are powerful but inconvenient.» (p1)
  2. «In contrast to systems such as NeWS, which allow their clients to be written using concurrent techniques, this (new) system requires its clients to be written concurrently. The only interface to the window system is through parallel synchronous communication channels. The main observation from this exercise is the importance of specifying the interface between the window system and its clients succinctly and completely. Given such a specification, it becomes possible to interconnect small programs, each of which implements a piece of the specification, to form larger interactive applications in a manner similar to the pipeline approach to text processing on the UNIX system.» (p2)
  3. «The behavior of the mouse is harder to model. It has three buttons that go up and down and two dimensions of translational motion. The usual solution is to represent the mouse’s behavior by a series of events: button down, button up, motion, motion while button down, etc. It is simpler instead to track the mouse by a series of synchronous messages reporting the entire state of the mouse.» (p4)
  4. «The mouse may be programmed as if it were being polled, which is probably the simplest way to write mouse software. An event-based interface would instead have to be programmed to reconstitute the state so the condition may be tested. Why provide a complex interface when the programming language can already handle all the complexity required? Events add unnecessary complexity to the problem of interpreting the mouse. But we can only avoid that complexity when we can write code to read the mouse channel independently of the code that handles the keyboard; otherwise, the event mechanism is the only way to collect both mouse and keyboard actions. To keep those two tasks separate, we need processes.» (p5)
  5. «The channels that carry messages between processes in Newsqueak, and hence within the window system, are synchronous, bufferless channels as in Communicating Sequential Processes (CSP) or Squeak, but carry objects of a specific type, not just integers [Hoare 78, Card 85].» (p6)
  6. «This sounds very much like the usual event-based loop of most window systems and their applications. The main difference is that none of the software needs to be written as state machines. The problems have been decoupled, and the individual components — window multiplexer, clients, and mouse and keyboard handlers for the individual clients — can execute concurrently, independently, and without explicit state. Simpler software results.» (p6)
  7. «What happens when a user of the system wants to delete a window? The usual method is to send an event or, worse, an asynchronous poisonous message (a ‘signal’ in the UNIX system) to the client, terminating it suddenly. This brutality is avoidable; the system can just notify the client, using the same synchronous methods, that it is being asked to exit. When the client is ready, it can exit by the same method as above. In other words, instead of the client being killed, it can be asked to leave.» (p7)
  8. «The price to pay for this style is twofold: the need to write in a concurrent language, using novel techniques; and the possibility of deadlock.» .. «A deadlock is really nothing more than a peculiar form of bug, and any method for eliminating bugs will work for deadlocks. The best method is to design the system to be bug-free.» (p8)
  9. «Although the design is easily implemented in Newsqueak — the language was built for the task — it can be put together in more traditional environments. Any system that allows synchronous message passing and multiplexing can be used to construct a synchronous window system. The interprocess communication tools in most UNIX systems, particularly pipes plus the select or poll system calls, are sufficient to implement this design [UNIX 4BSD, UNIX SYSV]. Although such a system would involve substantially more code than the Newsqueak version, it would still be conceptually simpler than a conventional window system.» (p10)
  10. «Conclusions. Window systems are not inherently complex. They seem complex because we traditionally write them, and their client applications, as single processes controlled by an asynchronous, event-driven interface. We force them into the mold of real-time software. They can be simplified greatly by writing them as multiple processes, each with individual, synchronous communication channels on which to receive data and control information. It is possible to write a complete, useful window system, comparable in basic power to commercial systems, using just a few hundred lines of code in a concurrent language. Even in traditional languages, simplicity can be achieved by replacing event-driven interfaces with synchronous interfaces and some easily-provided multiplexing functions. An interface based on synchronous communication allows a novel style of construction that permits interactive applications to be assembled from piece parts, much as in standard UNIX pipelines.» (p10)
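
Points 3, 4, 6 and 7 can be sketched in Go, which inherits Newsqueak’s channels (the sketch and all names are mine, not Pike’s code): the mouse reports its whole state as one message, mouse and keyboard are independent channels, a select multiplexes them with no state machine in sight, and deletion is a polite request rather than a kill:

```go
package main

import "fmt"

// mouseState is a full snapshot, as Pike suggests: report the whole
// state of the mouse, instead of events to be reassembled later.
type mouseState struct {
	x, y, buttons int
}

// mux multiplexes independent mouse and keyboard channels without a
// state machine: select simply waits on whichever is ready. The done
// channel asks the client to leave instead of killing it.
func mux(mouse chan mouseState, kbd chan rune, done chan struct{}, out chan string) {
	for {
		select {
		case m := <-mouse:
			out <- fmt.Sprintf("mouse at (%d,%d)", m.x, m.y)
		case k := <-kbd:
			out <- fmt.Sprintf("key %q", k)
		case <-done:
			out <- "asked to leave" // notified, not killed
			return
		}
	}
}

func main() {
	mouse := make(chan mouseState)
	kbd := make(chan rune)
	done := make(chan struct{})
	out := make(chan string)
	go mux(mouse, kbd, done, out)

	mouse <- mouseState{x: 3, y: 4}
	fmt.Println(<-out)
	kbd <- 'a'
	fmt.Println(<-out)
	close(done)
	fmt.Println(<-out)
}
```

The mouse and keyboard handlers stay separate processes, and the multiplexer carries no explicit state at all.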

I have heard event-loops being haunted and praised, but after so many years this is the first time that it has been placed before my eyes that clearly. Thanks, Rob – you saved this blog page!

But I need two thoughts in my head at the same time now: after all, CSP (and its textual CSPm language) breaks matters down to states – even synchronous channels. Finite State Machines have also come to stay. They are good abstractions both for formal verification and application level programming. State machines are also used extensively in SDL. Some things have states, but some things, as we have seen, may best exist without them. Another thing: stateless programming is HR (Highly Recommended) for the highest Safety Integrity Level (SIL 4) in the IEC 61508 safety-critical standard. So maybe I even need three or four thoughts in my head! But the toolbox in my wood shop has even more tools!

Asynchronous with messages

I presented this blog note at work, and also showed Pike’s above paper. He advocates processes and synchronous full messages, as opposed to event-loops, callbacks and parameter-less signals (concrete events). This yields simpler and more stateless software components.

I wonder: if Pike now in 2013 had used Go for his window system, would that particular example be as simple with buffered channels, or would he prefer unbuffered? (Not taking the buffered vs. unbuffered discussion here.)
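
Without taking that discussion, a tiny sketch of the difference the choice makes (trySend is my own helper, using select with a default case):

```go
package main

import "fmt"

// trySend reports whether a send on ch would proceed right now,
// i.e. without blocking the sender.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuf := make(chan int)  // synchronous: a send needs a ready receiver
	buf := make(chan int, 1) // explicit non-blocking, for one item

	fmt.Println(trySend(unbuf, 1)) // false: nobody is receiving
	fmt.Println(trySend(buf, 1))   // true: the buffer accepts it
	fmt.Println(trySend(buf, 2))   // false: the buffer is now full
}
```

With an unbuffered channel the send and the receive are one synchronized event; with a buffer, any needed non-blocking is built explicitly, as mentioned in the intro.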

And when it comes to loosely coupled FSMs with full messages (Finite State Machines, as used in the mentioned SDL blog note) – would that also make the window system as simple as in Pike’s paper? Holzmann has shown that communicating FSMs may equally well be designed as asynchronous/non-blocking or synchronous/blocking [08].

select, epoll and libevent

In the world I am reading up on (discovering its complexities, not its simplicities) there is not only select [04]. Select does not scale well, since it needs to move a lot of bytes from kernel to user space every time; the maximum number of file descriptors is therefore the set limit FD_SETSIZE (1024). It also has to look at all the file descriptors every time, and so scales as O(n).

So, epoll [05] was invented for Linux:

«epoll is a new system call introduced in Linux 2.6 (sic: 2.5.44 of Oct 2002). It is designed to replace the deprecated (sic: «older») select (and also poll). Unlike these earlier system calls, which are O(n), epoll is an O(1) algorithm – this means that it scales well as the number of watched file descriptors increase. select uses a linear search through the list of watched file descriptors, which causes its O(n) behaviour, whereas epoll uses callbacks in the kernel file structure». [06]

Another mechanism is libevent [07]. I have heard that it is implemented on top of select or epoll, depending on what’s best and available. I read at [07] that

«The libevent API provides a mechanism to execute a callback function when a specific event occurs on a file descriptor or after a timeout has been reached. … libevent is meant to replace the event loop found in event-driven network servers. An application can just call event_dispatch() and then add or remove events dynamically without having to change the event loop. … Using callbacks on signals, libevent makes it possible to write «secure» signal handlers as none of the user supplied signal handling code runs in the signal’s context.»

When I read about these, and read the comments about how to avoid the pitfalls and problems associated with this or that way of doing things, I am happy I don’t have to poll for a lib of some select replacement of an event loop (pun intended).

The internet runs on these things, so programmers do manage! Often, none of the mechanisms are seen by the application programmers. There would be application messages that in lower layers have been packed and unpacked according to some schema like XML or JSON. The connections and databases would typically also not be at the application level. But really, this is not what I intended this blog note to be concerned with. So I’ll leave it here.

Many of these mechanisms seem to be patches, needed either because Rob Pike’s mentioned «A Concurrent Window System» [01] had not been written yet, or had not been read. Or, as much as I would hate to mention it, maybe there is no better way?

Conclusion

Since Linux processes are more expensive than threads, programmers tend to make few processes, and they try to avoid threads too. So programmers often prefer the event-loop and callback pattern and thread-less programming. (After I wrote this note I rediscovered a paper by Peter H. Welch about the dangers of callbacks in OO [10].) They may also want to use higher level abstractions, see the previous chapter. Of course, Go is able to change this, and perhaps even threads in the new C/C++ versions (which I have somewhat mentioned at [03]). But for my part I would wait for a solid CSP library, like PyCSP for Python or JCSP for Java. (PyCSP uses Python callbacks to optionally specify ALT guard handlers!)

Backlog

I meant to cover some of this, but this page has become bloated enough. But now I won’t easily forget. And it may trigger some thoughts for you as well:

  1. Node.js is supposed to contain callbacks within callbacks, nested to any number of levels. This probably makes that scheme even more difficult?
  2. Node.js has both synchronous and asynchronous methods. With the asynchronous methods there is no guaranteed ordering. The correct way to do this is to chain the callbacks. In busy processes, the programmer is strongly encouraged to use the asynchronous versions of these calls. The synchronous versions will block the entire process until they complete – halting all connections. (From this url)
  3. At Wikipedia page Event-driven programming, chapter Criticism and best practice (here) the alternative to event-driven programming is state machines! The tools are freely floating in the toolbox!
  4. Stack usage of a callback. Does this make things more complex?
  5. The E programming language (Wikipedia) states that «E combines message-based computation with Java-like syntax. A concurrency model based on event loops and promises ensures that deadlock can never occur. In E, all values are objects and computation is performed by sending messages to objects. Each object belongs to a vat (analogous to a process). Each vat has a single thread of execution, a stack frame, and an event queue. E has two ways of sending messages: the immediate call and the eventual send. An immediate call is just like a typical function or method call in a non-concurrent language: the sender waits until the receiver finishes and returns a value. An eventual send sends the message while producing a placeholder for the result called a promise. The sender proceeds immediately with the promise. Later, when the receiver finishes and yields a result, the promise resolves to the result. Since only eventual sends are allowed when communicating with remote objects, deadlocks cannot happen. In distributed systems, the promise mechanism also minimizes delays caused by network latency.»
    I think this is related to the new C/C++ futures model. And my XCHAN also is deadlock free.
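
E’s eventual send maps quite naturally onto a Go channel used as a promise. This is my own mapping and a sketch only, not anything from E itself:

```go
package main

import "fmt"

// eventualSend mimics E's eventual send: it returns a promise (here
// a one-slot channel) immediately and computes the result in the
// background, so the sender is never blocked by the receiver.
func eventualSend(f func() int) <-chan int {
	promise := make(chan int, 1) // buffered, so the worker never blocks either
	go func() { promise <- f() }()
	return promise
}

func main() {
	p := eventualSend(func() int { return 6 * 7 })
	// ... the sender proceeds immediately and can do other work ...
	fmt.Println(<-p) // the promise resolves: 42
}
```

The receive on the promise is where the waiting, if any, happens – which is also roughly how the C/C++ futures model works.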

Notepad

A quote from golang-nuts:

Blocking in Go is not bad. It is good, because it simplifies programs w/o sacrificing performance. That «blocking is bad» usually stems either from «old single-thread unix world» or from «thread-per-request» world. Their arguments do not hold for Go [Here]

References

  1. A Concurrent Window System by Rob Pike, AT&T Bell Laboratories Murray Hill, New Jersey 07974 (1989), see http://swtch.com/~rsc/thread/cws.pdf (hosted by Russ Cox). It’s from the time when Pike worked with Newsqueak. It’s also at CiteSeerX. It was this golang-nuts thread that made me aware of it. Next reference is Pike’s 11-year newer implementation of a window system:
  2. Rio: Design of a Concurrent Window System by Rob Pike, Computing Sciences Research Center Bell Labs, Lucent Technologies (2000)  http://doc.cat-v.org/plan_9/3rd_edition/rio/rio_slides.pdf
  3. C1X and C++0x concurrency in C and C++ Working Drafts, my blog note 022, see http://oyvteig.blogspot.no/2011/01/022-c1x-and-c0x-concurrency-in-c-and-c.html
  4. select, see https://en.wikipedia.org/wiki/Select_(Unix)
  5. epoll, see https://en.wikipedia.org/wiki/Epoll
  6. Using epoll() For Asynchronous Network Programming by Mr. Oleksiy (aka Alexey) Kovyrin, see http://kovyrin.net/2006/04/13/epoll-asynchronous-network-programming/ Referred from [05] above.
  7. libevent, see https://en.wikipedia.org/wiki/Libevent
  8. Design and Validation of Computer Protocols. Gerard J. Holzmann. Prentice Hall, 1991. ISBN 0-13-539925-4. (This book has newer releases called «Spin Model Checker», by the same author). See pages starting at 167. Available on the net at http://spinroot.com/spin/Doc/Book91_PDF/x20v_1991.pdf
  9. C# for Systems Programming, blog by Joe Duffy (Microsoft), see http://joeduffyblog.com/2013/12/27/csharp-for-systems-programming/.
    Also see Continuation-passing style (CPS) at http://en.wikipedia.org/wiki/Continuation-passing_style
  10. Life of occam-Pi by Peter H. Welch, School of Computing, University of Kent, UK, at Communicating Process Architectures 2013 (CPA-2013)
    http://www.wotug.org/papers/CPA-2013/Welch13a/Welch13a.pdf – Paper
    http://www.wotug.org/papers/CPA-2013/Welch13a/Welch13a-slides.pdf – Presentation. Also referenced at https://www.teigfam.net/oyvind/home/technology/079-wysiwyg-semantics/#Ref03
