Asynchronous I/O and critical regions

Designer's notes #8
Øyvind Teig, Trondheim, Norway (http://www.teigfam.net/oyvind/)

Some easy editing rules to help you off your own toes

Assume you're into embedded programming. Assume you have some kind of run-time system or operating system and are doing it all in ANSI C. C is no real-time language like occam, which would disallow all the constructs discussed here. However, with C, we'll certainly have to make the best of it.

Then, the problem: how do we keep a process (we could also call it a task or a thread in this note) from manipulating data which, a microsecond later or, worse, right now, may also be manipulated by an interrupt function? Say the process is incrementing a value and has just moved it from RAM to an internal register. The value is 10. Then it happens: the interrupt arrives and the processor hardware sets the process aside to let the interrupt have the machine. The interrupt decrements that same value to 9 and stores it before it returns. The RAM will contain 9 until the process, resuming, shortly overwrites it with 11. One increment and one decrement should have left 10. Such confusion in embedded programs may or may not be discovered. But we sleep better (or stay in the company longer) the fewer such possibilities we leave behind us.
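The 10, 9, 11 sequence above can be replayed in a small simulation. This is only a sketch: the names shared_value, interrupt_decrement and process_increment are made up for illustration, and the "interrupt" is an ordinary function call placed exactly where a real interrupt could strike, between the load and the store.

```c
#include <stdint.h>

/* Shared between process and interrupt (hypothetical name). */
static volatile int16_t shared_value = 10;

/* Stand-in for the interrupt function: decrements the shared value. */
static void interrupt_decrement(void)
{
    shared_value = (int16_t)(shared_value - 1);
}

/* The process's increment, split into the load / modify / store steps
   the processor performs. If interrupt_hits_midway is nonzero, the
   simulated interrupt arrives right after the load. */
static int16_t process_increment(int interrupt_hits_midway)
{
    int16_t reg = shared_value;   /* load: register holds the old value */
    if (interrupt_hits_midway) {
        interrupt_decrement();    /* RAM is decremented behind our back */
    }
    reg = (int16_t)(reg + 1);     /* modify: done on the stale register */
    shared_value = reg;           /* store: the decrement is overwritten */
    return shared_value;
}
```

Starting from 10, process_increment(1) leaves 11 in RAM, although one increment plus one decrement should have left 10. With process_increment(0) followed by interrupt_decrement() the result is the expected 10.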

This problem was solved in the sixties by the Dutch computer scientist Dijkstra. However, in our system we can't use his solution, the semaphore. We dropped semaphores in our run-time system, which is message based. Besides, semaphores may not be suitable for use in an interrupt, because blocking there is a bad idea.

Communication between processes and interrupts defines hot-spots in process code, where access to common data, like buffers, must run uninterrupted. These critical regions in process code must be protected, for example by disabling all interrupts over the region. Or better, disable only the actual interrupt that shares the data: this removes the race condition and leaves the other interrupts running.
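A minimal sketch of such a critical region, disabling only the one interrupt that shares the data. Everything named here is an assumption for illustration: irq_enable_mask stands in for a memory-mapped interrupt enable register, UART_RX_IRQ_BIT for the bit of the interrupt in question, and rx_count for a counter that the interrupt function also updates. On real hardware you would use the register names or intrinsics of your own processor.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped interrupt enable register (assumption). */
static volatile uint8_t irq_enable_mask = 0xFF;
#define UART_RX_IRQ_BIT 0x04u   /* hypothetical bit for "this" interrupt */

static void disable_uart_rx_irq(void)
{
    irq_enable_mask &= (uint8_t)~UART_RX_IRQ_BIT;
}

static void enable_uart_rx_irq(void)
{
    irq_enable_mask |= UART_RX_IRQ_BIT;
}

/* Number of buffered bytes; the interrupt function increments it. */
static volatile uint8_t rx_count = 0;

/* Process side: consume one buffered byte, if there is one.
   Returns 1 if a byte was taken, 0 if the buffer was empty. */
static uint8_t process_take_one(void)
{
    uint8_t taken = 0;

    disable_uart_rx_irq();        /* critical region starts */
    if (rx_count > 0) {
        rx_count--;               /* read-modify-write, now uninterrupted */
        taken = 1;
    }
    enable_uart_rx_irq();         /* critical region ends */

    return taken;
}
```

The sketch enables the interrupt unconditionally on the way out. If the region could ever be entered with the interrupt already disabled, saving the old state and restoring it would be the safer variant.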

... Forward refs Local_ForExternOnly_
... Forward refs Local_ForBoth_CriticalRegion_
... Access functions reachable externally Extern_
... Internal Local_ForExternOnly_ functions
... Internal Local_ForBoth_CriticalRegion_ functions
... Internal Local_ForInterruptOnly_ functions
... Interrupt functions
Above: screen "clip" from my folding editor Winf (see [11])

Now, say that you have access functions and interrupt functions in the same C file, which is an ok idea. If the process calls an access function, which calls a function that may also at any time be called from the interrupt, we're lost. Relying on the common functions being reentrant is not a good idea here.

It is not as difficult as one may think, to keep track of which function calls which.

Introduce a naming scheme.

The CREW rule (Concurrent Read, Exclusive Write) governs access to common data. It spells: as long as everybody only reads, access may be concurrent; if there is at least one writer, then all access must be exclusive, and both parties must ensure that all writes and (yes!) reads are exclusive. Or simply always protect; why else share the data, if not at least one party writes? Don't make your own rule. Ohm's law can't be improved. If one party reads and one writes, and you think you know the roles well enough at any time to let go of protection, you should still protect. If you rely on the atomicity of op-codes, beware, it's a shaky foundation. Another compiler or another optimization level may break your assumptions. Or you may forget to check.
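Why reads need protection too can be shown with a small simulation. Assume, for illustration only, an 8-bit processor that must read a 16-bit counter as two separate bytes, and a timer interrupt that increments the counter. All names (count_lo, count_hi, timer_irq_enabled, timer_interrupt, read_count) are hypothetical, and the interrupt is simulated by a plain function call placed between the two byte reads.

```c
#include <stdint.h>

/* A 16-bit counter as an 8-bit CPU sees it: two separate bytes. */
static volatile uint8_t count_lo = 0xFF;
static volatile uint8_t count_hi = 0x00;

/* Stand-in for the timer interrupt's enable bit (assumption). */
static volatile uint8_t timer_irq_enabled = 1;

/* Simulated timer interrupt: increments the 16-bit counter.
   The enable check mimics what the hardware does for us. */
static void timer_interrupt(void)
{
    if (!timer_irq_enabled) {
        return;
    }
    count_lo++;
    if (count_lo == 0) {
        count_hi++;               /* carry into the high byte */
    }
}

/* Process side: read the counter, with or without protection.
   The call to timer_interrupt() between the two byte reads simulates
   an interrupt request arriving at the worst possible moment. */
static uint16_t read_count(int protect)
{
    uint8_t lo, hi;

    if (protect) {
        timer_irq_enabled = 0;    /* critical region starts */
    }
    lo = count_lo;
    timer_interrupt();            /* request arrives mid-read */
    hi = count_hi;
    if (protect) {
        timer_irq_enabled = 1;    /* critical region ends */
        timer_interrupt();        /* pending request is served now */
    }
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

With the counter at 0x00FF, the unprotected read returns 0x01FF, a value the counter never held. The protected read returns 0x00FF, and the counter moves on to 0x0100 only after the region is left. The reader wrote nothing, yet it still had to be exclusive.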

An interrupt, by its very nature, defines its own exclusiveness, provided that no higher-level interrupt touches the data in any way.

Here is my suggestion:

At any stage, rename all instances:
  1. Identify all functions that are called externally. Prefix them with Extern_.
  2. Go through the Extern_ functions and find all the functions they call. Prefix all these functions with Local_ForExternOnly_.
  3. Go through all interrupt functions and find all called functions. If you find any calls to Local_ForExternOnly_ functions, it's bad news, unless you continue reading this note. Rename those functions with the prefix Local_ForBoth_CriticalRegion_. The other called functions you should prefix with Local_ForInterruptOnly_.
  4. Now, go back to the Extern_ functions and wrap all calls to Local_ForBoth_CriticalRegion_ functions with disable-this-interrupt and enable-this-interrupt.
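The result of the four steps could look like the sketch below, laid out in the same order as the folded file shown earlier. Only the prefixes come from this note. The buffer, the function bodies, the value 0x55 returned by the pretend hardware read, and DisableRxInterrupt / EnableRxInterrupt (which here just flip a flag instead of touching a real interrupt controller) are assumptions made up for the example.

```c
#include <stdint.h>

#define BUF_SIZE 8u

/* Common data, touched by both process and interrupt. */
static volatile uint8_t buf[BUF_SIZE];
static volatile uint8_t buf_count = 0;

/* Stand-ins for disable-this-interrupt / enable-this-interrupt. */
static volatile uint8_t rx_irq_enabled = 1;
static void DisableRxInterrupt(void) { rx_irq_enabled = 0; }
static void EnableRxInterrupt(void)  { rx_irq_enabled = 1; }

/* Forward refs Local_ForExternOnly_ */
static uint8_t Local_ForExternOnly_IsValid(uint8_t value);
/* Forward refs Local_ForBoth_CriticalRegion_ */
static uint8_t Local_ForBoth_CriticalRegion_Append(uint8_t value);
/* Forward refs Local_ForInterruptOnly_ */
static uint8_t Local_ForInterruptOnly_ReadHardware(void);

/* Access functions reachable externally Extern_ */
uint8_t Extern_Put(uint8_t value)
{
    uint8_t stored;

    if (!Local_ForExternOnly_IsValid(value)) {
        return 0;                 /* no common data touched, no protection */
    }
    DisableRxInterrupt();         /* step 4: wrap the ForBoth call */
    stored = Local_ForBoth_CriticalRegion_Append(value);
    EnableRxInterrupt();
    return stored;
}

/* Internal Local_ForExternOnly_ functions */
static uint8_t Local_ForExternOnly_IsValid(uint8_t value)
{
    return (uint8_t)(value < 0x80u);
}

/* Internal Local_ForBoth_CriticalRegion_ functions */
static uint8_t Local_ForBoth_CriticalRegion_Append(uint8_t value)
{
    if (buf_count >= BUF_SIZE) {
        return 0;                 /* buffer full */
    }
    buf[buf_count] = value;
    buf_count++;
    return 1;
}

/* Internal Local_ForInterruptOnly_ functions */
static uint8_t Local_ForInterruptOnly_ReadHardware(void)
{
    return 0x55u;                 /* pretend byte from a device register */
}

/* Interrupt functions */
void Interrupt_Rx(void)
{
    /* Exclusive by nature: no disable/enable needed in here. */
    (void)Local_ForBoth_CriticalRegion_Append(
        Local_ForInterruptOnly_ReadHardware());
}
```

The prefix on a called function now tells you, at the call site, whether the call needs wrapping. A bare Local_ForBoth_CriticalRegion_ call inside an Extern_ function is an error you can find with a text search.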

It is the last point that saves your project. Saves it from strange errors that don't easily reproduce. Saves you from service trips.

However, observe that if the software you fixed is a lower-level protocol driver, and it continues to run after the glitch or race or hazard on this stupid integer, a higher OSI level will do a retransmission for you, leaving you ignorant of the error. Ignorant that some green lights out there went on too early. However, lawyers are good at finding telephone numbers. Then, you try to explain.

Other publications at http://www.teigfam.net/oyvind/pub/pub.html