Channels and rendezvous vs. safety-critical systems

New 19Dec2011, moved here 11April2017, updated 03Mar2024

This note is in group Technology and My Go (golang) notes, and discusses the Ada Ravenscar Profile (subset) for safety-critical systems – as seen from the outside of the Ada community. (Update 28Apr2019: observe that the Ada Conformity Assessment Authority works on another profile, the Ada Jorvik Profile that is having less restrictions than the Ravenscar Profile [9]). This blog note also tries to discuss whether such a profile would have been necessary for the occam programming language  (an “occam Ravenscar Profile” (subset)?). It also tries to open the door to the xC language by XMOS. I certainly have got a lot of help from some named computer scientist friends out there. I would not have been able to do this alone. Thanks!

This blog note has been moved from http://oyvteig.blogspot.no/2011/12/035-channels-and-rendezvous-vs-safety.html (which is now empty).

2011

In this note I will try to figure out why the Ada programming language (in the Ada Ravenscar Profile [1]) had to make a subset for safety-critical systems that explicitly prohibited the beautiful select. And – would occam (or perhaps CSP – but this has no meaning, since CSP is an algebra) – have had to do the same, i.e. prohibit ALTs (and even CHANnels?).

To it me it seems like the Ada language has an inherent problem: the select mechanism and the rendezvous are based on queuing. I was surprised to read about this, as I have learnt that the Ada concurrency features are based on CSP. I will try to explain how this understanding could be plausible.

I have done some CSP channel scheduler coding, so my fingers are dirty [3]. There is no queue in what we did. There was no queue in the transputer which ran occam natively. There was no queue in SPoC (I have written some about the Southampton Portable occam Compiler here).

When a receiver sets up an input ALT, all the components’ channels may be blocked upon by several senders. They have all flagged their sender channel as “first”. There is no queue of these senders. If more that one is ready when the input ALT is started, either the first seen (PRI ALT) or a non-deterministic choice should be made. (Implementation may be lacking this feature, but verifiers like Promela or CSPm will explore all possible choices when they hunt for liveness properties. More about nondeterminism Nondeterminism (note 49)) Also, none of the senders may be ready yet.

Instead of queuing the senders, the senders’ channels are tagged as “first” on channel, and the sender is left (blocked or descheduled). It will not see light again before a receiver (in an ALT or just an input) also appears on the channel, as “second”. So, there is no need to queue anything. But, is the “implicit queue” a proper queue, so that there is no difference?

(Update 25Oct2016: When writing Towards a taxonomy(?) of CSP-based systems (note 135) I discovered that maybe the reason is that all channels are one-to-one, there is no “many” on either channel end. This is so for occam and XC. Stay tuned.)

I am not sure why Ada does queuing on the select, accept and entry mechanisms. Each entry has a queue of processes, fifo by time sequence. Is it because it also supports protected objects (classical monitor with guards instead of condition variables)? Is it because it’s bidirectional?

The Ravenscar profile prohibits select, demands protected objects, but queue size is 1, which to me seems to collapse the queue to no queue. From [4] one of the Ravenscar features is described as “Protected types and objects, defined at the library level, with a maximum of one entry per object and with a maximum of one task queued at any time on that entry. The entry barrier must be a single Boolean variable (or a Boolean literal).”

So, oversimplified (or wrong?), to make Ada usable for safety critical systems, it seems to have collapsed to occam?

Of course, high integrity Ada as defined by the Ravenscar profile, is concerned with schedulability analysis, static analysis, control flow, data flow, information flow, symbolic execution and formal code verification.

And at the bottom line, “the definition of Ravenscar requires preemptive scheduling of tasks.” But seen from an occam perspective, the subsequent sentence is logical: “However a similar profile could be defined that specified non-preemptive execution” – but from the Ravenscar profile, I don’t understand that differentiation. I would understand the latter better than the first, from schedulability perspective. Of course, in a cooperative scheduling perspective, a single process must run to next scheduling point before another process is allowed to (or ru..n with any number of dots, up to indefinite), whereas with preemptive, the next process could be there in a snap. But that’s most interesting if hardware driven software interrupts are not considered. Occam didn’t have interrupts, they were proper processes with a channel placed at an event pin.

The ALT or select mechanism is the non-deterministic feature of these languages. I found a paper for a conference, where they said that unwanted non-determinism is not acceptable (causing races etc.), but controlled non-determinism is something to allow, or even strive for [5]. (Again, see the Note 049 (above), this is much about the internal and external choice of CSPm)

So, the Ada Ravenscar profile, also makes controlled non-determinism unwanted?. Did they really have to? Promela expands all the executable paths of its case structure, what’s wrong with Ada not to allow it?

Should you stumble upon this note and have some comments, please comment or send me a mail!

And a little about Go (golang)

It seems like Go [6] also has scheduling queues attached to their beautiful CSP channels! At each end! There is a little about this in a query I posted to golang-nuts user group [7].

Comments

Comment #1

I have received this comment in an email from a computer scientist (in reply to my direct mail query):

“Ada tasks can communicate in two ways – via message-passing (rendezvous) or via protected objects. The latter decouples the tasks more and allows more accurate timing (scheduling) analysis to be carried put. The Ravenscar profile is for hard real-time systems where schedulability is crucial. So the profile, in effect, throws away the rendezvous and only allows communication and synchronisation via protected objects.

Now for occam there is no alternative to message passing and the rendezvous – so it is not easy to see how an occam-ravenscar profile could be defined.”

Later on:

“I had a quick read of your article, I still feel you are not being fair to Ada – it supports, as I said, two quite different communication methods – message passingand monitor.

For message passing it uses a many-to-one model – occam uses one-to-one, so queuing is more an issue in the Ada model (though queuing on ALTs takes the place of queuing on entries – Ada programs need less ALTs/selects)

Now Ravenscar has removed ‘select’ as it has removed message-based communication, but the protected object is still a source of (good) non-determinacy. Many tasks calling the same PO will access in an order not predetermined.”

Comment #2

(Peter has allowed me to include his mail here, name and all. He attached the article, I made a reference to it instead [1])

——

I would like thank you for your blog article which I found very thought provoking. You write,

“….it seems like the Ada language has an inherent problem: the select mechanism and the rendezvous are based on queuing..”

Which seems correct, as it says here:
http://en.wikibooks.org/wiki/Ada_Programming/Tasking

“…an entry is executed by the called task, not the calling task, which is suspended until the call completes. If the called task is not ready to service a call on an entry, the calling task waits in a (FIFO) queue associated with the entry…”

So far as performance is concerned, I don’t know if making a calling task wait in a FIFO is much different to making a calling task wait on an occam channel.

However, there could be a big difference from a timing analysis point of view, in that, in the case of an occam ALT, the number of input channels is apparent in the code for the ALT, whereas in the case of an Ada rendezvous, the number of calling tasks cannot be known by merely inspecting the code of the rendezvous.

Maybe it was that aspect that required the rendezvous to be excluded from the Ravenscar Profile.

However, in spite of the above, the following paper shows that it is quite easy to implement CSP channels in Ravenscar Ada using protected objects.

“Extending Ravenscar with CSP Channels”, Diyaa-Addein Atiya and Steve King. Department of Computer Science, University of York [1]

And once one has implemented CSP channels, one can implement ALTs by polling ready flags in the channels. (Comment 2017 (1): busy polling, see below)

That might give the convenience of a rendezvous in a form in which timing analysis could satisfactorily be done.

Best regards,
Peter Morris
Marden, Australia

Thank you for these comments, and for your time!

2017 (1)

Here is sequence from a mail thread in April 2017. I have been allowed by each person by name to paste it here. I have pasted it somewhat theme based. Newest at the bottom, first first:

occam Ravenscar?

Initial mail by me:

The Ravenscar profile is for safety critical systems written in Ada. It basically takes the rendezvous away. This opens for schedulability analysis. [1]

I have wondered about this in [2] (the “Computer scientist” quoted is close to the Ravenscar profile..)

Finally I now have had a mail with Roger Shepherd how the transputer tackled this, whether there also would be any reason to make an “occam Ravenscar” profile. From what he answered I think the answer is no. [3]

Any comments welcomed. Please state if your comment here might by name be quoted in the blog note. If I don’t see this specifically I will not do any copy paste. (But I will not guarantee that I might not mail you and ask in case I think you may have forgotten…)

For any one of you who are extremely knowledgable about the XMOS processors and XC I would be delighted to see this viewed also from that perspective..

[1] https://en.wikipedia.org/wiki/Ravenscar_profile
[2] This note
[3] https://www.teigfam.net/oyvind/home/technology/138-determined-about-buffers-and-bit-arrays/

Larry (Lawrence) Dickson:

In [2], Peter Morris says,

And once one has implemented CSP channels, one can implement ALTs by polling ready flags in the channels.

Does “polling” mean what it customarily means, trying again and again until an OK is found? If so, I would disagree that the described approach is a substitute for occam ALT, which does not poll.

Therefore, I disagree that an “occam Ravenscar” profile is superfluous. However, for schedulability it must be occam + priority. Given absolute high priority as found on the Transputer, and cycle-counted limits on high-priority code between deschedulings, a round-robin scheduler can guarantee response within a known count of cycles.

Me:

In the context of «timing analysis» I don’t think he would have thought of busy polling.

Peter Morris:

When I suggested polling I did mean busy polling, but only for systems in which polling could meet the timing requirements.

Provided that is the case, I don’t think it matters if the ALT is busy or non-busy.

Me:

Do you mean busy polling of a single ALT doing nothing else, or busy-polling that may be preempted? I guess the last, if not, it would starve?

Peter Morris:

Yes, the latter. To discourage wasteful polling I would put a yield() statement in each ALT loop.

Added later by me for this blog note only: A student told me that Go (golang) is implemented on top of an operating system (which I knew), and it uses spinlock “all over the place” in the implementation of CSP (new to me). I’ll get a reference.

Incidentally, some years ago I attempted to develop a CSP library for Ravenscar Ada using protected objects as described in this paper:

Extending Ravenscar with CSP channels ([8]

Protected objects made channels easy to implement and I implemented non-busy ALTs as well, but when I ran test code overnight I often found it had hung by next morning due to one of the channels appearing to have “deadlocked”. That is, at some point a receiving end appeared to have received a message and so should have sent an “acknowledgement” to allow the sender to proceed, but it appeared that the “acknowledgement” must have got lost, because the sender was waiting.

This led me to suspect a bug related to protected objects in the port of Libre GNAT GPL Ada to Windows XP. I intended to test for this by installing and compiling on Linux, but have not got round to that yet.

Me:

Perhaps you mean that a subset of occam would be needed to get meaningful cycle count? That occam as it is won’t give us this now. I believe that the XMOS architecture takes it further. Is XC a subset of occam on the transputer?

Larry (Lawrence) Dickson:

What I am getting at would take it either way – could be interpreted as a subset of occam (i.e. occam + restrictions) or something that takes occam further. I have actually used this successfully and thought it rather obvious.

What takes occam further is occam + Transputer priority, and the subset of occam in question is the high-priority subset of occam, with the restriction that each high-priority process is restricted to a known maximum cycle count before it deschedules. In addition there must be a round-robin scheduler. The consequence of this is that wait time before response of a high-priority process (when ready) is less than the sum of the maxima for all the high-priority processes (you can subtract the time of the responding process itself). In the case of an ALT, this would hold for a unique winner, because output is ready and the communication on the ALT side after disabling will therefore be immediate with no descheduling. IT DOES NOT MATTER IF THE OTHER SIDE OF THE COMMUNICATION IS LOW PRIORITY, though the low-priority process’s response to the communication might be delayed if other high-priority processes “spin” to the exclusion of low priority. So to make it robust against that, you have to impose a maximum cycle count before high priority becomes dependent on low priority, and enforce spacing of external stimuli. (I have a US-only patent that deals with stuff like that, which Øyvind says I should not be ashamed to mention!)

This is all standard interrupt stuff, done ad hoc on other processors. The provable result is due to the beautiful design of the Transputer, which must (and, at the bare metal, can) be emulated on other processors to get the result. This does not so much have to do with CSP, I am learning, as it operates on a kind of sub-CSP level.

Me:

Are you saying that in your opinion we would need an occam Ravenscar subset? Would it compile with occam 2 or would you need N priorities etc? I think it only had max two?

N-priority

Larry (Lawrence) Dickson:

The N-priority question is answered in my US-only patent

http://www.google.com/patents/US20140101663

– which by the way everyone is welcome to develop on, as I am only claiming hardware, not software, and only in the USA. Basically, more than 2 priority levels are OK but only a few (say up to 6 or 7) separated by wide ratios in natural times, so that higher looks like interrupts (stolen cycles) to lower, and lower looks like idle time to higher. Then you can design as if to a two-priority Transputer. Think keyboard interrupt as a high-priority process. If you lean on a key it only fires about 30 times a second, and the hi-pri keystroke calculates quickly and goes into a FIFO (another hi-pri process) then both are blocked – the kb process till next keystroke (30,000 microseconds later) and the FIFO till lo-pri process draws out keystroke. You can build with several interrupt services and design so response is fast and lo-pri is never starved. You have to avoid what might in CSP analogous terms be called “quasi-divergence” where hi-pri stuff starts talking to other hi-pri stuff and pushes the lo-pri response out for a long time.

So since occam 2 with Transputer priorities is certainly a subset of this, I see no problem with compiling it, but the restrictions as described above have to be imposed, and then you get a pretty easy proof of response times in any particular case. I imagine an Xmos-like design with lots of cores would be a good fit, because some cores could be high-priority Transputers with few processes in effect, handling all the quick-response stuff, and other cores could be low-priority-with-FIFO in effect, and by doing a chained response analysis you could get the result you need. (I actually did something like this for Ford Motor Company in the 1990s, using multi-T2 TRAMs of my own design to decimate incoming radar data which was very fast and impatient.)

We certainly need true occam on the Xmos or similar device (like the Adapteva Parallella?). If ALT is implemented by a select() we have to groan and bear it. Even if polling is required, that is just cycle theft balanced against response latency. But it has to be able to be absolutely analogous to Transputer behavior.

 

xC on XCore by XMOS

(Comment by me when I pasted this thread into here: the XC by XMOS chapter in the note about Nondeterminism also discusses some of this. I also have also built an xC scratchpad here: xC is C plus x)

Roger Shepherd:

As an aside, regarding the Xmos processor and XC. It is some years since I took a look in detail but I remember a couple of important things.

Firstly, in terms of the scheduler, the Xcore has a more primitive processor than the transputer. That is, the operations provided are more primitive than the operations in the transputer, but enable you to build the same (or at least a very similar) model as the transputer. XC does this but there are some subtle differences which I suspect have little or practical impact.

One I remember is the implementation of Fork and Join. The explanation of the difference is made difficult by the sparseness of the identity of “process” in the transputer. The transputer implementation of

SEQ
     A
     PAR
         B
         C
         D
     E

has a process which first executes A, spawns (“startp”) C and D, and continues as B. The join is implemented as an “endp” executed at the end, of B, C and D. The last one of which to execute “endp” (i.e. the last one to join) continues and executes E.

I can’t remember the implementation of the fork in XC – the point of uncertainty is whether there were 3 or 4 processes which implement the PAR. I think there are, in effect 4, as if the program with three processes (B, C, D) spawned at the join, and forking process waiting. B, C and D then execute and the join restarts the forking process (now as E) when the third process terminates. [The alternative I have in mind is that, as in the transputer implementation, two processes are spawned, but the join effectively causes the original process to continue, rather than the final process as in the tramsputer].

Secondly, in terms of XC I was very unhappy about the restrictions it placed on the use of channels (no repetition) in ALTs and in TIME ? guards. To be clear, the Occam rules could have been implemented by the xcore – it was a limitation of the core. The Occam rules are much nicer and useful in a number of cases (some of which you can address in XC but, I would argue in a less clear way than in Occam).

Whether these points are still true, I don’t know. As I said it is a long time since I’ve looked at XC.

However, do have a look at 141:[Replicated select case]!

Me:

I have done some xC to try to find out more. This compiles:

xC code showing use of array in timerafter

#include <platform.h>

#define NUM_CLIENTS 2
#define NUM_TIMERS 2

typedef interface my_interface_t {
    void rpc(const int n);
} my_interface_t;

[[combinable]] void PS1 (server my_interface_t my_interface[NUM_CLIENTS]) {
    int   time[NUM_TIMERS];
    timer tmr[NUM_TIMERS];
    int   guard = 0;

    for (int index_of_timer=0; index_of_timer < NUM_TIMERS; index_of_timer++) {
        tmr[index_of_timer] :> time[index_of_timer];
    }

    while (1) {
        select {
            case (guard==1) => tmr[int index_of_timer] when timerafter(time[index_of_timer]) :> void: {
                time[index_of_timer] += 1000;
                guard = 0;
            } break;
            case my_interface[int index_of_client].rpc(const int n): {
                guard = 1;
            }  break;
        }
    }
}

void PC1 (client my_interface_t my_interface) {
    my_interface.rpc(1);
}

void T1 (void) {
    // ...
}

int main(void) {
    interface my_interface_t my_interface[NUM_CLIENTS];
    par {
        on tile[0]:         PC1 (my_interface[0]);
        on tile[0]:         PC1 (my_interface[1]);
        on tile[0].core[0]: PS1 (my_interface);
        par {
            on tile[0]: T1();
        }
    }
    return 0;
}
Since I made PS1 [[combinable]] I just as well placed it on tile core. When I did that I also had to qualify the others with on tile. [[combinable]] means that many can run on the same hw thread (one per logical core) sharing the same select loop underneath, and I only have one, then it wasn’t really necessary. But I wanted to decorate as much as possible for you guys. It has replicated timer and interface calls with 2 clients. I didn’t use channels, but they are treated like interfaces here. I didn’t make any session in the interface, decorated with [[notification]] and [[clears_notification]]. If I had put a guard for the rpc call then I must write [[guarded]]for it in the interface. (References to the somewhat outdated literature in https://www.teigfam.net/oyvind/home/technology/141-xc-is-c-plus-x/)

I don’t know the fork and join logic here, except that I think it is like in occam.

References

  1. The Ada Ravenscar Profile: http://en.wikipedia.org/wiki/Ravenscar_profile. Also see Jorvik Profile in [9])
  2. Occam programming language: http://en.wikipedia.org/wiki/Occam_(programming_language)
  3. “New ALT for Application Timers and Synchronisation Point Scheduling”, (Øyvind Teig, Per Johan Vannebo), at http://www.teigfam.net/oyvind/pub/pub_details.html#NewALT
  4. “Guide for the use of the Ada Ravenscar Profile in high integrity systems”, Alan Burns, Brian Dobbing and Tullio Vardanega, at http://www.sigada.org/ada_letters/jun2004/ravenscar_article.pdf
  5. Will find that reference one day. I may have it at work.
  6. http://golang.org/
  7. “Using select with input and output statements in driver between fast producer and slow consumer” – https://groups.google.com/forum/?fromgroups#!topic/golang-nuts/qa2p0PRY0WE
  8. http://www-users.cs.york.ac.uk/~king/papers/ExtendingRavenscar-AdaEur05.pdf
  9. The Ada Jorvik Profile: See http://www.ada-auth.org/standards/2xrm/html/RM-D-13.html from “Ada Reference Manual (Ada 202x Draft 19)”