Contents
- 1 Intro
- 2 Disclaimer
- 3 The note
- 3.1 «It is widely acknowledged that concurrent programming is difficult» (Lee)
- 3.2 «Contrary to the belief out there, concurrent programming is easy» (WoTUG/CPA)
- 3.3 «The problem is that we have chosen concurrent abstractions that do not even vaguely resemble the concurrency of the physical world.» (Lee)
- 3.4 Process model
- 3.5 «Nondeterminism of threads» (Freely from Lee)
- 3.6 WYSIWYG semantics (Welch)
- 3.7 «This pattern can be made to work robustly in Java» (Lee)
- 3.8 «Fixing threads by more aggressive pruning» (Lee)
- 3.9 Use of software patterns
- 3.10 WYSIWYG semantics and timing constraints
- 3.11 Real-time Java
- 3.12 «Alternatives to threads» (Lee)
- 3.13 «Coordination languages» (Lee)
- 3.14 After Lee’s 2006 article: «The Go programming language» (Google)
- 4 Comments
- 5 Comments 2025
- 6 Log 2025
New 29Dec2010, updated 07Oct2025 (Missing references?). Moved from Blogspot at https://oyvteig.blogspot.com/2010/12/021-problems-with-threads.html 12Sep2025
This note is in group Technology.
Intro
At the turn of the year 2010/11 it may be late to comment on a paper I found in a pile of IEEE Computer magazines. The article dates back to 2006. The article is available on the net as a report from the University of California at Berkeley:
The Problem with Threads
Edward A. Lee
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2006-1
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.html (pdf)
January 10, 2006
IEEE Computer 39(5):33-42, May 2006 (New: here, article)
That IEEE Computer magazine was themed «Future Operating Systems (and) Programming Paradigms» and the article was a «cover features» article.
I advise you to read the well-written article. It’s outspoken and fun reading, too!
The quotes I use here are from the report (IEEE Computer has edited a little).
Disclaimer
This note’s title uses the plural «problems», so as not to make the titles identical – and the note is a mix between a review and my own personal views and experience.
Over some years I have published along the lines of that article, see Publications. I also have some posts here about the same theme, see Older blog notes.
The note
«It is widely acknowledged that concurrent programming is difficult» (Lee)
The magazine article says «Concurrent programming is difficult,..». This is what caught my interest, in addition to the title. But this first sentence seems to contradict the title. Why doesn’t he say «..that programming with threads is difficult»?
I have attended the WoTUG- and now CPA- series of conferences for quite some years. What I have learned over the years is the opposite:
«Contrary to the belief out there, concurrent programming is easy» (WoTUG/CPA)
Here is the semantics of that statement: «Programming with threads is difficult, but programming with processes is easy». Later in his paper, I think this is also what Lee says. But he feels bleak about the future, while the conferences (and Lee’s paper) try to do something about it.
When I programmed in occam 2 for some ten years, concurrent programming was indeed easy. I had no interrupts, only processes, so asynchronous I/O was tamed before it could overwrite common data. I had fully encapsulated processes, with parallel usage rules checked by the compiler. Processes only communicated over zero-buffered, synchronous, unidirectional, named channels. Buffering had to be coded explicitly. Functions were side-effect free, as they were not allowed to use channels. And the compiler checked for aliasing errors (so linked lists were not possible..). occam 2 has been developed further since it died as a commercial language, and now has a plethora of new primitives.
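To give a flavour: Go (of which more at the end of this note) has unbuffered channels that behave much like occam’s zero-buffered synchronous channels. A minimal sketch of mine, in Go, not occam:

```go
package main

import "fmt"

// producer plays the role of an occam process: it communicates only
// over the named channel it was given, and each send blocks until
// the receiver is ready (zero-buffered, synchronous rendezvous).
func producer(out chan<- int) {
	for i := 1; i <= 3; i++ {
		out <- i // blocks until consumer receives
	}
	close(out)
}

// consumer receives until the channel is closed, then reports what
// it saw on the done channel.
func consumer(in <-chan int, done chan<- []int) {
	var got []int
	for v := range in {
		got = append(got, v)
	}
	done <- got
}

func main() {
	ch := make(chan int) // unbuffered: like an occam channel
	done := make(chan []int)
	go producer(ch)
	go consumer(ch, done)
	fmt.Println(<-done) // [1 2 3]
}
```

No buffering happens unless you code it explicitly, which is exactly the occam 2 discipline described above.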
Yes, I did have to learn the tricks of the trade. Some have called it Process Oriented Programming.
«The problem is that we have chosen concurrent abstractions that do not even vaguely resemble the concurrency of the physical world.» (Lee)
Even if «not even vaguely» is the term Lee uses to drive his point home, I would disagree.
The physical world is one axis, concretisations of abstract descriptions another. If my concurrent abstraction may be given a name so that I understand what it basically does, and it may be seen as a cognitive entity, then I’m ok. I think this is what Wittgenstein proposed in his 1921 Tractatus.
«Channels», «rendezvous», «port», «protocol», «process» and «alternation» are taken from daily vocabulary. These are good terms that I grasp, but they fit more or less well to their physical (if any) counterparts. These are «vague» terms, really – so Lee might have his words right after all.
The idea that an «object» as used in «object orientation» (OO) should resemble a physical-world object is not needed for me. It has been used to describe OO («object car has wheels»), but after those examples, just forget it: «window has buttons» or, worse, «window has drop-down menu» makes little sense for real windows.
By the way: «thread» is perhaps a good term! I have seen the result when a cat has played with a single long thread from a ball of yarn. Maybe they should have used better weaving terms? Weaving could be seen as quite formal! And if the terms chosen had been better, maybe the solution would have been better, recursing into Lee’s statement?
Fig.1 shows a defective and a functional Jacquard loom punchcard at Grinakervev in Norway. When functional, a loop of cards forms a «scheduling algorithm» for real threads to get in position for one «pick» of the shuttle carrying the weft, for the nice textile pattern hidden in the cards to appear. I assume that the threads may even here get into problems, especially if a card has got some holes missing or in the wrong place, or less likely, if the loom’s synchronising next step mechanism fails. (Thanks to Grinakervev for the nice visit we had there!)
Process model
In my opinion the CSP «process model» is much easier to understand than «object model». The definition of an object these days emphasises encapsulation more than before. Some years ago, inheritance and polymorphism were more «modern» to include in the object model.
«Nondeterminism of threads» (Freely from Lee)
I’ll fill in with an example. The libcsp library by R. Beton (New: here) furnished Posix lightweight thread programmers with a clean CSP API. However, the struct CSP_Channel_t contained three pthread_mutex_t and two pthread_cond_t, in addition to a CSP_Alt_t that contained one of each. Reading it out loud:
one channel at most needed 4 mutexes and 3 condition variables (*)
This, just to attempt to tame the «nondeterminism of threads» as Lee calls it.
(*) I don’t know how many of these are there to handle asynchronous I/O (like interrupt) or preemption.
You could argue that it’s the channel construct that’s wrong, but then bear in mind that it is possible to build a «naked» CSP library without any mutexes, condition variables or critical regions. (New: depending on a lot of factors, like processor architecture, number of cores, etc..) Again, asynchronous I/O of course needs to be handled. However, preemption has not been needed in any CSP-type embedded system I have seen, but I have learnt that it is necessary for an experimental language called Toc, which handles deadline scheduling; see this paper by Korsgaard and Hendseth (New: here).
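This is not what libcsp does internally, but a sketch of the other end of the scale: in a language where the channel is a primitive, shared state can be served by a single owner process, so user code contains no mutexes or condition variables at all. A hypothetical counter server in Go (all names are mine):

```go
package main

import "fmt"

// counter owns its state. The only way to touch n is to send a
// request on reqs; the channel itself provides the mutual exclusion,
// so no mutex or condition variable appears anywhere in this program.
func counter(reqs <-chan chan int) {
	n := 0
	for reply := range reqs {
		n++
		reply <- n
	}
}

// increment asks the counter process to bump its count and waits for
// the new value on a private reply channel.
func increment(reqs chan chan int) int {
	reply := make(chan int)
	reqs <- reply
	return <-reply
}

func main() {
	reqs := make(chan chan int)
	go counter(reqs)
	increment(reqs)
	increment(reqs)
	fmt.Println(increment(reqs)) // 3
}
```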
Back to nondeterminism. I learned something from Lee’s description. I have always thought of nondeterminism as what (should) happen in the occam ALT construct (as opposed to PRI ALT), or what happens with Promela’s :: operator. This nondeterminism is wanted, and would explicitly read «if more than one choice is ready, select any one of them». If that behaviour is not what I want, I instead use the deterministic construct.
But here Lee points out that in the threaded world it’s unwanted altogether. (Later on he goes on to describe «judicious nondeterminism» – which is the one I am used to.)
(New (2012): This blog note: Nondeterminism.)
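Go’s select (see the Go chapter below) is a close analogue of the occam ALT: when more than one case is ready it makes a pseudo-random choice among them. A sketch of this wanted, «judicious» nondeterminism:

```go
package main

import "fmt"

// altOnce mimics one occam ALT where both guards are ready: the Go
// select then makes a uniform pseudo-random choice between the two
// cases — the wanted, «judicious» nondeterminism, not the wild kind.
func altOnce() string {
	a := make(chan string, 1)
	b := make(chan string, 1)
	a <- "from a"
	b <- "from b"
	select { // both cases are ready; any one may be chosen
	case m := <-a:
		return m
	case m := <-b:
		return m
	}
}

func main() {
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[altOnce()]++
	}
	// Over many trials both branches get taken.
	fmt.Println(len(counts)) // 2
}
```

If I do not want this behaviour, I order the checks deterministically (the PRI ALT analogue) instead of using select.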
WYSIWYG semantics (Welch)
«In fact, we have to know about all other threads that might execute (something that may not itself be well defined), and we would have to analyze all possible interleavings. We conclude that with threads, there is no useful theory of equivalence.» (Lee)
…
«Threads, on the other hand, are wildly nondeterministic. The job of the programmer is to prune away that nondeterminism. We have, of course, developed tools to assist in the pruning. Semaphores, monitors, and more modern overlays on threads (discussed in the following section) offer the programmer ever more effective pruning. But pruning a wild mass of brambles rarely yields a satisfactory hedge.» (Lee)
Peter H. Welch has introduced the term WYSIWYG semantics. Lee in effect shows that threading does not have this type of semantics.
I have discussed this in note [007 – Synchronous and asynchronous] (New: here). Here is a quote from Welch et al.:
«One crucial benefit of CSP is that its thread semantics are compositional (i.e. WYSIWYG), whereas monitor thread semantics are context-sensitive (i.e. non-WYSIWYG and that’s why they hurt!). Example: to write and understand one synchronized method in a (Java) class, we need to write and understand all the synchronized methods in that class at the same time — we can’t knock them off one-by-one! This does not scale!! We have a combinatorial explosion of complexity!!!»
(Letter to Edward A. Parrish, The Editor, IEEE Computer. Peter Welch (University of Kent, UK) et al. https://www.csp-consortium.org/images/IEEE_Computer.pdf (1997) (dead url; New: Wayback Machine / Internet Archive (2010: here)).)
Observe that in note 007 I discuss this also relative to asynchronous «send and forget» message-based systems. (New: Also in note WYSIWYG semantics.)
«This pattern can be made to work robustly in Java» (Lee)
Lee shows how easy it is to write erroneous multithreaded code in Java. The observer pattern is shown for one thread, and then the obvious thread-safe version. Only, the «thread safe» version is just an attempt at it, as Lee shows. There is a potential deadlock lurking.
A couple of memories spring to my mind.
First, before the WoTUG-19 conference I thought it might be a good idea to present the commstime program (a very simple but effective benchmark for concurrency) in the new Java language. I did, but concluded that «Java 1.0’s PipedInput/OutputStream classes seem impractical to use because of an implementation error at Sun». I had my fingers burnt on trial one.
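For the curious, the commstime network itself is easy to sketch in a channel-based language. Here it is in Go – my sketch of the process network only, not the original benchmark code, and with no timing measurement:

```go
package main

import "fmt"

// The commstime network: Prefix -> Delta -> Succ loops back to
// Prefix, while Delta's second output feeds the consumer, which
// should see 0, 1, 2, 3, ...

// prefix outputs n once, then copies its input to its output.
func prefix(n int, in <-chan int, out chan<- int) {
	out <- n
	for v := range in {
		out <- v
	}
}

// delta duplicates every input onto two outputs.
func delta(in <-chan int, out1, out2 chan<- int) {
	for v := range in {
		out1 <- v
		out2 <- v
	}
}

// succ adds one to every value passing through.
func succ(in <-chan int, out chan<- int) {
	for v := range in {
		out <- v + 1
	}
}

// commstime wires the network and collects the first few values the
// consumer sees.
func commstime(iterations int) []int {
	a, b, c, d := make(chan int), make(chan int), make(chan int), make(chan int)
	go prefix(0, c, a)
	go delta(a, b, d)
	go succ(b, c)
	var got []int
	for i := 0; i < iterations; i++ {
		got = append(got, <-d)
	}
	return got
}

func main() {
	fmt.Println(commstime(5)) // [0 1 2 3 4]
}
```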
But here was a language with some built-in mechanisms for concurrency, so the community started building CSP libraries in Java. Commercial occam was dying, and so was the transputer. There was a void. So JCSP (Communicating Sequential Processes for Java) was developed by the University of Kent at Canterbury. JCSP has been updated over the years and has become a viable concurrency platform for Java; I note that the last public release is from August 2010 (this is written in Dec. 2010). (New: See JCSP at 135:[JCSP – Communicating Sequential Processes for Java])
There were all these experts, learning all the intricacies of synchronized and Java’s monitor. By 1998 all seemed to work fine. Until a student, some two years and who knows how many uses later, found a simple case where it deadlocked. The result was that parts of JCSP were formally verified and then fixed. Read about this exciting adventure here («Formal Analysis of Concurrent Java Systems» by Peter H. Welch and Jeremy M.R. Martin). (New: read here). The error was subtle, but the fix was simple.
Second, at about the same time, Per Brinch Hansen wrote a letter that floated around on the internet, later to become seminal. It is very interesting reading, because what he’s basically saying is that the Java designers at Sun hadn’t done their homework. Here’s his abstract:
«The author examines the synchronization features of Java and finds that they are insecure variants of his earliest ideas in parallel programming published in 1972–73. The claim that Java supports monitors is shown to be false. The author concludes that Java ignores the last twenty-five years of research in parallel programming languages.»
(Hansen, Per Brinch, «Java’s insecure parallelism» in ACM SIGPLAN Notices, V.34(4) pp.38-45, April 1999. http://brinch-hansen.net/papers/1999b.pdf)
Per Brinch Hansen then goes on to describe exactly what’s wrong with Java. Observe that Brinch Hansen in 1993 invented the SuperPascal language, whose concurrency features are based on a subset of occam 2!-)
«Fixing threads by more aggressive pruning» (Lee)
Lee describes how they in the Ptolemy project («an ongoing project aimed at modelling, simulating, and designing concurrent, real-time, embedded systems») used several established software engineering practices, like a new «code maturity rating system», design reviews, code reviews, nightly builds, regression tests, and automated code coverage metrics.
«The strategy was to use Java threads with monitors.» (Lee)
I wonder if they had had access to Per Brinch Hansen’s paper, or the Welch and Martin paper. Surely they must have been familiar with the problems.
Even with all this going on for them, Lee describes that «No problems were observed until the code deadlocked on April 26, 2004, four years later.» Lee goes on to say that:
«Regrettably, I have to conclude that testing may never reveal all the problems in nontrivial multithreaded code» (Lee)
When a new version of some software is released, there has to be thorough testing. Case closed. But are we sometimes testing too much? Relying on testing where relying on other means would have been better?
Lee points out that avoiding deadlocks in a system based on locks supposedly just requires following some well-known rules:
«Of course, there are tantalizingly simple rules for avoiding deadlock. For example, always acquire locks in the same order [32]. However, this rule is very difficult to apply in practice because no method signature in any widely used programming language indicates what locks the method acquires. You need to examine the source code of all methods that you call, and all methods that those methods call, in order to confidently invoke a method. Even if we fix this language problem by making locks part of the method signature, this rule makes it extremely difficult to implement symmetric accesses (where interactions can originate from either end). And no such fix gets around the problem that reasoning about mutual exclusion locks is extremely difficult. If programmers cannot understand their code, then the code will not be reliable» (Lee, italics by me)
In other words, semaphores and locks don’t have (the above-mentioned) WYSIWYG semantics!
Use of software patterns
Lee also discusses patterns. I have myself used patterns like the «The ‘knock-come’ deadlock free pattern», described in Note 009 (New: here). But Lee has a point when he states that:
«Programmers will make errors, and there are no scalable techniques for automatically checking compliance of implementations to patterns.» (Lee)
This is very true. But the shortest programs that function well with a pattern may become part of a repository for it. Real coding undermines patterns! Lee goes on:
«More importantly, the patterns can be difficult to combine. Their properties are not typically composable, and hence nontrivial programs that require use of more than one pattern are unlikely to be understandable» (Lee)
This is less universally true, I guess. If a pattern is based on composable components with WYSIWYG semantics, then combining such patterns should be ok semantically. Sometimes one wants to swap the Master and Slave processes in knock-come. Doing this has no side effects other than the wanted ones. In a real-time system we need to know the timing constraints. If any process decides not to listen on a channel for some time, say 100 ms, then anyone sending on that channel would have to block for 100 ms. This blocking is no worse than an asynchronous message not being handled by some state for 100 ms. In fact it’s better: the blocking doesn’t fiddle with the gears, since the process would not even see a message that it otherwise would have to schedule for itself some time later.
WYSIWYG semantics and timing constraints
This takes me to a point that I am not so sure about. WYSIWYG semantics will not include timing constraints? In such a system the protocol between processes describes all I need to know? Right? No, I don’t think so. I think I need to describe (as a comment or whatever) what the maximum blocking time is. Or really, the maximum time between any send and its response (in any message-based system).
Usually one just dismisses this problem by stating that an event-action takes zero time. For most systems this is ok. But if we may block for 100 ms, this needs to be propagated, just like in the Toc language by Korsgaard and Hendseth mentioned above.
If a protocol is decorated with timing constraints and those are propagated, then You Can See What You Get. But only then.
Time messes up any good argument!
(New: Is this where Lee’s deterministic 254:[Lingua Franca] – with timestamps, inside a coordination language on top of a language like C, to ensure determinism – comes to the rescue? Or the XMOS XTA tool 209:[Timing analysis], where #pragma is used to set timing constraints?)
Real-time Java
Around the turn of the century my company (Navia/Autronica) was a member of the J Consortium – Real-Time Java™ Working Group. It was set up by NIST and backed by HP, Microsoft, Newmonics and others. These days, www.j-consortium.com honours its ancestry, but now it’s about «real-time medical applications». (New: Wayback Machine / Internet Archive (1999: here)).
I was the one doing the work for Navia/Autronica, in the Real-Time Java Working Group (RTJWG).
I remember the endless discussions about how to handle «asynchronous I/O». I had been working in occam for 10 years by then, and that problem, as a problem, was non-existent. As mentioned earlier, interrupt handlers were proper occam processes, where the interrupt vector was mapped as a channel, possible to use in an ALT (to be able to receive outputs as well). Results were preferably sent on a channel that would never block for long (by use of an «overflow buffer» process (two processes in series)). This was so nice, and there was no special thinking needed for an interrupt process versus any other process. They were born equal. The process model stayed. (New: see 250:[Overflow buffer])
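My reading of the «overflow buffer» idea, sketched in Go (the real thing was two occam processes in series; here one goroutine suffices, and the names are mine): the producer side is accepted at once, and when the consumer is slow the newest value overwrites the buffered one. Values may be lost, but the interrupt-like producer never blocks for long.

```go
package main

import "fmt"

// overflowBuffer sits between a fast producer and a possibly slow
// consumer. It is always ready to receive; when it already holds a
// value that the consumer has not yet taken, the newest value
// overwrites it.
func overflowBuffer(in <-chan int, out chan<- int) {
	var buf int
	full := false
	for {
		if full {
			select {
			case v, ok := <-in:
				if !ok { // producer done: flush and quit
					out <- buf
					close(out)
					return
				}
				buf = v // overwrite: newest value wins
			case out <- buf:
				full = false
			}
		} else {
			v, ok := <-in
			if !ok {
				close(out)
				return
			}
			buf, full = v, true
		}
	}
}

func main() {
	in := make(chan int)
	out := make(chan int)
	go overflowBuffer(in, out)
	for i := 1; i <= 5; i++ {
		in <- i // accepted immediately, even with no reader on out
	}
	close(in)
	for v := range out {
		fmt.Println(v) // 5: the slow reader sees only the newest value
	}
}
```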
«Alternatives to threads» (Lee)
Lee discusses several alternatives, but for me the most interesting is the fact that the Ptolemy II project also has a «rendezvous domain CSP» API. Here is their class hierarchy. They state that it’s based on C.A.R. Hoare‘s 1978 version of CSP. A process in the 1978 version sends to a named process, whereas in the 1985 version a process sends to a named channel, as in occam. I believe Hoare changed it after the Ada experience, in which his 1978 solution hindered building of precompiled (?) libraries. (Please send me a mail if Ptolemy II really sends on a named channel, so that I can fix.)
Using a CSP class library is A Quite Good Thing. This criticism applies to Ptolemy as well as to JCSP (mentioned earlier): if a channel (or whatever well-thought-out concurrency communication primitive..) is not a «first-class citizen» of the language, then CSP (or..) will «never» succeed, should a library be all that’s supplied in the future.
In a way it’s «pruning of nondeterministic» cognitive thinking, where each time something is read it has to be understood through a class hierarchy. Every time! Not a nice thing to say about OO, which has its benefits and uses! A compiler would not know what a process is, so it can’t do parallel usage checks. At best it would be an add-on tool.
«Coordination languages» (Lee)
Finally, Lee talks about coordination languages, which would sit on top of a more standard language (with new keywords?). The Ptolemy II rendezvous CSP «framework is properly viewed as a coordination language». There, nondeterminism is explicit when wanted; else it’s deterministic. (New: This is the above-mentioned Lingua Franca).
After Lee’s 2006 article: «The Go programming language» (Google)
But A Good Thing, invented after Lee’s paper, is Google’s Go programming language. Its concurrency is based on the CSP model, and it uses channels for interprocess communication. Even if it «does not provide any built-in notion of safe or verifiable concurrency», it is a step in the right direction. I let the Go people have the final words in this note:
«Why build concurrency on the ideas of CSP?
Concurrency and multi-threaded programming have a reputation for difficulty. We believe the problem is due partly to complex designs such as pthreads and partly to overemphasis on low-level details such as mutexes, condition variables, and even memory barriers. Higher-level interfaces enable much simpler code, even if there are still mutexes and such under the covers.
One of the most successful models for providing high-level linguistic support for concurrency comes from Hoare’s Communicating Sequential Processes, or CSP. Occam and Erlang are two well known languages that stem from CSP. Go’s concurrency primitives derive from a different part of the family tree whose main contribution is the powerful notion of channels as first class objects.»
(Go programming language faq)
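«Channels as first class objects» means, among other things, that a channel may itself be sent over a channel. A minimal sketch:

```go
package main

import "fmt"

// Channels are first-class values in Go: here a reply channel is
// itself sent over the requests channel, so the worker learns where
// to answer at run time.
func worker(requests <-chan chan string) {
	for reply := range requests {
		reply <- "hello"
	}
}

func main() {
	requests := make(chan chan string)
	go worker(requests)
	reply := make(chan string)
	requests <- reply // send a channel over a channel
	fmt.Println(<-reply) // hello
}
```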
End
(New: Search for Golang here)
Comments
Adam Sampson 07 January, 2011 15:26
There’s a fairly well-known paper by Hans Boehm about the problems of implementing concurrency features as a library: «Threads Cannot be Implemented as a Library» (2004). The paper and slides are available at https://dl.acm.org/doi/10.1145/1065010.1065042 and https://www.hboehm.info/misc_slides/pldi05_threads.pdf.
He’s writing from a shared-memory perspective, but his points are equally valid for message-passing approaches, if we want to be able to execute our programs in parallel.
I think it’s likely that over the next few years we’ll see languages being developed that have sufficiently powerful type systems to allow the safety properties of a concurrent program to be statically verified. We’re not there yet, though. I’m thinking here of languages like Agda that blur the distinction between conventional and type-level programming, which mean that the kinds of static checks we currently build into our compilers can instead be expressed in the language itself. As we’ve seen with Haskell, the challenge isn’t so much expressing the checks as producing useful error messages when they fail…
Go is great in terms of marketing, but I think it’s actually a step back from what previous languages in the same family (Newsqueak, Alef, Limbo) offered, in terms of concurrency features, and it’s a shame that the designers didn’t include any of the more recent innovations in other CSP/pi-calculus-derived languages. There’s some discussion of this in my thesis: https://offog.org/publications/ats-thesis.pdf
aclassifier 09 January, 2011 14:59 (me)
..however, those papers seem to point to C++0x as _the_ _alternative_ (http://en.wikipedia.org/wiki/C++0x ?→ https://en.wikipedia.org/wiki/C11_(C_standard_revision)). The working draft of Nov10 is almost 1400 pages, and 60 pages cover atomic operations and the thread support _library_. Food for my next post?
Comments 2025
Changes added here for the Sep2025 version, as moved from Blogspot. I later removed the contents there, not to keep blog note aliasing!
- Some links updated or commented, or left as dead url
- Fig.1 is new
- URLs to the IEEE 2006 stuff
- Several (New: ….) additions, or just text in blue
- Some typos fixed
Log 2025
Newest on the top:
Missing references?
I wonder why the 2003 paper [1] from the same university (University of California at Berkeley), with related curricula (Electrical Engineering and Computer Sciences vs. Computer Science Division), is not referenced in the Lee 2006 paper? Some quotes from the 2003 paper:
2 Threads vs. Events
The debate between threads and events is a very old one. Lauer and Needham attempted to end the discussion in 1978 by showing that message-passing systems and process-based systems are duals, both in terms of program structure and performance characteristics [2]. …
2.2 “Problems” with Threads
Performance. Criticism: Many attempts to use threads for high concurrency have not performed well. We don’t dispute this criticism; rather, we believe it is an artifact of poor thread implementations, at least with respect to high concurrency. None of the currently available thread packages were designed for both high concurrency and blocking operations, and thus it is not surprising that they perform poorly. …
Lee might have thought of the 2003 paper as not relevant, since it does not mention deterministic timing. So «thread vs. thread» with different meanings might only have disturbed the 2006 paper? But to me, reading the 2003 paper after the 2006 paper certainly filled in my picture; it didn’t tear it apart.
[1] also has a reference to [3] (1995) (as [11] there), also not referenced in Lee’s 2006 paper. [3] complains about «timing dependencies» between threads but does not seem to be concerned about determinism, as Lee is. But [3] lists the differences between threads and events quite well.
However, with my experience from occam on the transputer and XC on the XCore, I often (erroneously) map this distinction between «threads» and «events» onto «processes / tasks» versus what happens in the ALT / select (selective choice = any combination of inputs from channels, timers and ports / pins) – also often called «events». I think that these «worlds» are somewhat orthogonal. occam on the transputer was not designed for true timing determinism, even if systems with such guarantees were «easy» to design. The XMOS architecture certainly is designed for this (more below).
(Thanks, Antonio G. for [1] and [2] first the second, then the first.)
[1] Why Events Are A Bad Idea (for high-concurrency servers), by Rob von Behren, Jeremy Condit and Eric Brewer at Computer Science Division, University of California at Berkeley (at HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, May 18–21 2003. USENIX Association), see https://www.usenix.org/legacy/events/hotos03/tech/full_papers/vonbehren/vonbehren.pdf
[2] On the Duality of Operating Systems Structures, by Lauer, H.C., Needham, R.M., in Proc. Second International Symposium on Operating Systems, IRIA, Oct. 1978, reprinted in Operating Systems Review, 13,2 April 1979, pp. 3-19. See https://dl.acm.org/doi/pdf/10.1145/850657.850658
[3] Why Threads Are A Bad Idea (for most purposes) by John Ousterhout at Sun Microsystems Laboratories (1995). Presentation given at the 1996 Usenix Annual Technical Conference, January 1996. Read at https://users.cs.utah.edu/~regehr/research/ouster.pdf
Some timing-aware processors
Here are some timing-aware processor architectures that I am aware of. I have added this chapter simply because E. A. Lee has been much preoccupied with them – and so have I, on a non-academic scale.
XMOS xCORE
The first XMOS (company started 2005) multicore chip based on the xCORE architecture, the X1 (XS1), was shipped in 2008 (here). It has two separate «tiles», each with eight «logical cores». I have used the XS1 in my Aquarium project. I have also used the X2 for several projects. Also see the mentioned XTA timing tool (above).
In the 261-page book on the XS1 architecture (141:[6], 2009), David May writes on page 10 that «The XCore scheduler therefore allows threads to be treated as virtual processors with performance predicted by tools. There is no possibility that the performance can be reduced below these predicted levels when virtual processors are combined.»
(Now, did XMOS change the 2009 terms «threads» and «virtual processors» into 2015 terms like «logical cores» after they discovered that Lee already in 2006 had made «threads» look rather stupid? The only reference to «logical» in May’s book is on p180: «Sets the logical network over which a channel should communicate.» The programming guide from 2015 is almost fully rid of «thread» (3 occurrences), but «logical cores» is used all over the 108-page document (141:[1], 2015). Also, in May’s 2009 book «thread» is used to mean what in 2015 was called «task». Maybe the 2015 book only clarified the terms better, with no influence from Lee?)
* Update: See this discussed at «Thread» in 2009 = «logical core» in 2015, shared with «task». The short answer is «no, nothing to do with Lee’s paper». Also see 221:[Comments point 12].
The FlexPRET and InterPRET papers also mention X1 / XCore:
PRET versions
I don’t know whether any of the below processors may be purchased. I think the bullet list below shows the development of this architecture, since they all are «*PRET». Some names are listed below, to allow any searching for those names on the blog to point to one of these three papers:
PRET: Stephen A. Edwards and Edward A. Lee, 2007. «The Case for the Precision Timed (PRET) Machine», here.
→ They summarise the problems with the current processor architectures, but also try to specify this new «PRET» architecture and how such an architecture may build upon published work. The main goal seems to be «Both compile- and run-time checks will ensure the program meets timing constraints, similar to array bounds checking.» The main question is perhaps «How do we adapt operating systems to provide timing guarantees?».
It’s a well worth, two page read.
FlexPRET: Michael Zimmer, David Broman, Chris Shaver, Edward A. Lee, 2014. «FlexPRET: A Processor Platform for Mixed-Criticality Systems», here.
→ This paper also mentions other architectures, like SPEAR, JOP, ARPRET and MERASA. Quote: «PTARM by Liu et al. and XMOS X1 are architecturally similar to FlexPRET. Both are fine-grained multithreaded 5-stage RISC processors that require at least four threads (exactly four threads for PTARM) to be round-robin interleaved in the pipeline; cycles are wasted if there are fewer than four active threads, and a single thread can only be executed at most once every four cycles. PTARM is better suited for hard real-time tasks because all threads have a constant scheduling frequency. Conversely, XMOS is better suited for soft real-time tasks because inactive tasks can be left out of round-robin scheduling, but scheduling frequency depends on the maximum number of simultaneously active threads.»
Also, «We consider FlexPRET a key contribution to a precision timed infrastructure, where languages, compilers, and architectures allow the specification and preservation of timing semantics.»
They implemented and analysed different versions of the FlexPRET architecture on Xilinx FPGAs in Chisel (to Verilog) using a subset of the RISC-V ISA and tested it with two mixed-criticality task sets.
They also discuss some referred software-based scheduling methodologies.
It is an interesting 10 page read.
InterPRET: Erling R. Jellum, Shaokai Lin, Peter Donovan, Chadlia Jerad, Edward Wang, Marten Lohstroh, Edward A. Lee, Martin Schoeberl, 2023: «InterPRET: a Time-predictable Multicore Processor», here.
→ «In this paper, we introduce InterPRET (1): an architecture consisting of FlexPRET cores interconnected via the S4NOC network-on-chip.» As related work they mention «The XCore architecture from XMOS is a commercially available multicore PRET machine. Like InterPRET, the XCore is based on fine-grained multithreading to remove variability in the execution time of instructions. It has a NoC (2), called XConnect, which uses credit-based control flow. The S4NOC (3), being based on a static schedule, does not need this type of flow control.» They also mention the T-CREST architecture, but the InterPRET uses local memory and can support multiple hard real-time tasks per core. They also mention MERASA, Vicuna and even MultiPRET. They mention the XMOS even more: «PRET machines offer such fine-grained control over timing that they are capable of implementing real-time interfaces in software. XMOS, the company behind the only commercially available PRET machine that we know of, refers to this capability as a “software-defined” System-on-Chip (SoC).» The article emphasises composable tasks, meaning that tasks running on different cores should not influence the timing of each other. (My comment: that’s what the xCore gives me, until they communicate of course, ie. until they have a data dependency.)
Credit-based flow control exchanges buffer space between a sender and a receiver, and is also a way to avoid overflow and deadlock. See Network-On-Chip Basics: Flow Control. But the point with InterPRET is that since it uses the S4NOC there is no buffering, and therefore no credit-based flow control is needed.
(1) InterPRET = Interconnected FlexPRET CPUs, based on message passing. Synthesised on an FPGA.
(2) NoC = Network on chip = message passing.
(3) S4NOC = Statically scheduled NoC that uses a precomputed schedule for time-division multiplexing (TDM) at the links, the routers, and the network interfaces. This provides virtual channels between the cores, and no credits are needed for flow control, since buffering is not needed.
It is an interesting 6 page read.

