My SafeRTOS notes

New 29Jan2016, updated 10Feb2020 (XMOS FreeRTOS port)
This page is in group Technology

Disclaimer 1: No ads, no money, no gifts drive this note (nor any other blog notes of mine).
Disclaimer 2: This is a discussion of matters with SafeRTOS. I don’t use it, don’t plan to use it, and am not associated with its use (that I’m aware of). It may be a good fit for you (or me), but there are still some interesting points worth discussing. Update 6Feb2020: Like the XMOS FreeRTOS port.

An RTOS is a Real-Time Operating System (Wiki: here).

There is a nice series of development boards by ST called STM32 MCU Nucleo [1], based on an ARM. I saw that it also has “a free operating system” mentioned in the docs. This is FreeRTOS™ [2] by Real Time Engineers Ltd [3]. I eagerly looked it up and was quickly pointed to WITTENSTEIN HighIntegritySystems (by The WITTENSTEIN Group) [4]. They do SafeRTOS (or SAFERTOS®), so I downloaded the manual [5]. This is an IEC 61508 [6] approved RTOS. (We shall see what that might mean.) Since I work with safety-critical systems (and have some blog notes here covering some aspects of 61508, like in [7]) I started to read. It spurred some questions.

[1] Nucleo. [2] FreeRTOS. [3] freertos.org. [4] HighIntegritySystems. [5] SafeRTOS manual. [6] Wikipedia: IEC 61508. [7] My note 065.

FreeRTOS at Wikipedia

Derivations

  • Amazon FreeRTOS – IoT support
  • SAFERTOS – safety-critical implementation. IEC 61508 SIL 3 development life cycle by WITTENSTEIN High Integrity Systems in C to meet requirements for certification to IEC61508
  • OPENRTOS – Amazon FreeRTOS, sold by WITTENSTEIN High Integrity Systems

Q0. How did FreeRTOS become SafeRTOS?

From the Wikipedia FreeRTOS article, chapter SafeRTOS (here) I read that

SafeRTOS was constructed as a complementary offering to FreeRTOS, with common functionality but with a uniquely designed safety-critical implementation. When the FreeRTOS functional model was subjected to a full HAZOP, weakness with respect to user misuse and hardware failure within the functional model and API were identified and resolved. The resulting requirements set was put through a full IEC 61508 SIL 3 development life cycle, the highest possible for a software-only component.

I think it’s the HAZOP analysis that I’m after here (with my questions below). The fact that FreeRTOS has been “put through a full IEC 61508 SIL 3 development life cycle” to get SafeRTOS is fine, since many problematic areas have then probably been removed, but it doesn’t mean that a product that uses the resulting SafeRTOS is plug and play SIL-anything. However, it does make it easier for a user to argue with an approval agency’s particular assessor when certification is required.

I don’t expect to actually get these answers, because it’s those answers that you in effect pay for when you buy anything 61508 compliant. They have then done all that work for you. But then, this is not always so. Have a look at Texas Instruments’ Safety Manual on the Hercules™ ARM®-Based Safety Critical Microcontrollers [1]. It’s free, and rather interesting, even if the hardware it describes isn’t. However, here’s what Texas Instruments write in the intro:

You, as a system and equipment manufacturer or designer, are responsible to ensure that your systems (and any TI hardware or software components incorporated in your systems) meet all applicable safety, regulatory, and system-level performance requirements. All application and safety related information in this document (including application descriptions, suggested safety measures, suggested TI products, and other materials) is provided for reference only. You understand and agree that your use of TI components in safety critical applications is entirely at your risk, and that you (as buyer) agree to defend, indemnify, and hold harmless TI from any and all damages, claims, suits, or expense resulting from such use.

It’s still not back to square one, is it? But back to SafeRTOS. Here are the questions I ask myself.

Aside (Jan2020): FreeRTOS is used in parts of the NTNU Cyborg robot, see https://www.ntnu.edu/cyborg. It is based on a Pioneer LX from MobileRobots, but it has been much modified since 2016. This is also connected to a Micro-Electrode Array (MEA) that has nerve cells growing on it. This is located at the St. Olavs hospital in Trondheim, and data is passed along over a server to and from the Cyborg that runs up the hill at the NTNU electro building.

Q1. Why preemptive scheduling only?

[5] by WITTENSTEIN aerospace & simulation ltd. has a chapter called “2.1.2 Differences Between SAFERTOS and OPENRTOS”. There they write that

While SAFERTOS and OPENRTOS share many attributes the development process has necessitated some notable differences. These are summarized below:

There are quite a few points I like there, some that puzzle me, and this one that I really wonder about:

OPENRTOS permits the scheduling policy to be optionally set to “cooperative”. SAFERTOS only permits the policy to be “preemptive”;

Why is this so? In my copy of the IEC standard (book IEC 61508-3 (Edition 2.0 2010-04)) there’s nothing that favours preemptive scheduling. In Annex F (informative) “Techniques for achieving non-interference between software elements on a single computer” scheduling is mentioned in the “Achieving temporal independence” chapter. However, it doesn’t rule out preemptive scheduling; there are criteria along another axis, like deterministic scheduling, strict priority based scheduling, time fences and certain guarantees about executability (as I read it).

But then IEC 61508 doesn’t rule out cooperative scheduling either.

Q1: What’s the rationale of WITTENSTEIN aerospace & simulation ltd here for only allowing preemptive scheduling, i.e. for leaving out cooperative scheduling?

Easier to get a deterministic system?

That’s difficult in any case (a little about this in note 062).

Something about cache partitioning and schedulability?

The systems I am used to basically use cooperative scheduling on the process level and preemption by the interrupt system, plus perhaps process priorities. Then it’s up to the rest of the system architecture to be implemented such that requirements are fulfilled. The Go language (based on the same CSP principles) lives on top of some preemptive operating system. However, internally the Go scheduler makes sure that when Go user code calls a blocking function in the underlying operating system, the other goroutines are moved over to another operating system thread, so that the blocking call won’t really block the Go application.

There is no right answer here. I just wondered why cooperative scheduling had been left behind. When the rest is not CSP (or even SDL) based, might it then be easier to leave it out?

Is it priority (only)?

HighIntegritySystems write in [2] that:

In a software project containing a priority based, pre-emptive RTOS, Tasks are executed in order of priority, i.e. the highest priority Task ready to run is always allocated processor time.

Always? What about the interrupts? And what if always is too long for the others? They address these questions also. But this is difficult in any case. Even with channels, where all Tasks might share one priority, the interrupt system is of course still prioritised. Or as with XCORE and XC, where logical cores, timing requirements and the compiler (with no interrupts or priorities) excel at this (see note My XMOS notes).
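The quoted rule, and my worry about “always” being too long for the others, can be illustrated with a toy model in Go. This is only a sketch under my own assumptions: the names task, pickNext and simulate are made up for this note, a higher number means higher priority, and nothing here is taken from the SAFERTOS or FreeRTOS sources. Interrupts are not modelled at all.

```go
package main

import "fmt"

// task is a hypothetical stand-in for an RTOS Task.
type task struct {
	name     string
	priority int  // assumption of this sketch: higher number = higher priority
	ready    bool // true when the task is ready to run
}

// pickNext returns the index of the highest-priority ready task,
// or -1 when no task is ready.
func pickNext(tasks []task) int {
	best := -1
	for i, t := range tasks {
		if t.ready && (best == -1 || t.priority > tasks[best].priority) {
			best = i
		}
	}
	return best
}

// simulate makes one scheduling decision per tick and counts how often
// each task was chosen. No task ever blocks in this model.
func simulate(tasks []task, ticks int) map[string]int {
	ran := make(map[string]int)
	for i := 0; i < ticks; i++ {
		if n := pickNext(tasks); n >= 0 {
			ran[tasks[n].name]++
		}
	}
	return ran
}

func main() {
	tasks := []task{
		{name: "low", priority: 1, ready: true},
		{name: "high", priority: 3, ready: true},
	}
	ran := simulate(tasks, 5)
	fmt.Println(ran["high"], ran["low"]) // prints "5 0"
}
```

As long as the high priority task never blocks, the low priority task never runs. The rule itself does nothing about that; it is up to the application design to make sure that “always” is short enough.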

Q2. This message wasn’t for me!

The xQueueCreate() call gives you a buffered message queue where, on the receiver side, there’s xQueueReceive(), xQueueReceiveFromISR() or xQueueMessagesWaiting() to use. This is a traditional SDL type of message queue (there’s a questions & answers session in note 056). With a proper and rather complex protocol (with subscribing etc.) the pattern of my below example is also possible with a SafeRTOS queue (Turing has told us that any paradigm can implement anything). However, in some respect there’s something missing, also with this paradigm (not every paradigm is meant to solve every problem out of the box, but they are meant to make common patterns easier to handle, i.e. creating fewer errors when implementing them, i.e. being safer).

The SafeRTOS message queues can’t directly be used in a server that should listen to messages from N clients simultaneously, take an incoming message from one of them and then start a session with that one only. (A message queue sized to the number of clients, holding one message from each, doesn’t solve the problem below.) A session would probably include a reply message and possibly more new messages while these two are at it. The other N-1 clients have meanwhile not assumed that the server is treating them too, i.e. there’s no “send and forget”; they wait (discussed in note 092, some would say they block). Since the server itself uses other servers (i.e. is itself a client), it’s nice that it doesn’t have to deal with the other N-1 clients during a session: it doesn’t have to store their messages (that arrive as “not ok in this state”) away with a Save (SDL term) and then afterwards become its own scheduler to pick them up again.

All this may be done out of the box with the CSP type channels, as with the Go channels and the golang selective choice (select) (or previously occam’s CHAN and ALT).

Polling to see if and what message is at the head of the queue with xQueueMessagesWaiting() won’t solve this problem.

Q2. How is this lack of out-of-the-box functionality seen in view of IEC 61508?

Q3. Why a need to suspend the scheduler?

The xQueueSend() may return errSCHEDULER_IS_SUSPENDED with the accompanying explanation: “The scheduler was in the Suspended state when xQueueSend() was called. As xQueueSend() can potentially cause the calling task to enter the Blocked state it cannot be called when the scheduler is suspended.”

The scheduler is suspended by the function vTaskSuspendScheduler(), described with the texts “Transitions the scheduler from the Active state to the Suspended state” and “A context switch will not occur while the scheduler is in the Suspended state but instead be held pending until the scheduler re-enters the Active state”. The opposite, xTaskResumeScheduler(), will have it continue.

Q3. What is the rationale for having this functionality?

There probably is a good reason or two. However, if I were an assessor I would have listened very carefully to the answer and then probably queried more. What is the really, really underlying reason for it? To make it possible to meet some hard deadline? That’s difficult in any case (a little about this in note 107).

There is an example in “4.3.2.5 Example”. It’s rather clear that one of the needs is to make a critical section, to avoid scheduling. Had they kept the cooperative scheduling and not relied only on preemptive scheduling, would this have been needed? It seems like you can’t suspend the scheduler from an interrupt, so it must be preemption-interrupts that’s the problem. Is it smarter to just disable the scheduler than disabling one, some or all interrupts? A general solution, in a way? But then, isn’t errSCHEDULER_IS_SUSPENDED being returned to tasks a rather problematic side effect?

Remember that processors have had real preemption in hardware since the sixties. It’s called an interrupt. Any operating system of any type must relate to those. Critical sections are the most important matter when dealing with them. I have seen that go wrong so many times.

What does it do with the tasks that get the errSCHEDULER_IS_SUSPENDED? Will they loop around doing a busy poll until not suspended? Because it doesn’t look like there is a message system telling such a task that now the scheduler is running again. Hmm

SAFERTOS CORE

“SAFERTOS CORE is the new RTOS for embedded systems that need to consider safety, but don’t require safety certification” (on HighIntegritySystems page, above). It’s got full SAFERTOS functionality and the same API.

Any discussion group(s)?

The short answer from 1. below is no. No forum for open discussions means less or no open discussions. Even if they are Free and Open?

  1. Twitter: Is there a discussion group for SAFERTOS? No, but there is a sales mail address.
  2. There is a group called FreeRTOS on Facebook (here). When I searched for SafeRTOS I was redirected to it because it had been merged. However, it’s a mirror of the Wikipedia FreeRTOS  page (here) – including the SafeRTOS and OpenRTOS chapters. WITTENSTEIN AG on Facebook, something in German (here)
  3. Groups: Nothing on SafeRTOS (here), and OpenRTOS and FreeRTOS groups seem to be something completely different

Notepad

31Jan2017: I discovered that Intel has a “IEC 61508 SAFETY RUNTIME SYSTEM” called SAFEOS. They state that “SAFEOS is certified manufacturer-independently and can be used for SIL2 or SIL3 applications on different CPU architectures – with or without operating system.” See https://solutionsdirectory.intel.com/solutions-directory/iec-61508-safety-runtime-system. I mention it here partly because it’s interesting, but also because the names SAFEOS and SafeRTOS are easy to get mixed up.

25March2017: I read in a mail conversation from 2012 with a computer scientist that (I quote): “FreeRTOS is free and that says it all. It even disables interrupts over every service call. As far as I can see, the RTOS was not even exercised or at least it is not clearly described.” I don’t know if this still is so with FreeRTOS, or how SAFEOS is implemented.

XMOS FreeRTOS port

19Jan2020: See XMOS FreeRTOS port. I think it is based on the Amazon derivation.

References

  1. Safety Manual for TMS570LS31x and TMS570LS21x Hercules™ ARM®-Based Safety Critical Microcontrollers. User’s Guide. Literature Number: SPNU511C. November 2014–Revised March 2015. Download from http://www.ti.com/lit/ug/spnu511c/spnu511c.pdf
  2. Saving Power using your RTOS by HighIntegritySystems and Wittenstein. See https://www.highintegritysystems.com/downloads/RTOS_Tutorials/Saving_Power_With_An_RTOS.pdf
