IEC 61508 and (safe?) concurrency

Contents

1 Background
2 Concurrency is allowed
3 But how possible?
4 Complexity explained
5 Concurrency Primitives
6 Annex F
7 Master Thesis Suggestion
8 Aside
9 References

25 April 2013 (last edit 21June2013)
This page is in group Technology.

Background

Concurrency on Wikipedia or [3].

This note is a branch from 065. 035, 023 are perhaps also relevant. My XCHAN paper is a contribution to making concurrency easier, trying to merge the how SDL and CSP may view communication (Also discussed in 056). If you checked the references and/or are still with me, great!

I started this note after corresponding with a company (Eschertech in the UK) working with industrial use of formal methods, after I saw that they mentioned 61508 [1] in a table [2]. I asked them about how they looked at 61508 and concurrency. I will not quote the reply, but mix it with my own thoughts here. I am also communicating with Altreonic (in Belgium) about such matters, and learning by the day.

The Dom Luís Bridge, Porto was built 1881-86, when its 172 m span was the longest of its type in the world. I visited there in 2005 and was impressed by the refurberishment work. Will our safety critical software be updated in 120 years? I added a grid, at first glimpse probably not discovered. I try to make some software, but where is 61508 in that picture? How many frames contain concurrency? One, some or all?

Concurrency is allowed

Let’s make it perfectly clear, concurrency is allowed! In the 61508-3 and 61508-7 © IEC:2010 (called 2nd ed.) concurrency is mentioned in these chapters:

B.2.3.3 Time Petri nets
C.2.4.2 Calculus of Communicating Systems (CCS)
C.2.4.3 Communicating Sequential Processes (CSP)
C.2.4.5 LOTOS
C.2.4.6 OBJ
C.2.4.8 VDM, VDM++ – Vienna Development Method
C.5.12.1 Model checking

It’s even stronger than this: The more SIL (Safety Integrity Level) is needed (‘SIL 1‘, ‘SIL 2‘, ‘SIL 3‘ to best ‘SIL 4‘), the more formal methods (verified models) and good methodology that’s used, the better. These tools go from ‘—‘ (not applicable) through ‘R‘ (Recommended) to ‘HR‘ (Highly Recommended). I can see no ‘NR‘ (Not Recommended). You will find these tables in 61508-3. Each technique is in 61508-7 sorted with respect to Rigour: ‘R1‘ (no objective acceptance criteria; black box testing), ‘R2‘ (objective acceptance criteria with high level of confidence; testing with metrics) and ‘R3‘ (in the direction of formal proofs; testing is not mentioned).

Concurrency and formal methods are not the same, but the above list seems to correlate more than I would have thought. Formal methods are also good to make complicated concurrency correct, but concurrency based on formal methods may implicitly become correct. Stay tuned.

As far as I read all this, all of the techniques above that relate to concurrency perform with high recommendation and rigour. This could be understood as «concurrency is bad, if you absolutely have to use it, use formal methods to make it correct» or «concurrency in a good way represents components, please use it, but do consider using formal methods for complex or high security components and their interrelations». I assume the latter interpretation in this blog note. I even assume that this is what 61508 tells me.

But how possible?

Still I hear that the highest level SIL 4 systems are seldom (never?) implemented as concurrent sw systems. It seems like time-triggered architectures are often used instead – because they are easier to verify and test.

Aside: That’s what we used at Autronica when we in 1978 designed our very first microcontroller-based product, a Start-Stop system [4]. We did not yet have any Intel ‘Intellec’ «blue box» to compile for us, only a «Prompt 48» where everything was hand coded in assembly code, even call and jump addresses. The whole system fit into 4 pages of 256 bytes of external EPROM, running on an Intel 8035 microcontroller. But after that we slowly began using concurrency, controlled by small operating systems. I actually have one of these «real-time executives» from ten years after, written in PL/M-51, descibed in [5].

SIL 2 systems in my experience seem to be ok with concurrency. SIL 4 seldom? So, with SIL 3, do we have to be alerted? Of course. But the standard does not disallow it, but lists up several techniques to help designing such systems (above). In the spirit of the standard, the list is not all inclusive. There could be other methodologies. But could an approved SIL 3 system with concurrency have the full list in the exclusion list? Wouldn’t that undermine the standard?

Of course, the goal is to make the tools to handle concurrency such that even higher SIL may be made with them. The inherent complexity of a safety-critical system will meet the controllable complexity of the application, at higher SIL levels. How?

I must admit I am stretching myself now, learning by trying to explain.

Complexity explained

Isn’t this what this is all about, how to explain a complex system? We can build a system by connecting smaller and less complex systems. In the scope of this blog note the smallest part is a process. First comes how to build a process: language, subset, tool. Next comes how to connect these to build a safety critical system: paradigm, cohesion-coupling and tool? Naïve description, but it may be of some help.

I have made a new word «cohesion-coupling» here because I think it quite important. Cohesion is a measure of internal process complexity, coupling is a measure of how much the processes have to know about each other. Cohesion should be high, but not too high. Coupling should be low, but not too low. Cohesion and coupling is treated in Annex F in F.7. of book 61508-3.

And then we have safety (nothing bad happens) and liveness (something good eventually happens) properties. Also, we have to know that an implementation is more deterministic than a specification – one way to see it. This is more or less written out in 61508-7 in chapter C.5.12. Formal proof (verification) as:

functional correctness properties, i.e. the program should exhibit a particular functionality.
safety (i.e. some bad behaviour will never occur) and liveness (i.e. some good behaviour will occur eventually ) properties.
timing properties, i.e. some behaviour will occur at a particular time.

Concurrency Primitives

I have searched for semaphore, mutex, critical region and race in both 61508-3 and 61508-7 documents, but they are clean. I notice that even if the standard praises CSP and formal methods, and allows concurrency, it’s mentioned nowhere that concurrency must be safe and done correctly! If this is deliberate it’s very strange to me, and I’d like to see the reason. (However, hazard is there, see later.)

However pointer is treated in both 61508-3 and 61508-7, and the chapter 7.4.4.13 described in Part 7 as promote code understandability (p. 64), facilitate verification and testing (p. 65), proscribe unsafe language features (p. 66) and good programming practice (p. 66). 7.4.4.13 has no heading, but 7.4.4 is Requirements for support tools, including programming languages.

What I am trying to say is that sequential code is required to be safe, but concurrent code I have failed to see tagged with almost any degree of required safeness. What have I missed?

Annex F

Maybe I have missed Annex F: Techniques for achieving non-interference between software elements on a single computer. The annex is informative in 61508-3. It concerns spatial: the data used by a one element shall not be changed by a another element and temporal (meaning race, but the word is not used) facets of concurrent programs.

Annex F seems to be able to cover all the aspects I miss, informatively and rather non-exclusively with respect to the different tools semaphore, mutex, critical region, pipe, channel and selective choice.

In F.7. the chapter Types of module coupling, lists up several ways to couple independent programs: interface coupling, encapsulation, data coupling via parameter list, structure coupling, control coupling, global coupling and content coupling. These will map down to certain use of language, operating system calls or language construcs. The standard fails to do this.

So, concurrency seems to be treated different from sequential programming.

Master Thesis Suggestion

The company I work for has sent a suggestion for a Master’s thesis to NTNU (in May 2013). Read it here: http://www.itk.ntnu.no/ansatte/Onshus_Tor/Prosjekt.htm as point 15. It’s entitled «Analysis of message-systems in (concurrent) safety-critical systems» or «Analyse av meldingssystemer i sikkerhetskritiske systemer» in Norwegian. It relates to IEC 61508.

Aside

I have participated in the SysML group somewhat with a query called «SysML and verification» where «Verification & Validation«, part verification is not like in formal verification. They point to [Boehm 84]. However, they may still <<refine>> (as CSP etc.) some specification from another. Myself as a user is «Aclassifier». I came to this by discovering that Rhapsody also supported SysML (a subset of UML). Read the thread here – it is rather interesting!

Øyvind Teig