Connecting protocols

Started 8Feb2015 – Updated 17Feb2015
This page is in group Technology and discusses a scheme for communication between computing units. The angle taken is to see when it’s enough to rely on the lower-level layers, without having to resort to application-receiver-answer (and application-sender-retransmit).

Connecting

Figure 1 (PDF)

Disclaimer: if you know TCP/IP or the like, this might not be for you. Often, however, we don’t, and we need to design a protocol rather ad hoc, with only limited knowledge of the matter. If you feel at home in that scenario, then it’s you I am writing for. Don’t expect it to be in-depth, but perhaps expect a different angle.

Here application-Y may send “Up” to application-X, and X may send “Down” to Y. It happens via simple routers Rx and Ry, through units Lx, Lv, Lw and Ly. Every unit listens for input data, and if it’s quiet for too long it concludes that the connecting end is not alive and tries to send that info in the other direction. Also, when it has sent something out it waits for an acknowledge from the receiver, and if there is no acknowledge after some time it tries to tell about that, also in the other direction.
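To make the two timers concrete, here is a minimal Python sketch of one unit. The class name `Unit`, the timeout values and the report strings are assumptions for illustration only; Figure 1 does not specify them. Time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    """One unit (e.g. Lx) with a silence timer and an acknowledge timer."""
    name: str
    silence_timeout: float = 5.0   # assumed: longest accepted quiet period
    ack_timeout: float = 1.0       # assumed: longest wait for an acknowledge
    last_heard: float = 0.0
    awaiting_ack_since: Optional[float] = None
    # What the unit would tell "in the other direction".
    reports: List[str] = field(default_factory=list)

    def on_input(self, now: float) -> None:
        self.last_heard = now

    def on_send(self, now: float) -> None:
        self.awaiting_ack_since = now

    def on_ack(self, now: float) -> None:
        self.awaiting_ack_since = None
        self.last_heard = now  # an acknowledge also proves the peer is alive

    def tick(self, now: float) -> None:
        """Called periodically; checks both timers."""
        if now - self.last_heard > self.silence_timeout:
            self.reports.append("peer-silent")
            self.last_heard = now  # restart the timer, avoid repeated reports
        if (self.awaiting_ack_since is not None
                and now - self.awaiting_ack_since > self.ack_timeout):
            self.reports.append("ack-timeout")
            self.awaiting_ack_since = None
```

A unit that sends at time 0 and never hears anything back would first report `ack-timeout` and later `peer-silent`.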

Observe that the thick green and thick red data paths, in addition to proper data packets, would also carry connection state (connection-not-up / connection-up) and flow information (timed out, ready (acknowledge, buffer available) or not ready (buffer not available)). However, connection state is shown explicitly as dotted lines between the units. If the system is buffered then ready would mean that a buffer position is available, and not ready that at least the nearest buffer is full. So, the diagram is a mix of cabling and logical connections, meant to better (?) carry over the meaning (=semantics).

I haven’t mentioned that the dotted line between the units also contains a negative acknowledge (nack) meaning: please retransmit if you haven’t given up yet. There was something wrong with the parity, checksum or CRC. This nack would, if the layer has given up, be propagated as connection-not-up into the application.
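The nack rule can be sketched like this, assuming a CRC-32 trailer and a limit of three attempts. The function names, the frame layout and the simulated flaky link are all hypothetical; only `zlib.crc32` is a real library call.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload (assumed frame layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes) -> str:
    """Receiver side: answer 'ack' if the CRC matches, else 'nack'."""
    payload, crc = framed[:-4], framed[-4:]
    return "ack" if zlib.crc32(payload).to_bytes(4, "big") == crc else "nack"


def make_link(corrupt_first: int):
    """Simulated link that flips one bit in the first `corrupt_first` frames."""
    state = {"left": corrupt_first}

    def transmit(framed: bytes) -> str:
        if state["left"] > 0:
            state["left"] -= 1
            framed = bytes([framed[0] ^ 0x01]) + framed[1:]
        return check(framed)

    return transmit


def send_with_retries(transmit, framed: bytes, max_attempts: int = 3):
    """Retransmit on nack; after max_attempts give up and report upwards."""
    for attempt in range(1, max_attempts + 1):
        if transmit(framed) == "ack":
            return "delivered", attempt
    return "connection-not-up", max_attempts
```

With a link that corrupts the first two frames, the third attempt gets through. With a link that corrupts five, the layer gives up after three attempts and the application sees connection-not-up.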

So I have then implicitly tried to cram this rather complicated message sequence diagram into the above figure.

More semantics

The figure would allow any part to send only after it has received an acknowledge or connection-up from its closest receiver. If it times out on the receiver’s acknowledge, or receives a timeout flow message or a nack after which the layer has given up, it would pass this on as connection-not-up.
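The send rule above can be written as a tiny state machine. This is only a sketch: the event names and the `Sender` class are assumptions for illustration, not taken from Figure 1.

```python
from enum import Enum, auto
from typing import List


class Event(Enum):
    CONNECTION_UP = auto()
    ACK = auto()
    ACK_TIMEOUT = auto()     # own timer expired while waiting for acknowledge
    TIMEOUT_MSG = auto()     # timeout flow message received from the peer
    NACK_GIVEN_UP = auto()   # nack, and the layer has stopped retransmitting


class Sender:
    def __init__(self) -> None:
        self.may_send = False
        self.forwarded: List[str] = []  # what is passed on towards the application

    def send(self, packet: str) -> bool:
        """Send only when permitted; one packet per acknowledge."""
        if not self.may_send:
            return False
        self.may_send = False  # wait for the next acknowledge
        return True

    def on_event(self, event: Event) -> None:
        if event in (Event.CONNECTION_UP, Event.ACK):
            self.may_send = True
        else:
            self.may_send = False
            self.forwarded.append("connection-not-up")
```

A second `send` right after the first is refused until an acknowledge arrives, which is the block-by-block synchronization in its simplest form.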

This scheme in Figure 1 is somewhat “data driven”. This means that we would discover a problem only when one of the parts tries to send something. However, if we send only once per minute it might be useful to know of a broken cable also when not communicating. So if we want immediate info when something fails then we need to introduce some kind of keep-alive or pinging between each pair of units. This would happen transparently “underneath” the messages. We would then see connection-down fast. We would continue pinging until connection-up, and then continue to find out if it is still up.
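A keep-alive monitor for one link could look roughly like this. It is a sketch under assumptions: the class name `LinkMonitor` and the rule "down after `max_missed` missed replies in a row, up again on the first reply" are choices made here, not something Figure 1 prescribes.

```python
from typing import List


class LinkMonitor:
    """Tracks one link from the results of periodic pings."""

    def __init__(self, max_missed: int = 3) -> None:
        self.max_missed = max_missed  # assumed threshold
        self.missed = 0
        self.up = False
        self.events: List[str] = []

    def ping_result(self, replied: bool) -> None:
        if replied:
            self.missed = 0
            if not self.up:
                self.up = True
                self.events.append("connection-up")
        else:
            self.missed += 1
            if self.up and self.missed >= self.max_missed:
                self.up = False
                self.events.append("connection-down")
```

Note that the monitor keeps pinging while the link is down, so connection-up is seen as soon as the first reply comes back.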

This is layered protocol design. You can see another layer in Figure 1: the applications X and Y talk with routers Rx and Ry. There are different arrows between X and Rx than between the other units.

Problem solved?

Now, for most products the specification would not need to require an answer back from the other application. A message goes from Y to X, and no answer back from X to Y is needed. There simply is not any need for it. Buffer overflow can’t happen, and Y can assume that X has got the message since it has not received any information to the contrary. Y does not need any timeout and retransmit in case it does not get positive info that it went ok. We avoid “layer soup.” Or am I totally wrong?

There is another case though: if Y wants to hold the next message until X has received the previous message then it would basically require an application level reply all the way from X down to Y again. This would be fully synchronized. But then, is this really needed with the above described scheme, since it is also synchronized, block by block? No crossroads without lights. Alternatively, the low level protocol could be designed such that the control flow info back to the sender is not sent before the message has been received by the receiver. This is a more complex low level protocol, but I think it’s possible: an end-to-end acknowledge at protocol level, below the application.
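The alternative, where each unit holds back its acknowledge until the next unit has acknowledged, can be sketched as a chain of hops. The `Hop` class and the order of the units in the chain are assumptions for illustration (in particular the order of Lv and Lw), and a real protocol would be asynchronous rather than one nested call.

```python
from typing import List, Optional


class Hop:
    def __init__(self, name: str, next_hop: Optional["Hop"] = None) -> None:
        self.name = name
        self.next_hop = next_hop
        self.alive = True
        self.received: List[str] = []

    def deliver(self, msg: str, ack_order: List[str]) -> bool:
        """Return True (acknowledge) only after everything downstream has acked."""
        if not self.alive:
            return False
        if self.next_hop is None:
            self.received.append(msg)  # final receiver, e.g. application X
            acked = True
        else:
            acked = self.next_hop.deliver(msg, ack_order)
        if acked:
            ack_order.append(self.name)  # record when this hop acknowledges
        return acked


def build_chain(names: List[str]) -> Hop:
    """Build a linked chain and return its first hop."""
    hop = None
    for name in reversed(names):
        hop = Hop(name, hop)
    return hop
```

For a message sent “Up” through `["Ry", "Ly", "Lw", "Lv", "Lx", "Rx", "X"]`, the acknowledges arrive in the reverse order, so Y's router acknowledges last, and only when X really has the message.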

Observe that all faults travel in both directions from the point of error. If something happens then both sender and potential receiver would get the info. Then, when the line signals connection-up again the data consumer (master or client) could send a message to the producer (server): please retransmit every state you have that’s of interest to me.
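A small sketch of that resynchronization follows. The class names, the `peer_up` flags and the snapshot call are hypothetical, and the "message to the producer" is reduced to a direct method call to keep the example short.

```python
class Producer:
    """The server side: owns the state of interest."""

    def __init__(self) -> None:
        self.state = {}
        self.peer_up = True

    def on_fault(self) -> None:
        self.peer_up = False

    def snapshot(self) -> dict:
        """Answer to 'please retransmit every state of interest to me'."""
        return dict(self.state)


class Consumer:
    """The master or client side: keeps a view of the producer's state."""

    def __init__(self) -> None:
        self.view = {}
        self.peer_up = True

    def on_fault(self) -> None:
        self.peer_up = False

    def on_connection_up(self, producer: Producer) -> None:
        producer.peer_up = True
        self.peer_up = True
        self.view = producer.snapshot()  # full resync, not only changes


def break_link(consumer: Consumer, producer: Producer) -> None:
    """A fault is reported in both directions from the point of error."""
    consumer.on_fault()
    producer.on_fault()
```

Any state change that happens while the link is down is missed by the consumer, which is why it asks for everything once the line signals connection-up again.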

Problems lurking

There are many timeouts here, and no common clock. How would we treat the case when connection-down or connection-up reaches Rx and Ry out of phase? They will!

Should timeouts be related to each other, like this timeout should be longer or shorter than that timeout?
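One possible answer, as a sketch only: two relations that are commonly wanted are that the silence timeout is longer than the keep-alive interval (otherwise a healthy link is declared dead), and that an upper layer waits longer than the lower layer needs to exhaust its retransmits (otherwise both layers retransmit at once). The parameter names and both rules are assumptions made here; the scheme in Figure 1 may need more or different relations.

```python
from typing import List


def check_timeouts(keepalive_interval: float,
                   silence_timeout: float,
                   ack_timeout: float,
                   max_attempts: int,
                   upper_timeout: float) -> List[str]:
    """Return a list of violated relations (empty list means consistent)."""
    problems = []
    if silence_timeout <= keepalive_interval:
        problems.append("silence_timeout must exceed keepalive_interval")
    # Worst case for one hop: every attempt runs into the acknowledge timeout.
    if upper_timeout <= ack_timeout * max_attempts:
        problems.append("upper_timeout must exceed ack_timeout * max_attempts")
    return problems
```

For example, a 1 s keep-alive, 3 s silence timeout, 0.5 s acknowledge timeout, three attempts and a 2 s upper timeout pass both checks. Note that these checks say nothing about clocks drifting apart between units.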

Also, I have assumed that the code implementing the communication protocols functions as specified. Many times a retransmit happens not because of EMC noise, but because of a software error. How does this influence this scheme?

No formal model!

I have not written a formal model of the figure above, as I could have in, for example, Promela (using the Spin tool) or CSPm (using the FDR3 tool). You know, it’s faster to write a blog note than a model! I wrote “as specified” above. Provided there is a specification! Of course, this is where I should have started.

This reminds me of a chapter in Holzmann’s first book on the Spin verifier (“Design and validation of computer protocols”). There was a train crash in the 2.4 km long Clayton tunnel, on the London to Brighton line, in 1861, where an unexpected combination of events (between a needle telegraph, signal men and semaphore flags) caused 21 people to die and 176 to be injured. Read the whole story here: [1].

The formal model and the specification should have been two sides of the same coin.

References

Wiki-refs: OSI model

  1. Design and validation of computer protocols by Gerard J. Holzmann, see TRAIN CRASHES at page 7