Too little else to do

Designer's notes #7
Øyvind Teig,  Trondheim,  Norway (http://www.teigfam.net/oyvind/)

Watchdog restart because 'the others' were idle when 'I' was too busy

Usually, an embedded system's hardware watchdog restarts the system because one of the software processes (threads, tasks) has become livelocked and spins around concentrating on itself in a malignant and introverted manner. It does not even "kick" the watchdog on its way through the infinite loop of instructions.

By the way, I don't understand the rationale for the term "kicking the dog" for resetting its down-counter, so that it is again a long time before it would reach zero. If we are to anthropomorphise, then "feeding the dog" would serve better. In particular for such a nice dog as the one used in the figure, namely the Windows operating system's "search dog".

Here, the problem was process P0. It was burning EEPROM, one "sector" at a time, and after each sector it descheduled itself with no timeout by returning to the non-preemptive run-time system's FSM_Scheduler - which immediately rescheduled P0.

See "From message queue to ready queue" about RQue and the synchronous channel communication layer.

The synchronous channel software layer does a kick_dog on each rendezvous' memcpy. So we had no kick_dog from communication, and none from the main loop either, since RQue never became empty. Because both of these kick sources were missing, the watchdog timed out after two seconds and restarted the processor.
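To make the channel-side kick concrete, here is a minimal sketch. It is not our channel layer: the names chan_rendezvous and kicks_so_far are hypothetical, and kick_dog only counts calls instead of reloading a hardware counter. It shows the one point that matters here: the dog is kicked only when data actually moves.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the hardware watchdog reload: we only count the kicks. */
static unsigned kick_count;

static void kick_dog(void)
{
    kick_count++;
}

/* Hypothetical rendezvous: sender and receiver are both ready, so the
 * data is copied from one to the other, and the dog is kicked once. */
void chan_rendezvous(void *dst, const void *src, size_t len)
{
    kick_dog();
    memcpy(dst, src, len);
}

/* Lets a caller observe how many kicks have happened so far. */
unsigned kicks_so_far(void)
{
    return kick_count;
}
```

If no process communicates, chan_rendezvous is never called and kicks_so_far stays where it was. That is the situation P1 and P2 left us in.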

Our solution was to insert a short (1 ms) delay in P0, thus making RQue empty provided the system had nothing else to do. We could have "yielded" instead, so that FSM_Scheduler would have returned even if RQue was not empty. In any case, the FSM_Drivers would now also run - to pick up lower level timer and i/o stuff.
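The effect of the delay can be sketched as a small host-side simulation. This is an illustration under stated assumptions, not our run-time system: time is simulated, a sector is assumed to take 10 ms to burn, the main loop is assumed to kick the dog, run FSM_Scheduler and then FSM_Drivers, and simulate_burn_with_delay, advance and the variable names are hypothetical.

```c
#include <stdbool.h>

#define WATCHDOG_TIMEOUT_MS 2000 /* two seconds, from the text */
#define SECTOR_BURN_MS        10 /* assumed time to burn one sector */
#define P0_DELAY_MS            1 /* the short delay from the text */

static long now_ms, last_kick_ms; /* simulated time and last kick */
static bool restarted;            /* set when the simulated dog bites */
static int  rque_len;             /* ready processes (only P0 here) */
static int  sectors_left;         /* work remaining for P0 */
static long p0_wakeup_ms;         /* -1 when no timer is pending */

static void kick_dog(void) { last_kick_ms = now_ms; }

/* Let simulated time pass and check the watchdog. */
static void advance(long ms)
{
    now_ms += ms;
    if (now_ms - last_kick_ms >= WATCHDOG_TIMEOUT_MS)
        restarted = true;
}

/* P0 burns one sector. With the delay it waits on a timer, which
 * leaves RQue empty; without it, it is made ready again at once. */
static void P0(bool use_delay)
{
    advance(SECTOR_BURN_MS);
    if (--sectors_left <= 0)
        return;
    if (use_delay)
        p0_wakeup_ms = now_ms + P0_DELAY_MS;
    else
        rque_len++;
}

/* Returns only when RQue is empty (or the dog has bitten). */
static void FSM_Scheduler(bool use_delay)
{
    while (rque_len > 0 && !restarted) {
        rque_len--;
        P0(use_delay);
    }
}

/* Lower level timer handling: wait out the pending timer, wake P0. */
static void FSM_Drivers(void)
{
    if (p0_wakeup_ms >= 0) {
        if (p0_wakeup_ms > now_ms)
            advance(p0_wakeup_ms - now_ms);
        p0_wakeup_ms = -1;
        rque_len++;
    }
}

/* Runs one burn job; returns true if the watchdog restarted us. */
bool simulate_burn_with_delay(int sectors, bool use_delay)
{
    now_ms = last_kick_ms = 0;
    restarted = false;
    sectors_left = sectors;
    rque_len = (sectors > 0) ? 1 : 0;
    p0_wakeup_ms = -1;

    for (;;) {
        kick_dog();
        FSM_Scheduler(use_delay);
        if (restarted || sectors_left <= 0)
            break;
        FSM_Drivers();
    }
    return restarted;
}
```

With these assumed numbers, simulate_burn_with_delay(300, false) ends in a watchdog restart, while simulate_burn_with_delay(300, true) completes: the scheduler returns after every sector, so the main loop gets to kick the dog and run the drivers.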

The solution involved no tuning of priorities (we have none, neither on processes nor on messages). Nor did it involve inserting extra kick_dog calls, as we try to avoid spreading them. Instead, we were able to find a solution without side effects that keeps the self-monitoring embedded system up and running.

P0 had burned EEPROM this way several times without any problems. The fault did not show until we happened to issue the controlling "burn" command at a time when the processor had little else to do.

The watchdog is kicked every time the main loop "for(;;)" loops. After this, the main loop runs FSM_Scheduler - which exits only when the ReadyQueue (RQue) is empty. In our faulty case P0 was always ready, and the other processes (here represented as P1 and P2) had nothing to do. At least they did not communicate, so there was no kick_dog from any moving of data in the CHANnel either.
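The faulty case described above can be sketched as a small host-side simulation. It is a sketch under stated assumptions, not our actual code: time is simulated, a sector is assumed to take 10 ms to burn, P1 and P2 are simply absent because they are idle, and simulate_burn, advance and the variable names are hypothetical.

```c
#include <stdbool.h>

#define WATCHDOG_TIMEOUT_MS 2000 /* two seconds, from the text */
#define SECTOR_BURN_MS        10 /* assumed time to burn one sector */

static long now_ms;        /* simulated time */
static long last_kick_ms;  /* time of the most recent kick_dog */
static bool restarted;     /* set when the simulated watchdog fires */
static int  rque_len;      /* number of ready processes (only P0 here) */
static int  sectors_left;  /* work remaining for P0 */

static void kick_dog(void) { last_kick_ms = now_ms; }

/* Let simulated time pass and check the watchdog. */
static void advance(long ms)
{
    now_ms += ms;
    if (now_ms - last_kick_ms >= WATCHDOG_TIMEOUT_MS)
        restarted = true;
}

/* P0 burns one sector, then deschedules with no timeout, so it is
 * put straight back on the ready queue while work remains. */
static void P0(void)
{
    advance(SECTOR_BURN_MS);
    if (--sectors_left > 0)
        rque_len++;
}

/* Returns only when the ready queue is empty (or the dog has bitten). */
static void FSM_Scheduler(void)
{
    while (rque_len > 0 && !restarted) {
        rque_len--;
        P0();
    }
}

/* Runs the main loop for one burn job; true means watchdog restart. */
bool simulate_burn(int sectors)
{
    now_ms = last_kick_ms = 0;
    restarted = false;
    sectors_left = sectors;
    rque_len = (sectors > 0) ? 1 : 0;

    for (;;) {
        kick_dog();      /* the only kick when nobody communicates */
        FSM_Scheduler(); /* does not return while P0 stays ready */
        if (restarted || sectors_left <= 0)
            break;
    }
    return restarted;
}
```

With the assumed 10 ms per sector, simulate_burn(100) finishes after one second of simulated time and returns false, while simulate_burn(300) returns true: the single kick at the top of the loop is more than two seconds old before P0 is done.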

Other publications at http://www.teigfam.net/oyvind/pub/pub.html