Despite the hopes and dreams of many embedded engineers, sadly, reliable code doesn’t happen by accident. It is a painstaking process that requires developers to maintain and manage every bit and byte of the system.
There is usually a sigh of relief when an application is validated “successfully” but just because the software is running correctly in that moment under controlled conditions doesn’t mean that it will tomorrow or the day after.
Tip #1 – Fill ROM with known value
Software developers tend to be a very optimistic group, at least when it comes to how faithfully they expect their microcontroller to run their code over time. The thought of the microcontroller jumping out of the application space and executing in unintended code space seems like a fairly rare case; however, all it takes for this to occur is a buffer overflow or a similarly minor bug.
It can and DOES happen! The resulting behavior of the system would be undefined since memory could have all 0xFF’s in the space by default or, since the region of memory normally isn’t written, the values could have decayed into only God knows what.
There is a pretty neat linker or IDE trick, though, that can be used to help identify and recover the system from just such an event. The trick is to use the “FILL” command to fill unused ROM with a known bit pattern.
There are many possible patterns that can be used to fill the unused memory, but if the intent is to build a more reliable system, the obvious choice is a pattern that lands the processor in an ISR fault handler.
If something goes wrong and the processor starts to execute code outside of program space then the ISR will fire, providing the opportunity to store the state of the processor, registers and system before deciding on a corrective course of action.
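In practice the linker or IDE does the filling, but the effect is easy to picture with a small host-side sketch in C. The function name fill_unused_rom and its parameters are hypothetical and not part of any toolchain, and keeping the pattern aligned to the start of the image is an assumption made here. On a real part the pattern would typically be an instruction encoding that forces a trap or a jump to the fault handler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: overwrite every byte of a ROM image that lies past
 * the end of the application (offsets used_size .. image_size - 1) with a
 * repeating pattern. The pattern index is taken from the absolute offset
 * so that multi-byte patterns stay aligned to the start of the image. */
void fill_unused_rom(uint8_t *image, size_t image_size, size_t used_size,
                     const uint8_t *pattern, size_t pattern_len)
{
    if (pattern_len == 0 || used_size >= image_size) {
        return; /* nothing to fill */
    }

    for (size_t i = used_size; i < image_size; i++) {
        image[i] = pattern[i % pattern_len];
    }
}
```

The application bytes are left untouched; only the region that would otherwise hold erased or stale values is rewritten.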
Additional information on how to use FILL and alternative strategies for its use can be found in “Improving Code Integrity Using FILL” located here.
Tip #2 – Perform a RAM Check on Start-up
In order to build a more reliable and robust system it is important to ensure that the system hardware is functioning. After all, hardware does fail. (Thankfully software never fails; it just does what it was coded to do, whether right or wrong.)
Verifying that there are no issues with internal or external RAM on start-up is a great way to ensure that the hardware is functioning as expected.
There are many different methods that can be used to perform a RAM check, but a common one is to write a known pattern, allow it to sit for a short period, and then read it back. What is read should match what was written.
The truth is that in most cases the RAM check will pass, which is what we want. But in the off chance that it doesn’t, this check provides an excellent opportunity for the system to flag that there is a hardware issue.
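The write, wait, and read-back idea can be sketched in a few lines of C. This is only a minimal illustration and not the memtest module mentioned in this tip: the names ram_pattern_pass and ram_check are hypothetical, the two patterns (0xAAAAAAAA and its complement) are an assumed choice that exercises every bit in both states, and the test is destructive, so it would have to run before the region holds live data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write `pattern` to every word of the region, then
 * read every word back. Returns true only if all words match. */
static bool ram_pattern_pass(volatile uint32_t *base, size_t words,
                             uint32_t pattern)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = pattern;
    }

    /* A real start-up test could insert a short delay here so that the
     * pattern has time to sit before it is read back. */

    for (size_t i = 0; i < words; i++) {
        if (base[i] != pattern) {
            return false;
        }
    }
    return true;
}

/* Destructive check of `words` 32-bit words starting at `base`.
 * Runs two passes with complementary patterns so that each bit is
 * verified as both 1 and 0. */
bool ram_check(volatile uint32_t *base, size_t words)
{
    return ram_pattern_pass(base, words, 0xAAAAAAAAu) &&
           ram_pattern_pass(base, words, 0x55555555u);
}
```

A false return is the point at which the system would flag a hardware issue instead of continuing to boot normally.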
There is a memtest C module that was written back in 2000 by Michael Barr that will save an engineer time when considering a RAM test. The embedded.com link to download the module can be found here.
Tip #3 – Use a Stack Monitor
To a large number of embedded developers the stack seems to be quite the mystical force. When strange things start to happen and the engineer is finally stumped they begin to think, well maybe something is going on with the stack.
The result is blind tweaking and adjustments of the stack size, position, etc. Often enough the bug has nothing to do with the stack but how can one really be sure? After all, how many engineers actually perform a worst-case stack size analysis?
The size of the stack gets allocated statically at compile time but it is used in a dynamic way. As code is executed, variables, return addresses, and other information that the application needs are stored on the stack.
This activity causes the stack to grow within its allocated memory. However, this growth can sometimes exceed the compile-time size limit, causing the stack to corrupt whatever lies in the memory region next door. One way to be absolutely sure that the stack is behaving is to implement a stack monitor as part of the system’s health and wellness code (How many engineers do this?).
The stack monitor creates a buffer zone between the stack and the “other” memory region, filled with a known bit pattern. The monitor then constantly watches the pattern for any changes.
If the bit pattern changes then the stack has grown too far and is on the verge of plunging the system into a dark abyss! The monitor can then log the occurrence, system states and any other useful data that can later be used to diagnose the issue.
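A software-only version of such a monitor might look like the sketch below. The names stack_guard_init and stack_guard_intact and the value of GUARD_PATTERN are hypothetical choices for illustration. Where the guard words actually live (just past the end of the stack) depends on the linker configuration, which is assumed here and not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed guard value; any pattern unlikely to occur by chance will do. */
#define GUARD_PATTERN 0xDEADBEEFu

/* Fill the buffer zone with the known pattern. Call once at start-up,
 * before the stack has had a chance to grow into the zone. */
void stack_guard_init(volatile uint32_t *guard, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        guard[i] = GUARD_PATTERN;
    }
}

/* Returns true while every guard word still holds the pattern.
 * A false result means something, most likely the stack, has written
 * into the buffer zone. */
bool stack_guard_intact(const volatile uint32_t *guard, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (guard[i] != GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}
```

Calling stack_guard_intact from a periodic task or timer tick gives the monitor its chance to log the event before the overflow reaches the neighboring memory region.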
A stack monitor is often available in microcontroller systems that implement a memory protection unit (MPU). The scary part is that these capabilities are usually either off by default or can be turned off by the developer.
A quick search of the internet reveals recommendations to turn off the stack monitor in a real-time operating system to save 56 bytes of flash space. Take a moment and reflect on the implications!