Close

Switch, Case and Hardware Call Stacks

A project log for MicroPort

USB-CDC Serial port for PIC18F, in under 1KiB. Refactored down from a USB-DFU bootloader, hand written in assembler to be light and fast

jesseJesse 12/31/2016 at 03:130 Comments

So... a lot of code is going to be coming up in the posts ahead and it'll probably easier to digest if I explain a couple facets of what the heck it is I'm doing.

First off, I'd like to dive into the DFU bootloader which has given rise to this project and hopefully many other projects to come. The basic idea, is to take what makes Arduino so great and do it better. That is, take a general purpose microcontroller and make it easy to prototype and deploy. While I started with Microchip's PIC before I'd ever heard the word Arduino, I'm among be the first to complain about their two tiered compiler offering. I chose to write my code in assembler, because it was much easier to write code that would work with both Microchip's and open source assemblers. I also had a fascination with USB, which offers to make devices that can really integrate with our existing computer systems, rather than COM ports requiring additional software.

Ultimately, the goal was a firmware platform that could handle virtually any USB standard, support open specifications for firmware updates, and stay the hell out of the programmer's way.

One of the quirks of PIC's 8-bit architecture is a hardware call stack. This means the devices can handle a depth of N function calls (in this particular case, 32) before the stack overruns and execution effectively halts. Earlier prototypes for the USB logic in the bootloader involved a few nested levels of function calls. This eats into the 32 call budget, and for the bootloader to stay out of the way a better means of segregating the logic was desirable.

The cleanest way to structure the code was to replicate the switch/case model often used in higher languages, and replicating it in assembler wasn't too much trouble. Some creative liberties are taken (break statements are implied) and implementation is tuned to the architecture.

The value being tested if first loaded into the working register, or WREG as it's called on PICS. The case statements flip bits in the working register using a bit-wise exclusive-or (XOR), testing for a zero. A conditional branch skips to the next case statement if the result doesn't match the current case statement, and the process repeats. However, using XOR makes it very intuitive. In a naive implementation, any changes made to WREG need to be undone before checking the next case statement. This would mean XORing the value we just tried to check, and then XORing the value we want to check for a match to. However, these two steps can be combined and these macros simplify the production of this mask by XORing two consecutive case values together. This produces the exact bit flips required to convert the previous case into the current case.

Little moment to blather here... I love XOR. I think mathematically it is one of the neatest transformations around. AND and OR are both very useful and have specific jobs to do, but XOR is very versatile. Taking a value and XORing it with a max value, will perform the bit-wise NOT. Any XOR operation is reversible, by repeating the same operation. And, as a consequence of this last fact, XORing doesn't destroy any data. Let me share a snippet of PIC assembly that sums up the utility of this operation. Lets say, we have a value in the working register, and we want to swap this value with a value in some memory location called regX.

xorwf	regX, F    ; WREG XOR regX => regX
xorwf	regX, W    ; WREG XOR regX => WREG
xorwf	regX, F    ; WREG XOR regX => regX

Maybe I'm going a little overboard, but this is poetry. Like moving two decks of cards through each other, having them come apart in the same order they went in. To describe the code above step by step, the first instruction takes regX and turns it into a map of the differences between regX and the working register. It then uses this map to transform the value in the working register into the value that was in regX. It then uses this copy of what was in regX (now in the working register, stay with me) and uses this value to transform the map in regX into the value that was originally in the working register. No data is lost in any of these operations, and this is as efficient in terms of instructions executed as using a temporary ram location.

So, every case statement is exactly one transformation instruction, followed by a conditional branch. Admittedly, it's not a jump table like the case statements of C, but 3-4 cycles per mismatched case is plenty fast for most purposes.

Using the swtich/case pattern in the bootloader allows calls into user code from the interrupt handler, with a worst case of 3 levels of stack occupied. Two of those levels are used by the interrupt handlers, and the remaining one for either a bootloader call or a call into user code. The rest of the stack is available to the developer of the supplemental firmware.

I'm uploading the macros for case/select to this project. They're not commented in any meaningful sense, and the notable complexity comes from supporting nested case statements. But the core concept being automated by the scripts is the application of XOR to input values being tested, and skipping non-matching blocks of logic.

Discussions