One thing I've learned from this project is that programming in C saves the programmer a lot of trouble - the compiler generates the tedious parts of the assembly for you, such as switch-case implementations, assigns variable addresses, makes wise use of the registers (if it's smart enough) and generally lets you focus on the logic rather than the housekeeping.
It also protects you from a big class of bugs. I had many bugs in this project that are simply not possible to make in a higher-level language. It turns out that one can make very, um, creative bugs when programming in assembly.
Debug how?
The STC15F104W has no debug peripherals. It doesn't even have a UART module (if we believe the datasheet), which leaves printf debug out unless I bitbang the UART protocol myself. So what else can one do?
One possible solution is using a simulator. SDCC comes with a simulator called uCsim. It is a rather simple command line tool that accepts hex files and can do run, step and so on. The executable is called s51. Using it may look something like this:
    > s51 semyon.hex
    uCsim 0.6-pre54, Copyright (C) 1997 Daniel Drotos.
    uCsim comes with ABSOLUTELY NO WARRANTY; for details type `show w
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.
    0> Loading from semyon.hex
    296 words read from semyon.hex
    step
    Stop at 0x000090: (109)
         R0 R1 R2 R3 R4 R5 R6 R7
    0x00 fa 16 bb 11 ad ae 24 88 ......$.
    @R0 53 S ACC= 0x00 0 . B= 0x00
    @R1 0b . PSW= 0x00 CY=0 AC=0 OV=0 P=0
    SP 0x07 88 24 ae ad 11 bb 16 fa .$......
    DPTR= 0x0000 @DPTR= 0x5e 94 ^
    0x0090 e5 40 MOV A,40
    F 0x000090
    0> run
    Simulation started, PC=0x000090
    Stop at 0x0000c5: (105) User stopped
    F 0x0000c5
    Simulated 2010456 ticks in 1.501994 sec, rate=0.121033
    0> step 2142
    Stop at 0x0000c5: (109)
         R0 R1 R2 R3 R4 R5 R6 R7
    0x00 81 75 bb 11 ad ae 24 88 .u....$.
    @R0 29 ) ACC= 0xff 255 . B= 0x00
    @R1 a9 . PSW= 0x00 CY=0 AC=0 OV=0 P=0
    SP 0x09 00 98 88 24 ae ad 11 bb ...$....
    DPTR= 0x0000 @DPTR= 0x5e 94 ^
    0x00c5 08 INC R0
    F 0x0000c5
    Simulated 36000 ticks in 0.032018 sec, rate=0.101667
    0> dump iram 0x00 0x3f
    0x00 81 75 bb 11 ad ae 24 88 Vw....$.
    0x08 98 00 52 db 25 43 e5 3c ..R.%C.<
    0x10 f4 45 d3 d8 28 ce 0b f5 .E..(...
    0x18 c5 60 59 3d 97 27 8a 59 .`Y=.'.Y
    0x20 76 2d d0 c2 c9 cd 68 d4 v-....h.
    0x28 49 6a 79 25 08 61 40 14 Ijy%.a@.
    0x30 01 01 6a a5 11 28 c1 8c ..j..(..
    0x38 d6 a9 0b 87 97 8c 2f f1 ....../.
Using uCsim feels very spartan because of its crude/practical user interface. Although it should be easy to wrap uCsim in Python and do complex things, as the docs suggest, I was looking for something more user-friendly. Alas, it doesn't seem like there are any simulators that are much better.
Thus, for most of the bugs, I used the LEDs as indicators of program state. A very crude printf, if you like.
Traps for young players
The first bug took the longest to find. I had delay loops that looked something like this:
    delay:
        mov r6, 0x00
        mov r7, 0x00
        sjmp delay_loop
    delay_loop:
        djnz r7, delay_loop
        djnz r6, delay_loop
        ret
The logic didn't work right, but more frustrating was that the delays were not consistent at all: they got shorter each time, then got as long as intended, and repeated ad infinitum. Can you spot the mistake?
That's right, I forgot the # symbol that marks immediate values. Instead of the immediate 0, I gave the assembler the IRAM address 0x00, which is where r0 lives. So r6 and r7 were loaded with whatever r0 held at the time, and since r0 was in use by the game logic its value kept changing, making the delay really groovy.
You'll never get such a bug with C - the closest thing would be misinterpreting a pointer as a variable or vice versa, and the compiler might warn you about it.
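To make the effect of the missing # concrete, here is a minimal C model of the two variants. It is only a sketch: the `iram` array and the function names are made up for illustration, and it assumes register bank 0 is selected so that r0 sits at address 0x00.

```c
#include <stdint.h>

/* Simplified model of the 8051 internal RAM. With register bank 0 selected,
 * R0..R7 are simply the bytes at addresses 0x00..0x07. */
static uint8_t iram[128];

/* Models the two nested DJNZ instructions of delay_loop and returns how many
 * times the inner "djnz r7" executes. */
static unsigned long run_delay_loop(void)
{
    unsigned long iterations = 0;
    do {
        do {
            iterations++;
        } while (--iram[7] != 0);   /* djnz r7, delay_loop */
    } while (--iram[6] != 0);       /* djnz r6, delay_loop */
    return iterations;
}

/* Buggy version: "mov r6, 0x00" is a DIRECT load from address 0x00 (R0). */
static unsigned long delay_without_hash(uint8_t r0_value)
{
    iram[0] = r0_value;
    iram[6] = iram[0x00];
    iram[7] = iram[0x00];
    return run_delay_loop();
}

/* Intended version: "mov r6, #0x00" loads the constant zero. */
static unsigned long delay_with_hash(uint8_t r0_value)
{
    iram[0] = r0_value;
    iram[6] = 0x00;
    iram[7] = 0x00;
    return run_delay_loop();
}
```

In this model `delay_with_hash()` always runs the inner loop 65536 times, while `delay_without_hash(3)` runs it only 515 times and `delay_without_hash(0)` happens to give the full 65536. That matches the symptom of a delay that shrinks and then becomes "as intended" again as r0 changes.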
The next big bug I had was regarding the jumptable I showed in the previous logs. The code looked like this:
        mov a, r3
        jmp @a+DPTR
    jumptable:
        sjmp light_rled
        sjmp light_yled
        sjmp light_gled
        sjmp light_bled
The rest of the logic should have lit the LEDs consecutively using this routine, red-yellow-blue-green; however, the red LED lit once, then the yellow one twice, and so on in a loop.
That was very weird. I tried a few things, which also behaved unexpectedly; most of them involved adding conditional branches that light the green LED at strategic locations in the code.
It occurred to me that something fishy was going on with the jump table. I tried reversing the order of the LEDs in the labels - the table stayed the same, only the relative jump addresses implicitly changed. This time, things looked almost correct - the red LED indeed lit up as it should, and so did the yellow, but the green step lit yellow instead, and the blue step lit green. Wuuuuut?
Can you spot the mistake?
SJMP is 2 bytes long, and the code doesn't account for that. The index in A is used as a byte offset, so index 0 always lands on the first entry, index 2 lands on the second entry (hence yellow instead of green), and the odd indexes land in the middle of an sjmp, so the CPU executes the relative offset byte as if it were an opcode and may do anything.
This is quite a scary bug - depending on the jump offsets, one can get some crazy logic going. The blue LED step probably altered r1, which was in use by the logic, because that is what the yellow sjmp offset looks like when interpreted as an opcode. Changing the offsets luckily changed that, somehow.
Adding rl a before the jmp (which doubles the index) solves the problem, and the code works flawlessly. Again, this would never happen in C - the compiler does that for you.
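The index arithmetic can be sketched in C as follows. This is an illustrative model only: `landing_entry`, `dispatch_unscaled` and `dispatch_scaled` are hypothetical names, and the only facts taken from the text are that the table has four entries and each sjmp occupies two bytes.

```c
#include <stdint.h>

enum { SJMP_SIZE = 2, TABLE_ENTRIES = 4 };

/* Given the byte offset in A, return the table entry (0..3) that
 * "jmp @a+DPTR" lands on, or -1 when it lands in the middle of an
 * instruction or past the end of the table. */
static int landing_entry(uint8_t a)
{
    if (a >= SJMP_SIZE * TABLE_ENTRIES)
        return -1;
    if (a % SJMP_SIZE != 0)
        return -1;              /* offset byte gets executed as an opcode */
    return a / SJMP_SIZE;
}

/* rl a: rotate the accumulator left by one bit (doubles values below 128). */
static uint8_t rl(uint8_t a)
{
    return (uint8_t)((a << 1) | (a >> 7));
}

/* Buggy dispatch: the LED index is used directly as a byte offset. */
static int dispatch_unscaled(uint8_t led_index)
{
    return landing_entry(led_index);
}

/* Fixed dispatch: the index is doubled first, as "rl a" does. */
static int dispatch_scaled(uint8_t led_index)
{
    return landing_entry(rl(led_index));
}
```

With the unscaled version, indexes 0 and 2 reach entries 0 and 1, and indexes 1 and 3 return -1, meaning the CPU would start executing an offset byte. With the scaled version every index reaches its own entry.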
More bugs
Remember the delay routine? I wanted to add several delays to the same function:
    delay_debounce:
        mov r5, #0x01
        mov r6, #0x80
        mov r7, #0x00
        acall delay_loop
    delay_display:
        mov r5, #0x10
        mov r6, #0x00
        mov r7, #0x00
        acall delay_loop
    delay_loop:
        djnz r7, delay_loop
        djnz r6, delay_loop
        djnz r5, delay_loop
        ret
I got very long delays. What's wrong?
Calling delay_loop wasn't smart. When I call delay_debounce, for example, I wait for a short while, and when delay_loop returns I run straight into delay_display and wait for that delay too. But the worst part comes when delay_loop returns for the second time - execution falls through into delay_loop once again, this time with all three registers equal to 0x00, so the chip counts all the way down from 2^24 to zero.
This is dumb, but it has two simple solutions - adding ret at the end of each delay routine, or replacing acall with sjmp. Both do the trick, but the latter is more elegant, so I chose it.
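The iteration counts behind those long delays can be checked with a small C model of the three nested djnz instructions. This is a sketch that counts loop iterations only, not machine cycles, and the function names are invented for the example.

```c
#include <stdint.h>

/* Models delay_loop: three nested DJNZ counters. Returns how many times
 * the innermost "djnz r7" executes. A counter that starts at 0 wraps to
 * 255 first, so it contributes 256 passes. */
static unsigned long delay_loop(uint8_t r5, uint8_t r6, uint8_t r7)
{
    unsigned long iterations = 0;
    do {
        do {
            do {
                iterations++;
            } while (--r7 != 0);    /* djnz r7, delay_loop */
        } while (--r6 != 0);        /* djnz r6, delay_loop */
    } while (--r5 != 0);            /* djnz r5, delay_loop */
    return iterations;
}

/* What a call to delay_debounce was meant to cost. */
static unsigned long debounce_intended(void)
{
    return delay_loop(0x01, 0x80, 0x00);
}

/* What it cost with acall: the debounce delay, then the fall-through into
 * delay_display, then one more pass through delay_loop with all three
 * registers left at zero. */
static unsigned long debounce_with_acall(void)
{
    return delay_loop(0x01, 0x80, 0x00)
         + delay_loop(0x10, 0x00, 0x00)
         + delay_loop(0x00, 0x00, 0x00);
}
```

In this model the intended debounce delay is 32768 iterations, while the acall version adds 2^20 for the display delay and a further 2^24 for the accidental third pass, roughly 545 times longer in total.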
The first version of my state machine didn't work either. Debugging seemed to suggest that it only ever reached the initialize state. Can you spot the mistake?
    main:
        ;This is the state machine that controls Semyon's logic.
        mov a, V_STATE
    s_initialize:
        cjne a, #S_INITIALIZE, s_display_sequence
        lcall initialize
    s_display_sequence:
        cjne a, #S_DISPLAY_SEQUENCE, s_get_user_input
        lcall display_sequence
    s_get_user_input:
        cjne a, #S_GET_USER_INPUT, s_game_over
        lcall get_user_input
    s_game_over:
        cjne a, #S_GAME_OVER, s_invalid
        lcall game_over
    s_invalid:
        mov V_STATE, #S_INITIALIZE
        ljmp main
        ;lcall reset
The answer is that I had to add sjmp main after each state call. Otherwise execution always falls through to the invalid state, which resets V_STATE to the initialize state. I could have also dumped the invalid state and called it a day.
However, this bug could have happened in C too. It is akin to forgetting the break statement at the end of each case, creating a switch-case fallthrough you did not intended.
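For comparison, the C analogy might look like the sketch below. The state names mirror the assembly constants, but the transitions chosen for each state are an assumption made purely for illustration, since the original handlers are not shown here.

```c
typedef enum {
    S_INITIALIZE,
    S_DISPLAY_SEQUENCE,
    S_GET_USER_INPUT,
    S_GAME_OVER
} state_t;

/* Buggy version: no break, so every case runs into the next one and the
 * default branch always has the last word. */
static state_t step_missing_break(state_t state)
{
    state_t next = state;
    switch (state) {
    case S_INITIALIZE:
        next = S_DISPLAY_SEQUENCE;
        /* fall through */
    case S_DISPLAY_SEQUENCE:
        next = S_GET_USER_INPUT;
        /* fall through */
    case S_GET_USER_INPUT:
        next = S_GAME_OVER;
        /* fall through */
    case S_GAME_OVER:
        next = S_INITIALIZE;
        /* fall through */
    default:
        next = S_INITIALIZE;    /* the "invalid state" handler */
    }
    return next;
}

/* Fixed version: each case ends with break, the C counterpart of adding
 * "sjmp main" after each state call. */
static state_t step_with_break(state_t state)
{
    state_t next = S_INITIALIZE;
    switch (state) {
    case S_INITIALIZE:       next = S_DISPLAY_SEQUENCE; break;
    case S_DISPLAY_SEQUENCE: next = S_GET_USER_INPUT;   break;
    case S_GET_USER_INPUT:   next = S_GAME_OVER;        break;
    case S_GAME_OVER:        next = S_INITIALIZE;       break;
    default:                 next = S_INITIALIZE;       break;
    }
    return next;
}
```

Whatever state goes into `step_missing_break()`, the result is `S_INITIALIZE`, which is the same symptom the debugging suggested on the real hardware.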
Careful with the flags
The last two simple bugs I encountered are ones where misusing the flags screws you hard.
The first occurred when I was trying to use the JNZ opcode. It jumps whenever the accumulator isn't zero. It is useful for branching on a comparison of two registers, for example, or whenever CJNE doesn't play well with the addressing mode you desire:
    subb a, r3
    jnz get_user_input_game_over
The subb opcode was supposed to be the comparison, but notice the extra b - it stands for borrow, which is taken from the C flag - thus ruining your life if C is not cleared before subb is used. I replaced it with xrl to get the same result with less fuss.
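A tiny C model shows why a stale C flag breaks the comparison while xrl is immune. The helper names are hypothetical, and the model only computes the accumulator result, not the updated flags.

```c
#include <stdint.h>

/* subb a, operand: A = A - operand - C, truncated to 8 bits. */
static uint8_t subb(uint8_t a, uint8_t operand, int carry)
{
    return (uint8_t)(a - operand - (carry ? 1 : 0));
}

/* xrl a, operand: A = A XOR operand; zero exactly when both are equal. */
static uint8_t xrl(uint8_t a, uint8_t operand)
{
    return (uint8_t)(a ^ operand);
}
```

With C clear, `subb(5, 5, 0)` is 0, so jnz does not jump. With C left set by earlier code, `subb(5, 5, 1)` is 0xff, so equal values look different, and `subb(6, 5, 1)` is 0, so different values look equal. `xrl(5, 5)` is 0 regardless of the flag.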
Another one is when trying to make a 16 bit wide counter:
    inc r0
    addc r1, #0
It turns out that this counter is only one byte wide - inc doesn't affect the C flag, leaving r1 untouched. Two possible fixes:
        inc r0
        cjne r0, #0x00, no_carry   ; skip the high byte unless r0 wrapped to 0
        inc r1
    no_carry:
or
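        mov a, r0
        add a, #1      ; add (unlike inc) updates the C flag
        mov r0, a
        mov a, r1
        addc a, #0     ; propagate the carry into the high byte
        mov r1, a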
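The difference between the two increments can be modelled in C as below. It is a sketch with made-up names (`counter_t`, `increment_with_inc`, `increment_with_add`, `count_to`), and it assumes the C flag starts out clear.

```c
#include <stdint.h>

typedef struct {
    uint8_t low;    /* r0 */
    uint8_t high;   /* r1 */
    uint8_t carry;  /* C flag, 0 or 1 */
} counter_t;

/* Buggy increment: inc leaves the C flag alone, so the following
 * add-with-carry never sees the overflow of the low byte. */
static void increment_with_inc(counter_t *c)
{
    unsigned sum;
    c->low++;                               /* inc: C flag untouched */
    sum = c->high + (unsigned)c->carry;     /* addc with #0 */
    c->high = (uint8_t)sum;
    c->carry = (uint8_t)(sum >> 8);
}

/* Fixed increment: add updates the C flag, which addc then consumes. */
static void increment_with_add(counter_t *c)
{
    unsigned sum = c->low + 1u;             /* add #1 */
    c->low = (uint8_t)sum;
    c->carry = (uint8_t)(sum >> 8);
    sum = c->high + (unsigned)c->carry;     /* addc with #0 */
    c->high = (uint8_t)sum;
    c->carry = (uint8_t)(sum >> 8);
}

/* Apply the given increment n times to a zeroed counter and return the
 * resulting 16-bit value. */
static unsigned count_to(unsigned n, void (*step)(counter_t *))
{
    counter_t c = {0, 0, 0};
    unsigned i;
    for (i = 0; i < n; i++)
        step(&c);
    return ((unsigned)c.high << 8) | c.low;
}
```

Counting to 300 with `increment_with_add` gives 300, while `increment_with_inc` gives 44 because the high byte never moves.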
Am I a masochist?
Seriously, why walk into this minefield of bugs on purpose? All these headaches and nightmarish bugs could have been skipped by simply using C.
That is true, though I would have learned much less in the process. This project taught me many things about CPU internals, what programming toolchains look like and how they work - and how much more time-consuming assembly programming is compared to compiled languages. I tend to learn better the hard way, it seems.
I don't intend to keep using assembly for other projects, but I would use asm if necessary - and would feel much more comfortable doing so. But that's a bonus - the insights are the real prize.
Discussions
The STC15F104W has only 128B of RAM. SDCC assumes that the MCU has 256B of RAM and stores variables in the IDATA segment; it also stores some internal variables there, ones that are not part of the user's C program. I tried to create a simple project with C and SDCC for the STC15F104W, and a simple blink program did not work on the STC15F104W but did work on the STC15F204EA. I found this project while troubleshooting my problem. I wrote a simple blink in ASM and it worked, which proved that STCGAL (and STC-ISP) can flash a program to the STC15F104W. Later I used a disassembler and found that SDCC uses IDATA RAM during its initialization process, and that is the source of the trouble on the STC15F104W, because there is no such RAM on this cute and limited MCU. Writing C code for the STC15F104W is tricky; I think some switches have to be used to limit SDCC to only 128B of RAM...