Project | Semyon | Hackaday.io


Splitting the code to files properly
08/22/2020 at 00:38 • 0 comments
When the code had grown big enough, it made sense to split it into several files, grouping together the code that belongs together. How can it be done?

First, one must have define files. Akin to header files in C, these are used to define constants - such as SFR locations - and macros - such as SFR assignments.

These files are pure assembly files, and as it seems that ASXXXX is agnostic to file extension, I'll go with ASXXXX examples and call these "define.def" and "macro.def".

These are used in other assembly files using the .include directive, just like C preprocessor #include.

The crudest way to get all my files together is to create a top-level assembly file which includes all these files. Here is "semyon.asm", where it all comes together:
```
.module semyon

;Def file includes
.include "define.def"
.include "macro.def"

;Asm file includes
.include "main.asm"
.include "intv.asm"
.include "inth.asm"
.include "dseg.asm"
.include "io.asm"
.include "delay.asm"
.include "pwm.asm"
```
The ".include" directive just copies the included files into the including file. Unsurprisingly, it assembles just as well as if it were all written in one file.

But now doing it properly

This approach is not good. The conceptual problem is that it's not doing what I intended - I didn't want it to simply copy and paste my code together, but to assemble it in pieces and then assign all the addresses and tie the hex file together.

The practical problem is that ASXXXX is not smart enough to trace bugs into the included files. A bug in "io.asm" will come out as a bug in "semyon.asm" in line 12, which is where the problematic file is included. It leaves you guessing where in that file the error has occurred, and I'm not masochistic enough for that.

Nope, the proper way mandates that I assemble each file independently. It makes sense to include all the ".def" files in each ".asm" file then, but some labels are cross referenced - for example, "main.asm" calls for functions from "io.asm". This can be solved by assembling all the files with global flags.

This is the makefile:
```
# Recipe lines must start with a tab character, not spaces.
build:
	as8051 -losga main.asm
	as8051 -losga intv.asm
	as8051 -losga inth.asm
	as8051 -losga io.asm
	as8051 -losga delay.asm
	as8051 -losga dseg.asm
	as8051 -losga pwm.asm

	aslink -f semyon
	packihx semyon.ihx > semyon.hex
```
The "l", "o" and "s" flags produce the usual output files (listing, object and symbol). The "g" flag makes undefined symbols global, and the "a" flag makes all user-defined symbols global. It means that symbols referenced across files are left for the linker to resolve at link time.
Linking
The linker is called aslink (sdld in the sdcc version) and is quite simple. It can get directives from a file, using the "-f" flag.
The directives are simply structured: linker flags, output file name, list of input files, and the remainder of the flags, mostly link-time symbol value assignments.
My linking file "semyon.lnk" looks like this:
```
-mxiu
semyon
main
intv
inth
delay
pwm
dseg
io
-b CODE = 0x0090
-e
```
The -mxiu flags mean, respectively: generate a map file, use hexadecimal as the number base, produce Intel hex output, and update the list files.
"semyon" is the name of the output file, and the rest are the input files (extensions get ignored).
"-e" is the end-of-file marker.
About the .area directive
The mysterious .area directive which bugged me makes sense now when multi-file code is concerned - the linker should know how to put the assembled files together, at what addresses and so on.
The "intv.asm" file, which includes the interrupt vectors, should be strictly located in predefined addresses using the ".org" directive. Thus the area it got, called INTV, must be defined ABS, which means that the addresses must be manually assigned.
Most of the code, however, goes in the CODE area, which has the REL flag. That means the code each file contributes to this area is concatenated with the code from the other files, and the ".org" directive is prohibited.
However, the REL areas must begin somewhere. Originally I wanted it to be at 0x90, after all the interrupt vectors. I can assign this to REL areas in link time, using the "-b" flag:
```
-b CODE = 0x0090
```
The code links very well this way, and functions exactly as if it were a one-filer.
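To picture what the linker does with a REL area, here is a rough Python sketch. Only the base address 0x0090 and the module names come from the link file above; the byte sizes (and which modules actually contribute to CODE) are made up for illustration. It models the concatenation idea only, not how aslink really works inside.

```python
# Toy model: place each module's CODE segment back to back,
# starting at the base given by "-b CODE = 0x0090".
CODE_BASE = 0x0090

# Module names follow semyon.lnk; the sizes in bytes are hypothetical.
# intv is left out because its vectors live in an ABS area, not in CODE.
MODULE_SIZES = [
    ("main", 0x40),
    ("inth", 0x08),
    ("delay", 0x30),
    ("pwm", 0x20),
    ("io", 0x50),
]


def layout(base, modules):
    """Return {module: (start, end)} with `end` exclusive."""
    placement = {}
    address = base
    for name, size in modules:
        placement[name] = (address, address + size)
        address += size
    return placement


placement = layout(CODE_BASE, MODULE_SIZES)
for name, (start, end) in placement.items():
    print(f"{name:6s} 0x{start:04x} .. 0x{end - 1:04x}")
```

Each module starts where the previous one ended, which is why ".org" makes no sense inside a REL area: the final addresses are only known once every module's size is known.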
Using macros
07/31/2020 at 17:34 • 0 comments
Suppose you want to enable or disable external interrupts with certain configurations. You'll have to wiggle some SFR bits for the purpose, probably involving multiple SFRs.

For me, it's looking thus:
```
;this is ext_int_enable
    orl TCON, #0x05     ;IT1|IT0 - falling edge only
    orl IE, #0x05    ;EX1 | EX0
    orl AUXR2, #0x30    ;EX3 | EX2

;this is ext_int_disable
    anl AUXR2, #~0x30    ;EX3 | EX2
    anl IE, #~0x05    ;EX1 | EX0
```
This is quite ugly. I want to write it down only once, and then use it a couple of times. It makes the code more readable, as the purpose of each code snippet is made clear to the reader, and it spares me the nuisance of writing the whole thing multiple times, reducing the bugs and errors introduced by careless typing.

Sure, I can treat these as functions, with calls to their labels and returns, but that would miss the point - putting aside the overhead of the function calls, which can grow quite large for configuration-rich code, this is not the conceptual idea I wanted to use in the first place.

What I really want is that the assembler will replace each of these tags:
```
ext_int_enable
```
with the code snippet:
```
    orl TCON, #0x05     ;IT1|IT0 - falling edge only
    orl IE, #0x05    ;EX1 | EX0
    orl AUXR2, #0x30    ;EX3 | EX2
```
I want it to be replaced directly, everywhere in the code where the tag appears. I want it to simply delete the tag and paste the relevant snippet instead, in the source code.

For those of you familiar with C, this is akin to using preprocessor macros (conceptually you can achieve it with an inline function too, given you force the compiler to inline).

Good assemblers, such as ASXXXX which SDAS is based upon, support macros which act just that way. The way to use these with SDAS looks thus:
```
;ext_int
.macro ext_int_enable
    orl TCON, #0x05     ;IT1|IT0 - falling edge only
    orl IE, #0x05    ;EX1 | EX0
    orl AUXR2, #0x30    ;EX3 | EX2
.endm

.macro ext_int_disable
    anl AUXR2, #~0x30    ;EX3 | EX2
    anl IE, #~0x05    ;EX1 | EX0
.endm
```
Calling macros is almost trivial. In the last post I defined my external interrupt handler thus:
```
ext_interrupt_handler:
    anl AUXR2, #~0x30    ;EX3 | EX2
    anl IE, #~0x05        ;EX1 | EX0
    reti
```
The inside is ext_int_disable, which can simply be called as a macro defined earlier:
```
ext_interrupt_handler:
    ext_int_disable
    reti
```
The assembler replaces the ext_int_disable symbol with the internals of the macro definition above before assembling it. Quite neat IMO.

Macro arguments

Say I want to do something cleverer than static configuration of SFRs, e.g. configuring a timer to some value:
```
    mov TL0, #(0x10000-count)&0xff
    mov TH0, #((0x10000-count)>>8)&0xff
```
Where count is the number of timer cycles I want. I might want to use this in several places with different cycle counts (that are constant in the code), or rather change this value at assembly time with variable flags (say, different values for different main-clock frequencies).

One must pass the value to the macro with each use, somehow. Luckily, ASXXXX is smart enough to do it quite trivially, by adding the arguments with commas to the macro definition:
```
.macro t0_set_count, count
    mov TL0, #(0x10000-count)&0xff
    mov TH0, #((0x10000-count)>>8)&0xff
.endm
```
The use is similar. Possible use I had in my code:
```
delay_debounce:
    t0_set_count, 0x0010
    sjmp delay_activate

delay_display2:
    t0_set_count, 0x2000
    sjmp delay_activate

delay_display:
    t0_set_count, 0x5000
    sjmp delay_activate
```
Notice that the values are constant. They can't change on the fly, only at assembly time. To change these values mid-run, one must use functions rather than macros. Another possible solution is using a macro with static variables instead of the "count" argument, and changing these variables between macro calls.
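To double check what the assembler computes for each call, the same expressions can be evaluated by hand or with a few lines of Python. This is just a sanity-check sketch; the function name t0_reload_bytes is mine and is not part of the project.

```python
def t0_reload_bytes(count):
    """Return (TL0, TH0) so that a 16-bit up-counter overflows after `count` ticks.

    Mirrors the macro body:
        mov TL0, #(0x10000-count)&0xff
        mov TH0, #((0x10000-count)>>8)&0xff
    """
    if not 0 < count <= 0x10000:
        raise ValueError("count must be in 1..65536")
    reload_value = 0x10000 - count
    return reload_value & 0xFF, (reload_value >> 8) & 0xFF


# The three counts used by the delay routines above.
for count in (0x0010, 0x2000, 0x5000):
    tl0, th0 = t0_reload_bytes(count)
    print(f"count=0x{count:04x} -> TL0=0x{tl0:02x}, TH0=0x{th0:02x}")
```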

Advanced macros

Assume one wants an even more elaborate macro. For example, I want to choose the sleep mode when calling some macro. I might want it to look something like this:
```
.macro ext_int_get_input, pd_flag
ext_int_get_input_beginning:
    clear_ext_int_flags     ;Necessary
    ext_int_enable
    interrupt_enable
    .if pd_flag = 1
        orl PCON, #0x02     ;PD = power down
    .else
        orl PCON, #0x01     ;IDL
    .endif
    interrupt_disable
.endm
```
First, notice that I call macros from within this macro definition. This is called macro nesting, and is allowed by ASXXXX up to a depth of 20 calls-within-calls. The number is arbitrary, but should be plenty for virtually all users.

The .if and .else lines are assembly-time if/else expressions, akin to #if/#else in the C preprocessor. These choose whether the assembler puts one branch in the code, or the other.

In the example above, calling
```
ext_int_get_input, 0
```
will effectively be replaced with
```
    clear_ext_int_flags     ;Necessary
    ext_int_enable
    interrupt_enable
    orl PCON, #0x01     ;IDL
    interrupt_disable
```
so that the .if branch is ignored and not assembled.

This is fine for using only one of the options, as if pd_flag is a global assembly flag. However, using both the options in the same code will raise an error:
```
<m>   Multiple  definitions  of  the  same label, multiple
            .module directives, multiple conflicting  attributes
            in  an  .area or .bank directive or the use of .hilo
            and lohi within the same assembly.</m>
```
The wits of ASXXXX macro processor are limited, and different pd_flag values require it to assemble the macro once again, but the label "ext_int_get_input_beginning" is already in use by the other version of the macro, and it simply can't be assembled.

For us it's clear that each instance of the macro is different, and we expect the assembler to give these labels different names such as "ext_int_get_input_beginning10000$" and "ext_int_get_input_beginning10001$" depending on the instance, but it really can't make this abstraction for us.

There's a manual way for doing it:
```
.macro ext_int_get_input, pd_flag, ?rand
ext_int_get_input_beginning'rand: 
    clear_ext_int_flags     ;Necessary.
    ext_int_enable
    interrupt_enable
    .if pd_flag = 1
        orl PCON, #0x02     ;PD = power down
    .else
        orl PCON, #0x01     ;IDL
    .endif
    interrupt_disable
.endm
```
The '?' in ?rand is a special character which means that rand gets an automatically generated, unique value, unless a value is passed for it explicitly. This value changes each time the macro is called.

The ' operator is for symbol name concatenation. If ?rand gets the value "10097$", then "ext_int_get_input_beginning'rand" will be replaced by the label "ext_int_get_input_beginning10097$".

This is a rather nasty business to write down, but it works very well and allows complex macros to be written.
That's about all there is to tell about macros in ASXXXX.
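If it helps to see the idea outside of the assembler, here is a rough Python model of what one expansion of the macro boils down to: pick a branch at expansion time, and give every expansion its own label suffix. The counter starting at 10000 and the function name are assumptions for illustration, and the PD value 0x02 is taken from the power-down example in the peripherals log; this is not how ASXXXX is implemented.

```python
import itertools

# Assumed: suffixes are handed out sequentially, starting at 10000.
_label_ids = itertools.count(10000)


def expand_ext_int_get_input(pd_flag):
    """Return the source lines one call of the macro would expand to."""
    suffix = f"{next(_label_ids)}$"          # fresh value for ?rand
    lines = [
        f"ext_int_get_input_beginning{suffix}:",
        "    clear_ext_int_flags",
        "    ext_int_enable",
        "    interrupt_enable",
    ]
    # Assembly-time choice, like .if/.else: only one branch is emitted.
    if pd_flag == 1:
        lines.append("    orl PCON, #0x02     ;PD = power down")
    else:
        lines.append("    orl PCON, #0x01     ;IDL")
    lines.append("    interrupt_disable")
    return lines


power_down = expand_ext_int_get_input(1)
idle = expand_ext_int_get_input(0)
print("\n".join(power_down + idle))
```

Because every expansion gets its own suffix, both variants can live in the same source without the "multiple definitions" error.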
Peripherals and Interrupts
02/14/2020 at 23:29 • 0 comments
So after having a working version of Semyon I wanted to familiarize myself with use of the special hardware present in the device. That is, timers, external interrupts, and special power modes.

Timers

So the STC15F104W has 2 timers, called T0 and T2.

T0 is really a 16 bit auto-reload timer. One can disable auto-reload or use other timer modes like the 8051 traditional 8-bit auto-reload timer. The traditional control bits for the timer exist.

T2 is a skinnier version, only functioning as a 16 bit auto-reload timer. It is totally incompatible with the T2 present in the 8052 MCU, and has no bit-addressable controls - one has to fiddle with the whole control registers themselves.

None of these has a prescaler except for the 12 clock prescaler for legacy support, which is kinda lame. However, given the auto-reload feature, one can easily use overflow interrupts to get that exact functionality without giving up any clock cycle precision.

The first target for changes was the delay calls. The DJNZ loops are simple, but this is a classic place to use a timer. The delay functions now looked thus:
```
delay_debounce:
    mov r7, #0x01
    sjmp delay_loop        

delay_display2:
    mov r7, #0x05
    sjmp delay_loop

delay_display:
    mov r7, #0x16
    
delay_loop:
    mov TL0, #0x00
    mov TH0, #0xc0
    delay_loop_2:
        setb TR0
        jnb TF0, .
        clr TF0
        djnz r7, delay_loop_2
    clr TR0
    ret
```
This simply sets up the timer and continually polls it. The timer is loaded with 0xC000, so it overflows after only 0x4000 counts, effectively acting as a 14-bit timer. The loop is repeated R7 times, and thus granularity is achieved.

The next victim must be the seed generation. As mentioned in previous logs, it incremented the LFSR, polling user input in-between. Replacing it with a timer is classic too:
```
initialize:
    ;This is the initialization phase of semyon.
    ;It should also generate the seed value for PRNG.
    mov V_LED_CNT, #1
    mov V_LED_MAX, #1
    mov TL0, #0x01
    mov TH0, #0x00
    mov TMOD, #0x00
    mov AUXR, #0x81
    setb TR0
    
    initialize_seed_loop:
        mov a, P3
        orl a, #P_LED_ALL
        cjne a, #0xff, initialize_ret
        sjmp initialize_seed_loop
        
    initialize_ret:
        mov a, P3
        orl a, #P_LED_ALL
        cpl a
        cjne a, #0x00, initialize_ret
    
    clr TR0
    clr TF0
    mov V_SEED_L, TL0
    mov V_SEED_H, TH0        
    lcall delay_display
    mov V_STATE, #S_DISPLAY_SEQUENCE
    ret
```
That is lots of timer configuration, then enabling the counter and polling user input, then waiting for the user to release the buttons, and using the timer value as the seed.

This makes the seed increment about 47 times faster. It is almost feasible to use a 24-bit LFSR!

External interrupts

In STC15 family there are 5 external interrupts - the traditional INT0 and INT1, and INT2, INT3 and INT4 which are only falling edge activated. In STC15F104W, P3.2 to P3.5 are mapped to INT0 to INT3 respectively, which means they can be used to get user input.

So I declared the relevant interrupt vectors:
```
.org 0x0003     ;ext0
_int_GLED:
    mov V_INTERRUPT_LED, #P_N_LED_G
    ljmp ext_interrupt_handler


.org 0x0013     ;ext1
_int_BLED:
    mov V_INTERRUPT_LED, #P_N_LED_B
    ljmp ext_interrupt_handler


.org 0x0053     ;ext2
_int_YLED:
    mov V_INTERRUPT_LED, #P_N_LED_Y
    ljmp ext_interrupt_handler


.org 0x005b     ;ext3
_int_RLED:
    mov V_INTERRUPT_LED, #P_N_LED_R
    ljmp ext_interrupt_handler
```
V_INTERRUPT_LED is a new variable I declared to store a value indicating which button was pressed, and is used in the game logic akin to the way the polled P3 value was used.

All these external interrupts jump to the same handler, which disables the external interrupts:
```
ext_interrupt_handler:
    anl AUXR2, #~0x30    ;EX3 | EX2
    anl IE, #~0x05        ;EX1 | EX0
    reti
```
Power and Clock Control

Waiting for external interrupts to happen using an idle loop that polls something still misses the point. What I really want is to enable external interrupts, and then halt the CPU until the interrupt happens.

There's a register that allows one to do it, called PCON.

PD and IDL bits set the MCU to Power-Down and Idle modes, respectively.

In Idle mode, the CPU is shut down, but the rest of the hardware still functions - that is, all the peripherals, including timers, serial communication and ADCs. The CPU will wake up on any enabled interrupt.

In contrast, Power-Down mode shuts down the whole device, so it can only wake up in case of external interrupts.

Idle mode is what I wanted earlier - and it replaces the idle loops with much elegance.
```
ext_int_get_input_beginning: 
    mov IE, #0x05    ;EX1 | EX0
    mov AUXR2, #0x30    ;EX3 | EX2
    setb EA

    orl PCON, #0x01     ;IDL

    clr EA
    ;Some debounce logic 
```
For user input where no timer should run in the background, one can also go Power-Down altogether:
```
    orl PCON, #0x02     ;PD = power down
```
In the delay routines, the IDL mode applies too.

Moreover, one might want to slow down the whole system clock, as the timers can't be prescaled. This can work for the delays as no computations are required to be made when delay is called.

Prescaling is done using the PCON2 register (also called CLK_DIV), which is specific to STC MCUs. Its 3 LSBs control a system clock divider of up to 128.

Using the divider and idle mode makes the delay much more elegant, littered only by SFR configuration:
```
delay_debounce:
    mov T2L, #0x00
    mov T2H, #0xfc
    sjmp delay_activate    

delay_display2:
    mov T2L, #0x00
    mov T2H, #0xe0
    sjmp delay_activate

delay_display:
    mov T2L, #0x00
    mov T2H, #0xb0
    ;sjmp delay_activate
    
delay_activate:
    orl IE2, #0x04        ;Enable T2 interrupt
    orl AUXR, #0x04        ;T2 is 1clk
    orl PCON2, #0x07     ;clk/128
    setb EA
    
    orl AUXR, #0x10     ;enable T2
    orl PCON, #0x01     ;IDL
    anl AUXR, #~0x10     ;disable T2
    
    clr EA
    anl PCON2, #~0x07     ;clk/1
    ret
```
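To get a feeling for how long these delays actually are, the reload values can be turned into main-clock cycles. The sketch below assumes T2 counts once per divided system clock (1T mode, as the AUXR comment suggests) and ignores the wake-up latency and the instructions around the idle period; the 11.0592 MHz clock is only a hypothetical example, not a value taken from the project.

```python
ASSUMED_CLOCK_HZ = 11_059_200   # hypothetical main clock frequency


def t2_delay_clocks(t2h, t2l, divider=128):
    """Main-clock cycles until T2 overflows, with the system clock divided."""
    reload_value = (t2h << 8) | t2l
    ticks = 0x10000 - reload_value      # timer counts up to overflow
    return ticks * divider              # each tick lasts `divider` main clocks


delays = {
    "delay_debounce": (0xFC, 0x00),
    "delay_display2": (0xE0, 0x00),
    "delay_display": (0xB0, 0x00),
}

for name, (t2h, t2l) in delays.items():
    clocks = t2_delay_clocks(t2h, t2l)
    millis = 1000 * clocks / ASSUMED_CLOCK_HZ
    print(f"{name:15s} {clocks:8d} clocks, about {millis:.0f} ms")
```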
Know your hardware

I might be an extremist, but IMO the hardware and real-life events are the focus of embedded design, and the programming is only a tool to get there.

Thus, peripheral configuration and use is the essence of embedded programming. That is where the thin line between programming and real life lies.

Embedded code which makes almost no use of the existing peripherals is really missing the point. E.g. writing code that wiggles GPIOs using digitalWrite() in Arduino is not embedded code - the thought process is that of general purpose computer programming rather than a real-time/control mindset. Moreover, I believe that one hasn't really used an MCU until one has activated some of its hardware by tweaking the SFRs directly.

Even if I sound completely nuts, using the peripherals still has one good point - it is very educational for me. Using peripherals rather than funky code solutions is the right direction towards more complex projects which shall require peripheral use, say real-time control, which is very cool.
The bugs
12/29/2019 at 18:43 • 1 comment
One thing I've learned from this project is that programming in C keeps the programmer out of lots of trouble - it generates the tedious parts of the assembly for you, such as switch-case implementations, it assigns variable addresses for you, makes wise use of the registers for you (if it's smart enough) and generally helps one focus on the logic rather than the housekeeping.
It also keeps you from a big class of bugs. I had many bugs in this project which are not possible to make in a higher-level language. It turns out that one can make very, um, creative bugs when programming in assembly.
Debug how?
The STC15F104W has no debug peripherals. It doesn't even have a UART module (if we believe the datasheet), which leaves printf debug out unless I bitbang the UART protocol myself. So what else can one do?
One possible solution is using a simulator. SDCC comes with a simulator called uCsim. It is a rather simple command line tool that accepts hex files and can do run, step and so on. The executable is called s51. Using it may look something like this:
```
> s51 semyon.hex

uCsim 0.6-pre54, Copyright (C) 1997 Daniel Drotos.
uCsim comes with ABSOLUTELY NO WARRANTY; for details type `show w
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

0> Loading from semyon.hex
296 words read from semyon.hex
step
Stop at 0x000090: (109)
     R0 R1 R2 R3 R4 R5 R6 R7
0x00 fa 16 bb 11 ad ae 24 88 ......$.
@R0 53 S  ACC= 0x00   0 .  B= 0x00
@R1 0b .  PSW= 0x00 CY=0 AC=0 OV=0 P=0
SP 0x07 88 24 ae ad 11 bb 16 fa .$......
   DPTR= 0x0000 @DPTR= 0x5e  94 ^
   0x0090 e5 40    MOV   A,40
F 0x000090

0> run
Simulation started, PC=0x000090

Stop at 0x0000c5: (105) User stopped
F 0x0000c5
Simulated 2010456 ticks in 1.501994 sec, rate=0.121033

0> step 2142
Stop at 0x0000c5: (109)
     R0 R1 R2 R3 R4 R5 R6 R7
0x00 81 75 bb 11 ad ae 24 88 .u....$.
@R0 29 )  ACC= 0xff 255 .  B= 0x00
@R1 a9 .  PSW= 0x00 CY=0 AC=0 OV=0 P=0
SP 0x09 00 98 88 24 ae ad 11 bb ...$....
   DPTR= 0x0000 @DPTR= 0x5e  94 ^
   0x00c5 08       INC   R0
F 0x0000c5
Simulated 36000 ticks in 0.032018 sec, rate=0.101667

0> dump iram 0x00 0x3f
0x00 81 75 bb 11 ad ae 24 88 Vw....$.
0x08 98 00 52 db 25 43 e5 3c ..R.%C.<
0x10 f4 45 d3 d8 28 ce 0b f5 .E..(...
0x18 c5 60 59 3d 97 27 8a 59 .`Y=.'.Y
0x20 76 2d d0 c2 c9 cd 68 d4 v-....h.
0x28 49 6a 79 25 08 61 40 14 Ijy%.a@.
0x30 01 01 6a a5 11 28 c1 8c ..j..(..
0x38 d6 a9 0b 87 97 8c 2f f1 ....../.
```
Using uCsim feels very spartan, because of its crude/practical user interface. Although it should be easy to wrap uCsim in Python and do complex things as the docs suggest, I was looking for something more user friendly. Alas, it doesn't seem like there are any simulators which are much better.
Thus for most of the bugs, I used the LEDs as indicators for program state. A very crude printf if you'd like.
Traps for young players
The first bug took the longest time to find. I had delay loops that looked something like this:
```
delay:
	mov r6, 0x00
	mov r7, 0x00
	sjmp delay_loop
	
delay_loop:
	djnz r7, delay_loop
	djnz r6, delay_loop
	ret
```
The logic didn't work right, but more frustrating was that the delays were not consistent at all, getting shorter each time, then getting as long as intended, and repeating ad infinitum. Can you spot the mistake?
That's right, I forgot the # symbol that marks immediate values. Instead of the immediate 0 I gave it the IRAM address of r0, which is also 0, but r0 was in use by the logic and thus got altered, making the delay really groovy.
You'll never get such a bug with C - the closest thing would be misinterpreting a pointer as a variable or vice versa, and the compiler might warn you about it.
The next big bug I had was regarding the jumptable I showed in the previous logs. The code looked like this:
```
	mov a, r3
	jmp @a+DPTR
jumptable:
	sjmp light_rled
	sjmp light_yled
	sjmp light_gled
	sjmp light_bled
```
The other logic should have lit the LEDs consecutively using this routine, red-yellow-blue-green, however the red LED lit once, then the yellow twice, and repeat.
That was very weird. I tried some things that behaved unexpectedly, mostly adding conditional branches that light the green LED at strategic locations in the code.
It occurred to me that something fishy was going on with the jump table. I tried to reverse the order of the LEDs in the labels - the table stayed the same, only the relative jump addresses implicitly changed. This time, things looked almost correct - the red LED indeed lit up as it should, and so did the yellow, but the green one lit yellow instead, and the blue one lit green. Wuuuuut?
Can you spot the mistake?
SJMP is 2 bytes long. The code doesn't account for it - thus the first LED always works right, the third lights the one before it (yellow instead of green) and the other two may do anything, as they execute the relative offset byte of some sjmp as their opcode.
This is quite a scary bug - depending on the jump offsets one can get some crazy logic going. Probably the blue LED originally altered r1, which was in use by the logic, because that's what the yellow sjmp offset looks like when interpreted as an opcode. Thus changing the offsets luckily changed it, somehow.
Adding rl a before the jmp solves the problem, and the code works flawlessly. Again, this would never happen in C - the compiler does that for you.
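The misalignment is easy to see in a small Python model of the table. It only tracks which byte of which 2-byte sjmp the jump lands on, not what the 8051 would actually execute there; the function name and the strings are mine.

```python
TABLE = [
    "sjmp light_rled",
    "sjmp light_yled",
    "sjmp light_gled",
    "sjmp light_bled",
]
SJMP_SIZE = 2   # opcode byte + relative offset byte


def landing(index, scale_index):
    """Where "jmp @a+DPTR" lands for a given LED index.

    scale_index=True models the "rl a" fix, which doubles small indices.
    """
    offset = index * 2 if scale_index else index
    entry, byte_in_entry = divmod(offset, SJMP_SIZE)
    if byte_in_entry == 0:
        return TABLE[entry]
    return f"misaligned: offset byte of '{TABLE[entry]}' run as an opcode"


for index in range(4):
    print(index, "|", landing(index, False), "|", landing(index, True))
```

Without the scaling, index 0 works, index 2 hits the second entry (yellow instead of green), and the odd indices land in the middle of an instruction, which matches the symptoms described above.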
More bugs
Remember the delay routine? I wanted to add several delays to the same function:
```
delay_debounce:
	mov r5, #0x01
	mov r6, #0x80
	mov r7, #0x00
	acall delay_loop		

delay_display:
	mov r5, #0x10
	mov r6, #0x00
	mov r7, #0x00
	acall delay_loop
	
delay_loop:
	djnz r7, delay_loop
	djnz r6, delay_loop
	djnz r5, delay_loop
	ret
```
I got very long delays. What's wrong?
Calling delay_loop wasn't smart. By calling delay_debounce, for example, I wait shortly, and when returning from delay_loop I run into delay_display, waiting for this delay too. But the worst happens when returning from that delay_loop - I instantly fall into delay_loop once again, this time with all registers equal to 0x00, thus waiting for the chip to count from 2^24 down to zero.
This is dumb, but has two simple solutions - adding ret at the end of each delay routine, or replacing acall with sjmp. Both do the trick, but the latter is more elegant and I chose it.
The first version of my state machine didn't work either. Debugging seemed to suggest that it only goes to the initialize state. Can you spot the mistake?
```
main:
	;This is the state machine that controls Semyon's logic.
	mov a, V_STATE
	s_initialize:
		cjne a, #S_INITIALIZE, s_display_sequence
		lcall initialize
	s_display_sequence:
		cjne a, #S_DISPLAY_SEQUENCE, s_get_user_input
		lcall display_sequence
	s_get_user_input:
		cjne a, #S_GET_USER_INPUT, s_game_over
		lcall get_user_input
	s_game_over:
		cjne a, #S_GAME_OVER, s_invalid
		lcall game_over
		
	s_invalid:
		mov V_STATE, #S_INITIALIZE
		ljmp main
		;lcall reset
```
The answer is that I had to add sjmp main after each state call. Otherwise I always end up in the invalid state. I could have also dumped that invalid state and called it a day.
However this bug could have happened in C. This is akin to forgetting the break statement after each case is finished, making a switch case fallthrough you have not intended to make.
Careful with the flags
The last two simple bugs I've encountered are cases where misuse of flags screws you hard.
The first occurred when I was trying to use the JNZ opcode. It jumps whenever the accumulator isn't zero. It is necessary for branching on a comparison of two registers, for example, or whenever CJNE doesn't play well with the addressing mode you desire:
```
subb a, r3
jnz get_user_input_game_over
```
The subb opcode should have been the comparison, but notice the extra b - it stands for borrow, taken from the C flag - thus ruining your life if C is not cleared before the use of subb. I replaced it with xrl to get the same result with less fuss.
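A tiny Python model shows why the stale carry matters. The two helpers below only mimic the arithmetic of SUBB A,Rn and XRL A,Rn on 8-bit values; they are illustrations, not an emulator.

```python
def subb(a, operand, carry):
    """Model of SUBB: A - operand - C. Returns (result, borrow_out)."""
    diff = a - operand - carry
    return diff & 0xFF, 1 if diff < 0 else 0


def xrl(a, operand):
    """Model of XRL: bitwise XOR, which never looks at the C flag."""
    return a ^ operand


# Comparing two equal values, once with C clear and once with C left set.
print(subb(5, 5, carry=0))   # result is zero, so jnz would not jump
print(subb(5, 5, carry=1))   # result is non-zero, so jnz wrongly jumps
print(xrl(5, 5))             # zero regardless of the C flag
```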
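Another one is when trying to make a 16 bit wide counter:
```
inc r0
mov a, r1
addc a, #0
mov r1, a
```
Turns out that this counter is only one byte wide - inc doesn't affect the C flag, so the addc never sees a carry from r0 and r1 stays untouched. (addc only works on the accumulator, hence the extra movs.) Two possible fixes:
```
inc r0
cjne r0, #0x00, .+4    ;skip the 1-byte inc r1 unless r0 wrapped to 0
inc r1
```
or, using add, which does update the C flag (add a clr c afterwards if later code expects a clear carry):
```
mov a, r0
add a, #1
mov r0, a
mov a, r1
addc a, #0
mov r1, a
```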
Am I a masochist?
Seriously, why walk into this minefield of bugs on purpose? All these headaches and nightmarish bugs could have been skipped over by simply using C.
It is true, though I would have learned much less in the process. This project made me learn many things about CPU internals, what programming toolchains look like and how they work - and how much more time consuming assembly programming is compared to compiled languages. I tend to learn better the hard way, it seems.
I don't intend to keep using assembly for other projects, but I would use asm if necessary - and would feel much more comfortable doing so. But that's a bonus - the insights are the real prize.
The code - part 2 - Random colors and buttons
12/27/2019 at 17:56 • 0 comments
My name is Random, Pseudo Random

We need to create a random sequence to display to the player. Generating real random values for the LEDs is possible, though may be somewhat cumbersome as it means constantly generating random variables and storing them.

Moreover, it is probably unnecessary. This is just a game, not some crazy bitcoin e-wallet that depends on true randomness to securely store all your money or something.

Introducing pseudorandomness! We can generate a sequence that looks random to the unsuspecting eye, but is generated using some sort of deterministic algorithm.

Magical LFSRs

One such algorithm is called a Linear Feedback Shift Register, or LFSR in short. The idea is using a shift register of certain length, and shift in the XOR of several bits from the shift register itself (hence the feedback). These bits are usually referred to as the LFSR taps.

For an LFSR, initial state matters. All LFSRs output a constant stream of 0s when initially loaded with zeros. But when loaded with anything else, a sequence of 1s and 0s will flow out.

An LFSR is a finite automaton, thus can only output a finite stream of bits before it repeats itself. If the taps are chosen in a certain way, one can get the longest stream possible, which for an LFSR of n bits is 2^n - 1 states.

For further reading, I can recommend the book "Mathematics: The Man-Made Universe" by Sherman K. Stein, whose 8th chapter offers a different look at the subject of such maximal bit sequences, concerning medieval Indian poetry rhythms.

Are 16 bits enough?

Anyway, I've chosen to use a 16-bit LFSR, where each LED value is two consecutive bits of the LFSR output. It means that every combination of the first 8 LEDs is possible (apart from the one run of 8 LEDs that corresponds to the all-zero state), but the ninth LED and beyond are fully determined by these first 8 LEDs.

How long will it take until the player plays the same game twice? According to the birthday paradox, the answer is on the order of the square root of the number of sequences: with 2^16 sequences that is 256 games, and a repeat becomes more likely than not after roughly 300 games.

That result is good enough for me - I don't suspect that any user will play that many games and also remember the sequences behind them. Moreover, I personally get bored after 20 games at most, usually far less. So it must be fine I guess.

Compare it to an 8 bit LFSR, where the number of possible sequences is 256. The player will begin to see repetitions with 50% probability after about 20 games, which isn't that great. The game will probably begin to feel degenerate after 15 minutes of gameplay or so.
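The square root is only a rule of thumb for the order of magnitude. To pin the 50% point down a bit more precisely, the birthday computation can be run directly. The sketch assumes every game starts from a uniformly random seed, which is an idealization of the timing-based seed; the function name is mine.

```python
def games_for_even_odds(num_sequences):
    """Smallest number of games with at least a 50% chance of a repeated sequence."""
    p_all_distinct = 1.0
    games = 0
    while p_all_distinct > 0.5:
        # The next game must avoid every sequence seen so far.
        p_all_distinct *= (num_sequences - games) / num_sequences
        games += 1
    return games


print("classic birthday check (365):", games_for_even_odds(365))
print("8-bit LFSR:", games_for_even_odds(2 ** 8))
print("16-bit LFSR:", games_for_even_odds(2 ** 16))
```

The results land somewhat above the plain square roots (16 and 256), but the conclusion is the same: 8 bits repeat within a short session, while 16 bits need hundreds of games.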

What it really looks like

The LFSR I decided to implement looks thus:

Notice that it's not what I have described before - this is a Galois LFSR, where the output gets XORed into multiple bits inside the shift register. I'll shortly explain why I chose a Galois LFSR, but for now it's enough to say that its basic properties and behaviour remain the same.

The polynomial should be maximal to get a full sequence - I just took the polynomial from the table in the Wikipedia article on LFSRs, and briefly made sure that it is indeed maximal by simple enumeration of the outputs.

This is how the nice picture translates into code:
```
inc_lfsr:
    ;Now with Galois LFSR of 16 bits with polynomial
    ;x^16 + x^15 + x^13 + x^4 + 1 (mask 0xa011)
    clr c
    mov a, r0
    rlc a
    mov r0, a
    mov a, r1
    rlc a
    mov r1, a
    jnc inc_lfsr_ret
    mov a, r0
    xrl a, #P_LFSRMASK_L
    mov r0, a
    mov a, r1
    xrl a, #P_LFSRMASK_H
    mov r1, a
inc_lfsr_ret:    
    ret
```
There isn't that much to it. r0 and r1 are the low and high byte of the LFSR respectively (MSB of r1 is the feedback bit). The convenient way to shift them left as one long shift register is shifting each byte, using the C flag to hold the output of low byte and pass it to the higher byte.

After we have shifted them all we're left with the feedback bit, now stored in the C flag. If C is 0, no action is needed and we immediately return. However, if it is 1, the mask must be XORed into the register.

The branching itself is made using the JNC opcode, which is quite useful for multi-byte logic. The mask is stored as parameters which are used as immediate values.
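The "simple enumeration" check is easy to redo off-chip. Below is a Python sketch of the same step, assuming the mask 0xA011 quoted in the code comment (that is, P_LFSRMASK_H = 0xa0 and P_LFSRMASK_L = 0x11, which the listing itself doesn't show). It mirrors the logic of inc_lfsr but is not a cycle-accurate model.

```python
MASK = 0xA011   # assumed value of P_LFSRMASK_H:P_LFSRMASK_L


def inc_lfsr(state):
    """One Galois step: shift left, XOR the mask in if a 1 was shifted out.

    Returns (new_state, carry), where carry plays the role of the C flag.
    """
    state <<= 1
    carry = state >> 16          # the bit that fell out of r1's MSB
    state &= 0xFFFF
    if carry:
        state ^= MASK
    return state, carry


def period(seed):
    """Number of steps until the register returns to `seed` (seed must be non-zero)."""
    state, steps = seed, 0
    while True:
        state, _ = inc_lfsr(state)
        steps += 1
        if state == seed:
            return steps


print("period from 0xffff:", period(0xFFFF))
```

A period of 2^16 - 1 means every non-zero state is visited before the sequence repeats, which is what "maximal" means here.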

If we want to make it a little bit prettier, and since we use global variables anyway (including registers), we can ditch some accumulator MOVs:
```
inc_lfsr:
    clr c
    mov a, r0
    rlc a
    mov r0, a
    mov a, r1
    rlc a
    mov r1, a
    jnc inc_lfsr_ret
    xrl 0x00, #P_LFSRMASK_L
    xrl 0x01, #P_LFSRMASK_H
inc_lfsr_ret:    
    ret
```
Elegant as it is, notice the use of absolute IRAM addresses for r0 and r1 (0x00 and 0x01 respectively) - it might become a huge mess should I ever want to switch register banks, for example.

This code shows very well why I wanted a Galois LFSR rather than the 'regular' Fibonacci one - picking bits out one by one and xoring them together is uglier and more tedious to write, and changing the tap values is far less convenient than xoring whole bytes with constant masks.
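For contrast, this is roughly what a Fibonacci step with the same taps (16, 15, 13, 4) looks like in C. It is a sketch only: Semyon never implemented this variant, the right-shifting bit order is an assumption borrowed from the usual textbook example, and fib_lfsr_step is a made-up name:
```c
#include <stdint.h>

/* One step of a right-shifting Fibonacci LFSR with taps 16, 15, 13, 4.
   Every tap bit has to be picked out and xored individually. */
uint16_t fib_lfsr_step(uint16_t state)
{
    uint16_t bit = (uint16_t)(((state >> 0) ^ (state >> 1) ^
                               (state >> 3) ^ (state >> 12)) & 1u);
    /* the new bit enters at the top while everything moves right */
    return (uint16_t)((state >> 1) | (bit << 15));
}
```
Changing the taps here means editing four shift amounts, and on an 8051 each of them turns into its own bit-picking sequence. In the Galois version it is a matter of changing two mask constants.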

To get an LED color I'd simply call inc_lfsr twice:
```
get_led_color:
    ;Puts in r3 the value of the next LED to display.
    mov r3, #0
    lcall inc_lfsr
    jnc get_led_color_2
    inc r3
    inc r3
get_led_color_2: 
    lcall inc_lfsr
    jnc get_led_color_ret
    inc r3
get_led_color_ret:
    ret
```
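In C terms, get_led_color builds a 2 bit number out of two consecutive feedback bits. A rough model of it follows; the names led_lfsr_step and get_led_color_c are invented for this sketch, and the step function is repeated so that the block stands on its own:
```c
#include <stdint.h>

/* Same Galois step as inc_lfsr (mask 0xa011), repeated here for
   self-containment. */
static uint16_t led_lfsr_step(uint16_t state)
{
    unsigned carry = (state >> 15) & 1u;
    state = (uint16_t)(state << 1);
    return carry ? (uint16_t)(state ^ 0xA011u) : state;
}

/* Advances the LFSR twice and returns a color 0..3. The first feedback
   bit is worth 2 (the two INCs), the second is worth 1. */
uint8_t get_led_color_c(uint16_t *state)
{
    uint8_t color = 0;

    if ((*state >> 15) & 1u) {   /* C flag after the first inc_lfsr */
        color += 2;
    }
    *state = led_lfsr_step(*state);

    if ((*state >> 15) & 1u) {   /* C flag after the second inc_lfsr */
        color += 1;
    }
    *state = led_lfsr_step(*state);

    return color;
}
```
This works in the assembly because neither XRL nor RET touches the C flag, so the bit that fell out of the shift is still there when the JNC after the LCALL looks at it.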
Random seed

The only catch is using a random seed, an initial value for the algorithm to use. Without it, Semyon will be a deterministic version of Simon, which makes for a very boring game. Trust me.

It means that I must have some source of randomness. One great such source is user input, or rather input timing. The player will never be able to click the buttons at exactly the same time after game over as in previous games - our clicking resolution is 10ms at best, while a counter can run as fast as the clock, around 100ns for instance. That's 5 orders of magnitude, plenty for randomness. This is the extreme case though - a standard user will take 1-2 seconds between games, as it looks to me.

Thus comes the initialization state of the game:
```
initialize:
    ;This is the initialization phase of semyon.
    ;It should also generate the seed value for PRNG.
    mov V_LED_CNT, #1
    mov V_LED_MAX, #1
    mov V_SEED_L, #0xff
    mov V_SEED_H, #0xff
    

    mov r0, V_SEED_L
    mov r1, V_SEED_H
    initialize_seed_loop:
        jnb RLED, initialize_ret
        jnb YLED, initialize_ret
        jnb GLED, initialize_ret
        jnb BLED, initialize_ret
        lcall inc_lfsr
        sjmp initialize_seed_loop
        
    initialize_ret:
    mov V_SEED_L, r0
    mov V_SEED_H, r1
    lcall delay_display
    mov V_STATE, #S_DISPLAY_SEQUENCE
    ret
```
Nothing fancy here. After initializing some important values, like 0xFFFF as the initial LFSR value (just something that's not all zeros), I simply wait for the user to click any button. Until they click, I keep incrementing the LFSR in a loop.

Let's see if it makes sense.

The MCU now runs with a clock of 11.0592MHz, that is approximately 1.1E7 clocks a second. There are probably ways I don't know of to automatically count the cycle duration of code, but for now let's count cycles manually.

According to the datasheet (page 340), one iteration of the seed loop takes:
- 1 clock X 1 CLR C
- 1 clock X 4 MOV A,Rn or vice versa
- 1 clock X 2 RLC
- 3 clocks X 2 XRL iram, #Immediate
- 3 clocks X 1 JNC
- 4 clocks X 1 RET
- 5 clocks X 4 JNB
- 4 clocks X 1 LCALL
- 3 clocks X 1 SJMP
Total of 47 clocks (counting the XRLs, which only run when the feedback bit is 1). Given the 65535 period of the LFSR, one full pass over all the seed values takes about 65535 X 47 / 11059200 = 0.28 seconds, so the LFSR wraps around roughly 3.6 times a second.

Given that after losing the user will probably take more than 1 second to start a new game, this way to update the LFSR should span the whole seed-value range very well, with uniform-enough probability to get each seed. I'll call it random!
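The arithmetic can be double checked with a few lines of C. This is a back-of-the-envelope sketch that takes the 47 clocks per iteration counted above at face value; the function names are made up for the illustration:
```c
#include <stdint.h>

#define CLOCK_HZ        UINT32_C(11059200)  /* 11.0592 MHz */
#define CLOCKS_PER_STEP UINT32_C(47)        /* from the manual count */
#define LFSR_PERIOD     UINT32_C(65535)

/* How many times the seed loop increments the LFSR per second. */
uint32_t lfsr_steps_per_second(void)
{
    return CLOCK_HZ / CLOCKS_PER_STEP;
}

/* How long one full pass over all 65535 states takes, in whole
   milliseconds (rounded down). */
uint32_t lfsr_wrap_time_ms(void)
{
    return (uint32_t)((uint64_t)LFSR_PERIOD * CLOCKS_PER_STEP * 1000u
                      / CLOCK_HZ);
}
```
So a player who waits a second or two before starting the next game lets the LFSR go around several times, and the seed depends only on the fine timing of the click.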

User input and debouncing

I always hate this part of such projects. If you've never used them, know that tactile buttons are not very reliable - the physical contacts tend to bounce and vibrate, which can generate multiple click signals at the MCU input.

One way to solve it is connecting capacitors in parallel to the buttons and calling it a day - they will low-pass-filter any pesky button bounce.

It makes sense when using discrete logic, but that's just lame when one has a full MCU to solve his problems. Thus I pulled out the vanilla way to debounce, which is simply adding delays.

I'm not proud of the code you're about to see. Have a look:
```
poll_user_input_debounce:
    jnb RLED, poll_user_input_debounce_r
    jnb YLED, poll_user_input_debounce_y
    jnb GLED, poll_user_input_debounce_g
    jnb BLED, poll_user_input_debounce_b
    sjmp poll_user_input_debounce
    poll_user_input_debounce_r:
        lcall delay_debounce
        jb RLED, poll_user_input_debounce
        mov a, #0x00
        sjmp poll_user_input_debounce_delay
    poll_user_input_debounce_y:
        lcall delay_debounce
        jb YLED, poll_user_input_debounce
        mov a, #0x01
        sjmp poll_user_input_debounce_delay
    poll_user_input_debounce_g:
        lcall delay_debounce
        jb GLED, poll_user_input_debounce
        mov a, #0x02
        sjmp poll_user_input_debounce_delay
    poll_user_input_debounce_b:
        lcall delay_debounce
        jb BLED, poll_user_input_debounce
        mov a, #0x03
        ;sjmp poll_user_input_debounce_delay
        
    poll_user_input_debounce_delay:
    lcall delay_debounce
    jnb RLED, poll_user_input_debounce_delay
    jnb YLED, poll_user_input_debounce_delay
    jnb GLED, poll_user_input_debounce_delay
    jnb BLED, poll_user_input_debounce_delay
    ret
```
At first I simply wait for some button to be clicked, i.e. to become low. If one is clicked, I wait for as long as the delay_debounce routine takes, and if the button is still being held I return its corresponding value. Before the RET itself I also debounce the release of the button, which is probably redundant and unnecessary, but whatever.

One can also notice that this routine is an ugly m*thaf*cka - I wanted to maximize the use of bit opcodes in this project, as these are quite new to me, but it turns out that the code would be much, much more compact and sensible by simply working on the whole P3 SFR instead.
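To illustrate what working on the whole port could look like, here is a C sketch of the same decision logic, operating on two snapshots of P3 taken one debounce delay apart. It is hypothetical (the firmware does not do this), and decode_button, debounced_button and button_mask are names invented here; the bit positions follow the RLED/YLED/GLED/BLED assignments:
```c
#include <stdint.h>

/* P3 bit masks in the order red, yellow, green, blue (values 0..3):
   P3.5, P3.4, P3.2, P3.3. */
static const uint8_t button_mask[4] = { 1u << 5, 1u << 4, 1u << 2, 1u << 3 };

/* Returns 0..3 for the first pressed (low) button, or -1 if none,
   with the same priority order as the JNB chain. */
int decode_button(uint8_t p3)
{
    for (int i = 0; i < 4; i++) {
        if ((p3 & button_mask[i]) == 0) {
            return i;
        }
    }
    return -1;
}

/* Accepts a click only if the same pin is still low after the delay. */
int debounced_button(uint8_t p3_before, uint8_t p3_after_delay)
{
    int button = decode_button(p3_before);
    if (button >= 0 && (p3_after_delay & button_mask[button]) != 0) {
        return -1;  /* the pin went high again, so it was a glitch */
    }
    return button;
}
```
The four nearly identical branches of the assembly collapse into one loop over a table of masks, which is the kind of compactness byte-wide access buys.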

Additionally, though it doesn't matter for this project, conditional jumps on bits e.g. JNB take as many clock cycles as more complex comparisons such as CJNE. This makes bitwise GPIO addressing quite pointless in my eyes.

This is a lesson learned - bit operations are worth using only in specialized cases which deserve them. These might be time-sensitive operations on a port, e.g. bitbanging some serial protocol, or cases where extensive single bit calculations are required, to name a few I can think of. One exception to this: operations on the C flag are still very useful for multi-byte arithmetic and logic.

Last note about that - the code came out like some nasty version of Arduino code, using these digitalRead wrappers that work on one bit of the GPIO port registers at a time. I don't like using the Arduino, partly for the lack of elegance and compactness in exactly such situations - now doing so in 8051 assembly feels sevenfold worse and uglier.

But this thing works, anyway.

Game over

Nothing fancy - I just wanted all the LEDs to light together:
```
game_over:
    lcall delay_display
    clr RLED
    clr YLED
    clr GLED
    clr BLED
    lcall delay_display
    setb RLED
    setb YLED
    setb GLED
    setb BLED
    lcall delay_display
    lcall delay_display
    lcall delay_display
    mov V_STATE, #S_INITIALIZE
    ret
```
Nothing to behold. The clumsy use of bit operations once again shows that it wasn't a good idea to work that way. Here's a hypothetical way to make it much better by using regular byte operations:
```
P_LEDS_ALL = 0x3c

game_over:
    lcall delay_display
    xrl P3, #P_LEDS_ALL
    lcall delay_display
    xrl P3, #P_LEDS_ALL
    lcall delay_game_over
    mov V_STATE, #S_INITIALIZE
    ret 
```
Isn't it better?
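The 0x3c mask is just the four LED bits (P3.2 to P3.5) or-ed together. A tiny C sketch of the toggle follows, with toggle_leds being a made-up name and the port value passed in as a plain byte rather than read from real hardware:
```c
#include <stdint.h>

/* P3.2 (green), P3.3 (blue), P3.4 (yellow), P3.5 (red) */
#define P_LEDS_ALL ((1u << 2) | (1u << 3) | (1u << 4) | (1u << 5))

/* Models XRL P3, #P_LEDS_ALL: flips the four LED bits and leaves the
   rest of the port untouched. */
uint8_t toggle_leds(uint8_t p3)
{
    return (uint8_t)(p3 ^ P_LEDS_ALL);
}
```
Note that XRL toggles rather than sets, so the first XRL lights all four LEDs only if they were all off (high) going in, which is the assumption this version makes.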

That's the code, basically. These two logs got much longer than I expected.
Anyway, as mentioned earlier, the whole code can be found in the git repository.
For me, this preliminary version of Semyon was very educational, and I hope it was for you too :)
The code - part 1 - Variables and Jumptables
12/24/2019 at 23:49 • 7 comments
Finally we've got there - a working version of Semyon! :D

I've built the code incrementally, starting with simply flashing the LEDs, then flashing them according to a sequence stored in memory, and once I had functioning game logic I only had to add random sequence generation. All this history (and this first working version) can be found in the git repository of the project.

Let's have a look at the code.

Variables and Parameters assignment

Higher-level languages such as C hide many, many dirty details of their work from the user. It's probably for the better. One of these details is assigning memory addresses to variables. However, writing in assembly, I have no such luxuries. Thus, I had to manually assign addresses to all the variables I use.

This way of working may seem inherently wrong to programmers who have only worked with high-level languages, but strictly speaking about the 8051 architecture, these MCUs were designed to be programmed that way. This is also why there are 4 switchable register banks, which are great for assembly programming. Compilers however don't use that feature well, if at all.

The 8051 was designed that way because compilers weren't as widespread as they are today - assembly was just the way one would program these things. I doubt the designers believed that their design would be so popular and widespread, and refuse to die even 40 years after its invention. That's also why there aren't any good, effective compilers for 8051, say a GCC port, despite the huge popularity of the ISA.

Moreover, the way I assign variables means that my variables will be global. This might send shivers down the spine of many programmers. Though it is technically possible to assign variables on the stack in assembly - that is what the compiler does, basically - this would be quite bulky and cumbersome, and given the fact that the stack is quite short anyway (and no heap to speak of without XRAM available), this is probably the original way to code this thing.

SFRs and parameters

Now to some code. First are SFRs which I use:
```
;SFRs
PCON2 = 0x97
```
This can seem weird, as the assembler should already recognize all of 8051's SFRs. Yet, each 8051 derivative adds new SFRs to the original design, which the assembler can't possibly know about.

Furthermore, some of the more liberal 8051 derivatives dumped some original peripherals, have altered their functionality, or have other SFRs located in the same addresses of the original ones. Timers are convenient victims - for example the STC15F104W has no T1 timer.

Thus I had to add the SFRs that I want to use myself. Specifically, in the STC15 series, PCON2 controls the internal clock divider, which I wanted to modify sometime during development, and its address is 0x97.

Later come parameters:
```
;Parameter values
P_LFSRMASK_L = 0x11
P_LFSRMASK_H = 0xa0

;State values
S_INITIALIZE = 0x00
S_DISPLAY_SEQUENCE = 0x01
S_GET_USER_INPUT = 0x02
S_GAME_OVER = 0x03
S_INVALID = 0xff
```
These are akin to #define statements in C. They are simply constants which I use in the code. They don't manifest as IRAM addresses the way variables or SFRs do, but rather as immediate values.

The state values are no different. These are the states of the state machine, which must be enumerated somehow. Thus they are no different than parameters, except that their exact value doesn't really matter to me.

Variables and where (not) to find them

Then come the variable assignments:
```
;Variable addresses
V_LED_CNT = 0x30
V_LED_MAX = 0x31
V_STATE = 0x40
V_SEED_L = 0x20
V_SEED_H = 0x21

;Bool variables bit-addresses
RLED = P3.5
YLED = P3.4
GLED = P3.2
BLED = P3.3
```
The abstraction of variables we have in our minds boils down to simply a specific address in memory which we tagged with a name and assigned a certain purpose to. The code is a lot more sensible when writing MOV V_STATE, #S_GET_USER_INPUT compared to MOV 0x40, #0x02, which isn't very meaningful to human beings.

The specific addresses chosen are quite arbitrary, with few exceptions:
- The first is that the first 0x20 bytes in memory are the 4 register banks, and one might not want to place variables there. I assigned no variables there at all.
- The second is that addresses 0x20-0x2F are bit addressable, making them very good places for variables which need a lot of bitwise massaging. V_SEED_L and V_SEED_H might be such variables, so they are located there.
- The third and last exception is that in 8051 the stack is in IRAM itself, and its base address is wherever the programmer assigns it. One would rather not place variables at an address where the stack may overwrite them - it's the fast lane to bugs that are very, very confusing to debug and reproduce.
The last point also means that recursive routines are a big no-no for any 8051 based MCU. I wanted to assign my stack base address to somewhere near 0x50, leaving me with a stack of 48 bytes. This should be plenty for Semyon - remember that original Simon programmers had only a single level stack to work with.

The last ones are the bit addresses of the GPIOs which drive the LEDs. 8051 has a distinct memory address space dedicated to bit variables with special operations such as SETB, CLR, CPL and so on.

Half of the bit-address space is mapped to the bits of the bytes in IRAM addresses 0x20-0x2F. The other half is mapped to some of the SFRs which are bit addressable, such as the port registers.

The assembler knows the standard P3.x addresses, so I just tagged them with the name I want to use.
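The mapping between a bit address and the byte it lives in can be written down in a few lines of C. This sketch follows the standard 8051 layout described above; bit_location and locate_bit are names made up for the illustration:
```c
#include <stdint.h>

typedef struct {
    uint8_t byte_addr;  /* IRAM or SFR address holding the bit */
    uint8_t bit;        /* bit number 0..7 inside that byte */
} bit_location;

/* Translates an 8051 bit address into a byte address and bit number. */
bit_location locate_bit(uint8_t bit_addr)
{
    bit_location loc;
    if (bit_addr < 0x80) {
        /* lower half: the 16 bit-addressable IRAM bytes 0x20-0x2F */
        loc.byte_addr = (uint8_t)(0x20 + (bit_addr >> 3));
    } else {
        /* upper half: SFRs whose address is a multiple of 8 */
        loc.byte_addr = (uint8_t)(bit_addr & 0xF8);
    }
    loc.bit = (uint8_t)(bit_addr & 0x07);
    return loc;
}
```
For example, P3 sits at SFR address 0xB0, so P3.5 (RLED) is bit address 0xB5, and the eight bits of V_SEED_L at 0x20 are bit addresses 0x00 to 0x07.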

The State Machine

As mentioned in the previous log, I've decided that the main function of Semyon will be an explicit state machine that governs the game logic:
```
main:
    ;This is the state machine that controls Semyon's logic.
    mov a, V_STATE
    s_initialize:
        cjne a, #S_INITIALIZE, s_display_sequence
        lcall initialize
        sjmp main
    s_display_sequence:
        cjne a, #S_DISPLAY_SEQUENCE, s_get_user_input
        lcall display_sequence
        sjmp main
    s_get_user_input:
        cjne a, #S_GET_USER_INPUT, s_game_over
        lcall get_user_input
        sjmp main
    s_game_over:
        cjne a, #S_GAME_OVER, s_invalid
        lcall game_over
        sjmp main
    
    s_invalid:
        mov V_STATE, #S_INITIALIZE
        ljmp main
        ;lcall reset
```
This thing is simply a C switch/case, as it may look in assembly. This is not the prettiest switch/case you've seen, but it is probably the simplest one - compare the value in question to the first immediate in your list, jump to the second one if it wasn't equal, and so on.

The s_invalid in this case is the default case.
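For comparison, this is roughly the C switch/case that the CJNE chain corresponds to. It is a sketch: instead of calling the real routines, it only reports which one would be called, and handler_t and select_handler are names invented for the example:
```c
#include <stdint.h>

enum {
    S_INITIALIZE       = 0x00,
    S_DISPLAY_SEQUENCE = 0x01,
    S_GET_USER_INPUT   = 0x02,
    S_GAME_OVER        = 0x03,
    S_INVALID          = 0xff
};

typedef enum {
    H_INITIALIZE,
    H_DISPLAY_SEQUENCE,
    H_GET_USER_INPUT,
    H_GAME_OVER,
    H_RESET_STATE   /* s_invalid: force V_STATE back to S_INITIALIZE */
} handler_t;

/* One pass of main: pick the routine matching V_STATE. */
handler_t select_handler(uint8_t v_state)
{
    switch (v_state) {
    case S_INITIALIZE:       return H_INITIALIZE;
    case S_DISPLAY_SEQUENCE: return H_DISPLAY_SEQUENCE;
    case S_GET_USER_INPUT:   return H_GET_USER_INPUT;
    case S_GAME_OVER:        return H_GAME_OVER;
    default:                 return H_RESET_STATE;
    }
}
```
Any value that is not one of the four states, S_INVALID included, lands in the default branch, just like falling through to s_invalid.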

Display LED colors with jumptables

A more elegant way to implement a switch/case statement, given that the cases are well ordered, is using a jumptable. That is, modifying PC ourselves using the value of the switch operand.

In 8051, the Program Counter register is not memory mapped. The only standard way I know to edit it directly is using the JMP @A+DPTR opcode, which loads the value of A+DPTR into the PC register.

For Semyon I wanted to use only the bit variables of the GPIOs rather than the P3 SFR itself. I might change it in the future as it turned out not to be very elegant. This means no funny bit games with the P3 SFR, so I must represent each LED in some numerical way.

I chose to translate two bits into an LED color. I have assigned it thus: Red = 00, Yellow = 01, Green = 10, Blue = 11. By adding this value to a certain DPTR, I can use JMP and find myself in one of 4 consecutive jump operations, which then jump me to where I really want to be. Take a look:
```
display_led:
    mov dptr, #led_jumptable
    mov a, r3
    anl a, #0x03
    rl a
    jmp @a+dptr
led_jumptable:
    sjmp light_rled
    sjmp light_yled
    sjmp light_gled
    sjmp light_bled
    
    light_rled:
        clr RLED
        lcall delay_display
        setb RLED
        lcall delay_display2
        ret
    light_yled:
        clr YLED
        lcall delay_display
        setb YLED
        lcall delay_display2
        ret
    light_gled:
        clr GLED
        lcall delay_display
        setb GLED
        lcall delay_display2
        ret
    light_bled:
        clr BLED
        lcall delay_display
        setb BLED
        lcall delay_display2
        ret
```
This is a popular compiler optimization which any self-respecting compiler can utilize. It is also very good for situations where equal timing for all branches is needed. The repetitiveness of the branches in this particular example is far from the best code one can get, though.
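In C the closest relative of a jumptable is an array of function pointers. The sketch below mirrors display_led; the handlers only return a letter instead of driving a pin, and led_handler, jumptable_offset and display_led_c are names made up for the illustration:
```c
#include <stdint.h>

typedef char (*led_handler)(void);

/* Stand-ins for the light_xled branches: they only report the color. */
static char light_rled(void) { return 'R'; }
static char light_yled(void) { return 'Y'; }
static char light_gled(void) { return 'G'; }
static char light_bled(void) { return 'B'; }

static const led_handler led_jumptable[4] = {
    light_rled, light_yled, light_gled, light_bled
};

/* Byte offset into the table of SJMPs: each SJMP is 2 bytes long,
   hence the RL A (multiply by 2) after masking to two bits. */
uint8_t jumptable_offset(uint8_t r3)
{
    return (uint8_t)((r3 & 0x03u) << 1);
}

/* C flavour of display_led: index the table and call through it. */
char display_led_c(uint8_t r3)
{
    return led_jumptable[r3 & 0x03u]();
}
```
The ANL with 0x03 matters in both versions: without it a stray value in r3 would jump somewhere past the end of the table.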

In the next part I'll talk about generating the (pseudo)random sequence and how to get user input (spoiler: button debouncing is a mandatory nag to tackle in such projects).
The Logic
12/21/2019 at 20:05 • 0 comments

The logic of a Simon game is quite simple. It should save a random sequence of LEDs to light, each round appended with another random LED value.

After appending it, the new sequence should be displayed to the user, and then the game waits for the user to click the buttons in the corresponding order. If the user succeeds in doing so, the random list grows and the show goes on the same way. But if he fails, then he has lost - the game should be abruptly stopped with some visual signal, the random list is shortened to 1 value and the game begins from zero.

The basic implementation I thought of has these 4 states, each with its special functionality: initialization, displaying the sequence, getting the user input, and game over.

There only seem to be 3 variables necessary - The length of the random list, the current index of the user in that list, and the random list itself.
The first one, which I have called V_LED_MAX, is really the score of the player. The second, which I called V_LED_CNT, is an auxiliary variable used to walk along the random list.
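As a sketch of how these two variables could move during the user input phase, here is a small C model. It is hypothetical: the firmware's own routine is not shown in this log, the 1-based counting is an assumption, and game_t and on_user_click are names invented for the example:
```c
#include <stdint.h>

enum {
    S_DISPLAY_SEQUENCE = 0x01,
    S_GET_USER_INPUT   = 0x02,
    S_GAME_OVER        = 0x03
};

typedef struct {
    uint8_t led_max;  /* V_LED_MAX: length of the list, i.e. the score */
    uint8_t led_cnt;  /* V_LED_CNT: 1-based position inside the list */
    uint8_t state;    /* V_STATE */
} game_t;

/* Handles one click, given the color the list expects at led_cnt. */
void on_user_click(game_t *g, uint8_t expected, uint8_t clicked)
{
    if (clicked != expected) {
        g->state = S_GAME_OVER;        /* wrong button: the game is lost */
        return;
    }
    if (g->led_cnt < g->led_max) {
        g->led_cnt++;                  /* right so far, keep going */
        return;
    }
    g->led_max++;                      /* whole list repeated: it grows */
    g->led_cnt = 1;
    g->state = S_DISPLAY_SEQUENCE;     /* show the longer list */
}
```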
Though I could have used jumps at the end of each state, I ended up making each state a function, called from a main switch-case that is the state machine. A new variable called V_STATE was added to store the state.
Looking at it now, using this state machine isn't really adding anything useful apart from making the state machine explicit rather than implicit, hidden in jumps in the code. But given that it's a very simple state machine, and given that V_STATE is updated before the ret commands anyhow... meh, it could have been just jumps.
Anyway, I'll present some assembly code of Semyon in the next log.
Blink with SDAS
10/14/2019 at 20:10 • 2 comments
So after I gathered enough information and examples for using SDAS, I gave it a shot.
The code should light the LEDs one after the other, then turn them off in the same order. The result came out something like this:
```
.module blink

.area INTV (ABS)
.org 0x0000
_int_reset:
	ljmp main

.area CSEG (ABS, CODE)
.org 0x0090
main:
	cpl P3.2
	acall delay
	cpl P3.3
	acall delay
	cpl P3.4
	acall delay
	cpl P3.5
	acall delay
	nop
	nop
	nop
	nop
	sjmp main
	
delay:
	mov r4, #0x00	
	mov r3, #0x00	
wait:
	djnz r4, wait
	djnz r3, wait
	ret
```
In the spirit of the usynth example, the areas are called INTV for interrupt vector and CSEG for the code segment. The code begins in address 0x90 as the interrupt vector address of INT4#, the farthest interrupt here, is 0x83.
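The 0x83 figure follows from the regular spacing of 8051 interrupt vectors: vector number n sits at 0x0003 + 8*n. A quick C sketch of that rule follows, where vector_address is a made-up name, and the claim that INT4 is interrupt number 16 on this chip is an assumption taken from my reading of the STC15 datasheet:
```c
#include <stdint.h>

/* Address of the n-th 8051 interrupt vector (the reset vector at
   0x0000 is not counted; external interrupt 0 is n = 0). */
uint16_t vector_address(uint8_t n)
{
    return (uint16_t)(0x0003u + 8u * n);
}
```
With n = 16 this gives 0x0083, and an LJMP placed there ends at 0x0085, so starting the code at 0x0090 leaves the whole vector table free.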
I called the assembler in the command line:
```
sdas8051 blink.asm
```
While my code had errors, it shouted errors at me. But once the code was correct, nothing happened. No hex file appeared, nor any other output file whatsoever.
"sdas8051 -h" to the rescue! Looking at the possible flags, it looks like I want to add the flags -l, -o, and -s to generate a list file, an object file and a symbol file respectively:
```
sdas8051 -los blink.asm
```
Running this generated these files, but none of them is a hex file. It seems that there's a need for linking now - although there's only one file here.
The linker is called SDLD, and its flag list suggests that the -i flag generates an Intel hex file out of the arguments:
```
sdld -i blink
```
This generated a .ihx format file. Looking at it, it looks like some odd cousin of the Intel .hex file with a weird extension. I'm not the only one who hates it, so a short google showed me that SDCC has a utility called 'packihx' just to turn these .ihx files into proper .hex files, mostly by ordering and aligning them.
Now that I have a blink.hex file I can finally download it to the chip! The lights indeed did their thing on and off, as I wanted them.
To ease the build, I made for semyon the crudest makefile you've ever seen:
```
build:
	sdas8051 -los semyon.asm
	sdld -i semyon
	packihx semyon.ihx > semyon.hex
```
Now that I can use SDAS correctly, it's time to write Semyon's firmware!

The assembler

10/14/2019 at 18:52 • 3 comments

Now that I've taken care of the hardware, it's time to work on my toolchain.

As mentioned, I want to use an open-source toolchain, and SDCC looks like a good choice. The suite has an assembler called SDAS, a linker, and some other stuff. As I want to use assembler, I must tackle SDAS.

SDAS is said to be based on the ASXXXX suite of assemblers, which supports a hell of a lot of architectures. Still, I found few to no usage examples, and as it raises errors for sources that work on vanilla 8051 assemblers such as A51, I had to find other kinds of information.

For more information, I found a webpage with documentation for the original ASXXXX assembler. Specifically, I found the directives page very enlightening. But alone it's not enough for me to write assembly code from scratch.

One thing I did was to compile a C file using SDCC and look at the output .asm file. So I've written a basic blink that looks somewhat like this:

```
#include <stdint.h>
#include <8051.h>

void main() {
    uint16_t i;
    while(1){
        for (i = 0; i == 0xFFFF; i++){}
        P3 ^= 0x04;
    }
}
```

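This code compiled, but the resulting .hex file did not blink the LED. I probably haven't done it right. I suspected that SFRs may need special attention, but one likely culprit is simpler: the loop condition i == 0xFFFF is already false on the first check, so the delay loop never runs and the pin toggles far too fast to see.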
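The effect of the loop condition is easy to check on a PC. The sketch below counts how many times the body of the delay loop runs as written, and how many times it would run with != instead (the function names are made up for the illustration):
```c
#include <stdint.h>

/* The loop exactly as in the blink source: the condition is tested
   before the first pass, and 0 == 0xFFFF is false. */
uint32_t delay_iterations_as_written(void)
{
    uint16_t i;
    uint32_t count = 0;
    for (i = 0; i == 0xFFFF; i++) {
        count++;
    }
    return count;
}

/* The presumably intended loop: keep counting until i reaches 0xFFFF. */
uint32_t delay_iterations_intended(void)
{
    uint16_t i;
    uint32_t count = 0;
    for (i = 0; i != 0xFFFF; i++) {
        count++;
    }
    return count;
}
```
This matches the generated assembly further down, where the two CJNE instructions leave the loop as soon as r6 or r7 differs from 0xff, which is the case right after they are cleared.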

Let's look at the resulting assembly:

```
;--------------------------------------------------------
; File Created by SDCC : free open source ANSI-C Compiler
; Version 3.9.0 #11195 (MINGW64)
;--------------------------------------------------------
    .module blink
    .optsdcc -mmcs51 --model-small
    
;--------------------------------------------------------
; Public variables in this module
;--------------------------------------------------------
    .globl _main
    .globl _CY
    .globl _AC
    .globl _F0
...
```

The first thing is defining a module. After it comes some special command for SDCC. Then there are a whole lot of global variables, corresponding to special bits and registers. Note how directives start with a dot, unlike in vanilla assemblers.

Then came this:

```
...    
    .globl _SP
    .globl _P0
;--------------------------------------------------------
; special function registers
;--------------------------------------------------------
    .area RSEG    (ABS,DATA)
    .org 0x0000
_P0    =    0x0080
_SP    =    0x0081
_DPL    =    0x0082
_DPH    =    0x0083
_PCON    =    0x0087
_TCON    =    0x0088
...
```

It declares something as an area, probably calling it a registers segment, with the ABS and DATA parameters. The ABS flag means, as I learned later, using absolute locations for the code, thus the .org 0x0000 directive after it means that this segment starts at address 0. Dunno about the DATA flag though. However, it doesn't seem important, as this part only looks like a '#define' section.

Let's move on. The following lines contain an awful lot of these directives, without any real code, until we find the interrupt vector:

```
;--------------------------------------------------------
; interrupt vector 
;--------------------------------------------------------
    .area HOME    (CODE)
__interrupt_vect:
    ljmp    __sdcc_gsinit_startup
;--------------------------------------------------------
; global & static initialisations
;--------------------------------------------------------
    .area HOME    (CODE)
    .area GSINIT  (CODE)
    .area GSFINAL (CODE)
    .area GSINIT  (CODE)
    .globl __sdcc_gsinit_startup
    .globl __sdcc_program_startup
    .globl __start__stack
    .globl __mcs51_genXINIT
    .globl __mcs51_genXRAMCLEAR
    .globl __mcs51_genRAMCLEAR
    .area GSFINAL (CODE)
    ljmp    __sdcc_program_startup
;--------------------------------------------------------
; Home
;--------------------------------------------------------
    .area HOME    (CODE)
    .area HOME    (CODE)
__sdcc_program_startup:
    ljmp    _main
;    return from main will return to caller
```

Behold, a reset vector! It makes an LJMP to initialisations, which came out null for this piece of code. When it's done, it LJMPs us to the '__sdcc_program_startup' label which directly jumps us to the main function. This is probably akin to the '_start()' function of GCC.

Note how this time all the .area directives say CODE, rather than DATA.

Here comes the real fun:

```
;--------------------------------------------------------
; code
;--------------------------------------------------------
    .area CSEG    (CODE)
;------------------------------------------------------------
;Allocation info for local variables in function 'main'
;------------------------------------------------------------
;i                         Allocated to registers r6 r7 
;------------------------------------------------------------
;    blink.c:7: void main() {
;    -----------------------------------------
;     function main
;    -----------------------------------------
_main:
    ar7 = 0x07
    ar6 = 0x06
    ar5 = 0x05
    ar4 = 0x04
    ar3 = 0x03
    ar2 = 0x02
    ar1 = 0x01
    ar0 = 0x00
;    blink.c:14: for (i = 0; i == 0xFFFF; i++){
00111$:
    mov    r6,#0x00
    mov    r7,#0x00
00106$:
    cjne    r6,#0xff,00101$
    cjne    r7,#0xff,00101$
    inc    r6
    cjne    r6,#0x00,00106$
    inc    r7
    sjmp    00106$
00101$:
;    blink.c:17: P3 ^= 0x04;
    mov    r6,_P3
    mov    r7,#0x00
    xrl    ar6,#0x04
    mov    _P3,r6
;    blink.c:20: }
    sjmp    00111$
    .area CSEG    (CODE)
    .area CONST   (CODE)
    .area XINIT   (CODE)
    .area CABS    (ABS,CODE)
```

Code! Finally. It looks quite ugly though, using these MOVs and XORs rather than a CPL P3.2 opcode. However, the code in this assembler looks kinda like you'd expect assembly to look - nothing too different from other assemblers.

I also found a single piece of source code for this thing, in what seems to be the archive of a university lab's private mailing list. It makes a little FM synth called usynth, and seems to be written directly in assembly. This isn't the first place I'd look for information, but I'll take whatever I can right now. It looks similar to what I already know from looking at the SDCC output. Notably this part looks all familiar:

```
.area INTV (ABS)
.org 0x0000
_int_reset:
	ljmp _start
.org 0x0003
_int_ex0:
	reti
.org 0x000b
_int_t0:
	ljmp T0_ISR
	.ds 5

.area CSEG (ABS,CON)
.org 0x0080

_start:
	clr IE.7
```

This is pretty much all I need, it seems. Good enough, I guess I now know how to use the assembler. Let's get going and write our own code :)

The hardware
10/14/2019 at 10:52 • 0 comments

Now that I can download code to the micro, it's time to design and build the hardware.

As the MCU has only 8 pins, there is not much choice but to multiplex the LEDs and the buttons. The GPIOs of the traditional 8051 are open drain with little to no internal pullup. Though newer derivatives, including the STC15 series, have other GPIO options which include push-pull, they default to this weak pull-up configuration.

This is quite useful, as we can pull down each LED (with its series resistor) with both the GPIO and a tactile switch to ground, connected in parallel to it. Thus the following schematics:

The 4 pin header to the left is used to get both RX and TX (and GND) from the USB to serial adapter, but also to get 5V of Vcc from it. I used a 90 degree angled header. The switch on the power rail is used to turn the device on and off, which is also necessary for programming.

The values of the resistors were chosen empirically to get good enough brightness, but not too much. The capacitor on the supply line is usually a good practice and the datasheet recommends using one, although you'll get by with one smaller than 1uF.

I have put it all on a protoboard. Here's the result:

I made sure that it works by pressing all the switches and seeing the LEDs light up, and also by downloading code that blinks them all. Good enough, the hardware is simple enough that I built it with no errors whatsoever.
