Cornell University
Electrical and Computer Engineering 4760 AVR mega644/1284
Mixing assembler with GCC
Introduction
It is possible to mix assembler and GCC in different ways:
- Separate file. Write a pure assembler *.S file and link it with the main C file. This approach has the advantage of simpler syntax in a separate file. Another advantage is that the compiler already treats some registers as scratch across a function call, so you do not have to save/restore them (see below). It has a runtime speed disadvantage for short assembler routines due to the function linkage (call, return, and argument setup) generated by C.
- Inline assembler. Write inline assembler directly into the C code. You have to save/restore any registers you use. You get exactly what you write unless you use register constraints. If you use constraints, the compiler can play with the register assignments, but specifying the constraints is bewildering. The only runtime speed penalty is the save/restore overhead, but you may be able to avoid the overhead with register constraints. Inline assembler is the most convenient way to write a NAKED interrupt service routine inside a C file (a complete ISR can also be written in a separate *.S file).
- Assembler macro. Write an assembler macro and instantiate it in C (reusable inline assembler). This is useful if you need to use a short chunk of inline assembler many times in a program. The runtime speed is very good, and register constraints allow the GCC optimizer to play with the code.
Before you actually write any assembly code you will need to read the instruction set architecture and description of AVR opcodes, and look at a bunch of assembler examples. Some examples are below. There are some tutorials, for instance scienceprog and Mixing C and asm. I find that the best way to learn assembler is to look at the assembler output of the compiler. In AVRstudio projects, the *.lss assembler listing file is in the default folder (in the project folder). Code up a few lines of C, open the lss file, search for a line of C (included as a comment by the compiler), and see what the compiler did. Below is an example of compiler output from a video generator I wrote in C (assembler comments added for this page). The code line is in a function with x as the first (char) parameter and y as the second.
;C comment: int i = (x >> 3) + (int)y * bytes_per_line ;
ldi r24, 0x14 ; bytes_per_line=20=0x14
mul r22, r24 ; since y was the second (char) parameter of a function call, it is in r22
movw r26, r0 ; 2-byte move to get product at r27:r26
eor r1, r1 ; MUST clear r1 after a mult
mov r24, r30 ; the compiler had moved the first parameter from r24 to r30
lsr r24 ; 3x lsr for the >>3
lsr r24
lsr r24
add r26, r24 ; do the add
adc r27, r1 ; and add the carry to the high byte using r1=0 register
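To check a listing like this against its C line, the same steps can be replayed in portable C on a PC. A minimal sketch, assuming bytes_per_line = 20 as in the listing; index_from_listing and index_from_c are hypothetical names, and the uint8_t variables stand in for the 8-bit registers.

```c
#include <stdint.h>

/* Hypothetical model of the compiler output above, one step per instruction. */
uint16_t index_from_listing(uint8_t x, uint8_t y) {
    const uint8_t bytes_per_line = 20;                   /* ldi r24, 0x14 */
    uint16_t product = (uint16_t)(y * bytes_per_line);   /* mul r22, r24 -> r1:r0 */
    uint8_t lo = (uint8_t)(product & 0xFFu);             /* movw r26, r0: r26 = low byte */
    uint8_t hi = (uint8_t)(product >> 8);                /*               r27 = high byte */
    uint8_t xs = (uint8_t)(x >> 3);                      /* three lsr instructions */
    uint16_t sum = (uint16_t)(lo + xs);                  /* add r26, r24 */
    lo = (uint8_t)sum;
    hi = (uint8_t)(hi + (sum >> 8));                     /* adc r27, r1: r1 == 0, so only the carry is added */
    return (uint16_t)((hi << 8) | lo);
}

/* The C expression the listing was generated from, for comparison. */
uint16_t index_from_c(uint8_t x, uint8_t y) {
    const uint8_t bytes_per_line = 20;
    return (uint16_t)((x >> 3) + (int)y * bytes_per_line);
}
```

Looping over all 65,536 (x, y) pairs and comparing the two functions is a cheap way to confirm that the add/adc pair really propagates the carry into the high byte.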
The lss file can tell you other stuff also. If you search for __vectors, you will get the interrupt service routine entry points. By following the undefined interrupt vectors, you will see that GCC (through avr-libc's __bad_interrupt) defaults to jumping to address zero, restarting the program, for any undefined interrupt. The zero entry point is the RESET vector, where program execution starts. Searching for that address will lead you to the MCU and C initialization code. The first few lines of the reset code are shown below with my comments added:
eor r1, r1 ; clear r1 (C assumes r1 equals zero)
out 0x3f, r1 ; zero the SREG, which is i/o register 0x3f
ldi r28, 0xFF ; load the low byte of the top-of-memory address
ldi r29, 0x40 ; load the high byte of the top-of-memory address
out 0x3e, r29 ; store the high byte in the high byte of the stack pointer (i/o register 0x3e)
out 0x3d, r28 ; store the low byte in the low byte of the stack pointer (i/o register 0x3d)
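The constant 0x40FF loaded into r29:r28 is the last SRAM address (RAMEND in avr-libc's device headers). A small sketch of where it comes from, assuming the mega644/1284 layout in which the registers and I/O space fill 0x0000-0x00FF so SRAM starts at 0x0100, with 16 KB of SRAM on the ATmega1284 and 4 KB on the ATmega644. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 32 registers + 64 I/O + 160 extended I/O occupy 0x0000-0x00FF,
   so internal SRAM starts at 0x0100 on the mega644/1284. */
#define SRAM_START 0x0100u

/* Hypothetical helper: last SRAM address (RAMEND) for a given SRAM size. */
uint16_t ram_end(uint16_t sram_bytes) {
    return (uint16_t)(SRAM_START + sram_bytes - 1u);
}

/* High byte goes to SPH (i/o 0x3e), low byte to SPL (i/o 0x3d). */
uint8_t sp_high(uint16_t addr) { return (uint8_t)(addr >> 8); }
uint8_t sp_low(uint16_t addr)  { return (uint8_t)(addr & 0xFFu); }
```

With 16 KB this gives 0x40FF (the values in the listing); the same startup code built for a 4 KB mega644 would load 0x10FF instead.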
The next few lines shown in the lss file clear memory and set up the C environment, then jump to main. The map file in the same folder will show you where the variables are stored in RAM.
Syntax and registers
Global variables defined in C are available to the assembler. Consider a global variable defined in C as
volatile char vname;
Using the declared C variable name in a load/store instruction like those below loads the value of the variable into the register (lds) or stores the register into the variable (sts).
lds r18,vname
sts vname, r18
Integer variables declared in C as
int vname;
may be loaded using:
lds r18, vname ; lower byte
lds r19, vname+1 ; high byte
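The vname / vname+1 addressing works because avr-gcc stores multi-byte values little-endian: low byte first. A minimal host-side sketch, modeling data memory as a hypothetical byte array ram[] so the byte order does not depend on the PC's own endianness; store_int and load_int are illustrative names.

```c
#include <stdint.h>

/* Hypothetical model of AVR data memory as a plain byte array. */
static uint8_t ram[0x4100];

/* Store a 16-bit int little-endian, as avr-gcc does. */
void store_int(uint16_t addr, int16_t value) {
    uint16_t v = (uint16_t)value;
    ram[addr]     = (uint8_t)(v & 0xFFu);  /* like sts vname, r18   (low byte)  */
    ram[addr + 1] = (uint8_t)(v >> 8);     /* like sts vname+1, r19 (high byte) */
}

/* Reassemble the int from the two byte loads. */
int16_t load_int(uint16_t addr) {
    uint8_t r18 = ram[addr];       /* lds r18, vname   ; low byte  */
    uint8_t r19 = ram[addr + 1];   /* lds r19, vname+1 ; high byte */
    return (int16_t)(uint16_t)((r19 << 8) | r18);
}
```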
An array can be indexed by loading the base address into a register pair, r27:r26 (called the X register) or r31:r30 (called the Z register), and adding the index. In the following case, the maximum index was less than 256, so the 8-bit index was extended to 16 bits by adding zero (with carry) to the high byte of the pointer. The C declaration is volatile unsigned char samples[255]; the index is read from an 8-bit global variable named index.
ldi r30, lo8(samples) ; use ldi for a pointer
ldi r31, hi8(samples)
lds r18, index
add r30, r18
adc r31, r1 ; compiler enforces r1 = 0
; current array pointer now in Z
Executing ld r18,Z loads the data value from the stored array. Using ld r18,Z+ increments the address after loading.
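The adc step matters whenever the array straddles a 256-byte boundary: adding the index to r30 alone would wrap within the same page. A minimal sketch of the add/adc pair in portable C; z_pointer is a hypothetical name, and the base address stands in for lo8(samples)/hi8(samples).

```c
#include <stdint.h>

/* Hypothetical model of forming Z = base + index with 8-bit add/adc. */
uint16_t z_pointer(uint16_t base, uint8_t index) {
    uint8_t r30 = (uint8_t)(base & 0xFFu);    /* ldi r30, lo8(samples) */
    uint8_t r31 = (uint8_t)(base >> 8);       /* ldi r31, hi8(samples) */
    uint16_t sum = (uint16_t)(r30 + index);   /* add r30, r18 */
    r30 = (uint8_t)sum;
    uint8_t carry = (uint8_t)(sum >> 8);      /* carry out of the low byte */
    r31 = (uint8_t)(r31 + carry);             /* adc r31, r1: r1 == 0, so only the carry is added */
    return (uint16_t)((r31 << 8) | r30);
}
```

For example, a base of 0x01F0 with index 0x20 gives 0x0210; without the adc it would give 0x0110.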
Parameters can be passed to functions and returned. The first parameter is passed in r25:r24, the second in r23:r22, and so on down to r8; arguments that do not fit in r8-r25 are passed on the stack. All arguments are aligned to start in even-numbered registers (odd-sized arguments, including char, have one free register above them). Return values: 8-bit in r24 (not r25!), 16-bit in r25:r24, up to 32 bits in r22-r25, up to 64 bits in r18-r25. The following example is from the fixed point project described in the Examples section below. The first bit of code is from the calling program, where you can see how the input and output parameters are loaded/stored.
; prod = multfix(fix1, fix2) ;
lds r22, 0x0252 ; get low byte fix2
lds r23, 0x0253 ; get high byte fix2
lds r24, 0x024C ; get low byte fix1
lds r25, 0x024D ; get high byte fix1
call 0x19cc ; call the external multfix routine at 0x19cc
sts 0x024B, r25 ; store high byte prod
sts 0x024A, r24 ; store low byte prod
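The registers used by the caller (fix1 in r25:r24, fix2 in r23:r22) follow the avr-gcc argument rule: start at r26, round each argument size up to an even number of bytes, subtract it, and put the argument's low byte in the resulting register; once an argument would start below r8, it and all later arguments go on the stack. A minimal sketch of that rule; arg_registers is a hypothetical helper, not part of any toolchain.

```c
/* Hypothetical helper: first_reg[i] receives the register holding the low
   byte of argument i (sizes in bytes), or -1 if it is passed in memory. */
void arg_registers(const int sizes[], int n, int first_reg[]) {
    int reg = 26;
    int in_memory = 0;
    for (int i = 0; i < n; i++) {
        if (!in_memory) {
            reg -= (sizes[i] + 1) & ~1;      /* round odd sizes up to even */
            if (reg < 8 || sizes[i] == 0)
                in_memory = 1;               /* this and all later args use the stack */
        }
        first_reg[i] = in_memory ? -1 : reg;
    }
}
```

Two ints land in r24 and r22 (as in the multfix call), a char followed by a long lands in r24 and r20, and a third 8-byte argument after two others no longer fits in registers.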
The second bit of code is the multfix routine called by the above assembler code. It is just the contents of the multfix.S file with the global symbol resolved into an address. The code shows how the parameters are used to form the fixed point product by multiplying all 4 pairs of high and low bytes.
000019cc <multfix>: ; the resolved global symbol
;input parameters are in r23:r22(hi:lo) and r25:r24
;b already in right place -- 2nd parameter is in r23:r22
;load a -- first parameter is in r25:r24; move it to r21:r20 to make room for the output
movw r20, r24 ; open up result return registers (movw copies a register pair)
muls r23, r21 ; (signed)ahigh * (signed)bhigh
mov r25, r0 ; only need low byte of high multiply
mul r22, r20 ; alow * blow
mov r24, r1 ; only need high byte of low mult
mulsu r23, r20 ; (signed)bhi * alo
add r24, r0 ; add product to result
adc r25, r1
mulsu r21, r22 ; (signed)ahi * blo
add r24, r0 ; add product to result
adc r25, r1
clr r1 ; required by GCC
;return values are in r25:r24 (hi:lo)
ret
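A host-side model makes the byte decomposition easy to check against ordinary integer arithmetic. A minimal sketch, assuming 8:8 values held in a 16-bit int (8 integer bits, 8 fraction bits); float_to_fix, fix_to_float and multfix_model are hypothetical names, not functions from the linked project. Each partial product is formed as the corresponding mul/muls/mulsu instruction would form it, and only the middle 16 bits of the 32-bit product are kept.

```c
#include <stdint.h>

/* Hypothetical 8:8 conversions; valid for -128.0 <= f < 128.0. */
int16_t float_to_fix(float f)   { return (int16_t)(f * 256.0f); }
float   fix_to_float(int16_t x) { return (float)x / 256.0f; }

/* Byte-level model of multfix: a = ah:al, b = bh:bl, high bytes signed,
   low bytes unsigned. */
int16_t multfix_model(int16_t a, int16_t b) {
    int8_t  ah = (int8_t)((uint16_t)a >> 8);
    uint8_t al = (uint8_t)a;
    int8_t  bh = (int8_t)((uint16_t)b >> 8);
    uint8_t bl = (uint8_t)b;

    uint16_t hh = (uint16_t)(ah * bh);   /* muls  r23, r21: ahi * bhi */
    uint16_t ll = (uint16_t)(al * bl);   /* mul   r22, r20: alo * blo */
    uint16_t hl = (uint16_t)(bh * al);   /* mulsu r23, r20: bhi * alo */
    uint16_t lh = (uint16_t)(ah * bl);   /* mulsu r21, r22: ahi * blo */

    /* mov r25, r0 (low byte of hh); mov r24, r1 (high byte of ll) */
    uint16_t result = (uint16_t)(((hh & 0xFFu) << 8) | (ll >> 8));
    result = (uint16_t)(result + hl);    /* add r24, r0 ; adc r25, r1 */
    result = (uint16_t)(result + lh);    /* add r24, r0 ; adc r25, r1 */
    return (int16_t)result;
}
```

The result equals the full 32-bit product shifted right by 8 (rounded toward minus infinity) and truncated to 16 bits, so 1.5 * -2.0 gives -3.0 exactly. Products outside the 8:8 range wrap silently, just as in the assembler routine.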
Registers r18-r27 and r30-r31 can be used in a function without saving. The compiler assumes that any function call may change r18-r27 and r30-r31 (they are call-used), so the calling C code preserves anything it still needs in them, and your function can use them any way you want. Registers r2-r17 and r28-r29 (the Y register, which GCC uses as a frame pointer) are call-saved and must be saved by you if you change them. In inline code with register constraints, the compiler will attempt to optimize your register use, but the directives are bewildering. See the fixed-point multiply for an example macro. If you use inline assembler code without constraints, you must save registers that you use. Register r0 is considered a temporary register. It will be changed by any C code (except interrupt handlers, which save it), and may be used to store a byte within one piece of assembler code. Register r1 is assumed to be always zero in any C code. It may be used to store a byte within one piece of assembler code, but must then be cleared after use (clr r1). This includes any use of the multiply [f]mul[s[u]] instructions, which return their result in r1:r0. Interrupt handlers save and clear r1 on entry, and restore r1 on exit (in case it was non-zero). This paragraph is taken largely from the nongnu docs.
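The register rules above can be kept as a small lookup, handy when reviewing an assembler function by hand. A minimal sketch following the avr-libc register-usage FAQ; the enum and classify_reg are hypothetical names used only for illustration.

```c
/* Hypothetical summary of avr-gcc register conventions. */
enum reg_role { REG_TEMP, REG_ZERO, REG_CALL_SAVED, REG_CALL_USED, REG_INVALID };

enum reg_role classify_reg(int r) {
    if (r == 0)
        return REG_TEMP;          /* r0: scratch, also written by mul */
    if (r == 1)
        return REG_ZERO;          /* r1: must be 0 whenever C code runs */
    if ((r >= 2 && r <= 17) || r == 28 || r == 29)
        return REG_CALL_SAVED;    /* a function that changes these must restore them */
    if ((r >= 18 && r <= 27) || r == 30 || r == 31)
        return REG_CALL_USED;     /* free to use in an assembler function */
    return REG_INVALID;           /* not one of r0-r31 */
}
```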
Registers r2 to r7 can be locked to global variable names. The syntax to bind the variable var_name to register r3 is:
register unsigned char var_name asm("r3");
Be very careful using this feature. It is easy to generate register
conflicts and you may restrict the optimization that the compiler can
perform. See also nongnu docs.
Examples
- External assembler function and inline assembler ISR. This example program (needs uart.c and uart.h) generates a 1 millisecond time base using timer0 in clear-timer-on-compare (CTC) mode. It also uses the timer0 interrupt service routine (ISR) to update three virtual timers. In this program the three virtual timers are used to schedule three different C functions. The circuit for this code is shown in Fall 2012 lab 1. The example was converted partly to assembler to show how to code a function as an external file and how to code a naked assembler ISR. The NAKED flag tells the compiler not to generate any of the usual state-saving code when entering the ISR. The function Task1 in the C code was converted to a void assembler function. Note that the entry point of the function is declared global in the assembler file. The ISR was converted to inline assembler. In an ISR, you must save SREG before you do anything which might change it. The four instructions in the following code save r18, store SREG (i/o register 0x3f) on the stack, and open up two registers to use in the ISR.
"push r18 ; save state\n\t"
"in r18, 0x3f ;save SREG\n\t"
"push r18 \n\t"
"push r19 \n\t"
Obviously, you must pop the stack before exiting (in the reverse order of the pushes), and you must exit an ISR using reti.
"pop r19\n\t"
"pop r18\n\t"
"out 0x3f, r18\n\t"
"pop r18\n\t"
"reti\n\t"
- External fixed point multiply assembler function
This program and multiply routine contain multiply, divide, and square root for 8:8 fixed point numbers. Detailed descriptions are here. The assembler multiply routine takes two integer parameters and returns an integer value. Note that the compiler assumes that r1 is ALWAYS zero. Since multiply instructions modify r1, it must be cleared before exit.
- Assembler Macro fixed point multiply
This C program contains multiply, divide, and square root for 8:8 numbers and an assembler multiply macro. Detailed descriptions of the arithmetic are here. In the operand names, the letter selects the byte and the digit selects the operand: %A0 and %B0 are the low and high bytes of operand zero, which is the output of the macro. Similarly, %A1 and %B1 are the bytes of operand one, the first input, and %A2 and %B2 are the bytes of operand two, the second input. The constraint : "=&d" (prod) tells the compiler that the output goes in an upper register (r16 to r31, "d"), is written only ("="), and is early-clobber ("&"): it is written before the inputs have all been read, so it must not share a register with either input. The : "a" (val1), "a" (val2) constraint limits the compiler to the simple upper registers r16 to r23 for the input values, which is required because mulsu only accepts operands in r16 to r23. There is much more in the GNU docs.
// Fast fixed point multiply
#define multfix(a,b) \
({ \
int prod, val1=a, val2=b ; \
__asm__ __volatile__ ( \
"muls %B1, %B2 \n\t" \
"mov %B0, r0 \n\t" \
"mul %A1, %A2\n\t" \
"mov %A0, r1 \n\t" \
"mulsu %B1, %A2 \n\t" \
"add %A0, r0 \n\t" \
"adc %B0, r1 \n\t" \
"mulsu %B2, %A1 \n\t" \
"add %A0, r0 \n\t" \
"adc %B0, r1 \n\t" \
"clr r1 \n\t" \
: "=&d" (prod) \
: "a" (val1), "a" (val2) \
); \
prod; \
})
- Saving an ADC conversion in an array and starting a new conversion.
Before this code is executed, the Z register is set up to point to the current array position for storing the ADC value. v_index (which is a char) holds the current index for range checking.
; check for v_index >= 160, the size of the array
lds r18, v_index ; current array index
cpi r18, 160 ; array size
brsh no_sample ; we reached the end of the array
; if not at array end, update sample count v_index
inc r18
sts v_index, r18 ; and store the incremented index
;
; get ADCH and store it
lds r18, 0x0079 ; get the ADCH i/o register
st Z, r18 ; Z has the array pointer
; increment array pointer
adiw r30, 1 ; r31:r30 is Z register
; start new ADC conversion
no_sample:
ldi r18, 0xc3 ; 0xc3 = ADEN | ADSC (start ADC) | prescaler select 3 (clock/8)
sts 0x007A, r18 ; 0x7a is ADCSRA
- Timer ISR-driven periodic sampling and R2R parallel DAC output
This program uses an R2R DAC connected to PORTB to produce an analog output (see ADC page) of an ADC input. Connect an audio source to input A.0. Timer 1 clocks the ADC input to run at 62,500 samples/sec and updates the DAC at that rate. You can loop the audio source through an ADC input to the DAC output and then to speakers. Low pass the DAC output at about 100,000 radians/sec. A minor variation on the program converts the ISR to assembler. See also the ADC page for circuit details.
- Floating point multiply. Some much longer assembler functions.
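For the ADC-sampling example in the list above, the bounds-checked store and the restart value can be modeled on a PC. A minimal sketch: store_sample and N_SAMPLES are hypothetical names, and the ADCSRA bit positions are defined locally from the mega644/1284 datasheet (in an AVR build they come from <avr/io.h>, so these defines would be dropped there).

```c
#include <stdint.h>

/* ADCSRA bit positions (mega644/1284 datasheet). */
#define ADEN  7   /* ADC enable */
#define ADSC  6   /* start conversion */
#define ADPS1 1   /* prescaler select bit 1 */
#define ADPS0 0   /* prescaler select bit 0 */

#define N_SAMPLES 160   /* array size used in the assembler fragment */

/* Hypothetical model: store ADCH at Z and advance Z only while
   v_index < N_SAMPLES, then return the ADCSRA value that restarts the ADC. */
uint8_t store_sample(uint8_t adch, uint8_t **z, uint8_t *v_index) {
    if (*v_index < N_SAMPLES) {   /* cpi r18, 160 ; brsh no_sample */
        (*v_index)++;             /* inc r18 ; sts v_index, r18 */
        **z = adch;               /* st Z, r18 */
        (*z)++;                   /* adiw r30, 1 */
    }
    /* no_sample: 0x80 | 0x40 | 0x02 | 0x01 == 0xC3 */
    return (uint8_t)((1u << ADEN) | (1u << ADSC) | (1u << ADPS1) | (1u << ADPS0));
}
```

Writing the constant from named bits makes it clear that 0xC3 enables the ADC, starts a conversion, and selects prescaler code 3 (clock divided by 8).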
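For the R2R DAC example in the list above, the quoted numbers can be checked with a few lines of arithmetic. A sketch under stated assumptions: a 16 MHz CPU clock and a timer period of 256 cycles, which together give 62,500 samples/sec (neither value is stated in the text), and a single-pole RC low-pass where the corner is omega = 1/(RC). The macro and function names are hypothetical.

```c
/* Assumptions: 16 MHz clock, 256-cycle timer period. */
#define F_CPU_HZ     16000000.0
#define TIMER_CYCLES 256.0
#define PI_VAL       3.14159265358979323846

double sample_rate_hz(void) { return F_CPU_HZ / TIMER_CYCLES; }

/* Single-pole RC low-pass: omega = 1/(R*C), f = omega / (2*pi). */
double corner_hz(double omega_rad_s)  { return omega_rad_s / (2.0 * PI_VAL); }
double rc_seconds(double omega_rad_s) { return 1.0 / omega_rad_s; }
```

At 100,000 radians/sec the corner is about 15.9 kHz, below the 31.25 kHz Nyquist frequency of a 62,500 samples/sec stream, and RC = 10 microseconds, for example 10 kOhm with 1 nF.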
See also
http://www.nongnu.org/avr-libc/user-manual/assembler.html
http://www.nongnu.org/avr-libc/user-manual/inline_asm.html
http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_reg_usage
http://www.nongnu.org/avr-libc/user-manual/assembler.html#ass_pseudoops
http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_asmconst