Dr. Dobb’s Journal, December 1986, page 48.
A table-driven assembler that can be modified for other processors.
Series 32000 Cross-Assembler
by Richard Rodman, 1923 Anderson Rd., Falls Church, VA 22043
The 32000 processor features generalized addressing modes available in almost all instructions.
The National Semiconductor series 32000 microprocessor line includes the 32-bit 32032 and the 16-bit 32016 (formerly called 16032) microprocessors. As part of a project to build a board using a 32032, I wrote an assembler in Software Toolworks’ C/80; adaptation to any other variant of C should be easy.
Although most people lump the 68000 and the 32016 together, these processors are radically different. The differences have been summed up as "the 68000 is PDP-11-like, whereas the 32000 is VAX-like." The 32000 includes bit-field, translate, procedure enter/return, and other high-level instructions in its instruction set.
Basic Program Design
This program works in a brute-force fashion, but it is easy to understand, modify, and debug. Each instruction’s binary equivalent is stored in a string, with x’s where operands need to be inserted. A string matcher, match(), matches the opcodes against lines in the source file, keeping matches to wildcards in the buffer ambig_buffer. Each opcode has an option character, opopt, associated with it that controls special-case logic for some instructions. The data is output in Intel absolute hex format. Table 1, page 49, shows the definitions for the opopt characters and the instruction table format. Table 2, page 49, shows some examples of instruction formats defined using the structure in Table 1.
The 32000 processor, although allowing absolute addressing, features generalized addressing modes available in almost all instructions. Two’s-complement offsets can be used in three different sizes — 7, 14, or 30 bits long — as needed. Because these offsets could refer to areas not yet defined, and the length of the code varies with the offset, three passes are necessary. The first pass gets a coarse value of all symbols, the second pass then makes the variable offsets the right length and corrects the symbol values, and the third pass actually generates the code. After the first pass, the symbol table is sorted; then in the second and third passes, a binary search is used to find entries more quickly.
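The second-pass work described above can be sketched roughly as follows. This is an illustration, not code from the listing: the struct symbol layout and the function names are hypothetical, the standard qsort() and bsearch() stand in for whatever sort and search the original uses, and disp_size() assumes plain two's-complement ranges for the 7-, 14-, and 30-bit forms, occupying 1, 2, and 4 bytes. The real hardware may reserve some encodings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table entry. */
struct symbol {
    const char *name;
    long value;
};

static int cmp_symbol(const void *a, const void *b)
{
    const struct symbol *sa = a;
    const struct symbol *sb = b;
    return strcmp(sa->name, sb->name);
}

/* After pass 1: sort once so later passes can use a binary search. */
void sort_symbols(struct symbol *tab, size_t n)
{
    qsort(tab, n, sizeof tab[0], cmp_symbol);
}

/* Passes 2 and 3: binary search; returns NULL for an undefined symbol. */
const struct symbol *find_symbol(const struct symbol *tab, size_t n,
                                 const char *name)
{
    struct symbol key;

    key.name = name;
    key.value = 0;
    return bsearch(&key, tab, n, sizeof tab[0], cmp_symbol);
}

/* Bytes needed for a signed offset, assuming simple two's-complement
   limits: 7 bits fit in 1 byte, 14 bits in 2, anything else in 4. */
int disp_size(long offset)
{
    if (offset >= -64L && offset <= 63L)
        return 1;
    if (offset >= -8192L && offset <= 8191L)
        return 2;
    return 4;
}
```

Because the size of one offset changes the addresses of every label after it, the sizes chosen here feed back into the symbol values, which is why the coarse values from the first pass have to be corrected before code is generated.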
Assembler Syntax
- Symbols: This assembler limits line length to 128 characters; symbols can be up to a whole line long. Labels must be followed by a colon and cannot be reused. The colon must be omitted on equates. Values assigned with equ can be redefined, however.
- Pseudo-ops: org must be followed by a value. Although the 32000 does not require word alignment of code or data, alignment does make some operations faster, so an even pseudo-op is provided to force the code address to an even boundary. The define byte, word, and double pseudo-ops (db, dw, dd) accept only one value per line. Currently, character-string constants are not supported. Numeric constants must begin with a digit. Default radix is decimal, or the value can be followed with an h, q, or b for hexadecimal, octal, or binary, respectively. The code address is known as "." and the assembly address (which may be different) as "..".
- Opcodes: All 32000 opcodes are supported. The assembly instructions must conform to the NS16000 Instruction Set Reference Manual; for example, arguments to the SAVE instruction must be enclosed in square brackets. You can include multiple instructions on a line as long as all operands to each instruction are provided.
- Comments: Comments begin with a semicolon (;) and continue to the end of the line. Some programmers have the bad habit of omitting the beginning * or ;. That won't work here.
- Assembly-time arithmetic: Only "+", "-", and "/" are supported. A look at the listing shows it would be trivial to add more operators, however. Formulas are allowed anywhere a value is required, but they must be enclosed in parentheses. Within parentheses, values must be separated from operators with spaces. This is because the program uses the spaces to tell where words end, and math operators are considered words. Spaces are not needed between the parentheses and the words enclosed. An example will best illustrate; note the spaces around + and /: ((FEN + 1) + (GUG / 3)). Commas also separate words; in fact, commas and spaces are interchangeable, although human readers may consider commas out of place in some instances.
- Listing: The assembler produces a listing on the final pass. This listing is sent to the screen but can be redirected into a file or to the printer. It is a traditional listing, with address, bytes of code or data, and opcodes and comments on the right.
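The numeric-constant rule under Pseudo-ops (a leading digit, decimal by default, with an optional h, q, or b suffix) can be sketched as follows. The function name parse_number and its return convention are hypothetical, and accepting upper-case suffixes and digits is an assumption; the article does not say how the listing treats case.

```c
#include <ctype.h>
#include <string.h>

/* Convert one numeric word such as "128", "0e2h", "17q", or "101b".
   Returns 1 and stores the value in *out on success, 0 if the word is
   not a well-formed constant. Overflow is not checked in this sketch. */
int parse_number(const char *word, long *out)
{
    size_t len = strlen(word);
    size_t i;
    int radix = 10;
    long value = 0;

    /* Constants must begin with a digit, so "ffh" is rejected. */
    if (len == 0 || !isdigit((unsigned char)word[0]))
        return 0;

    /* A trailing letter selects the radix and is not part of the digits. */
    switch (tolower((unsigned char)word[len - 1])) {
    case 'h': radix = 16; len--; break;
    case 'q': radix = 8;  len--; break;
    case 'b': radix = 2;  len--; break;
    default:  break;
    }

    for (i = 0; i < len; i++) {
        int c = tolower((unsigned char)word[i]);
        int digit;

        if (isdigit(c))
            digit = c - '0';
        else if (c >= 'a' && c <= 'f')
            digit = c - 'a' + 10;
        else
            return 0;
        if (digit >= radix)
            return 0;              /* e.g. '9' in an octal constant */
        value = value * radix + digit;
    }
    *out = value;
    return 1;
}
```

Checking the suffix before the digits is what lets a hexadecimal constant contain a b, as in 0abh, without being mistaken for binary.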
Table 3, below, shows the error messages produced by the assembler.
Future Enhancements
Unless I get some 32000 hardware to play with, it’s unlikely I'll work on this program further. If you'd like to work on it, however, some items on your list should be:
- Multivalue db/dw/dd and character-string constants.
- Global/external object format and linker. The 32000 instructions are already relocatable; any absolute values that would be present would presumably be entry points or I/O addresses. In fact, even the global/external isn’t really necessary because of the cxp/rxp instructions.
- Cseg/dseg pseudo-ops.
If you send your changes to me, I'll be happy to make them available to others. Anyone wanting a copy of the source code may send me $8 for materials and effort. Please specify 8-inch CP/M, 5¼-inch PC, 3½-inch Atari ST, or other (inquire).
For those lucky people who are in a position to make use of this program, why not let readers know what you're doing? Is the 32000 really the programmer's dream some say it is? And for those who are in a position to do so, how about some inexpensive 32000 hardware (a single-board computer, perhaps) so people can get a hands-on feel for what the processor can do?
Even if you don't have a 32000 processor to play with, you may be able to make use of routines from this program. The style exemplifies my belief that C should be written to be readable both by computers and by humans. Cryptic C is bad C.
DDJ
Vote for your favorite feature/article. Circle Reader Service No. 5.
The listing for this article is presented in a machine-readable form — Soft-strips produced by Cauzin Systems. The strips begin on page 83. The text of the listing is available for downloading in the DDJ Electronic Edition on CompuServe. A disk with this listing and others is also available — see the ad on page 115. The text of the listing will be published next month.
#define MAXOP 149

/* The binary value should be a string of bits, e.g. 0111xxxxx00b.
   The opcode opopt character is used to specify special operands, etc. */

/* opopts used here for the 32000 are:
     blank  nothing special
     a      gen
     b      gen short
     c      gen gen
     d      00000 short
     e      gen gen reg
     f      reglist save/enter
     g      reglist restore/exit
     h      00000 gen (sfsr)
     i      inss/exts
     j      movs/skps/cmps
     k      setcfg
     l      procreg, gen for lpr/spr
     m      index (operand order)
     n      ret/rett (postbyte)
     o      movm
     p      exp (disp after instruction) */

struct {
    char *onam;   /* opcode name */
    int oent;     /* operand count, negative if PC-relative */
    char *obin;   /* opcode binary value, held as a string */
    char oopt;    /* opcode opopt char */
}
Table 1: Definitions of opopt characters
"bsr", -1, "02h",
"save", 1, "62h",
"svc", 0, "0e2h",
"bne", -1, "1ah",
"addq?", 2, "xxxxxxxxx00011iib",
"sgt?", 1, "xxxxx011001111iib",
"jump", 1, "xxxxx01001111111b",
"jsr", 1, "xxxxx11001111111b",
"addl", 2, "xxxxxxxxxxxxx00000010111110b",
"mulf", 2, "xxxxxxxxxx11000110111110b",
"and?", 2, "xxxxxxxxxx1010iib",
"not?", 2, "xxxxxxxxxx10011101001110b",
Table 2: Selected instruction formats from the opcode table
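One way to picture how a pattern string from Table 2 becomes machine code is the sketch below. It is an illustration under stated assumptions, not the routine from the listing: the name fill_pattern is hypothetical, x positions are assumed to take operand bits from left to right (most significant first), i positions are assumed to take a size code the same way, and only the binary form ending in b is handled, not hex entries such as "02h".

```c
/* Count how many times ch occurs in s. */
static int count_char(const char *s, char ch)
{
    int n = 0;

    for (; *s; s++)
        if (*s == ch)
            n++;
    return n;
}

/* Build the opcode value from a pattern such as "xxxxx01001111111b".
   '0' and '1' are copied as they stand; each 'x' takes the next bit of
   xbits and each 'i' the next bit of ibits, most significant bit first.
   The pattern must fit in an unsigned long. */
unsigned long fill_pattern(const char *pat, unsigned long xbits,
                           unsigned long ibits)
{
    int nx = count_char(pat, 'x');
    int ni = count_char(pat, 'i');
    unsigned long value = 0;

    for (; *pat && *pat != 'b'; pat++) {
        unsigned long bit;

        switch (*pat) {
        case '0': bit = 0; break;
        case '1': bit = 1; break;
        case 'x': bit = (xbits >> --nx) & 1UL; break;
        case 'i': bit = (ibits >> --ni) & 1UL; break;
        default:  continue;          /* ignore anything unexpected */
        }
        value = (value << 1) | bit;
    }
    return value;
}
```

With the jump pattern from Table 2 and the five operand bits 10101, this yields 0xAA7F; with the addq? pattern, all nine operand bits set, and a size code of 01, it yields 0xFF8D.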
?  unknown item (syntax error)
x  unimplemented instruction (bad instruction database)
l  no length modifier (bad instruction database) or expression too complex
e  address extensions missing
p  illegal register/pr/spr
[  brackets required
v  syntax error in value
o  unknown arithmetic operator
u  undefined symbol
Table 3: Error messages
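The parenthesized formulas described under Assembly-time arithmetic and the one-character codes in Table 3 can be tied together in a small evaluator. The sketch below is not the code from the listing. The function names are hypothetical, operators are applied strictly left to right within each pair of parentheses (an assumption), only decimal constants are handled, division by zero is reported as v (also an assumption), and every symbol is reported as u because no symbol table is included.

```c
#include <ctype.h>
#include <stdlib.h>

/* Spaces and commas are interchangeable word separators. */
static const char *skip_sep(const char *p)
{
    while (*p == ' ' || *p == ',')
        p++;
    return p;
}

static int eval_group(const char **pp, long *out);

/* One value: a nested formula, a decimal constant, or a symbol. */
static int eval_value(const char **pp, long *out)
{
    const char *p = skip_sep(*pp);

    if (*p == '(') {
        *pp = p;
        return eval_group(pp, out);
    }
    if (isdigit((unsigned char)*p)) {
        char *end;

        *out = strtol(p, &end, 10);
        *pp = end;
        return 0;
    }
    if (isalpha((unsigned char)*p))
        return 'u';                /* no symbol table in this sketch */
    return 'v';
}

/* *pp points at '('. Returns 0 on success or a Table 3 code. */
static int eval_group(const char **pp, long *out)
{
    const char *p = *pp + 1;
    long acc, rhs;
    int err = eval_value(&p, &acc);

    if (err)
        return err;
    for (;;) {
        char op;

        p = skip_sep(p);
        if (*p == ')') {
            *pp = p + 1;
            *out = acc;
            return 0;
        }
        if (*p == '\0')
            return 'v';            /* missing closing parenthesis */
        op = *p++;
        if (op != '+' && op != '-' && op != '/')
            return 'o';
        err = eval_value(&p, &rhs);
        if (err)
            return err;
        if (op == '+')
            acc += rhs;
        else if (op == '-')
            acc -= rhs;
        else if (rhs == 0)
            return 'v';
        else
            acc /= rhs;
    }
}

/* Evaluate a formula that must be enclosed in parentheses. */
int eval_formula(const char *text, long *out)
{
    const char *p = skip_sep(text);

    if (*p != '(')
        return 'v';
    return eval_group(&p, out);
}
```

For example, ((7 + 1) + (9 / 3)) evaluates to 11, (5 * 2) returns o, and (FEN + 1) returns u until a symbol lookup is wired in.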