Nakano, K., et al. (refs below) describe two versions of a small stack
machine suitable for implementation on an FPGA, and they give the Verilog
source code on their web site.
The design was ported to the DE2 board and extended with a richer
set of opcodes and i/o ports. I wrote a simple assembler and compiler
for the architecture and implemented serial communication routines. The
compiler supports inline macros, functions, one-dimensional array
variables, and the usual if-then-else-endif and while-do-endwhile
structured programming constructs. Supplied functions send integers to and
receive integers from the serial interface, and send points, strings, and
integers to the VGA.
The compiler, named Syrup, was written in Matlab (it also
runs in Octave) to make programming easier.
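To illustrate how structured constructs map onto a stack machine, here is a minimal Python sketch of a toy interpreter and a hand-compiled while-do-endwhile loop. The opcode names (PUSHI, PUSH, POP, ADD, LT, JZ, JMP, HALT) and the (opcode, operand) encoding are assumptions for illustration; they are not the exact Pancake instruction set or Syrup's actual output.

```python
# Toy stack machine. Opcode names and encoding are illustrative only,
# not the exact Pancake instruction set.
BINARY_OPS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "LT": lambda a, b: int(a < b),
}


def run(program, memory):
    """Execute (opcode, operand) pairs until HALT; return the variable store."""
    stack, pc = [], 0
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "HALT":
            return memory
        if op == "PUSHI":            # push an immediate constant
            stack.append(arg)
        elif op == "PUSH":           # push the value of a variable
            stack.append(memory[arg])
        elif op == "POP":            # pop the top of stack into a variable
            memory[arg] = stack.pop()
        elif op in BINARY_OPS:       # pop two operands, push the result
            b = stack.pop()
            a = stack.pop()
            stack.append(BINARY_OPS[op](a, b))
        elif op == "JZ":             # pop; branch if the value was zero
            if stack.pop() == 0:
                pc = arg
        elif op == "JMP":            # unconditional branch
            pc = arg
        else:
            raise ValueError(f"unknown opcode {op!r}")


# s = 0; i = 0; while i < 5 do s = s + i; i = i + 1 endwhile
WHILE_LOOP = [
    ("PUSHI", 0), ("POP", "s"),
    ("PUSHI", 0), ("POP", "i"),
    ("PUSH", "i"), ("PUSHI", 5), ("LT", None),   # 4: loop test
    ("JZ", 17),                                  # 7: exit when i >= 5
    ("PUSH", "s"), ("PUSH", "i"), ("ADD", None), ("POP", "s"),
    ("PUSH", "i"), ("PUSHI", 1), ("ADD", None), ("POP", "i"),
    ("JMP", 4),                                  # 16: back to the test
    ("HALT", None),                              # 17
]
```

Running run(WHILE_LOOP, {}) returns {'s': 10, 'i': 5}. The test sits at the top of the loop and a single conditional jump leaves it, which is one common way to lower while-do-endwhile; if-then-else-endif can be lowered the same way, with a JZ to the else part and a JMP over it.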
Source code is assembled into an Altera memory initialization (mif) file,
which Quartus uses to initialize the RAM block. The link between the mif
file and the Pancake CPU RAM block is made with a synthesis directive placed
in the same statement as the memory declaration, before the semicolon that
terminates the declaration:
reg [DWIDTH-1:0] mem [WORDS-1:0] /* synthesis ram_init_file = "test_stack_machine.mif" */ ;
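A mif file is plain text, so an assembler can emit one directly. The Python sketch below writes a minimal Altera mif with hexadecimal addresses and data and zero-fills the unused locations. The function name to_mif is hypothetical and this is not Syrup's code; its width and depth arguments are assumed to match DWIDTH and WORDS in the memory declaration.

```python
def to_mif(words, width=16, depth=None):
    """Return the text of an Altera .mif file holding `words`, zero-filled."""
    if depth is None:
        depth = len(words)
    if len(words) > depth:
        raise ValueError("more words than memory locations")
    mask = (1 << width) - 1
    digits = (width + 3) // 4            # hex digits per data word
    lines = [
        f"WIDTH={width};",
        f"DEPTH={depth};",
        "ADDRESS_RADIX=HEX;",
        "DATA_RADIX=HEX;",
        "CONTENT BEGIN",
    ]
    for addr, word in enumerate(words):
        lines.append(f"\t{addr:X} : {word & mask:0{digits}X};")
    first_unused = len(words)
    if first_unused == depth - 1:        # exactly one location left
        lines.append(f"\t{first_unused:X} : {0:0{digits}X};")
    elif first_unused < depth - 1:       # an address range covers the rest
        lines.append(f"\t[{first_unused:X}..{depth - 1:X}] : {0:0{digits}X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"
```

For example, writing to_mif(words, width=16, depth=1024) to test_stack_machine.mif would describe a hypothetical 1024-word, 16-bit memory; the WIDTH and DEPTH lines must agree with the Verilog declaration or Quartus will complain.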
Three processors were connected to the SRAM to control the VGA display. A
hardware SRAM multiplexer was built to give priority to reset, then to the
VGA controller, then to each of the three CPUs. A program must signal that
it wants SRAM access, wait until the SRAM is available, do the read or
write, and then signal completion. SRAM access is interleaved between the
VGA controller and the three CPUs: the VGA controller gets access on every
VGA clock high, while the CPUs share every VGA clock low. This works
because the memory is clocked twice as fast as the VGA clock. On every VGA
clock high, an address is set up from the VGA address generator. On the
VGA clock low, the SRAM data for the VGA is latched into a register, while
the address for the CPU read/write is set up. On the next VGA clock high,
the SRAM data is latched into a register for each CPU, while the next VGA
controller read is set up.
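The ownership rules above can be summarized as a small Python model with one entry per memory-clock cycle (two per VGA clock). It assumes fixed priority among the CPUs with CPU 1 highest, holds the request lines constant, and ignores the address/data pipeline; the function names are hypothetical.

```python
def grant(vga_clock_high, reset, cpu_requests):
    """Return the owner of the SRAM for one memory-clock cycle.

    vga_clock_high: True during the VGA-clock-high half period
    reset:          True while reset is asserted (highest priority)
    cpu_requests:   booleans for CPU 1, CPU 2, CPU 3, in priority order
    """
    if reset:
        return "reset"
    if vga_clock_high:
        return "vga"                       # VGA controller owns every high phase
    for cpu, wants_sram in enumerate(cpu_requests, start=1):
        if wants_sram:
            return f"cpu{cpu}"             # lowest-numbered requester wins
    return None                            # the CPU slot goes unused


def schedule(n_cycles, cpu_requests):
    """Owners for n memory-clock cycles, starting on a VGA-clock-high phase."""
    return [grant(cycle % 2 == 0, False, cpu_requests) for cycle in range(n_cycles)]
```

In this model schedule(4, [False, True, True]) gives ['vga', 'cpu2', 'vga', 'cpu2']: the VGA controller never waits, a CPU gets at most every other memory cycle, and a lower-priority CPU is locked out for as long as a higher-priority one keeps requesting.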
A ROM character generator for the VGA was built, based on font data from ECE 320 at BYU. The original BYU font file, the Matlab program that converts it to an Altera mif file, and the resulting mif file are available with the project. The ASCII character code is multiplied by 16 to form the base index for a character, because each character image occupies 16 bytes, one per line. The byte at the base index is the top line (of 16) of the character image, and the high-order bit of each byte is the left-most pixel of that line. The ROM is connected to i/o ports on stack processor CPU 1, where a small routine reads the ROM and outputs colors to the VGA SRAM interface.
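The ROM addressing and bit order described above can be checked with a short Python sketch. The glyph bytes for 'A' are made up for illustration and are not the BYU font data; the ROM size assumes 128 seven-bit ASCII codes, and the function names are hypothetical.

```python
# Made-up 16-line glyph for 'A' (NOT the BYU font data); one byte per line.
GLYPH_A = [0x00, 0x00, 0x10, 0x38, 0x6C, 0xC6, 0xC6, 0xFE,
           0xC6, 0xC6, 0xC6, 0xC6, 0x00, 0x00, 0x00, 0x00]


def rom_address(char_code, row):
    """ROM index of line `row` (0 = top) of a character: code * 16 + row."""
    if not 0 <= row < 16:
        raise ValueError("row must be 0..15")
    return char_code * 16 + row


def pixel_on(rom, char_code, row, col):
    """True if the pixel is set; col 0 is the left-most pixel (bit 7)."""
    byte = rom[rom_address(char_code, row)]
    return ((byte >> (7 - col)) & 1) == 1


def render(rom, char_code):
    """Return the character as 16 strings of '#' (on) and '.' (off)."""
    return ["".join("#" if pixel_on(rom, char_code, r, c) else "." for c in range(8))
            for r in range(16)]


ROM = [0] * (128 * 16)                 # 7-bit ASCII, 16 bytes per character
base = rom_address(ord("A"), 0)
ROM[base:base + 16] = GLYPH_A
```

Here render(ROM, ord("A"))[7] is '#######.', since 0xFE has its seven high-order bits set and bit 7 maps to the left-most column.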
The SRAM behind the VGA display actually has over 100,000 unused bytes that are never displayed, but the unused memory is scattered in small chunks. The biggest contiguous piece runs from address 246,400 up to 262,144, the end of memory: 15,744 16-bit locations, or about 16K words. These unused locations can be used to share non-graphics data between processors, which requires 16-bit read/write functions and a mutex to lock memory. The SRAM switch used by the graphics functions above was extended with new functions that write 16-bit values (the graphics interface writes only single bytes). The mutex is implemented with hardware test-and-set, clear, and read instructions. The hardware switch gives memory access priority over mutex operations. On the processor side, the program must (1) set up an SRAM read, write, or mutex operation, (2) assert a request, (3) wait for an access acknowledgment, (4) do the read or write, and (5) de-assert the request.
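The five-step protocol and the test-and-set mutex can be modelled in Python with threads standing in for processors. In this sketch a threading.Lock plays the role of the switch's request/acknowledge handshake, which is what makes each operation atomic; it does not model the priority of memory access over mutex operations. The class and method names are hypothetical, not the actual Pancake instructions.

```python
import threading
import time

SHARED_BASE = 246_400          # start of the largest unused SRAM region (from the text)
SRAM_WORDS = 262_144           # 2**18 addressable 16-bit locations


class SramSwitch:
    """Toy model of the shared-SRAM switch with a one-bit hardware mutex."""

    def __init__(self):
        self._bus = threading.Lock()       # stands in for request/acknowledge
        self._mem = [0] * SRAM_WORDS
        self._mutex = 0

    def _transaction(self, operation):
        # (1) the caller's arguments set up the operation
        with self._bus:                    # (2) assert request, (3) wait for ack
            return operation()             # (4) access; leaving `with` is (5)

    def read16(self, addr):
        return self._transaction(lambda: self._mem[addr])

    def write16(self, addr, value):
        def op():
            self._mem[addr] = value & 0xFFFF
        self._transaction(op)

    def test_and_set(self):
        """Set the mutex bit and return its previous value, atomically."""
        def op():
            old, self._mutex = self._mutex, 1
            return old
        return self._transaction(op)

    def clear(self):
        def op():
            self._mutex = 0
        self._transaction(op)

    def read_mutex(self):
        return self._transaction(lambda: self._mutex)


def locked_increment(switch, addr, times):
    """Increment a shared word, guarding the read-modify-write with the mutex."""
    for _ in range(times):
        while switch.test_and_set() == 1:  # spin until the bit was clear
            time.sleep(0)                  # yield so the lock holder can run
        switch.write16(addr, switch.read16(addr) + 1)
        switch.clear()
```

Three threads each doing 200 guarded increments of the word at SHARED_BASE leave it at 600. Without the test_and_set/clear pair, the separate read16 and write16 transactions from different processors could interleave and lose updates.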
References:
Nakano, K.; Ito, Y. Processor, Assembler, and Compiler Design Education Using an FPGA. 14th IEEE International Conference on Parallel and Distributed Systems (ICPADS '08), 8-10 Dec. 2008, pp. 723-728. Dept. of Inf. Eng., Hiroshima Univ., Higashi-Hiroshima, Japan.
Nakano, K.; Kawakami, K.; Shigemoto, K.; Kamada, Y.; Ito, Y. A Tiny Processing System for Education and Small Embedded Systems on the FPGAs. IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC '08), Dec. 2008, pp. 472-479.
Loomis, J. S. Digital Labs using the Altera DE2 Board. Electrical and Computer Engineering, University of Dayton, Dayton, OH 45469-0232. http://www.johnloomis.org/digitallab/