-
fig-FORTH 1.0
12/17/2014 at 12:57
fig-FORTH 1.0 now running on the Chameleon. The source for this interpreter was provided by user enso of 6502.org.
All that is required to get fig-FORTH 1.0 to run is to edit the configuration data at the beginning. In particular, only two variables need to be modified to adapt fig-FORTH to a target such as the Chameleon: ORIG and MEM. The remainder of the variables that define the fig-FORTH environment are offsets from ORIG and MEM, or have default settings compatible with most 6502/65C02 implementations.
In addition to modifying ORIG and MEM, it is necessary to provide three serial I/O routines for use by fig-FORTH: cin (getch), cout (putch), and crout (putcrlf). fig-FORTH assumes that 7-bit ASCII is the terminal default. Since fig-FORTH sets the msb of the last character in a FORTH word's name field, I modified the fig-FORTH routine that uses cout to mask off the msb of the output character. This allows the use of the VLIST word to see the dictionary without extended ASCII characters at the end of every name.
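The effect of that mask can be illustrated with a short Python sketch. The helper names `encode_name` and `cout_masked` are hypothetical and only model the characters of a name; they are not taken from the fig-FORTH source, and the length byte that precedes a real fig-FORTH name field is left out.

```python
def encode_name(name: str) -> bytes:
    """Model a word's name characters with the msb set on the last one."""
    raw = bytearray(name.encode("ascii"))
    raw[-1] |= 0x80            # end-of-name marker, as described above
    return bytes(raw)


def cout_masked(byte: int) -> str:
    """Model the modified output path: clear bit 7 before printing."""
    return chr(byte & 0x7F)


stored = encode_name("VLIST")
print(stored)                                    # last byte is 0xD4, not 'T'
print("".join(cout_masked(b) for b in stored))   # name prints as plain ASCII
```

Without the mask, a 7-bit terminal would be sent 0xD4 for the final character of the name; with it, the terminal receives 0x54 ('T').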
The following is a small capture of a fig-FORTH 1.0 session on the Chameleon:
0020 - 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
>0400G
PC=3A1D A=2A X=B4 Y=FF S=FF P=73 (NVRBDIZC)=01110011
>D000G
fig-FORTH 1.0a
OK
OK
: star 42 emit ; OK
: stars 0 do star loop ; OK
: margin cr 30 spaces ; OK
: blip margin star ; OK
: bar margin 5 stars ; OK
: f bar blip bar blip blip cr ; OK
f
                              *****
                              *
                              *****
                              *
                              *
OK
First, Klaus Dormann's 6502 functional test program is run, i.e. >0400G. It completes successfully, and then fig-FORTH is booted, i.e. >D000G. After a couple of simple null line entries, a simple FORTH program (from Leo Brodie's "Starting FORTH") is entered and executed.
With fig-FORTH in place and working, it can be further modified in two ways for the Chameleon: (1) disk support using the serial Flash can be added, and (2) the primitives can be improved using 65C02 instructions and M65C02A FORTH instructions. The first task will be to incorporate support for the fig-FORTH file system. The second task will be to incorporate the 5 planned FORTH-specific instructions into the M65C02A and update the fig-FORTH core to use those instructions.
-
M65C02A Instruction Set Enhancements
08/16/2014 at 16:04
This log describes the additions to the 65C02 instruction set made by the M65C02A core. As indicated in the previous project log, Check out, the M65C02A core is a significant enhancement to the microarchitecture of the M65C02 core. The changes in the core were primarily focused on two objectives: (1) reducing the size of the core, and (2) improving the speed of the core. The M65C02A core meets both of these objectives.
The reduction in the size of the M65C02A core makes the core smaller than some cores which only support the base 6502 instruction set. That is, the resource requirements (LUTs/Slices/FFs) of the M65C02A (481/358/125) core are less than those of cores such as M65C02 (755/464/225), verilog-6502 (510/337/195), and ag_6502 (978/512/93). Note that each core must be evaluated separately in any particular application. When a core is synthesized in isolation, simplistic resource comparisons will not provide an accurate view of a core's resource requirements in a specific application. In other words, many internal and external factors can affect the final resource utilization values reported for any particular core. Thus, when considering the M65C02A/M65C02 cores, their use of 2 BRAMs to hold their microprograms must be seriously considered; BRAMs are a very limited resource in most FPGAs.
The significantly enhanced M65C02A address generator, together with several recent additions to the data path logic, allows the M65C02A to support several addressing modes found only in the 16-bit enhanced version of the WDC 65C02, the W65C816 microprocessor. Specifically, the M65C02A supports the two stack relative addressing modes of the '816: sp,S and (sp,S),Y. The M65C02A core, therefore, currently supports the 8-bit mode versions of the ORA/AND/EOR/ADC/LDA/STA/CMP/SBC sp,S instructions and the ORA/AND/EOR/ADC/LDA/STA/CMP/SBC (sp,S),Y instructions.
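As a rough illustration of how the two stack relative modes form an effective address, here is a minimal Python sketch. It assumes a 6502-style stack in page 1 and that the sum of S and the offset wraps within that page; both are assumptions for illustration, not statements about the M65C02A RTL. The function names are hypothetical.

```python
STACK_PAGE = 0x0100  # assumption: 6502-style stack located in page 1


def ea_stack_relative(s: int, offset: int) -> int:
    """Effective address for sp,S: stack pointer plus an 8-bit offset."""
    return STACK_PAGE | ((s + offset) & 0xFF)  # assumption: wraps in page 1


def ea_stack_relative_indirect_y(mem, s: int, offset: int, y: int) -> int:
    """Effective address for (sp,S),Y: 16-bit pointer on the stack, plus Y."""
    lo = mem[ea_stack_relative(s, offset)]
    hi = mem[ea_stack_relative(s, offset + 1)]
    return (((hi << 8) | lo) + y) & 0xFFFF


mem = bytearray(0x10000)
s = 0xFD                                  # two bytes have been pushed
mem[0x01FE], mem[0x01FF] = 0x00, 0x20     # they hold the pointer $2000
print(hex(ea_stack_relative(s, 1)))
print(hex(ea_stack_relative_indirect_y(mem, s, 1, 5)))
```

With S = $FD, offset 1 addresses the most recently pushed byte, so LDA 1,S reads the low byte of the pointer and LDA (1,S),Y reads through it.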
In addition, the M65C02A supports the three '816 instructions which push 16-bit addresses/constants onto the processor stack: (1) PEA abs, (2) PEI zp, and (3) PER rel16. In actual practice, the operand of the '816 PEA instruction is a 16-bit immediate value. That is, a 16-bit constant is pushed onto the processor stack, and a standard assembler will accept either a 16-bit absolute address or a 16-bit immediate value. The PEI instruction pushes the 16-bit value stored in locations {zp+1, zp}, which like the PEA instruction operand, may be considered to be an address or a constant. The PER instruction pushes the absolute address of the sum of the rel16 parameter and the program counter (PC) value of the next instruction. In the '816, the PER instruction is used to support position independent, PC-relative subroutines and data. With the PER instruction, the '816 uses the instruction sequence of a PER rel16 instruction followed by a BRA rel16 instruction to implement a position independent branch to subroutine (BSR).
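The three push operations can be modeled in a few lines of Python. This is a sketch under stated assumptions: the `Cpu` class and function names are hypothetical, the stack is assumed to sit in page 1, and the high byte is pushed first so the value ends up little-endian in memory, as on the '816. PER is a three-byte instruction, so the PC of the next instruction is the PER address plus 3.

```python
class Cpu:
    """Tiny model: 64kB memory plus an 8-bit stack pointer (page 1 assumed)."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.s = 0xFF

    def push8(self, value: int) -> None:
        self.mem[0x0100 + self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def push16(self, value: int) -> None:
        self.push8(value >> 8)   # high byte first
        self.push8(value)        # then low byte


def pea(cpu: Cpu, imm16: int) -> None:
    """PEA: push the 16-bit operand itself."""
    cpu.push16(imm16)


def pei(cpu: Cpu, zp: int) -> None:
    """PEI: push the 16-bit value stored at {zp+1, zp}."""
    cpu.push16(cpu.mem[zp] | (cpu.mem[(zp + 1) & 0xFF] << 8))


def per(cpu: Cpu, pc: int, rel16: int) -> None:
    """PER: push (PC of next instruction + rel16), modulo 64kB."""
    cpu.push16((pc + 3 + rel16) & 0xFFFF)


cpu = Cpu()
per(cpu, pc=0x0400, rel16=0x0010)
print(hex(cpu.mem[0x01FF]), hex(cpu.mem[0x01FE]))   # high byte, low byte
```

Because PER pushes an address computed from the PC, the pushed value tracks the code wherever it is loaded, which is what makes the PER/BRA pair position independent.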
The M65C02A implements these three '816-specific instructions, but renames them to PHW #imm16, PHW zp, and PHR rel16, respectively. This is done in order to better convey the actual implementation of the instructions, and to allow the incorporation of an additional M65C02A-only instruction: PHW abs. This new M65C02A-only instruction behaves like the PHW zp (PEI zp) instruction except that it uses the absolute addressing mode rather than the zero page direct addressing mode. To complement these 16-bit push operations, the 16-bit M65C02A-only pull instructions PLW zp and PLW abs are included in the instruction set. These instructions pull a 16-bit value from the processor stack and store it in the location specified by the addressing mode.
The M65C02A also implements the '816 BRA rel16 instruction to support PC-relative, position independent software/firmware. In addition, the M65C02A-only BSR rel16 instruction provides the capability to access PC-relative subroutines directly instead of having to use the PER rel16; BRA rel16 instruction sequence. The M65C02A core also provides the means to use 16-bit values pushed onto the processor stack as software/firmware addresses by providing the JMP (sp,S),Y instruction.
The M65C02A also implements the '816 COProcessor instruction. Like the '816 COP instruction, the M65C02A COP instruction is implemented as a processor trap like the standard BRK instruction. Unlike the 6502/65C02 BRK and the '816 COP, the M65C02A COP loads the X register with the "signature" byte that follows the instruction. The M65C02A implementation of the COP instruction can be easily gleaned from the definition: COP #imm. The 8-bit immediate operand is loaded into the X register, and then a trap is taken through a vector defined in high memory.
Beyond these instructions, work is continuing on the M65C02A core to support several other enhancements: (1) escape codes, (2) using Y as a stack pointer in zero page, (3) supporting Kernel (default) and User operating modes, and (4) virtual memory support. The first of these enhancements, instruction escape codes, is expected to enable significant enhancements to the instruction set.
The first escape code, IND, converts direct addressing modes such as zero page direct or absolute into zero page indirect or absolute indirect when it precedes an instruction using either of these two direct addressing modes. A potential application of this escape code is to allow the Rockwell bit-oriented instructions, RMBx/SMBx zp and BBRx/BBSx zp,rel, to use zero page indirect: RMBx/SMBx (zp) and BBRx/BBSx (zp),rel. This enhancement would potentially extend the applicability of these 32 instructions to the entire 64kB address space of the M65C02A core. A similar application could be the application of the IND escape code to the TRB/TSB/BIT zp and TRB/TSB/BIT abs instructions: TRB/TSB/BIT (zp) and TRB/TSB/BIT (abs).
IND has been implemented for all addressing modes. It may be applied to all zero page direct and direct absolute addressing modes. IND cannot be applied to instructions which provide both zero page direct and zero page indirect addressing modes. IND also converts indexed zero page direct addressing modes to indirect addressing modes which are consistent with the 6502/65C02 instruction set architecture. That is, pre-indexed zero page direct, zp,X, is converted to pre-indexed zero page indirect, (zp,X); post-indexed zero page direct, zp,Y, is converted to post-indexed zero page indirect, (zp),Y. A similar conversion applies to the indexed absolute direct addressing modes.
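The conversions described above can be summarized as a lookup table. The Python sketch below is only a restatement of the text: the table name and function are hypothetical, and the two indexed absolute entries are inferred from the phrase "a similar conversion applies" rather than taken from the core's documentation.

```python
# Addressing mode seen by the instruction -> mode after an IND prefix.
IND_CONVERSION = {
    "zp":    "(zp)",
    "abs":   "(abs)",
    "zp,X":  "(zp,X)",    # pre-indexed direct -> pre-indexed indirect
    "zp,Y":  "(zp),Y",    # post-indexed direct -> post-indexed indirect
    "abs,X": "(abs,X)",   # inferred by analogy with zp,X
    "abs,Y": "(abs),Y",   # inferred by analogy with zp,Y
}


def apply_ind(mode: str) -> str:
    """Return the addressing mode in effect when IND precedes the instruction.

    Modes that IND does not convert are returned unchanged.
    """
    return IND_CONVERSION.get(mode, mode)


for mode in ("zp", "zp,X", "zp,Y", "abs", "#imm"):
    print(f"IND + {mode:6s} -> {apply_ind(mode)}")
```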
Three other escape/prefix codes have been implemented: OAX, OAY, and OSY. The OAX and OAY escape/prefix codes are mutually exclusive, but may be combined with IND (and SIZ, discussed below). OAX and OAY exchange A with X, or A with Y, respectively. As such, the X and Y index registers can be used as accumulators, and A becomes an index register. The OSY prefix/escape code makes Y function as a stack pointer in zero page. Unlike OAX and OAY, OSY does not perform a complete exchange of the registers and their functions. Thus, S does not replace Y as an index register. OSY is mutually exclusive with OAY, but may be combined with OAX.
These three prefix/escape codes have been implemented and significantly enhance the basic instruction set of the M65C02A. When combined with IND and the stack relative addressing modes, these prefix/escape codes can be used to implement Forth VMs and other HLLs much more efficiently. Being able to treat X and Y as accumulators makes it much easier to implement HLL addressing of data structures. Furthermore, although S does not become an index register when OSY is applied to an instruction, it does allow the use of all Y-specific instructions with S. Thus, although OSY does not give S the functionality of an accumulator, it does convert the STY/LDY/CPY/INY/DEY/PHY/PLY/TYA/TAY instructions to use S instead of Y: STS/LDS/CPS/INS/DES/PHS/PLS/TSA/TAS.
Not yet implemented are the SIZ and ISZ prefix/escape codes. The SIZ prefix code will extend an operation from 8 bits to 16 bits. The ISZ prefix provides a single opcode for combining IND and SIZ.
To maintain compatibility with standard 6502/65C02 assemblers and compilers, the M65C02A prefix/escape codes have a finite lifetime: 1 instruction cycle. This is in contrast to the '816 whose m and x bits in the native mode P register maintain the accumulator and X/Y index register sizes as programmed until explicitly changed by the programmer. The M65C02A prefix codes are taken from the unused opcodes of the W65C02S, and therefore operate as single cycle NOPs when applied to instructions that don't support the requested operations. One important characteristic is that the prefix/escape codes are not interruptible.
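The one-instruction lifetime can be pictured with a small decoder model in Python. This is a sketch only: it works on mnemonics rather than opcodes (the prefix opcode values are not given here), the function name is hypothetical, and it does not enforce the mutual-exclusion rules between prefixes.

```python
PREFIXES = {"IND", "SIZ", "OAX", "OAY", "OSY"}


def decode(stream):
    """Yield (mnemonic, active_prefixes) for each non-prefix instruction."""
    pending = set()
    for op in stream:
        if op in PREFIXES:
            pending.add(op)          # prefixes accumulate until consumed
            continue
        yield op, frozenset(pending)
        pending = set()              # lifetime ends after one instruction


program = ["OAX", "IND", "LDA", "LDA", "OSY", "PHY"]
for mnemonic, active in decode(program):
    print(mnemonic, sorted(active))
```

The second LDA in the example sees no prefixes, which is the contrast with the '816, where the m and x bits stay in force until changed.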
A number of additional instructions are planned for the remaining 12 free opcodes. There are five instructions defined to support implementation of Indirect and Direct Threaded Code (ITC/DTC) Forth VMs: (1) ENT - Enter, (2) NXT - Next, (3) PHI - Push IP, (4) PLI - Pull IP, and (5) INI - Increment IP. NXT provides direct support for a DTC Forth VM, and IND NXT provides support for an ITC Forth VM.
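The difference between the two threading models that NXT and IND NXT target comes down to one extra level of indirection. The Python sketch below uses the conventional textbook definitions of NEXT for DTC and ITC; it is not a description of the planned M65C02A microcode, and the function names are hypothetical.

```python
def read16(mem, addr: int) -> int:
    """Read a little-endian 16-bit cell."""
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)


def next_dtc(mem, ip: int):
    """DTC NEXT: the thread cell holds the address of machine code."""
    w = read16(mem, ip)
    return w, (ip + 2) & 0xFFFF              # (new PC, new IP)


def next_itc(mem, ip: int):
    """ITC NEXT: the thread cell points to a code field, which holds
    the address of machine code (one more indirection than DTC)."""
    w = read16(mem, ip)
    return read16(mem, w), (ip + 2) & 0xFFFF


mem = bytearray(0x10000)
mem[0x0300], mem[0x0301] = 0x34, 0x12        # thread cell -> $1234
mem[0x1234], mem[0x1235] = 0x78, 0x56        # code field  -> $5678
print([hex(v) for v in next_dtc(mem, 0x0300)])
print([hex(v) for v in next_itc(mem, 0x0300)])
```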
Three other opcodes are reserved for implementing and manipulating three level internal register stacks for A, X and/or Y: (1) DUP, (2) SWP, and (3) ROT. Of special note is that LDA/STA (and LDX/STX/LDY/STY) do not automatically push and pop these internal register stacks. This approach keeps the behavior of these registers consistent with what a programmer may expect from a standard 6502/65C02. Therefore, a DUP instruction must precede an LDA/LDX/LDY instruction in order to push a value onto the register stack. Similarly, a ROT instruction must follow an STA/STX/STY instruction in order to pop a value from the register stack. Otherwise, LDx and STx only affect the top location of the register stack.
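One way to picture the push and pop idiom is the Python model below. The exact behavior of DUP, SWP, and ROT is not specified in this log, so the definitions here are assumptions chosen to be consistent with the description: DUP copies the top entry downward, SWP exchanges the top two entries, and ROT rotates the next entry to the top. The class name is hypothetical.

```python
class RegStack:
    """Three-level register stack; index 0 is the visible register (top)."""

    def __init__(self):
        self.r = [0, 0, 0]

    def load(self, value: int) -> None:
        """LDx-like: overwrite the top location only."""
        self.r[0] = value & 0xFF

    def store(self) -> int:
        """STx-like: read the top location only."""
        return self.r[0]

    def dup(self) -> None:
        """Assumed DUP: push a copy of the top; the bottom entry is lost."""
        self.r = [self.r[0], self.r[0], self.r[1]]

    def swp(self) -> None:
        """Assumed SWP: exchange the top two entries."""
        self.r[0], self.r[1] = self.r[1], self.r[0]

    def rot(self) -> None:
        """Assumed ROT: rotate so the next entry becomes the top."""
        self.r = [self.r[1], self.r[2], self.r[0]]


a = RegStack()
a.load(1)             # LDA #1
a.dup(); a.load(2)    # DUP then LDA #2 acts as a push
print(a.r)
print(a.store())      # STA sees the top entry
a.rot()               # ROT after STA acts as a pop
print(a.store())
```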
These register stacks are intended to provide some additional internal registers in a manner compatible with the 6502/65C02 architecture. In accumulator based architectures such as those of the 6502/65C02, 6800/68HC11, and 8080/8085/Z80 microprocessors, the accumulator becomes a choke point and limits performance. With only X and Y on board, the 6502/65C02 requires more loading and storing of intermediate values in external memory when compared to the 8080/8085/Z80 processors. Thus, equipping the M65C02A A, X, and/or Y registers with a simple three level stack will allow stack-based processing techniques to be used for arithmetic and pointer calculations without moving intermediate values to/from external memory. Finally, if not explicitly used, the register stacks will be invisible to the programmer.
The remaining four free opcodes are reserved for future use.
-
Check out
04/19/2014 at 20:28
19-Apr-14
The Chameleon was initially released after the M65C02 project was ported to the board. Simulation of the modified RTL files had not been performed at that time, but a custom FPGA build that toggled all of the output lines was running. The output signals were toggling as expected.
Since that time, took the M16C5x project, a PIC16C5x-compatible SoC implementation previously released, and ported it to the Chameleon. This process simply entailed removing the Xilinx clock generator and connecting the internal processor clock net to the external 48 MHz oscillator on the Chameleon board. Also attached the external 29.4912 MHz oscillator as the M16C5x UART clock.
With these changes, and the UCF updated to match the pinout of the Chameleon board, the M16C5x SoC project was used to successfully transfer an ASCII file to the Chameleon using a baud rate of 921.6 kbps. (The Quad USB-Serial adapter from Keyspan only supports operation to that speed on a Windows XP laptop. The UARTs are capable of being operated (in the RS-485 mode) at rates in excess of 3 Mbps with a 48 MHz UART clock.)
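A quick arithmetic check shows why a 29.4912 MHz oscillator suits this baud rate: it is an exact integer multiple of the standard rates, so no fractional divider is needed. The Python sketch below only does that division; how the M16C5x UART actually splits the ratio between oversampling and its baud rate divider is not covered here, and the function name is hypothetical.

```python
UART_CLOCK_HZ = 29_491_200   # 29.4912 MHz oscillator on the Chameleon


def clocks_per_bit(clock_hz: int, baud: int):
    """Return (whole UART clocks per bit, remainder) for a given baud rate."""
    return divmod(clock_hz, baud)


for baud in (9_600, 115_200, 921_600):
    ratio, remainder = clocks_per_bit(UART_CLOCK_HZ, baud)
    print(f"{baud:>7} baud: {ratio} clocks per bit, remainder {remainder}")
```

At 921.6 kbaud the ratio is exactly 32 clocks per bit.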
In moving the Chameleon to an Arduino MEGA2560 compatible prototyping card, found an error in the symbol for the Arduino UNO power connector. Updated the schematic and PCB, and posted new PDFs of these files in the project's github repository. Took the opportunity to change the part numbers given for the Arduino stacking connectors. The previously specified stacking connectors resulted in physical interference between the top of the UNO's USB Type B connector and the through hole pins of the serial port's connector. The new stacking connectors specify a connector pin length that should eliminate this interference. (Using a paper business card as an insulating separator is also an option.) Finally, also determined that the part number of the SRAM specified in the BOM was incorrect. The part previously specified is a +5V-only SRAM, and the Chameleon needs +3.3V devices for its serial and parallel memories. The schematics and BOM were updated with the part number of a +3.3V SRAM, and an alternate part number was also added.
20-Apr-14
Converted the M16C5x test program, M16C5x_Tst4.asm, to C using the CCS tool suite. The C code requires only 10 additional program memory words. One of those words is added by CCS to handle reset vectors at the top end or at address 0x000, and another, SLEEP, is added at the end of each user's program. The remaining 8 words are a result of using a temporary register to perform the arithmetic needed to isolate upper or lower case ASCII from the data stream. The four comparisons load a temporary register and use that register to perform the subtraction. This approach results in 8 more operations in the C version of the test program. Otherwise, the C version is almost a one-to-one map to the assembler version of the test program.
Updated the github repository with the new files. Being able to use CCS C to support the high speed SPI master and UARTs of the Chameleon's M16C5x soft-core should prove beneficial. Can now see a reason to add support for interrupts and additional stack space to the P16C5x soft-core processor used in the M16C5x SoC project.
25-Apr-14
Changed the test program to support both SSP UARTs previously incorporated into the FPGA project. Found a minor bug in the reset logic of the SSP Slave module which prevented the first bit from the SPI Master from being captured. This prevented the address of the second UART's registers from being used during that transfer cycle. Fixed the issue and updated the github repository with the modified SSPx_Slv module. The fix allows the simultaneous processing of both UARTs at 921.6 kbaud.
Next will be the check out of the external SPI Serial Flash and Serial MRAM/FRAM devices. The FPGA boots from the SPI Serial Flash, so it's not in question whether the interface works or not. Instead, it's a matter of determining if the SPI Master (SPIxIF) module is properly coded and integrated to support the external SPI devices while also accessing the internal UARTs.
25-Jul-14
Completed the verification of the M65C02A core. Using Klaus Dormann's 6502_Functional_Test program, the last few bugs were tracked down and eliminated from the new M65C02A soft-core. The M65C02A core is an improved version of the M65C02 microarchitecture. The microprogram is expanded from 64 to 72 bits. Several control fields which were previously implemented in an encoded manner have been replaced by unencoded fields. These changes allow a reduction in the number of levels in the control logic, which results in increased speed. The increased speed provides single cycle operation.
Thus, the 4 cycle memory interface of the M65C02 core has been replaced by a single cycle interface. The result is a 65C02-compatible core which easily operates at 30 MHz in the -4 speed grade Spartan 3A FPGA of the Chameleon board. (Speeds exceeding 40 MHz have been reported by the Xilinx tools when the M65C02A core is re-targeted to a -3 Spartan 6 XC6SLX9 FPGA.)
In the XC3S200A-4VQG100I versions of the M65C02/M16C5x Development Board or the Chameleon Board, the M65C02A core is currently implemented using the internal Block RAMs. There is a total of 28kB of available program/data memory for the core. The remaining 4kB of BRAM is used to implement the microprogram.
The M65C02A microarchitecture supports the implementation of more complex microroutines. The fixed microprogram ROM can now be controlled directly by the microprogram in the variable microprogram ROM. This is expected to allow the M65C02A to support virtual machines such as a FORTH VM by implementing some FORTH primitives in the microprogram. As it sits currently, the M65C02A microarchitecture supports the incorporation of stack relative instructions as found on the W65C816 microprocessor.