This project is an open-source implementation of the Motorola MC68881 floating-point unit, written in VHDL.
Vivado simulation (Screenshot 2026-02-09 114005.png, 02/09/2026): I have implemented the SIN/COS/TAN functions plus a huge refactor. LUT usage on an xc7a200t is now 75%, down from 169%.
The FPU now works with EMUTOS and SFP004 libraries. It produces an 8x performance improvement compared to the SOFT-FP code. This is using an unmodified set of utilities from 1991. I obtained the utilities from here: https://www.atari-forum.com/viewtopic.php?p=478222#p478222
The test board is an Alinx AXU3EG, with Musashi running on the ARM PS and communicating over AXI with the FPU in the PL fabric.
The next stage is to build a hardware PCB using a 68SEC000 for true validation. PCBWAY have kindly sponsored this hardware due to its unique design. I will provide more details when my board arrives from them.
Just merged a major release. It achieves timing at 50 MHz with the full 68882 feature set, including two concurrent operations. It has been tested with a custom EMUTOS build on an ALINX ZynqMP board. More details to follow. The LITE 68040 subset is also now available at 50 MHz.
MC68881/82 FPGA — CIR Coprocessor Interface Update
What's New
This update brings the FPGA's Coprocessor Interface Register (CIR) protocol in line with the real MC68881/82 hardware
encoding, fixes several bus interface bugs, and adds a comprehensive hardware diagnostic test.
It has also been tested in hardware using a custom EMUTOS build running existing FPU .prg tests. The next release will add SFP004 compatibility.
MC68881 Native Opcode Encoding
The CIR command word now uses Motorola's native opcode encoding directly — the same encoding a real MC68020/030 CPU
sends over the coprocessor bus. Previously the FPGA used an internal numbering scheme that required a software
translation layer. This means:
- FMOVE = $00, FSQRT = $04, FSIN = $0E, FADD = $22, FMUL = $23, FDIV = $20, FSUB = $28 — exactly as documented in the
MC68881 User Manual
- The translation switch in the F-line handler has been eliminated
- Software that talks directly to the CIR registers (like Atari TT FPU test utilities) can use standard Motorola
opcodes without any remapping
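As a rough illustration of what the native encoding means for software, the sketch below packs a command word using the MC68881 general instruction layout: R/M in bit 14, source specifier in bits 12-10, destination register in bits 9-7, and the opcode in bits 6-0. The enum and function names are hypothetical and are not taken from the project's firmware; only the opcode values come from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values as listed in the update notes above. */
enum fpu_opcode {
    FPU_OP_FMOVE = 0x00,
    FPU_OP_FSQRT = 0x04,
    FPU_OP_FSIN  = 0x0E,
    FPU_OP_FDIV  = 0x20,
    FPU_OP_FADD  = 0x22,
    FPU_OP_FMUL  = 0x23,
    FPU_OP_FSUB  = 0x28
};

/* Source format codes, used when the operand is external (R/M = 1). */
enum fpu_src_format {
    FPU_FMT_LONG   = 0,   /* 32-bit integer */
    FPU_FMT_SINGLE = 1    /* 32-bit single-precision float */
};

/*
 * Hypothetical helper: build the 16-bit command word.
 *   rm     - 0: source is an FP register, 1: source is an external operand
 *   src    - source FP register number (rm = 0) or source format (rm = 1)
 *   dst    - destination FP register number, 0..7
 *   opcode - 7-bit operation code
 */
static uint16_t fpu_command_word(int rm, unsigned src, unsigned dst,
                                 unsigned opcode)
{
    assert(src <= 7 && dst <= 7 && opcode <= 0x7F);
    return (uint16_t)(((rm ? 1u : 0u) << 14) | (src << 10) | (dst << 7) | opcode);
}
```

With this layout, `fpu_command_word(1, FPU_FMT_LONG, 0, FPU_OP_FADD)` gives $4022 and `fpu_command_word(0, 0, 1, FPU_OP_FADD)` gives $00A2, which match the second words of `FADD.L #1,FP0` and `FADD FP0,FP1` in the monitor session further down this page.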
CIR Bus Interface Fixes
- 32-bit operand transfers: Fixed a bug where long-word writes to the CIR operand register were split into two 16-bit
writes by the emulation layer, causing the FPU to receive zero instead of the actual value. Operand transfers now go
through as single 32-bit AXI transactions, matching real hardware behaviour.
- 68882 pending instruction pipeline: Fixed a race condition where back-to-back instructions could lose the second
operation if its OpWord arrived while the first was completing. The pending instruction flags are now only cleared
when actually consumed.
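The pending-instruction fix can be pictured with a small software model. This is only a sketch of the "clear on consume" rule written in C, not the project's VHDL; the struct and function names are invented for illustration, and the one-deep slot is an assumption about how the second instruction is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-deep slot holding the second (pending) instruction. */
typedef struct {
    bool     pending;   /* set while an OpWord is waiting to be issued */
    uint16_t opword;    /* the waiting OpWord */
} pending_slot_t;

/* CPU side: an OpWord arrives while the first operation is still running. */
static bool slot_post(pending_slot_t *s, uint16_t opword)
{
    if (s->pending)
        return false;   /* slot occupied, caller has to retry */
    s->opword  = opword;
    s->pending = true;
    return true;
}

/*
 * FPU side: the running operation completes. This event deliberately
 * leaves the pending flag alone; clearing it here would be one way to
 * lose an OpWord that arrived during completion.
 */
static void slot_on_complete(pending_slot_t *s)
{
    (void)s;
}

/* FPU side: the sequencer takes the next instruction.
 * This is the only place where the flag is cleared. */
static bool slot_consume(pending_slot_t *s, uint16_t *opword)
{
    if (!s->pending)
        return false;
    *opword    = s->opword;
    s->pending = false;
    return true;
}
```

In this model an OpWord posted just before `slot_on_complete()` is still available to `slot_consume()` afterwards, which is the behaviour the fix describes.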
Hardware Diagnostic (cirtest)
New standalone 68K test program (cirtest.c) that exercises the full CIR dialog protocol from the Merlin2 monitor. Runs
12 tests covering:
- Integer load/store round-trips (positive and negative)
- Memory-to-register arithmetic: FADD, FMUL, FDIV, FSUB
- Register-to-register arithmetic: FADD FPn,FPm
- Unary operations: FSQRT, FNEG, FABS
- Double-precision transcendentals: FSIN(1.0), FSQRT(2.0)
All 12 tests pass on hardware. All 13 GHDL simulation testbenches pass.
Known Issue: SFP004 Peripheral Protocol
Disassembly of FPU_HARD.PRG (Quidnunc 1991, commonly used on Atari ST/SFP004 boards) revealed it uses a different
register protocol than the MC68020 CIR standard — it writes commands without an OpWord and expects different response
encodings. This will be addressed in the next update to support the SFP004 peripheral access pattern alongside the
standard CIR protocol.
Files Changed
19 files across VHDL RTL, testbenches, C firmware, and documentation. No changes to the ALU computation logic — only
the bus interface encoding and protocol handling.
Fireworks demo using the FPU on Xilinx hardware, with the 68000 processor emulated on the ARM.
#include <stdio.h>
#include <stdint.h>   /* uint32_t */
#include <math.h>
#include "../lib/merlin2_gfx.h"
#include "../lib/merlin2_rand.h"

/* 640x480 viewport centred on 1280x720 framebuffer */
#define VP_W 640
#define VP_H 480
#define VP_X 320
#define VP_Y 120

_Static_assert(VP_X + VP_W <= GFX_SCREEN_W, "Viewport exceeds screen width");
_Static_assert(VP_Y + VP_H <= GFX_SCREEN_H, "Viewport exceeds screen height");

/* Physics */
#define GRAVITY 0.06f
#define PARTICLES_PER_BURST 32

/* Rocket states */
#define ROCKET_DEAD   0
#define ROCKET_RISING 1
#define ROCKET_BURST  2

typedef struct {
    float x, y;
    float dy;
    float target_y;
    uint32_t colour;
    int state;
} rocket_t;

typedef struct {
    float x, y;
    float old_x, old_y;
    float vx, vy;
    uint32_t colour;
    int life;
    int max_life;
} particle_t;

static rocket_t rocket;
static particle_t particles[PARTICLES_PER_BURST];

static const uint32_t palette[] = {
    0xFFFF2020, /* red */
    0xFFFFD700, /* gold */
    0xFF20FF20, /* green */
    0xFF4080FF, /* blue */
    0xFFFFFFFF, /* white */
    0xFFFF40FF, /* magenta */
};
#define PALETTE_SIZE (sizeof(palette) / sizeof(palette[0]))

static inline void put_pixel(int x, int y, uint32_t argb)
{
    if ((unsigned)x < VP_W && (unsigned)y < VP_H)
        *gfx_fb_ptr(x + VP_X, y + VP_Y) = argb;
}

static uint32_t dim_colour(uint32_t colour, int life, int max_life)
{
    if (max_life <= 0 || life <= 0)
        return 0xFF000000;

    unsigned r = (colour >> 16) & 0xFF;
    unsigned g = (colour >> 8) & 0xFF;
    unsigned b = colour & 0xFF;

    r = r * (unsigned)life / (unsigned)max_life;
    g = g * (unsigned)life / (unsigned)max_life;
    b = b * (unsigned)life / (unsigned)max_life;

    return 0xFF000000 | (r << 16) | (g << 8) | b;
}

static void spawn_rocket(void)
{
    rocket.x = (float)rand_range(120, VP_W - 120);
    rocket.y = (float)(VP_H - 1);
    rocket.dy = -4.0f - (float)rand_range(0, 15) * 0.1f;
    rocket.target_y = (float)rand_range(80, VP_H / 3);
    rocket.colour = palette[rand_range(0, (int)PALETTE_SIZE - 1)];
    rocket.state = ROCKET_RISING;
}

static void burst_rocket(void)
{
    float speed_base = 1.5f;
    int i;

    for (i = 0; i < PARTICLES_PER_BURST; i++) {
        particle_t *p = &particles[i];
        float angle = (float)i * (2.0f * (float)M_PI / (float)PARTICLES_PER_BURST);
        float speed = speed_base + (float)rand_range(0, 10) * 0.1f;

        p->x = rocket.x;
        p->y = rocket.y;
        p->old_x = rocket.x;
        p->old_y = rocket.y;
        p->vx = cosf(angle) * speed;
        p->vy = sinf(angle) * speed;
        p->colour = rocket.colour;
        p->max_life = rand_range(30, 50);
        p->life = p->max_life;
    }

    rocket.state = ROCKET_BURST;
}

static int update_and_draw(void)
{
    int alive = 0;
    int i;

    /* Update rocket */
    if (rocket.state == ROCKET_RISING) {
        /* Erase old position */
        put_pixel((int)rocket.x, (int)rocket.y + 1, 0xFF000000);

        rocket.y += rocket.dy;

        /* Draw rocket */
        put_pixel((int)rocket.x, (int)rocket.y, 0xFFFFFFFF);

        if (rocket.y <= rocket.target_y)
            burst_rocket();
    }

    /* Update particles */
    for (i = 0; i < PARTICLES_PER_BURST; i++) {
        particle_t *p = &particles[i];

        if (p->life <= 0)
            continue;

        /* Erase at old position */
        put_pixel((int)p->old_x, (int)p->old_y, 0xFF000000);

        /* Physics */
        p->vy += GRAVITY;
        p->old_x = p->x;
        p->old_y = p->y;
        p->x += p->vx;
        p->y += p->vy;
        p->life--;

        /* Bounds check */
        if ((int)p->x < 0 || (int)p->x >= VP_W ||
            (int)p->y < 0 || (int)p->y >= VP_H) {
            p->life = 0;
            continue;
        }

        /* Draw at new position with fading colour */
        uint32_t c = dim_colour(p->colour, p->life, p->max_life);
        put_pixel((int)p->x, (int)p->y, c);
        alive++;
    }

    return alive;
}

int main(void)
{
    int i;

    printf("Fireworks demo - press any key to exit\n");
    rand_seed(gfx_get_time());
    gfx_set_mode(1);
    gfx_clear(0xFF000000);

    rocket.state = ROCKET_DEAD;
    for (i = 0; i < PARTICLES_PER_BURST; i++)
        particles[i].life = 0;

    spawn_rocket();

    while (!gfx_char_ready()) {
        int alive = update_and_draw();

        /* Spawn next rocket when current burst dies */
        if (rocket.state != ROCKET_RISING...

Mandelbrot set generated using assembler on hardware.
*------------------------------------------------------------------------
* Mandelbrot Set Renderer
*
* Renders a 640x640 pixel Mandelbrot set using MC68881 FPU F-line
* instructions. Centred on the 1280x720 display at offset (320, 40).
*
* View window: real [-2.0, +0.5], imag [-1.25, +1.25]
* Scale: 2.5 / 640 = 0.00390625
* Max iters: 32
* Palette: 16 colours, cycled via (iter & 15)
*
* Usage: Load via S-record (L command), execute with G 2000.
*
* Assemble:
* vasmm68k_mot -Fsrec -m68000 -m68881 -o mandelbrot.srec mandelbrot.s
*------------------------------------------------------------------------

IMG_W     EQU     640
IMG_H     EQU     640
SCREEN_W  EQU     1280
FB_BASE   EQU     $800000
OFF_X     EQU     320                   * horizontal offset on 1280-wide display
OFF_Y     EQU     40                    * vertical offset on 720-high display
MAX_ITER  EQU     32
ROW_BYTES EQU     SCREEN_W*4            * 5120 bytes per display row
IMG_ROW   EQU     IMG_W*4               * 2560 bytes per image row
ROW_SKIP  EQU     ROW_BYTES-IMG_ROW     * 2560 bytes to skip between rows

          ORG     $2000

*------------------------------------------------------------------------
* Entry point
*------------------------------------------------------------------------
START     LEA     msgTitle,A1
          MOVEQ   #13,D0
          TRAP    #15
* Switch to graphics mode
          MOVEQ   #17,D0
          MOVEQ   #1,D1
          TRAP    #15
* Clear to black
          MOVEQ   #18,D0
          MOVE.L  #$FF000000,D1
          TRAP    #15
          LEA     msgRender,A1
          MOVEQ   #13,D0
          TRAP    #15
* Record start time
          MOVEQ   #8,D0
          TRAP    #15
          MOVE.L  D1,START_TIME
* Set up FP constants
* FP7 = scale = 0.00390625 (2.5/640)
          FMOVE.S #$3B800000,FP7        * 0.00390625
* Compute framebuffer start address
* FB_START = FB_BASE + OFF_Y * ROW_BYTES + OFF_X * 4
          LEA     FB_BASE,A3
          ADDA.L  #OFF_Y*ROW_BYTES+OFF_X*4,A3
* Outer loop: Y pixels (D5 = 0..639)
          CLR.W   D5                    * D5 = pixel_y
YLOOP
* ci = -1.25 + pixel_y * scale
          FMOVE.W D5,FP3
          FMUL    FP7,FP3               * FP3 = pixel_y * scale
          FSUB.S  #$3FA00000,FP3        * FP3 -= 1.25 => ci = y*scale - 1.25
* Progress output every 64 rows
          MOVE.W  D5,D0
          ANDI.W  #63,D0
          BNE.S   NOPROG
          LEA     msgRow,A1
          MOVEQ   #14,D0
          TRAP    #15
          CLR.L   D1
          MOVE.W  D5,D1
          MOVEQ   #10,D2
          MOVEQ   #15,D0
          TRAP    #15
          LEA     msgOf,A1
          MOVEQ   #14,D0
          TRAP    #15
          MOVE.L  #IMG_H,D1
          MOVEQ   #10,D2
          MOVEQ   #15,D0
          TRAP    #15
          LEA     msgNewline,A1
          MOVEQ   #13,D0
          TRAP    #15
NOPROG
* Inner loop: X pixels (D4 = 0..639)
          CLR.W   D4                    * D4 = pixel_x
XLOOP
* cr = -2.0 + pixel_x * scale
          FMOVE.W D4,FP2
          FMUL    FP7,FP2               * FP2 = pixel_x * scale
          FSUB.S  #$40000000,FP2        * FP2 -= 2.0 => cr = x*scale - 2.0
* z = 0 + 0i
          FMOVE.L #0,FP0                * FP0 = zr = 0
          FMOVE.L #0,FP1                * FP1 = zi = 0
* Iteration loop (D6 = iteration counter)
          MOVEQ   #0,D6
ITERLOOP
* zr_sq = zr * zr
          FMOVE   FP0,FP4
          FMUL    FP0,FP4               * FP4 = zr^2
* zi_sq = zi * zi
          FMOVE   FP1,FP5
          FMUL    FP1,FP5               * FP5 = zi^2
* Check escape: zr^2 + zi^2 > 4.0 ?
          FMOVE   FP4,FP6
          FADD    FP5,FP6               * FP6 = zr^2 + zi^2
          FCMP.S  #$40800000,FP6        * compare with 4.0
          FBGT    ESCAPED
* zi_new = 2 * zr * zi + ci (compute before zr, since we need old zr)
          FMUL    FP0,FP1               * FP1 = zr * zi (old zi destroyed, but zi^2 safe in FP5)
          FADD    FP1,FP1               * FP1 = 2 * zr * zi
          FADD    FP3,FP1               * FP1 = 2*zr*zi + ci (new zi)
* zr_new = zr^2 - zi^2 + cr
          FMOVE   FP4,FP0               * FP0 = zr^2
          FSUB    FP5,FP0               * FP0 = zr^2 - zi^2
          FADD    FP2,FP0               * FP0 = zr^2 - zi^2 + cr (new zr)
          ADDQ.W  #1,D6
          CMP.W   #MAX_ITER,D6
          BLT.S   ITERLOOP
* Reached max iterations - pixel is in the set (black)
          MOVE.L  #$FF000000,(A3)+      * opaque black (ARGB)
          BRA.S   NEXTX
ESCAPED
* Pick colour from palette: index = (iter - 1) & 15
          MOVE.W  D6,D0
          SUBQ.W  #1,D0
          ANDI.W  #15,D0
          ASL.W   #2,D0                 * D0 = palette offset (4 bytes each)
          LEA     PALETTE,A0
          MOVE.L  (A0,D0.W),(A3)+       * write pixel colour
NEXTX
          ADDQ.W  #1,D4
          CMP.W   #IMG_W,D4
          BLT     XLOOP
* End of row - advance A3 past the unused portion of the display row
          ADDA.L  #ROW_SKIP,A3
          ADDQ.W  #1,D5
          CMP.W   #IMG_H,D5
          BLT     YLOOP
*------------------------------------------------------------------------
* Done - flush, print timing, wait for keypress
*------------------------------------------------------------------------
          MOVE.B  #2,$FD0041...
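For cross-checking the renderer on a host machine, the per-pixel loop of the assembly listing can be mirrored in C. This is a reference sketch, not part of the project: the function names are hypothetical, and it uses `double` where the FPU works in extended precision, so pixels right on the escape boundary may differ by an iteration.

```c
#include <assert.h>

#define MAX_ITER 32

/*
 * Mirror of the XLOOP/ITERLOOP body of the assembly listing.
 * Returns the value D6 holds when the pixel escapes (1..31),
 * or MAX_ITER if the pixel never escapes (drawn black).
 */
static int mandel_iters(int pixel_x, int pixel_y)
{
    const double scale = 0.00390625;            /* 2.5 / 640, held in FP7 */
    const double cr = pixel_x * scale - 2.0;    /* FP2 */
    const double ci = pixel_y * scale - 1.25;   /* FP3 */
    double zr = 0.0, zi = 0.0;                  /* FP0, FP1 */
    int iter = 0;                               /* D6 */

    do {
        double zr2 = zr * zr;                   /* FP4 */
        double zi2 = zi * zi;                   /* FP5 */

        if (zr2 + zi2 > 4.0)                    /* FCMP / FBGT ESCAPED */
            return iter;

        zi = 2.0 * zr * zi + ci;                /* uses the old zr */
        zr = zr2 - zi2 + cr;
        iter++;
    } while (iter < MAX_ITER);

    return MAX_ITER;
}

/* Palette slot chosen at ESCAPED: (iter - 1) & 15. */
static int palette_index(int iter)
{
    return (iter - 1) & 15;
}
```

Two easy spot checks: pixel (0, 0) maps to c = -2.0 - 1.25i and escapes with a count of 1, while pixel (512, 320) maps to c = 0 and runs to the iteration limit.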
=== Whetstone Benchmark ===
M2 (array)... OK
M3 (proc array)... OK
M4 (conditionals)... OK
M6 (log/exp/sqrt)... OK
M7 (proc calls)... OK
M8 (trig)... OK
Passes: 10
Elapsed: 2191 ms
KWIPS: 4564
Whetstone complete.

The first benchmarks, run on a Xilinx-based AXU3EG board with Musashi on the ARM and the FPU running in the PL fabric.
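The KWIPS figure can be reproduced from the pass count and elapsed time. The sketch below assumes each pass counts as 1,000 kilo-Whetstone instructions; that constant is an assumption which happens to fit the numbers in the output, not something confirmed by the benchmark source, and the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one pass is 1,000 kilo-Whetstone instructions. */
#define KWI_PER_PASS 1000u

/* Kilo-Whetstone instructions per second, truncated to an integer. */
static uint32_t kwips(uint32_t passes, uint32_t elapsed_ms)
{
    assert(elapsed_ms != 0);
    /* KWI per second = KWI * 1000 / milliseconds */
    return (uint32_t)(((uint64_t)passes * KWI_PER_PASS * 1000u) / elapsed_ms);
}
```

Under that assumption, `kwips(10, 2191)` evaluates to 4564, the value reported in the run.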
ver 2.1 build 001 MATT PEARCE 2024-2026 | MC68000 + MC68881 FPGA
>a 1000
001000 00000000 OR.B #0,D0 >FADD.L #1,FP0
001000 F23C402200000001 FADD.L #1,FP0
001008 00000000 OR.B #0,D0 >FADD.S #2.35,FP1
001008 F23C44A240166666 FADD.S #2.35,FP1
001010 00000000 OR.B #0,D0 >FADD FP0,FP1
001010 F20000A2 FADD FP0,FP1
001014 00000000 OR.B #0,D0 >RTS
001014 4E75 RTS
001016 00000000 OR.B #0,D0 >X
MC68901 Multifunction Peripheral Initialized
================================================================================
888b d888 888 d8b .d8888b.
8888b d8888 888 Y8P d88P Y88b
88888b.d88888 888 888 888
888Y88888P888 .d88b. 888d888 888 888 88888b. d88P
888 Y888P 888 d8P Y8b 888P" 888 888 888 "88b .od888P"
888 Y8P 888 88888888 888 888 888 888 888 d88P"
888 " 888 Y8b. 888 888 888 888 888 888"
888 888 "Y8888 888 888 888 888 888 888888888
FPU
================================================================================
ver 2.1 build 001 MATT PEARCE 2024-2026 | MC68000 + MC68881 FPGA
>G 1000
>R
D0=00001000 D1=0000FFFF D2=00000000 D3=00000000
D4=00000030 D5=0000002C D6=00000004 D7=000001FD
A0=00001000 A1=00FE0070 A2=00000830 A3=00000000
A4=00001016 A5=000005BF A6=000005EE SP=00000FF0
PC=00001000 SR=2700
FP0=3FFF0000 80000000 00000000
FP1=40000000 D6666600 00000000
FP2=00000000 00000000 00000000
FP3=00000000 00000000 00000000
FP4=00000000 00000000 00000000
FP5=00000000 00000000 00000000
FP6=00000000 00000000 00000000
FP7=00000000 00000000 00000000
>
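The register dump can be checked by hand or with a few lines of C. The sketch below decodes one FP register as printed by the `R` command, assuming the three words are the 96-bit extended format (sign and 15-bit exponent in the top half of the first word, then a 64-bit mantissa with an explicit integer bit). The function names are hypothetical, and infinities, NaNs and denormals are not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply v by 2^e with plain arithmetic, so no libm is needed. */
static double scale_pow2(double v, int e)
{
    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v *= 0.5; e++; }
    return v;
}

/*
 * Decode "w0 w1 w2" from the register dump into a double.
 * Handles zero and normalised numbers only.
 */
static double ext96_to_double(uint32_t w0, uint32_t w1, uint32_t w2)
{
    int sign         = (int)(w0 >> 31);
    int exponent     = (int)((w0 >> 16) & 0x7FFF);   /* biased by 16383 */
    uint64_t mantissa = ((uint64_t)w1 << 32) | w2;   /* bit 63 = integer bit */
    double value;

    if (mantissa == 0)
        return sign ? -0.0 : 0.0;

    /* value = mantissa * 2^(exponent - bias - 63) */
    value = scale_pow2((double)mantissa, exponent - 16383 - 63);
    return sign ? -value : value;
}
```

Applied to the dump above, FP0 (`3FFF0000 80000000 00000000`) decodes to 1.0 and FP1 (`40000000 D6666600 00000000`) to roughly 3.35, which is what the three FADD instructions entered at $1000 should produce: 0 + 1 in FP0, then 0 + 2.35 + 1.0 in FP1, with 2.35 rounded to single precision.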
The FPU has now been fully validated on an AXU3EG board, with Musashi running on the ARM and the FPU running in the PL fabric. See the GitHub README for details.
A VHDL-2008 implementation of a Motorola MC68881-compatible floating-point coprocessor targeting Xilinx 7-series FPGAs. The design implements the full MC68881 instruction set including all arithmetic, transcendental, program-control, system-control, and packed-decimal operations. It uses DSP-pipelined sequential FP units for the core arithmetic datapath with multi-cycle path constraints for timing closure.
The current plan and progress tracking live in docs/fpu-progress-checklist.md.
.P), FMOVEM (register lists and control registers), FMOVECR (ROM constants).

| Resource | Used | Available | Util % |
|---|---|---|---|
| Slice LUTs | 52,361 | 133,800 | 39.13% |
| Registers | 13,131 | 267,600 | 4.91% |
| Block RAM | 5 tiles | 365 | 1.37% |
| DSP48E1 | 33 | 740 | 4.46% |
Non-incremental synthesis + implementation, Vivado 2025.2, xc7a200tfbg676-1. Date: 2026-03-05. Includes Section 7 CIR coprocessor interface with FSAVE/FRESTORE Busy frame support and full exception dialog paths; see "CIR feature gating" below.
The design fits on several FPGA families. With CIR disabled (ENABLE_CIR_g => false), the core is ~46K LUTs and fits comfortably on smaller devices:
| Device | LUTs | DSPs | Fit (full)? | Fit (no CIR)? |
|---|---|---|---|---|
| Xilinx Artix-7 200T | 134,600 | 740 | Yes (39%) | Yes (34%) |
| Xilinx Artix-7 100T | 63,400 | 240 | Yes (~83%) | Yes (~72%) |
| Xilinx Zynq UltraScale+ ZU3EG | ~71,000 | 360 | Yes (~74%) | Yes (~64%) |
| Intel Cyclone V 5CEBA7 | 150,720 ALMs | 156 | Yes | Yes |
All RTL is vendor-portable (inferred DSP/BRAM, no Xilinx IP cores). Porting to Intel/Quartus requires XDC-to-SDC constraint conversion and minor DSP inference adjustments.