MC68881 VHDL

Project Logs


Atari SFP004 Validation complete on ARM/FPGA
Matthew Pearce • 03/24/2026 at 18:46 • 0 comments

The FPU now works with EmuTOS and the SFP004 libraries, giving an 8x speedup over the SOFT-FP (software floating-point) code. These results use an unmodified set of utilities from 1991, which I obtained from here: https://www.atari-forum.com/viewtopic.php?p=478222#p478222

The test board is an Alinx AXU3EG running the Musashi 68000 emulator on the ARM processing system (PS), which communicates over AXI with the FPU implemented in the FPGA programmable logic (PL).

The next stage is to build a hardware PCB around a 68SEC000 for true validation. PCBWay have kindly sponsored this hardware because of its unique design. I will provide more details when my board arrives from them.
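As a rough illustration of the ARM-side glue, the following sketch shows how a Musashi host could route the emulated 68000's writes to the FPU register window onto AXI. Musashi does require the host to supply `m68k_write_memory_32()` and its 8/16-bit and read siblings; everything else here is hypothetical: the window address `CIR_BASE`, the `axi_window` array standing in for the mmap()ed PL region, and the RAM backing store. It forwards each long word as a single 32-bit transaction, the behaviour described under "CIR Bus Interface Fixes".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 68000-side address and size of the FPU register window */
#define CIR_BASE    0x00F00000u
#define CIR_SIZE    0x20u           /* CIR map spans offsets 0x00-0x1F */
#define CIR_OPERAND 0x10u           /* operand CIR offset in the MC68881 CIR map */

/* Stand-in for the AXI-mapped PL region (an mmap()ed physical window on
   real hardware); axi_writes counts bus transactions issued. */
static volatile uint32_t axi_window[CIR_SIZE / 4];
static unsigned axi_writes;

static void axi_write32(uint32_t offset, uint32_t value)
{
    axi_window[offset / 4] = value;   /* one 32-bit store = one AXI write */
    axi_writes++;
}

/* Hypothetical RAM backing store for all other addresses (big-endian) */
static uint8_t ram[64 * 1024];

/* Musashi calls this host-supplied callback for every long-word write */
void m68k_write_memory_32(unsigned int address, unsigned int value)
{
    if (address - CIR_BASE < CIR_SIZE) {     /* unsigned wrap rejects lower addresses */
        assert((address & 3u) == 0);         /* sketch handles aligned long words only */
        /* Forward the long word as one transaction, not two 16-bit halves */
        axi_write32(address - CIR_BASE, (uint32_t)value);
        return;
    }
    if (address < sizeof ram - 3) {          /* store big-endian, as the 68000 sees it */
        ram[address]     = (uint8_t)(value >> 24);
        ram[address + 1] = (uint8_t)(value >> 16);
        ram[address + 2] = (uint8_t)(value >> 8);
        ram[address + 3] = (uint8_t)value;
    }
}
```

A complete host would also implement the byte, word, and read callbacks in the same way, with 16-bit CIRs such as the response and command registers going through a 16-bit path.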

50 MHz 68882 Compatibility added
Matthew Pearce • 03/24/2026 at 08:58 • 0 comments

Just merged a major release. It meets timing at 50 MHz with the full 68882 feature set, including two concurrent operations. It has been tested with a custom EmuTOS build on an ALINX ZynqMP board. More details to follow. The LITE 68040 subset is also now available at 50 MHz.

MC68881/82 FPGA — CIR Coprocessor Interface Update

What's New

This update brings the FPGA's Coprocessor Interface Register (CIR) protocol in line with the real MC68881/82 hardware encoding, fixes several bus interface bugs, and adds a comprehensive hardware diagnostic test.
It has also been tested in hardware using a custom EmuTOS build running existing FPU .prg tests. The next release will add SFP004 compatibility.

MC68881 Native Opcode Encoding

The CIR command word now uses Motorola's native opcode encoding directly: the same encoding a real MC68020/030 CPU sends over the coprocessor bus. Previously the FPGA used an internal numbering scheme that required a software translation layer. This means:

- Opmode values match the MC68881 User Manual exactly: FMOVE = $00, FSQRT = $04, FSIN = $0E, FADD = $22, FMUL = $23, FDIV = $20, FSUB = $28
- The translation switch in the F-line handler has been eliminated
- Software that talks directly to the CIR registers (like Atari TT FPU test utilities) can use standard Motorola opcodes without any remapping
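A minimal sketch of how those opmode values sit inside the 16-bit command word, assuming the standard layout for general arithmetic instructions (R/M flag in bit 14, source specifier in bits 12-10, destination FP register in bits 9-7, opmode in bits 6-0). The helper `fpu_cmd_word` and the enum names are hypothetical; the expected words come from the monitor transcript in the "Fully Working FPGA MC68881" log (`F23C 4022...`, `F23C 44A2...`, `F200 00A2`).

```c
#include <assert.h>
#include <stdint.h>

/* Opmode field values (bits 6..0), native MC68881 encoding */
enum fpu_opmode {
    OP_FMOVE = 0x00, OP_FSQRT = 0x04, OP_FSIN = 0x0E,
    OP_FDIV  = 0x20, OP_FADD  = 0x22, OP_FMUL = 0x23, OP_FSUB = 0x28
};

/* Source specifier when R/M = 1 (operand comes from <ea>) */
enum fpu_src_fmt {
    FMT_L = 0, FMT_S = 1, FMT_X = 2, FMT_P = 3, FMT_W = 4, FMT_D = 5, FMT_B = 6
};

/* Command word: 0 | R/M | 0 | src(3) | dst FPn(3) | opmode(7).
   With R/M = 0, src is the source FP register number instead of a format. */
static uint16_t fpu_cmd_word(int rm, unsigned src, unsigned dst_fp, unsigned opmode)
{
    assert(src < 8 && dst_fp < 8 && opmode < 0x80);
    return (uint16_t)(((rm ? 1u : 0u) << 14) | (src << 10) | (dst_fp << 7) | opmode);
}
```

For example, `fpu_cmd_word(0, 0, 1, OP_FADD)` builds `FADD FP0,FP1`, which gives `$00A2`.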

CIR Bus Interface Fixes

- 32-bit operand transfers: Fixed a bug where long-word writes to the CIR operand register were split into two 16-bit writes by the emulation layer, causing the FPU to receive zero instead of the actual value. Operand transfers now go through as single 32-bit AXI transactions, matching real hardware behaviour.
- 68882 pending instruction pipeline: Fixed a race condition where back-to-back instructions could lose the second operation if its OpWord arrived while the first was completing. The pending instruction flags are now only cleared when actually consumed.
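The pending-instruction fix can be pictured with a small behavioural model. This is a hypothetical C sketch, not the project's VHDL: a single pending slot holds an OpWord that arrives while another instruction is executing, and the pending flag is cleared only in the branch that actually starts the queued instruction. An OpWord that lands on the same clock as a completion is therefore never dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOG_MAX 8

typedef struct {
    bool busy;                   /* an instruction is executing */
    bool pending;                /* a second OpWord is queued */
    uint16_t pending_op;
    uint16_t started[LOG_MAX];   /* order in which ops began executing */
    int n_started;
} fpu_seq_t;

static void seq_start(fpu_seq_t *s, uint16_t op)
{
    assert(s->n_started < LOG_MAX);
    s->busy = true;
    s->started[s->n_started++] = op;
}

/* One clock edge. done: the executing op finishes this cycle.
   op_valid/op: an OpWord write arrives this cycle. */
static void seq_clock(fpu_seq_t *s, bool done, bool op_valid, uint16_t op)
{
    if (done) {
        s->busy = false;
        if (s->pending) {
            s->pending = false;          /* the only place the flag is cleared */
            seq_start(s, s->pending_op);
        }
    }
    if (op_valid) {
        if (!s->busy) {
            seq_start(s, op);
        } else {
            /* A real coprocessor holds off a third OpWord via the response
               CIR; this model assumes that never happens. */
            assert(!s->pending);
            s->pending = true;
            s->pending_op = op;
        }
    }
}
```

The tests drive the race directly: an OpWord arriving on the completion cycle starts immediately, and one queued behind a busy instruction starts when that instruction finishes.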

Hardware Diagnostic (cirtest)

A new standalone 68K test program (cirtest.c) exercises the full CIR dialog protocol from the Merlin2 monitor. It runs 12 tests covering:

- Integer load/store round-trips (positive and negative)
- Memory-to-register arithmetic: FADD, FMUL, FDIV, FSUB
- Register-to-register arithmetic: FADD FPn,FPm
- Unary operations: FSQRT, FNEG, FABS
- Double-precision transcendentals: FSIN(1.0), FSQRT(2.0)

All 12 tests pass on hardware. All 13 GHDL simulation testbenches pass.
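The log does not say what tolerance cirtest uses when checking results such as FSIN(1.0) and FSQRT(2.0). One common approach, sketched here with hypothetical names, is to read the result back as a double and compare it with a host reference in units in the last place (ULPs). Host libm `sin()` is not guaranteed to be correctly rounded, so a small nonzero tolerance is usual for transcendentals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a double's bit pattern onto a signed integer line where adjacent
   doubles differ by 1 (+0.0 and -0.0 both map to 0). */
static int64_t ordered_bits(double d)
{
    int64_t i;
    memcpy(&i, &d, sizeof i);
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b (NaNs not handled) */
static uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a), ib = ordered_bits(b);
    return ia > ib ? (uint64_t)ia - (uint64_t)ib : (uint64_t)ib - (uint64_t)ia;
}

/* Hypothetical pass criterion for one hardware result */
static int result_ok(double hw, double ref, uint64_t max_ulps)
{
    return ulp_distance(hw, ref) <= max_ulps;
}
```

A test would then call something like `result_ok(hw_value, sqrt(2.0), 0)` for FSQRT and allow one or two ULPs for FSIN.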

Known Issue: SFP004 Peripheral Protocol

Disassembly of FPU_HARD.PRG (Quidnunc 1991, commonly used on Atari ST/SFP004 boards) revealed that it uses a different register protocol from the MC68020 CIR standard: it writes commands without an OpWord and expects different response encodings. This will be addressed in the next update, which will support the SFP004 peripheral access pattern alongside the standard CIR protocol.

Files Changed

19 files across VHDL RTL, testbenches, C firmware, and documentation. There are no changes to the ALU computation logic; only the bus interface encoding and protocol handling changed.

FPU Particle Animation demo

Matthew Pearce • 03/19/2026 at 10:35 • 0 comments

A fireworks demo using the FPU on Xilinx hardware, with the 68000 processor emulated on the ARM cores.

#include <stdio.h>
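#include <math.h>
#include <stdint.h>     /* uint32_t (may already come via merlin2_gfx.h) */
#include "../lib/merlin2_gfx.h"
#include "../lib/merlin2_rand.h"

/* M_PI is POSIX rather than ISO C; define it if <math.h> does not */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif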

/* 640x480 viewport centred on 1280x720 framebuffer */
#define VP_W    640
#define VP_H    480
#define VP_X    320
#define VP_Y    120
_Static_assert(VP_X + VP_W <= GFX_SCREEN_W, "Viewport exceeds screen width");
_Static_assert(VP_Y + VP_H <= GFX_SCREEN_H, "Viewport exceeds screen height");

/* Physics */
#define GRAVITY     0.06f
#define PARTICLES_PER_BURST 32

/* Rocket states */
#define ROCKET_DEAD    0
#define ROCKET_RISING  1
#define ROCKET_BURST   2

typedef struct {
    float x, y;
    float dy;
    float target_y;
    uint32_t colour;
    int state;
} rocket_t;

typedef struct {
    float x, y;
    float old_x, old_y;
    float vx, vy;
    uint32_t colour;
    int life;
    int max_life;
} particle_t;

static rocket_t rocket;
static particle_t particles[PARTICLES_PER_BURST];

static const uint32_t palette[] = {
    0xFFFF2020,  /* red */
    0xFFFFD700,  /* gold */
    0xFF20FF20,  /* green */
    0xFF4080FF,  /* blue */
    0xFFFFFFFF,  /* white */
    0xFFFF40FF,  /* magenta */
};
#define PALETTE_SIZE (sizeof(palette) / sizeof(palette[0]))

static inline void put_pixel(int x, int y, uint32_t argb)
{
    if ((unsigned)x < VP_W && (unsigned)y < VP_H)
        *gfx_fb_ptr(x + VP_X, y + VP_Y) = argb;
}

static uint32_t dim_colour(uint32_t colour, int life, int max_life)
{
    if (max_life <= 0 || life <= 0)
        return 0xFF000000;
    unsigned r = (colour >> 16) & 0xFF;
    unsigned g = (colour >> 8) & 0xFF;
    unsigned b = colour & 0xFF;
    r = r * (unsigned)life / (unsigned)max_life;
    g = g * (unsigned)life / (unsigned)max_life;
    b = b * (unsigned)life / (unsigned)max_life;
    return 0xFF000000 | (r << 16) | (g << 8) | b;
}

static void spawn_rocket(void)
{
    rocket.x = (float)rand_range(120, VP_W - 120);
    rocket.y = (float)(VP_H - 1);
    rocket.dy = -4.0f - (float)rand_range(0, 15) * 0.1f;
    rocket.target_y = (float)rand_range(80, VP_H / 3);
    rocket.colour = palette[rand_range(0, (int)PALETTE_SIZE - 1)];
    rocket.state = ROCKET_RISING;
}

static void burst_rocket(void)
{
    float speed_base = 1.5f;
    int i;
    for (i = 0; i < PARTICLES_PER_BURST; i++) {
        particle_t *p = &particles[i];
        float angle = (float)i * (2.0f * (float)M_PI / (float)PARTICLES_PER_BURST);
        float speed = speed_base + (float)rand_range(0, 10) * 0.1f;
        p->x = rocket.x;
        p->y = rocket.y;
        p->old_x = rocket.x;
        p->old_y = rocket.y;
        p->vx = cosf(angle) * speed;
        p->vy = sinf(angle) * speed;
        p->colour = rocket.colour;
        p->max_life = rand_range(30, 50);
        p->life = p->max_life;
    }
    rocket.state = ROCKET_BURST;
}

static int update_and_draw(void)
{
    int alive = 0;
    int i;

    /* Update rocket */
    if (rocket.state == ROCKET_RISING) {
        /* Erase old position */
        put_pixel((int)rocket.x, (int)rocket.y + 1, 0xFF000000);
        rocket.y += rocket.dy;
        /* Draw rocket */
        put_pixel((int)rocket.x, (int)rocket.y, 0xFFFFFFFF);
        if (rocket.y <= rocket.target_y)
            burst_rocket();
    }

    /* Update particles */
    for (i = 0; i < PARTICLES_PER_BURST; i++) {
        particle_t *p = &particles[i];
        if (p->life <= 0)
            continue;

        /* Erase at old position */
        put_pixel((int)p->old_x, (int)p->old_y, 0xFF000000);

        /* Physics */
        p->vy += GRAVITY;
        p->old_x = p->x;
        p->old_y = p->y;
        p->x += p->vx;
        p->y += p->vy;
        p->life--;

        /* Bounds check */
        if ((int)p->x < 0 || (int)p->x >= VP_W ||
            (int)p->y < 0 || (int)p->y >= VP_H) {
            p->life = 0;
            continue;
        }

        /* Draw at new position with fading colour */
        uint32_t c = dim_colour(p->colour, p->life, p->max_life);
        put_pixel((int)p->x, (int)p->y, c);
        alive++;
    }

    return alive;
}

int main(void)
{
    int i;

    printf("Fireworks demo - press any key to exit\n");

    rand_seed(gfx_get_time());
    gfx_set_mode(1);
    gfx_clear(0xFF000000);

    rocket.state = ROCKET_DEAD;
    for (i = 0; i < PARTICLES_PER_BURST; i++)
        particles[i].life = 0;

    spawn_rocket();

    while (!gfx_char_ready()) {
        int alive = update_and_draw();

        /* Spawn next rocket when current burst dies */
        if (rocket.state != ROCKET_RISING...

Mandelbrot demo

Matthew Pearce • 03/16/2026 at 12:52 • 0 comments

A Mandelbrot set rendered on the hardware by a 68000 assembly-language program.

*------------------------------------------------------------------------
* Mandelbrot Set Renderer
*
* Renders a 640x640 pixel Mandelbrot set using MC68881 FPU F-line
* instructions.  Centred on the 1280x720 display at offset (320, 40).
*
* View window:  real [-2.0, +0.5], imag [-1.25, +1.25]
* Scale:        2.5 / 640 = 0.00390625
* Max iters:    32
* Palette:      16 colours, cycled via ((iter - 1) & 15)
*
* Usage: Load via S-record (L command), execute with G 2000.
*
* Assemble:
*   vasmm68k_mot -Fsrec -m68000 -m68881 -o mandelbrot.srec mandelbrot.s
*------------------------------------------------------------------------

IMG_W       EQU  640
IMG_H       EQU  640
SCREEN_W    EQU  1280
FB_BASE     EQU  $800000
OFF_X       EQU  320           * horizontal offset on 1280-wide display
OFF_Y       EQU  40            * vertical offset on 720-high display
MAX_ITER    EQU  32
ROW_BYTES   EQU  SCREEN_W*4   * 5120 bytes per display row
IMG_ROW     EQU  IMG_W*4       * 2560 bytes per image row
ROW_SKIP    EQU  ROW_BYTES-IMG_ROW  * 2560 bytes to skip between rows

            ORG     $2000

*------------------------------------------------------------------------
* Entry point
*------------------------------------------------------------------------
START
            LEA     msgTitle,A1
            MOVEQ   #13,D0
            TRAP    #15

* Switch to graphics mode
            MOVEQ   #17,D0
            MOVEQ   #1,D1
            TRAP    #15

* Clear to black
            MOVEQ   #18,D0
            MOVE.L  #$FF000000,D1
            TRAP    #15

            LEA     msgRender,A1
            MOVEQ   #13,D0
            TRAP    #15

* Record start time
            MOVEQ   #8,D0
            TRAP    #15
            MOVE.L  D1,START_TIME

* Set up FP constants
* FP7 = scale = 0.00390625 (2.5/640)
            FMOVE.S #$3B800000,FP7     * 0.00390625

* Compute framebuffer start address
* FB_START = FB_BASE + OFF_Y * ROW_BYTES + OFF_X * 4
            LEA     FB_BASE,A3
            ADDA.L  #OFF_Y*ROW_BYTES+OFF_X*4,A3

* Outer loop: Y pixels (D5 = 0..639)
            CLR.W   D5                 * D5 = pixel_y

YLOOP
* ci = -1.25 + pixel_y * scale
            FMOVE.W D5,FP3
            FMUL    FP7,FP3            * FP3 = pixel_y * scale
            FSUB.S  #$3FA00000,FP3     * FP3 -= 1.25  =>  ci = y*scale - 1.25

* Progress output every 64 rows
            MOVE.W  D5,D0
            ANDI.W  #63,D0
            BNE.S   NOPROG
            LEA     msgRow,A1
            MOVEQ   #14,D0
            TRAP    #15
            CLR.L   D1
            MOVE.W  D5,D1
            MOVEQ   #10,D2
            MOVEQ   #15,D0
            TRAP    #15
            LEA     msgOf,A1
            MOVEQ   #14,D0
            TRAP    #15
            MOVE.L  #IMG_H,D1
            MOVEQ   #10,D2
            MOVEQ   #15,D0
            TRAP    #15
            LEA     msgNewline,A1
            MOVEQ   #13,D0
            TRAP    #15

NOPROG
* Inner loop: X pixels (D4 = 0..639)
            CLR.W   D4                 * D4 = pixel_x

XLOOP
* cr = -2.0 + pixel_x * scale
            FMOVE.W D4,FP2
            FMUL    FP7,FP2            * FP2 = pixel_x * scale
            FSUB.S  #$40000000,FP2     * FP2 -= 2.0  =>  cr = x*scale - 2.0

* z = 0 + 0i
            FMOVE.L #0,FP0              * FP0 = zr = 0
            FMOVE.L #0,FP1              * FP1 = zi = 0

* Iteration loop (D6 = iteration counter)
            MOVEQ   #0,D6

ITERLOOP
* zr_sq = zr * zr
            FMOVE   FP0,FP4
            FMUL    FP0,FP4            * FP4 = zr^2

* zi_sq = zi * zi
            FMOVE   FP1,FP5
            FMUL    FP1,FP5            * FP5 = zi^2

* Check escape: zr^2 + zi^2 > 4.0 ?
            FMOVE   FP4,FP6
            FADD    FP5,FP6            * FP6 = zr^2 + zi^2
            FCMP.S  #$40800000,FP6     * compare with 4.0
            FBGT    ESCAPED

* zi_new = 2 * zr * zi + ci  (compute before zr, since we need old zr)
            FMUL    FP0,FP1            * FP1 = zr * zi  (old zi destroyed, but zi^2 safe in FP5)
            FADD    FP1,FP1            * FP1 = 2 * zr * zi
            FADD    FP3,FP1            * FP1 = 2*zr*zi + ci  (new zi)

* zr_new = zr^2 - zi^2 + cr
            FMOVE   FP4,FP0            * FP0 = zr^2
            FSUB    FP5,FP0            * FP0 = zr^2 - zi^2
            FADD    FP2,FP0            * FP0 = zr^2 - zi^2 + cr  (new zr)

            ADDQ.W  #1,D6
            CMP.W   #MAX_ITER,D6
            BLT.S   ITERLOOP

* Reached max iterations — pixel is in the set (black)
            MOVE.L  #$FF000000,(A3)+   * opaque black (ARGB)
            BRA.S   NEXTX

ESCAPED
* Pick colour from palette: index = (iter - 1) & 15
            MOVE.W  D6,D0
            SUBQ.W  #1,D0
            ANDI.W  #15,D0
            ASL.W   #2,D0              * D0 = palette offset (4 bytes each)
            LEA     PALETTE,A0
            MOVE.L  (A0,D0.W),(A3)+    * write pixel colour

NEXTX
            ADDQ.W  #1,D4
            CMP.W   #IMG_W,D4
            BLT     XLOOP

* End of row — advance A3 past the unused portion of the display row
            ADDA.L  #ROW_SKIP,A3

            ADDQ.W  #1,D5
            CMP.W   #IMG_H,D5
            BLT     YLOOP

*------------------------------------------------------------------------
* Done — flush, print timing, wait for keypress
*------------------------------------------------------------------------
 MOVE.B #2,$FD0041...
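To cross-check the rendered image on a host PC, a C reference model of the same loop can help. This sketch is not part of the project: it decodes the same single-precision immediates the assembly uses ($3B800000, $3FA00000, $40000000, $40800000), maps pixels to c as YLOOP/XLOOP do, and counts iterations like ITERLOOP. The 68881 iterates in extended precision while this model uses double, so a few pixels on the set boundary may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ITER 32

/* Reinterpret an IEEE single bit pattern, like the #$xxxxxxxx .S immediates */
static float f32_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Pixel -> c, as at YLOOP/XLOOP: cr = x*scale - 2.0, ci = y*scale - 1.25 */
static void pixel_to_c(int px, int py, double *cr, double *ci)
{
    const double scale = f32_from_bits(0x3B800000u);         /* 2.5/640 */
    *cr = (double)px * scale - f32_from_bits(0x40000000u);   /* 2.0 */
    *ci = (double)py * scale - f32_from_bits(0x3FA00000u);   /* 1.25 */
}

/* Mirrors ITERLOOP: returns D6 at ESCAPED, or MAX_ITER if never escaped */
static int mandel_iter(double cr, double ci)
{
    const double four = f32_from_bits(0x40800000u);
    double zr = 0.0, zi = 0.0;
    int iter;
    for (iter = 0; iter < MAX_ITER; iter++) {
        double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > four)
            return iter;
        zi = 2.0 * zr * zi + ci;    /* uses the old zr, like FMUL FP0,FP1 */
        zr = zr2 - zi2 + cr;
    }
    return MAX_ITER;
}

/* Palette index written at ESCAPED; -1 means the black in-set pixel */
static int palette_index(int iter)
{
    assert(iter >= 0 && iter <= MAX_ITER);
    return iter == MAX_ITER ? -1 : (iter - 1) & 15;
}
```

Comparing `palette_index(mandel_iter(...))` against the framebuffer pixel by pixel is a quick way to spot FPU arithmetic errors.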

First Benchmarks
Matthew Pearce • 03/15/2026 at 10:42 • 0 comments

=== Whetstone Benchmark ===
M2 (array)... OK
M3 (proc array)... OK
M4 (conditionals)... OK
M6 (log/exp/sqrt)... OK
M7 (proc calls)... OK
M8 (trig)... OK
Passes: 10
Elapsed: 2191 ms
KWIPS: 4564
Whetstone complete.

These are the first benchmarks, run on an Alinx AXU3EG board using Musashi on the ARM and the FPU running in the PL fabric.
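The KWIPS figure follows from the pass count and elapsed time if each pass counts as one million Whetstone instructions. That per-pass figure is an assumption here, but it reproduces the 4564 shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one benchmark pass = 1,000,000 Whetstone instructions */
#define WHETSTONE_INSTR_PER_PASS 1000000ull

/* KWIPS = instructions / seconds / 1000 = instructions / milliseconds */
static uint64_t kwips(uint64_t passes, uint64_t elapsed_ms)
{
    assert(elapsed_ms > 0);
    return passes * WHETSTONE_INSTR_PER_PASS / elapsed_ms;
}
```

With the logged values, `kwips(10, 2191)` gives 10,000,000 / 2191, which truncates to 4564.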

Fully Working FPGA MC68881

Matthew Pearce • 03/13/2026 at 19:50 • 0 comments

ver 2.1 build 001  MATT PEARCE 2024-2026  |  MC68000 + MC68881 FPGA
>a 1000
001000    00000000             OR.B    #0,D0  >FADD.L #1,FP0
001000    F23C402200000001     FADD.L  #1,FP0
001008    00000000             OR.B    #0,D0  >FADD.S #2.35,FP1
001008    F23C44A240166666     FADD.S  #2.35,FP1
001010    00000000             OR.B    #0,D0  >FADD FP0,FP1
001010    F20000A2             FADD    FP0,FP1
001014    00000000             OR.B    #0,D0  >RTS
001014    4E75                 RTS
001016    00000000             OR.B    #0,D0  >X
MC68901 Multifunction Peripheral Initialized

================================================================================
 888b     d888                  888 d8b            .d8888b.
 8888b   d8888                  888 Y8P           d88P  Y88b
 88888b.d88888                  888               888    888
 888Y88888P888  .d88b.  888d888 888 888 88888b.        d88P
 888 Y888P 888 d8P  Y8b 888P"   888 888 888 "88b   .od888P"
 888  Y8P  888 88888888 888     888 888 888  888  d88P"
 888   "   888 Y8b.     888     888 888 888  888 888"
 888       888  "Y8888  888     888 888 888  888 888888888
                                                  FPU
================================================================================
ver 2.1 build 001  MATT PEARCE 2024-2026  |  MC68000 + MC68881 FPGA
>G 1000
>R
D0=00001000 D1=0000FFFF D2=00000000 D3=00000000
D4=00000030 D5=0000002C D6=00000004 D7=000001FD
A0=00001000 A1=00FE0070 A2=00000830 A3=00000000
A4=00001016 A5=000005BF A6=000005EE   SP=00000FF0
  PC=00001000 SR=2700
 FP0=3FFF0000 80000000 00000000
 FP1=40000000 D6666600 00000000
 FP2=00000000 00000000 00000000
 FP3=00000000 00000000 00000000
 FP4=00000000 00000000 00000000
 FP5=00000000 00000000 00000000
 FP6=00000000 00000000 00000000
 FP7=00000000 00000000 00000000
>

The FPU has now been fully validated on an AXU3EG board, with Musashi running on the ARM and the FPU running in the PL fabric. See the GitHub README for details.
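Each FPn line in the register dump is the 96-bit extended-precision image as three long words: sign and 15-bit exponent (bias 16383) in the top half of the first, then a 64-bit mantissa with an explicit integer bit. A hypothetical host-side decoder shows that FP0 holds 1.0 and FP1 holds 3.35 (2.35 loaded as a single, plus 1). It handles normal numbers and zero only.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Decode an MC68881 extended image: w0 = S|exp15|16 unused bits,
   w1:w2 = mantissa with explicit integer bit. Denormals, infinities
   and NaNs are not handled in this sketch. */
static double x96_to_double(uint32_t w0, uint32_t w1, uint32_t w2)
{
    int sign = (int)((w0 >> 31) & 1u);
    int exp = (int)((w0 >> 16) & 0x7FFFu);
    uint64_t mant = ((uint64_t)w1 << 32) | w2;
    double v;

    assert(exp != 0x7FFF);                 /* infinity/NaN not handled */
    if (mant == 0)
        return sign ? -0.0 : 0.0;
    /* value = mantissa * 2^(exp - 16383 - 63); the cast rounds to 53 bits */
    v = ldexp((double)mant, exp - 16383 - 63);
    return sign ? -v : v;
}
```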

Nearly Complete

Matthew Pearce • 03/05/2026 at 15:33 • 0 comments

LUT usage now fits comfortably on an Artix-7 100T, leaving room for a small processor as well as the FPU.

Overview

A VHDL-2008 implementation of a Motorola MC68881-compatible floating-point coprocessor targeting Xilinx 7-series FPGAs. The design implements the full MC68881 instruction set including all arithmetic, transcendental, program-control, system-control, and packed-decimal operations. It uses DSP-pipelined sequential FP units for the core arithmetic datapath with multi-cycle path constraints for timing closure.

The current plan and progress tracking live in docs/fpu-progress-checklist.md.

Features

- Full instruction set: FADD, FSUB, FMUL, FDIV, FSQRT, FMOD, FREM, FSCALE, FSGLDIV, FSGLMUL, FABS, FNEG, FINT, FINTRZ, FGETEXP, FGETMAN, FTST, FCMP.
- Transcendental engine: FSIN, FCOS, FTAN, FSINCOS, FASIN, FACOS, FATAN, FATANH, FSINH, FCOSH, FTANH, FETOX, FETOXM1, FTWOTOX, FTENTOX, FLOGN, FLOGNP1, FLOG2, FLOG10. BRAM-based seed tables with Taylor/CORDIC iteration.
- Data movement: FMOVE (all formats including packed decimal .P), FMOVEM (register lists and control registers), FMOVECR (ROM constants).
- Program control: FScc, FBcc, FDBcc, FTRAPcc, FNOP with BSUN trap gating.
- System control: FSAVE/FRESTORE with Null/Idle/Busy frame support (45-word Busy frame with full sub-unit save/restore hierarchy).
- IEEE 754 compliance: NaN propagation (SNaN/QNaN discrimination, payload preservation), infinity handling, signed zero, gradual underflow, all four rounding modes (nearest, zero, +inf, -inf), single/double/extended precision.
- Exception handling: Per-operation FPSR exception policies, FPCR trap enable, accrued exception accumulation.
- Peripheral interface: Register-mapped bus interface with DSACK handshake, suitable for M68000/M68010 peripheral-mode operation.

Utilization (Xilinx Artix-7 200T, post-place)

Resource	Used	Available	Util (%)
Slice LUTs	52,361	133,800	39.13
Registers	13,131	267,600	4.91
Block RAM	5 tiles	365	1.37
DSP48E1	33	740	4.46

Non-incremental synthesis + implementation, Vivado 2025.2, xc7a200tfbg676-1. Date: 2026-03-05. Includes Section 7 CIR coprocessor interface with FSAVE/FRESTORE Busy frame support and full exception dialog paths; see "CIR feature gating" below.

Timing

Target clock: 10 MHz (100 ns period) — matches MC68881 bus timing.
Multi-cycle path constraints on sequential FP units (mul: 4 cycles, addsub: 6 cycles, div: 6 cycles) and trig engine hold states.
Post-route WNS: +16.631 ns (the critical path uses about 83.4 ns of the 100 ns period, roughly 17% slack; effective Fmax ~12 MHz).
WHS (hold): no violations.
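The Fmax and slack figures follow directly from the period and WNS: the critical path takes (period - WNS). A small helper makes the arithmetic explicit. It applies only under the same multicycle constraints, since the multicycle paths are allowed several periods by design.

```c
#include <assert.h>

/* Fmax implied by a positive setup slack: path delay = period - WNS */
static double fmax_mhz(double period_ns, double wns_ns)
{
    assert(wns_ns < period_ns);
    return 1000.0 / (period_ns - wns_ns);
}

/* Setup slack as a percentage of the clock period */
static double slack_pct(double period_ns, double wns_ns)
{
    assert(period_ns > 0.0);
    return 100.0 * wns_ns / period_ns;
}
```

With the reported numbers, `fmax_mhz(100.0, 16.631)` is 1000 / 83.369, about 11.99 MHz, and `slack_pct` is about 16.6%.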

Target device compatibility

The design fits on several FPGA families. With CIR disabled (ENABLE_CIR_g => false), the core is ~46K LUTs and fits comfortably on smaller devices:

Device	LUTs	DSPs	Fit (full)?	Fit (no CIR)?
Xilinx Artix-7 200T	134,600	740	Yes (39%)	Yes (34%)
Xilinx Artix-7 100T	63,400	240	Yes (~83%)	Yes (~72%)
Xilinx Zynq UltraScale+ ZU3EG	~71,000	360	Yes (~74%)	Yes (~64%)
Intel Cyclone V 5CEBA7	56,480 ALMs (149,500 LEs)	156	Yes	Yes

All RTL is vendor-portable (inferred DSP/BRAM, no Xilinx IP cores). Porting to Intel/Quartus requires XDC-to-SDC constraint conversion and minor DSP inference adjustments.


Screenshot 2026-02-13 165504.png (PNG, 436.17 kB, 02/13/2026 at 17:07)

Screenshot 2026-02-09 114005.png: Vivado simulation. I have implemented the SIN/COS/TAN functions plus a huge refactor, and LUT usage is now 75% on an xc7a200t, down from 169%. (PNG, 421.93 kB, 02/09/2026 at 11:40)
