
Another Word On DRAM

A project log for Mackerel 68k Linux Computer

A series of m68k computers designed to run Linux

Colin M • 10/25/2024 at 04:15

Getting a DRAM controller working at all feels like a great accomplishment, and while it has been stable and functional, there were some situations I couldn't explain. For example, it was not possible to run the DRAM controller at anything other than twice the CPU speed; even running them at the same frequency failed completely. I was not satisfied with my understanding of my own design. I also wanted the option to run the DRAM on its own independent clock to completely free up the choice of oscillator for the CPU.

With the goal of better understanding and more flexibility, I took the lessons learned from my first iteration and went back to the drawing board, starting with the datasheet. The simplest place to start is the CAS-before-RAS refresh.

CAS-before-RAS Refresh

CAS-before-RAS refresh timing diagram from the TMS417400 datasheet

The refresh process is not complicated: pull CAS low, then pull RAS low, raise CAS, and then raise RAS again. One thing worth noting here is that the WE pin has to be HIGH by the time RAS is lowered. Since the state of the WE pin is "don't care" for the rest of the refresh cycle, I chose to pull it HIGH in the first state of the refresh state machine. Note: Mackerel-10 has four 30-pin SIMMs in two 16-bit pairs, A and B. RAS is shared between SIMMs in a pair, but the CAS lines are all independent, thus two RAS pins and four CAS pins in my controller.

REFRESH1: begin
    // Acknowledge the refresh request
    refresh_ack <= 1'b1;

    // Lower CAS
    CASA0 <= 1'b0;
    CASA1 <= 1'b0;
    CASB0 <= 1'b0;
    CASB1 <= 1'b0;
    WRA <= 1'b1;
    WRB <= 1'b1;
    state <= REFRESH2;
end

REFRESH2: begin
    // Lower RAS
    RASA <= 1'b0;
    RASB <= 1'b0;
    state <= REFRESH3;
end

REFRESH3: begin
    // Raise CAS
    CASA0 <= 1'b1;
    CASA1 <= 1'b1;
    CASB0 <= 1'b1;
    CASB1 <= 1'b1;
    state <= REFRESH4;
end

REFRESH4: begin
    // Raise RAS
    RASA <= 1'b1;
    RASB <= 1'b1;
    state <= PRECHARGE;
end

The final piece of the DRAM refresh cycle is determining how often it needs to happen. According to the datasheet, all 2048 rows need to be refreshed every 32 ms. If we spread the CBR refreshes out evenly, that means we need to refresh a row every 32 ms / 2048 = 15.625 µs, which works out to a 64 kHz refresh rate. Finally, the DRAM controller is running from a 50 MHz oscillator, so 50 MHz / 64 kHz ≈ 781 cycles between refreshes.

The Verilog for counting cycles is basic, but I'll include it here for reference. The two refresh_* registers (refresh_request and refresh_ack) pass the refresh state back and forth between this generator and the main state machine. REFRESH_CYCLE_CNT is set to 781.

// ==== Periodic refresh generator
reg refresh_request = 1'b0;
reg refresh_ack = 1'b0;
reg [11:0] cycle_count = 12'b0;

always @(posedge CLK_ALT) begin
    if (~RST) cycle_count <= 12'b0;
    else begin
        cycle_count <= cycle_count + 12'b1;

        if (cycle_count == REFRESH_CYCLE_CNT) begin
            refresh_request <= 1'b1;
            cycle_count <= 12'b0;
        end
        
        if (refresh_ack) refresh_request <= 1'b0;
    end
end
CAS-before-RAS refresh cycle running at 64 kHz as calculated
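
As a side note, the 781 doesn't have to be hard-coded. Here's a minimal sketch of deriving it at synthesis time instead; the parameter names are illustrative, not from the actual controller:

// Illustrative only: derive the refresh interval from the clock frequency
// and the datasheet numbers. Integer division truncates 781.25 down to 781.
localparam CLK_FREQ_HZ       = 50_000_000;  // DRAM controller clock
localparam REFRESH_PERIOD_MS = 32;          // refresh all rows within 32 ms
localparam ROW_COUNT         = 2048;        // rows to refresh per period
localparam REFRESH_CYCLE_CNT = (CLK_FREQ_HZ / 1000) * REFRESH_PERIOD_MS / ROW_COUNT;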

Read/Write Cycles

With the CBR refresh behavior confirmed, I started to revamp the rest of the state machine, i.e. the process of actually reading and writing memory. As mentioned, my first implementation worked, but just barely. One of the issues I had was a dozen or more compiler warnings in Quartus that looked something like this: Warning (163076): Macrocell buffer inserted after node. I could not track down an exact cause, but the little information I found online and my own testing seemed to indicate that this warning basically means "you're trying to do too much work at once". By breaking my state machine up into more, smaller states and removing highly parallel pieces of code, I was able to get rid of all of these warnings. The key seems to be not changing too many register values in a single clock cycle, but instead pipelining the design.

DRAM read cycle timing diagram from the TMS417400 datasheet

The actual logic of the DRAM read and write cycles hasn't changed. It's still a multi-step process where the controller multiplexes the CPU address bus to the row address of the DRAM, asserts /RAS, multiplexes the column address, then asserts /CAS and /DTACK until the CPU finishes the bus cycle. Here's a snippet of the state machine showing this piece:

IDLE: begin
    if (refresh_request) begin
        // Start CAS-before-RAS refresh cycle
        state <= REFRESH1;
    end
    else if (~CS2 && ~AS2) begin
        // DRAM selected, start normal R/W cycle
        state <= RW1;
    end
end

RW1: begin
    // Mux in the address
    ADDR_OUT <= ADDR_IN[11:1];
    state <= RW2;
end

RW2: begin
    // Row address is valid, lower RAS
    if (BANK_A) RASA <= 1'b0;
    else RASB <= 1'b0;
    state <= RW3;
end

RW3: begin
    // Mux in the column address
    ADDR_OUT <= ADDR_IN[22:12];

    // Set the WE line
    if (BANK_A) WRA <= RW;
    else WRB <= RW;

    state <= RW4;
end

RW4: begin
    // Column address is valid, lower CAS
    if (BANK_A) begin
        CASA0 <= LDS;
        CASA1 <= UDS;
    end
    else begin
        CASB0 <= LDS;
        CASB1 <= UDS;
    end
    state <= RW5;
end

RW5: begin
    // Data is valid, lower DTACK
    DTACK_DRAM <= 1'b0;

    // When AS returns high, the bus cycle is complete
    if (AS) state <= PRECHARGE;
end
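
One state that shows up in the transitions above but isn't in the snippet is PRECHARGE. As a rough sketch of the idea (assumed behavior; the actual implementation is in the repository linked at the end of this post), it returns all of the strobes HIGH, releases DTACK, clears the refresh handshake, and gives the DRAM its RAS precharge time before the next cycle starts:

PRECHARGE: begin
    // Assumed sketch: return the bus to idle and satisfy the RAS
    // precharge time before accepting the next request
    RASA <= 1'b1;
    RASB <= 1'b1;
    CASA0 <= 1'b1;
    CASA1 <= 1'b1;
    CASB0 <= 1'b1;
    CASB1 <= 1'b1;
    DTACK_DRAM <= 1'b1;
    refresh_ack <= 1'b0;  // clear the refresh handshake
    state <= IDLE;
end

Depending on the DRAM speed grade, one 20 ns state may not be enough precharge time, so the state machine may need to wait an extra cycle here.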

And here's what it looks like in simulation:

Simulation of the DRAM controller reading memory

There are more stages than in my previous version, but each stage is doing a small and obvious thing. It's tempting to try to combine some of these steps together, and there's probably room for optimization, but clarity and stability are the priorities at the moment.
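
If you want to poke at this in a simulator yourself, a minimal testbench sketch is below. It drives a 50 MHz clock and walks through one read cycle. The module name, instantiation, and stimulus values are hypothetical; the port names are taken from the snippets above, and the real module is in the repository linked at the end of this post.

`timescale 1ns / 1ps

module dram_controller_tb;
    reg CLK_ALT = 1'b0;
    reg RST = 1'b0;          // active low, start in reset
    reg CS = 1'b1;
    reg AS = 1'b1;
    reg RW = 1'b1;           // 68000 R/W is HIGH for reads
    reg LDS = 1'b1;
    reg UDS = 1'b1;
    reg [22:1] ADDR_IN = 0;

    // 50 MHz DRAM controller clock (20 ns period)
    always #10 CLK_ALT = ~CLK_ALT;

    initial begin
        #100 RST = 1'b1;     // release reset
        // Start a word read: select the DRAM and assert the bus strobes
        #100 ADDR_IN = 22'h012345; CS = 1'b0; AS = 1'b0; LDS = 1'b0; UDS = 1'b0;
        // Give the controller time to assert DTACK, then end the bus cycle
        #400 AS = 1'b1; CS = 1'b1; LDS = 1'b1; UDS = 1'b1;
        #500 $finish;
    end

    // Hypothetical instantiation -- the real module name and full port
    // list are in the linked repository
    dram_controller dut (
        .CLK_ALT(CLK_ALT), .RST(RST), .CS(CS), .AS(AS), .RW(RW),
        .LDS(LDS), .UDS(UDS), .ADDR_IN(ADDR_IN)
        // DRAM-side outputs (RAS/CAS/WE/ADDR_OUT/DTACK) omitted for brevity
    );
endmodule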

Crossing Clock Domains

The final piece I wanted to tackle was the ability to run the DRAM controller at any speed, rather than having it tied to a multiple of the CPU frequency. Because DRAM takes more cycles to access than SRAM, the whole system is slower clock-for-clock. It's not a dramatic difference, but those extra clock cycles add up. One way to alleviate some of this delay is to run the DRAM controller at a faster clock than the CPU. This shouldn't be too hard: most 68000s are only rated for 10 MHz or so, while the CPLD running the DRAM controller can easily handle 50 MHz. With this arrangement, most or all of the extra cycles taken up by DRAM access happen between the slower CPU cycles.

In a perfect world, this change would be as simple as connecting a second, faster oscillator to the DRAM controller and updating the CLK pin. In reality, this leads to metastability. I won't try to explain that concept here as I'm just coming to terms with it myself, but the upshot is that there needs to be a handoff when the fast DRAM clock domain samples the slow CPU signals. This is called crossing clock domains, and it's accomplished by double-registering the slower signals before using them in the faster domain. Fortunately, Mackerel only has two input signals that fit that description: CS and AS.

reg AS1 = 1;
reg CS1 = 1;
reg AS2 = 1;
reg CS2 = 1;

always @(posedge CLK_ALT) begin
    AS1 <= AS;
    CS1 <= CS;
    AS2 <= AS1;
    CS2 <= CS1;
end

Double-flopping the DRAM chip-select pin and the CPU's /AS pin like this gives a value sampled mid-transition (the cause of metastability) a full clock period to settle in the first register before the second register passes it along. CS2 and AS2 are now nice and stable in the DRAM's clock domain, and they can be used to kick off the DRAM access process (see the IDLE state in the Verilog above).

We've now removed the link between the CPU clock and the DRAM controller. This doesn't scale infinitely; there are limits on how far apart the two clocks can be. Still, it's dramatically more flexible than my last attempt: in testing, I was able to run the DRAM controller at 50 MHz with the CPU clock anywhere between 9 and 20 MHz. It's also possible to remove the double-flopping and run everything on one synchronized clock, something I could not do previously.

Wrapping Up

Implementing a DRAM controller for a 40-year-old CPU on a 20-year-old CPLD is quite a niche subject, but this is the information I wish I had when I started working on this. Hopefully it's helpful to somebody. If that's you, share your project. I'd love to hear what you're working on!

Here is the full Verilog code for the DRAM controller: https://github.com/crmaykish/mackerel-68k/blob/master/pld/mackerel-10/dram_controller/dram_controller.v
