-
v1.0.0 Firmware Done!
04/11/2023 at 18:28
Yesterday I uploaded a stable firmware build to the GitHub releases page. It runs the RP2040s at 266MHz with a QSPI clock divider of 4, so the PSRAM is accessed at about 66MHz. After all the testing I did with DMA and with where to place the statements that fetch data, this lets me comfortably hit N64 PI bus speeds of 0x2040. That is a bit away from the stock speed of 0x1240 but works well enough for most games.
Currently only EEPROM saves (both 4Kbit and 16Kbit) are saved to and loaded from the SD card. FlashRAM (1Mbit saves) and SRAM saves aren't supported yet. SRAM works while the cart is powered, but I'm still figuring out the best time to write that data out as a save. I thought that the reset button on the N64 would give me some kind of alert so I could use that to dump saves before the user powered off. It might, but I haven't been able to figure that out yet.
-
Now Shipping!
04/06/2023 at 18:49
It's Happening!
I have all the parts needed to start producing the first batch of Dreamdrive64s! If you want to pick one up you can do so here https://dreamcraftindustries.com/products/dreamdrive64
Firmware v1.0 is ready and will be up on GitHub before anyone receives their Dreamdrive64s. All units will ship with the latest firmware. All users need to do is unzip a folder into the root of their SD card. It contains some extra info that the cart reads to add EEPROM save support and to load thumbnails for highlighted ROMs in the menu.
There is still some process to figure out if I ever need to scale up, because getting the product together still involves a bit of manual work. The cart shell has to be cut to accommodate the SD card and USB plugs, and a small hole needs to be drilled through the back so the user can poke a button while plugging in the USB cable in order to update the firmware.
Once I have shipped a few of these things I'll better understand some of the challenges that come from manufacturing something like this. I'm also still working on developing my own injection molds for the shells so I can manufacture those in the workshop without having to buy something off the shelf. Stay tuned for more updates about that process involving 3d printing molds and building a CNC mill.
-
Conquering Clocks and Gremlins
03/29/2023 at 21:05
PSRAM Challenges
The PSRAM presented numerous challenges throughout development. One issue arose from the SSI clock divider, which could only be set to even integers. When the RP2040s were originally clocked at 266MHz to meet N64 SRAM timings and the divider was set to 2 (resulting in a 133MHz QSPI), games experienced erratic errors and often failed to boot.
A 266/4 configuration (66MHz QSPI) was insufficient for achieving "stock" N64 bus speeds. On the PicoCart-Lite ("v1"), this wasn't a problem, as the 266/2 setting (133MHz QSPI) with short data lines to flash proved reliable and allowed for stock speeds.
This project marked a turning point in my learning journey, as my background in software engineering had previously led me to treat hardware as a weekend project. In the past, I had the luxury of "ignoring air resistance," but this endeavor demanded a thorough consideration of such variables.
Reflections, line impedance, terminating resistors, and topology all became familiar terms as a fellow Discord member and I grappled with hardware gremlins. This helpful individual even went so far as to reroute the PSRAM data lines, add terminating resistors, and assist me in resolving hardware issues over several months.
After attempting to overclock the RP2040s to a 360/4 setting (90MHz QSPI), I achieved stock speeds with mostly stable data reliability. Another hardware revision incorporating termination resistors and the 360/4 configuration appeared to be the solution. However, when testing additional boards from my batch, I discovered that at least one of them failed to operate at these frequencies.
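The clock/divider pairs mentioned above (266/2, 266/4, 360/4) all follow the same arithmetic. A minimal sketch, assuming a hypothetical helper named `qspi_clock_khz` that is not part of the firmware or the Pico SDK and only encodes the even-divider restriction described earlier:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the firmware): QSPI clock in kHz for a
 * given system clock and SSI clock divider. The divider has to be a
 * non-zero even integer, which is the restriction described in the text. */
uint32_t qspi_clock_khz(uint32_t sys_clock_khz, uint32_t divider)
{
    assert(divider >= 2 && divider % 2 == 0);
    return sys_clock_khz / divider;
}
```

With these inputs, 266MHz gives 133MHz at a divider of 2 and 66.5MHz at a divider of 4, while 360MHz at a divider of 4 gives 90MHz. There is no setting in between for a fixed system clock, which is why the system clock itself had to move.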
The Quest For Data
So some notes on how the n64 bus works. The n64 sends a 32-bit address: the upper 16 bits are sampled on the falling edge of ALEH (address latch enable high), then the lower 16 bits are sampled on the falling edge of ALEL (address latch enable low). There is then a delay before the read line goes low, which is the cart's cue to have 16 bits of data ready on the data lines.
The time between ALEL going low and the read line going low depends on the n64 bus speed but can be as short as 1us. This gives us some time to "prefetch" a half-word of data in anticipation of the read line going low.
The time the read line stays low, and the pulse that follows to latch the read data, are also affected by the n64 bus speed.
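As a small illustration of the address phase, the two latched halves can be combined like this. This is a sketch only: the function names are made up for the example, and the ROM range is the one checked in the read loop shown later in this log.

```c
#include <stdint.h>

/* Combine the two 16-bit halves latched from the bus into one 32-bit
 * address. `high` is the value sampled on the falling edge of ALEH,
 * `low` the value sampled on the falling edge of ALEL.
 * (Illustrative name, not a firmware function.) */
uint32_t n64_join_address(uint16_t high, uint16_t low)
{
    return ((uint32_t)high << 16) | low;
}

/* Non-zero if the address falls in the cartridge ROM range that the
 * read loop in this log handles. */
int n64_is_cart_rom(uint32_t addr)
{
    return addr >= 0x10000000u && addr <= 0x1FBFFFFFu;
}
```

For example, a high half of 0x1000 followed by a low half of 0x0040 yields 0x10000040, which sits inside the cartridge ROM range.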
- Stock speeds are 0x12 = 18 n64 cycles @ 62.5MHz.
- The read line is low for roughly 300ns
- The n64 fetches 32 bits as two half-words: the read line pulses for about 60ns between them, then goes low again.
- Reads are finished once ALEH is asserted.
- So at the stock speeds, this should give us about 300ns to fetch and get data ready.
- The QSPI hardware fetches 32 bits of data.
- For the PSRAM chips this means 22 clocks per read: 8-bit command, 24-bit address, 6 wait cycles, and 32 bits of data.
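The numbers in this list can be checked with a little arithmetic. The sketch below assumes that every phase of the PSRAM read (command, address, data) uses all four data lines, which is the reading under which the figures add up to 22 clocks; the function names are illustrative and not taken from the firmware.

```c
#include <assert.h>
#include <stdint.h>

#define N64_BUS_CYCLE_NS 16u  /* one n64 bus cycle at 62.5MHz */

/* Time the read line stays low for a given speed byte, e.g. 0x12 -> 288ns. */
uint32_t n64_read_window_ns(uint8_t speed)
{
    return (uint32_t)speed * N64_BUS_CYCLE_NS;
}

/* QSPI clocks for one 32-bit PSRAM read, ASSUMING four bits move per
 * clock in the command, address and data phases. */
uint32_t psram_read_clocks(void)
{
    const uint32_t command = 8 / 4;   /* 8-bit command   */
    const uint32_t address = 24 / 4;  /* 24-bit address  */
    const uint32_t wait    = 6;       /* wait cycles     */
    const uint32_t data    = 32 / 4;  /* 32 bits of data */
    return command + address + wait + data;
}

/* Duration of one PSRAM read in picoseconds at a given QSPI clock (kHz). */
uint64_t psram_read_ps(uint32_t qspi_khz)
{
    assert(qspi_khz > 0);
    return (uint64_t)psram_read_clocks() * 1000000000ull / qspi_khz;
}
```

At 66.5MHz one read takes roughly 331ns, which is longer than the 288ns window at 0x12 but well inside the 512ns window at 0x20. At 90MHz it drops to about 244ns.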
Once we have the address we have 1us to prefetch the first half-word. In that 1us time:
- Set the right chip to access via the demux
- Set up the DMA to read from the XIP pointer address at the appropriate transformed location
- Wait for the read to complete
We then wait for the read line to go low:
- Put the data from the dma buffer into the PIO tx fifo
- Start the DMA read for the next address
- we assume there will be another read as the n64 can read up to 256 words before sending a new address, although it can also be less.
Here is what that code looks like:

if (last_addr >= 0x10000000 && last_addr <= 0x1FBFFFFF) {
    // Domain 1, Address 2 Cartridge ROM

    // Change the banked memory chip if needed
    tempChip = ((last_addr >> 23) & 0x7) + 1; // psram_addr_to_chip(last_addr);
    if (tempChip != g_currentMemoryArrayChip) {
        g_currentMemoryArrayChip = tempChip;
        // Set the new chip
        psram_set_cs(g_currentMemoryArrayChip);
    }

    // Set the correct read address
    (&dma_hw->ch[dma_chan])->al3_read_addr_trig = (uintptr_t)(ptr16 + (((last_addr - g_addressModifierTable[g_currentMemoryArrayChip]) & 0xFFFFFF) >> 1));

    do {
        // Wait for value from psram
        while (!!(dma_hw->ch[dma_chan].al1_ctrl & DMA_CH0_CTRL_TRIG_BUSY_BITS)) {
            tight_loop_contents();
        }

        // Move the value out of the buffer so we can kick off the next fetch
        next_word = dmaValue;

        // Kick off next value fetch in the background
        dma_hw->multi_channel_trigger = 1u << dma_chan;

        // Wait for pio to see read line go low or ALEH happened
        while ((pio->fstat & 0x100) != 0) tight_loop_contents();
        addr = pio->rxf[0];

        if (addr == 0) { // if read line was low
            // READ
            pio->txf[0] = next_word;
            last_addr += 2;
        } else if (addr & 0x00000001) {
            // WRITE
            // Ignore data since we're asked to write to the ROM.
            last_addr += 2;
        } else {
            // New address, ALEH is asserted
            break;
        }
    } while (1);
}
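The chip-select expression in that loop, `((last_addr >> 23) & 0x7) + 1`, is easier to follow in isolation. The sketch below wraps it in a stand-alone function with a made-up name; it assumes nothing beyond the expression itself, namely that bits 23 to 25 of the address pick one of eight banks of 8MiB each, numbered from 1.

```c
#include <assert.h>
#include <stdint.h>

/* Chip number (1..8) for a cartridge ROM address, using the same
 * expression as the read loop. Illustrative name, not a firmware symbol. */
uint32_t chip_for_address(uint32_t addr)
{
    /* Only meaningful inside the cartridge ROM range checked by the loop. */
    assert(addr >= 0x10000000u && addr <= 0x1FBFFFFFu);
    return ((addr >> 23) & 0x7u) + 1u;
}
```

So 0x10000000 through 0x107FFFFF land on chip 1, 0x10800000 starts chip 2, and 0x13800000 starts chip 8. Because the chip changes only every 8MiB, the comparison against `g_currentMemoryArrayChip` rarely triggers a new chip select during sequential reads.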
While this process seems simple enough, it was difficult to pin down when to make the next DMA fetch to maximize the n64's bus speed (i.e. to get as close to 0x12 as possible).
At slower RP2040 clock speeds, where a smaller divider can be used and the QSPI bus is therefore faster (e.g. 200/2), I found that fetching the next word gave better timings if done AFTER we set `pio->txf[0] = next_word`. The code posted is for 266/4 and comfortably hits 0x20 timings.
Here are my notes while I was testing clock/divider settings and finding the tightest timings that allowed games to be played.
300/4 -> boots 0x1540(336ns)
(Moved DMA fetch)-> (0x1C40=448ns) (112ns diff)
22 * qclk = 293.333ns
13 * pclk = 42ns

210/2 -> boots 0x2040(512ns)
(Moved DMA fetch)-> (0x1C40=448ns) (64ns diff)
22 * qclk = 209.524ns
64 * pclk = 302ns

180/2 -> boots 0x2E40(736ns)
(Moved DMA fetch)-> (0x2240=544ns) (192ns diff)
22 * qclk = 245ns
89 * pclk = 491ns

160/2 -> boots 0x3D40(976ns)
(Moved DMA fetch)-> (0x2740=624ns) (352ns diff)
22 * qclk = 275ns
113 * pclk = 701ns

140/2 -> boots 0x4D40(1232ns)
(Moved DMA fetch)-> (0x3340=816ns) (416ns diff)
22 * qclk = 315ns
129 * pclk = 917ns
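One way to read these notes is that the pclk line is whatever is left of the bus window once the 22 QSPI clocks are subtracted, expressed in RP2040 clock cycles. That interpretation is an assumption on my part, but it reproduces the figures above. A rough sketch with illustrative function names:

```c
#include <assert.h>
#include <stdint.h>

/* CPU-side time (ns) left in the bus window after one PSRAM read.
 * `speed` is the high byte of the timing value (0x15 for 0x1540). */
double cpu_overhead_ns(uint8_t speed, double sys_mhz, unsigned divider)
{
    assert(sys_mhz > 0.0 && divider > 0);
    double window_ns = speed * 16.0;              /* n64 cycles at 62.5MHz */
    double qclk_ns = 1000.0 * divider / sys_mhz;  /* one QSPI clock period */
    return window_ns - 22.0 * qclk_ns;            /* 22 clocks per read    */
}

/* The same leftover time expressed in RP2040 clock cycles ("pclk"). */
double cpu_overhead_cycles(uint8_t speed, double sys_mhz, unsigned divider)
{
    return cpu_overhead_ns(speed, sys_mhz, divider) * sys_mhz / 1000.0;
}
```

For 300/4 at 0x15 this gives about 42.7ns, or close to 13 cycles. For 210/2 at 0x20 it gives about 302.5ns, or close to 64 cycles. Both match the notes after rounding.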
I still haven't figured out exactly where all my clock cycles are being spent when cases like 336/4 and even 330/4 should theoretically have enough time to make the latches. The pclk calculations are guesses based on the known time to fetch data from the psram chips and the tightest n64 bus timings.
I tried DMA'ing into an array using a full word instead of using 16-bit DMA reads, and consuming the array as the DMA wrote to it. That resulted in even slower bus patch speeds, likely due to memory contention.
When attempting to allow for increased SRAM read/write timings, I discovered it takes the RP2040 37ns at 360MHz to read from a statically allocated array, which is about 13 clock cycles per read. I measured this with a small test function that read from the array 1 million times in a loop (`word = array[0];`), timed using `time_us_32()` at the start of the loop with the difference taken once it finished. This seems like a very long time to read data from an array.