The memory systems of Altera Cyclone5 FPGAs have various features and
limitations. I will not talk about the HPS side here, only the FPGA
side. Memory systems include:
- M10K blocks on Cyclone5 SE A5
  There are about 390 blocks, each capable of holding:
  1-bit x 8K, 2-bit x 4K, 4-bit x 2K, 8-bit x 1K, 16-bit x 512, 20-bit x 512, 32-bit x 256, 40-bit x 256.
  There are pipeline registers on data, address, and write-enable, so an M10K block read takes two cycles.
  Dual-port read/write is supported.
- MLAB blocks
  Up to about 480 blocks, each holding 32 words of 16, 18, or 20 bits.
- Logic element registers
  Up to 128,000 bits of memory, but this uses up general logic elements very quickly.
- Qsys-attached static RAM (M10K blocks)
  Easy-to-use, bus-attached memory that can be accessed from both the FPGA and the HPS.
  Size is configured in Qsys and drawn from the pool of available M10K blocks.
- Qsys-attached external SDRAM
  Easy-to-use, bus-attached memory that can be accessed from both the FPGA and the HPS.
  Configured as 32 Mwords of 16-bit memory (64 MB).
These are explained in several documents, listed below.
Here I have tried to show specific examples of memory use in realistic state-machine schemes. Refer to:
- Altera Recommended HDL Coding Styles section on Inferring Memory Functions from HDL Code.
- Recommended Design Practices for FPGAs: Latches, clocks, metastability.
- Advanced Synthesis Cookbook: all kinds of cool mathematical and logical structures.
- CycloneV Memory Blocks: M10K memory
- Internal Memory (RAM and ROM) User Guide
- On-Chip Memory Implementations Using Cyclone Memory Blocks
M10k/MLAB on Cyclone 5.
The HDL style guide suggests the following code to infer M10K or MLAB blocks
Timing diagrams:
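As a rough illustration of the kind of code that infers block memory, here is a minimal sketch of a single-clock, simple dual-port RAM (one write port, one read port). It follows the common inference pattern of a registered read from a `reg` array; it is not claimed to be the exact listing from the style guide. The module name, parameter names, and default sizes are assumptions chosen for illustration.

```verilog
// Sketch only: single-clock simple dual-port RAM written so that a
// synthesis tool can map it onto a block memory.
// Module name, parameters, and default sizes are illustrative assumptions.
module simple_dual_port_ram #(
    parameter DATA_WIDTH = 18,
    parameter ADDR_WIDTH = 9          // 2**9 = 512 words
)(
    input                       clk,
    input                       we,
    input      [ADDR_WIDTH-1:0] write_addr,
    input      [ADDR_WIDTH-1:0] read_addr,
    input      [DATA_WIDTH-1:0] data,
    output reg [DATA_WIDTH-1:0] q
);
    // The memory array itself
    reg [DATA_WIDTH-1:0] ram [0:2**ADDR_WIDTH-1];

    always @(posedge clk) begin
        if (we)
            ram[write_addr] <= data;
        // Registered (synchronous) read. Block RAM has no asynchronous
        // read port, so the read must sit inside the clocked block.
        q <= ram[read_addr];
    end
endmodule
```

The read being clocked is the important design choice here: an asynchronous read (`assign q = ram[read_addr];`) generally cannot be mapped onto a synchronous block memory, so the array may end up in MLABs or logic registers instead.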
You can set up M10K blocks in at least 3 ways:
- Infer the block memory directly from Verilog
  - See HDL style guide
  - Mohammad Dohadwala wrote the following to construct a 512-word, 18-bit memory in one M10K block.
    This code simulates correctly in Modelsim because he added a single-cycle delay to the read-enable signal.
    For synthesis you would use the version in HDL style guide 12.16 above, not this code.

        module RAM_512_18(
            output reg signed [17:0] q,
            input signed [17:0] data,
            input [8:0] wraddress, rdaddress,
            input wren, rden, clock
        );
            reg [8:0] read_address_reg;
            reg signed [17:0] mem [511:0];
            reg rden_reg;

            // Write port
            always @ (posedge clock) begin
                if (wren)
                    mem[wraddress] <= data;
            end

            // Read port: address and enable are registered first,
            // then used on the next clock edge to load q
            always @ (posedge clock) begin
                if (rden_reg)
                    q <= mem[read_address_reg];
                read_address_reg <= rdaddress;
                rden_reg <= rden;
            end
        endmodule
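To see the read timing of the RAM_512_18 module above in simulation, a small testbench along the following lines could be used. This is a sketch under stated assumptions: the testbench name, the test address and value, and the 10 ns clock period are all hypothetical, and RAM_512_18 must be compiled alongside it. It writes one word, presents the read address with `rden` high, and checks `q` after the second rising edge.

```verilog
`timescale 1ns/1ns
// Hypothetical testbench (not from the original page).
// Assumes the RAM_512_18 module listed above is compiled with this file.
module RAM_512_18_tb;
    reg clock = 0;
    reg wren = 0, rden = 0;
    reg [8:0] wraddress = 0, rdaddress = 0;
    reg signed [17:0] data = 0;
    wire signed [17:0] q;

    RAM_512_18 dut (
        .q(q), .data(data),
        .wraddress(wraddress), .rdaddress(rdaddress),
        .wren(wren), .rden(rden), .clock(clock)
    );

    always #5 clock = ~clock;          // 10 ns period (arbitrary choice)

    initial begin
        // Drive inputs on falling edges so they are stable at rising edges.
        @(negedge clock);
        wraddress = 9'd7; data = 18'sd1234; wren = 1;   // write mem[7]
        @(negedge clock);
        wren = 0;
        rdaddress = 9'd7; rden = 1;                     // request mem[7]
        @(posedge clock);   // edge 1: rdaddress and rden are registered
        @(posedge clock);   // edge 2: q is loaded from mem[7]
        #1;                 // let the nonblocking update settle
        if (q === 18'sd1234)
            $display("PASS: q valid two rising edges after the read request");
        else
            $display("FAIL: q = %d", q);
        $finish;
    end
endmodule
```

Sampling `q` after only the first rising edge would show the old (here undefined) value, which is the two-cycle read behavior described for M10K blocks.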
- Use synthesis comments in Verilog to force memory allocation
- Use the Altera IP library memory blocks (see the user guide). There is some evidence that the simulation code
generated by the IP handler does not correctly handle the one-cycle read delay in M10K blocks.
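For the synthesis-comment option in the list above, a minimal sketch might look like the following. It uses the Quartus `ramstyle` synthesis attribute in both of its forms (Verilog-2001 attribute and `/* synthesis ... */` comment) to ask for an MLAB and an M10K implementation of two otherwise identical arrays. The module name, signal names, and the 32 x 18 size are assumptions for illustration, and the tool may still decline a request the hardware cannot satisfy.

```verilog
// Sketch only: steering memory inference with the Quartus "ramstyle"
// attribute. All names and sizes here are illustrative assumptions.
module ram_style_demo (
    input             clock,
    input             we,
    input      [4:0]  waddr, raddr,
    input      [17:0] din,
    output reg [17:0] q_mlab,
    output reg [17:0] q_m10k
);
    // Verilog-2001 attribute form: request an MLAB implementation.
    (* ramstyle = "MLAB" *) reg [17:0] small_mem [0:31];

    // Comment form of the same attribute: request an M10K block.
    reg [17:0] block_mem [0:31] /* synthesis ramstyle = "M10K" */;

    always @(posedge clock) begin
        if (we) begin
            small_mem[waddr] <= din;
            block_mem[waddr] <= din;
        end
        // Registered reads, so both arrays remain inferable as RAM.
        q_mlab <= small_mem[raddr];
        q_m10k <= block_mem[raddr];
    end
endmodule
```

After compiling, the fitter resource report shows which memory type each array actually received.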
Memory block Example -- Qsys sram and MLAB
This example simultaneously tests the floating point routines, shares an M10K block between the HPS and the FPGA, and uses an MLAB block to increment a counter. The M10K block is instantiated in Qsys as dual-port memory with two clocks: the system clock and a 50 MHz clock. One slave port is hooked to the HPS bus and the other port is exported to the FPGA fabric. These exported signals appear in the computer system template as:
    // SRAM shared block with HPS
    .onchip_sram_s1_address    (sram_address),
    .onchip_sram_s1_clken      (sram_clken),
    .onchip_sram_s1_chipselect (sram_chipselect),
    .onchip_sram_s1_write      (sram_write),
    .onchip_sram_s1_readdata   (sram_readdata),
    .onchip_sram_s1_writedata  (sram_writedata),
    .onchip_sram_s1_byteenable (4'b1111),
The state machine in Verilog can read and write the same block as the HPS, so the block acts as a communication channel. The program running on the HPS writes floating point values into the sram. The sram state machine reads the memory location in sram, then writes the value back to another address, which is read by the HPS...
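The read-then-write-back pattern described above can be sketched as a small state machine driving the exported port signals. This is not the original example's code. The module name, the 8-bit address width, the two addresses used (0 and 1), and the assumption of a one-cycle read latency are all illustrative; a port with longer latency would need additional wait states.

```verilog
// Sketch only: copy the 32-bit word at address 0 to address 1 through
// the exported sram port. Module name, address width, addresses, and
// the number of wait states are assumptions, not the original design.
module sram_copy_fsm (
    input             clock,
    input             reset,
    input      [31:0] sram_readdata,
    output reg [7:0]  sram_address,
    output reg        sram_write,
    output reg [31:0] sram_writedata,
    output            sram_clken,
    output            sram_chipselect
);
    assign sram_clken      = 1'b1;   // port always enabled
    assign sram_chipselect = 1'b1;

    localparam S_ADDR = 2'd0, S_WAIT = 2'd1, S_LATCH = 2'd2, S_WRITE = 2'd3;
    reg [1:0] state;

    always @(posedge clock) begin
        if (reset) begin
            state          <= S_ADDR;
            sram_write     <= 1'b0;
            sram_address   <= 8'd0;
            sram_writedata <= 32'd0;
        end else begin
            case (state)
                S_ADDR: begin                 // present the read address
                    sram_write   <= 1'b0;
                    sram_address <= 8'd0;
                    state        <= S_WAIT;
                end
                S_WAIT:                       // memory registers the address
                    state <= S_LATCH;
                S_LATCH: begin                // capture data, set up the write
                    sram_writedata <= sram_readdata;
                    sram_address   <= 8'd1;
                    sram_write     <= 1'b1;
                    state          <= S_WRITE;
                end
                S_WRITE: begin                // write occurs on this edge
                    sram_write <= 1'b0;
                    state      <= S_ADDR;
                end
            endcase
        end
    end
endmodule
```

The explicit wait state is the key design choice: because the memory read is registered, `sram_readdata` is not valid in the same cycle that the address is presented.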