There are a lot of how-to articles about building VGA clocks in FPGAs. My favorite is the Pong Game. What I found more complex was taking that VGA clock and using it to build a true framebuffer for a CPU. This introduces two new challenges: memory bandwidth and multiple clock domains. I'll describe how I implemented my framebuffer in Verilog in a later article, but here I thought I'd share one of the interesting things I did when porting DOOM that was made trivial on an FPGA platform.
The native DOOM code uses a 320x200 display in 8-bit pseudocolor. This means there is a palette of 256 slots that hold the actual 24-bit RGB values, and the framebuffer memory stores only an index into that palette for each pixel. The framebuffer hardware uses that index to determine what color to display.
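As a rough illustration of the palette lookup described above, here is a minimal C sketch. The names `rgb24_t` and `resolve_pixel` are made up for this example and do not come from DOOM or from my Verilog.

```c
#include <stdint.h>

/* One palette slot: a full 24-bit RGB colour (hypothetical type name). */
typedef struct {
    uint8_t r, g, b;
} rgb24_t;

/* The framebuffer holds only an 8-bit index per pixel; the colour that
   reaches the screen comes from the 256-entry palette. */
static rgb24_t resolve_pixel(const rgb24_t palette[256], uint8_t index)
{
    return palette[index];
}
```

Since the index is 8 bits wide, it can never point outside the 256 slots, so no bounds check is needed.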
My initial framebuffer design was a 640x480 display, and that's still what is presented on the VGA port. The challenge I ran into with DOOM was that when it wrote to the framebuffer, the output didn't look right: each 320-pixel line it wrote filled only half of a 640-pixel screen line, so two lines of its output landed on every line of the screen.
The solution in the DOOM code is logic to do pixel doubling: in effect, it copies the value of each even-numbered pixel (x,y) to (x+1,y), (x,y+1), and (x+1,y+1). That's fine when you have really fast memory, but you end up sending 4x the amount of data you need to the framebuffer, which is pretty inefficient.
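To make the cost concrete, software pixel doubling looks roughly like the sketch below. It is an illustration of the technique, not the actual DOOM routine. The function name and constants are assumptions, and `dst` is assumed to hold at least 640x400 bytes.

```c
#include <stddef.h>
#include <stdint.h>

enum { SRC_W = 320, SRC_H = 200, DST_W = 640 };

/* Software doubling: every source pixel is written four times, as a 2x2
   block in the destination. dst must hold at least DST_W * 2 * SRC_H bytes. */
static void double_pixels(const uint8_t *src, uint8_t *dst)
{
    for (size_t y = 0; y < SRC_H; y++) {
        for (size_t x = 0; x < SRC_W; x++) {
            uint8_t p = src[y * SRC_W + x];
            uint8_t *out = &dst[(2 * y) * DST_W + 2 * x];
            out[0] = p;          /* (2x,   2y)   */
            out[1] = p;          /* (2x+1, 2y)   */
            out[DST_W] = p;      /* (2x,   2y+1) */
            out[DST_W + 1] = p;  /* (2x+1, 2y+1) */
        }
    }
}
```

Each 64,000-byte frame turns into 256,000 bytes of writes, which is the 4x overhead mentioned above.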
My solution was to add a new video mode to the framebuffer so that it does the pixel doubling in hardware. It was a fairly easy change, essentially just ignoring two low bits of the cursor position (the low bit of x and the low bit of y) when pulling data from the framebuffer. This made screen updates much faster, and the net result on screen was exactly the same.
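The address calculation can be modelled in C as below. This is a software model of the idea, not the Verilog from vga_master. It assumes the ignored bits are the low bit of each coordinate and that the source image is stored with 320 bytes per line; the function name is made up.

```c
#include <stdint.h>

enum { FB_W = 320 };  /* assumed source line length in pixels */

/* Map a 640-wide screen position to the byte offset of the pixel to fetch.
   Dropping the low bit of x and of y makes each stored pixel cover a
   2x2 block of the screen. */
static uint32_t doubled_fetch_offset(uint32_t x, uint32_t y)
{
    return (y >> 1) * FB_W + (x >> 1);
}
```

Screen positions (0,0), (1,0), (0,1) and (1,1) all map to offset 0, so the CPU writes each pixel only once.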
I'll post a video about the VGA framebuffer and do a walkthrough of the code soon. The relevant code is vga_master.
Discussions
What you describe is essentially the method I've implemented. The VGA controller is a bus master, and an arbiter allows the SSRAM bus to be shared with the CPU. I kept this video bus separate from the primary system bus so that contention would be kept to a minimum. I'll put a system block diagram up under the Bexkat1 project later tonight that will give you a better picture of the system architecture.
There's a little complexity syncing on the horizontal refresh as a trigger to load the FIFO, but that's about it. The FIFO is loaded with two copies for each input to do the doubling, and it only increments the external memory base address every other horizontal line.
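A simplified C model of that FIFO load, for illustration only: the function names and the 320-pixel line length are assumptions, and the real logic is Verilog triggered by the horizontal sync.

```c
#include <stddef.h>
#include <stdint.h>

enum { LINE_PIXELS = 320 };  /* assumed source line length */

/* The memory base address advances only on every other output line. */
static size_t line_base(size_t out_line)
{
    return (out_line / 2) * LINE_PIXELS;
}

/* Load one output line into the FIFO, pushing each source pixel twice.
   fifo must hold 2 * LINE_PIXELS entries; returns the number pushed. */
static size_t load_line_fifo(const uint8_t *fb, size_t out_line, uint8_t *fifo)
{
    const uint8_t *src = fb + line_base(out_line);
    size_t n = 0;
    for (size_t x = 0; x < LINE_PIXELS; x++) {
        fifo[n++] = src[x];  /* first copy */
        fifo[n++] = src[x];  /* second copy: horizontal doubling */
    }
    return n;
}
```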
Once I get sufficient materials up so that someone else can build the code, I'll expand on the design, get some videos and diagrams up, etc.
Thank you :-)
Hmmm, I'm not sure I understand everything. A block diagram would help :-)
My approach would be to have the 320*200 framebuffer in main RAM, have some DMA poll it to fill a FIFO (8 bits wide), feed the FIFO on the other clock domain to the 256*18 LUT (yup, VGA limits to 6 bits per component) and that's it...
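For illustration, an 18-bit LUT entry could be packed as in this C sketch. The bit ordering (red in the top six bits) and both function names are assumptions, not something taken from an actual design.

```c
#include <stdint.h>

/* Reduce an 8-bit colour component to 6 bits by dropping the two low bits. */
static uint8_t to_6bit(uint8_t c8)
{
    return (uint8_t)(c8 >> 2);
}

/* Pack three 6-bit components into one 18-bit LUT entry.
   Assumed layout: R in bits 17..12, G in bits 11..6, B in bits 5..0. */
static uint32_t pack_rgb666(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)(r & 0x3F) << 12) |
           ((uint32_t)(g & 0x3F) << 6) |
           (uint32_t)(b & 0x3F);
}
```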
Line doubling would be a compromise between FIFO space and bandwidth use. In the case of ample RAM bandwidth, the line can be scanned twice (DRAM banks are still open). Otherwise, scan a line once and send the 320 bytes to two parallel FIFOs, and read one FIFO then another (this repeats the value).
Of course, many options are possible (my double FIFO idea wastes space) and I don't know what you can do.
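The double-FIFO variant can be modelled in C roughly like this. The sketch only shows the line repetition, with hypothetical names and an assumed 320-byte line; horizontal doubling would still happen at readout.

```c
#include <stdint.h>
#include <string.h>

enum { LINE_BYTES = 320 };  /* assumed source line length */

/* Two FIFOs that each hold one full source line. */
typedef struct {
    uint8_t a[LINE_BYTES];
    uint8_t b[LINE_BYTES];
} line_pair_t;

/* A single read of the source line from RAM fills both FIFOs. */
static void fill_pair(line_pair_t *p, const uint8_t *src_line)
{
    memcpy(p->a, src_line, LINE_BYTES);
    memcpy(p->b, src_line, LINE_BYTES);
}

/* Even output lines drain FIFO a, odd output lines drain FIFO b,
   so the same data is displayed on two consecutive lines. */
static const uint8_t *drain(const line_pair_t *p, unsigned out_line)
{
    return (out_line & 1u) ? p->b : p->a;
}
```

RAM is read once per source line, at the cost of 640 bytes of FIFO storage instead of 320.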