Need hardware rasterization... Architecture needs an upgrade
10/11/2020 at 02:30
At 640x480 pixels, at 60 FPS, you have to write about 18 million pixels per second.
So each pixel needs to be rasterized and written to RAM in only about 5 clock cycles (with a 100 MHz clock). There is absolutely no way I can do that in software. Having 64 ALUs will help me calculate the positions of all the fragments in the time available, but not actually write the fragments to RAM.
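The budget above is simple arithmetic. A small Python sketch makes it easy to try other resolutions, frame rates, or clock speeds (the function names are just for illustration, and it assumes pixels are produced one at a time):

```python
def pixels_per_second(width, height, fps):
    """Pixels that must be produced every second to redraw the full screen."""
    return width * height * fps


def cycles_per_pixel(clock_hz, width, height, fps):
    """Clock cycles available per pixel if pixels are produced one at a time."""
    return clock_hz / pixels_per_second(width, height, fps)


print(pixels_per_second(640, 480, 60))                   # 18432000
print(round(cycles_per_pixel(100e6, 640, 480, 60), 2))   # 5.43
print(round(cycles_per_pixel(100e6, 320, 240, 60), 2))   # 21.7
```

Halving both dimensions quadruples the per-pixel budget, which is why lowering the resolution is one of the options.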
Suppose the hardware supports rasterization: the CPU and ALUs compute the next set of fragments while the rasterizer writes pixels to RAM. Even then, it would barely be fast enough to handle two triangles covering the whole screen. Of course, you could have more triangles covering the whole screen, or several overlapping triangles in a confined space. But what if you have a 3D world with many overlapping triangles? More fragments can be created than there are pixels on the screen, and every one of them needs to be compared with the depth buffer. Although only a subset gets written to RAM, the comparisons still take time...
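To make the last point concrete, here is a rough software model (not the planned hardware) of a depth test over several full-screen layers. It counts how many depth comparisons and how many writes happen when the same layers are drawn in different orders. A smaller z is assumed to mean closer to the camera, and the function name is hypothetical.

```python
def depth_test_layers(num_pixels, layer_depths):
    """Draw full-screen layers in the given order; count depth compares and writes."""
    depth = [float("inf")] * num_pixels  # cleared depth buffer: "infinitely far"
    compares = writes = 0
    for z in layer_depths:
        for i in range(num_pixels):
            compares += 1        # every fragment must be tested
            if z < depth[i]:     # closer than what is stored
                depth[i] = z
                writes += 1      # only winning fragments reach RAM
    return compares, writes


front_to_back = depth_test_layers(100, [1, 2, 3])
back_to_front = depth_test_layers(100, [3, 2, 1])
print(front_to_back)   # (300, 100)
print(back_to_front)   # (300, 300)
```

Both orders need 300 comparisons for 100 pixels and three layers; drawing front to back only cuts the writes from 300 to 100. Sorting can save memory writes, but the comparison cost still grows with the total number of fragments.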
There are a few options:
- Decrease framerate
- Decrease resolution
- Increase clock speed
  - Probably requires new, faster external RAM (hard to add to my board)
- Highly parallel comparing and writing
  - Potentially requires more memory to be used in parallel
- Increase clock speed, but increase memory bandwidth by adding external RAM
  - Certainly requires more memory
I can add more external RAM to the Mercury board, but it may end up slower or more expensive, because it would have to go through a 5 V level shifter built into the board.
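One way to compare the bandwidth-oriented options is to estimate how many independent memory ports would be needed. This sketch assumes, hypothetically, that each fragment costs a fixed number of memory accesses (for example a depth read, a depth write, and a color write) and that each port completes one access every `cycles_per_access` clock cycles. Neither number comes from the actual board; both are knobs to play with.

```python
import math


def memory_ports_needed(width, height, fps, overdraw,
                        accesses_per_fragment, clock_hz, cycles_per_access):
    """Estimate how many parallel memory ports a frame budget requires.

    overdraw: average fragments generated per screen pixel (1 = no overlap).
    accesses_per_fragment and cycles_per_access are hypothetical parameters.
    """
    fragments_per_s = width * height * fps * overdraw
    accesses_per_s = fragments_per_s * accesses_per_fragment
    accesses_per_port_per_s = clock_hz / cycles_per_access
    return math.ceil(accesses_per_s / accesses_per_port_per_s)


# No overlap, one access per fragment, one access per cycle: a single port.
print(memory_ports_needed(640, 480, 60, 1, 1, 100e6, 1))   # 1
# 3x overdraw, 3 accesses per fragment, 2 cycles per access: four ports.
print(memory_ports_needed(640, 480, 60, 3, 3, 100e6, 2))   # 4
```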
I am not interested in rolling my own FPGA board because I lack the production capabilities. I could make a carrier board for the Mercury 2 though, and I probably will.
Proposed Rendering Pipelines
10/10/2020 at 04:31
Why would I look at this before having a fully functional processor?
I need to know which operations I will be doing most, and perhaps even write the rendering software, before I design the instruction set. By understanding the program I'll be running, I can optimize the processor for that program.
There are three main ways I want to be able to render:
- Rasterizing my 3D primitives
- Ray tracing 3D primitives
- Rendering 2D textures/graphics
Rasterization
Here is my idea:
- Create a depth buffer somewhere, storing distance from the camera
- For each triangle, project the three vertices
  - For each pixel in that triangle, compare with the depth buffer; overwrite if closer to the camera
    - Also compute the fragment color and write it to the color buffer if closer to the camera
- Swap the video generator's address with that of the color buffer
  - Old color buffer is new image, old image is new color buffer
  - This achieves double buffering
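The rasterization steps can be sketched in software to check the logic before committing it to hardware. This is a minimal sketch, assuming vertices are already projected to screen space as (x, y, z) with smaller z meaning closer; it uses edge functions and barycentric weights (one common technique, not necessarily the one the hardware will use), and depth is interpolated linearly in screen space, which is exact only for the flat triangles used here. The tiny 8x6 screen and all names are hypothetical.

```python
import math

WIDTH, HEIGHT = 8, 6  # tiny screen so the example runs instantly


def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p): which side of a->b p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, color, depth_buf, color_buf):
    """Draw one triangle whose vertices are already projected to (x, y, z)."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return  # degenerate triangle: covers no pixels
    # Only visit pixels inside the bounding box, clipped to the screen.
    min_x = max(math.floor(min(x0, x1, x2)), 0)
    max_x = min(math.ceil(max(x0, x1, x2)), WIDTH - 1)
    min_y = max(math.floor(min(y0, y1, y2)), 0)
    max_y = min(math.ceil(max(y0, y1, y2)), HEIGHT - 1)
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            # Barycentric weights; all >= 0 means the centre is inside.
            w0 = edge(x1, y1, x2, y2, px, py) / area
            w1 = edge(x2, y2, x0, y0, px, py) / area
            w2 = edge(x0, y0, x1, y1, px, py) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue
            z = w0 * z0 + w1 * z1 + w2 * z2  # depth, linear in screen space
            i = y * WIDTH + x
            if z < depth_buf[i]:  # closer than what is stored: keep it
                depth_buf[i] = z
                color_buf[i] = color


class DoubleBuffer:
    """front is scanned out by the video generator; back is drawn into."""

    def __init__(self):
        self.front = [0] * (WIDTH * HEIGHT)
        self.back = [0] * (WIDTH * HEIGHT)

    def swap(self):
        # Stands in for swapping the video generator's address.
        self.front, self.back = self.back, self.front


def render_frame(triangles, buffers):
    depth_buf = [math.inf] * (WIDTH * HEIGHT)
    buffers.back[:] = [0] * (WIDTH * HEIGHT)  # clear to background color 0
    for tri, color in triangles:
        rasterize(tri, color, depth_buf, buffers.back)
    buffers.swap()


buffers = DoubleBuffer()
near = (((0, 0, 1), (4, 0, 1), (0, 4, 1)), 2)        # small triangle, close
far = (((-1, -1, 5), (20, -1, 5), (-1, 20, 5)), 1)   # whole screen, far away
render_frame([near, far], buffers)  # far is drawn last but must not win
print(buffers.front[0], buffers.front[WIDTH * HEIGHT - 1])  # 2 1
```

Drawing the near triangle first and the far one second shows that the depth test, not the draw order, decides which color ends up on screen.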
Ray Tracing
Not sure if I ever want to really program this, but I want the option.
- For each pixel
  - Find the nearest triangle (if any) that would render on that pixel
  - Compute the color of that pixel using the triangle (or lack thereof)
  - Write the color to the pixel
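The ray-tracing loop could look roughly like this. The sketch swaps in the well-known Möller-Trumbore ray/triangle test to "find the nearest triangle"; the pinhole camera at the origin looking down +z, the 4x4 image, the background color 0, and the scene are all assumptions for illustration.

```python
EPS = 1e-9
WIDTH, HEIGHT = 4, 4
BACKGROUND = 0


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect(origin, direction, tri):
    """Möller-Trumbore test: distance t along the ray to tri, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:
        return None  # ray parallel to the triangle's plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det
    if u < 0 or u > 1:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) * inv_det
    return t if t > EPS else None  # ignore hits behind the camera


def trace(scene):
    image = []
    for y in range(HEIGHT):
        for x in range(WIDTH):
            # Ray from the camera through the pixel centre on the plane z = 1.
            d = ((x + 0.5) / WIDTH * 2 - 1, 1 - (y + 0.5) / HEIGHT * 2, 1.0)
            nearest_t, color = None, BACKGROUND
            for tri, tri_color in scene:
                t = intersect((0.0, 0.0, 0.0), d, tri)
                if t is not None and (nearest_t is None or t < nearest_t):
                    nearest_t, color = t, tri_color
            image.append(color)
    return image


scene = [
    (((-100, -100, 5), (100, -100, 5), (0, 100, 5)), 1),  # large, far
    (((-2, -2, 2), (2, -2, 2), (0, 2, 2)), 2),            # smaller, nearer
]
image = trace(scene)
```

Unlike the rasterizer, no depth buffer is needed here: the nearest hit per pixel is found directly, at the cost of testing every triangle for every pixel.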
2D Images
Here we need the option of rotating images and computing depth (think 2D games like Starbound), and possibly smooth lighting or other graphical effects.
- Create a depth buffer and color buffer
- For each sprite instance
  - For each pixel in the sprite, compute a rotated position for it
    - Shade and write color to the color buffer
- Swap the video generator's address with that of the color buffer
  - Old color buffer is new image, old image is new color buffer
  - This achieves double buffering
This is surprisingly similar to the rasterization pipeline... Am I doing something wrong? Please let me know if I am; I'm somewhat new to this :P
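A minimal sketch of the sprite loop, following the listed steps: every sprite pixel is rotated about the sprite's centre and written only if it passes the depth test. The function name, the use of `None` for transparent pixels, and a single depth value per sprite are assumptions. Rotating source pixels forward like this can leave gaps at angles that aren't multiples of 90°; iterating over destination pixels and rotating backwards into the sprite (inverse mapping) is the usual fix.

```python
import math


def draw_sprite(sprite, cx, cy, angle, depth, depth_buf, color_buf, width, height):
    """Forward-map each sprite pixel to the screen, rotated by `angle` radians.

    sprite is a list of rows of colors; None marks a transparent pixel.
    (cx, cy) is where the sprite's centre lands on screen.
    Smaller depth = closer to the camera.
    """
    rows, cols = len(sprite), len(sprite[0])
    ox, oy = (cols - 1) / 2, (rows - 1) / 2  # sprite centre
    c, s = math.cos(angle), math.sin(angle)
    for sy, row in enumerate(sprite):
        for sx, color in enumerate(row):
            if color is None:
                continue
            dx, dy = sx - ox, sy - oy
            # Rotate the offset from the centre, then move it to (cx, cy).
            x = round(cx + dx * c - dy * s)
            y = round(cy + dx * s + dy * c)
            if not (0 <= x < width and 0 <= y < height):
                continue  # off screen
            i = y * width + x
            if depth < depth_buf[i]:
                depth_buf[i] = depth
                color_buf[i] = color  # shading would happen here


W, H = 5, 5
depth_buf = [math.inf] * (W * H)
color_buf = [0] * (W * H)
draw_sprite([[1, 2, 3]], 2, 2, math.pi / 2, 1, depth_buf, color_buf, W, H)
draw_sprite([[9]], 2, 2, 0.0, 5, depth_buf, color_buf, W, H)  # behind: hidden
```

The inner loop is the same "compare depth, then write color" step as in the 3D rasterizer; only the way fragment positions are generated differs.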
Main Processor finally works!
10/09/2020 at 15:53
I started this project a few weeks ago, so I already have a lot working. Let me catch you up...
Most of that time went into writing all of the components in VHDL and testing them in the simulator. When I finally flashed the board and looked at the VGA output, the video worked the first time, but there was no sign of the processor working, which was particularly odd because it worked fine in the simulator...
My program (written in binary :P, so the code is not intelligible) essentially boiled down to this:
Load 0xFF002000 into every 32-bit block in 256 bytes somewhere in video RAM
while(1);
That first line essentially told the ALUs to load the constant 0xFF002000 and store it into RAM 64 times (64 × 4 bytes = 256 bytes). It didn't work because the processor basically told the ALUs to cancel the operation before it started :(
After fixing it, some order can be seen in the sea of random pixels.
Now we can write large blocks to video memory, and we can *probably* do any math operation we want, but the design is not Turing complete. Next time I want to make something more interesting happen.
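For anyone following along, here is a software model of what that test program does: 64 stores of one 32-bit constant cover exactly 256 bytes. The `block_store` name, the little-endian byte order, the base address, and the 1 KiB stand-in for video RAM are hypothetical, and the loop only stands in for whatever the hardware actually does across its ALUs.

```python
import struct

NUM_STORES = 64   # from the log: the constant is stored 64 times
WORD_BYTES = 4    # 32-bit words, so 64 * 4 = 256 bytes


def block_store(vram, base, value):
    """Write `value` into NUM_STORES consecutive 32-bit words starting at `base`."""
    for n in range(NUM_STORES):
        # "<I" = little-endian unsigned 32-bit; the byte order is an assumption.
        struct.pack_into("<I", vram, base + n * WORD_BYTES, value)


vram = bytearray(1024)  # small stand-in for video RAM
block_store(vram, 0x100, 0xFF002000)
words = [struct.unpack_from("<I", vram, off)[0]
         for off in range(0, len(vram), WORD_BYTES)]
```

Only words 64 through 127 (bytes 0x100 to 0x1FF) hold the constant; everything else stays zero.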