-
Some more design changes
12/21/2016 at 06:36 • 0 comments
Once again I was not happy with my design, so I tweaked it a little.
- I have enough information to use the PIII as the CPU, so that's staying.
- I'm widening the RAM bus to 128 bits, but lowering the frequency to 667MHz. This still increases the bandwidth to 10.6 GB/s.
- I'm ditching the CD drive entirely and having games only boot from SD cards or the hard drive if one is installed.
- I'm making the console somewhat modular. The power supply will be removable and the AV out will be VGA format on a custom connector, but you can use adapter cords to convert it to any other format such as composite or HDMI.
- I'm beefing up the GPU a bit. I'm now going to use an Artix-7 FPGA and am upping the texture cache to 4MB. I'm also considering dropping the scheme where half the cores run out of phase with the others.
- The fan is now a 5V laptop fan.
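The RAM figure in the list above can be checked with a short calculation. This is only a sketch: the function name `peak_bandwidth_gb_s` is made up for illustration, and it assumes that "667MHz" means 667 million transfers per second and that GB means 10^9 bytes.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_second: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_second / 1e9


# Assumption: "667MHz" is read as 667 million transfers per second.
new_ram = peak_bandwidth_gb_s(128, 667e6)
print(f"{new_ram:.2f} GB/s")
```

Under those assumptions the result is roughly 10.67 GB/s, in line with the 10.6 GB/s quoted. It is a theoretical peak, so sustained throughput would be lower.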
-
GPU architecture Part 1
04/19/2016 at 00:04 • 0 comments
I have more of my GPU architecture planned out, and I have come up with (in my opinion) a very efficient design. Instead of having all 8 cores running at the rising edge of the clock, half of them will operate at the rising edge and the other half at the falling edge. This way the GPU is always doing something. Next up, recall that each core has its own code cache. Well, that cache will operate at the opposite edge of the core's clock. So when the core sends a data request to the cache, the cache will process that request on the following opposite edge, while the core is settling into its next clock, so the data is ready when the core's next cycle begins. The texture cache will do the same thing, but I'll have to find a way to compensate for the fact that half the cores are running out of phase. Also, due to chip limitations, there will be 2MB of texture cache instead of 4.
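The edge scheme can be sketched as a toy simulation in which each time step is one clock edge. Everything here is a hypothetical model and not the actual FPGA logic: the `simulate` function, the event tuples, and the `req@N` labels are invented for illustration. It also assumes every cache request hits and that each core issues exactly one request per cycle.

```python
RISING, FALLING = "rising", "falling"


def simulate(num_cores=8, half_cycles=4):
    """Return (edge_index, core_id, unit, action) events for a toy two-phase GPU."""
    # Assumption: cores 0..3 work on the rising edge, cores 4..7 on the falling edge.
    core_edge = {c: (RISING if c < num_cores // 2 else FALLING) for c in range(num_cores)}
    pending = {}  # core_id -> request waiting for that core's code cache
    ready = {}    # core_id -> data the cache has already fetched
    events = []
    for t in range(half_cycles):
        edge = RISING if t % 2 == 0 else FALLING
        for c in range(num_cores):
            if core_edge[c] == edge:
                # The core's active edge: consume data fetched earlier, then ask for more.
                if c in ready:
                    events.append((t, c, "core", f"uses {ready.pop(c)}"))
                pending[c] = f"req@{t}"
                events.append((t, c, "core", f"issues {pending[c]}"))
            elif c in pending:
                # The opposite edge: this core's cache services the outstanding request.
                ready[c] = pending.pop(c)
                events.append((t, c, "cache", f"serves {ready[c]}"))
    return events


events = simulate()
for event in events:
    if event[1] in (0, 4):  # show one core from each phase group
        print(event)
```

In this model four cores issue a request on every edge, and a request made by core 0 on edge 0 is served by its cache on edge 1 and consumed on edge 2, which is the "data is ready when the next cycle begins" behaviour described above.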
The GPU will also have two sets of FPUs. One will be a big FPU that processes less common operations. Each core will then have its own smaller personal FPU to handle more common operations. Both FPUs will be pipelined to allow for high frequencies, and the FPU architecture will allow several instances of the same operation to be in flight at once. So you could spam the add instruction, and the results will come out in the order you put the arguments in (as long as you don't overflow the pipeline). The pipeline registers will also load data at the opposite clock edge from the core. So when the core clock goes high, the register feeds data to the next stage. When the clock goes low, the register stores data from the previous stage. I will also try to give each core branch prediction and out-of-order execution, with the aim of making them superscalar.
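The in-order behaviour of a pipelined FPU can be illustrated with a small model. This is a sketch under assumptions: the `PipelinedAdder` class and its four-stage depth are hypothetical, each `clock()` call stands for one full clock (the store-on-low, feed-on-high register timing is not modelled), and the addition is computed in one go when the operands leave the last stage instead of being spread across stages.

```python
from collections import deque


class PipelinedAdder:
    """Toy model of a pipelined add unit with a fixed number of stages."""

    def __init__(self, depth=4):
        self.depth = depth
        # stages[0] is the newest entry, stages[-1] is about to leave the pipeline.
        self.stages = deque([None] * depth, maxlen=depth)

    def clock(self, operands=None):
        """Advance one clock; return the sum leaving the pipeline, or None for a bubble."""
        finished = self.stages[-1]
        # With maxlen set, appendleft drops the oldest entry from the right end.
        self.stages.appendleft(operands)
        return None if finished is None else finished[0] + finished[1]


fpu = PipelinedAdder(depth=4)
inputs = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
results = []
for cycle in range(len(inputs) + fpu.depth):
    operands = inputs[cycle] if cycle < len(inputs) else None
    out = fpu.clock(operands)
    if out is not None:
        results.append(out)
print(results)
```

A new add is accepted on every clock, and each result appears `depth` clocks after its operands went in, in the same order they were issued.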
I have also decided not to lock the GPU speed to 400 MHz, but instead to experiment to see how hard I can push the Spartan-6 FPGA before I get graphical glitches. If anyone has any other ideas to improve the architecture, then by all means let me know and I may implement them.
-
Reconsidering possible CPU options, and changing RAM setup.
02/26/2016 at 05:39 • 0 comments
I have done some thinking, and realized that I don't have the required docs to interface an FPGA with a PIII correctly. So I had an idea: if I really can't use the PIII, I'll just save it for later and implement my own CPU on a separate FPGA, or I could use one FPGA and make some sort of SoC. That way, I can work with a familiar architecture and an instruction set I know, because I made them. But I'll try my best to use the PIII. Perhaps I could strike a deal with Intel. Or maybe I could use a PowerPC CPU, or even an ARM core.
As for the RAM, socketed memory would not give me the bandwidth I need. So the RAM will instead be 512MB of DDR400, split across eight 32-bit chips. This will give a bandwidth of 12.8 GB/s. That'll be able to handle a GPU and CPU at the same time, no problem.
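The 12.8 GB/s figure follows from a short piece of arithmetic. It assumes all eight chips are accessed in parallel as one 256-bit bus, that DDR400 delivers 400 million transfers per second, and that GB means 10^9 bytes:

```latex
\[
\text{bus width} = 8 \times 32\ \text{bits} = 256\ \text{bits} = 32\ \text{bytes}
\]
\[
\text{peak bandwidth} = 32\ \text{bytes} \times 400 \times 10^{6}\ \text{transfers/s}
= 12.8 \times 10^{9}\ \text{bytes/s} = 12.8\ \text{GB/s}
\]
```

This is a theoretical peak, so the sustained rate shared between the GPU and CPU would be somewhat lower.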