CGIA/VDP Thoughts

A project log for Kestrel Computer Project

The Kestrel project is all about freedom of computing and the freedom of learning using a completely open hardware and software design.

Samuel A. Falvo II, 05/15/2016 at 06:08

While I let the CPU requirements bake a little bit, I guess I'll spend the remainder of my weekend thinking on the CGIA. When I asked if I should pursue VDP or CGIA here on Hackaday roughly three days ago, I got no response at all. When I asked the same question in a Twitter poll, I received exactly one vote for VDP and one vote for CGIA. So, with everyone literally divided equally or simply not caring, I guess I'll just have to play with each design to see which I like better and go forward with that.

After some thought on the subject, I think I have an idea of how to add VDP-like sprite pre-processing to the CGIA.

According to VGA timing specifications, a 640-pixel display actually has 800 pixels edge to edge. (The unused pixels are border and horizontal sync times.) These pixels are clocked at 25MHz in the Nexys-2 version of the Kestrel-3. The Nexys-2 RAM has a 14MHz bandwidth limitation, so accessing it at 12.5MHz is the best we can do without bizarro clocking or PLL tricks. We still maintain 25MBps throughput, though, since the path to RAM is 16 bits wide. For this reason, every access to external RAM (a "transfer") takes the same amount of time as two pixels on the screen. (Ironically, this is exactly the case for TI's VDP as well!) Put another way, one horizontal scanline on the monitor corresponds to 800 / 2 = 400 transfers to external memory.
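The arithmetic above can be checked with a few lines of Python. This is only a sketch; the constant and function names are mine, chosen for illustration, and do not come from any Kestrel source.

```python
# Figures taken from the paragraph above; all names are illustrative only.
PIXEL_CLOCK_HZ = 25_000_000    # VGA dot clock on the Nexys-2 build
RAM_CLOCK_HZ = 12_500_000      # one RAM access every two pixel clocks
BUS_WIDTH_BYTES = 2            # 16-bit path to external RAM
PIXELS_PER_SCANLINE = 800      # 640 visible plus border and sync


def transfers_per_scanline():
    """Number of 16-bit RAM transfers that fit in one scanline."""
    pixels_per_transfer = PIXEL_CLOCK_HZ // RAM_CLOCK_HZ
    return PIXELS_PER_SCANLINE // pixels_per_transfer


def throughput_bytes_per_second():
    """Peak RAM throughput at the chosen access rate."""
    return RAM_CLOCK_HZ * BUS_WIDTH_BYTES


print(transfers_per_scanline())       # 400
print(throughput_bytes_per_second())  # 25000000
```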

Best Case: Monochrome Without Sprites

To display a line of monochrome video, we need to fetch 640 / 16 = 40 half-words of memory from RAM. This leaves a total of 400 - 40 = 360 transfer slots available for the host CPU or other bus masters I may add later on. This is how the MGIA works today.

Worst Case: 256-color Display Without Sprites

With monochrome requiring only 40 transfers out of 400, we can actually get by with a 640 pixel-wide 256-color display. At 8 bits per pixel, that would require 640 * 8 / 16 = 320 transfers, leaving 80 transfer slots available to the CPU or other devices.
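To make the best-case and worst-case numbers easy to compare, here is a small sketch that computes the backdrop cost for a given color depth. The function names are hypothetical, and it assumes pixels are packed tightly into 16-bit half-words, which is how the figures above work out.

```python
TRANSFERS_PER_SCANLINE = 400   # from the VGA timing discussion
VISIBLE_PIXELS = 640
BUS_WIDTH_BITS = 16


def backdrop_transfers(bits_per_pixel):
    """Half-word fetches needed for one line of backdrop (assumes packed pixels)."""
    return VISIBLE_PIXELS * bits_per_pixel // BUS_WIDTH_BITS


def slots_left(bits_per_pixel):
    """Transfer slots remaining for the CPU, sprites, or other bus masters."""
    return TRANSFERS_PER_SCANLINE - backdrop_transfers(bits_per_pixel)


for bpp in (1, 4, 8):
    print(bpp, backdrop_transfers(bpp), slots_left(bpp))
# 1 40 360
# 4 160 240
# 8 320 80
```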

As with the Commodore-Amiga computers, the deeper your color depth, the more drag will exist on the CPU.

Adding Sprites

I think it's prudent to allow for at least two sprites even when the video bandwidth is maximally consumed. These sprites would be used for a mouse pointer and a text cursor, respectively. Let's further simplify the problem and say these are monochromatic sprites, just like the TMS9918A VDP and VIC-II have (when sprites are not in multicolor-mode).

Assuming we have 80 transfers left on a horizontal line, and assuming our sprites are 16-pixels wide, and we have two sprites on the same line presently, it follows that fetching the video data for these sprites will require two additional transfer slots. Our worst-case budget is now down to 78 transfers.

Of course, to even decide whether or not a sprite is visible on this scanline, the CGIA would need to fetch a set of sprite Y coordinates from memory. This activity would also consume two transfer slots, thus leaving us at 76 transfers left.

We need to know where on the line the sprite appears, and its preferred color. To accomplish this, we need two additional transfers per sprite, leaving us with 72 transfers left.

Or, to put it a different way, each sprite that is visible on a scanline would require four transfers: one for the Y coordinate, one for the X coordinate, one for the color to show the sprite in, and one for the sprite's raster data. With a worst-case budget of 80 transfer slots available, it follows that we can actually allow up to 20 sprites on a single horizontal line at once.

Obviously, if we don't drive the display with a 256-color screen, that leaves a whole lot more time slots available. For example, with a 16-color backdrop (4 bits per pixel, or four times the monochrome fetch), we have a budget of 400 - 4*40 = 240 transfers left over for sprite handling. At four transfers per sprite, that's enough room to show up to 60 sprites on a single scanline.
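The sprite limits quoted above follow directly from the leftover budget. A quick sketch, using a made-up helper name and assuming 16-pixel-wide monochrome sprites at four transfers each (Y, X, color, raster data):

```python
TRANSFERS_PER_SCANLINE = 400
TRANSFERS_PER_SPRITE = 4       # Y coordinate, X coordinate, color, raster data


def max_sprites_on_line(bits_per_pixel):
    """Upper bound on sprites per scanline if sprites get every leftover slot."""
    backdrop = 640 * bits_per_pixel // 16
    leftover = TRANSFERS_PER_SCANLINE - backdrop
    return leftover // TRANSFERS_PER_SPRITE


print(max_sprites_on_line(8))  # 20
print(max_sprites_on_line(4))  # 60
```

Note that this bound leaves nothing for the CPU on that scanline, so a practical limit would be lower.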

Coordinating Bus Accesses

The CGIA fetches data in bursts. For the backdrop, it fetches an entire line's worth of data in a single request for the bus. The length of this transfer is configurable; let's say N half-words for now. After reading N half-words, it stores its findings in an internal line buffer.

Sprites would have to work the same way. The VDP works by interleaving different kinds of transfers. This will not work with the CGIA; interleaving individual memory accesses will require very sophisticated state machines. Treating each phase of the display as a separate bus master is much, much easier.

After the background fetch, the next step is sprite pre-processing. This phase involves reading M half-words from memory, each corresponding to a sprite's Y coordinate on the display. Like N, M would also be configurable by the user. To turn off sprite processing entirely, one would simply set M to zero. For each sprite whose Y coordinate potentially makes that sprite visible on this scanline, we queue the sprite number for later processing.

So far, we've used N+M time slots. If no sprites are visible on this scanline, then that's all the time we actually use. Otherwise, for each sprite that sits in the queue, we need to now load the sprite data, color, and horizontal position registers. This step is not under programmer control, since the number of cycles consumed in this phase will be determined by how many visible sprites were discovered above.
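The two sprite phases described above might look something like this in software terms. This is a rough model, not the CGIA's actual logic: the function names are hypothetical, and the sprite height of 16 lines is my assumption, since nothing above fixes it.

```python
SPRITE_HEIGHT = 16  # assumption: sprite height is not specified in this log


def queue_visible_sprites(sprite_y, scanline, height=SPRITE_HEIGHT):
    """Pre-processing phase: scan M Y coordinates, queue the sprites that hit."""
    queue = []
    for number, y in enumerate(sprite_y):  # one transfer per Y coordinate
        if y <= scanline < y + height:
            queue.append(number)
    return queue


def second_phase_transfers(queue):
    """Each queued sprite costs three more transfers: data, color, X position."""
    return 3 * len(queue)


sprite_y = [0, 10, 100, 20]    # M = 4 sprites configured
queue = queue_visible_sprites(sprite_y, scanline=12)
print(queue)                          # [0, 1]
print(second_phase_transfers(queue))  # 6
```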

If V is the number of sprites found visible on the scanline, with 0 <= V <= M, then the total number of time slots taken is N+M+3*V.
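As a sanity check, the formula is easy to turn into a budget test. The helper names below are illustrative only; N, M, and V are as defined above.

```python
def slots_used(n, m, v):
    """Time slots consumed on one scanline: backdrop + Y scan + visible sprites."""
    if not 0 <= v <= m:
        raise ValueError("visible sprites V must satisfy 0 <= V <= M")
    return n + m + 3 * v


def fits_in_scanline(n, m, v, budget=400):
    """True if the video fetches fit within one scanline's transfer slots."""
    return slots_used(n, m, v) <= budget


print(slots_used(320, 2, 2))          # 328
print(fits_in_scanline(320, 20, 20))  # True
print(fits_in_scanline(320, 21, 21))  # False
```

The middle case is the 256-color worst case with all 20 sprites visible, which lands exactly on the 400-slot limit.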

Conclusion

I've shown how one computes time budgets for video display subsystems. This knowledge gives the programmer the know-how to decide ideal color depth, resolution (although I didn't explain it), and number of sprites on the screen at once. As long as N+M+3*V <= 400, the CGIA should have no problems displaying your images.

Support for pattern graphics is not considered. The proper way of supporting pattern graphics without a lot of undue overhead remains elusive. Maybe after a good night's rest, I'll come up with some ideas. Otherwise, if worse comes to worst, there's always the VDP approach to doing things. :)
