Tile loading is the heart of the demo & a piece of diabolical assembly language. Made some diagrams to try to predict the performance of different algorithms.
A 6 tile cache has to choose between loading 2 extra vertical or 2 extra horizontal tiles, based on the player's heading. The player can still turn 180 without stuttering.
If the player turns left in this moment, it'll stutter as it loads 2 horizontal tiles.
A 9 tile cache is the minimum for stutter free scrolling in all directions. If scrolling is limited to 5fps, it'll have 4 seconds to load 2 new tiles for X movement & 2 seconds to load 2 new tiles for Y movement. It still has to prioritize loading order based on heading.
There could be a way to RLE compress each row independently. 9 tiles would be stored compressed in roughly 9000 bytes. The center 25 rows & 120 columns would be decompressed into 7500 bytes to allow fast X movement. Y movement would decompress 1 row at a time.
It would require a 25 byte index in each tile, containing the compressed size of each row. There is a memory fragmentation problem from using variable sized tiles. Each tile would have to be allocated the size of the largest tile, probably 1000 bytes. Concessions will always have to be made in the artwork to achieve reasonable compression. That gives 16500 for the world map, 20000 for the bitmaps, 29000 bytes free.
The original idea of always having 6 uncompressed tiles would burn 15000 bytes, be prone to stuttering, & couldn't do diagonal movement.
If it had 9 uncompressed tiles, it would burn 22500 bytes & leave 23000 bytes free. 9 seems to be the minimum for stutter free movement in all directions. The on the fly decompression would be so complex, it's worth knowing how much free memory is really needed.
To free up 10000 bytes, the screen could be shrunk to 38x23 to make single buffer scrolling less garish.
---------------------------------------------------------------------------------------------------------------------------------------------------
Like young lion, division & multiplication operations in 6502 assembly have proven daunting. There are easily searchable solutions, the basic ROM has math functions, but it's still more memory efficient to use loop subtracting & loop adding.
RLE decompression is definitely a buster in assembly language. Young lion never heard of it or envisioned using it. He did know some kind of encoding was used to compress redundancy in newer PCX files. The speed requirements definitely require it. Multicolor bitmap mode was not in his plans either. Young lion would have needed a higher level language for the RLE compression, not that any of the world creation tools were attainable.
In a modern development environment printing all the output to the console, the commodore looks like an ordinary UNIX box. That's the same kind of output the GEOS developers would have seen 40 years ago.
It prints out the [tile number]-[assigned buffer] in the 9 tile cache positions. After much debugging, it was able to load the 9 tiles of the cache in 15.5 seconds or 1.7 seconds per tile with no concurrent scrolling. Not quite fast enough to scroll vertically at 5fps. Maximum vertical speed would have to be 3.6fps & horizontal speed would be 5.8fps.
It takes a lot of instructions to load tiles into shadow RAM above $d000. It has to disable interrupts & write the port register to write every byte, then re-enable interrupts & write the port register to read the next byte from the I/O addresses above $d000. It might be faster to load a complete sector into a 256 byte scratch area before decompressing it. There's a 1528 byte gap above $c000, imposed by the VIC-II bank boundary.
Discussions
Become a Hackaday.io Member
Create an account to leave a comment. Already have an account? Log In.