Project | Unlimited tile world on a Commodore 64


Concessions for a space fighter game
09/24/2024 at 06:00

Seem to recall the sprite API being young lion's least favorite thing to program & the character set API being his favorite, 40 years ago, even if every effort was just scrap. Sprites just don't leave much room to be hacked, other than multiplexing. Sprite multiplexing was actually rarely used in the best-looking games.

Another game demo used a scrolling character background purely as eye candy.

Character sets had all the diabolical algorithms & were manageable from BASIC. Despite hardware acceleration being seen as increasing over the last 40 years, the C64 stayed more hardware accelerated than PCs for many years afterward. Stuff was pretty much dictated by the hardware APIs & sprites were about the most managed API. It was like the Apollo program before taking a big step back.

Multicolor sprites actually shared 2 global colors. It's surprising young lion found this more appealing than CGA programming, but CGA only had 4 colors & he just didn't have an 8086 assembler or documentation. At the time, the lack of tools registered less than the lack of colors & the lack of hardware acceleration. By age 20, young lion began appreciating software defined graphics more than the hardware accelerated graphics of years earlier because of portability.

The 8-Bit Guy had a nugget about retro computing clubs. Maybe there's a retro computing club which could run these demos on period hardware, but the Bay Area has become so focused on just getting rich that no one does this stuff anymore.

---------------------------------------------------------------------------------------------------------------------------------------------------

It came to pass that a game would have to scrap some of the scrolling demo. To solve the flickering, the $d800 color memory would have to be static. It didn't provide that many more colors anyway. To draw a score panel, the bitmap needs to shrink to 24 rows. Keeping the score panel's raster interrupt on time while accessing the disk would require checking the raster line ($d012) before trying any disk I/O. This would further slow down the scrolling.
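A minimal Python model of that $d012 check, for illustration only: it assumes PAL timing (63 cycles per raster line, 312 lines per frame), & the IRQ line, transfer cost & margin are hypothetical parameters rather than anything measured in the demo. On the real machine the 9th bit of the raster line lives in $d011 bit 7, & badlines & sprite DMA steal cycles, so the real budget would be smaller than this estimate.

```python
# Model of "check the raster line before disk I/O", assuming PAL timing.
CYCLES_PER_LINE = 63   # PAL C64
LINES_PER_FRAME = 312  # PAL C64


def lines_until_irq(current_line: int, irq_line: int) -> int:
    """Raster lines left before the beam reaches irq_line, wrapping at frame end."""
    return (irq_line - current_line) % LINES_PER_FRAME


def transfer_fits(current_line: int, irq_line: int, transfer_cycles: int,
                  margin_lines: int = 2) -> bool:
    """True if a transfer costing transfer_cycles should finish before the IRQ.

    margin_lines is a crude allowance for the time spent testing the raster
    line; it does not model badlines or sprite DMA.
    """
    available = lines_until_irq(current_line, irq_line) - margin_lines
    return available > 0 and available * CYCLES_PER_LINE >= transfer_cycles


# Example that wraps past the end of the frame.
print(lines_until_irq(250, 10), transfer_fits(250, 10, 60 * CYCLES_PER_LINE))  # 72 True
```

If the check fails, the disk code would just wait for the IRQ to pass before starting the next transfer, which is where the extra slowdown comes from.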

Without the color memory, RLE compression only buys 500 bytes or 2 sectors. It's probably not worth it.

A maximally enjoyable game would have to go back to character set graphics to speed up the scrolling & offer some sort of obstacle course. It couldn't justify the disk loading algorithm anymore. It's not necessary yet.

Because of the page flipping, the sprites have to be copied to 2 VIC banks. Ideally they would use the same pointer values in both VIC banks, but the pointers that reach $5000-$6000 in one bank reach $9000-$a000 in the other & the VIC-II sees the character ROM at $9000. That leaves $4e80-$5000 & $8e80-$9000 for 6 sprites. Starting at $4e80 cuts another 3452 bytes from the program memory.

Another 3 sprites can fit in $7f40-$8000 & $bf40-$c000. That way it can show all 8 sprites with 1 more sprite as a temporary for copying in new data.
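A quick way to sanity check those addresses, sketched in Python: map each sprite address to its VIC bank & sprite pointer, & reject the offsets where the VIC-II sees character ROM. The function name is made up for this sketch. Both ranges above come out with identical pointers in the 2 banks, which is what lets the page flip leave the sprite pointers alone.

```python
# Which VIC-II bank & sprite pointer does a sprite address map to?
BANK_SIZE = 0x4000
SPRITE_SIZE = 64


def sprite_pointer(addr: int) -> tuple[int, int]:
    """Return (vic_bank, pointer) for a 64-byte aligned sprite address.

    In banks 0 & 2 the VIC-II sees the character ROM at offsets $1000-$1fff
    (CPU addresses $1000-$1fff & $9000-$9fff), so sprites can't live there.
    """
    if addr % SPRITE_SIZE:
        raise ValueError("sprite data must be 64-byte aligned")
    bank, offset = divmod(addr, BANK_SIZE)
    if bank in (0, 2) and 0x1000 <= offset < 0x2000:
        raise ValueError(f"${addr:04x}: the VIC-II sees character ROM here")
    return bank, offset // SPRITE_SIZE


for lo, hi in ((0x4E80, 0x5000), (0x7F40, 0x8000)):
    for addr in range(lo, hi, SPRITE_SIZE):
        # The copy in the other bank is exactly 1 bank ($4000) higher.
        assert sprite_pointer(addr)[1] == sprite_pointer(addr + BANK_SIZE)[1]
print([sprite_pointer(a)[1] for a in range(0x4E80, 0x5000, SPRITE_SIZE)])  # [58, 59, 60, 61, 62, 63]
```

The $7f40-$8000 range gives pointers 253-255, so the 6 + 3 slots use pointers 58-63 & 253-255 in both banks.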

Things would be better if the 2 tile buffers could be fragmented. They could be fragmented by looking up the high bytes in a table instead of doing a straight 16 bit increment, but that increment is done in the big unrolled loops, so it would be super slow.

Storing the screen memory separately from the bitmap memory would free up another 1000 bytes for the program, but so far is unnecessary.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

The difficulty of basing a space fighter game on the scrolling demo led to an idea to break out the space fighter game into a new program, as did the reality that any real demo using this system wouldn't work on 99% of the 1541 emulators & would quickly wear out any real floppy disks. This new program would just be a tried & true sprite demo with no background, just the scoreboard. Enemies would fly from right to left. The player would be stationary. The investment would be low enough to be worth doing & it could be a recaptured experience in sprite programming. If that worked, a 2nd step could be a simple character based scroller for the background/obstacle course. Old lion concluded what young lion found: the C64 didn't have enough horsepower to use its own bitmap mode most of the time.

Sprite concepts
09/09/2024 at 01:58

Sprites had 24x21 resolution. Because multicolor pixels are double width, 2 multicolor sprites side by side still only give 24x21 pixels. A flame costs 2 more sprites, but isn't always on. Memory accounting is a difficult problem. In both bitmap banks, the sprite pointers can access 7168 + 192 bytes. It takes more memory but is simpler to have a copy of every possible sprite in both banks. It's a lot more efficient but more complicated to memcpy the sprites to static pointers in realtime. Disk I/O would pay for the memcpy. The memcpy option could perform most rotations at runtime. The dynamic pointer option requires every possible rotation to be stored twice in RAM, so it almost has to use memcpy anyway. It needs 16 sprites or 1024 bytes in each bank since they all have to be page flipped.

Rough concepts were pretty bad. They're kind of intelligible when zoomed out. The journey began with screencapping the website animation with rotations. It was decided not to reverse the lighting when the ship was upside down, to save memory & because it didn't add to the intelligibility. It is just a matter of mirroring & rotating the 4 base images to get flat spins, landing burns & strafing. It also needs RUDs. This stuff is very labor intensive for the result. It would be easier to not shade it at all & not roll it. A low fidelity version might be in order for starting.
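The mirroring half of that is mechanical enough to sketch. This Python version assumes the standard 63 byte sprite layout (21 rows of 3 bytes) & is not the project's tool. The catch is multicolor mode: pixels are 2 bit pairs, so they have to be reversed as pairs or the colors swap.

```python
# Mirror C64 sprite data: 63 bytes = 21 rows of 3 bytes (24 bits) each.
ROWS = 21


def _unpack(data: bytes) -> list[int]:
    if len(data) != ROWS * 3:
        raise ValueError("a sprite is 63 bytes")
    return [int.from_bytes(data[r * 3:r * 3 + 3], "big") for r in range(ROWS)]


def _pack(rows: list[int]) -> bytes:
    return b"".join(row.to_bytes(3, "big") for row in rows)


def mirror_x(data: bytes, multicolor: bool = False) -> bytes:
    """Flip left/right. Multicolor pixels are 2-bit pairs & must stay intact,
    otherwise every %01 pixel would turn into %10 & change color."""
    bpp = 2 if multicolor else 1
    mask = (1 << bpp) - 1
    out = []
    for row in _unpack(data):
        flipped = 0
        for _ in range(24 // bpp):
            flipped = (flipped << bpp) | (row & mask)  # take the rightmost pixel
            row >>= bpp
        out.append(flipped)
    return _pack(out)


def mirror_y(data: bytes) -> bytes:
    """Flip top/bottom by reversing the row order."""
    return _pack(_unpack(data)[::-1])
```

A 180 degree roll is `mirror_x(mirror_y(s))`. 90 degree rotations don't fall out of this, since 24x21 isn't square & multicolor pixels are twice as wide as they are tall, so those frames would still need drawing.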

Young lion was similarly surprised by how bad sprites really looked in 1985.

A lot of the delusion might have come from assuming the C64 ports of these games were faithful to the arcade versions, but they might have also looked a lot better on a CRT. Young lion also had some idea game programmers were working within budget constraints, so they couldn't max out the machine. Aligning players horizontally was particularly crippling. There's no way to multiplex sprites sitting side by side on the same raster lines.

Some fake CRT blurring made it slightly more bearable.

The last time a ship flame was photographed was many years ago. The flame needs to animate faster than the ship. It's easier to use 2 double height sprites than a single double width. The only times it can realistically have a flame are the landing flip & the flat spin.

The big problem is developing a sprite engine. Its mane feature is a table of the desired image address, transformation, x, y, visibility & colors of each sprite. It needs to compare a table of new sprite states with a table of current sprite states, copy the desired image data into the 2 banks outside the raster interrupt, copy the new state table to the current state table inside the raster interrupt, then use the current state table to update the VIC-II registers inside every raster IRQ.
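A rough Python model of that engine, with hypothetical names throughout (`SpriteState`, `SpriteEngine`, `prepare`, `irq`). It keeps the split described above: the main loop copies changed images into the sprite slots of both banks, & the raster IRQ only swaps tables & writes registers. Bank memory & the VIC-II registers are plain Python containers here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpriteState:
    image: int = -1      # hypothetical index into a list of sprite images
    x: int = 0           # 9-bit X position
    y: int = 0
    visible: bool = False
    color: int = 1


class SpriteEngine:
    """Model of the desired/current state tables, not real hardware access."""

    def __init__(self, images: list[bytes], count: int = 8):
        self.images = images
        self.current = [SpriteState() for _ in range(count)]
        self.desired = list(self.current)
        self.banks = [[bytes(63)] * count for _ in range(2)]  # slot data per bank
        self.registers: dict[str, int] = {}
        self.ready = False
        self.copies = 0

    def prepare(self) -> None:
        """Main loop, outside the IRQ: copy changed images into both banks."""
        for n, (new, old) in enumerate(zip(self.desired, self.current)):
            if new.image != old.image and new.image >= 0:
                for bank in self.banks:  # page flipping: both banks need a copy
                    bank[n] = self.images[new.image]
                self.copies += 1
        self.ready = True

    def irq(self) -> None:
        """Raster IRQ: commit a prepared table, then write the registers."""
        if self.ready:
            self.current = list(self.desired)
            self.ready = False
        enable = x_msb = 0
        for n, s in enumerate(self.current):
            self.registers[f"x{n}"] = s.x & 0xFF
            self.registers[f"y{n}"] = s.y & 0xFF
            self.registers[f"color{n}"] = s.color
            enable |= int(s.visible) << n
            x_msb |= ((s.x >> 8) & 1) << n     # $d010 holds bit 8 of each X
        self.registers["enable"] = enable       # $d015
        self.registers["x_msb"] = x_msb
```

One simplification: `prepare` overwrites the slot that is currently on screen. A real engine would stage the new image in the spare 9th sprite slot first, so a half copied image never gets displayed.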

A general purpose sprite engine was another snag encountered by 10-year-old lion. In this case, there's no multiplexing. It's a memory management & queueing problem. As much as young lion believed multiplexing was essential to making any game look good, it actually would never have been practical for his ideas.

On top of the sprite engine, it needs a higher state table for the animation frames. If only real starships didn't do so many flips.

Cinelerra can screencap an emulator & apply blurring & bobbing in realtime at 60fps. It definitely improves the photo rendition. Bobbing is imperceptible unless it's 4x & the colors are high contrast. The memory of the actual blurriness of a TV is lost. There is an intermittent shearing problem with the bobbing. Unfortunately, trying to debug the bobbing effect with a flashing white image damaged the monitor. It now has permanent flashing lines.

The problem isn't temporal. Bobbing works properly when all the rows are a repeating pattern. It's a spatial problem when the rows are unique.

Enhancing the bitmap conversion
08/20/2024 at 18:57

There were some attempts to make the world map more legible & wider. Grey scale looked the best of all methods but still terrible. Shades of red were worse. There could be pseudo colors made of checkerboard patterns. A checkerboard pattern can compress if every other byte is a ROL of the byte before it.
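The rotation trick works because the 8 bytes of each bitmap cell are consecutive raster lines, so a hires checkerboard is just $aa, $55, $aa, ... for as long as the run lasts. In multicolor mode the pixels are 2 bits wide, so the rotation is 2 bits ($66, $99, ...). A sketch of a decoder for such a run, assuming a hypothetical RLE code for it:

```python
def rotl8(value: int, bits: int = 1) -> int:
    """8-bit circular rotate left. On the 6502, 1 bit is CMP #$80 (copies
    bit 7 into carry) followed by ROL A."""
    return ((value << bits) | (value >> (8 - bits))) & 0xFF


def expand_rotating_run(first: int, count: int, bits: int = 1) -> bytes:
    """Decode a hypothetical RLE code meaning "count bytes, each the previous
    byte rotated left by `bits`"."""
    out, value = bytearray(), first
    for _ in range(count):
        out.append(value)
        value = rotl8(value, bits)
    return bytes(out)


print(expand_rotating_run(0xAA, 4).hex())  # aa55aa55
```

A plain ROL through carry would not do it on its own: starting from $aa with carry clear gives $54, so the carry has to be primed from bit 7 each time.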

It's pretty hard to make the world bigger in 1 direction. It seems that if the tile buffers underrun, it currently just crashes. It tries to start a new tile read every time it scrolls 40 columns, but there's no way to interrupt a disk read already in progress, so it crashes.

A downtown map was similarly bad. Saturating the image helps, but it seems either the palette is too limited or the color selection is just horrible.

The image conversion is the biggest area which needs improvement. A problem with the nearest color algorithm could be making grey look better than red.

For a Mars based game, another idea is monochrome bitmap mode, using dithering to get 3 shades of red. The starship would be shades of grey. Helas, that looked really ugly in The Great Escape.

Some guys used genetic algorithms to improve the color matching. https://www.syntiac.com/tech_ga_c64.html

For a static Mars landscape with shades of red, manual palette matching would be best. If the palette is just 4 shades of red, color memory can be static. The Commodore had 5 shades of red including black & 5 shades of grey including black & white. Static color memory could define only 4 colors & would only save 500 bytes. 5 colors could be defined in 500 bytes of color memory if 3 of the colors were fixed. The non reusable aspect, the limited gain & the loss of color substitutions make the full 1500 byte color memory preferable. RLE compression would do most of the optimizing. Experiments would have to be done with 5 & 4 shades of red.

Compared the hardware palette with a custom palette & random dithering. The custom palette gave the most intelligible results by having less contrast. Random dithering is required for anything bearable. The softer pixels of a real CRT might help.

The compression was still 8% in the worst case & 39% for the manely blank tiles.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

There was another idea where this type of passive scrolling background could serve a purpose as a timing indicator for a simple racing game. At certain points in the map, enemies would appear & the player would need to take certain actions. There would be a finish line indicated by a landing pad. For now, dodging the enemies & doing a landing flip at precisely the right time would be the key maneuvers. The enemies would try to push you off the map or ram you. You'd have a flat spin & landing flip to knock down enemies, sideways movement, drifting, & a final landing flip. Knockdowns would shake the screen.

Vertical scrolling would have more sideways range & look more intuitive. It would still be slower. Lions dream of a liftoff level & a landing level.

It's a tantalizing idea, but lions are just no good at creating game graphics. Converting a photo to a bitmap might be attainable, but player art was as much a disaster 40 years ago as now. There's no AI for Commodore player graphics. Getting enough colors would limit each player to single sprites. Most games got away with single sprite players.

A spitter post had some low fidelity player graphics.

https://x.com/Flight5starship/status/1830332428206108775/photo/1

That's about all the detail it could have. It would take 4 multicolor sprites in 2x height mode to define the player & flame. Another 4 would define an enemy & flame. 2x width mode didn't have useful resolution in multicolor mode. That leaves just 1 enemy at a time. It would be a kind of fight game like Street Fighter, but with spaceships. Space fighter.

Smooth, shear-free scrolling is the only reason it needs 20,000 bytes for onscreen bitmaps & all the interrupt disables. It could fit a pretty big 4 tile world in RAM with lower quality scrolling.

---------------------------------------------------------------------------------------------------------------------------
Useless side scrolling demo
08/08/2024 at 17:55

As a pure useless demo, lions still think a hard coded, single direction scrolling bitmap with concurrent disk reads feeding the scrolling would be damn impressive looking. Don't think it can ever be detailed or fast enough for any kind of game, but it could be an interesting visual if the detail was maxed out & it moved at the maximum IO speed. Thinking the demo would pan over Valles Marineris. It would need tiles taller than 25 rows & a script specifying all the tile locations & scrolling keyframes to support diagonal scrolling. For a 2D race track game, it could work.

In reality, it would have worn down a real disk pretty fast. Nowadays, 5.25" disks are irreplaceable so it might not be very practical except as a thought experiment. The drive head & worm gear could probably last forever. It could store the map in multiple locations to level the wear.

Some example photo conversions from https://www.digartroks.be/img2c64mc/index.php show what it would look like. The undithered version seems more playable in a game since the players show up more. Not sure what the yellow on the canyon floor is. The converter almost needs some kind of color balance sliders to try to aid the color mapping.

A simple map was compressed into 5 tiles, each 9500 byte tile compressing down to 7526 bytes in the worst case. A test load of the worst case took 13 seconds, or 3 columns/sec maximum scrolling speed.

Still somewhat readable with the colors converted to commodore.

As far as contiguous memory for offscreen bitmap tiles, there are 9500 byte blocks at $dae0 & $35e0, which leaves 11744 bytes at $0800 for the program & a few bits elsewhere.

Much hacking yielded a 3fps, 1 cell at a time scroller that concurrently read tiles from disk. Definitely potato cam in the emulator. This was 3k of program memory & 29440 bytes of world memory on disk to store 5 tiles. Another 20 tiles could probably fit. It doesn't support underruns but hard codes ping pong buffer swaps 3 times per second. The tricky part is getting it started, where the disk needs to temporarily get 2 buffers ahead & then fall back to 1 buffer of lead time.
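The startup dance can be written down as a schedule. This is a sketch under stated assumptions, not the demo's code: one way scrolling, 2 tile buffers, & a screen that keeps everything already drawn, so a buffer frees up once its tile has been drawn or fully scrolled in. Step names are made up.

```python
def ping_pong_schedule(num_tiles: int) -> list[tuple[str, int, int]]:
    """(step, tile, buffer) for a one way scroller with 2 tile buffers."""
    if num_tiles < 2:
        raise ValueError("needs at least 2 tiles")
    # Startup: 2 buffers of lead, then the first screen is drawn from tile 0.
    steps = [("load", 0, 0), ("load", 1, 1), ("draw_screen", 0, 0)]
    for tile in range(1, num_tiles):
        buf = tile % 2
        if tile + 1 < num_tiles:
            # Steady state, 1 buffer of lead: while `tile` scrolls in, refill
            # the other buffer, freed when tile - 1 was drawn or scrolled in.
            steps.append(("load", tile + 1, 1 - buf))
        steps.append(("scroll_in", tile, buf))
    return steps
```

The only irregular part is the first 3 steps; after that, every load targets the buffer that was just consumed, which is the hard coded swap.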

It gets within 2 frames of underrunning the largest tile. In reality, the disk speed was quite non-deterministic. The seeks were fixed stepper motor times, but the sector reads weren't guaranteed. The problem is much easier when scrolling always goes 1 way at a fixed speed.

Making smooth scrolling would be the tricky bit. It needs to scroll a single pixel in an interrupt handler. Every 8 pixels, it needs to switch VIC banks in the interrupt handler, then make the mane loop scroll the color RAM 1 cell & then scroll the next bitmap & screen RAM 1 cell. Since there's no VIC bank for color RAM, the color RAM scroll for the current position needs to be done after switching banks while the bitmap & screen RAM for the next position is done before switching banks. There's an unavoidable color glitch when switching banks which is still visible on the gootubes.
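Frame by frame, that 8 pixel cycle looks roughly like this. A sketch assuming 1 pixel per frame of leftward scrolling & the fine scroll being the low 3 bits of $d016; the job strings are just labels for what the main loop does next.

```python
def scroll_schedule(frames: int) -> list[tuple[int, int, str]]:
    """(fine_scroll, visible_bank, main_loop_job) for each frame."""
    schedule = []
    fine, bank = 7, 0
    for frame in range(frames):
        job = ""
        if frame == 0:
            job = "shift hidden bank 1 cell"
        elif fine == 0:
            # Raster IRQ: reset the fine scroll & flip to the bank the main
            # loop already shifted 1 cell, in the same interrupt.
            fine, bank = 7, 1 - bank
            # Color RAM has no second copy, so it can only be shifted now,
            # after the flip; this is where the color glitch comes from.
            job = "shift color RAM 1 cell, then shift hidden bank 1 cell"
        else:
            fine -= 1  # raster IRQ: move 1 pixel via the low 3 bits of $d016
        schedule.append((fine, bank, job))
    return schedule
```

The main loop has 7 quiet frames to prepare the hidden bank, but the color RAM shift is squeezed into the frame right after the flip.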

A few hours later, it was smooth scrolling. The mane problem with smooth scrolling was page flipping inside the raster interrupt. All the disk I/O uses the same CIA register $dd00 as the VIC bank switching, so all the disk I/O needed to selectively disable interrupts to be concurrent. The glitches from copying color RAM were definitely pronounced. Confirmed the glitches were not from debug prints.
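Why sharing $dd00 hurts, as a small Python model: bits 0-1 of CIA 2 port A select the VIC bank (inverted, so %11 is bank 0) while bits 3-5 drive ATN, CLK & DATA. Every change is a read-modify-write, & if the raster IRQ flips banks between the main loop's read & write, the stale write undoes the flip. Function names are hypothetical.

```python
# $dd00 (CIA 2 port A): bits 0-1 select the VIC bank (inverted: %11 = bank 0),
# bit 2 is RS-232 TXD, bits 3-5 are ATN/CLK/DATA out, bits 6-7 CLK/DATA in.
VIC_BANK_MASK = 0x03
CLK_OUT = 0x10


def with_vic_bank(dd00: int, bank: int) -> int:
    """Select VIC bank 0-3 ($0000/$4000/$8000/$c000), keeping the bus bits."""
    return (dd00 & ~VIC_BANK_MASK & 0xFF) | (3 - bank)


def with_clk(dd00: int, asserted: bool) -> int:
    """Set or clear CLK out, keeping the VIC bank bits."""
    return dd00 | CLK_OUT if asserted else dd00 & ~CLK_OUT & 0xFF


def racy_clk_write(dd00: int) -> tuple[int, int]:
    """Return (value after IRQ, value after main loop) for a raster IRQ that
    lands inside the main loop's read-modify-write of $dd00."""
    stale = dd00                          # main loop: LDA $dd00
    after_irq = with_vic_bank(dd00, 2)    # IRQ fires & flips to bank 2
    after_main = with_clk(stale, True)    # main loop: ORA #$10, STA $dd00
    return after_irq, after_main


print(racy_clk_write(with_vic_bank(0x00, 1)))  # (1, 18)
```

The IRQ selected bank 2 (%01), but the stale write put back bank 1 (%10). Disabling interrupts around the disk code's read-modify-write closes that window, at the cost of delaying the raster IRQ.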

The mane disappointment now is the world map looking like a turd & being only 5 tiles. Only 3 tiles actually have to be spooled from disk. Text would be the most visible art but wouldn't justify bitmap mode.

It is pretty much what young lion imagined building a game around. It took lightyears more effort & tools than he could have ever mustered. Not sure how exciting a Jupiter Lander it would be without some opponents. It would be too slow for the world map alone to be an obstacle course.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Any tiled game idea was far fetched at best.

The mane problem is the faster it scrolls, the less detail it can load. The more detail it loads, the slower it has to scroll. There is no happy outcome.

It needs to be limited to scrolling 1 way horizontally & have only 1 offscreen tile. This would allow it to read 40 columns ahead & have at least 5 seconds to load a new tile at 8fps. The tiles could be expanded to 35 rows to allow some vertical or diagonal movement, but it would have to be a rail shooter. Many games were horizontal side scrollers but none had unlimited tiles. A starship Jupiter Lander with a Mars canyon comes to mind. To make up for the low speed, it could have opponents, targets on the ground, or a race course inside the background image.

A reasonable amount of detail would probably need full bitmaps so you'd need the 2 page flips (20,000 bytes) + 1 offscreen bitmap (10,000 bytes) for 25 rows. 35 rows would require 2 offscreen bitmaps & not fit. It would have to store just the 10 hidden rows of the visible tile & swap them with visible rows. That would need 18000 offscreen + 20000 onscreen bytes.

Compression remanes a real problem. For a minimum viable game speed, 2560 bytes of IO are likely the limit for each tile, or 1/4 of a visible bitmap. Large areas of flat color, like a canyon floor or a sky, could be a big win. The next step might be doubling the pixel height or scaling individual cells. There's not much on the internet about compressing bitmaps. The C64 was not fast enough for any kind of JPEG decompression.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

After some more ideas, there was still no way a 2D scrolling sandbox would have been possible with a useful amount of detail or speed. The Macross demo showed moving around such a low detail world with the keyboard was pretty unbearable under 5 columns per second. It wouldn't be practical for a flying game.

A vector line art system would probably rely on a character set, but with line segments procedurally drawn 1 cell at a time & filled areas procedurally filled 1 cell at a time. It would escalate in computational complexity about as much as it de-escalated in IO & probably be equally slow. It wouldn't be able to store more detail.

A sandbox world would have to move 1 page at a time. Art would have to be aligned to the tiles. There could be a progress bar. Each page could then have its own character set & more detail. It's pretty much what all sandbox games already did. Not sure what benefit it would have. A bitmap defined by a unique character set is 4600 bytes. It's kind of a novel design which combines the compression of a character set with the colors of a bitmap.
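The character set half of that idea is a deduplication pass. A sketch assuming the standard C64 bitmap layout (each 8x8 cell is 8 consecutive bytes, cells left to right, top to bottom); colors are left out, & `to_charset` is a made up name.

```python
def to_charset(bitmap: bytes, cols: int = 40, rows: int = 25) -> tuple[list[bytes], list[int]]:
    """Split a bitmap into a character set of unique 8-byte cells plus a
    screen of character indices."""
    charset: dict[bytes, int] = {}   # glyph -> character code, insertion order
    screen = []
    for cell in range(cols * rows):
        glyph = bitmap[cell * 8:cell * 8 + 8]
        if glyph not in charset:
            charset[glyph] = len(charset)
        screen.append(charset[glyph])
    if len(charset) > 256:
        raise ValueError(f"{len(charset)} unique cells; a character set holds 256")
    return list(charset), screen
```

The 256 limit is where a page's art budget runs out, so anything that makes cells repeat (aligning art to cells, fewer dither patterns) directly buys detail.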

An object oriented world might work. The tiles would be much larger but defined in much less memory. An SDF-1 object, carrier object, building object, tree object could be memory resident. Instead of memory resident tiles, it would render the screen on demand. Tile memory would go to object storage.

Buffering 9 tiles is probably useless. It has to be constantly loading from disk & the player has to be moving across those tiles instantly. Blue Max & Zaxxon obviously used object oriented worlds. Young lion didn't appreciate what a triumph in linear world design Blue Max was.

A Robotech game like what was envisioned would have to be made like Blue Max: a rail shooter from the carrier horizontally across key points on Macross island, all memory resident. Another alternative would be an FPV flier like Buck Rogers but with more objects. The carrier takeoff from the TV show could be replicated in high fidelity & be the basis of the game. It definitely wasn't what 10-year-old lion envisioned.

Tile rendering
08/06/2024 at 08:22

Tile loading was always seen as the hardest step. 5 algorithms are required for just the 4 cartesian scrolling directions & a full page refresh. Diagonal scrolling would probably not fit.

The mane trick with debugging tile rendering was making test pattern worlds.

Much diabolical assembly yielded the 1st B&W screen draws, properly spanning 4 tiles. The tile cache & all the gaps required to align the VIC buffers left 18991 contiguous bytes for the program & just the B&W drawing engine took 6012 bytes. There are actually 2168 & 504 byte gaps between the VIC buffers. Some subroutines & tables could be loaded there. Those would probably be direct sector reads with the fastloader. It's proven more efficient to use the disk as raw storage instead of bothering with a filesystem.

Helas, a bug in makeworld artificially reduced the character set. The real world needed more like 105 characters instead of 55.

SDF-1 started materializing. Tempting to go with B&W. Decompressing color is a big job, but color is what young lion envisioned. The big challenges are reusing characters by remapping colors & unpacking colors. Color encoding was solved failure case by failure case.

The SDF-1 was finally revealed.

Sadly, because the I/O is dictating an extremely slow scrolling speed & the screen drawing is going pretty fast, it might make sense to do full screen redraws for all the scrolling instead of the border routines. The big question is where to do asynchronous disk I/O if it's always accessing the tile memory instead of copying the bitmap memory. It would need to happen between screen redraws.

For the test world, the $d800 color RAM only ended up used in 1 cell for the entire world & it would require a small tweak to work around. Updating the color RAM is going to be extremely slow & burn 4500 bytes for data that's never going to be seen. The color RAM itself only has 4 bits per byte implemented, so it's of limited use in another role. If the world had photo quality graphics or dithering, the color RAM could be used, but this would use up the character set fast & slow the IO way down. These factors make the color RAM unlikely to ever be used.

If the world had only 2 unique colors + black in each cell, it would still be ahead of character mode. Character mode only allowed 1 unique color in each cell with 3 other global colors. Greebles are 1 area where color RAM could be important. Without greebles, the way it is now, it's in the territory of vector line art. A vector line art world would need vastly less IO & probably scroll a lot faster. Instead of a tile cache, the screen or the borders would be rendered from scratch. The entire world could fit in RAM. It wouldn't be as unlimited as a disk world, but the largest world young lion envisioned would fit in RAM.

There's still a desire to use elements which couldn't be drawn with line art, like trees. Those could still be character based objects drawn on top of the line art. The line art engine would just draw lines, filled triangles, & filled rectangles. That could possibly end up slower than character tiles.

A quick demo with keyboard navigation had a lot of glitches but revealed what young lion's scrolling tile world would have really looked like. The initial reaction was it would be pretty hard to navigate through such a small pinhole. It would need a map inset or a lot more detail. It was no more than a static image a player couldn't interact with, with no moving objects. At least Raid on Bungeling Bay had an interactive, animated world.

It was much the same situation young lion ended up in, day after day. He would arrive at the gist of a game engine that wasn't heading towards anything useful & the initial impression ended any motivation to continue with it.

Fastloader bug
08/03/2024 at 21:45

Noticed the emulator showed unnecessary seeks to track 1 in the middle of reading multiple sectors from the same track. It seeks directly to the right track, reads 1 sector, then seeks to track 1, then seeks back to the right track to read a few more sectors. The ROM source code implies it's a head bumping operation caused by an error & in real life, it would be a familiar knocking.

In lion opinion, the knocking sound was not the head impacting anything but the sound of a stepper motor stalling & slipping out of phase. The head probably experienced much less force. There are no useful recordings of that sound.

After much poking, it turned out the Steil fastloader had a bug where address $05 has to be 0 before calling DOREAD or it'll head bump. He used $05 as a counter & either the values he stored in it got lucky or it's another VICE bug. It's unlikely 10-year-old lion would have figured that out.

There's a useful table of d64 track offsets not in any goog searches:

http://unusedino.de/ec64/technical/formats/d64.html
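The table follows a simple zone rule, so the offsets can also be computed. A Python sketch for standard 35 track images without the optional error bytes:

```python
def sectors_per_track(track: int) -> int:
    """Sectors on a track of a standard 35-track .d64 image."""
    if not 1 <= track <= 35:
        raise ValueError("standard .d64 images have tracks 1-35")
    if track <= 17:
        return 21
    if track <= 24:
        return 19
    if track <= 30:
        return 18
    return 17


def d64_offset(track: int, sector: int) -> int:
    """Byte offset of (track, sector) in a .d64 image; sectors are 256 bytes."""
    if not 0 <= sector < sectors_per_track(track):
        raise ValueError("sector out of range for this track")
    before = sum(sectors_per_track(t) for t in range(1, track))
    return (before + sector) * 256


print(hex(d64_offset(18, 0)))  # 0x16500, the directory track
```

When the disk is used as raw storage, the same zone rule decides how many tile bytes fit on each track.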

The working fastloader did the 9 tile cache fill in 9.1 seconds or 1.01 seconds per tile, with no concurrency. This would allow 6.1fps of vertical scrolling with only the 2 visible tiles read or 4.1fps with a complete row read. Things are going to be much slower with concurrent scrolling. There's also going to be a time quantization where it starts reading a 3rd tile even though not enough time remanes. Since it can't interrupt a tile read, it's always going to load a complete row.

-----------------------------------------------------------------------------------------------------------------------------------------------

The best optimizations lions could come up with were making it load 1 row of the tile cache at a time, storing all the tiles for a row on 1 track, & packing multiple tiles in each sector to reduce the sector reads. It would need an offset table to pack sectors. Currently, the world is limited to 256 tiles & 10 rows of tiles to save on data types.

Sadly, there is no way to abort a sector read. Testing for ATN low to allow partial sector reads would slow down the reads too much.

It's slower to buffer complete sectors than to decompress 1 byte at a time. Buffering a complete sector requires shifting in all the unused bits while aborting a read requires just driving the clock. Because of the way multitasking works, it has to switch the kernal ROM for every RLE code whether the sectors are loaded previously or not.

Packing the tiles only works if they're read from lowest to highest & a complete row is read. It's always going to throw away the start of the 1st sector in each row. It's usually going to slow down sideways movement because now a start & end of a sector is thrown away instead of just the end. It could slightly benefit vertical movement because 1 out of 3 tiles wouldn't require throwing away any data. There's a small chance the packing could reduce seeking enough to have an improvement. The complexity of it makes it a pass.

Sadly, storing 1 row per track was actually slower than packing multiple rows in each track. It seems to cut just enough seeking if the tracks contain multiple rows.

Filling the cache forwards or backwards makes no difference in load time, even though it seeks more when reading backwards.

---------------------------------------------------------------------------------------------------------------

A hybrid of scrolling & paging seems in order. Allow the player to traverse the entire screen while scrolling. When the player hits an edge, redraw the whole screen. This would result in a more general purpose game engine which did either scrolling or paging.

The long feared tile renderer converged on something that always renders 4 tiles. It determines the 4 visible tiles & draws a hard coded corner of each visible tile. A scratch built tile renderer for each scrolling direction + a 5th renderer which draws the entire screen seems unavoidable.
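The corner arithmetic for those 4 tiles, sketched in Python under the assumption that a tile is exactly 1 screen (40x25 cells). The tuple layout is made up. When the view is tile aligned, 3 of the 4 corners come out empty, which keeps the caller's loop fixed at 4.

```python
TILE_W, TILE_H = 40, 25


def visible_tiles(x: int, y: int) -> list[tuple[int, ...]]:
    """For a 40x25 view whose top-left cell is (x, y) in world cells, return
    (tile_x, tile_y, src_col, src_row, dst_col, dst_row, width, height)
    for each of the 4 tiles it can overlap."""
    tx, ox = divmod(x, TILE_W)
    ty, oy = divmod(y, TILE_H)
    w0, h0 = TILE_W - ox, TILE_H - oy       # size of the part of the 1st tile
    return [
        (tx,     ty,     ox, oy, 0,  0,  w0, h0),  # bottom-right corner of tile
        (tx + 1, ty,     0,  oy, w0, 0,  ox, h0),  # bottom-left corner
        (tx,     ty + 1, ox, 0,  0,  h0, w0, oy),  # top-right corner
        (tx + 1, ty + 1, 0,  0,  w0, h0, ox, oy),  # top-left corner
    ]


print(visible_tiles(13, 7)[0])  # (0, 0, 13, 7, 0, 0, 27, 18)
```

Each entry maps to one of the hard coded corner routines; the 4 pieces always add up to exactly 1000 cells.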

Tile loading algorithm
07/29/2024 at 18:45

Tile loading is the heart of the demo & a piece of diabolical assembly language. Made some diagrams to try to predict the performance of different algorithms.

A 6 tile cache has to choose between loading 2 extra vertical or 2 extra horizontal tiles, based on the player's heading. The player can still turn 180 without stuttering.

If the player turns left in this moment, it'll stutter as it loads 2 horizontal tiles.

A 9 tile cache is the minimum for stutter free scrolling in all directions. If scrolling is limited to 5fps, it'll have 4 seconds to load 2 new tiles for X movement & 2 seconds to load 2 new tiles for Y movement. It still has to prioritize loading order based on heading.
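Prioritizing the load order by heading could be as simple as sorting the missing tiles by how far they lie along the direction of travel. A sketch with hypothetical names; tiles are (x, y) tile coordinates & the heading is a (dx, dy) step.

```python
def tiles_to_load(center: tuple[int, int], cached: set[tuple[int, int]],
                  heading: tuple[int, int]) -> list[tuple[int, int]]:
    """Missing tiles of the 3x3 cache around center, most urgent first."""
    cx, cy = center
    hx, hy = heading
    needed = [(cx + dx, cy + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    missing = [t for t in needed if t not in cached]
    # Larger dot product with the heading = further along the direction of
    # travel, so it gets loaded first. sorted() is stable, so ties keep order.
    return sorted(missing, key=lambda t: -((t[0] - cx) * hx + (t[1] - cy) * hy))


old = {(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)}   # cache around (0, 0)
print(tiles_to_load((1, 1), old, (1, 1)))  # [(2, 2), (2, 1), (1, 2), (2, 0), (0, 2)]
```

A diagonal step needs 5 new tiles, which is where the 9 tile cache earns its keep.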

There could be a way to RLE compress each row independently. 9 tiles would be stored compressed in roughly 9000 bytes. The center 25 rows & 120 columns would be decompressed into 7500 bytes to allow fast X movement. Y movement would decompress 1 row at a time.
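A sketch of the row independent idea in Python, using a deliberately simple hypothetical format of (count, value) byte pairs. A real format would need an escape code so incompressible rows don't double in size, & a 1 byte index entry only works while each compressed row stays under 256 bytes.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (count, value) pairs, count 1-255. Hypothetical format."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    for k in range(0, len(data), 2):
        out += bytes([data[k + 1]]) * data[k]
    return bytes(out)


def compress_tile(rows: list[bytes]) -> tuple[bytes, list[int]]:
    """Compress each row on its own; return (blob, index of compressed sizes)."""
    parts = [rle_encode(r) for r in rows]
    return b"".join(parts), [len(p) for p in parts]


def decompress_row(blob: bytes, index: list[int], row: int) -> bytes:
    """Decode 1 row without touching the others: skip earlier rows by size."""
    start = sum(index[:row])
    return rle_decode(blob[start:start + index[row]])


print(rle_encode(bytes(300)).hex())  # ff002d00
```

The index is what makes Y movement cheap: finding row n costs a sum of n sizes, not a decode of n rows.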

It would require a 25 byte index in each tile, containing the compressed size of each row. There is a memory fragmentation problem from using variable sized tiles. Each tile would have to be allocated the size of the largest tile, probably 1000 bytes. Concessions will always have to be made in the artwork to achieve reasonable compression. That gives 16500 for the world map, 20000 for the bitmaps, 29000 bytes free.

The original idea of always having 6 uncompressed tiles would burn 15000 bytes, be prone to stuttering, & couldn't do diagonal movement.

If it had 9 uncompressed tiles, it would burn 22500 bytes & leave 23000 bytes free. 9 seems to be the minimum for stutter free movement in all directions. The on the fly decompression would be so complex, it's worth knowing how much free memory is really needed.

To free up 10000 bytes, the screen could be shrunk to 38x23 to make single buffer scrolling less garish.

---------------------------------------------------------------------------------------------------------------------------------------------------

As with young lion, division & multiplication in 6502 assembly have proven daunting. There are easily searchable solutions & the BASIC ROM has math functions, but it's still more memory efficient to use loop subtracting & loop adding.
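The memory/speed trade-off, modeled in Python: loop adding is tiny but runs once per unit of the multiplier, while shift & add, the usual textbook 6502 alternative, runs once per bit. These are illustrations of the 2 techniques, not the project's routines.

```python
def mul_by_adding(a: int, b: int) -> tuple[int, int]:
    """Multiply by adding a, b times. Returns (product, loop iterations)."""
    product = 0
    for _ in range(b):
        product += a
    return product, b


def mul_shift_add(a: int, b: int) -> tuple[int, int]:
    """Shift & add: 1 iteration per bit of b. Returns (product, iterations)."""
    product = steps = 0
    while b:
        if b & 1:
            product += a
        a <<= 1
        b >>= 1
        steps += 1
    return product, steps


def div_by_subtracting(n: int, d: int) -> tuple[int, int]:
    """Divide by repeated subtraction. Returns (quotient, remainder)."""
    if d <= 0:
        raise ValueError("divisor must be positive")
    q = 0
    while n >= d:
        n -= d
        q += 1
    return q, n


print(mul_by_adding(40, 25), mul_shift_add(40, 25), div_by_subtracting(1003, 40))
# (1000, 25) (1000, 5) (25, 3)
```

For tile math with small factors like 40 & 25, the loop versions stay short enough that the smaller code wins.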

RLE decompression is definitely a buster in assembly language. Young lion never heard of it or envisioned using it. He did know some kind of encoding was used to compress redundancy in newer PCX files. The speed requirements definitely require it. Multicolor bitmap mode was not in his plans either. Young lion would have needed a higher level language for the RLE compression, not that any of the world creation tools were attainable.

In a modern development environment printing all the output to the console, the Commodore looks like an ordinary UNIX box. That's the same kind of output the GEOS developers would have seen 40 years ago.

It prints out the [tile number]-[assigned buffer] in the 9 tile cache positions. After much debugging, it was able to load the 9 tiles of the cache in 15.5 seconds, or 1.7 seconds per tile, with no concurrent scrolling. Not quite fast enough to scroll vertically at 5fps. Maximum vertical speed would have to be 3.6fps & horizontal speed would be 5.8fps.

It takes a lot of instructions to load tiles into shadow RAM above $d000. It has to disable interrupts & write the port register to write every byte, then re-enable interrupts & write the port register to read the next byte from the I/O addresses above $d000. It might be faster to load a complete sector into a 256 byte scratch area before decompressing it. There's a 1528 byte gap above $c000, imposed by the VIC-II bank boundary.

Scrolling with fastloader
07/28/2024 at 05:27 • 0 comments

In testing for emulation bugs, it seems storing the world map on drive 9 caused many problems. Disable drive 9 & the bugs go away. There are some hints that VICE doesn't faithfully emulate the serial bus.

https://www.lemon64.com/forum/viewtopic.php?t=74765

Despite VICE having slots for 4 drives, there's no evidence anyone uses dual drives in it.

A quick test in GEOS failed to access drive 9. Drive 9 has to be disabled or GEOS gets stuck looking for DESKTOP 1.5.

Then, after enabling it, you have to run the CONFIGURE program.

The CONFIGURE program can't detect it. Drive 9 worked with standard kernal I/O but not with any kind of fast loader. So the world map for testing needs to be in the same .d64 image as the program, & a maximum size world map is going to require a disk swap.

So in addition to the lack of physical I/O signals, emulation is going to involve compromises in terms of peripherals. An FPGA emulator accessing drives through physical I/O signals might still work. Of course, 1 drive is a more accurate representation of what young lion had.

-------------------------------------------------------------------------------------------------------------------

Some testing showed that when ATN is low, the drive always reads a low voltage from DAT. When ATN is high, the drive reads DAT properly. That's not supposed to happen. It could be yet another emulator bug, although the 1541's ATN acknowledge hardware is also worth ruling out: while ATN is asserted, it pulls DAT low unless the drive sets the ATNA bit in $1800. Given limited time & tools, there could be a way to pulse ATN just for synchronization & neglect the printer state. Debugging ended up being hard enough to settle on just blinking binary on the LED. The lack of logic probing & the endless obscure emulator bugs consumed so much time that they put a real emphasis on doing the least that could work.

Big surprise, the Steil fastloader didn't work in emulation. Then came a few attempts at simple bit clocking. Driving CLK from the drive was slower than driving it from the host. Sending 1 bit at a time seemed to be the faster way, since a lot of bit fudging is required to pack 2 bits at a time. Using ATN as a clock had a problem where DAT changes at the same time as ATN, so sending 2 bits at a time really has to free run.
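
A minimal simulation of host-clocked, 1 bit at a time transfer. The Drive class & host_receive are hypothetical names, & the LSB-first bit order is an assumption. Because each bit waits for an edge the host generates, the host can pause between bits (the idle_work hook) without losing data; a real drive would spin on its CLK input instead of being called.

```python
class Drive:
    """Hypothetical drive side: puts the next bit on DAT whenever it sees
    the host flip CLK (bit order assumed LSB first)."""

    def __init__(self, payload: bytes) -> None:
        self.bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
        self.pos = 0
        self.last_clk = 0
        self.dat = 0

    def on_bus_change(self, clk: int) -> None:
        if clk != self.last_clk:        # any CLK edge means "next bit, please"
            self.last_clk = clk
            self.dat = self.bits[self.pos]
            self.pos += 1


def host_receive(drive: Drive, count: int, idle_work=lambda: None) -> bytes:
    clk = 0
    out = bytearray()
    for _ in range(count):
        value = 0
        for i in range(8):
            clk ^= 1                    # host flips CLK: 1 edge per bit
            drive.on_bus_change(clk)
            value |= drive.dat << i     # sample DAT after the drive responds
            idle_work()                 # host may pause here, e.g. to scroll
        out.append(value)
    return bytes(out)
```

Clocking on both edges halves the number of CLK transitions compared with a full pulse per bit, which is one reason host-driven clocking can come out ahead.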

Once the many issues were finally resigned to emulation bugs & the fastloading settled on clocking 1 bit at a time, the rest came together pretty fast. Concurrent scrolling & disk reads with the fastloader were light years ahead of the stock ROM, even with single bit clocking. Concurrent I/O will always have an impact on the scrolling, but it seems to be reasonably imperceptible now.

Sector reads with seeks of 1 track took 1 second. When it sought 4 tracks, it took 1.2 seconds. Scrolling goes at 9fps during the data transfer, whereas the stock ROM needed 2.4 seconds per sector at 6fps. When it's not reading the disk, scrolling still goes at 9fps, so the impact is not detectable in a screenshot. It's intriguing how little time is spent bit banging the serial port.

Attention turned to manetaneing a tile database in RAM.

----------------------------------------------------------------------------------

The double buffered bitmaps suck 20,000 bytes. By packing the color memory, each tile sucks 2500 bytes. It takes 4 tiles or 10,000 bytes to render the current view. At least 12 more tiles or 30,000 bytes have to be offscreen to scroll in 4 directions without stuttering. It became clear that wouldn't fit in 10-year-old lion's Commodore. 2 offscreen tiles are the minimum & would leave roughly 30,000 bytes free. It could scroll smoothly 1 way, but any change in direction would usually cause it to stutter, unless the player got lucky. It would be even luckier if 4 offscreen tiles fit; then the stuttering would only happen on 90 degree direction changes.
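
The arithmetic above as a small Python budget sketch. The 2500 & 10,000 byte figures come from the log; modeling "4 visible + 12 offscreen" as a 2x2 view plus a 1-tile ring on every side is an assumption about how the 16 tiles are laid out.

```python
TILE_BYTES = 2500        # one tile with packed color memory (figure from the log)
SCREEN_BYTES = 10_000    # one bitmap screen incl. color data (20,000 double buffered)


def tiles_for_window(view_w: int, view_h: int, margin: int) -> int:
    """Tiles covering a view_w x view_h tile view plus a ring `margin`
    tiles deep on every side, so scrolling any direction finds data."""
    return (view_w + 2 * margin) * (view_h + 2 * margin)


def bytes_needed(tiles: int, double_buffered: bool = True) -> int:
    screens = 2 if double_buffered else 1
    return tiles * TILE_BYTES + screens * SCREEN_BYTES


full_ring = tiles_for_window(2, 2, 1)
print(full_ring, bytes_needed(full_ring))           # 16 60000
print(bytes_needed(4 + 2))                          # 35000
print(bytes_needed(4 + 2, double_buffered=False))   # 25000
```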

The general idea is to have an age for each tile & load the next 2 tiles based on the player's heading. Ideally, instead of stuttering, it would draw black while a tile loads, but that requires redrawing the entire screen after the tile loads, which creates a stutter anyway. It needs a full screen redraw when it starts anyways. Eliminating double buffering would free up 3 more tiles. The Great Escape didn't double buffer, but it also cropped the view down to make that more bearable.
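
A sketch of the age-per-tile idea, assuming a 2x2 tile view identified by its top-left tile, cardinal headings only, & an eviction rule of "empty slot first, then the oldest tile not needed right now". All names are hypothetical & the loads list stands in for disk reads; the cache needs at least 6 slots (4 visible + 2 ahead).

```python
from typing import Optional

Tile = tuple[int, int]


def tiles_in_view(tx: int, ty: int) -> list[Tile]:
    """The 2x2 tiles on screen when (tx, ty) is the top-left tile."""
    return [(tx + x, ty + y) for y in range(2) for x in range(2)]


def tiles_ahead(tx: int, ty: int, heading: Tile) -> list[Tile]:
    """The 2 tiles that scroll into view next for a cardinal heading."""
    dx, dy = heading
    if dx:
        col = tx + 2 if dx > 0 else tx - 1
        return [(col, ty), (col, ty + 1)]
    row = ty + 2 if dy > 0 else ty - 1
    return [(tx, row), (tx + 1, row)]


class TileCache:
    def __init__(self, slots: int) -> None:
        self.slots: list[Optional[Tile]] = [None] * slots
        self.ages = [0] * slots
        self.loads: list[Tile] = []    # stands in for disk reads, in order

    def update(self, tx: int, ty: int, heading: Tile) -> None:
        needed = tiles_in_view(tx, ty) + tiles_ahead(tx, ty, heading)
        self.ages = [age + 1 for age in self.ages]
        for tile in needed:            # tiles still in use get younger
            if tile in self.slots:
                self.ages[self.slots.index(tile)] = 0
        for tile in needed:
            if tile in self.slots:
                continue
            # Prefer an empty slot, then the oldest tile not needed now.
            victims = [s for s, held in enumerate(self.slots) if held not in needed]
            slot = max(victims, key=lambda s: (self.slots[s] is None, self.ages[s]))
            self.slots[slot] = tile
            self.ages[slot] = 0
            self.loads.append(tile)
```

Moving 1 tile to the right only triggers 2 loads, for the new column ahead, & evicts the column that just left the screen.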

A combination of world scrolling & player movement seems inevitable. When a new tile isn't available, the player just moves across the screen. Then there's a warp back to the center when the tile is loaded, or a freeze when the player hits the edge. Frame rate is probably going to max out at 7 or 8 fps, which could make moving the player pretty unbearable. Some ideas 40 years ago involved not scrolling until the player was 1/4 screen from the edge, but that wouldn't provide much look ahead.
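
A sketch of the 1/4 screen dead zone combined with stopping at the edge of the loaded world, along 1 axis in pixels. The function name & the loaded_min/loaded_max parameters are hypothetical; applying it again for y gives the 2D version.

```python
def update_camera(camera_x: int, player_x: int, screen_w: int,
                  loaded_min: int, loaded_max: int) -> int:
    """Return the new camera position along one axis.

    The camera only scrolls once the player is within a quarter screen of
    an edge, and never past the loaded span [loaded_min, loaded_max). When
    the next tile isn't in RAM yet, the camera stops and the player keeps
    walking across the screen instead.
    """
    margin = screen_w // 4
    if player_x < camera_x + margin:
        camera_x = player_x - margin
    elif player_x > camera_x + screen_w - margin:
        camera_x = player_x - (screen_w - margin)
    return max(loaded_min, min(camera_x, loaded_max - screen_w))
```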

Then there's the option of abandoning scrolling. When the player gets to the edge, load 1 tile on demand & redraw the screen. Police Quest for DOS worked this way.

Most games in those days just scrolled left & right, in which case it would be a rail shooter. It would never stutter & would need just 3 tiles in RAM.
VIC1541 fast loader
07/22/2024 at 17:52 • 0 comments

The biggest need was decidedly speeding up ACPTR. This entails running a loader inside the 1541 at runtime. You can't overwrite the 1541 ROM with shadow RAM like on the C64 itself; you have to upload an executable to RAM & call into the 1541 ROM from it. The mane problem is the stock fastloaders are very focused on loading a program in 1 shot, blanking the screen & taking full control. Concurrency would require a scratch-built fast loader.

Reviewing the 1541 ROM source code from over 40 years ago, it's clear that they were working just as hard as the current generation. They optimized complex algorithms down to every single byte, writing in unintelligible opcodes. They didn't have it any easier than the current generation; they faced equally difficult problems. Today's problems are the massive size of modern API's & the massive number of steps required to do the simplest thing. Each generation worked at the limit of what was possible with a certain amount of capital. The limiting factor is the number of human decisions which can be made in a certain amount of time.

The only 1541 source code is a disassembly with absolute addresses:

https://g3sl.github.io/c1541rom.html

The byte for ACPTR is clocked out at 0xe958. The only explicit delay is at 0xe97b, a call to 0xfef3; every other delay is one of the many debounce routines, since they had induction problems. It uses address 0x23 as a speed flag. Some tests with u+, u-, ui+ & ui- didn't do anything useful.

The most useful resource was this presentation. Its author, Michael Steil, wrote a fast loader using these methods in 2011; his video was in 2021.

https://www.pagetable.com/?p=568

The FCODE segment runs on the drive & the PART2 segment runs on the host. It handles the VIC-II badlines. All it does is load a sequence of hard coded sectors & send them using custom bit banging. It doesn't use TALK or LISTEN; it just bit bangs data out of the drive after the OPEN call. There's no debounce code. Lions thus need to add a function which bit bangs a track & sector number to the drive & switches between reading & writing. The drive would become a simple sector reader.
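
To make "a simple sector reader" concrete, here's a sketch of its request loop served from a .d64 image. The 35-track zone layout (21, 19, 18 & 17 sectors per track) is the standard .d64 format; serve_sectors is a hypothetical name, & a real 1541 would read the physical sector through its ROM's job queue rather than slicing a file, with no GCR decoding, seek timing, or error handling shown here.

```python
from typing import Iterable, Iterator


def sectors_per_track(track: int) -> int:
    if 1 <= track <= 17:
        return 21
    if 18 <= track <= 24:
        return 19
    if 25 <= track <= 30:
        return 18
    if 31 <= track <= 35:
        return 17
    raise ValueError(f"track {track} out of range 1-35")


def d64_offset(track: int, sector: int) -> int:
    """Byte offset of a 256-byte sector inside a 35-track .d64 image."""
    if not 0 <= sector < sectors_per_track(track):
        raise ValueError(f"sector {sector} out of range on track {track}")
    before = sum(sectors_per_track(t) for t in range(1, track))
    return (before + sector) * 256


def serve_sectors(image: bytes,
                  requests: Iterable[tuple[int, int]]) -> Iterator[bytes]:
    """Hypothetical drive loop: for each (track, sector) the host bit bangs
    over, hand back that 256-byte block."""
    for track, sector in requests:
        start = d64_offset(track, sector)
        yield image[start:start + 256]
```

Track 18, the directory track, starts at offset 91392 ($16500), a handy sanity check against a real image.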

The mane problem is uploading the program to the drive. He uses some heroic methods to load the firmware directly from disk into the drive's execution space. The catch is that if the world map is on a separate disk in a separate drive, it would entail a 3rd disk containing the fast loader or using some of the world map's space for the fast loader. The easiest system is to load the firmware from drive 8 & run it in drive 9.

You have to use the M-W command to write drive memory (page 38 of the manual):

https://www.commodore.ca/wp-content/uploads/2018/11/commodore_vic_1541_floppy_drive_users_manual.pdf

Then you have to use the M-E command to run it (page 39).
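
A sketch of building those command strings on the host side: "M-W" followed by the address low & high bytes, a byte count & the data, then "M-E" with the start address, written without the colon the manual shows. Each string would go to channel 15, the drive's command channel. The 32-byte chunk size is an assumed conservative limit for the drive's command buffer, & $0500 is just an example address in the drive's buffer RAM.

```python
def memory_write_commands(code: bytes, load_addr: int, chunk: int = 32) -> list[bytes]:
    """Split drive code into M-W commands: b"M-W", addr lo, addr hi, count, data."""
    cmds = []
    for offset in range(0, len(code), chunk):
        part = code[offset:offset + chunk]
        addr = load_addr + offset
        cmds.append(b"M-W" + bytes([addr & 0xFF, addr >> 8, len(part)]) + part)
    return cmds


def memory_execute_command(addr: int) -> bytes:
    """M-E command that makes the drive CPU jump to addr."""
    return b"M-E" + bytes([addr & 0xFF, addr >> 8])


firmware = bytes(70)                   # placeholder for the assembled drive code
for cmd in memory_write_commands(firmware, 0x0500):
    print(cmd[:6])                     # header of each M-W command
print(memory_execute_command(0x0500))  # b'M-E\x00\x05'
```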

He doesn't use the data channel; all the data is transferred over the command channel. The port registers have separate out & in bits, implying full duplex communication, but the host & the drive share just 1 data line & 1 clock line.

------------------------------------------------------------------------------------------------------------------------------

Debugging 1541 firmware

No monitor for the 1541 CPU turned up in VICE, which makes debugging quite difficult in emulation. At least real hardware could bit bang a UART on the LED & have real serial port lines to probe.

It's well known that 1 peripheral can transmit to another peripheral because they're all daisy-chained on 1 bus, so practical debugging depends on the 1541 printing directly to the printer.

--------------------------------------------------------------------------------------------------

For historic re-enactment, there are no more real 1541s. The heads have all perished. The SD2IEC dongle doesn't work with a fast loader. 1 current replica is the Ultimate II cartridge.

https://ultimate64.com/Main_products

It can run without plugging into a C64. It doesn't have the internal logic signals but emulates the original ROM.

https://cbm-pi1541.firebaseapp.com/

A much cheaper solution is a Raspberry Pi 1541 emulator. This only works on Pis below the 4, but it could be a use for the zillions of Zero Ws with broken WiFi chips. It requires fabricating a level shifter. It seems any hardware re-enactment is going to be largely home made. Since lions only need the I/O port signals, 1 possible future is a C64 emulator on a Raspberry Pi communicating with all the peripherals on other Raspberry Pis through an I/O board. There should really be microcontroller peripheral emulations.

---------------------------------------------------------------------------------------------------------------------------

Some general traps for young players are: $dd00 is the data port register on the host & $1800 is the data port register on the drive. The GPIO bits are different bits on each side.

The 1541 user manual specifies M-W: as the command, but the drive ROM expects it without the colon. It stores the command at $0200 & reads the 1st byte of payload from $0205.

The CLK_OUT, DAT_OUT bits are the inverse of the wire voltage on both sides. The CLK_IN, DAT_IN are the inverse of the line voltage on the drive & the direct line voltage on the host.
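
The traps above as bit masks & helpers. The bit positions come from the standard C64 ($dd00, CIA 2 port A) & 1541 ($1800, VIA 1 port B) memory maps rather than from this log; the helper names are hypothetical. The inversion rules match the description above: OUT bits pull the wire low on both sides, host IN bits read the wire directly, drive IN bits read it inverted.

```python
# C64 CIA 2 port A ($dd00): serial bus bits
HOST_ATN_OUT, HOST_CLK_OUT, HOST_DAT_OUT = 0x08, 0x10, 0x20
HOST_CLK_IN, HOST_DAT_IN = 0x40, 0x80

# 1541 VIA 1 port B ($1800): serial bus bits
DRIVE_DAT_IN, DRIVE_DAT_OUT, DRIVE_CLK_IN, DRIVE_CLK_OUT = 0x01, 0x02, 0x04, 0x08
DRIVE_ATNA, DRIVE_ATN_IN = 0x10, 0x80


def output_pulls_low(port_value: int, out_bit: int) -> bool:
    """On both sides, a 1 in an OUT bit pulls the wire low."""
    return bool(port_value & out_bit)


def bus_line_high(*pull_downs: bool) -> bool:
    """Open-collector bus: the wire is high only if nobody pulls it low."""
    return not any(pull_downs)


def host_reads_line_high(dd00: int, in_bit: int) -> bool:
    """Host IN bits are not inverted: 1 means the wire is high."""
    return bool(dd00 & in_bit)


def drive_reads_line_high(port_1800: int, in_bit: int) -> bool:
    """Drive IN bits are inverted: 1 means the wire is pulled low."""
    return not (port_1800 & in_bit)
```

For example, writing $20 to $dd00 pulls DAT low, so the drive sees $01 set in $1800 while the host's own bit 7 reads 0.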

Also noted that the original Steil fast loader doesn't handle TALK/LISTEN anymore. It makes a 1-way trip: load a program & crash. It needs major changes to continuously read sector addresses & send data. Implementing TALK & LISTEN in the fast loader would be really hard, so attempt 1 was using ATN to select the peripheral: low ATN would communicate with the disk & high ATN would communicate with the printer.

Debugging using CIOUT should work inside the fastloader since all peripherals share the same serial lines. Noted that CIOUT buffers its argument & sends the previous character, flushing the buffer in the TALK/LISTEN command. This could have been to avoid waiting for the listener to start listening. Efforts to clone CIOUT in the fastloader didn't work. A more likely system was bit banging debug text to the host & printing from the host.
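
A small model of that 1-byte delay, which explains why a cloned CIOUT seems to "lose" its last character until something flushes it. The class is hypothetical; marking the flushed byte as the end-of-message (EOI) byte is how this model represents the final byte, not a claim about every KERNAL path.

```python
from typing import Optional


class BufferedTalker:
    """Each ciout() parks its argument and transmits the previously parked
    byte; flush() (done by the next bus command) sends the last one."""

    def __init__(self) -> None:
        self.pending: Optional[int] = None
        self.wire: list[tuple[int, bool]] = []   # (byte, is_last) actually sent

    def ciout(self, byte: int) -> None:
        if self.pending is not None:
            self.wire.append((self.pending, False))
        self.pending = byte

    def flush(self) -> None:
        if self.pending is not None:
            self.wire.append((self.pending, True))
            self.pending = None


t = BufferedTalker()
for ch in b"HI":
    t.ciout(ch)
print(t.wire)   # only 'H' has gone out so far
t.flush()
print(t.wire)   # now 'I' follows, flagged as the last byte
```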

--------------------------------------------------------------------------------------------------------------

Alas, after much effort, the new fastloader was erratically crashing or locking up. A verify operation would also randomly lock up, depending on the random placement of nops anywhere in & after the end of the mane program. Printing to the screen or printer made no difference. Fixing the DATA segment made no difference. Uploading random data instead of the fastloader failed the same way. Started noticing behavior like this when simultaneously scrolling & loading sectors. There seems to be a dependency on program size when sending disk commands from inside the program. It's been the closest sign so far of an emulator bug, but GEOS & all its I/O functions still work.

Lions aren't inclined to burn $500 on a hardware replica to further test the problem. It's extremely expensive when the same money would buy a 1st rate modern confuser.

Noted that after calling LISTEN on the drive, you can still print to the screen by calling CHROUT & write characters to the drive by calling CIOUT. That narrowed the problem down to just the disk I/O.
Concurrent bitmap scrolling & disk reads
07/20/2024 at 22:57 • 0 comments
The next step was making a scrolling demo which performs sector reads while simultaneously scrolling.

Despite all their busy waits & sleep commands, the LISTEN, UNLSN, CIOUT commands are actually fast enough to run without noticeably affecting scrolling. The busters are TALK & ACPTR. Sadly, there is no way to poll the serial port for data like there is for the keyboard.

For the ACPTR operation, the Commodore 1541 Troubleshooting & Repair Guide notes on page 165 that the drive releases CLK at T0 when it's ready to send:

https://ia902702.us.archive.org/11/items/Commodore_1541_Troubleshooting_and_Repair_Guide/Commodore_1541_Troubleshooting_and_Repair_Guide.pdf

Then the host releases DATA at T1 when it's ready to receive. The document has a typo here: it says the host signals "clear to send" by releasing the clock line to a logic high, when it's the data line that the host releases. The kernal waits only 256us between T1 & T2 before it times out.

The drive lowers CLK at T2 to send the 1st byte & the host has to be polling to receive it. There's no more timeout code after T2. It might be easier to probe with physical hardware. It bit bangs all the serial port reads: the drive is driving CLK, so the host has to be fast enough to catch the clock transitions.
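
A sketch of why the host has to keep up. It uses the standard serial bus conventions as commonly documented (DATA set up while CLK is low, CLK released high to mark the bit valid, 8 bits LSB first, high = 1) & leaves out EOI & the per-byte handshake. Sampling the waveform too sparsely misses rising edges & the byte can't be rebuilt.

```python
from typing import Optional


def talker_waveform(byte: int) -> list[tuple[int, int]]:
    """(clk, data) line levels for one byte as the talker drives them."""
    wave = []
    for i in range(8):
        bit = (byte >> i) & 1
        wave += [(0, bit), (1, bit)]   # bit set up with CLK low, then CLK released
    return wave


def decode_byte(samples: list[tuple[int, int]]) -> Optional[int]:
    """Rebuild a byte from the host's (clk, data) polls, or return None
    if fewer than 8 CLK rising edges were seen."""
    value, bits, prev_clk = 0, 0, None
    for clk, data in samples:
        if prev_clk == 0 and clk == 1:   # CLK just released: DATA is valid
            value |= data << bits
            bits += 1
            if bits == 8:
                return value
        prev_clk = clk
    return None


wave = talker_waveform(0xA5)
print(decode_byte(wave))         # 165, every transition seen
print(decode_byte(wave[::3]))    # None, a slow poll missed edges
```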

Profiling the read operation is difficult because the jiffy clock doesn't run during I/O operations & the CIA time of day clock doesn't seem to work at all in emulation. An external hardware timer could do the job, but that's another job for physical hardware. The only way was to set up the unused CIA 2 timers, with timer B counting timer A underflows:
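```
d2t1l = $dd04  ; CIA 2 timer A, low & high byte
d2t1h = $dd05
d2t2l = $dd06  ; CIA 2 timer B, low byte
d2cra = $dd0e  ; CIA 2 control register A
d2crb = $dd0f  ; CIA 2 control register B
    lda #$ff   ; latch $ffff into timer A so its high byte drops every 256 cycles
    sta d2t1l
    sta d2t1h
    sta d2t2l  ; reset CIA 2 timer B
    lda #$11
    sta d2cra  ; force load & start timer A
    lda #$51
    sta d2crb  ; force load & start timer B, counting timer A underflows
```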
Then extract a 16 bit time value from timer B's low byte & timer A's high byte. It decrements once every 256 clock cycles, roughly every 256 us. The value has to be complemented to get a positive number.
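```
readtime:
    lda d2t2l  ; timer B = high byte of the count
    tax
    lda d2t1h  ; timer A high byte = low byte of the count
    cpx d2t2l  ; timer B ticked between the reads? read both again
    bne readtime
    eor #$ff   ; complement the down counters; no carry flag involved
    sta dst
    txa
    eor #$ff
    sta dst + 1
    lda #$ff   ; reset the clock
    sta d2t1l
    sta d2t1h
    sta d2t2l
    lda #$11
    sta d2cra
    lda #$51
    sta d2crb
```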
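
A small helper for turning the 2 timer reads into seconds on the host side of the analysis. The PAL (985,248 Hz) & NTSC (1,022,727 Hz) figures are the standard C64 clock rates; the 256-cycle tick assumes timer A's latch is $ffff so its high byte steps every 256 cycles. At PAL speed the count wraps after about 17 seconds, so the 15.5 second tile cache load reported earlier still fits.

```python
PAL_CLOCK_HZ = 985_248      # C64 PAL system clock
NTSC_CLOCK_HZ = 1_022_727   # C64 NTSC system clock


def ticks_from_reads(timer_a_high: int, timer_b_low: int) -> int:
    """Complement the two down-counter bytes into an elapsed tick count."""
    return ((timer_b_low ^ 0xFF) << 8) | (timer_a_high ^ 0xFF)


def ticks_to_seconds(ticks: int, clock_hz: int = PAL_CLOCK_HZ) -> float:
    """Each tick is one decrement of timer A's high byte = 256 cycles."""
    return ticks * 256 / clock_hz


print(round(ticks_to_seconds(0xFFFF), 2))   # 17.03, the longest measurable span
```
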
Some extensive profiling revealed that the TALK after sending "u1 2 0 1 0" has the long delay for the sector read. The TALK takes around 0.4 seconds in a test of reading multiple tracks. The trick with profiling is that you can't print anything until all the disk I/O is done; otherwise the LISTEN for printing steps on the TALK for the disk.

After copying bits of kernal source code & bodging in random delays, got it to poll the TALK state while scrolling. There's no documentation for the TALK operation, but it seems to rely on driving the attention line & waiting for a pulse on the data line. It could miss the data pulse, so it has to poll fast enough. After polling just enough signal transitions, it can call the native TALK function without blocking. The demo prints the time spent polling the TALK state, the time spent running the actual TALK function, & the time spent reading the 1st byte.

Sadly, scrolling still slows down during the ACPTR operations, even though it's not as bad as a 0.4 second delay.

Polling the CLK pin before calling ACPTR was useless. The polling function spins just as long as ACPTR & ACPTR is limited by the baud rate.

A fast loader would speed up ACPTR. The only other ways are to fix the scrolling frame rate with some kind of timer & spread out the serial port commands. Experiments with the jiffy clock accuracy during sector reads would be required. Raster interrupts are unlikely to work with constant I/O.

Depending on how I/O servicing is divided between bitmap operations, single sector reads take 1.8 seconds at 4fps & 2.4 seconds at 6fps. That's quite a bit slower than the 400 byte/second rating, because of the serial port handshaking, seeking, & multitasking. Reduce the scrolling speed & the I/O could be concealed. There could be a case for just letting the scrolling stutter.