Unlimited tile world on a Commodore 64

Project Logs


Sprite concepts
lion mclionhead • 09/09/2024 at 01:58 • 0 comments

Sprites have 24x21 resolution. 2 abreast in multicolor mode is still 24x21, since multicolor pixels are double width. A flame costs 2 more sprites, but isn't always on. Memory accounting is a difficult problem. In both bitmap banks, the sprite pointers can access 7168 + 192 bytes. It takes more memory but is simpler to have a copy of every possible sprite in both banks. It's a lot more memory efficient but more complicated to memcpy the sprites to static pointers in realtime. Disk I/O would pay for the memcpy. The memcpy option could perform most rotations at runtime. The dynamic pointer option requires every possible rotation to be stored twice in RAM so it almost has to use memcpy. It needs 16 sprites or 16 x 64 = 1024 bytes in each bank since they all have to be page flipped.

Rough concepts were pretty bad. It's kind of intelligible when zoomed out. The journey began with screencapping the website animation with rotations. It was decided to not reverse the lighting when it was upside down to save memory & because it didn't add to the intelligibility. It is just a matter of mirroring & rotating the 4 base images to get flat spins, landing burns, strafing. It also needs RUDs. This stuff is very labor intensive for the result. It would be easier to not shade it at all & not roll it. A low fidelity version might be in order for starting.

Young lion was similarly surprised by how bad sprites really looked in 1985.

A lot of the delusion might have been from assuming the C64 ports of these games were faithful to the arcade versions, but they might have also looked a lot better on a CRT. Young lion also had some idea game programmers were working within budget constraints so they couldn't max out the machine. Aligning players horizontally was particularly crippling. There's no way to multiplex horizontal sprites.

Some fake CRT blurring made it slightly more bearable.

The last time a ship flame was photographed was many years ago. The flame needs to animate faster than the ship. It's easier to use 2 double height sprites than a single double width. The only times it can realistically have a flame are the landing flip & the flat spin.

The big problem is developing a sprite engine. Its mane feature is a table of the desired image address, transformation, x, y, visibility, colors of each sprite. It needs to compare a table of new sprite states with a table of current sprite states, copy the desired image data into the 2 banks outside the raster interrupt, copy the new state table to the current state table inside the raster interrupt, then use the current state table to update the VIC II registers inside every raster IRQ.
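The table comparison can be modeled on a PC before committing to assembly. This is only a sketch: the field names, the transform codes & the 8 slot table are assumptions for illustration, not the engine's real layout.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SpriteState:
    image: int = 0        # address of the source image data
    transform: int = 0    # made-up codes: 0 = none, 1 = mirror, 2 = rotate
    x: int = 0
    y: int = 0
    visible: bool = False
    color: int = 0

def images_to_copy(current, new):
    """Slots whose pixel data must be regenerated outside the raster interrupt."""
    return [i for i, (c, n) in enumerate(zip(current, new))
            if n.visible and (c.image, c.transform) != (n.image, n.transform)]

def commit(current, new):
    """Inside the raster interrupt: the new table becomes the current table."""
    current[:] = new

current = [SpriteState() for _ in range(8)]
new = list(current)
new[0] = replace(new[0], image=0x4000, x=100, y=80, visible=True)
new[1] = replace(new[1], x=10)   # moved but invisible, so nothing to copy

print(images_to_copy(current, new))   # slot 0 needs its image copied
commit(current, new)
print(images_to_copy(current, new))   # nothing left to copy after the commit
```

Only image or transformation changes cost a copy. Position, visibility & color changes are register writes the IRQ can do from the current table.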

A general purpose sprite engine was another snag encountered by 10 year old lion. In this case, there's no multiplexing. It's a memory management & queueing problem. As much as young lion believed multiplexing was essential in making any game look good, it actually would never have been practical for his ideas.

On top of the sprite engine, it needs a higher state table for the animation frames. If only real starships didn't do so many flips.

Cinelerra can screencap an emulator & apply blurring, bobbing in realtime, at 60fps. It definitely improves the photo rendition. Bobbing is imperceptible unless it's 4x & the colors are high contrast. The memory of the actual blurriness of a TV is lost. There is an intermittent shearing problem with the bobbing. Unfortunately, trying to debug the bobbing effect with a flashing white image damaged the monitor. It now has permanent flashing lines.

The problem isn't temporal. Bobbing works properly when all the rows are a repeating pattern. It's a spatial problem when the rows are unique.

Enhancing the bitmap conversion
lion mclionhead • 08/20/2024 at 18:57 • 0 comments

There were some attempts to make the world map more legible & wider. Grey scale looked the best of all methods but still terrible. Shades of red were worse. There could be pseudo colors made of checkerboard patterns. A checkerboard pattern can compress if every other byte is a ROL.
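One reading of the ROL idea, sketched in Python: a checkerboard cell alternates $aa & $55, so rotating every odd byte left turns the cell into a single run that RLE can swallow. The encode & decode helpers are hypothetical, not part of the converter.

```python
def rol8(b):
    """8 bit rotate left, bit 7 wraps around to bit 0."""
    return ((b << 1) | (b >> 7)) & 0xFF

def ror8(b):
    """8 bit rotate right, the inverse of rol8."""
    return ((b >> 1) | (b << 7)) & 0xFF

def encode(data):
    """Rotate every odd indexed byte left so alternating rows become equal."""
    return bytes(rol8(b) if i & 1 else b for i, b in enumerate(data))

def decode(data):
    """Undo encode by rotating every odd indexed byte right."""
    return bytes(ror8(b) if i & 1 else b for i, b in enumerate(data))

def run_count(data):
    """Number of runs an RLE coder would have to emit."""
    return sum(1 for i, b in enumerate(data) if i == 0 or b != data[i - 1])

cell = bytes([0xAA, 0x55] * 4)     # 8 bitmap rows of 1 checkerboard cell
packed = encode(cell)
print(run_count(cell), run_count(packed))
```

The catch is that non checkerboard cells get more random after the rotation, so it would have to be selected per cell or per tile.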

It's pretty hard to make the world bigger in 1 direction. It seems if the tile buffers underrun, it currently just crashes. It tries to start a new tile read every time it scrolls 40 columns, but there's no way to interrupt a disk read so it crashes.

A downtown map was similarly bad. Saturating the image helps, but it seems either the palette is too limited or the color selection is just horrible.

The image conversion is the biggest area which needs improvement. A problem with the nearest color algorithm could be making grey look better than red.

For a mars based game, another idea is monochrome bitmap mode, using dithering to get 3 shades of red. The starship would be shades of grey. Helas, that looked really ugly in "the great escape".

Some guys used genetic algorithms to improve the color matching. https://www.syntiac.com/tech_ga_c64.html

For a static mars landscape with shades of red, manual palette matching would be best. If the palette is just 4 shades of red, color memory can be static. The commodore had 5 shades of red including black & 5 shades of grey including B & W. Static color memory could define only 4 colors & would only save 500 bytes. 5 colors could be defined in 500 bytes of color memory if 3 of the colors were fixed. The non reusable aspect, the limited gain & the loss of color substitutions make the full 1500 byte color memory preferable. RLE compression would do most of the optimizing. Experiments would have to be done with 5 & 4 shades of red.
Compared the hardware palette with a custom palette & random dithering. The custom palette gave the most intelligible results by having less contrast. Random dithering is required for anything bearable. The softer pixels of a real CRT might help.
The compression was still 8% in the worst case & 39% for the manely blank tiles.

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

There was another idea where this type of passive scrolling background could serve a purpose as a timing indicator for a simple racing game. At certain points in the map, enemies would appear & the player would need to take certain actions. There would be a finish line indicated by a landing pad. For now, dodging the enemies & doing a landing flip at precisely the right time would be the key maneuvers. The enemies would try to push you off the map or ram you. You'd have a flat spin & landing flip to knock down enemies, sideways movement, drifting, & a final landing flip. Knockdowns would shake the screen.

Vertical scrolling would have more sideways range & look more intuitive. It would still be slower. Lions dream of a liftoff level & a landing level.

It's a tantalizing idea, but lions are just no good at creating game graphics. Converting a photo to a bitmap might be attainable but player art was a disaster 40 years ago as now. There's no AI for commodore player graphics. Getting enough colors would limit each player to single sprites. Most games got away with single sprite players.

A spitter post had some low fidelity player graphics.

https://x.com/Flight5starship/status/1830332428206108775/photo/1

That's about all the detail it could have. It would take 4 multicolor sprites in 2x height mode to define the player & flame. Another 4 would define an enemy & flame. 2x width mode didn't have useful resolution in multicolor mode. ...

Useless side scrolling demo
lion mclionhead • 08/08/2024 at 17:55 • 0 comments

As a pure useless demo, lions still think a hard coded, single direction scrolling bitmap with concurrent disk reads feeding the scrolling would be damn impressive looking. Don't think it can ever be detailed or fast enough for any kind of game but it could be an interesting visual if the detail was maxed out & it moved at the maximum IO speed. Thinking the demo would pan over Valles Marineris. It would need more than 25 row tiles & a script specifying all the tile locations, scrolling keyframes to support diagonal scrolling. For a 2D race track game, it could work.

In reality, it would have worn down a real disk pretty fast. Nowadays, 5.25" disks are irreplaceable so it might not be very practical except as a thought experiment. The drive head & worm gear could probably last forever. It could store the map in multiple locations to level the wear.

Some example photo conversions from https://www.digartroks.be/img2c64mc/index.php show what it would look like. The undithered version seems more playable in a game since the players show up more. Not sure what the yellow on the canyon floor is. The converter almost needs some kind of color balance sliders to try to aid the color mapping.

A simple map compressed into 5 tiles, with a 9500 byte tile compressing down to 7526 bytes in the worst case. A test load of the worst case took 13 seconds, so a 40 column tile limits scrolling to 40 / 13 or 3 columns/sec.

Still somewhat readable with the colors converted to commodore.

As far as contiguous memory for offscreen bitmap tiles, there's a 9500 byte block at each of $dae0 & $35e0, which leaves 11744 bytes at $0800 for the program & a few bits elsewhere.
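The addresses are easy to sanity check. A quick Python sketch using only the numbers quoted above:

```python
TILE_BYTES = 9500             # one offscreen bitmap tile, from the log
BLOCKS = [0x35E0, 0xDAE0]     # start addresses quoted in the log
PROGRAM_START = 0x0800

for start in BLOCKS:
    end = start + TILE_BYTES - 1
    print(f"${start:04x}-${end:04x}")

# Everything between the program start & the 1st tile block is program space.
program_bytes = BLOCKS[0] - PROGRAM_START
print(program_bytes)          # 11744
```

The upper block ends at $fffb, right at the top of the 64k address space.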

Much hacking yielded a 3fps, 1 cell at a time scroller that concurrently read tiles from disk. Definitely potato cam in the emulator. This was 3k of program memory & 29440 bytes of world memory on disk to store 5 tiles. Another 20 tiles could probably fit. It doesn't support underruns but hard codes ping pong buffer swaps 3 times per second. The tricky part is getting it started, where the disk needs to temporarily get 2 buffers ahead & then fall back to 1 buffer of lead time.

It gets within 2 frames of underrunning the largest tile. In reality, the disk speed was quite non deterministic. The seeks were fixed stepper motor times but the sector reads weren't guaranteed. The problem is much easier when scrolling always goes 1 way at a fixed speed.

Making smooth scrolling would be the tricky bit. It needs to scroll a single pixel in an interrupt handler. Every 8 pixels, it needs to switch VIC banks in the interrupt handler, then make the mane loop scroll the color RAM 1 cell & then scroll the next bitmap & screen RAM 1 cell. Since there's no VIC bank for color RAM, the color RAM scroll for the current position needs to be done after switching banks while the bitmap & screen RAM for the next position is done before switching banks. There's an unavoidable color glitch when switching banks which is still visible on the gootubes.

A few hours later, it was smooth scrolling. The mane problem with smooth scrolling was page flipping inside the raster interrupt. All the disk I/O uses the same CIA register $dd00 as the VIC bank switching so all the disk I/O needed to selectively disable interrupts to be concurrent. The glitches from copying color RAM were definitely pronounced. Confirmed the glitches were not from debug prints.

The mane disappointment now is the world map looking like a turd & being only 5 tiles. Only 3 tiles actually have to be spooled from disk. Text would be the most visible art but wouldn't justify bitmap mode.
It is pretty much what young lion imagined building a game around. It took lightyears more effort & tools than he could have ever mustered. Not sure how exciting a jupiter lander it would be without some opponents. ...

Tile rendering
lion mclionhead • 08/06/2024 at 08:22 • 0 comments

Always seen as the hardest step is tile loading. 5 algorithms are required for just the 4 cartesian scrolling directions & a full page refresh. Diagonal scrolling would probably not fit.

The mane trick with debugging tile rendering was making test pattern worlds.

Much diabolical assembly yielded the 1st B&W screen draws, properly spanning 4 tiles. The tile cache & all the gaps required to align the VIC buffers left 18991 contiguous bytes for the program & just the B&W drawing engine took 6012 bytes. There are actually 2168 & 504 byte gaps between the VIC buffers. Some subroutines & tables could be loaded there. Those would probably be direct sector reads with the fastloader. It's proven more efficient to use the disk as raw storage instead of bothering with a filesystem.

Helas, a bug in makeworld artificially reduced the character set. The real world needed more like 105 characters instead of 55.

SDF-1 started materializing. Tempting to go with B&W. Decompressing color is a big job, but color is what young lion envisioned. The big challenges are reusing characters by remapping colors & unpacking colors. Color encoding was solved failure case by failure case.

The SDF-1 was finally revealed.

Sadly, because the I/O is dictating an extremely slow scrolling speed & the screen drawing is going pretty fast, it might make sense to do full screen redraws for all the scrolling instead of the border routines. The big question is where to do asynchronous disk I/O if it's always accessing the tile memory instead of copying the bitmap memory. It would need to happen between screen redraws.

For the test world, the $d800 color RAM only ended up used in 1 cell for the entire world & it would require a small tweak to work around. Updating the color RAM is going to be extremely slow & burn 4500 bytes for data that's never going to be seen. The color RAM itself only has 4 bits per byte implemented so it's of limited use in another role. If the world had photo quality graphics or dithering, the color RAM could be used. This would use up the character set fast & slow the IO way down. These factors make the color RAM unlikely to ever be used.

If the world had only 2 unique colors + black in each cell, it would still be ahead of character mode. Character mode only allowed 1 unique color in each cell with 3 other global colors. Greeblies are 1 area where color RAM could be important. Without greeblies, the way it is now, it's in the territory of vector line art. A vector line art world would need vastly less IO & probably scroll a lot faster. Instead of a tile cache, the screen or the borders would be rendered from scratch. The entire world could fit in RAM. It wouldn't be as unlimited as a disk world, but the largest world young lion envisioned would fit in RAM.

There's still a desire to use elements which couldn't be drawn with line art, like trees. Those could still be character based objects drawn on top of the line art. The line art engine would just draw lines, filled triangles, & filled rectangles. That could possibly end up slower than character tiles.
A quick demo with keyboard navigation had a lot of glitches but revealed what young lion's scrolling tile world would have really looked like. The initial reaction was it would be pretty hard to navigate through such a small pinhole. It would need a map inset or a lot more detail. It was no more than a static image a player couldn't interact with & with no moving objects. At least bungeling bay had an interactive, animated world.
It was much the same situation young lion ended up in, day after day. He would arrive at a gist of a game engine that wasn't heading towards anything useful & the initial impression ended any motivation to continue with...

Fastloader bug
lion mclionhead • 08/03/2024 at 21:45 • 0 comments

Noticed the emulator showed unnecessary seeks to track 1 in the middle of reading multiple sectors from the same track. It seeks directly to the right track, reads 1 sector, then seeks to track 1, then seeks back to the right track to read a few more sectors. The ROM source code implies it's a head bumping operation caused by an error & in real life, it would be a familiar knocking.

In lion opinion, the knocking sound was not the head impacting anything but the sound of a stepper motor stalling & slipping out of phase. The head probably experienced much less force. There are no useful recordings of that sound.

After much poking, the Steil fastloader had a bug where address $05 has to be 0 before calling DOREAD or it'll head bump. He used $05 as a counter & either the values he stored in it got lucky or it's another Vice bug. It's unlikely 10 year old lion would have figured that out.

There's a useful table of d64 track offsets not in any goog searches:

http://unusedino.de/ec64/technical/formats/d64.html
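For a standard 35 track .d64 image without error bytes, the offsets can also be computed instead of looked up. A small Python sketch, with hypothetical function names:

```python
SECTOR_BYTES = 256

def sectors_in_track(track):
    """Sectors per track for a standard 35 track 1541 disk."""
    if not 1 <= track <= 35:
        raise ValueError("track must be 1..35")
    if track <= 17:
        return 21
    if track <= 24:
        return 19
    if track <= 30:
        return 18
    return 17

def d64_offset(track, sector):
    """Byte offset of a sector inside the .d64 file."""
    if not 0 <= sector < sectors_in_track(track):
        raise ValueError("sector out of range for this track")
    preceding = sum(sectors_in_track(t) for t in range(1, track))
    return (preceding + sector) * SECTOR_BYTES

print(hex(d64_offset(18, 0)))   # directory track: 0x16500
```

This is handy for a makeworld style tool that writes raw tiles straight into the image instead of going through the filesystem.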

The working fastloader did the 9 tile cache fill in 9.1 seconds or 1.01 seconds per tile, with no concurrency. This would allow 6.1fps of vertical scrolling with only the 2 visible tiles read or 4.1fps with a complete row read. Things are going to be much slower with concurrent scrolling. There's also going to be a time quantization where it starts reading a 3rd tile even though not enough time remanes. Since it can't interrupt a tile read, it's always going to load a complete row.

-----------------------------------------------------------------------------------------------------------------------------------------------

The best optimizations lions could come up with were making it load 1 row of the tile cache at a time, storing all the tiles for a row on 1 track, & packing multiple tiles in each sector to reduce the sector reads. It would need an offset table to pack sectors. Currently, the world is limited to 256 tiles & 10 rows of tiles to save on data types.
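A sketch of what the offset table for packed sectors could look like, in Python. The tile sizes are made up & the (sector, byte offset) layout is an assumption, not the format makeworld writes.

```python
SECTOR_BYTES = 256

def pack_row(tile_sizes, first_sector=0):
    """Return the (sector, byte offset) of each tile when packed back to back,
    plus the number of sectors the packed row occupies."""
    table = []
    position = first_sector * SECTOR_BYTES
    for size in tile_sizes:
        table.append(divmod(position, SECTOR_BYTES))
        position += size
    sectors_used = -(-position // SECTOR_BYTES) - first_sector
    return table, sectors_used

def sectors_unpacked(tile_sizes):
    """Sectors needed when every tile starts on its own sector boundary."""
    return sum(-(-size // SECTOR_BYTES) for size in tile_sizes)

sizes = [700, 300, 900]                 # hypothetical compressed tile sizes
table, packed = pack_row(sizes)
print(table)                            # [(0, 0), (2, 188), (3, 232)]
print(packed, sectors_unpacked(sizes))  # 8 packed sectors vs 9 unpacked
```

The saving is at most 1 sector per tile, which has to be weighed against reading bytes that belong to a neighboring tile.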

Sadly, there is no way to abort a sector read. Testing for ATN low to allow partial sector reads would slow down the reads too much.

It's slower to buffer complete sectors than to decompress 1 byte at a time. Buffering a complete sector requires shifting in all the unused bits while aborting a read requires just driving the clock. Because of the way multitasking works, it has to switch the kernal ROM for every RLE code whether the sectors are loaded previously or not.

Packing the tiles only works if they're read from lowest to highest & a complete row is read. It's always going to throw away the start of the 1st sector in each row. It's usually going to slow down sideways movement because now a start & end of a sector is thrown away instead of just the end. It could slightly benefit vertical movement because 1 out of 3 tiles wouldn't require throwing away any data. There's a small chance the packing could reduce seeking enough to have an improvement. The complexity of it makes it a pass.

Sadly, storing 1 row per track was actually slower than packing multiple rows in each track. It seems to cut just enough seeking if the tracks contain multiple rows.

Filling the cache forwards or backwards makes no difference. It seeks more when reading backwards.

---------------------------------------------------------------------------------------------------------------

A hybrid of scrolling & paging seems in order. Allow the player to traverse the entire screen while scrolling. When the player hits an edge, redraw the whole screen. This would result in a more general purpose game engine which did either scrolling or paging.

The long feared tile renderer converged on something that always renders 4 tiles. It determines the 4 visible tiles & draws a hard coded corner of each visible tile. ...

Tile loading algorithm
lion mclionhead • 07/29/2024 at 18:45 • 0 comments

Tile loading is the heart of the demo & a piece of diabolical assembly language. Made some diagrams to try to predict the performance of different algorithms.

A 6 tile cache has to choose between loading 2 extra vertical or 2 extra horizontal tiles, based on the player's heading. The player can still turn 180 without stuttering.

If the player turns left in this moment, it'll stutter as it loads 2 horizontal tiles.

A 9 tile cache is the minimum for stutter free scrolling in all directions. If scrolling is limited to 5fps, it'll have 4 seconds to load 2 new tiles for X movement & 2 seconds to load 2 new tiles for Y movement. It still has to prioritize loading order based on heading.

There could be a way to RLE compress each row independently. 9 tiles would be stored compressed in roughly 9000 bytes. The center 25 rows & 120 columns would be decompressed into 7500 bytes to allow fast X movement. Y movement would decompress 1 row at a time.

It would require a 25 byte index in each tile, containing the compressed size of each row. There is a memory fragmentation problem from using variable sized tiles. Each tile would have to be allocated the size of the largest tile, probably 1000 bytes. Concessions will always have to be made in the artwork to achieve reasonable compression. That gives 16500 for the world map, 20000 for the bitmaps, 29000 bytes free.
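A minimal Python sketch of row independent RLE with a 25 byte index. The (count, value) pair encoding & the 40 byte rows are assumptions for illustration. The log doesn't say which RLE codes the real tiles use.

```python
ROWS, COLS = 25, 40

def rle_row(row):
    """Encode 1 row as (count, value) pairs, count limited to 255."""
    out = bytearray()
    i = 0
    while i < len(row):
        run = 1
        while i + run < len(row) and row[i + run] == row[i] and run < 255:
            run += 1
        out += bytes([run, row[i]])
        i += run
    return bytes(out)

def unrle(data):
    """Expand (count, value) pairs back into raw bytes."""
    out = bytearray()
    for k in range(0, len(data), 2):
        out += bytes([data[k + 1]]) * data[k]
    return bytes(out)

def pack_tile(rows):
    """Tile = 25 byte index of compressed row sizes + the compressed rows."""
    packed = [rle_row(r) for r in rows]
    index = bytes(len(p) for p in packed)
    return index + b"".join(packed)

def unpack_row(tile, y):
    """Decompress only row y by summing the index entries before it."""
    index = tile[:ROWS]
    start = ROWS + sum(index[:y])
    return unrle(tile[start:start + index[y]])

rows = [bytes([y] * COLS) for y in range(ROWS)]
tile = pack_tile(rows)
print(len(tile), unpack_row(tile, 7) == rows[7])
```

A worst case 40 byte row expands to 80 bytes with this encoding, so each row size still fits in 1 index byte.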

The original idea of always having 6 uncompressed tiles would burn 15000 bytes, be prone to stuttering, & couldn't do diagonal movement.

If it had 9 uncompressed tiles, it would burn 22500 bytes & leave 23000 bytes free. 9 seems to be the minimum for stutter free movement in all directions. The on the fly decompression would be so complex, it's worth knowing how much free memory is really needed.
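The budgets above can be reproduced with a few lines of Python. It assumes, as the quoted numbers seem to, that the whole 64k is counted & nothing is reserved for I/O or the zero page.

```python
RAM = 65536
BITMAPS = 20000            # double buffered bitmaps
TILE = 2500                # 1 uncompressed 40x25 tile
CELLS_PER_TILE = 40 * 25

def free_bytes(world_map_bytes):
    """Bytes left after the bitmaps & the world map storage."""
    return RAM - BITMAPS - world_map_bytes

# Decompressed center window: 25 rows x 120 columns at 2.5 bytes per cell.
window = 25 * 120 * TILE // CELLS_PER_TILE

options = {
    "9 compressed + window": 9 * 1000 + window,
    "6 uncompressed": 6 * TILE,
    "9 uncompressed": 9 * TILE,
}
for name, used in options.items():
    print(f"{name}: {used} used, {free_bytes(used)} free")
```

The free totals come out at 29036, 30536 & 23036 bytes, which round to the figures in the text.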

To free up 10000 bytes, the screen could be shrunk to 38x23 to make single buffer scrolling less garish.

---------------------------------------------------------------------------------------------------------------------------------------------------

Like young lion, division & multiplication operations in 6502 assembly have proven daunting. There are easily searchable solutions, the basic ROM has math functions, but it's still more memory efficient to use loop subtracting & loop adding.
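The loop method is easy to model in Python before writing it in assembly. The function names are hypothetical. Each loop pass stands for 1 compare, subtract & increment on the 6502.

```python
def divmod_by_subtraction(dividend, divisor):
    """Quotient & remainder by repeated subtraction."""
    if divisor <= 0 or dividend < 0:
        raise ValueError("needs dividend >= 0 and divisor > 0")
    quotient = 0
    while dividend >= divisor:
        dividend -= divisor
        quotient += 1
    return quotient, dividend

def multiply_by_addition(a, b):
    """Product by adding a to a running total b times."""
    total = 0
    for _ in range(b):
        total += a
    return total

# Example: convert a cell index into a row & column on a 40 column screen.
print(divmod_by_subtraction(517, 40))   # (12, 37)
print(multiply_by_addition(40, 12))     # 480
```

The cost grows with the size of the quotient, which is tolerable when the quotients are small like screen rows or tile numbers.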

RLE decompression is definitely a buster in assembly language. Young lion never heard of it or envisioned using it. He did know some kind of encoding was used to compress redundancy in newer PCX files. The speed requirements definitely require it. Multicolor bitmap mode was not in his plans either. Young lion would have needed a higher level language for the RLE compression, not that any of the world creation tools were attainable.

In a modern development environment printing all the output to the console, the commodore looks like an ordinary UNIX box. That's the same kind of output the GEOS developers would have seen 40 years ago.

It prints out the [tile number]-[assigned buffer] in the 9 tile cache positions. After much debugging, it was able to load the 9 tiles of the cache in 15.5 seconds or 1.7 seconds per tile with no concurrent scrolling. Not quite fast enough to scroll vertically at 5fps. Maximum vertical speed would have to be 3.6fps & horizontal speed would be 5.8fps.
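The log doesn't spell out how the speed limits were derived. Assuming 4 tile reads have to finish in the time it takes to scroll across 1 tile (an inference, not something stated above), this Python sketch reproduces both numbers:

```python
CACHE_FILL_SECONDS = 15.5   # measured time to load the 9 tile cache
TILES = 9
ROWS, COLS = 25, 40         # cells per tile
TILES_PER_CROSSING = 4      # assumption: tile reads per tile of travel

seconds_per_tile = CACHE_FILL_SECONDS / TILES

def max_fps(cells, tiles_to_load=TILES_PER_CROSSING):
    """Fastest 1 cell per frame scroll rate that doesn't outrun the disk."""
    return cells / (tiles_to_load * seconds_per_tile)

print(round(seconds_per_tile, 1))   # 1.7 seconds per tile
print(round(max_fps(ROWS), 1))      # vertical limit
print(round(max_fps(COLS), 1))      # horizontal limit
```

With a different number of tile reads per crossing the limits scale inversely, so the function takes it as a parameter.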
It takes a lot of instructions to load tiles into shadow RAM above $d000. It has to disable interrupts & write the port register to write every byte, then re-enable interrupts & write the port register to read the next byte from the I/O addresses above $d000. It might be faster to load a complete sector into a 256 byte scratch area before decompressing it. There's a 1528 byte gap above $c000, imposed by the VIC-II bank boundary.

Scrolling with fastloader
lion mclionhead • 07/28/2024 at 05:27 • 0 comments

In testing for emulation bugs, it seems storing the world map on drive 9 caused many problems. Disable drive 9 & the bugs go away. There are some hints that Vice doesn't faithfully emulate the serial bus.

https://www.lemon64.com/forum/viewtopic.php?t=74765

Despite having blanks for 4 drives, there's no evidence anyone uses dual drives in vice.

A quick test in GEOS failed to access drive 9. Drive 9 has to be disabled or it'll get stuck looking for desktop 1.5.

Then after enabling it, you have to run the configure program.

The configure program can't detect it. Drive 9 worked with standard kernal I/O but not with any kind of fast loader. So the world map for testing needs to be in the same .d64 image as the program & a maximum world map is going to require a disk swap.

So in addition to the lack of physical I/O signals, emulation is going to involve compromises in terms of peripherals. An FPGA emulator accessing drives through physical I/O signals might still work. Of course, 1 drive is a more accurate representation of what young lion had.

-------------------------------------------------------------------------------------------------------------------

Some testing showed when ATN is low, the drive always reads low voltage from DAT. When ATN is high, the drive properly reads DAT. That's not supposed to happen. It could be yet another emulator bug. Given limited time & tools, there could be a way to pulse ATN just for synchronization & neglect the printer state. The debugging problem ended up being hard enough to settle on just blinking binary on the LED. The lack of logic probing & endless obscure emulator bugs consumed such a vast amount of time, they put a real emphasis on doing the least that could work.

Big surprise, the Steil fastloader didn't work in emulation. Then came a few attempts at simple bit clocking. Driving CLK from the drive was slower than driving it from the host. Sending 1 bit at a time seemed to be the faster way, since a lot of bit fudging is required to pack 2 bits at a time. Using ATN as a clock had a problem where DAT changes at the same time ATN changes, so 2 bits at a time really have to free run.

Once the many issues were finally resigned to emulation bugs & the fastloading settled on clocking 1 bit at a time, the rest came together pretty fast. Concurrent scrolling & disk reads with the fastloader were lightyears ahead of the stock ROM, even with single bit clocking. Concurrent I/O will always have an impact on the scrolling, but it seems to be reasonably imperceptible now.

Sector reads with seeks of 1 track ran at 1 second. When it seeked 4 tracks, it took 1.2 seconds. Scrolling goes at 9fps during the data transfer. The stock ROM needed 2.4 seconds per sector with 6fps. If it's not reading the disk, scrolling still goes at 9fps. The impact is not detectable in a screenshot. It's intriguing how little time is spent bit banging the serial port.

Attention turned to manetaneing a tile database in RAM.

----------------------------------------------------------------------------------

The double buffered bitmaps suck 20,000 bytes. By packing the color memory, each tile sucks 2500 bytes. It takes 4 tiles or 10,000 bytes to render the current view. At least 12 more tiles or 30,000 bytes have to be offscreen to scroll in 4 directions without stuttering. It became clear that wouldn't fit in 10 year old lion's commodore. 2 offscreen tiles are the minimum & would leave roughly 30,000 bytes free. It could scroll smoothly 1 way. Any change in direction would usually cause it to stutter. Sometimes the player would get lucky. It would be even luckier if 4 offscreen tiles fit. Then the stuttering would only happen in 90 degree direction...

VIC1541 fast loader
lion mclionhead • 07/22/2024 at 17:52 • 0 comments

The biggest need was decidedly speeding up ACPTR. This entails running a loader inside the 1541 at runtime. You can't overwrite the 1541 ROM with a shadow RAM like the C64 itself. You have to upload an executable to RAM & call into the 1541 ROM from it. The mane problem is the stock fastloaders are very focused on loading a program in 1 shot, blanking the screen & taking full control. Concurrency would require a scratch built fast loader.

Reviewing the 1541 ROM source code from over 40 years ago, it's clear that they were working just as hard as the current generation. They optimized complex algorithms down to every single byte, writing in unintelligible opcodes. They didn't have it any easier than the current generation but faced equally difficult problems. The current problems are the massive size of modern APIs & the massive numbers of steps required to do the simplest thing. Each generation worked at the limit of what was possible with a certain amount of capital. The limiting factor is the amount of human decisions which can be made in a certain amount of time.

The only 1541 source code is a disassembly with absolute addresses:

https://g3sl.github.io/c1541rom.html

The byte for ACPTR is clocked out at 0xe958. The only delay is 0xe97b, a call to 0xfef3. The only other delays are many debounce routines. They had induction problems. It uses address 0x23 as a speed flag. Some tests with u+, u-, ui+, ui- didn't do anything useful.

The most useful resource was this presentation. The same guy wrote a fast loader using these methods in 2011. His video was in 2021.

https://www.pagetable.com/?p=568

The FCODE segment runs on the drive. The PART2 segment runs on the host. He handles the badlines. All it does is load a sequence of hard coded sectors & sends them using custom bit banging. It doesn't use TALK or LISTEN. It just bit bangs data out of the drive after the OPEN call. There's no debounce code. Lions thus need to add a function which bit bangs a track & sector number to the drive & transitions between reading & writing. The drive would become a simple sector reader.

The mane problem is uploading the program to the drive. He uses some heroic methods to load the firmware directly from disk to the drive's execution space. The trick with this is if the world map is a separate disk in a separate drive, it would entail loading a 3rd disk containing the fast loader or using some world space for the fast loader. The easiest system is to load the firmware from drive 8 & run it in drive 9.

You have to use the M-W command to write drive memory (page 38 of the manual):

https://www.commodore.ca/wp-content/uploads/2018/11/commodore_vic_1541_floppy_drive_users_manual.pdf

Then you have to use the M-E command to run it (page 39).
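The byte layout of the 2 commands is simple enough to sketch in Python. M-W takes a low/high address, a byte count & the data. M-E takes a low/high address. The $0500 load address, the 32 byte chunk size & the function names are assumptions for illustration.

```python
CHUNK = 32   # assumed conservative payload size per M-W command

def memory_write_commands(address, code, chunk=CHUNK):
    """Split drive code into M-W commands: 'M-W' lo hi count data..."""
    commands = []
    for offset in range(0, len(code), chunk):
        part = code[offset:offset + chunk]
        target = address + offset
        header = bytes([target & 0xFF, target >> 8, len(part)])
        commands.append(b"M-W" + header + part)
    return commands

def memory_execute_command(address):
    """Build the M-E command: 'M-E' lo hi."""
    return b"M-E" + bytes([address & 0xFF, address >> 8])

drive_code = bytes([0xEA] * 70)              # placeholder: 70 NOPs
uploads = memory_write_commands(0x0500, drive_code)
print(len(uploads), [len(c) for c in uploads])
print(memory_execute_command(0x0500))
```

Each command string would be sent on the drive's command channel (secondary address 15), with M-E going last.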

He doesn't use the data channel. All the data is transferred over the control channel. The port registers have separate out & in bits implying full duplex communication, but the host & the drive share just 1 data & 1 clock line.

------------------------------------------------------------------------------------------------------------------------------

Debugging 1541 firmware

There's no monitor for the 1541 CPU in VICE. It's quite difficult with emulation. At least real hardware could bit bang a UART on the LED & have real serial port lines to probe.

It's well known that 1 peripheral can transmit to another peripheral because they were all daisychained on 1 bus, so practical debugging depends on the 1541 printing to the printer directly.

--------------------------------------------------------------------------------------------------

For historic re-enactment, there are no more real 1541's. The heads have all perished. The SD2IEC dongle doesn't work with a fast loader. ...

Concurrent bitmap scrolling & disk reads
lion mclionhead • 07/20/2024 at 22:57 • 0 comments
The next step was making a scrolling demo which performs sector reads while simultaneously scrolling.

Despite all their busy waits & sleep commands, the LISTEN, UNLSN, CIOUT commands are actually fast enough to run without noticeably affecting scrolling. The busters are TALK & ACPTR. Sadly, there is no way to poll the serial port for data like there is for the keyboard.

For the ACPTR operation, there are notes on page 165 of this guide saying the drive releases CLK at T0 when it's ready to send:

https://ia902702.us.archive.org/11/items/Commodore_1541_Troubleshooting_and_Repair_Guide/Commodore_1541_Troubleshooting_and_Repair_Guide.pdf

Then the host releases data at T1 when it's ready to receive. The document has a typo: "'clear to send' by releasing the clock line to a logic high". The kernal waits only 256us between T1 & T2 before it times out.

The drive lowers CLK at T2 to send the 1st byte & the host has to be polling to receive it. There's no more timeout code after T2. It might be easier with physical hardware to probe. It bit bangs all the serial port reads. The drive is driving CLK. The host has to be fast enough to catch the clock transitions.

Profiling the read operation is difficult because the jiffy clock doesn't run during I/O operations & the CIA time of day doesn't seem to work at all in emulation. An external hardware timer could do the job, but that's another job for physical hardware. The only way was to set up the only unused CIA timers, chaining CIA 2 timer B off of timer A:
```
    lda #$ff   ; reset CIA 2 timer A
    sta d2t1h
    sta d2t2l  ; reset CIA 2 timer B
    lda #$11
    sta d2cra  ; start timer A
    lda #$51
    sta d2crb  ; run timer B off of timer A
```
Then extract a 16 bit time value. It decrements every 256 us. The value has to be complemented to get a positive number.
```
    sec       ; sbc needs carry set, otherwise the result is off by 1
    lda #$ff
    sbc d2t1h ; get CIA 2 timer A
    sta dst
    lda #$ff
    sbc d2t2l ; get CIA 2 timer B
    sta dst + 1
    lda #$ff ; reset the clock
    sta d2t1h
    sta d2t2l
    lda #$11
    sta d2cra
    lda #$51
    sta d2crb
```
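The conversion from the 2 register reads to a time can be sketched on the host side in Python. This is only a model of the arithmetic: the function names are made up for illustration & the 1 MHz default clock is an assumption. A PAL machine runs at about 0.985 MHz & an NTSC machine at about 1.023 MHz, so the 256 us tick is approximate.

```python
# Sketch: turn the raw CIA 2 reads into elapsed time.
# timer_a_high is the byte read from d2t1h, timer_b_low the byte from d2t2l.
# Both counters start at $ff & count down, so each byte is complemented.

def elapsed_ticks(timer_a_high, timer_b_low):
    low = 0xFF - timer_a_high   # 1 tick per 256 CPU cycles
    high = 0xFF - timer_b_low   # 1 count per timer A underflow
    return (high << 8) | low

def elapsed_seconds(ticks, clock_hz=1_000_000):
    # clock_hz is an assumed round number, not a measured value
    return ticks * 256 / clock_hz
```

With these assumptions the 16 bit value wraps after roughly 16.8 seconds, which is plenty for timing a single sector read.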
Some extensive profiling revealed the TALK after sending "u1 2 0 1 0" has the long delay for the sector read. The TALK takes around .4 seconds in a test of reading multiple tracks. The trick with profiling is you can't print anything until after all the disk I/O is done, otherwise the LISTEN for printing steps on the TALK for the disk.

After copying bits of kernal source code & bodging in random delays, got it to poll the TALK state while scrolling. There's no documentation for the TALK operation, but it seems to rely on driving the attention line & waiting for a pulse in the data line. There's a chance it could miss the data pulse so it has to poll fast enough. After polling just enough signal transitions, it can call the native TALK function without blocking. The demo prints the time spent polling the TALK state, the time spent running the actual TALK function, & the time spent reading the 1st byte.

Sadly, scrolling still slows down during the ACPTR operations, even though it's not as bad as a .4 second delay.

Polling the CLK pin before calling ACPTR was useless. The polling function spins just as long as ACPTR & ACPTR is limited by the baud rate.

A fast loader would speed up ACPTR. The only other ways are to fix the scrolling frame rate with some kind of timer & spread out the serial port commands. Experiments with the jiffy clock accuracy during sector reads would be required. Raster interrupts are unlikely to work with constant I/O.

Depending on how I/O servicing is divided between bitmap operations, the single sector reads are 1.8 seconds at 4fps & 2.4 seconds at 6fps. Quite a bit slower than the 400 byte/second rating, because of the serial port handshaking, seeking, & multitasking. Reduce the scrolling speed & the I/O could be concealed. There could be a case for just letting the scrolling stutter.
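
To put those numbers next to the 400 byte/second rating, here is the arithmetic as a small Python sketch. It assumes a full 256 byte sector per read; the function name is hypothetical.

```python
SECTOR_BYTES = 256  # assumes every read returns a full sector

def sector_throughput(seconds_per_sector, fps):
    """Return (bytes per second, frames drawn while 1 sector loads)."""
    bytes_per_second = SECTOR_BYTES / seconds_per_sector
    frames_per_sector = seconds_per_sector * fps
    return bytes_per_second, frames_per_sector

# The 2 measurements quoted above
for seconds, fps in ((1.8, 4), (2.4, 6)):
    rate, frames = sector_throughput(seconds, fps)
    print(f"{fps} fps: {rate:.0f} bytes/s, {frames:.1f} frames per sector")
```

That comes out to roughly 142 & 107 bytes per second, so a tile would have to be requested about 7 to 14 frames before it scrolls into view.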
Sector reads on a 1541
lion mclionhead • 07/18/2024 at 17:56 • 0 comments
Young lion knew a lot about sector access on the 1541 but not anymore. There were undelete tools & disk editors in Compute magazine.

There are some bits about raw sector reads.

https://codebase64.org/doku.php?id=base:reading_a_sector_from_disk

https://wpguru.co.uk/2016/01/how-to-use-direct-block-access-commands-in-commodore-dos/

https://www.lemon64.com/forum/viewtopic.php?t=55010

I/O doesn't get as much love as graphics & sound, probably because the mane focus of retro computing is games & much I/O has been replaced with RAM expanders or custom hardware.

You have to specify the track & sector. The tracks start on 1. Sectors start on 0. Track 1 sector 0 is byte 0 of the .d64 file. The last sector on track 1 is 20. The number of sectors changes based on track. The world table of contents thus needs to define tracks & sectors instead of a single linear sector number.
```
track       # sectors
1 – 17      21     
18 – 24     19     
25 – 30     18     
31 – 35     17     
```
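A host side tool that builds the world map image needs to go between track/sector pairs & byte offsets in the .d64 file. A minimal Python sketch of that mapping follows, assuming a plain 35 track image with 256 byte sectors stored back to back. The function names are hypothetical.

```python
# (first track, last track, sectors per track), from the table above
ZONES = ((1, 17, 21), (18, 24, 19), (25, 30, 18), (31, 35, 17))
SECTOR_BYTES = 256

def sectors_in_track(track):
    for first, last, count in ZONES:
        if first <= track <= last:
            return count
    raise ValueError(f"track {track} is outside 1-35")

def d64_offset(track, sector):
    """Byte offset of a track/sector pair inside the .d64 file."""
    if not 0 <= sector < sectors_in_track(track):
        raise ValueError(f"sector {sector} is not on track {track}")
    preceding = sum(sectors_in_track(t) for t in range(1, track))
    return (preceding + sector) * SECTOR_BYTES

def track_sector(linear):
    """Inverse mapping: linear sector index to (track, sector)."""
    if linear < 0:
        raise ValueError("linear sector is negative")
    for track in range(1, 36):
        count = sectors_in_track(track)
        if linear < count:
            return track, linear
        linear -= count
    raise ValueError("linear sector is past track 35")
```

For example, `d64_offset(18, 0)` skips the 17 tracks of 21 sectors ahead of it, so it lands on sector 357 of the file.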
-------------------------------------------------------------------------------------------------------------

It's highly desirable to have the world map on drive 9 & the mane program on drive 8. The mane problem for animals is lockups after calling CHKIN for drive 9. You have to read the status byte at $90. If bit 7 (the top bit, $80) is 1, no device is attached.
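
The status test can be modeled in a few lines of Python. Counting bits from 0, the "no device" flag is the top bit of the byte, mask $80. The other masks below are the serial bus flags as best understood from the kernal documentation, & the function names are made up for illustration.

```python
# Bit masks for the kernal status byte at $90 on the serial bus
ST_FLAGS = {
    0x01: "timeout while writing",
    0x02: "timeout while reading",
    0x40: "end of file (EOI)",
    0x80: "device not present",
}

def decode_status(st):
    """List the flags that are set in a status byte."""
    return [name for mask, name in ST_FLAGS.items() if st & mask]

def device_missing(st):
    # Only the top bit matters for the drive 9 lockup check
    return bool(st & 0x80)
```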

~/.vice/vicerc needs to have a line enabling drive 9 to get around the lockup

Drive9Type=1541

Then x64 needs a -9 argument to attach a disk. Noted the -attach commands don't work. Only -9 works.

-----------------------------------------------------------------------------------------------------------------------------

There are some long delays when calling OPEN. Some notes on the internet say it's waiting for the device to assert a clock signal. It also has a hard time printing to the printer while accessing a disk.

Noted in the asm version of opening the data channel
```
open 5,9,5,"#"
```
You have to call SETNAM to set "#" as the filename.

The only way to read a sector was to call OPEN for the control channel with "u1 2 0 1 0" as the filename. With the CHKOUT API, it's not possible to read consecutive sectors without calling OPEN & CLOSE for each sector & having a long delay. A lower level API is required to have any hope of asynchronous reads. The big challenge with what young lion envisioned was the simultaneous loading of tiles in the background while scrolling the world map.

It seems the serial API only goes 1 way at a time. You can't CHKOUT & CHKIN simultaneously to print to a debug port while reading from a disk port. It would be easier if the screen could be used for debug output. Then CHROUT could simultaneously write to the screen while CHRIN read from a serial port.

https://www.pagetable.com/?p=1031

https://retro-bobbel.de/zimmers/cbm/programming/serial-bus.pdf

There are some notes about using a lower level talk/listen API instead of the CHKIN/CHKOUT API but no examples.

There's a reference manual for the 1541

https://www.mocagh.org/cbm/c1541II-manual.pdf

The kernal source code is

https://github.com/mist64/c64rom/tree/master

Inspecting the kernal source code & peeking addresses revealed how to use the talk/listen API. Basically call LISTEN with 9 to access drive 9. Call SECOND with (15 | 0x60) to specify the secondary address for the control channel. Call CIOUT to send the u1 command to the control channel. Call UNLSN to execute the command. The mane trick is the secondary address sent to SECOND has to be ORed with 0x60. The OPEN workflow is still necessary to open the drive but it's only needed once in the program lifecycle. Then, the talk/listen API can send consecutive "u1" commands to read sectors.
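
The order of calls & their arguments can be written out as a Python model. This is not C64 code, just a sketch of the sequence described above. The function names are hypothetical, & it assumes the drive wants the command letter as PETSCII $55, which happens to match upper case ASCII.

```python
SECONDARY = 0x60  # has to be ORed into the channel number by the caller

def u1_command(channel, drive, track, sector):
    """Text of a block read command, e.g. 'u1 2 0 1 0'."""
    return f"u1 {channel} {drive} {track} {sector}"

def listen_sequence(device, channel, text):
    """Return (kernal call, accumulator value) pairs in call order."""
    steps = [("LISTEN", device), ("SECOND", channel | SECONDARY)]
    # Assumption: upper case ASCII stands in for the PETSCII bytes
    steps += [("CIOUT", byte) for byte in text.upper().encode("ascii")]
    steps.append(("UNLSN", None))  # the drive executes the command here
    return steps

for call, value in listen_sequence(9, 15, u1_command(2, 0, 1, 0)):
    print(call, "" if value is None else f"${value:02x}")
```

The 2nd line of output shows the control channel going out as $6f rather than $0f, which is the part that is easy to miss.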

This API was able to read all 256 bytes from each sector. Some sources say only 255 bytes per sector are accessible.

Also noted UNLSN is only required...