20241115 - Witchcraft

A project log for ROM Disassembly - Cefucom-21

Peering into the soul of this obscure machine

ziggurat29 11/15/2024 at 20:48

As mentioned yesterday, there is a section of code that is confusing.  It runs a few call levels deep, with return address manipulation and indirection that obscure the flow of execution.  But it needs to be figured out, because the mechanism is used with some frequency (16 known instances) and operates on sizeable hunks of embedded data.

4ECC  sub_4ECC:
4ECC CD 14 5A  call    sub_5A14
4ECF 0C        db  0Ch      ; lump o data
4ED0 0A        db  0Ah
4ED1 D7        db 0D7h
4ED2 4E        db  4Eh
4ED3 00        db    0
4ED4 C3        db 0C3h
4ED5 E2        db 0E2h
...

The routine eats the return address into HL (which points to the lump o data) and then calls something else, and then jumps to whatever was left in HL. 

5A14  sub_5A14:
5A14 E1        pop     hl
5A15 CD 19 5A  call    sub_5A19
5A18 E9        jp      (hl)

The 'something else' it calls in turn calls yet something else, and then apparently loops forever.  I.e., it never seems to return to sub_5A14, so the 'jp (hl)' that is waiting there would never be reached.

5A19  sub_5A19:
5A19 CD 1E 5A  call    sub_5A1E     ; do something with HL pointing to a lump o data
5A1C 18 FB     jr      sub_5A19     ; infinite loop?

The 'yet something else' it calls gets a code from the start of the lump o data, increments the pointer past it, and does some checking and mapping of that value before invoking my old friend 'sub_5905', which is known to dispatch via a 'subsequent table'.

5A1E  sub_5A1E:
5A1E 7E        ld      a, (hl)
5A1F 47        ld      b, a         ; remember the code
5A20 23        inc     hl           ; adjust data pointer to after the code
5A21 FE B0     cp      0B0h
5A23 38 08     jr      c, loc_5A2D
5A25 FE C0     cp      0C0h
5A27 3E 10     ld      a, 10h
5A29 38 02     jr      c, loc_5A2D
5A2B 3E 01     ld      a, 1
5A2D  loc_5A2D:
5A2D CD 05 59  call    sub_5905
5A30 52 5A     dw sub_5A52
5A32 56 5A     dw sub_5A56
5A34 6C 5A     dw sub_5A6C
5A36 71 5A     dw sub_5A71
... 17 of such subroutines
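
The checking and mapping at the top of sub_5A1E can be restated in a few lines of Python.  This is only a sketch of the three-way range test read off the listing above; the name spell_index is made up for illustration and is not a label in the ROM.

```python
def spell_index(code: int) -> int:
    """Map an incantation code byte to the table index left in A for sub_5905."""
    if code < 0xB0:      # cp 0B0h / jr c : low codes are used as the index as-is
        return code
    if code < 0xC0:      # cp 0C0h / ld a,10h / jr c : B0h..BFh all share entry 10h
        return 0x10
    return 0x01          # ld a,1 : C0h..FFh all share entry 1


for code in (0x00, 0x02, 0x03, 0xB5, 0xD1):
    print(f"{code:02X}h -> entry {spell_index(code):02X}h")
```

Since B still holds the original code byte at the point of dispatch, the two shared entries can presumably recover whatever was folded away by the mapping.  What happens for codes 11h..AFh, which would index past a 17-entry table, is not something the listing answers.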

I had seen sub_5905 late last week when I was cataloguing the various direct dispatch tables.  The gist being that a table of functions follows the call, and A indexes into that table and finally a jump (effectively) is made to the selected address.

5905  sub_5905:
5905 E3        ex      (sp), hl                    ; HL now has the table address (which followed the call site)
5906 CD 0B 59  call    sub_590B                    ; HL = * ((WORD*)HL) [A]
5909 E3        ex      (sp), hl
590A C9        ret                                 ; effectively a jump
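
To make the double 'ex (sp), hl' trick concrete, here is a minimal Python model of sub_5905.  It assumes sub_590B does exactly what the comment in the listing says (fetch the A-th little-endian word of the table), since that routine's body is not shown here.  The names magic_dispatch, mem and stack, and the sample HL value, are illustrative only.

```python
def magic_dispatch(mem, stack, hl, a):
    """Model of sub_5905; returns (new_pc, hl) and pops one stack entry."""
    # ex (sp),hl : HL <- return address (= table address), stack top <- caller's HL
    stack[-1], hl = hl, stack[-1]
    # call sub_590B : ASSUMED to load the A-th little-endian word of the table
    entry = hl + 2 * a
    hl = mem[entry] | (mem[entry + 1] << 8)
    # ex (sp),hl : stack top <- handler address, HL <- caller's HL again
    stack[-1], hl = hl, stack[-1]
    # ret : pop the handler address into PC, i.e. jump to it
    return stack.pop(), hl


mem = bytearray(0x10000)
mem[0x5A30:0x5A38] = bytes.fromhex("525A565A6C5A715A")  # first four table entries
stack = [0x5A18, 0x5A1C, 0x5A30]  # two outer returns, then the return from 'call sub_5905'
pc, hl = magic_dispatch(mem, stack, hl=0x51A6, a=2)
print(hex(pc), hex(hl), [hex(x) for x in stack])
```

The point of the sketch is that the caller's HL survives untouched and the return address pushed by the call is consumed, so the handler runs as if it had been jumped to directly from the call site.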

Those were simpler times.  The 'witchcraft' method uses a similar 'lump o data follows what looks like a call' technique, but the data is not a simple table.  And the interstitial infinite loop adds a twist.  My limited mental faculties required that I page out to paper and keep track of the call stack on the way down to the dispatched functions:

stack:
     ret 0 = witchcraft_01, sub_5A19, infinite loop
     ret 1 = witchcraft_00, sub_5A14, thunk via HL
on entry:
     B = code dispatched upon
     HL = pointer into data after the code

So, it seems this witch knows 17 'spells', and is given a lump of data as an 'incantation' specifying a sequence of spells to cast along with contextual data.  So it's time to take a closer look at her spell repertoire.

The first 'spell', referenced by code 0, is brief:

5A52  witchspell00_5A52:
5A52 33        inc     sp
5A53 33        inc     sp
5A54 C9        ret

By bumping SP twice, we effectively eat a return address.  Oftentimes we do that with a pop, but I guess the author did not want to clobber any registers at all and chose the two increments instead.  The net effect is to return to the caller's caller, 'ret 1', which gets us past the infinite loop to where we thunk over via HL.
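
The difference between this and a plain 'ret' is easy to see with a toy stack.  In the sketch below the two return addresses are worked out from the listings (each 'call' is three bytes, so the call at 5A15 returns to 5A18 and the call at 5A19 returns to 5A1C); the function names are made up for illustration.

```python
RET1 = 0x5A18  # return into sub_5A14, landing on the 'jp (hl)'
RET0 = 0x5A1C  # return into sub_5A19, landing on the 'jr' that loops


def plain_ret(stack):
    """A handler that simply does 'ret' resumes at the innermost return address."""
    return stack.pop()


def spell00(stack):
    """inc sp / inc sp / ret : drop one return address, then return to the next."""
    stack.pop()         # the two 'inc sp' discard ret 0 without touching any register
    return stack.pop()  # 'ret' now resumes at ret 1


print(hex(plain_ret([RET1, RET0])))
print(hex(spell00([RET1, RET0])))
```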

The second spell involves more stuff than I want to dig into right now, and moreover can be entered multiple ways so I'm going to save that for later.

The third 'spell', referenced by code 2, is straightforward:

5A6C  witchspell02_5A6C:
5A6C 5E        ld      e, (hl)
5A6D 23        inc     hl
5A6E 56        ld      d, (hl)
5A6F 23        inc     hl
5A70 C9        ret

So it loads DE (little-endian) from the data pointer, and advances the data pointer past that.  Notably it does a ret, so it goes back to 'ret 0', which is the infinite loop.  I.e. it will then continue consuming and dispatching spells.

At this point we can start to see a design emerging.  The witchcraft mechanism is yet another 'embedded scripting' technique used in this machine.

The fourth spell invokes our old friend the embedded scripting engine:

5A71  witchspell03_5A71:
5A71 D5        push    de       ; save
5A72 CD 23 41  call    impl_rst10_4123  ; run VM program
5A75 D1        pop     de       ; restore
5A76 C9        ret

I'll need to spend some time with the other 14 spells, because as mentioned there are 16 invocations of this witchcraft that I will need to annotate.  But I can already do a short one:

...
51A2 CD 14 5A  call    witchcraft_00_5A14  ; witchcraft00; process subsequent data block
51A2  ; ---------------------------------------------------------------------------
51A5 02        db 2         ; witchspell02_5A6C load DE
51A6 20 C8     dw unk_C820  ; ... with this value
51A8 D1        db 0D1h      ; witchspell01_5A56, embedded param 11h XXX off_5B5D entry 17
51A9 03        db 3         ; witchspell03_5A71 run RST10 program
51AA 48        db 48h       ; vmop48_4589 - 'bin2bcd'
51AB C8 25     dw 25C8h     ; c825 - dest buffer
51AD 81 04     dw 481h      ; 8104 - uint16be_t* value
51AF 04 03     dw 304h      ; 0403 - 4 digits, blank pad
51B1 03        db 3         ; vmop03_41B1 - *(uint16be_t*)param1 = param2
51B2 C8 29     dw 29C8h     ; c829
51B4 14 37     dw 3714h     ; 1437
51B6 00        db 0         ; vmop00_end_419E
51B7 00        db 0         ; witchspell00_5A52 end incantation and thunk to HL (which is next byte)
51B8  ; ---------------------------------------------------------------------------
51B8 C9        ret          ; return from this invocation of witchcraft
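
As a sanity check on that annotation, the same bytes can be walked by a small Python sketch of the loop.  Several things here are assumptions rather than findings: spell 1 is treated as consuming no further bytes (which is all this one listing shows), the RST10 VM is assumed to advance the same data pointer, and the VM operand sizes are inferred only from the three opcodes annotated above.  All of the Python names are illustrative.

```python
BASE = 0x51A5
# Bytes 51A5..51B7, copied from the listing above.
DATA = bytes.fromhex("02 20 C8 D1 03 48 C8 25 81 04 04 03 03 C8 29 14 37 00 00")

# Operand byte counts for the VM opcodes seen in this listing only (inferred).
VM_OPERAND_BYTES = {0x48: 6, 0x03: 4, 0x00: 0}


def spell_index(code):
    """Range mapping from sub_5A1E: <B0h as-is, B0h..BFh -> 10h, >=C0h -> 1."""
    if code < 0xB0:
        return code
    return 0x10 if code < 0xC0 else 0x01


def walk(data, base):
    """Consume spells until spell 0; return (trace, de, address jumped to)."""
    pos, de, trace = 0, None, []
    while True:
        code = data[pos]
        pos += 1
        idx = spell_index(code)
        if idx == 0:                       # end incantation, jp (hl)
            trace.append("spell00 end")
            return trace, de, base + pos
        if idx == 1:                       # not analysed yet; ASSUMED to take no more bytes
            trace.append(f"spell01 code {code:02X}h")
        elif idx == 2:                     # load DE, little-endian
            de = data[pos] | (data[pos + 1] << 8)
            pos += 2
            trace.append(f"spell02 DE={de:04X}h")
        elif idx == 3:                     # run VM program up to and including opcode 0
            while True:
                op = data[pos]
                pos += 1 + VM_OPERAND_BYTES[op]
                trace.append(f"  vmop{op:02X}")
                if op == 0:
                    break
        else:
            raise NotImplementedError(f"spell {idx:02X}h not modelled")


trace, de, resume = walk(DATA, BASE)
print("\n".join(trace))
print(f"DE={de:04X}h, execution resumes at {resume:04X}h")
```

If the assumptions hold, the walk ends with the data pointer sitting on the 'ret' that follows the incantation, which agrees with the hand annotation.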

So, are we done yet with embedded scripting engines?  I'm not so sure.  The magic dispatch function that is used by the witchcraft mechanism (for thunking via a subsequent table of function pointers) is referenced in another place:

...
4E54 D2 A7 53  jp      nc, sub_53A7                ; A >= 45; too high
4E57 CD 05 59  call    doMagicDispatch_5905        ; dispatch (not call) by A with subsequent table
4E5A  dispatch_4E5A:
4E5A B4 4E     dw sub_4EB4
4E5C CC 4E     dw sub_4ECC
4E5E E1 4E     dw sub_4EE1
...

So there is another mechanism with 45 'opcodes'.  ¡Ay, chihuahua!

I apologise for my whimsical naming of 'witchcraft', 'spells', and 'incantations', but that silliness helps to keep me going.  I intended to change it to something more pedestrian, but now that I've gone through it all, I'm not sure if I should.  I've got to call it something, and there's so much more to decipher.

But it's Friday, and maybe this weekend I will go outside and get some light for a change....
