
Ridiculous delvings

A project log for Improbable AVR -> 8088 substitution for PC/XT

Probability this can work: 98%, working well: 50% A LOT of work, and utterly ridiculous.

Eric Hertz, 01/25/2017 at 23:03 (2 Comments)

The 8087 performs about 50,000 FLOPS

AVRs perform about 60,000 FLOPS (here, and here, and maybe here), definitely on the same order...

I had no idea AVRs could do so many... and here I've been avoiding floating-point in darn-near every AVR project for over a decade. Guess that explains how grbl gets away with it.

------------

In my 1988 edition of Upgrading And Repairing PCs, there's an entire section on "In Circuit Emulators", which, for the PC/XT, were essentially 286 or 386 processor upgrades for 8088-based motherboards. (Like this one)

I dunno why they call 'em emulators... seems like a glorified processor upgrade, to me... But ok.

The section has some interesting commentaries maybe relevant to the current status of this project...

Specifically, mention of "Asynchronous" vs. "Synchronous" ICEs. It says something like an async ICE running at 12MHz is equivalent to a synchronous unit running at 7.2MHz.

Hmmm... Maybe I've put too much effort into trying to synchronize this stuff...? (Are we talking synchronization to the system-bus?). Maybe it'd've worked if I hadn't.

--------

What I don't get, though... Is how these things can speed up a system as much as the author claims... 900%!

It really seems to me like the tremendous bottleneck is the bus... These things don't speed that up... (and remember that *everything* is connected through the bus). E.g. at 4.77MHz, with 4 cycles per transaction (minimum) and one byte per transaction on the 8088's 8-bit bus, that's only about 1.2MB/s. Considering an average instruction is two bytes, that's a maximum of roughly 600KIPS, which is right on the order of claims. Throw in some wait-states, the low speeds of DRAMs, the fact that some instructions are as many as 6 bytes, etc... and it's understandable that might drop down to the 350KIPS or less listed. (Remember that the 8088 processes instructions *separately* from fetching them, almost as though there are two separate parallel processors internally).
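The bus arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures from the text (4.77MHz, 4 cycles per transfer, two-byte average instruction); the function names and the wait-state parameter are made up for illustration, and it counts instruction fetches only, ignoring data accesses and DRAM refresh.

```python
# Rough 8088 bus ceiling, using only the figures quoted in the post.
CLOCK_HZ = 4_770_000        # PC/XT clock, 4.77MHz
CYCLES_PER_TRANSFER = 4     # minimum bus cycle, no wait-states
BYTES_PER_TRANSFER = 1      # the 8088 has an 8-bit data bus


def bus_bytes_per_second(wait_states=0):
    """Peak bytes/s if every bus cycle moved one byte (hypothetical helper)."""
    return CLOCK_HZ * BYTES_PER_TRANSFER / (CYCLES_PER_TRANSFER + wait_states)


def max_kips(avg_instruction_bytes=2, wait_states=0):
    """Upper bound on thousands of instructions/s from fetch bandwidth alone."""
    return bus_bytes_per_second(wait_states) / avg_instruction_bytes / 1000


print(f"peak bus rate:          {bus_bytes_per_second():,.0f} bytes/s")
print(f"2-byte avg, 0 waits:    {max_kips():.1f} KIPS")
print(f"2-byte avg, 1 wait:     {max_kips(wait_states=1):.1f} KIPS")
print(f"3-byte avg, 1 wait:     {max_kips(3, wait_states=1):.1f} KIPS")
```

With no wait-states this comes out near 1.19MB/s and just under 600KIPS, the same ballpark as the rough figures in the text. Adding a single wait-state and a slightly longer average instruction is already enough to land around the 350KIPS range.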

So, *some* of these "in circuit emulators" have a full 640K of onboard RAM (some even more), so plausibly there's a faster bus onboard, and the 8088 motherboard is basically little more than some ISA slots and a few peripherals. At which point, the ICE is basically an entire 386 motherboard (daughter-card?) that happens to plug into a "backplane" via a 40pin DIP socket.

But, surely, it still has to make use of the original DMA controller, etc... huh. So, at some point, the "emulator" must keep track of where its bus-transactions should be directed, and prevent them from going to the 8088-bus if they're actually happening onboard (e.g. onboard RAM). Weird stuff. Guess it is a bit more than just a processor upgrade. Wonder if it has to "boot" first, or whether those sorts of things are handled by logic-gates.
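The "keep track of where its bus-transactions should be directed" idea might look something like the following. This is purely a hypothetical sketch: the book excerpt doesn't say how any real card decides, and the rule here (onboard RAM covers the first 640K of memory, everything else goes out to the 8088 socket) is an assumption, as are the names `route` and `ONBOARD_RAM_BYTES`. It also ignores DMA entirely.

```python
# Hypothetical routing rule for an upgrade card with its own 640K of RAM.
ONBOARD_RAM_BYTES = 640 * 1024   # assumption: card RAM replaces conventional memory


def route(address, is_io=False):
    """Return 'onboard' or 'motherboard' for one bus transaction."""
    if is_io:
        # I/O ports (DMA controller, PIC, UARTs...) only exist on the motherboard side.
        return "motherboard"
    if 0 <= address < ONBOARD_RAM_BYTES:
        # Conventional memory: serve it from the fast onboard RAM.
        return "onboard"
    # Video memory, option ROMs, BIOS: go out through the 40-pin socket.
    return "motherboard"


print(route(0x00400))              # BIOS data area, conventional memory
print(route(0xB8000))              # CGA text buffer
print(route(0x3F8, is_io=True))    # COM1 port
```

A comparison this simple could be done in plain logic-gates on the address lines, with no "boot" step needed; anything fancier (like shadowing ROM) would presumably need some setup first.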

And, what if there's an 8087 on the motherboard...? Does the ICE just "lie" by indicating idle-states, to essentially deselect it during all transactions? Guess that's where "emulation" comes into play.... hmmm. Oh, I suppose that'd be similar to a DMA, which wouldn't indicate Queue-Statii during transactions... so the 8087 (any others?) just wouldn't pay attention... Ah Hah.

Actually, that all seems quite similar to what I'm attempting... just that my attempts are much slower ;)

Discussions

Yann Guidon / YGDES wrote 01/26/2017 at 04:28

No wonder that the i486 with its on-chip cache was so much faster... add some pipelining and there you go!


Eric Hertz wrote 01/26/2017 at 09:53

That *would* make sense, except for many things I probably totally misunderstand:

In the case of say dropping a really fast 8088 in place of a really slow 8088 with its really slow bus, it totally makes sense that e.g. a *really fast* (real) 8088 with a direct connection to a *really fast* 640KB of (cache) RAM would be *really fast*, if it also shadowed the BIOS, and only accessed the original (slow) 8088 bus for device-I/O...

But I still have a hard time wrapping my head around the idea that (small amounts of) cache is very helpful at all, except when the programmer explicitly allocates ALL the parts that should be cached (both instructions *and* data). Which most caching schemes don't do, right? The processor usually handles caching via its own predictive scheme. In the case of a long program with no loops at all, cache would have no effect on speed at all! Once the cache filled up, every instruction after that would be a cache-miss! Those types of programs aren't too common, but combine all that with threading, multitasking, context-switching... never mind mere function-calls (and library/OS-calls!), and it seems to me cache would be rendered moot unless it's large enough to hold all the running tasks in their entirety. Otherwise, you're basically pushing and popping the entirety of the cache back through the slower memory-interface with every context-switch!
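The loop-versus-no-loop intuition above can be poked at with a toy model. This is a minimal sketch of a direct-mapped cache, not a model of the i486 or any real part; the sizes (64 lines), the `hit_rate` helper, and the byte-per-access address streams are all assumptions chosen for illustration.

```python
def hit_rate(addresses, cache_lines=64, line_bytes=1):
    """Fraction of accesses served by a toy direct-mapped cache."""
    tags = [None] * cache_lines      # which memory block each cache line holds
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // line_bytes   # memory block this address belongs to
        index = block % cache_lines  # the one cache line it is allowed to use
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block      # miss: fetch over the slow bus, replace line
        total += 1
    return hits / total if total else 0.0


# A long program with no loops: every address touched exactly once.
straight_line = range(10_000)
# A 50-byte loop run 100 times: fits entirely in the 64-line cache.
small_loop = [a for _ in range(100) for a in range(50)]

print("straight-line, 1-byte lines: ", hit_rate(straight_line))
print("small loop, 1-byte lines:    ", hit_rate(small_loop))
print("straight-line, 16-byte lines:", hit_rate(straight_line, line_bytes=16))
```

With one-byte lines the straight-line program never hits, as argued above, while the small loop hits nearly every time. In this toy model, making each line hold several consecutive bytes lets even straight-line code register hits on the rest of each line it pulls in, though whether that is an actual speedup depends on how expensive the line fill is over the slow bus.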

And then, looking at the numbers I came up with, the 8088's number of MIPS is basically limited by its bus, rather than its actual internal processing-speed... so pipelining wouldn't do a thing to speed that up (not without an equally fast cache).

Still not wrapping my head around this caching/pipelining stuff, obviously.
