Inside Five CPU Emulators

I didn’t set out to build five CPU emulators. I set out to build one—the 6502, because that’s what everyone starts with—and then I kept finding reasons to add another, and another, until I had this sprawling mess of Rust crates all pretending to be microprocessors from the 1970s.

The thing that surprised me most wasn’t how different these chips are. It’s how similar they are, once you squint hard enough. Each one is basically a state machine that reads bytes, does something, and tells you how long it took. I ended up with a loose contract across all five: step() runs one instruction and returns a cycle count, get_state() dumps everything to JSON, set_state() restores it. That’s it. The contract isn’t a Rust trait—I kept meaning to formalize it and never did—but it’s close enough that save states and debugging work the same way everywhere.
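For illustration, here is roughly what that contract could look like if it were written down as a trait. This is a sketch, not the real code: only step(), get_state() and set_state() come from the description above. The `Cpu` trait, the `ToyCpu` core, and the naive `field` helper are hypothetical, and a real core would almost certainly use a JSON library rather than hand-rolled parsing.

```rust
/// Hypothetical trait capturing the loose contract; the real cores do not share one.
pub trait Cpu {
    /// Execute one instruction and return the cycles it consumed.
    fn step(&mut self) -> u32;
    /// Dump the full machine state as a JSON string.
    fn get_state(&self) -> String;
    /// Restore state previously produced by `get_state`.
    fn set_state(&mut self, json: &str) -> Result<(), String>;
}

/// Toy core for illustration only: one accumulator, one program counter.
#[derive(Debug, Default, PartialEq)]
pub struct ToyCpu {
    pub pc: u16,
    pub a: u8,
}

/// Pull an unsigned integer field out of a flat JSON object.
/// Deliberately naive; it only handles the output of `ToyCpu::get_state`.
fn field(json: &str, name: &str) -> Option<u32> {
    let key = format!("\"{}\":", name);
    let start = json.find(&key)? + key.len();
    let digits: String = json[start..]
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

impl Cpu for ToyCpu {
    fn step(&mut self) -> u32 {
        // Pretend every instruction is "increment A" and costs 2 cycles.
        self.a = self.a.wrapping_add(1);
        self.pc = self.pc.wrapping_add(1);
        2
    }

    fn get_state(&self) -> String {
        format!("{{\"pc\":{},\"a\":{}}}", self.pc, self.a)
    }

    fn set_state(&mut self, json: &str) -> Result<(), String> {
        let pc = field(json, "pc").ok_or("missing pc")?;
        let a = field(json, "a").ok_or("missing a")?;
        self.pc = pc as u16;
        self.a = a as u8;
        Ok(())
    }
}
```

A save state is then just the string from get_state(), and rewinding is a set_state() call with that same string.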

Whether that abstraction is actually correct, I’m honestly not sure. It works for what I’m doing, but I suspect someone with more emulation experience would find holes in it.

The 6502 came first

I started with the MOS 6502 because it’s the classic beginner’s emulation target, and also because the Apple II and Commodore 64 nostalgia runs deep. The 6502 is a weird little chip—only three registers that matter (A, X, Y), but a surprisingly rich set of addressing modes that let you treat zero page almost like extra registers.

The addressing modes are where I spent most of my time. Zero page gives you fast 8-bit address lookups, absolute gives you the full 64KB, and then there’s indexed indirect and indirect indexed, which sound similar but behave completely differently. (I got them backwards at least three times before the tests caught it.) The 6502 also has this famous bug where JMP ($xxFF) wraps within the page instead of crossing into the next one—if you jump indirect through address $12FF, it reads the low byte from $12FF and the high byte from $1200, not $1300. Real software depends on this bug, so you have to emulate it.
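The difference between the two indirect modes, and the JMP page-wrap bug, are easier to see in code than in prose. The following is a minimal sketch assuming a flat 64KB memory array; the `Mem` type and the function names are made up for illustration and are not taken from the actual crate.

```rust
/// Flat 64 KB memory; a stand-in for whatever bus the real core uses.
pub struct Mem(pub Vec<u8>);

impl Mem {
    pub fn new() -> Self {
        Mem(vec![0; 0x1_0000])
    }

    pub fn read(&self, addr: u16) -> u8 {
        self.0[addr as usize]
    }

    /// Read a 16-bit pointer from zero page; the second byte wraps within page zero.
    fn read_zp_word(&self, zp: u8) -> u16 {
        let lo = self.read(zp as u16) as u16;
        let hi = self.read(zp.wrapping_add(1) as u16) as u16;
        (hi << 8) | lo
    }
}

/// Indexed indirect, `(zp,X)`: add X to the zero-page operand, then fetch the pointer.
pub fn indexed_indirect(mem: &Mem, operand: u8, x: u8) -> u16 {
    mem.read_zp_word(operand.wrapping_add(x))
}

/// Indirect indexed, `(zp),Y`: fetch the pointer first, then add Y to it.
pub fn indirect_indexed(mem: &Mem, operand: u8, y: u8) -> u16 {
    mem.read_zp_word(operand).wrapping_add(y as u16)
}

/// `JMP (ptr)` with the page-wrap bug: the high byte is fetched from the
/// same page as the low byte, so a pointer at $12FF reads $12FF and $1200.
pub fn jmp_indirect_target(mem: &Mem, ptr: u16) -> u16 {
    let lo = mem.read(ptr) as u16;
    let hi_addr = (ptr & 0xFF00) | (ptr.wrapping_add(1) & 0x00FF);
    let hi = mem.read(hi_addr) as u16;
    (hi << 8) | lo
}
```

With $34 at $12FF, $56 at $1200 and $78 at $1300, jmp_indirect_target(&mem, 0x12FF) yields $5634, not $7834.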

I also implemented decimal mode, where the chip does BCD arithmetic, and the undocumented opcodes—the ones that aren’t in any official datasheet but that some games and demos use anyway. The guiding principle became “emulate what the silicon did, not what the manual claimed.”
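To give a flavour of what decimal mode involves, here is a simplified sketch of a BCD add. It assumes both operands are valid BCD and only computes the result and the carry; the N, V and Z flag quirks of the real chip are left out on purpose. The function name `adc_decimal` is hypothetical.

```rust
/// Simplified decimal-mode add: returns (result, carry_out).
/// Assumes `a` and `b` are valid BCD (each nibble 0..=9).
/// The other flags, which behave oddly on real silicon, are not modelled.
pub fn adc_decimal(a: u8, b: u8, carry_in: bool) -> (u8, bool) {
    // Add the low digits, then correct if the sum is not a valid decimal digit.
    let mut lo = (a & 0x0F) as u16 + (b & 0x0F) as u16 + carry_in as u16;
    if lo > 9 {
        lo += 6;
    }
    // Add the high digits plus any carry out of the corrected low digit.
    let mut hi = (a >> 4) as u16 + (b >> 4) as u16 + (lo > 0x0F) as u16;
    if hi > 9 {
        hi += 6;
    }
    let result = (((hi << 4) | (lo & 0x0F)) & 0xFF) as u8;
    (result, hi > 0x0F)
}
```

So $19 + $28 gives $47, and $99 + $01 gives $00 with the carry set.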

Then the Z80, because CP/M

After the 6502 worked, I wanted something more complex, and the Z80 seemed like the natural next step. The Z80 is what the 8080 wanted to be when it grew up: same basic register layout, but with shadow registers for fast context switches, indexed addressing via IX and IY, and a proper interrupt system with three different modes.

The shadow registers are the interesting part. You can swap the main register set with the alternate set in a single instruction—EX AF, AF' swaps A and flags, EXX swaps BC/DE/HL with their primed counterparts. It’s basically hardware support for interrupt handlers that need to preserve state without pushing everything to the stack. The Z80 core has to track both register banks and know which one is “current” at any moment, which adds complexity but isn’t conceptually hard.
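One way to model the two exchange instructions is to keep both banks in the core and swap their contents. Note this is a different technique from tracking which bank is "current" with a flag, as described above; it is shown here only because it is the shortest to write. The `Regs` and `Z80Regs` types are hypothetical.

```rust
/// One bank of 8-bit registers.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct Regs {
    pub a: u8,
    pub f: u8,
    pub b: u8,
    pub c: u8,
    pub d: u8,
    pub e: u8,
    pub h: u8,
    pub l: u8,
}

/// Main and alternate (primed) register banks.
#[derive(Debug, Default)]
pub struct Z80Regs {
    pub main: Regs,
    pub alt: Regs,
}

impl Z80Regs {
    /// EX AF, AF': swap only the accumulator and flags.
    pub fn ex_af(&mut self) {
        std::mem::swap(&mut self.main.a, &mut self.alt.a);
        std::mem::swap(&mut self.main.f, &mut self.alt.f);
    }

    /// EXX: swap BC, DE and HL with their primed counterparts; AF is untouched.
    pub fn exx(&mut self) {
        std::mem::swap(&mut self.main.b, &mut self.alt.b);
        std::mem::swap(&mut self.main.c, &mut self.alt.c);
        std::mem::swap(&mut self.main.d, &mut self.alt.d);
        std::mem::swap(&mut self.main.e, &mut self.alt.e);
        std::mem::swap(&mut self.main.h, &mut self.alt.h);
        std::mem::swap(&mut self.main.l, &mut self.alt.l);
    }
}
```

Swapping contents keeps every other instruction simple, since they always read `main`. A flag-based design avoids the copies but makes every register access check the flag.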

What was hard: the undocumented instructions. The Z80 has about forty opcodes that Zilog never documented but that work consistently across all silicon revisions. Things like accessing the high and low bytes of IX and IY as separate 8-bit registers (IXH, IXL, IYH, IYL), or the quirky behaviour of the DDCB and FDCB prefix combinations where the result gets copied to a register even when the opcode doesn’t look like it should. I ran the emulator against the raxoft/z80test suite overnight and woke up to a list of failures that took me a full day to work through. The tests don’t lie—if your emulator disagrees with real hardware, your emulator is wrong.

The 8088 for DOS nostalgia

The Intel 8088 is where things got messy. The 8088 is the chip that powered the original IBM PC, and it introduces segmented memory: a 20-bit address space accessed through 16-bit registers and 16-bit segments, where physical = (segment << 4) + offset. This one formula creates the entire DOS memory map with all its aliasing weirdness, where the same physical address can be reached through multiple segment combinations.
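The address formula fits in one line of code. This is a minimal sketch with a hypothetical function name; the mask reflects the 8088 having only 20 address lines, so sums past 1MB wrap around to zero.

```rust
/// Translate segment:offset into a 20-bit physical address.
/// The mask models the 8088's 20 address lines: results past 0xFFFFF wrap.
pub fn physical(segment: u16, offset: u16) -> u32 {
    (((segment as u32) << 4) + offset as u32) & 0xF_FFFF
}
```

Both 1234:0005 and 1000:2345 land on physical address 0x12345, which is the aliasing in action. FFFF:0010 wraps to 0x00000.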

I spent more time debugging flag behaviour on the 8088 than on any other core. The auxiliary carry flag in particular—it tracks carries out of bit 3 for BCD operations—kept catching me. I’d run the SingleStepTests suite and watch my pass rate climb from 74% to 96% as I fixed one flag edge case after another. There are still a handful of obscure instructions I’m not confident about, but the common ones work.
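For plain 8-bit addition and subtraction, the auxiliary carry can be derived from the operands and the result with one XOR trick. This sketch covers only that basic case, not the per-instruction exceptions that ate most of the debugging time; the function names are hypothetical.

```rust
/// Auxiliary carry for 8-bit add or subtract: true when a carry (or borrow)
/// crossed from bit 3 into bit 4. XOR-ing both operands with the result
/// leaves exactly the carry-in bits, and bit 4's carry-in is bit 3's carry-out.
pub fn aux_carry(a: u8, b: u8, result: u8) -> bool {
    ((a ^ b ^ result) & 0x10) != 0
}

/// 8-bit add returning (result, AF).
pub fn add8(a: u8, b: u8) -> (u8, bool) {
    let result = a.wrapping_add(b);
    (result, aux_carry(a, b, result))
}

/// 8-bit subtract returning (result, AF).
pub fn sub8(a: u8, b: u8) -> (u8, bool) {
    let result = a.wrapping_sub(b);
    (result, aux_carry(a, b, result))
}
```

For example, 0x0F + 0x01 sets AF because the low nibble overflows, while 0x10 + 0x10 does not.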

The BIOS uses INT 0x80 instead of the INT 10h/INT 13h vectors of the real PC BIOS, which is historically inaccurate but gives me a cleaner service interface. If you’re hoping to run actual DOS programs, you’ll be disappointed. If you want to write assembly that talks to a simulated serial port, it works fine.

The 8080 filled in the gap

I built the 8008 before the 8080, which in retrospect was a mistake. The 8008 (Intel’s first 8-bit processor, from 1972) is so limited that it’s barely useful—14-bit addressing caps you at 16KB of memory, the stack is an 8-level internal array rather than RAM, and there are no RST interrupt vectors to hook. I built it mostly for completeness and because I was curious how minimal a CPU could be while still being programmable.

And then I immediately learned that I’d screwed up the instruction encoding. The 8008 uses different register numbering than the 8080 family—A is 000 in the 8008 but 111 in the 8080—and I’d initially used the 8080 encoding because that’s what I’d been staring at. Classic mistake. I had to go back and rewrite the opcode decoder to match the authentic 8008 format, replacing instructions that don’t exist (like INR A) with the ones that do (like ADI 1).
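The two register encodings side by side make the mistake obvious. This is a sketch with hypothetical names (`Reg`, `decode_8008`, `decode_8080`); the tables follow the standard 3-bit register fields of each chip, where M means "memory addressed by HL".

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reg {
    A,
    B,
    C,
    D,
    E,
    H,
    L,
    /// Memory operand addressed through HL.
    M,
}

/// 8008 register field: A comes first, M is 111.
pub fn decode_8008(bits: u8) -> Reg {
    const TABLE: [Reg; 8] = [
        Reg::A, Reg::B, Reg::C, Reg::D, Reg::E, Reg::H, Reg::L, Reg::M,
    ];
    TABLE[(bits & 0b111) as usize]
}

/// 8080 register field: B comes first, M is 110, A is 111.
pub fn decode_8080(bits: u8) -> Reg {
    const TABLE: [Reg; 8] = [
        Reg::B, Reg::C, Reg::D, Reg::E, Reg::H, Reg::L, Reg::M, Reg::A,
    ];
    TABLE[(bits & 0b111) as usize]
}
```

Feed 8008 machine code through the 8080 table and every register operand is off by one, with A and M trading places at the ends.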

The 8080 came after, and it’s essentially the 8008 grown up: a proper stack pointer with the stack in RAM, a 64KB address space, and an auxiliary carry flag for BCD math. Once you have a real stack pointer, your BIOS can use normal CALL and RET instructions, and interrupts work sensibly. The chip stops being a fancy calculator and becomes a computer. I ran the 8080EXM diagnostics, which execute something like 24 billion cycles, and fixed an auxiliary carry bug in the decrement instruction that had been lurking since the initial implementation. (It was inverted. AC should be set when there’s no borrow from bit 4, and I had it backwards.)
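The corrected decrement rule can be sketched in a few lines. This is an illustration of the behaviour described above, not the actual fix: after subtracting one, a low nibble of 0xF means a borrow was taken from bit 4, so AC is cleared, and otherwise it is set. The function name `dcr` is hypothetical and the other flags are omitted.

```rust
/// 8080 DCR sketch: returns (result, auxiliary_carry).
/// AC is set when the decrement did NOT borrow from bit 4, i.e. when the
/// low nibble of the result is anything other than 0xF.
pub fn dcr(value: u8) -> (u8, bool) {
    let result = value.wrapping_sub(1);
    let ac = (result & 0x0F) != 0x0F;
    (result, ac)
}
```

Decrementing 0x11 gives 0x10 with AC set; decrementing 0x10 gives 0x0F with AC clear.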

What I actually learned

Five cores is probably overkill for any single project. But building all of them taught me what’s universal about CPU emulation and what’s just personality. The universal part is the contract: step, count cycles, serialize. The personality is everything else—the quirky addressing modes, the undocumented opcodes, the bugs that software depends on.

Cycle accuracy turned out to matter more than I expected. Every core returns cycle counts from step(), and those numbers drive the UART pacing, the video timing, anything that pretends to have real time. Get the cycles wrong and audio drifts, video tears, and the test suites start lying to you. I ran overnight tests against all five cores and woke up to commit messages like “fix AF flag clearing in update_flags_logic” and “fix run_cycles() to properly wake from HALT state on interrupt.” The bugs always come out eventually.
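Here is a rough sketch of how cycle counts from step() can pace a peripheral. Everything in it is hypothetical, including the `Uart` type and this particular `run_cycles` signature, which is not the function named in the commit message above. The core is stood in for by a closure that returns a cycle count.

```rust
/// Toy UART that "sends" one byte every `cycles_per_byte` CPU cycles.
pub struct Uart {
    cycles_per_byte: u32,
    accumulated: u32,
    pub bytes_sent: u32,
}

impl Uart {
    pub fn new(cycles_per_byte: u32) -> Self {
        assert!(cycles_per_byte > 0, "cycles_per_byte must be non-zero");
        Uart { cycles_per_byte, accumulated: 0, bytes_sent: 0 }
    }

    /// Advance the UART by the number of cycles the CPU just consumed.
    pub fn tick(&mut self, cycles: u32) {
        self.accumulated += cycles;
        while self.accumulated >= self.cycles_per_byte {
            self.accumulated -= self.cycles_per_byte;
            self.bytes_sent += 1;
        }
    }
}

/// Run the core until at least `budget` cycles have elapsed, feeding each
/// instruction's cycle count to the UART. Returns the cycles actually run,
/// which can overshoot the budget by part of one instruction.
pub fn run_cycles(budget: u32, mut step: impl FnMut() -> u32, uart: &mut Uart) -> u32 {
    let mut elapsed = 0;
    while elapsed < budget {
        // Treat a zero-cycle step as one cycle so the loop always terminates.
        let cycles = step().max(1);
        elapsed += cycles;
        uart.tick(cycles);
    }
    elapsed
}
```

If step() under-reports cycles, the UART falls behind wall-clock time by the same proportion, which is the kind of drift described above.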

The whole thing compiles to WebAssembly, which is nice for browser embedding but wasn’t really a design driver—I’d have made the same architectural choices targeting native code. WASM just lets me ship the result without asking people to install anything.