Inside Five CPU Emulators
Building cycle-accurate emulators for the Intel 8008, 8080, 8086/8088, Zilog Z80, and MOS 6502 in Rust compiled to WebAssembly—instruction decoding, flag handling, and the quirks that made these chips legendary.
Why one emulator isn’t enough when you’re chasing the microcomputer era
I didn’t set out to write five CPU emulators. I set out to write one, realized the architecture decisions I’d made would generalize nicely, and then fell into what I can only describe as a period of obsessive completionism. If the 8080 was the CPU that made microcomputers happen, shouldn’t I also emulate the 8008 that came before it? And if I have the 8080, the Z80 is right there—it’s mostly a superset. And if I’m doing Intel’s 8-bit lineage, surely the 6502 deserves representation…
You see how this goes.
The result is five complete CPU cores in Rust, all compiled to WebAssembly, all running in the browser. Each has its own quirks, its own historical context, and its own collection of “why did they design it that way?” moments. This is the story of building them.
The Unified Architecture
Before getting into individual CPUs, let me explain the shared infrastructure. Every CPU core implements the same trait:
pub trait Cpu {
fn reset(&mut self);
fn step(&mut self) -> u32; // Returns cycles consumed
fn get_state(&self) -> CpuState;
fn set_state(&mut self, state: &CpuState);
fn read_memory(&self, addr: u32) -> u8;
fn write_memory(&mut self, addr: u32, val: u8);
}
The step() method executes one instruction and returns the number of clock cycles it consumed. This is critical for timing—the emulator’s scheduler needs to know how long each instruction takes to keep everything synchronized with peripheral timing.
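To make the timing point concrete, here is a minimal sketch of how a scheduler might consume the cycle counts that step() returns. The trimmed-down trait, the FixedCostCpu test double, and the run_slice helper are all assumptions for illustration, not the emulator’s actual scheduler.

```rust
/// Minimal stand-in for the article's `Cpu` trait: only `step` is needed here.
pub trait Cpu {
    fn step(&mut self) -> u32; // cycles consumed by one instruction
}

/// Hypothetical test CPU whose every instruction costs a fixed number of cycles.
pub struct FixedCostCpu {
    pub cost: u32,
    pub executed: u64,
}

impl Cpu for FixedCostCpu {
    fn step(&mut self) -> u32 {
        self.executed += 1;
        self.cost
    }
}

/// Run whole instructions until at least `budget` cycles have elapsed.
/// Returns the overshoot so the caller can subtract it from the next slice.
/// Assumes `step` never returns 0, otherwise the loop would not terminate.
pub fn run_slice<C: Cpu>(cpu: &mut C, budget: u32) -> u32 {
    let mut elapsed = 0u32;
    while elapsed < budget {
        elapsed += cpu.step();
    }
    elapsed - budget
}
```

Because instructions are indivisible, a slice usually ends a few cycles late. Carrying the overshoot into the next slice keeps the long-run rate correct, which is what peripherals care about.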
The get_state()/set_state() methods enable save states. Each CPU serializes its registers, flags, memory, and internal state to a format that can be stored and restored. This is how you save your game, close your browser, come back tomorrow, and pick up exactly where you left off.
// Intel 8008 state serialization
pub fn get_state(&self) -> Intel8008State {
Intel8008State {
a: self.a,
b: self.b,
c: self.c,
d: self.d,
e: self.e,
h: self.h,
l: self.l,
pc: self.pc,
stack: self.stack.clone(), // 7-level hardware stack
stack_ptr: self.stack_ptr,
memory: self.memory.clone(),
flags: self.flags,
halted: self.halted,
}
}
The memory model varies by CPU—the 8008 has 16KB addressable, the 8080/Z80/6502 have 64KB, and the 8086/8088 has 1MB through segment:offset addressing. But the interface is the same: read_memory(addr) and write_memory(addr, val).
Intel 8008 (1972): The Calculator CPU
The Intel 8008 was designed as a terminal controller chip, but it ended up powering some of the first personal computers—the Mark-8 and the SCELBI. It’s a strange beast by modern standards: 8-bit data bus, 14-bit address space (16KB), and a hardware stack that lives on-chip rather than in memory.
Registers and the Stack Problem
The 8008 has seven 8-bit registers: A (accumulator), B, C, D, E, H, and L. The H:L pair can be used as a 14-bit memory pointer. So far, so normal.
But the stack is weird. Instead of a stack pointer that indexes into RAM, the 8008 has a 7-level hardware stack built into the CPU itself. When you CALL a subroutine, the return address gets pushed onto this internal stack. When you RET, it pops off.
pub struct Intel8008 {
// Registers
a: u8, b: u8, c: u8, d: u8, e: u8, h: u8, l: u8,
// 14-bit program counter
pc: u16,
// Hardware stack: 7 levels of 14-bit addresses
stack: [u16; 8], // Actually 7 usable levels
stack_ptr: u8,
// 16KB address space
memory: Vec<u8>,
// Flags: Sign, Zero, Parity, Carry
flags: u8,
halted: bool,
}
Seven levels sounds limiting, and it is. Nest your subroutines too deep, and the oldest return addresses simply fall off the bottom. There’s no stack overflow exception—you just lose the ability to return correctly. Early programmers learned to keep their call hierarchies flat.
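The wraparound is easier to see in code. The sketch below models the address stack as a ring of eight 14-bit slots in which the slot under the pointer doubles as the program counter. That layout is an assumption chosen for clarity (the struct above keeps pc separate), and AddressStack is a hypothetical name.

```rust
/// Sketch of an 8008-style address stack: eight 14-bit slots in a ring,
/// where the slot under `sp` acts as the program counter.
pub struct AddressStack {
    slots: [u16; 8],
    sp: usize, // 3-bit pointer, wraps modulo 8
}

impl AddressStack {
    pub fn new() -> Self {
        AddressStack { slots: [0; 8], sp: 0 }
    }

    pub fn pc(&self) -> u16 {
        self.slots[self.sp]
    }

    pub fn set_pc(&mut self, addr: u16) {
        self.slots[self.sp] = addr & 0x3FFF;
    }

    /// CALL: the current slot keeps the return address; the next slot becomes the PC.
    pub fn call(&mut self, target: u16) {
        self.sp = (self.sp + 1) & 7;
        self.slots[self.sp] = target & 0x3FFF;
    }

    /// RET: step back to the previous slot, which still holds the return address.
    pub fn ret(&mut self) {
        self.sp = (self.sp + 7) & 7;
    }
}
```

Seven nested calls unwind cleanly. The eighth wraps around and overwrites the outermost return address, so the final RET lands in the wrong place with no error raised.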
(The Calculator-8008 peripheral in the emulator uses this chip to run an HP-35 style calculator program. BCD arithmetic, 12-digit display, and all the satisfaction of watching an LED calculator operate at the silicon level.)
Instruction Decoding
The 8008 has about 66 instructions, encoded in a relatively straightforward way. Most are single-byte; immediate forms add one data byte, and jumps and calls add a two-byte address of which only 14 bits are used. The decode logic looks at the high bits to determine the instruction class:
fn step(&mut self) -> u32 {
if self.halted {
return 1;
}
let opcode = self.fetch_byte();
match opcode {
// HLT - Halt
0x00 | 0x01 | 0xFF => {
self.halted = true;
1
}
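// MOV r1, r2 - Register to register (0xFF is already claimed by HLT above)
0xC0..=0xFE => {
let dst = (opcode >> 3) & 0x07;
let src = opcode & 0x07;
let val = self.read_reg(src);
self.write_reg(dst, val);
// Register code 7 is M, the byte addressed by H:L, and costs extra states
match (dst, src) {
(_, 7) => 8, // load from memory
(7, _) => 7, // store to memory
_ => 5,
}
}
// MVI r, data - Move immediate to register (0x3E targets M)
0x06 | 0x0E | 0x16 | 0x1E | 0x26 | 0x2E | 0x36 | 0x3E => {
let reg = (opcode >> 3) & 0x07;
let data = self.fetch_byte();
self.write_reg(reg, data);
if reg == 7 { 9 } else { 8 }
}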
// ... 60+ more instructions
}
}
The cycle counts are critical for timing. A simple register move takes 5 cycles. A memory reference through M (the H:L pointer) takes 8 cycles. Get these wrong, and anything timing-sensitive breaks—screen refresh, keyboard scanning, serial communication.
The Flag Gotchas
The 8008 has four flags: Sign, Zero, Parity, and Carry. Notably absent: Auxiliary Carry. This made BCD (binary-coded decimal) arithmetic awkward: with no half-carry flag to detect mid-byte carries, and no DAA instruction at all, decimal correction had to be coded by hand.
fn update_flags_szp(&mut self, result: u8) {
// Sign: bit 7 of result
self.set_flag(FLAG_S, (result & 0x80) != 0);
// Zero: result is 0
self.set_flag(FLAG_Z, result == 0);
// Parity: even number of 1 bits
self.set_flag(FLAG_P, result.count_ones() % 2 == 0);
}
Parity is set when the result has an even number of 1 bits. This was useful for serial communication error checking, less useful for everything else. But once it was in the ISA, it stayed there through the 8080, Z80, and even into modern x86.
Intel 8080 (1974): The Altair CPU
The 8080 is where microcomputers really started. It powered the Altair 8800, the IMSAI 8080, and the first generation of CP/M machines. If you used a computer in the late 1970s that wasn’t an Apple or a Commodore PET, it probably had an 8080, or a chip that ran 8080 code such as the Z80, inside.
The 8080 is essentially an 8008 done right. Same basic register set, but with a proper stack pointer and 64KB address space. It’s what the 8008 would have been if Intel had known in 1971 what they learned by 1974.
The Register File
pub struct Intel8080 {
// Main registers
a: u8,
b: u8, c: u8, // BC pair
d: u8, e: u8, // DE pair
h: u8, l: u8, // HL pair (memory pointer)
// 16-bit registers
sp: u16, // Stack pointer (finally!)
pc: u16, // Program counter
// 64KB address space
memory: Vec<u8>,
// Flags: Sign, Zero, Auxiliary Carry, Parity, Carry
flags: u8,
// Interrupt state
inte: bool, // Interrupt enable
halted: bool,
}
The addition of the SP register transforms the programming model. Now you can push and pop to a stack in RAM, call subroutines without worrying about nesting depth, and implement proper recursive algorithms. The 8080’s stack grows downward—PUSH decrements SP, POP increments it.
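A minimal sketch of the downward-growing stack, assuming a flat 64KB memory image. Stack8080 is a hypothetical stand-in for the core’s own push_word and pop_word helpers, which the article uses but does not show.

```rust
/// Minimal 8080-style stack over a 64 KB memory image (names are illustrative).
pub struct Stack8080 {
    pub sp: u16,
    pub memory: Vec<u8>,
}

impl Stack8080 {
    pub fn new(sp: u16) -> Self {
        Stack8080 { sp, memory: vec![0; 0x1_0000] }
    }

    /// PUSH: high byte goes to SP-1, low byte to SP-2, then SP = SP-2.
    pub fn push_word(&mut self, val: u16) {
        self.sp = self.sp.wrapping_sub(1);
        self.memory[self.sp as usize] = (val >> 8) as u8;
        self.sp = self.sp.wrapping_sub(1);
        self.memory[self.sp as usize] = val as u8;
    }

    /// POP: low byte from SP, high byte from SP+1, then SP = SP+2.
    pub fn pop_word(&mut self) -> u16 {
        let lo = self.memory[self.sp as usize] as u16;
        self.sp = self.sp.wrapping_add(1);
        let hi = self.memory[self.sp as usize] as u16;
        self.sp = self.sp.wrapping_add(1);
        (hi << 8) | lo
    }
}
```

With SP starting at 0xF000, pushing 0x1234 leaves 0x34 at 0xEFFE and 0x12 at 0xEFFF, so the word sits in memory in the usual little-endian order.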
Auxiliary Carry and DAA
The 8080 added the Auxiliary Carry flag (also called Half-Carry), which tracks carries out of bit 3. This enables the DAA (Decimal Adjust Accumulator) instruction for BCD arithmetic:
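fn daa(&mut self) {
let mut adjust = 0u8;
let mut carry = self.get_flag(FLAG_C);
let low = self.a & 0x0F;
// Low nibble adjustment
if low > 9 || self.get_flag(FLAG_AC) {
adjust += 0x06;
}
// High nibble adjustment: A > 0x99 covers a high nibble above 9 and also
// the 0x9A-0x9F case, where the low-nibble fix carries into the high nibble
if self.a > 0x99 || carry {
adjust += 0x60;
carry = true;
}
// AC reports the carry out of bit 3 produced by the adjustment itself
self.set_flag(FLAG_AC, low + (adjust & 0x0F) > 0x0F);
self.a = self.a.wrapping_add(adjust);
self.update_flags_szp(self.a);
self.set_flag(FLAG_C, carry);
}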
DAA looks at the accumulator and the two carry flags and adjusts the result so that, if you’ve just added two packed BCD bytes, each nibble is a valid decimal digit again. It’s essential for business applications that need exact decimal arithmetic, which in 1974 meant almost all applications. Nobody trusted floating point for financial calculations.
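A worked example helps here. The sketch below is a free-standing version of the adjustment, written as pure functions so the arithmetic can be checked by hand. The function names are illustrative and it tracks only the carry and half-carry flags, not the full flag register.

```rust
/// Binary add that also reports the carry and half-carry an 8080 ADD would set.
pub fn add_with_flags(a: u8, b: u8) -> (u8, bool, bool) {
    let (sum, carry) = a.overflowing_add(b);
    let half = (a & 0x0F) + (b & 0x0F) > 0x0F;
    (sum, carry, half)
}

/// Decimal adjust: returns the corrected accumulator and the new carry.
pub fn daa(a: u8, carry: bool, half: bool) -> (u8, bool) {
    let mut adjust = 0u8;
    let mut carry_out = carry;
    // Low digit is out of range, or the binary add carried out of bit 3.
    if (a & 0x0F) > 9 || half {
        adjust |= 0x06;
    }
    // High digit is out of range (or will be after the low fix), or carry was set.
    if a > 0x99 || carry {
        adjust |= 0x60;
        carry_out = true;
    }
    (a.wrapping_add(adjust), carry_out)
}

/// Add two packed BCD bytes and return the packed BCD sum plus decimal carry.
pub fn bcd_add(a: u8, b: u8) -> (u8, bool) {
    let (sum, carry, half) = add_with_flags(a, b);
    daa(sum, carry, half)
}
```

Adding 0x19 and 0x28 gives binary 0x41 with the half-carry set; the adjustment adds 0x06 to produce 0x47, matching 19 + 28 = 47. Adding 0x99 and 0x01 gives 0x9A, which adjusts to 0x00 with carry set, the packed-BCD way of saying 100.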
RST Interrupt Vectoring
The 8080 has eight RST n instructions that act like single-byte CALLs to fixed addresses (0x00, 0x08, 0x10, … 0x38). These were designed for interrupt handling—external hardware could jam a RST instruction onto the data bus during an interrupt acknowledge cycle, and the CPU would automatically call that vector.
fn execute_rst(&mut self, vector: u8) {
let addr = (vector as u16) * 8; // RST 0 -> 0x0000, RST 1 -> 0x0008, etc.
self.push_word(self.pc);
self.pc = addr;
}
// External interrupt handling
pub fn trigger_interrupt(&mut self, vector: u8) {
if self.inte {
self.inte = false; // Disable interrupts
self.halted = false;
self.execute_rst(vector);
}
}
Our 8080 BIOS uses RST 1 for UART I/O: calling RST 1 with a character in A outputs it to the terminal. The idea is the same as CP/M’s BIOS, which hides the hardware behind fixed entry points, although CP/M itself reaches them through a jump table and CALL 0005h rather than through RST vectors.
UART Emulation via Ports
The 8080 has IN and OUT instructions for port-based I/O. Our emulator maps ports 0x00 and 0x01 to a UART:
fn port_in(&mut self, port: u8) -> u8 {
match port {
0x00 => {
// UART status register
let mut status = 0u8;
if self.uart_rx_ready { status |= 0x01; } // RX data available
if self.uart_tx_ready { status |= 0x02; } // TX buffer empty
status
}
0x01 => {
// UART data register (read)
self.uart_rx_ready = false;
self.uart_rx_data
}
_ => 0xFF
}
}
fn port_out(&mut self, port: u8, val: u8) {
match port {
0x01 => {
// UART data register (write)
self.output_buffer.push(val as char);
}
_ => {}
}
}
This is how CP/M talked to terminals, printers, and modems. Poll the status port, check if there’s data or space, read or write the data port. Simple, universal, and it still works today via the emulator’s serial abstraction.
Intel 8086/8088 (1978): The PC CPU
The 8086 was Intel’s first 16-bit processor, and the 8088 (with its 8-bit external bus) powered the original IBM PC. Every x86 PC, every “IBM compatible,” traces its lineage back to this chip.
The 8086 is a fundamental departure from the 8080. It has no 8080 compatibility mode; the most Intel offered was a register layout and instruction set close enough that 8080 assembly source could be machine-translated. Everything else is radically different: segment:offset addressing, a completely different instruction encoding, and a vastly expanded instruction set.
Segment:Offset Addressing
The 8086 has a 20-bit address bus (1MB) but only 16-bit registers. The solution: segment registers. Every memory access combines a 16-bit segment and a 16-bit offset to produce a 20-bit physical address:
fn calculate_address(&self, segment: u16, offset: u16) -> u32 {
(((segment as u32) << 4) + (offset as u32)) & 0xF_FFFF // wrap at 1MB like the 20-bit address bus
}
The formula is (segment << 4) + offset. A segment of 0x1000 with offset 0x0000 gives address 0x10000. A segment of 0x0FFF with offset 0x0010 gives the same address. This means there are multiple segment:offset pairs that map to the same physical location—a source of both flexibility and confusion.
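The aliasing is easy to demonstrate. This sketch assumes addresses wrap at 20 bits, as they do on a real 8086 where there is no 21st address line. The function names physical and normalize are illustrative, not part of the emulator’s API.

```rust
/// The 8086 drives 20 address lines, so physical addresses wrap at 1 MB.
const ADDR_MASK: u32 = 0xF_FFFF;

/// Combine a segment and an offset into a 20-bit physical address.
pub fn physical(segment: u16, offset: u16) -> u32 {
    (((segment as u32) << 4) + (offset as u32)) & ADDR_MASK
}

/// Rewrite any segment:offset pair as the equivalent pair whose offset is 0..=15.
/// Two pairs alias the same byte exactly when their normalized forms are equal.
pub fn normalize(segment: u16, offset: u16) -> (u16, u16) {
    let phys = physical(segment, offset);
    ((phys >> 4) as u16, (phys & 0xF) as u16)
}
```

Both 0x1000:0x0000 and 0x0FFF:0x0010 normalize to (0x1000, 0), confirming that they name the same byte. Near the top of memory, 0xFFFF:0x0010 wraps around to physical address 0.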
pub struct Intel8088 {
// General purpose registers (16-bit, split into high/low bytes)
ax: u16, // AX = AH:AL (accumulator)
bx: u16, // BX = BH:BL (base)
cx: u16, // CX = CH:CL (count)
dx: u16, // DX = DH:DL (data)
// Pointer and index registers
si: u16, // Source index
di: u16, // Destination index
bp: u16, // Base pointer
sp: u16, // Stack pointer
// Segment registers
cs: u16, // Code segment
ds: u16, // Data segment
es: u16, // Extra segment
ss: u16, // Stack segment
// Program counter
ip: u16, // Instruction pointer (within CS)
// Flags (16-bit)
flags: u16,
// 1MB address space
memory: Vec<u8>,
// Interrupt state
irq_pending: Option<u8>,
interrupt_flag: bool,
}
The 8088 is almost identical to the 8086 (same instruction set, same registers) but with an 8-bit external data bus instead of 16-bit. This made systems cheaper to build (simpler board design, and it could use inexpensive 8-bit support chips) at the cost of taking twice as long for 16-bit memory accesses. IBM chose the 8088 for the PC largely on cost grounds.
Complex Instruction Decoding
The 8086/8088 instruction encoding is… elaborate. Instructions can be 1-6 bytes long before prefixes, with an optional ModR/M byte, an 8- or 16-bit displacement, and immediate values. (SIB bytes only arrive with the 80386.) The decoder has to handle a lot of cases:
fn decode_modrm(&mut self, modrm: u8) -> (Operand, Operand) {
let mod_bits = (modrm >> 6) & 0x03;
let reg_bits = (modrm >> 3) & 0x07;
let rm_bits = modrm & 0x07;
let reg_operand = self.decode_register(reg_bits, self.current_word_op);
let rm_operand = match mod_bits {
0x00 => self.decode_memory_mode_0(rm_bits),
0x01 => self.decode_memory_mode_1(rm_bits), // + disp8
0x02 => self.decode_memory_mode_2(rm_bits), // + disp16
0x03 => self.decode_register(rm_bits, self.current_word_op),
_ => unreachable!()
};
(reg_operand, rm_operand)
}
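The ModR/M byte encodes both operands for two-operand instructions. Mod values 00, 01, and 10 are memory references with no, 8-bit, and 16-bit displacement respectively; mod 11 is register-to-register. The RM field specifies which address calculation to use ([BX+SI], [BP+DI], etc.), with one exception: mod 00 with RM 110 means a bare 16-bit direct address rather than [BP].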
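The RM table is small enough to write out in full. The sketch below computes the 16-bit effective offset for the memory forms and reports whether the default segment is SS, which applies to every BP-based form. Regs and effective_offset are hypothetical names, and the caller is assumed to have fetched and sign-extended the displacement already.

```rust
/// Just the registers that take part in 8086 address calculation.
pub struct Regs {
    pub bx: u16,
    pub bp: u16,
    pub si: u16,
    pub di: u16,
}

/// Effective offset for a memory-form ModR/M (mod 00, 01, or 10).
/// `disp` is 0 for mod 00, a sign-extended disp8 for mod 01, or disp16 for mod 10;
/// for the mod 00 / rm 110 special case it is the 16-bit direct address.
/// The boolean is true when the default segment is SS instead of DS.
pub fn effective_offset(r: &Regs, mod_bits: u8, rm: u8, disp: u16) -> (u16, bool) {
    if mod_bits == 0b00 && rm == 0b110 {
        return (disp, false); // direct address, no base register
    }
    let (base, uses_ss) = match rm & 0x07 {
        0b000 => (r.bx.wrapping_add(r.si), false),
        0b001 => (r.bx.wrapping_add(r.di), false),
        0b010 => (r.bp.wrapping_add(r.si), true),
        0b011 => (r.bp.wrapping_add(r.di), true),
        0b100 => (r.si, false),
        0b101 => (r.di, false),
        0b110 => (r.bp, true),
        _ => (r.bx, false), // 0b111
    };
    (base.wrapping_add(disp), uses_ss)
}
```

Offsets wrap at 16 bits before the segment is applied, which is why every addition uses wrapping_add. A negative disp8 such as -2 arrives as 0xFFFE and subtracts correctly under that arithmetic.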
Interrupt Handling
The 8088 implementation supports configurable interrupt numbers:
pub fn trigger_irq(&mut self, int_num: u8) {
self.irq_pending = Some(int_num);
}
fn handle_interrupt(&mut self, int_num: u8) {
// Push flags, CS, IP
self.push_word(self.flags);
self.push_word(self.cs);
self.push_word(self.ip);
// Clear IF and TF
self.flags &= !(FLAG_IF | FLAG_TF);
// Load handler from IVT (Interrupt Vector Table)
let vector_addr = (int_num as u32) * 4;
self.ip = self.read_word(vector_addr);
self.cs = self.read_word(vector_addr + 2);
}
The Interrupt Vector Table lives at the bottom of memory (0x0000-0x03FF), with 256 four-byte entries containing CS:IP pairs for each interrupt handler. This is how DOS, BIOS, and drivers all hook into the system—write your handler, poke its address into the IVT, and you’ve intercepted that interrupt.
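Hooking an interrupt comes down to writing four bytes. As a sketch, assuming memory is a plain byte slice and using the hypothetical helper names set_vector and get_vector:

```rust
/// Install a handler: each IVT entry is offset (IP) first, then segment (CS),
/// both stored little-endian, at physical address int_num * 4.
pub fn set_vector(memory: &mut [u8], int_num: u8, segment: u16, offset: u16) {
    let base = int_num as usize * 4;
    memory[base..base + 2].copy_from_slice(&offset.to_le_bytes());
    memory[base + 2..base + 4].copy_from_slice(&segment.to_le_bytes());
}

/// Read an entry back as (segment, offset), e.g. to chain to the old handler.
pub fn get_vector(memory: &[u8], int_num: u8) -> (u16, u16) {
    let base = int_num as usize * 4;
    let offset = u16::from_le_bytes([memory[base], memory[base + 1]]);
    let segment = u16::from_le_bytes([memory[base + 2], memory[base + 3]]);
    (segment, offset)
}
```

Interrupt 0x21, for example, lives at 0x0084. A well-behaved hook reads the old entry with get_vector before overwriting it, so the new handler can pass along calls it does not want to handle.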
Zilog Z80 (1976): The 8-Bit King
The Z80 was Federico Faggin’s revenge chip. After leading the 8080 design at Intel, he left to start Zilog and created a CPU that was software-compatible with the 8080 but far more capable. It became one of the most popular 8-bit CPUs ever made, powering the TRS-80, ZX Spectrum, MSX, and countless arcade machines; the Game Boy’s Sharp CPU is a close relative rather than a true Z80.
The Expanded Register Set
The Z80 takes the 8080’s registers and doubles them:
pub struct Z80 {
// Main register set
a: u8, f: u8,
b: u8, c: u8,
d: u8, e: u8,
h: u8, l: u8,
// Alternate register set
a_alt: u8, f_alt: u8,
b_alt: u8, c_alt: u8,
d_alt: u8, e_alt: u8,
h_alt: u8, l_alt: u8,
// Index registers (Z80 addition)
ix: u16,
iy: u16,
// Stack pointer and program counter
sp: u16,
pc: u16,
// Interrupt registers
i: u8, // Interrupt page register
r: u8, // Refresh register
// Interrupt flip-flops
iff1: bool,
iff2: bool,
interrupt_mode: u8, // 0, 1, or 2
// NMI state
nmi_pending: bool,
// Memory
memory: Vec<u8>,
}
The alternate registers (A’, F’, B’, C’, etc.) can be swapped with the main set using the EXX and EX AF,AF’ instructions. This enables blazing-fast context switches—interrupt handlers can swap to alternate registers, do their work, swap back, and leave the main program’s state completely untouched. No pushing, no popping.
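In an emulator the exchange instructions reduce to a handful of swaps. A minimal sketch, using a stripped-down RegisterFile with the same field names as the struct above; the method names exx and ex_af_af are assumptions:

```rust
/// Only the 8-bit main and alternate registers; everything else is omitted.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RegisterFile {
    pub a: u8, pub f: u8,
    pub b: u8, pub c: u8, pub d: u8, pub e: u8, pub h: u8, pub l: u8,
    pub a_alt: u8, pub f_alt: u8,
    pub b_alt: u8, pub c_alt: u8, pub d_alt: u8, pub e_alt: u8,
    pub h_alt: u8, pub l_alt: u8,
}

impl RegisterFile {
    /// EXX: exchange BC, DE, and HL with their alternates. A and F are untouched.
    pub fn exx(&mut self) {
        use std::mem::swap;
        swap(&mut self.b, &mut self.b_alt);
        swap(&mut self.c, &mut self.c_alt);
        swap(&mut self.d, &mut self.d_alt);
        swap(&mut self.e, &mut self.e_alt);
        swap(&mut self.h, &mut self.h_alt);
        swap(&mut self.l, &mut self.l_alt);
    }

    /// EX AF,AF': exchange the accumulator and flags with their alternates.
    pub fn ex_af_af(&mut self) {
        use std::mem::swap;
        swap(&mut self.a, &mut self.a_alt);
        swap(&mut self.f, &mut self.f_alt);
    }
}
```

An interrupt handler that wants a full private register set issues both instructions on entry and both again on exit. Note that there is only one alternate bank, so this trick does not nest.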
IX and IY: Indexed Addressing
The Z80’s IX and IY registers enable indexed addressing with displacement:
// LD A,(IX+d) - Load A from memory at IX + signed displacement
fn ld_a_ix_d(&mut self) {
let displacement = self.fetch_byte() as i8;
let addr = self.ix.wrapping_add(displacement as u16);
self.a = self.read_byte(addr);
}
This is transformative for structured data. Instead of manually calculating offsets, you point IX or IY at a data structure and access fields with (IX+0), (IX+1), etc. High-level language compilers loved this—it mapped directly to record and array access.
The Three Interrupt Modes
The Z80 has three interrupt modes, selected by IM 0, IM 1, and IM 2 instructions:
fn handle_maskable_interrupt(&mut self) {
if !self.iff1 {
return;
}
self.iff1 = false;
self.iff2 = false;
self.halted = false;
match self.interrupt_mode {
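0 => {
// Mode 0: Execute instruction on data bus (8080 compatible)
// Usually an RST opcode (11 nnn 111); bits 3-5 select the vector.
// Assumes execute_rst takes the vector number 0-7, as in the 8080 core.
let vector = (self.data_bus_value >> 3) & 0x07;
self.execute_rst(vector);
}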
1 => {
// Mode 1: RST 38h (simplest mode)
self.push_word(self.pc);
self.pc = 0x0038;
}
2 => {
// Mode 2: Vectored interrupts via I register
let vector_addr = ((self.i as u16) << 8) | (self.data_bus_value as u16);
self.push_word(self.pc);
self.pc = self.read_word(vector_addr);
}
_ => panic!("Invalid interrupt mode")
}
}
Mode 2 is the magic one. The I register provides the high byte of a vector table address; the interrupting device provides the low byte. This means you can have up to 128 different interrupt handlers, each peripheral getting its own vector. The ZX Spectrum used Mode 1 (simple but inflexible); CP/M machines often used Mode 2.
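The Mode 2 lookup is a two-step indirection, sketched here as a pure function over a flat memory image. The name im2_handler is hypothetical, and the memory slice is assumed to cover the whole 64KB address space.

```rust
/// Resolve a Mode 2 interrupt: I supplies the high byte of the table address,
/// the interrupting device supplies the low byte, and the 16-bit handler
/// address is read little-endian from that table entry.
pub fn im2_handler(memory: &[u8], i_reg: u8, bus: u8) -> u16 {
    let entry = ((i_reg as usize) << 8) | (bus as usize);
    let lo = memory[entry] as u16;
    let hi = memory[(entry + 1) & 0xFFFF] as u16; // wrap at the top of memory
    (hi << 8) | lo
}
```

With I = 0x80 and a device that supplies 0x10, the CPU reads the handler address from 0x8010 and 0x8011. Devices normally supply even values so that table entries do not overlap, which is where the limit of 128 handlers comes from.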
Non-Maskable Interrupts
The Z80’s NMI (Non-Maskable Interrupt) always gets through—it can’t be disabled. When triggered, it copies IFF1 to IFF2 (preserving the interrupt enable state), disables interrupts, and jumps to address 0x0066:
fn handle_nmi(&mut self) {
if !self.nmi_pending {
return;
}
self.nmi_pending = false;
self.halted = false;
// Save interrupt state
self.iff2 = self.iff1;
self.iff1 = false;
// Jump to NMI handler
self.push_word(self.pc);
self.pc = 0x0066;
}
// RETN - Return from NMI
fn retn(&mut self) {
self.pc = self.pop_word();
self.iff1 = self.iff2; // Restore interrupt state
}
The RETN instruction (return from NMI) restores IFF1 from IFF2, re-enabling interrupts if they were enabled before the NMI. This subtle state preservation made the Z80 much more reliable for real-time systems than the 8080.
MOS 6502 (1975): The Apple and Commodore CPU
The 6502 took a completely different approach from Intel’s chips. While the 8080 emphasized register-to-register operations, the 6502 embraced a minimalist accumulator architecture with extremely efficient memory addressing. It was cheaper to manufacture, faster at memory operations, and (itself or in a close variant) became the heart of the Apple II, Commodore 64, Atari 2600, and NES.
The Minimalist Register Set
pub struct MOS6502 {
// Main registers
a: u8, // Accumulator
x: u8, // X index
y: u8, // Y index
sp: u8, // Stack pointer (page 1 only: 0x0100-0x01FF)
pc: u16, // Program counter
// Processor status
p: u8, // Flags: NV-BDIZC
// Memory
memory: Vec<u8>,
// Interrupt state
irq_pending: bool,
nmi_pending: bool,
nmi_edge_detected: bool,
}
Three registers. That’s it. The 6502 makes up for this with thirteen addressing modes that let you access memory in every conceivable way. Where the 8080 might MOV A,M to load through HL, the 6502 can do:
- Zero page: LDA $42 (fast, single-byte address)
- Zero page indexed: LDA $42,X (base + X)
- Absolute: LDA $1234 (full 16-bit address)
- Absolute indexed: LDA $1234,X or LDA $1234,Y
- Indirect: JMP ($1234) (pointer dereference)
- Indexed indirect: LDA ($42,X) (table of pointers)
- Indirect indexed: LDA ($42),Y (pointer + offset)
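The last two modes in that list are the ones people mix up, so here is a sketch of how each computes its effective address. The function names are illustrative and memory is assumed to be a flat 64KB slice. In both modes the pointer fetch wraps within page zero.

```rust
/// Indexed indirect, LDA ($zp,X): add X to the zero-page address first
/// (wrapping within page zero), then read the 16-bit pointer stored there.
pub fn indexed_indirect(mem: &[u8], zp: u8, x: u8) -> u16 {
    let ptr = zp.wrapping_add(x);
    let lo = mem[ptr as usize] as u16;
    let hi = mem[ptr.wrapping_add(1) as usize] as u16;
    (hi << 8) | lo
}

/// Indirect indexed, LDA ($zp),Y: read the 16-bit pointer from zero page
/// first, then add Y to the pointer value.
pub fn indirect_indexed(mem: &[u8], zp: u8, y: u8) -> u16 {
    let lo = mem[zp as usize] as u16;
    let hi = mem[zp.wrapping_add(1) as usize] as u16;
    ((hi << 8) | lo).wrapping_add(y as u16)
}
```

With the pointer 0x2000 stored at $42, LDA ($42),Y with Y = 5 reads from 0x2005, which is how 6502 code walks a buffer. LDA ($42,X) with X = 4 instead picks the pointer stored at $46, which suits a table of pointers.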
Zero Page: The Fast Lane
The first 256 bytes of memory (0x00-0xFF) are special on the 6502. Instructions that reference zero page use single-byte addresses, saving a cycle and a byte compared to full 16-bit addresses:
fn lda_zp(&mut self) -> u32 {
let addr = self.fetch_byte() as u16;
self.a = self.read_byte(addr);
self.update_nz(self.a);
3 // 3 cycles for zero page
}
fn lda_abs(&mut self) -> u32 {
let addr = self.fetch_word();
self.a = self.read_byte(addr);
self.update_nz(self.a);
4 // 4 cycles for absolute
}
Smart 6502 programmers used zero page as an extension of the register file—frequently accessed variables lived there, turning slow memory accesses into fast pseudo-register operations.
The JMP Indirect Bug
The 6502 has a famous hardware bug in its JMP (addr) instruction. If the indirect address falls on a page boundary (e.g., 0x12FF), the high byte is fetched from 0x1200 instead of 0x1300:
fn jmp_indirect(&mut self) -> u32 {
let ptr = self.fetch_word();
// Famous 6502 bug: doesn't cross page boundary correctly
let low = self.read_byte(ptr);
let high_addr = if (ptr & 0xFF) == 0xFF {
ptr & 0xFF00 // Wrap within page (bug!)
} else {
ptr.wrapping_add(1)
};
let high = self.read_byte(high_addr);
self.pc = ((high as u16) << 8) | (low as u16);
5
}
This bug exists in every NMOS 6502 ever made. Some software actually depends on it. The emulator faithfully replicates the bug, because “correct” behavior would break programs that expect the incorrect behavior.
(The 65C02, a later CMOS version, fixed this bug. But we’re emulating the original NMOS 6502, bugs and all.)
Page Crossing Penalties
Some addressing modes take an extra cycle if the calculated address crosses a page boundary:
fn lda_abs_y(&mut self) -> u32 {
let base = self.fetch_word();
let addr = base.wrapping_add(self.y as u16);
self.a = self.read_byte(addr);
self.update_nz(self.a);
// Page crossing adds a cycle
if (base & 0xFF00) != (addr & 0xFF00) {
5
} else {
4
}
}
This affects performance tuning on the 6502. Careful programmers aligned data structures to avoid page crossings in hot loops. The cycle counts in our emulator track this precisely, which matters for anything timing-critical—like video display code that has to finish before the beam reaches the next scanline.
Decimal Mode
The 6502 has a decimal mode flag (D) that makes ADC and SBC operate in BCD. Unlike the 8080’s DAA, which adjusts after the fact, the 6502’s decimal mode changes how the addition itself works:
fn adc_decimal(&mut self, val: u8) {
let mut low = (self.a & 0x0F) + (val & 0x0F) + self.carry_val();
let mut high = (self.a >> 4) + (val >> 4);
if low > 9 {
low -= 10;
high += 1;
}
if high > 9 {
high -= 10;
self.set_flag(FLAG_C, true);
} else {
self.set_flag(FLAG_C, false);
}
self.a = ((high << 4) | (low & 0x0F)) as u8;
self.update_nz(self.a);
}
(The NES’s 6502 variant, the 2A03, has the decimal mode flag but it doesn’t actually do anything. The decimal logic was disabled in that chip, a change usually attributed to sidestepping MOS’s patent rather than to saving die space. Our emulator can be configured for either behavior.)
Complete Opcode Coverage
The 6502 has 256 possible opcodes. Of those, 151 are documented (“official”). The remaining 105 are undocumented opcodes—they do something, because the silicon doesn’t have a “this opcode is invalid” exception. Some are useful (like LAX, which loads A and X simultaneously), some are unpredictable, and some halt the CPU entirely.
// Complete match covers all 256 opcodes
match opcode {
0x00 => self.brk(),
0x01 => self.ora_ix(),
0x02 => self.jam(), // Undocumented: halts CPU
0x03 => self.slo_ix(), // Undocumented: ASL + ORA
// ... all 256 cases ...
0xFF => self.isb_abs_x(), // Undocumented: INC + SBC
}
Our implementation covers all 256 opcodes. The compiler confirms complete coverage—there’s no default case because every opcode is explicitly handled. This matters because some commercial games (and a lot of demoscene code) deliberately use undocumented opcodes for performance or copy protection.
WebAssembly Compilation
All five CPU cores compile to WebAssembly using wasm-pack:
# Build all CPU cores for WASM (each build runs in a subshell so the cd does not carry over)
(cd cores/intel-8008 && wasm-pack build --target web)
(cd cores/intel-8080 && wasm-pack build --target web)
(cd cores/intel-8088 && wasm-pack build --target web)
(cd cores/z80 && wasm-pack build --target web)
(cd cores/mos-6502 && wasm-pack build --target web)
The WASM binaries are surprisingly compact:
| CPU | WASM Size | Opcodes | Addressing Modes |
|---|---|---|---|
| Intel 8008 | 42 KB | 66 | 3 |
| Intel 8080 | 58 KB | 244 | 7 |
| Intel 8088 | 127 KB | 300+ | 8 |
| Z80 | 89 KB | 693 | 11 |
| MOS 6502 | 67 KB | 256 | 13 |
The 8088 is largest because of the complex instruction decoding. The Z80 has the most opcodes (including all the IX/IY prefix variations), but the encoding is regular enough that the code compiles smaller than you’d expect.
Memory Management
Each CPU allocates its memory as a Vec<u8> in Rust, which lives inside the module’s WASM linear memory. The JavaScript side can access this memory directly for display rendering and peripheral I/O:
// Get pointer to CPU memory from WASM
const memoryPtr = cpu.get_memory_ptr();
const memoryLen = cpu.get_memory_len();
// Create typed array view into WASM memory
const cpuMemory = new Uint8Array(
wasmInstance.exports.memory.buffer,
memoryPtr,
memoryLen
);
// Now we can read/write CPU memory from JavaScript.
// subarray() returns a view on the same buffer; slice() would copy it.
const videoRam = cpuMemory.subarray(0x8000, 0x8000 + 0x400);
This zero-copy sharing between WASM and JavaScript is crucial for performance. Video rendering can read directly from the emulated CPU’s memory without copying data across the boundary.
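On the Rust side, the two accessors the JavaScript calls could look roughly like this. The method names come from the snippet above, but their bodies are an assumption, as is the CpuCore type. In a real build the impl would carry the #[wasm_bindgen] attribute, which is left out here so the sketch compiles without the crate.

```rust
/// Hypothetical core that owns its emulated memory as a Vec<u8>.
pub struct CpuCore {
    memory: Vec<u8>,
}

impl CpuCore {
    pub fn new(size: usize) -> Self {
        CpuCore { memory: vec![0; size] }
    }

    /// Address of the first emulated byte. Under WASM this is an offset into
    /// linear memory, which is what the Uint8Array constructor needs.
    pub fn get_memory_ptr(&self) -> *const u8 {
        self.memory.as_ptr()
    }

    /// Number of emulated bytes behind the pointer.
    pub fn get_memory_len(&self) -> usize {
        self.memory.len()
    }

    pub fn write_memory(&mut self, addr: u32, val: u8) {
        self.memory[addr as usize] = val;
    }
}
```

The pointer stays valid only while the Vec is neither reallocated nor dropped, so the buffer should be sized once at construction. On the JavaScript side, the typed-array view also has to be recreated if WASM linear memory ever grows, because growth detaches the old ArrayBuffer.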
Lessons Learned
Building five CPU emulators taught me a few things:
1. Flags are harder than instructions. Getting the instruction logic right is usually straightforward—you read the spec and implement what it says. But flag behavior has subtle edge cases everywhere. Does a rotate instruction affect the zero flag? (Depends on the CPU.) Does an increment affect carry? (Usually not, but check.) Does DAA set all flags or just some? The flag tests took longer to write than the instruction tests.
2. Cycle counts matter. An emulator that doesn’t track timing is a simulator, not an emulator. Real software depends on exact timing: video drivers that race the beam, floppy controllers that expect precise bit timing, music routines that count cycles for tempo. Getting cycle counts wrong breaks a surprising amount of software.
3. The manuals lie (sometimes). Intel’s 8080 manual has errors. The 6502 documentation contradicts itself. The Z80 manual doesn’t fully describe undocumented opcodes (obviously). Cross-reference multiple sources, write test cases, and trust actual behavior over documentation when they conflict.
4. Edge cases are the whole game. The JMP indirect bug. The DAA flag behavior. Page-crossing penalties. Half-carry propagation. These aren’t edge cases—they’re the cases. Simple instruction execution is easy; perfect emulation is endless debugging.
5. Rust is excellent for this. Fixed-width integer types and overflow checks in debug builds catch wraparound and width mix-ups early. Pattern matching makes instruction decoding clean. No GC pauses during emulation. And WASM compilation just works.
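Lesson 1 is easy to illustrate with the 8080’s INR instruction, which updates Sign, Zero, Auxiliary Carry, and Parity but leaves Carry alone even when the register wraps from 0xFF to 0x00. The sketch below uses the 8080’s flag bit positions; the free function inr is illustrative, and the always-set bit 1 of the flag byte is ignored for brevity.

```rust
// 8080 flag bit positions within the flag byte.
pub const FLAG_C: u8 = 0x01;
pub const FLAG_P: u8 = 0x04;
pub const FLAG_AC: u8 = 0x10;
pub const FLAG_Z: u8 = 0x40;
pub const FLAG_S: u8 = 0x80;

/// 8080 INR: returns the incremented value and the new flag byte.
/// S, Z, AC, and P are recomputed; C is carried over unchanged.
pub fn inr(val: u8, flags: u8) -> (u8, u8) {
    let result = val.wrapping_add(1);
    let mut f = flags & FLAG_C; // keep only the incoming carry
    if (result & 0x80) != 0 {
        f |= FLAG_S;
    }
    if result == 0 {
        f |= FLAG_Z;
    }
    if (result & 0x0F) == 0 {
        f |= FLAG_AC; // low nibble rolled over, so bit 3 carried out
    }
    if result.count_ones() % 2 == 0 {
        f |= FLAG_P; // even parity
    }
    (result, f)
}
```

Incrementing 0xFF with carry clear yields 0x00 with Zero, Auxiliary Carry, and Parity set, and Carry still clear. A test that only checks the result byte would never notice an implementation that set Carry here.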
References and Further Reading
If you want to go deeper on any of these CPUs, I maintain a collection of original documentation at emulator.ca:
- Intel 8008: MCS-8 User’s Manual (1972)
- Intel 8080: Intel 8080 Microcomputer Systems User’s Manual (1975)
- Intel 8086/8088: iAPX 86/88 User’s Manual (1979)
- Zilog Z80: Z80 CPU User’s Manual (1976)
- MOS 6502: MOS 6500 Series Hardware Manual (1976)
For testing, I used:
- 8080: Kelly Smith’s test suite and Microcosm’s exerciser
- Z80: ZEXALL and ZEXDOC
- 6502: Klaus Dormann’s functional tests
Related Articles
- Deep Dive: The Z-Machine Interpreter — Another kind of emulation: Infocom’s virtual machine
- Deep Dive: KCS Cassette Storage — How the 6502-era stored programs on tape
- 2026-02-01: Server Backends and Peripherals — The Calculator-8008 peripheral that runs a real 8008
Five chips, five decades of computing history. The 8008 taught us what microprocessors could be. The 8080 proved they could replace minicomputers. The 8086 scaled them up. The Z80 refined them. And the 6502 made them cheap enough for everyone. Building their emulators is building a museum—one that still runs the original code.