Deep Dive: The Z-Machine Interpreter

The bug that cost me three hours was a single byte. Branch offsets in Z-Machine bytecode are measured from after the branch data, not from the opcode, and I kept measuring from the wrong place. When I finally logged the PC values before and after each branch, the pattern became obvious: every backward jump landed two bytes early. One line fixed it, but finding that line meant understanding how Infocom packed an entire virtual machine into constraints I would never choose today.

In this project I built a transpiler in Rust that converts Z-Machine bytecode into ZIL AST, then runs that AST through an existing ZIL interpreter. It is an unusual route—most Z-Machine implementations interpret bytecode directly—but it buys clarity: I can dump the generated ZIL and see exactly what a story file is doing, instruction by instruction.

Opcode Shapes: The Decoder Is the Choke Point

Z-Machine instruction decoding is hard not because there are many opcodes, but because the opcode shape is encoded into the same byte as the opcode itself. The first byte decides whether this is OP0, OP1, OP2, or VAR, and that decision controls how many operands follow and how to read them.

The rule set:

0xxxxxxx → long form (OP2), operand types encoded in bits 5-6
10ttnnnn → short form (OP1 or OP0 depending on operand type bits)
11axxxxx → variable form, types byte follows (a = 1: VAR; a = 0: a 2OP opcode written with a types byte)
0xBE → extended (V5+ only), opcode number in the next byte

In Rust, this becomes a pure function that I can test in isolation:

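fn decode_opcode_type(byte: u8) -> (OpcodeType, u8) {
    if byte == 0xBE {
        // Extended form (V5+): the real opcode number lives in the NEXT byte,
        // so the caller must read it; `byte` is returned only as a marker.
        (OpcodeType::EXT, byte)
    } else if byte & 0xC0 == 0xC0 {
        // Variable form: 11axxxxx, always followed by a types byte.
        // Bit 5 picks the opcode table: set = VAR, clear = 2OP
        // (for example, 0xC1 is je written with a types byte).
        if byte & 0x20 != 0 {
            (OpcodeType::VAR, byte & 0x1F)
        } else {
            (OpcodeType::OP2, byte & 0x1F)
        }
    } else if byte & 0xC0 == 0x80 {
        // Short form: 10ttnnnn, where tt is the operand type
        let op_type = (byte >> 4) & 0x03;
        if op_type == 0x03 {
            // Type 11 means "operand omitted", so this is a 0OP instruction
            (OpcodeType::OP0, byte & 0x0F)
        } else {
            (OpcodeType::OP1, byte & 0x0F)
        }
    } else {
        // Long form: 0abxxxxx, always 2OP; bits 6 and 5 give the operand types
        (OpcodeType::OP2, byte & 0x1F)
    }
}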

I keep this logic pure because every downstream structure depends on it. If je and jl swap places because of a mask error, the game doesn’t crash—it just quietly lies. The failure mode is silence, which makes testing essential.
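The "types byte follows" rule for variable form can be sketched the same way. This is a minimal illustration, not the code in opcodes.rs: the names `OperandType` and `decode_types_byte` are hypothetical. It assumes the layout from the Z-Machine Standard, where the types byte holds four 2-bit fields read from the top down, and the first "omitted" field (binary 11) ends the operand list.

```rust
/// Operand kinds encoded in a types byte (hypothetical names for illustration).
#[derive(Debug, PartialEq, Clone, Copy)]
enum OperandType {
    Large,    // 00: two-byte constant
    Small,    // 01: one-byte constant
    Variable, // 10: one-byte variable number
}

/// Read up to four operand types from a variable-form types byte.
/// Fields are taken from bits 7-6 downward; 11 means "omitted" and stops the scan.
fn decode_types_byte(types: u8) -> Vec<OperandType> {
    let mut out = Vec::new();
    for shift in [6u8, 4, 2, 0] {
        match (types >> shift) & 0x03 {
            0 => out.push(OperandType::Large),
            1 => out.push(OperandType::Small),
            2 => out.push(OperandType::Variable),
            _ => break, // 11: no more operands
        }
    }
    out
}
```

Like the opcode classifier, this is a pure function of one byte, so a table of known bytes and expected operand lists is enough to test it.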

Z-Strings: Three Characters per Word

Infocom’s text encoding packs three 5-bit Z-characters into a 16-bit word, with the high bit marking the final word. That single bit is the terminator, so a decoder must read words, unpack Z-characters, and stop exactly when the high bit appears.
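A rough sketch of that unpacking step, assuming the story file is held as a byte slice and words are big-endian (as they are throughout the Z-Machine). The function name `unpack_zchars` is hypothetical and is not taken from text.rs.

```rust
/// Unpack Z-characters from story bytes starting at `addr`.
/// Returns the 5-bit Z-characters plus the number of bytes consumed,
/// or None if the data runs out before a word with the high bit set.
fn unpack_zchars(mem: &[u8], addr: usize) -> Option<(Vec<u8>, usize)> {
    let mut zchars = Vec::new();
    let mut pos = addr;
    loop {
        let hi = *mem.get(pos)?;
        let lo = *mem.get(pos + 1)?;
        let word = u16::from_be_bytes([hi, lo]);
        pos += 2;
        // Three 5-bit fields, most significant first
        zchars.push(((word >> 10) & 0x1F) as u8);
        zchars.push(((word >> 5) & 0x1F) as u8);
        zchars.push((word & 0x1F) as u8);
        // The high bit marks the final word of the string
        if word & 0x8000 != 0 {
            return Some((zchars, pos - addr));
        }
    }
}
```

Returning the byte count alongside the characters means the same routine can answer "where does the next instruction start?" after inline text.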

The alphabets are tiny and the shifts are transient:

A0 (default): lowercase a–z
A1 (shift 4): uppercase A–Z
A2 (shift 5): punctuation, digits, special characters

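Abbreviations add recursion: in V3 and later, a Z-character z in the 1–3 range, together with the Z-character x that follows it, selects entry 32(z−1)+x in a separate abbreviations table, and that entry decodes as another Z-string. Compression becomes recursive decompression, though only one level deep, because the Z-Machine Standard forbids abbreviation strings from using abbreviations themselves.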
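The shift and abbreviation rules can be sketched as a small state machine over already-unpacked Z-characters. This is a simplified illustration for V3 with the default alphabet tables. The `lookup_abbrev` callback is a hypothetical stand-in for decoding an abbreviation table entry, and 10-bit ZSCII codes are treated as Unicode code points, which only holds for the ASCII range.

```rust
const A0: &[u8; 26] = b"abcdefghijklmnopqrstuvwxyz";
const A1: &[u8; 26] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ";
// In A2, Z-char 6 is the ZSCII escape and Z-char 7 is newline;
// the first table slot is a placeholder that is never printed.
const A2: &[u8; 26] = b" \n0123456789.,!?_#'\"/\\-:()";

/// Decode unpacked Z-characters to text (V3 rules, default alphabets).
/// `lookup_abbrev` is a hypothetical hook that returns abbreviation text by index.
fn decode_zchars(zchars: &[u8], lookup_abbrev: &dyn Fn(usize) -> String) -> String {
    let mut out = String::new();
    let mut alphabet = 0u8; // a shift lasts for exactly one character
    let mut i = 0;
    while i < zchars.len() {
        let z = zchars[i] & 0x1F;
        i += 1;
        match z {
            0 => out.push(' '),
            1..=3 => {
                // Abbreviation: the next Z-character completes the table index
                if let Some(&x) = zchars.get(i) {
                    i += 1;
                    let index = 32 * (z as usize - 1) + (x & 0x1F) as usize;
                    out.push_str(&lookup_abbrev(index));
                }
            }
            4 => { alphabet = 1; continue; }
            5 => { alphabet = 2; continue; }
            6 if alphabet == 2 => {
                // Escape: the next two Z-characters form a 10-bit ZSCII code
                if i + 1 < zchars.len() {
                    let code = (((zchars[i] & 0x1F) as u32) << 5) | (zchars[i + 1] & 0x1F) as u32;
                    i += 2;
                    if let Some(c) = char::from_u32(code) {
                        out.push(c);
                    }
                }
            }
            _ => {
                let table = match alphabet { 1 => A1, 2 => A2, _ => A0 };
                out.push(table[(z - 6) as usize] as char);
            }
        }
        alphabet = 0; // transient shift: revert after one character
    }
    out
}
```

Passing the abbreviation lookup as a callback keeps the decoder free of any knowledge about where the abbreviations table lives in memory.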

The failure mode here is subtle. A Z-string carries no length prefix, so if you try to “skip” one without walking it (say, to find the next instruction boundary) you can miscalculate badly. The only way to find its end is to read each word in sequence until the terminator bit appears. I learned this the hard way when my disassembler started misaligning after every inline PRINT instruction.

Branch Offsets and the Off-by-Two Trap

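Branch data is one or two bytes. Bit 7 of the first byte gives the sense of the branch (taken when the condition is true, or when it is false). If bit 6 is set, the offset is an unsigned 6-bit value held in that single byte; if bit 6 is clear, the offset is a signed 14-bit value spread across both bytes. The target is the address after the branch data, plus the offset, minus two. It is not measured from the opcode. That single word, “after”, is enough to make backward loops land two bytes early if you get it wrong.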

The sign extension is the other trap:

// Two-byte offset: 14 bits, signed
let unsigned = (((branch_byte & 0x3F) as u16) << 8) | (low_byte as u16);
let signed = if unsigned & 0x2000 != 0 {
    (unsigned as i16) | !0x3FFF  // Sign-extend from 14 bits
} else {
    unsigned as i16
};

Special cases add more complexity: offset 0 means “return false” and offset 1 means “return true.” These aren’t jumps at all—they’re early exits compressed into the branch encoding.
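Putting the pieces together, a complete branch decoder might look like the sketch below. The `Branch` and `BranchTarget` types and the `decode_branch` function are hypothetical names for illustration, not the transpiler's own. The bit layout and the "address after branch data + offset − 2" formula follow the Z-Machine Standard.

```rust
/// Where a taken branch goes (hypothetical types for illustration).
#[derive(Debug, PartialEq)]
enum BranchTarget {
    ReturnFalse, // offset 0
    ReturnTrue,  // offset 1
    Jump(usize), // absolute address in the story file
}

#[derive(Debug, PartialEq)]
struct Branch {
    on_true: bool, // bit 7: branch when the condition is true
    target: BranchTarget,
    len: usize,    // bytes of branch data consumed (1 or 2)
}

/// Decode the branch data that starts at `addr`.
fn decode_branch(mem: &[u8], addr: usize) -> Option<Branch> {
    let first = *mem.get(addr)?;
    let on_true = first & 0x80 != 0;
    let (offset, len): (i64, usize) = if first & 0x40 != 0 {
        // Bit 6 set: single byte, unsigned 6-bit offset
        ((first & 0x3F) as i64, 1)
    } else {
        // Bit 6 clear: two bytes, signed 14-bit offset
        let low = *mem.get(addr + 1)?;
        let raw = (((first & 0x3F) as u16) << 8) | low as u16;
        // Shift the sign bit up to bit 15, then shift back arithmetically
        let signed = ((raw << 2) as i16) >> 2;
        (signed as i64, 2)
    };
    let target = match offset {
        0 => BranchTarget::ReturnFalse,
        1 => BranchTarget::ReturnTrue,
        _ => {
            // Measured from the address AFTER the branch data, minus two
            let dest = (addr + len) as i64 + offset - 2;
            if dest < 0 {
                return None;
            }
            BranchTarget::Jump(dest as usize)
        }
    };
    Some(Branch { on_true, target, len })
}
```

Making "return false" and "return true" their own enum variants keeps the special cases from ever being mistaken for jumps to address 0 or 1.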

Once I named these rules explicitly in the code, the off-by-two bugs disappeared. The naming was the fix.

Variables: Three Storage Classes

The storage model is compact and fixed:

Variable	Storage
0	Evaluation stack (push/pop)
1–15	Locals (per-routine, up to 15)
16–255	Globals (240 words; table address stored in the header)

The interpreter must know which class a variable refers to at decode time to emit the right ZIL symbol or stack operation. In the transpiler, I map these to ZIL prefixes: stack becomes STACK, locals get a period prefix (.L1), and globals get a comma prefix (,G0).

This is another small rule that quietly drives everything else. Get it wrong and variables leak between routines, or globals overwrite locals, and the only symptom is a game that “feels buggy.”
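The mapping from variable number to ZIL operand fits in one exhaustive match. This is a sketch of the naming scheme described above; the function name `zil_operand` is hypothetical, and the choice to number globals from zero (variable 16 becomes ,G0) is inferred from the ,G0 example.

```rust
/// Map a Z-Machine variable number to the ZIL operand text described above.
/// (Hypothetical helper; the naming follows the STACK / .L1 / ,G0 examples.)
fn zil_operand(var: u8) -> String {
    match var {
        0 => "STACK".to_string(),              // evaluation stack
        1..=15 => format!(".L{}", var),        // locals, numbered from 1
        16..=255 => format!(",G{}", var - 16), // globals, numbered from 0
    }
}
```

Because the match covers every `u8` value, the compiler rejects any edit that leaves a gap between the three storage classes.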

Save/Restore as a Text Protocol

The Z-Machine spec expects save and restore to be portable across platforms, but I didn’t want save/restore logic inside the core interpreter. The solution is a text protocol: the interpreter serializes state to JSON, writes it into the output buffer between ##SAVE_STATE## and ##END_SAVE## markers, and the web worker intercepts those markers to perform the actual file operation.

"SAVE" => {
    match self.serialize_state() {
        Ok(json) => {
            self.output_buffer
                .push_str(&format!("##SAVE_STATE##{}##END_SAVE##\r\n", json));
            Ok(EvalResult::Value(Value::Bool(true)))
        }
        Err(e) => {
            self.output_buffer
                .push_str(&format!("?SAVE ERROR: {}\r\n", e));
            Ok(EvalResult::Value(Value::Bool(false)))
        }
    }
}

It’s ugly but clean: the interpreter stays pure (no filesystem access), the UI stays in charge of storage, and the only contract is a predictable marker string. Restore works the same way—##RESTORE_REQUEST##—and the worker handles the file picker and state injection.
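The consumer side of the contract can be sketched as a marker scan. The real worker presumably runs in JavaScript; this is written in Rust only to match the rest of the post, and `split_save_payload` is a hypothetical name. It assumes the serialized JSON never contains the end marker itself.

```rust
const SAVE_START: &str = "##SAVE_STATE##";
const SAVE_END: &str = "##END_SAVE##";

/// Split interpreter output into (text to display, save payload).
/// Returns None when the output holds no complete save block.
fn split_save_payload(output: &str) -> Option<(String, &str)> {
    let start = output.find(SAVE_START)?;
    let body_start = start + SAVE_START.len();
    // Search for the end marker only after the start marker
    let body_end = body_start + output[body_start..].find(SAVE_END)?;
    let payload = &output[body_start..body_end];

    // Rebuild the visible text without the marker block and its trailing CRLF
    let rest = &output[body_end + SAVE_END.len()..];
    let mut visible = String::with_capacity(output.len());
    visible.push_str(&output[..start]);
    visible.push_str(rest.strip_prefix("\r\n").unwrap_or(rest));
    Some((visible, payload))
}
```

Whatever handles storage only needs the payload string, so the interpreter and the UI can change independently as long as the two markers stay fixed.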

Versions and Constraints

Eight Z-Machine versions exist. This transpiler focuses on V3 (the Infocom era, 1983–1987) and keeps the decoder structure ready for V5+. The constraints matter more than the version numbers:

Story files are capped at 128K for V1–V3, 256K for V4–V5, and 512K for V6–V8
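The header stores file length divided by a constant: multiply the stored word by 2 (V1–V3), 4 (V4–V5), or 8 (V6–V8)
Packed addresses squeeze routine and string pointers into 16 bits: the multiplier is 2 in V1–V3, 4 in V4–V5 (V6–V7 also add an offset from the header), and 8 in V8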
The instruction set is tuned for text: a small core, dense encodings, an expectation of tiny memory
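The version-dependent multipliers can be sketched as two small functions. The names are hypothetical, and the rules follow the Z-Machine Standard: V6 and V7 add eight times a routine offset taken from the header, which the caller passes in here (other versions ignore it).

```rust
/// Convert the header's file-length word to a length in bytes.
fn file_length(version: u8, header_word: u16) -> Option<u32> {
    let multiplier = match version {
        1..=3 => 2,
        4..=5 => 4,
        6..=8 => 8,
        _ => return None, // not a known Z-Machine version
    };
    Some(header_word as u32 * multiplier)
}

/// Convert a packed routine address to a byte address.
/// `routine_offset` is the header's routine offset word, used only by V6 and V7.
fn unpack_routine_address(version: u8, packed: u16, routine_offset: u16) -> Option<u32> {
    let p = packed as u32;
    match version {
        1..=3 => Some(2 * p),
        4..=5 => Some(4 * p),
        6..=7 => Some(4 * p + 8 * routine_offset as u32),
        8 => Some(8 * p),
        _ => None,
    }
}
```

The size caps fall out of the arithmetic: a 16-bit packed address times 2 reaches just under 128K, times 4 just under 256K, and times 8 just under 512K.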

If you internalize those constraints, the rest becomes bookkeeping. Every “weird” design choice—the packed addresses, the 5-bit text encoding, the 14-bit branch offsets—makes sense when you remember these games shipped on 48K machines with floppy drives.

The Z-Machine is a time capsule that still executes. Forty years after Zork, the same bytecode runs in browsers, on phones, on hardware that Infocom couldn’t have imagined. Building this transpiler forced me to stop fighting the old constraints and start understanding them—to see the 128K limit not as a limitation but as the organizing principle that shaped every other decision.

The transpiler isn’t the most efficient way to run Z-Machine code. Interpreting bytecode directly would be faster. But efficiency wasn’t the goal. The goal was to see the system clearly, and for that, translating each instruction to readable ZIL was exactly the right trade-off.

Related files:

languages/zil/zil-wasm/src/zmachine/opcodes.rs — opcode decoder
languages/zil/zil-wasm/src/zmachine/text.rs — Z-string encoding/decoding
languages/zil/zil-wasm/src/zmachine/transpiler.rs — bytecode to ZIL AST
languages/zil/zil-wasm/src/zmachine/story.rs — story file loader and header parsing
languages/zil/zil-wasm/src/zmachine/symbols.rs — symbol table generation