2026 / 01
| 11 min read

Deep Dive: The Z-Machine Interpreter

Building an Infocom Z-Machine interpreter in Rust—instruction decoding, Z-strings, call frames, and understanding what makes Zork tick at the bytecode level.

emulator z-machine infocom rust interpreters

The first time I looked at the Z-Machine specification, I thought “this is going to be straightforward—it’s just a simple virtual machine from 1979.” Two weeks later, I was debugging a signed 14-bit branch offset calculation at 2 AM and questioning my life choices. The Z-Machine is deceptively deep, and every assumption you make about “how hard could it be” will eventually bite you.

But here’s the thing: if you want to understand how Zork works—really understand it, at the bytecode level—you have to build one of these. And once you do, you’ll have a new appreciation for what Infocom accomplished with 128KB of memory.


What Even Is a Z-Machine?

The Z-Machine is a virtual computer designed by Infocom in 1979 for running interactive fiction. The name comes from “Zork,” their first (and most famous) game. The brilliant insight was this: instead of writing Zork for every platform separately, they’d write it once for a fictional machine, then write small interpreters for that fictional machine on every real platform.

Sound familiar? It’s exactly what Java did fifteen years later, what .NET did after that, and what WebAssembly does today. Infocom was doing portable bytecode in the era of cassette tapes.

The Z-Machine has gone through eight versions, from V1 (1979) to V8 (1995). Each version added capabilities—more memory, better text compression, graphics support—but maintained backward compatibility where it could. Our implementation focuses on V3 (the classic Infocom era), with the decoder structure ready for V5 and V8 when we need them.

(V6 is the weird one. It added graphics and mouse support, but only a handful of games ever used it. It’s the Z-Machine equivalent of the Sega CD: technically impressive, commercially irrelevant.)


The Instruction Decoder

The Z-Machine has 119 opcodes in V3, organized by operand count. This sounds tidy until you realize that the encoding is… let’s call it “compact.” The first byte tells you what kind of instruction you’re dealing with, but the rules for decoding that byte are surprisingly intricate.

Here’s how it breaks down:

Byte Pattern    Opcode Type         Operand Count
0xxxxxxx        Long (OP2)          Exactly 2 operands
10ttnnnn        Short (OP1/OP0)     0 or 1 operand
11annnnn        Variable (VAR)      0-8 operands; a = 0 marks an OP2 opcode in variable form
0xBE            Extended (EXT)      V5+ only

The tt bits in the short form tell you the operand type. If tt = 11, there’s no operand (OP0). Otherwise, it’s a single operand whose type is encoded in those bits.

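fn decode_opcode_type(byte: u8) -> (OpcodeType, u8) {
    if byte == 0xBE {
        // Extended opcode (V5+)
        (OpcodeType::EXT, byte)
    } else if byte & 0xC0 == 0xC0 {
        // Variable form: 11axxxxx, opcode number in the low 5 bits.
        // Bit 5 set means a true VAR opcode; bit 5 clear means an OP2
        // opcode whose operands are still described by a types byte.
        if byte & 0x20 != 0 {
            (OpcodeType::VAR, byte & 0x1F)
        } else {
            (OpcodeType::OP2, byte & 0x1F)
        }
    } else if byte & 0xC0 == 0x80 {
        // Short form: 10ttnnnn (OP1 or OP0), opcode number in the low 4 bits
        let op_type = (byte >> 4) & 0x03;
        if op_type == 0x03 {
            // tt = 11: operand omitted
            (OpcodeType::OP0, byte & 0x0F)
        } else {
            (OpcodeType::OP1, byte & 0x0F)
        }
    } else {
        // Long form: 0xxxxxxx (OP2), opcode number in the low 5 bits
        (OpcodeType::OP2, byte & 0x1F)
    }
}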

This is where most Z-Machine bugs hide. Get the bit masking wrong by one position, and je (jump if equal) becomes jl (jump if less than), and your game logic silently breaks. Ask me how I know.

Operand Types

Operands come in three flavors:

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OperandType {
    LargeConst, // 16-bit constant (2 bytes)
    SmallConst, // 8-bit constant (1 byte)
    Variable,   // Variable number (1 byte)
}

For OP2 instructions in long form, the operand types are encoded in bits 6 and 5 of the opcode byte itself—a neat space optimization that means you never waste a byte on a type specifier. For VAR instructions, a separate “types byte” follows the opcode, packing four 2-bit type codes that tell you how to read the next 0-4 operands.

fn decode_operands_var(memory: &[u8], offset: &mut usize) -> Result<Vec<Operand>, String> {
    let types_byte = memory[*offset];
    *offset += 1;

    let mut operands = Vec::new();
    for i in 0..4 {
        let op_type = (types_byte >> (6 - i * 2)) & 0x03;
        match op_type {
            0x00 => {
                // Large constant (16-bit)
                let value = read_word(memory, *offset)?;
                *offset += 2;
                operands.push(Operand::Const(value as i16));
            }
            0x01 => {
                // Small constant (8-bit)
                let value = memory[*offset];
                *offset += 1;
                operands.push(Operand::Const(value as i16));
            }
            0x02 => {
                // Variable reference
                let var = memory[*offset];
                *offset += 1;
                operands.push(Operand::Variable(var));
            }
            0x03 => break, // No more operands
            _ => {}
        }
    }
    Ok(operands)
}

The 0x03 type code means “omitted”—the operand list terminates. This is how VAR opcodes can have anywhere from 0 to 4 operands (or up to 8 in double-VAR form, but let’s not go there today).
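For comparison, the long-form case described earlier needs no types byte at all. Here is a minimal sketch, assuming the same OperandType enum as above; the function name long_form_operand_types is made up for illustration and is not from the project source. Bit 6 types the first operand and bit 5 the second, where a clear bit means small constant and a set bit means variable, so long form can never carry a large constant.

```rust
/// Operand kinds, mirroring the article's `OperandType` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OperandType {
    LargeConst, // 16-bit constant (2 bytes)
    SmallConst, // 8-bit constant (1 byte)
    Variable,   // Variable number (1 byte)
}

/// Hypothetical helper: operand types for a long-form opcode byte.
/// Bit 6 describes the first operand, bit 5 the second.
pub fn long_form_operand_types(opcode_byte: u8) -> [OperandType; 2] {
    let pick = |bit: u8| {
        if opcode_byte & bit != 0 {
            OperandType::Variable
        } else {
            OperandType::SmallConst
        }
    };
    [pick(0x40), pick(0x20)]
}
```

With this, 0x01 decodes as je with two small constants, while 0x41 is the same opcode with a variable as its first operand.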


Z-String Text Encoding

If you’ve ever wondered why old adventure games had such distinctive prose styles—short sentences, limited vocabulary—part of the answer is Z-string compression. Infocom squeezed text into 5 bits per character using a scheme that’s clever, compact, and absolutely maddening to debug.

Here’s the idea: pack three 5-bit “Z-characters” into each 16-bit word. The high bit of the word indicates whether it’s the last word in the string.

Word format: e zzzzz zzzzz zzzzz
             │ └─────────────────── 3 Z-characters (5 bits each)
             └───────────────────── End bit (1 = last word)

The 5-bit Z-characters map to one of three alphabets:

Z-char    A0 (default)    A1 (shift 4)    A2 (shift 5)
0         space           space           space
1-3       abbreviation    abbreviation    abbreviation
4         shift to A1     shift to A2     shift to A0
5         shift to A2     shift to A0     shift to A1
6-31      a-z             A-Z             punctuation + digits

fn lookup_zchar(zchar: u8, alphabet: usize) -> Result<char, String> {
    const A0: &[char] = &[
        ' ', ' ', ' ', ' ', ' ', ' ', // 0-5 (special codes)
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
        'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    ];

    const A1: &[char] = &[
        ' ', ' ', ' ', ' ', ' ', ' ',
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
        'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
    ];

    const A2: &[char] = &[
        ' ', ' ', ' ', ' ', ' ', ' ',
        ' ', '\n', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        '.', ',', '!', '?', '_', '#', '\'', '"', '/', '\\', '-', ':', '(', ')',
    ];

    let alphabet_table = match alphabet {
        0 => A0,
        1 => A1,
        2 => A2,
        _ => return Err(format!("Invalid alphabet: {}", alphabet)),
    };

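    // Z-characters are 5 bits wide; reject anything larger instead of panicking.
    alphabet_table
        .get(zchar as usize)
        .copied()
        .ok_or_else(|| format!("Invalid Z-character: {}", zchar))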
}

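Z-characters 1-3 trigger abbreviation mode: for Z-character z followed by Z-character x, the decoder looks up entry 32(z-1)+x in the abbreviation table, and the abbreviation (itself a Z-string) gets decoded and inserted. Abbreviations may not contain further abbreviations, so the nesting is only one level deep. It’s compression on top of compression.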

(The abbreviation system is why Zork can say “You are standing in an open field west of a white house” without blowing the memory budget. Common phrases get abbreviation slots, and the actual text becomes a sequence of abbreviation references.)
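The lookup itself is a small piece of arithmetic. The sketch below is an assumption about how it could be written, not the project's code (the article's decoder takes a pre-built abbreviations slice instead). It relies on two facts from the Z-Machine standard: the abbreviations table address lives in the header word at 0x18, and each table entry is a word address, so the byte address is twice the stored value. Both function names are hypothetical.

```rust
/// Table index for Z-character `z` (1..=3) followed by Z-character `x` (0..=31).
pub fn abbreviation_index(z: u8, x: u8) -> Option<usize> {
    if (1..=3).contains(&z) && x < 32 {
        Some(32 * (z as usize - 1) + x as usize)
    } else {
        None
    }
}

/// Byte address of the abbreviation's Z-string.
/// `table_addr` is the abbreviations table address (header word at 0x18).
/// Each entry is a big-endian word address; doubling it gives the byte address.
pub fn abbreviation_string_addr(
    memory: &[u8],
    table_addr: usize,
    z: u8,
    x: u8,
) -> Option<usize> {
    let entry = table_addr + 2 * abbreviation_index(z, x)?;
    let hi = *memory.get(entry)? as usize;
    let lo = *memory.get(entry + 1)? as usize;
    Some(((hi << 8) | lo) * 2)
}
```

Returning Option keeps out-of-range indexes and truncated tables from panicking, which matters when the story file is untrusted input.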

The Terminator Gotcha

The string terminator is the high bit of the last word. This seems simple until you realize that you need to decode the string to know where it ends, and you need to know where it ends to know what comes after it (store variable? branch? next instruction?). The decoder has to track this carefully:

pub fn decode_zstring(
    memory: &[u8],
    addr: usize,
    max_bytes: usize,
    abbreviations: &[(usize, String)],
) -> Result<String, String> {
    let mut result = String::new();
    let mut offset = 0;
    let mut alphabet = 0;
    let mut abbrev_mode = 0;

    loop {
        if offset >= max_bytes {
            break;
        }

        let word_addr = addr + offset;
        let word = ((memory[word_addr] as u16) << 8) | (memory[word_addr + 1] as u16);
        offset += 2;

        // Extract 3 Z-characters (5 bits each)
        let zchars = [
            ((word >> 10) & 0x1F) as u8,
            ((word >> 5) & 0x1F) as u8,
            (word & 0x1F) as u8,
        ];

        for zchar in &zchars {
            // ... decode each character ...
        }

        // Check if this is the last word (high bit set)
        if word & 0x8000 != 0 {
            break;
        }
    }

    Ok(result)
}

The 0x8000 mask checks that high bit. Miss it, and your decoder runs off the end of the string into whatever data happens to follow.
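The per-character step elided in the loop above is where the alphabet shifts happen. What follows is a rough sketch of that step for V3 only, written from the standard rather than taken from the project: in V3 a shift (Z-character 4 or 5) applies to the next character only, Z-character 6 in A2 starts a 10-bit ZSCII escape, and Z-character 7 in A2 is a newline. Abbreviation expansion is deliberately skipped, and the name decode_zchars_v3 is hypothetical.

```rust
const A0: &[u8; 26] = b"abcdefghijklmnopqrstuvwxyz";
const A1: &[u8; 26] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ";
// Index 0 (Z-char 6) is a placeholder: in A2 it is the escape code, handled below.
const A2: &[u8; 26] = b" \n0123456789.,!?_#'\"/\\-:()";

/// Sketch: turn a flat list of 5-bit Z-characters into text (V3 rules).
pub fn decode_zchars_v3(zchars: &[u8]) -> String {
    let mut out = String::new();
    let mut pending_shift = 0usize; // alphabet for the next character only
    let mut i = 0;
    while i < zchars.len() {
        let z = zchars[i];
        i += 1;
        // Consume any pending shift and fall back to A0 for later characters.
        let alphabet = std::mem::replace(&mut pending_shift, 0);
        match z {
            0 => out.push(' '),
            1..=3 => i += 1, // abbreviation: skip its index; expansion omitted here
            4 => pending_shift = 1,
            5 => pending_shift = 2,
            6 if alphabet == 2 => {
                // 10-bit ZSCII escape: top 5 bits, then bottom 5 bits.
                if i + 1 >= zchars.len() {
                    break; // incomplete escape at end of string
                }
                let code = ((zchars[i] as u16 & 0x1F) << 5) | (zchars[i + 1] as u16 & 0x1F);
                i += 2;
                if (32..=126).contains(&code) {
                    out.push(code as u8 as char); // printable ASCII subset only
                }
            }
            6..=31 => {
                let table = [A0, A1, A2][alphabet];
                out.push(table[(z - 6) as usize] as char);
            }
            _ => {} // not a 5-bit value; ignore
        }
    }
    out
}
```

Feeding it the sequence 4, 13, 14, 5, 20 yields "Hi!": a shift to A1 for the capital, a plain A0 letter, then a shift to A2 for the punctuation.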


Stack and Call Frame Management

The Z-Machine is a stack machine, but it’s not just a stack machine. It has three kinds of storage:

  1. Global variables (G0-G239): 240 words starting at the address stored in the header word at 0x0C
  2. Local variables (L1-L15): up to 15 per routine, stored in the call frame
  3. The evaluation stack: for intermediate values

Variable number 0 always refers to the stack (push/pop), 1-15 are locals, 16-255 are globals. This encoding is compact but means the decoder has to track context:

fn operand_to_expr(&self, operand: &Operand) -> Expr {
    match operand {
        Operand::Const(n) => Expr::Number(*n as i32),
        Operand::Variable(v) => {
            if *v == 0 {
                // Stack access (special)
                self.sym("STACK")
            } else if *v < 16 {
                // Local variable
                Expr::Symbol(SymbolRef {
                    prefix: Prefix::None,
                    name: format!(".L{}", v),
                })
            } else {
                // Global variable
                Expr::Symbol(SymbolRef {
                    prefix: Prefix::None,
                    name: format!(",G{}", v - 16),
                })
            }
        }
    }
}
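The function above only names variables for the transpiler. At run time the same numbering decides where a value actually lives. Here is a minimal sketch of that dispatch, assuming a made-up Machine struct whose fields (memory, globals_addr, locals, stack) are illustrative and not the project's real state layout.

```rust
/// Hypothetical interpreter state, just enough to show variable access.
pub struct Machine {
    pub memory: Vec<u8>,
    pub globals_addr: usize, // from the header word at 0x0C
    pub locals: Vec<u16>,    // current routine's locals, L1 at index 0
    pub stack: Vec<u16>,     // evaluation stack
}

impl Machine {
    /// Address of global `var` (16..=255); only called from the global arms below.
    fn global_addr(&self, var: u8) -> Result<usize, String> {
        let addr = self.globals_addr + 2 * (var as usize - 16);
        if addr + 1 < self.memory.len() {
            Ok(addr)
        } else {
            Err(format!("global G{} out of range", var - 16))
        }
    }

    pub fn read_variable(&mut self, var: u8) -> Result<u16, String> {
        match var {
            // Variable 0 pops the evaluation stack.
            0 => self.stack.pop().ok_or_else(|| "stack underflow".to_string()),
            1..=15 => self
                .locals
                .get(var as usize - 1)
                .copied()
                .ok_or_else(|| format!("routine has no local L{}", var)),
            _ => {
                let addr = self.global_addr(var)?;
                Ok(u16::from_be_bytes([self.memory[addr], self.memory[addr + 1]]))
            }
        }
    }

    pub fn write_variable(&mut self, var: u8, value: u16) -> Result<(), String> {
        match var {
            // Variable 0 pushes onto the evaluation stack.
            0 => self.stack.push(value),
            1..=15 => {
                let slot = self
                    .locals
                    .get_mut(var as usize - 1)
                    .ok_or_else(|| format!("routine has no local L{}", var))?;
                *slot = value;
            }
            _ => {
                let addr = self.global_addr(var)?;
                self.memory[addr..addr + 2].copy_from_slice(&value.to_be_bytes());
            }
        }
        Ok(())
    }
}
```

Note that reading variable 0 is destructive: it takes &mut self because it pops, which is easy to forget when an instruction reads the same operand twice.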

When a routine is called, the Z-Machine creates a call frame containing:

  • Return address
  • Result storage location (where to put the return value)
  • Local variable storage
  • Stack frame pointer (for unwinding)

The routine header declares how many locals it needs, plus initial values for each:

// Read routine header (V1-4: 1 byte num locals + default values)
let num_locals = self.story.read_byte(pc);
pc += 1;

// Skip default local values (V1-4: 2 bytes per local)
pc += (num_locals as usize) * 2;

In V5+, the default values are gone and locals initialize to zero, saving memory but changing the bytecode format. Another one of those “simple spec change, subtle implementation impact” situations.
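An interpreter that executes routines, rather than only decoding them, needs those defaults instead of skipping them. The sketch below shows one way the frame and the header read could fit together. The CallFrame field names and the read_routine_header function are assumptions for illustration; per the Z-Machine standard, the 2-byte initial values are present in V1 through V4 and absent from V5 onward.

```rust
/// Hypothetical call frame matching the four items listed above.
pub struct CallFrame {
    pub return_pc: usize,      // where execution resumes after the routine returns
    pub store_var: Option<u8>, // variable that receives the return value, if any
    pub locals: Vec<u16>,      // L1..=L15, index 0 holds L1
    pub stack_base: usize,     // evaluation stack depth at call time, for unwinding
}

/// Reads a routine header at `addr`.
/// Returns the initial local values and the address of the first instruction.
pub fn read_routine_header(
    memory: &[u8],
    addr: usize,
    version: u8,
) -> Result<(Vec<u16>, usize), String> {
    let count = *memory.get(addr).ok_or("routine address out of range")? as usize;
    if count > 15 {
        return Err(format!("routine declares {} locals (max 15)", count));
    }
    let mut pc = addr + 1;
    let mut locals = vec![0u16; count];
    if version <= 4 {
        // V1-4: one big-endian word of initial value per local.
        for local in locals.iter_mut() {
            let bytes = memory.get(pc..pc + 2).ok_or("truncated routine header")?;
            *local = u16::from_be_bytes([bytes[0], bytes[1]]);
            pc += 2;
        }
    }
    // V5+: locals stay zero and the code starts right after the count byte.
    Ok((locals, pc))
}
```

A call would then build a CallFrame from the returned locals, overwrite the first few with the call's arguments, and set the PC to the returned address.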


SAVE/RESTORE Implementation

SAVE and RESTORE in the Z-Machine aren’t just about dumping memory to disk. Ideally a saved game is portable between interpreters and even platforms, which is the job of the Quetzal save file format (standardized in 1997). It captures:

  • Dynamic memory (the part that can change during play)
  • The call stack (complete with local variables)
  • The PC where RESTORE should resume

Our interpreter takes a pragmatic approach: serialize the relevant state to JSON and emit it through a special output marker that the browser worker can catch:

"SAVE" => {
    match self.serialize_state() {
        Ok(json) => {
            // Output marker that worker will detect
            self.output_buffer.push_str(
                &format!("##SAVE_STATE##{}##END_SAVE##\r\n", json)
            );
            Ok(EvalResult::Value(Value::Bool(true)))
        }
        Err(e) => {
            self.output_buffer.push_str(&format!("?SAVE ERROR: {}\r\n", e));
            Ok(EvalResult::Value(Value::Bool(false)))
        }
    }
}

The ##SAVE_STATE##...##END_SAVE## markers are a hack, but a deliberate one. The interpreter doesn’t know about the browser’s filesystem APIs—it just outputs text. The worker intercepts these markers and handles the actual file operations. Clean separation, ugly protocol.

RESTORE works in reverse: the worker injects saved state, and the interpreter rebuilds its internal structures. The tricky part is that RESTORE can fail (corrupted save, wrong story file), so it has to return a success/failure value to the calling code.
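On the receiving side, something has to pull the payload back out of the text stream. The real worker lives in the browser and is not shown in this post, so the following is only a sketch of the marker parsing, in Rust for consistency with the rest of the code. The function name split_save_marker is made up, and it assumes at most one save marker per output chunk.

```rust
const SAVE_START: &str = "##SAVE_STATE##";
const SAVE_END: &str = "##END_SAVE##";

/// Splits interpreter output into the text to display and the save payload, if any.
/// Output without a complete marker pair is returned unchanged.
pub fn split_save_marker(output: &str) -> (String, Option<&str>) {
    let Some(start) = output.find(SAVE_START) else {
        return (output.to_string(), None);
    };
    let payload_start = start + SAVE_START.len();
    let Some(rel_end) = output[payload_start..].find(SAVE_END) else {
        return (output.to_string(), None);
    };
    let payload_end = payload_start + rel_end;

    // Drop the line ending the interpreter appends after the end marker.
    let rest = &output[payload_end + SAVE_END.len()..];
    let rest = rest.strip_prefix("\r\n").unwrap_or(rest);

    let visible = format!("{}{}", &output[..start], rest);
    (visible, Some(&output[payload_start..payload_end]))
}
```

One weakness of any in-band marker like this: if the serialized state could ever contain the end marker text, the payload would be cut short, so a length prefix or an escaped encoding would be the sturdier choice.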


V3/V5/V8 Version Differences

The Z-Machine versions aren’t just “more opcodes.” They’re fundamentally different in ways that cascade through the implementation:

Feature                      V3         V5          V8
Max story size               128KB      256KB       512KB
Max objects                  255        65535       65535
Object entry size            9 bytes    14 bytes    14 bytes
Local defaults               Yes        No          No
Packed address multiplier    ×2         ×4          ×8
Unicode support              No         Limited     Yes

The packed address multiplier is particularly sneaky. When the Z-Machine stores a routine address, it stores it divided by 2 (V3), 4 (V5), or 8 (V8) to fit larger addresses in 16-bit words. Forget to multiply when decoding a call instruction, and you’ll jump to completely wrong code. The header’s file-length word at 0x1A is scaled down in the same spirit, so the loader multiplies it back up by a version-dependent factor:

let file_length = match version {
    1..=3 => (Self::read_word_static(bytes, 0x1A) as u32) * 2,
    4..=5 => (Self::read_word_static(bytes, 0x1A) as u32) * 4,
    6..=8 => (Self::read_word_static(bytes, 0x1A) as u32) * 8,
    _ => return Err(format!("Invalid version: {}", version)),
};
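The routine-address side of the same idea looks like this. It is a sketch with a hypothetical function name, following the standard's rules: multiply by 2 in V1-3, by 4 in V4-5, and by 8 in V8. V6 and V7 are the odd ones out, using 4 times the packed value plus 8 times a routines offset taken from the header, which is passed in here as a parameter.

```rust
/// Converts a packed routine address to a byte address.
/// `routines_offset` is only used for V6 and V7 (header word at 0x28).
pub fn unpack_routine_addr(
    packed: u16,
    version: u8,
    routines_offset: u16,
) -> Result<usize, String> {
    let p = packed as usize;
    match version {
        1..=3 => Ok(2 * p),
        4..=5 => Ok(4 * p),
        6..=7 => Ok(4 * p + 8 * routines_offset as usize),
        8 => Ok(8 * p),
        _ => Err(format!("Invalid version: {}", version)),
    }
}
```

The same packed value 0x1234 lands at 0x2468 in a V3 story and at 0x91A0 in a V8 story, which is why a decoder that hard-codes one multiplier appears to work until the first story file of another version.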

Object table layout is another version-dependent detail. V3 uses 9-byte entries (4 bytes attributes, 3 bytes tree pointers, 2 bytes property address). V4+ uses 14-byte entries with larger fields. The code has to check the version and adjust:

let prop_defaults_size = if version <= 3 { 31 * 2 } else { 63 * 2 };
let object_entry_size = if version <= 3 { 9 } else { 14 };
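Those two sizes are all you need to locate any object: skip the property defaults table, then step over the preceding entries. A small sketch follows, assuming objects are numbered from 1 (object 0 means "nothing" in the Z-Machine) and using a hypothetical function name.

```rust
/// Byte address of object `obj`'s entry, or None for an invalid object number.
/// `object_table_addr` is the start of the property defaults table.
pub fn object_entry_addr(object_table_addr: usize, version: u8, obj: u16) -> Option<usize> {
    let (prop_defaults_size, object_entry_size, max_objects) = if version <= 3 {
        (31 * 2, 9, 255usize)
    } else {
        (63 * 2, 14, 65535usize)
    };
    // Object numbers start at 1; 0 is the null object.
    if obj == 0 || obj as usize > max_objects {
        return None;
    }
    Some(object_table_addr + prop_defaults_size + (obj as usize - 1) * object_entry_size)
}
```

With the fixture's object table at 0x0100, object 1 of a V3 story sits at 0x013E and object 2 nine bytes later at 0x0147.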

The Transpiler: Z-Machine to ZIL

Rather than executing Z-Machine bytecode directly, we transpile it to ZIL (Zork Implementation Language) AST and run it through our ZIL interpreter. This is unusual—most Z-Machine implementations interpret bytecode directly—but it has some advantages:

  1. Debugging: You can inspect the ZIL output to understand what the bytecode is doing
  2. Integration: The ZIL interpreter already exists and handles all the game state
  3. Flexibility: We can extend the ZIL interpreter without touching the decoder

The transpiler walks through bytecode and emits corresponding ZIL expressions:

fn instruction_to_ast(&mut self, instr: &Instruction, pc: usize) -> Result<Expr, String> {
    match (instr.opcode_type, instr.opcode) {
        // je a b: jump if a == b
        (OpcodeType::OP2, 0x01) => self.je_to_ast(instr, pc),
        
        // add a b → result
        (OpcodeType::OP2, 0x14) => self.add_to_ast(instr),
        
        // print (literal string follows)
        (OpcodeType::OP0, 0x02) => self.print_to_ast(instr),
        
        // rtrue (return true)
        (OpcodeType::OP0, 0x00) => Ok(self.rtrue()),
        
        // ... 115 more opcodes ...
        
        _ => Ok(self.comment(&format!("TODO: opcode {:?} {}", 
            instr.opcode_type, instr.opcode))),
    }
}

The TODO fallback is important. Unimplemented opcodes become comments in the output rather than crashes, which means you can partially transpile a story file and see what’s supported versus what needs work.

Branch Handling

Branches in Z-Machine are particularly fiddly. The branch condition and offset are packed into 1-2 bytes after the instruction:

fn decode_branch(memory: &[u8], offset: &mut usize) -> Result<BranchInfo, String> {
    let branch_byte = memory[*offset];
    *offset += 1;

    let condition = if (branch_byte & 0x80) != 0 {
        BranchType::IfTrue
    } else {
        BranchType::IfFalse
    };

    // Branch offset encoding
    let offset_value = if (branch_byte & 0x40) != 0 {
        // Single byte offset (6 bits)
        (branch_byte & 0x3F) as i16
    } else {
        // Two byte offset (14 bits, signed)
        let low = memory[*offset];
        *offset += 1;
        let unsigned = (((branch_byte & 0x3F) as u16) << 8) | (low as u16);
        // Convert to signed 14-bit
        if unsigned & 0x2000 != 0 {
            (unsigned as i16) | !0x3FFF  // Sign extend
        } else {
            unsigned as i16
        }
    };

    // Special offsets: 0=rfalse, 1=rtrue
    let return_value = match offset_value {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    };

    Ok(BranchInfo { condition, offset: offset_value, return_value })
}

That sign extension, the | !0x3FFF line in the two-byte case, is where the jump offset bug mentioned in the January 29 post lived. A 14-bit signed value in a 16-bit container needs proper sign extension, and getting the mask wrong means negative branches (loops!) jump to wrong addresses. The fix is that !0x3FFF mask, which sets the top two bits for negative values.
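An alternative that avoids picking a mask by hand is the shift trick: move the 14-bit field to the top of the word, reinterpret it as signed, and let the arithmetic right shift copy the sign bit back down. This is a different technique from the one in decode_branch, shown here only as a cross-check; the function name is hypothetical.

```rust
/// Sign-extends a 14-bit two's-complement value held in the low bits of `raw`.
pub fn sign_extend_14(raw: u16) -> i16 {
    // Left shift puts bit 13 (the sign) into bit 15; the signed right shift
    // then replicates it into the two vacated high bits.
    (((raw & 0x3FFF) << 2) as i16) >> 2
}
```

Both approaches agree on every input: 0x3FFF becomes -1, 0x2000 becomes -8192, and 0x1FFF stays 8191.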


What Went Sideways

The Off-by-Two Bug

Branch offsets in Z-Machine are relative to the address after the branch data, not after the opcode, and the standard subtracts 2 on top of that: target = address after branch data + offset - 2. I initially calculated them from the wrong point, which made forward branches work (close enough!) but backward branches land two bytes too early. This took an embarrassingly long time to figure out because most test programs don’t use backward branches in their first few routines.
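Written down as code, the rule from the Z-Machine standard is short enough that it is worth isolating in one place. This is a sketch with a hypothetical function name; it takes the address of the first byte after the branch data and the decoded signed offset, and treats offsets 0 and 1 as the return-false and return-true cases rather than jumps.

```rust
/// Computes the jump target for a taken branch.
/// Returns None for offsets 0 and 1 (which mean "return false"/"return true")
/// and for targets that would fall before address 0.
pub fn branch_target(addr_after_branch: usize, offset: i16) -> Option<usize> {
    if offset == 0 || offset == 1 {
        return None;
    }
    // Standard rule: target = address after branch data + offset - 2.
    let target = addr_after_branch as i64 + offset as i64 - 2;
    if target < 0 {
        None
    } else {
        Some(target as usize)
    }
}
```

An offset of 2 therefore jumps to the very next instruction, which is a handy sanity check when testing a decoder against hand-assembled bytes.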

Inline String Length

The print and print_ret opcodes include a Z-string directly in the instruction stream. To know where the next instruction starts, you have to walk the string to find the terminator word, then resume decoding from there. I initially tried to skip this by scanning the raw bytes for the 0x8000 bit pattern, but a set top bit can also show up in the low byte of a word if the right character combinations line up. The end bit only counts in the high byte of each 2-byte word, measured from the start of the string, so you have to actually step through it word by word.
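Finding the length does not require decoding the characters themselves, only stepping in 2-byte units from the string's first byte. A minimal sketch, with a hypothetical function name:

```rust
/// Length in bytes of the Z-string starting at `addr`, including the final word.
/// Returns None if memory ends before a word with the end bit is found.
pub fn zstring_byte_len(memory: &[u8], addr: usize) -> Option<usize> {
    let mut offset = 0;
    loop {
        let high = *memory.get(addr + offset)?;
        memory.get(addr + offset + 1)?; // the low byte must exist too
        offset += 2;
        // Only the top bit of the high byte of each word is the end marker.
        if high & 0x80 != 0 {
            return Some(offset);
        }
    }
}
```

In the bytes 0x11 0xAA 0x46 0x34 0x96 0x45, the second byte has its top bit set but is the low half of the first word, so it is ignored; the string ends with the third word and is 6 bytes long.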

Store/Branch Confusion

Not every opcode that looks like it should store a result actually does. And not every opcode that looks like it should branch actually does. The spec has detailed tables, but they’re easy to misread. I implemented opcode_stores_result() and opcode_has_branch() as explicit lookup tables:

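fn opcode_stores_result(opcode_type: OpcodeType, opcode: u8) -> bool {
    // Opcode numbers per the Z-Machine standard's opcode table.
    match opcode_type {
        // or, and (0x08-0x09); loadw, loadb, get_prop, get_prop_addr,
        // get_next_prop, add, sub, mul, div, mod (0x0F-0x18)
        OpcodeType::OP2 => matches!(opcode,
            0x08 | 0x09 | 0x0F..=0x18
        ),
        // get_sibling, get_child, get_parent, get_prop_len,
        // call_1s (V4+), load, not. remove_obj (0x09) does not store.
        OpcodeType::OP1 => matches!(opcode,
            0x01 | 0x02 | 0x03 | 0x04 | 0x08 | 0x0E | 0x0F
        ),
        // call, random. read (0x04) stores only in V5+; push (0x08) never does.
        OpcodeType::VAR => matches!(opcode, 0x00 | 0x07),
        _ => false,
    }
}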

These tables came from the spec, but they still needed correction when running real story files. Trust, but verify.
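The companion table, opcode_has_branch(), is mentioned above but not shown. Here is a sketch of what a V3-only version could look like, built from the standard's opcode table rather than from the project source; the _v3 suffix and the local OpcodeType definition are assumptions made so the block stands alone.

```rust
/// Local stand-in for the article's `OpcodeType`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OpcodeType {
    OP0,
    OP1,
    OP2,
    VAR,
    EXT,
}

/// Sketch: does this V3 opcode carry branch data after its operands?
pub fn opcode_has_branch_v3(opcode_type: OpcodeType, opcode: u8) -> bool {
    match opcode_type {
        // je, jl, jg, dec_chk, inc_chk, jin, test (0x01-0x07), test_attr (0x0A)
        OpcodeType::OP2 => matches!(opcode, 0x01..=0x07 | 0x0A),
        // jz, get_sibling, get_child
        OpcodeType::OP1 => matches!(opcode, 0x00..=0x02),
        // save, restore (branch form in V1-3), verify
        OpcodeType::OP0 => matches!(opcode, 0x05 | 0x06 | 0x0D),
        // No VAR opcode branches in V3; EXT does not exist before V5.
        _ => false,
    }
}
```

get_sibling and get_child appear in both tables: they store a result and then branch, so the decoder has to read the store byte first and the branch data second.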


Testing Against Real Games

The ultimate test is running actual Infocom games. We have minimal test fixtures (minimal-test.z3, print-test.z3) for unit testing, but the real validation comes from loading ZORK1.Z3 and seeing if you can go north from the West of House.

#[test]
fn test_load_minimal_story() {
    let bytes = include_bytes!("fixtures/minimal-test.z3");
    let story = StoryFile::load(bytes).unwrap();

    assert_eq!(story.version, 3);
    assert_eq!(story.entry_point, 0x0300);
    assert_eq!(story.dict_addr, 0x0200);
    assert_eq!(story.object_table_addr, 0x0100);
    assert_eq!(story.global_vars_addr, 0x0080);

    let globals = story.extract_globals();
    assert_eq!(globals.len(), 240);

    let objects = story.extract_objects().unwrap();
    assert!(!objects.is_empty());
}

The minimal test story is a hand-crafted 768-byte file that exercises the header parsing and basic object extraction without needing a full game. It’s small enough to include in the repository and fast enough to run on every commit.


Why This Matters

The Z-Machine is a time capsule. It’s the bytecode format that ran some of the most influential games in computing history—Zork, Planetfall, A Mind Forever Voyaging, Trinity. When you implement a Z-Machine interpreter, you’re not just building a compatibility layer; you’re preserving access to a style of game design that’s genuinely different from anything made today.

Modern games can have voice acting, physics engines, and photorealistic graphics. But they rarely have the density of prose or the puzzle design sophistication of a good Infocom game. Getting the Z-Machine right means those games remain playable, not as museum pieces, but as living software.

And honestly? There’s something satisfying about building a virtual machine that’s older than most of the people who’ll use it. It’s a reminder that good architecture ages well. The Z-Machine’s design—bytecode portability, memory discipline, structured data formats—anticipated problems that we’re still solving in 2026.


  • Opcode Decoder: languages/zil/zil-wasm/src/zmachine/opcodes.rs
  • Z-String Text: languages/zil/zil-wasm/src/zmachine/text.rs
  • Story Loader: languages/zil/zil-wasm/src/zmachine/story.rs
  • Transpiler: languages/zil/zil-wasm/src/zmachine/transpiler.rs
  • Symbol Table: languages/zil/zil-wasm/src/zmachine/symbols.rs
  • ZIL Interpreter: languages/zil/zil-wasm/src/interpreter.rs
  • Test Fixtures: languages/zil/zil-wasm/tests/fixtures/
  • Integration Tests: languages/zil/zil-wasm/tests/integration_test.rs

Previous: 2026-01-29 — Storage peripherals and audio realism