Assemblers, Memory Maps, and Madness
Building cross-assemblers for retro CPUs in Rust, migrating to Pest parsers, and discovering why load addresses matter more than you think.
I spent two weeks writing a 6502 assembler. Then I threw it away and wrote it again using a proper parser generator. Then I spent three more days debugging why my VIC-20 test programs kept corrupting themselves in memory. The assembler was producing correct bytecode. The problem was that I didn’t understand where the code was supposed to live.
Building cross-assemblers for retro CPUs is one of those tasks that seems straightforward until you actually do it. You think: parse some mnemonics, look up opcodes in a table, emit bytes. How hard could it be? The answer is “harder than the CPU emulation itself, in some ways,” because assemblers force you to confront the gap between how we think about programs and how they actually execute.
Why Build Assemblers at All?
Here’s the thing: excellent 6502 and Z80 assemblers already exist. CA65, DASM, xa65, ZMAC, z80asm. They’re battle-tested, well-documented, and free. Why would I spend weeks building my own?
The answer is WASM.
The emulator runs entirely in the browser. Everything compiles to WebAssembly. I could shell out to native tools during development, but that creates a dependency I don’t want in the final product. I want users to be able to type assembly code into a text box and see it run, with no server round-trips and no native binaries.
More importantly, I want the assembler to be part of the emulator architecture. When you write a BIOS routine, the assembler should understand the same memory map the CPU understands. When you define a label, the debugger should be able to show you that label. Native tools can’t do that without elaborate bridging.
So I built them. One for the 6502 (VIC-20, C64, generic), one for the Z80 (future machines), and I’ve got stubs for x86 (unlikely to be completed, but the structure is there). They all compile to WASM and integrate with the rest of the system.
(Was it worth it? Honestly, probably not from a pure efficiency standpoint. But now I understand 6502 addressing modes at a level I never would have otherwise, and that understanding has paid off in debugging the CPU emulation itself.)
The Hand-Rolled Parser: A Cautionary Tale
My first 6502 assembler used a hand-written recursive descent parser. It worked, mostly. I could parse immediate values (LDA #$42), zero-page addresses (STA $80), and labels with forward references. The code looked like this:
fn parse_operand(&self, text: &str) -> Option<AddressingMode> {
let text = text.trim();
// Immediate: #$xx or #decimal
if text.starts_with('#') {
let value = self.parse_number(&text[1..])?;
return Some(AddressingMode::Immediate(value as u8));
}
// Indirect modes: ($xx,X), ($xx),Y, ($xxxx)
if text.starts_with('(') && text.ends_with(')') {
// ... 40 more lines of nested if-statements ...
}
// Zero page vs absolute (ambiguous until we know the value)
if let Some(value) = self.parse_number(text) {
if value <= 0xFF {
return Some(AddressingMode::ZeroPage(value as u8));
} else {
return Some(AddressingMode::Absolute(value));
}
}
// Must be a label reference
Some(AddressingMode::RelativeLabel(text.to_string()))
}
This worked until it didn’t. The problem is that 6502 assembly syntax is ambiguous in ways that look simple but aren’t:
- LDA $42 is zero-page, but LDA $0042 is absolute (different opcodes, different byte counts)
- ASL with no operand is accumulator mode, but ASL A is also accumulator mode
- ($40,X) is indexed-indirect, ($40),Y is indirect-indexed (these are completely different operations)
- Branch targets can be labels or numeric offsets
Every edge case required another branch in the parser. The code grew to 500 lines of tangled conditions, and I still couldn’t handle comments properly.
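The first ambiguity in that list can be resolved by looking at how the operand was written rather than only at its value. What follows is a minimal sketch of that idea, not the code from either version of the assembler; the `Mode` enum and `classify_hex_operand` are hypothetical names, and treating "one or two hex digits" as the zero-page signal is an assumption about the convention.

```rust
/// Addressing mode chosen for a bare hex operand such as `$42` or `$0042`.
/// (Hypothetical type, reduced to the two modes that matter here.)
#[derive(Debug, PartialEq)]
enum Mode {
    ZeroPage(u8),
    Absolute(u16),
}

/// Classify a `$`-prefixed hex operand by how many digits were written,
/// not only by its numeric value.
fn classify_hex_operand(text: &str) -> Option<Mode> {
    let digits = text.trim().strip_prefix('$')?;
    let well_formed = (1..=4).contains(&digits.len())
        && digits.chars().all(|c| c.is_ascii_hexdigit());
    if !well_formed {
        return None;
    }
    let value = u16::from_str_radix(digits, 16).ok()?;
    if digits.len() <= 2 {
        // One or two digits: treat it as a request for the zero-page form.
        Some(Mode::ZeroPage(value as u8))
    } else {
        // Three or four digits: keep the 16-bit form even if the value is small.
        Some(Mode::Absolute(value))
    }
}
```

With this rule, `$42` and `$0042` name the same address but select different modes, which a value-only check such as `value <= 0xFF` cannot express.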
The Pest Migration
I rewrote the parser using Pest, a PEG (Parsing Expression Grammar) library for Rust. PEG parsers are declarative: you write a grammar file, and the library generates a parser. Here’s the core of the 6502 grammar:
// MOS 6502 Assembly Language Grammar
// Supports all 6502 addressing modes and instructions
WHITESPACE = _{ " " | "\t" }
COMMENT = _{ ";" ~ (!NEWLINE ~ ANY)* }
// Top-level program structure
program = { SOI ~ NEWLINE* ~ (statement ~ (NEWLINE+ | EOI))* ~ EOI }
statement = { (label ~ (instruction | directive)?) | instruction | directive }
// Labels end with colon
label = { identifier ~ ":" }
// Directives start with dot
directive = { "." ~ identifier ~ directive_args? }
// Instructions consist of mnemonic and optional addressing mode
instruction = { mnemonic ~ addressing_mode? }
// Addressing modes (order matters - most specific first)
addressing_mode = {
indexed_indirect | // ($xx,X) - must come before indirect
indirect_indexed | // ($xx),Y - must come before indirect
indirect | // ($xxxx) - for JMP
immediate | // #$xx or #value
number_with_x | // $xx,X or $xxxx,X
number_with_y | // $xx,Y or $xxxx,Y
number_only | // $xx or $xxxx
accumulator | // A
label_ref // label (for branches and jumps)
}
The grammar is 73 lines. The Rust code that uses it is another 200. That’s about 270 lines total, compared to 500+ lines of hand-written spaghetti that still had bugs.
The key insight is that order matters in PEG grammars. The parser tries alternatives from top to bottom, and the first match wins. So indirect_indexed (the pattern ($xx),Y) must come before indirect (the pattern ($xxxx)), or the parser will match ($40) as a complete indirect operand, leave the trailing ,Y unconsumed, and fail. Listing indexed_indirect (the pattern ($xx,X)) first follows the same most-specific-first convention.
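The first-match-wins rule is easy to see in a toy model. The sketch below does not use Pest; it hand-rolls two matchers and an ordered choice so the effect of ordering is visible in plain Rust. All names (`Matcher`, `indirect`, `indirect_indexed`, `parse_operand`, and the two tables) are made up for illustration.

```rust
/// A matcher returns how many bytes of the input it consumed, if it matched.
type Matcher = fn(&str) -> Option<usize>;

/// `($nn)` or `($nnnn)`: a parenthesised hex number and nothing else.
fn indirect(s: &str) -> Option<usize> {
    let rest = s.strip_prefix("($")?;
    let digits = rest.chars().take_while(|c| c.is_ascii_hexdigit()).count();
    if digits == 0 {
        return None;
    }
    // "($" is 2 bytes, then the digits, then the closing parenthesis.
    rest[digits..].starts_with(')').then(|| 2 + digits + 1)
}

/// `($nn),Y`: an indirect operand followed by `,Y`.
fn indirect_indexed(s: &str) -> Option<usize> {
    let n = indirect(s)?;
    s[n..].starts_with(",Y").then(|| n + 2)
}

/// Most specific alternative first.
const SPECIFIC_FIRST: [(&str, Matcher); 2] = [
    ("indirect_indexed", indirect_indexed as Matcher),
    ("indirect", indirect as Matcher),
];

/// Same alternatives, wrong order.
const GENERAL_FIRST: [(&str, Matcher); 2] = [
    ("indirect", indirect as Matcher),
    ("indirect_indexed", indirect_indexed as Matcher),
];

/// Ordered choice: the first alternative that matches wins, and the choice
/// is not revisited even if the rest of the input then fails to fit.
fn parse_operand(input: &str, alternatives: &[(&'static str, Matcher)]) -> Option<&'static str> {
    for (name, matcher) in alternatives {
        if let Some(consumed) = matcher(input) {
            // The operand must be followed by end of input.
            return (consumed == input.len()).then(|| *name);
        }
    }
    None
}
```

Parsing `($40),Y` against `SPECIFIC_FIRST` yields `indirect_indexed`; against `GENERAL_FIRST`, `indirect` consumes `($40)`, the leftover `,Y` has nowhere to go, and the parse fails.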
After the Pest migration, all 64 unit tests passed on the first run. Not because I’m brilliant, but because the grammar made the edge cases explicit. You can’t accidentally forget to handle ($40),Y when it’s right there in the grammar file.
The 6502 Addressing Mode Nightmare
The MOS 6502 has 13 addressing modes. This sounds manageable until you realize that some instructions support 8 modes and others support 1, and the opcode encoding is completely irregular.
Here are the modes:
| Mode | Syntax | Example | Bytes |
|---|---|---|---|
| Implied | (none) | RTS | 1 |
| Accumulator | A | ASL A | 1 |
| Immediate | #$nn | LDA #$42 | 2 |
| Zero Page | $nn | LDA $80 | 2 |
| Zero Page,X | $nn,X | LDA $80,X | 2 |
| Zero Page,Y | $nn,Y | LDX $80,Y | 2 |
| Absolute | $nnnn | LDA $2000 | 3 |
| Absolute,X | $nnnn,X | LDA $2000,X | 3 |
| Absolute,Y | $nnnn,Y | LDA $2000,Y | 3 |
| Indirect | ($nnnn) | JMP ($FFFC) | 3 |
| Indexed Indirect | ($nn,X) | LDA ($40,X) | 2 |
| Indirect Indexed | ($nn),Y | LDA ($40),Y | 2 |
| Relative | $nn | BNE $FD | 2 |
The zero-page vs. absolute distinction is particularly annoying. $42 and $0042 refer to the same address, but LDA $42 assembles to A5 42 (2 bytes, zero-page) while LDA $0042 assembles to AD 42 00 (3 bytes, absolute). The zero-page version is faster and smaller, so you want to use it when possible, but the assembler has to know the value at parse time to choose the right opcode.
Once the addressing mode is settled, the byte count follows directly from it:
fn instruction_size(mode: &AddressingMode) -> u16 {
match mode {
AddressingMode::Implied | AddressingMode::Accumulator => 1,
AddressingMode::Immediate(_)
| AddressingMode::ZeroPage(_)
| AddressingMode::ZeroPageX(_)
| AddressingMode::ZeroPageY(_)
| AddressingMode::IndexedIndirect(_)
| AddressingMode::IndirectIndexed(_)
| AddressingMode::Relative(_)
| AddressingMode::RelativeLabel(_) => 2,
AddressingMode::Absolute(_)
| AddressingMode::AbsoluteX(_)
| AddressingMode::AbsoluteY(_)
| AddressingMode::Indirect(_) => 3,
}
}
And the opcode lookup is a massive match statement:
fn get_opcode(&self, mnemonic: &str, mode: &AddressingMode) -> Option<u8> {
match (mnemonic, mode) {
// LDA - Load Accumulator
("LDA", AddressingMode::Immediate(_)) => Some(0xA9),
("LDA", AddressingMode::ZeroPage(_)) => Some(0xA5),
("LDA", AddressingMode::ZeroPageX(_)) => Some(0xB5),
("LDA", AddressingMode::Absolute(_)) => Some(0xAD),
("LDA", AddressingMode::AbsoluteX(_)) => Some(0xBD),
("LDA", AddressingMode::AbsoluteY(_)) => Some(0xB9),
("LDA", AddressingMode::IndexedIndirect(_)) => Some(0xA1),
("LDA", AddressingMode::IndirectIndexed(_)) => Some(0xB1),
// ... 150 more entries ...
_ => None,
}
}
There’s no clever pattern here. The 6502’s opcode matrix has some regularity, but not enough to exploit algorithmically. You just have to enumerate all the valid combinations.
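Once the opcode has been looked up, emitting the instruction is the easy part. A minimal sketch, assuming a simplified `Operand` type (hypothetical, not the assembler's `AddressingMode`): the opcode goes first, and 16-bit operands follow in little-endian order, low byte first.

```rust
/// Operand payload after the addressing mode has been resolved.
/// (Hypothetical, simplified stand-in for the real addressing-mode enum.)
enum Operand {
    Implied,
    Byte(u8),
    Word(u16),
}

/// Emit one instruction: the opcode, then the operand bytes.
/// The 6502 stores 16-bit operands low byte first.
fn emit(opcode: u8, operand: Operand) -> Vec<u8> {
    let mut bytes = vec![opcode];
    match operand {
        Operand::Implied => {}
        Operand::Byte(b) => bytes.push(b),
        Operand::Word(w) => bytes.extend_from_slice(&w.to_le_bytes()),
    }
    bytes
}
```

For example, `emit(0xAD, Operand::Word(0x0042))` produces `AD 42 00`, the absolute form of LDA discussed above, while `emit(0xA5, Operand::Byte(0x42))` produces the two-byte zero-page form.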
Branch Instructions and Label Resolution
Branch instructions on the 6502 use relative addressing. Instead of jumping to an absolute address, they specify an offset from the current program counter. The offset is a signed 8-bit value, so branches can reach -128 to +127 bytes from the instruction following the branch.
The problem is that the offset is calculated from the address after the branch instruction, not from the branch instruction itself. So a BNE loop where loop is 3 bytes earlier requires an offset of -5, not -3 (because the branch instruction is 2 bytes).
Here’s the label resolution code:
fn encode_instruction(
&mut self,
mnemonic: &str,
mode: &AddressingMode,
line: usize,
) -> Result<(), String> {
let address = self.origin + self.bytecode.len() as u16;
// Resolve RelativeLabel to Relative with calculated offset
let resolved_mode = match mode {
AddressingMode::RelativeLabel(label) => {
let target_addr = self
.get_label_address(label)
.ok_or_else(|| format!("Line {}: Undefined label: {}", line, label))?;
// Calculate relative offset: target - (current_address + 2)
// The +2 accounts for the 2-byte branch instruction
let offset = (target_addr as i32) - ((address as i32) + 2);
// Validate offset is within signed 8-bit range
if offset < -128 || offset > 127 {
return Err(format!(
"Line {}: Branch target '{}' is out of range ({} bytes)",
line, label, offset
));
}
AddressingMode::Relative(offset as i8)
}
_ => mode.clone(),
};
// ... emit bytecode ...
}
The + 2 is crucial. Get it wrong, and every branch lands two bytes away from its target, which is sometimes harmless and sometimes fatal depending on what sits at that address. This was one of those bugs that took hours to find because the symptoms were so confusing. Some test programs worked, others crashed, and there was no obvious pattern until I traced through the generated bytecode by hand.
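The arithmetic is small enough to isolate and test on its own. This is a sketch of the same calculation as a free function; `branch_offset` is a hypothetical helper, not part of the assembler shown above.

```rust
/// Compute the signed 8-bit operand for a 6502 branch.
/// `branch_addr` is the address of the branch opcode itself; the CPU adds the
/// offset to the address of the *next* instruction, which is `branch_addr + 2`.
fn branch_offset(branch_addr: u16, target_addr: u16) -> Result<i8, String> {
    let next_instruction = branch_addr as i32 + 2;
    let offset = target_addr as i32 - next_instruction;
    if (-128..=127).contains(&offset) {
        Ok(offset as i8)
    } else {
        Err(format!("branch out of range ({} bytes)", offset))
    }
}
```

A branch at $0203 to a label three bytes earlier at $0200 gives -5, matching the example above, and a branch whose target is the very next instruction gives 0.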
The Z80: More Instructions, More Prefixes
The Z80 is the 6502’s more complex cousin. It has more registers (A, B, C, D, E, H, L, plus 16-bit pairs and index registers), more addressing modes, and a lot more instructions. Where the 6502 has ~56 unique instructions, the Z80 has over 700 when you count all the prefixed variants.
The Z80 uses prefix bytes to extend its instruction set:
- CB prefix: bit operations (SET, RES, BIT) and shifts (RLC, RRC, SLA, SRA, SRL)
- DD prefix: operations using the IX index register
- FD prefix: operations using the IY index register
- ED prefix: miscellaneous extended operations (block moves, I/O, etc.)
Here’s how the assembler handles a CB-prefixed instruction:
fn encode_rotate_shift(
&mut self,
bytes: &mut Vec<u8>,
mnemonic: &str,
operands: &[AddressingMode],
line: usize,
) -> Result<(), String> {
let rot_code = match mnemonic {
"RLC" => 0,
"RRC" => 1,
"RL" => 2,
"RR" => 3,
"SLA" => 4,
"SRA" => 5,
"SRL" => 7,
_ => return Err(format!("Line {}: Invalid rotate/shift operation", line)),
};
bytes.push(0xCB); // CB prefix
match operands.first() {
Some(AddressingMode::Register(r)) => {
let opcode = (rot_code << 3) | self.reg_code(r)?;
bytes.push(opcode);
}
_ => {
return Err(format!(
"Line {}: Unsupported rotate/shift addressing mode",
line
));
}
}
Ok(())
}
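To make the bit layout concrete, here is a standalone sketch of the same encoding with the register table written out. The helper names (`reg_code`, `rot_code`, `encode_cb`) are illustrative and take plain strings rather than the assembler's own types; the register numbering (B=0 through A=7, with (HL)=6) is the standard Z80 encoding.

```rust
/// 3-bit register field used by most Z80 opcodes.
fn reg_code(reg: &str) -> Option<u8> {
    match reg {
        "B" => Some(0),
        "C" => Some(1),
        "D" => Some(2),
        "E" => Some(3),
        "H" => Some(4),
        "L" => Some(5),
        "(HL)" => Some(6),
        "A" => Some(7),
        _ => None,
    }
}

/// 3-bit operation field for the documented CB rotate/shift group.
fn rot_code(mnemonic: &str) -> Option<u8> {
    match mnemonic {
        "RLC" => Some(0),
        "RRC" => Some(1),
        "RL" => Some(2),
        "RR" => Some(3),
        "SLA" => Some(4),
        "SRA" => Some(5),
        "SRL" => Some(7),
        _ => None,
    }
}

/// Encode a CB-prefixed rotate/shift: prefix byte, then `00 ooo rrr`.
fn encode_cb(mnemonic: &str, reg: &str) -> Option<[u8; 2]> {
    Some([0xCB, (rot_code(mnemonic)? << 3) | reg_code(reg)?])
}
```

So RLC B comes out as CB 00 and SRL A as CB 3F: the operation selects a block of eight opcodes, and the register selects one opcode within that block.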
The Z80 parser doesn’t use Pest (yet). It’s hand-written, but the lessons from the 6502 migration make it much cleaner than my first attempt would have been. I know which patterns to watch for.
One Z80 quirk: assembler directives can be written with or without a dot prefix. ORG $8000 and .ORG $8000 mean the same thing. The parser handles this by normalizing directives early:
fn parse_directive(&mut self, line: &str, line_num: usize) -> Option<Statement> {
let parts: Vec<&str> = line.splitn(2, ' ').collect();
let mut name = parts[0].to_uppercase();
// Normalize: ensure directive name has dot prefix
if !name.starts_with('.') {
name = format!(".{}", name);
}
// ...
}
fn is_directive_keyword(&self, line: &str) -> bool {
let first_word = line.split_whitespace().next().unwrap_or("");
matches!(
first_word.to_uppercase().as_str(),
"ORG" | "BYTE" | "DB" | "WORD" | "DW" | "EQU" | "DEFB" | "DEFW" | "DS" | "DEFS"
)
}
Memory Maps: Where Theory Meets Reality
Here’s where things get interesting (and by “interesting” I mean “frustrating for three days straight”).
Different computers use different memory maps. The VIC-20’s screen RAM starts at $1E00 (unexpanded) or $1000 (with 8KB or more of expansion). Color RAM is at $9600 on the unexpanded machine and moves to $9400 with expansion. The C64’s screen RAM is at $0400, color RAM at $D800. The CC-40 has 6KB of RAM starting at $0000 with expansion slots beyond that.
When you write assembly code, you need to know where it will live in memory. This is what the .ORG directive is for:
.ORG $0200 ; Code starts at address $0200
START:
LDA #$00
STA $1000 ; Write to screen RAM
; ...
The .ORG directive doesn’t change where the assembler outputs bytes. It changes where the assembler thinks those bytes will be when the program runs. This affects:
- Label addresses (a label after .ORG $0200 will be at $0200 plus its offset)
- Absolute address encoding (JMP, JSR targets)
- Relative branch calculations
Here’s the problem: the load address (where the binary gets placed in memory) and the execution address (where the program expects to be) must match. If they don’t, everything breaks.
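A tiny experiment shows why. The sketch below hand-assembles a single `JMP` to its own address for a given origin, copies the bytes into a 64KB array at some load address, and reads back where the jump points. The function names are hypothetical; 0x4C is the 6502 opcode for JMP absolute.

```rust
/// Hand-assemble `START: JMP START` for a given origin. The operand is the
/// absolute address of START, which is the origin itself in this tiny program.
fn assemble_jmp_self(origin: u16) -> [u8; 3] {
    let [lo, hi] = origin.to_le_bytes();
    [0x4C, lo, hi] // 0x4C = JMP absolute, operand is little-endian
}

/// Copy an assembled image into memory at `load_addr`.
fn load(memory: &mut [u8], image: &[u8], load_addr: u16) {
    let start = load_addr as usize;
    memory[start..start + image.len()].copy_from_slice(image);
}

/// Read the target of a JMP absolute instruction located at `at`.
fn jmp_target(memory: &[u8], at: u16) -> u16 {
    let at = at as usize;
    u16::from_le_bytes([memory[at + 1], memory[at + 2]])
}
```

Assemble with an origin of $0200 and load at $0200, and the jump points back at itself. Load the same bytes at $0300 and the jump still points at $0200, which now contains whatever happened to be there.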
The Self-Overwriting Bug
I wrote a VIC-20 test program that displayed the color palette. It looked perfect in the assembler output. But when I loaded it into the emulator, the screen filled with garbage and the program crashed.
The original code:
* = $0200
INIT: LDX #$00
LDA #$20
CLEAR: STA $1000,X
; ... more code ...
The * = $0200 is xa65 syntax for .ORG $0200. The program expected to run at address $0200.
The loader was putting it at… $0200.
So far so good. But the program was 250 bytes, which meant it extended from $0200 to $02F9. And one of the data tables in the program looked like this:
ROW_TABLE:
.word $1000
.word $1016
; ... more words ...
The assembler placed ROW_TABLE at around $0280. When the program read from ROW_TABLE, it was reading from addresses that the program itself occupied. If any part of the code modified those addresses, the table would be corrupted.
But that’s not what was happening. The actual bug was simpler and dumber.
The program used zero-page addresses for temporary variables:
ADDR = $FB
COLOR_ADDR = $FD
CURRENT_COLOR = $02
ROW = $03
TEMP = $04
$02, $03, $04. Those are in the first page of memory. On a real VIC-20, those addresses are used by the KERNAL for temporary storage during LOAD operations. When I loaded the program, the loader wrote to those addresses as part of the load process, overwriting the program’s initialized data before execution even began.
The fix was to move the program to a higher address:
* = $0300
Now the program lived at $0300-$03F9, clear of the loader’s temporary storage area. It worked perfectly.
(This is the kind of bug that would never happen with a modern toolchain. The linker would allocate sections, the loader would respect them, and everyone would be happy. But in 1982, you had to know the memory map by heart, and “load address” and “execution address” were concepts you kept in your head, not in metadata.)
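The "memory map in your head" can be turned into a check that runs before loading. This is a sketch only: `Region`, `program_range`, and `conflicts` are hypothetical names, and the reserved ranges a caller passes in are whatever that machine's documentation says, not values this code knows about.

```rust
use std::ops::RangeInclusive;

/// A named address range that the program must not occupy.
struct Region {
    name: &'static str,
    range: RangeInclusive<u16>,
}

/// Address range occupied by a program of `len` bytes loaded at `origin`.
/// Assumes the program fits below $FFFF; `len` must be non-zero.
fn program_range(origin: u16, len: u16) -> RangeInclusive<u16> {
    assert!(len > 0, "an empty program occupies no addresses");
    origin..=origin + (len - 1)
}

/// Names of all reserved regions that overlap the program.
fn conflicts(program: &RangeInclusive<u16>, reserved: &[Region]) -> Vec<&'static str> {
    reserved
        .iter()
        .filter(|r| r.range.start() <= program.end() && program.start() <= r.range.end())
        .map(|r| r.name)
        .collect()
}
```

A 250-byte program at $0200 occupies $0200-$02F9. If the caller lists a reserved region covering page 2, `conflicts` reports it, and moving the origin to $0300 makes the list come back empty.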
The VIC-20 Video Memory Layout
Since I’m talking about memory maps, let’s look at the VIC-20’s video system. It’s a good example of memory-mapped I/O, where writing to specific addresses directly affects hardware behavior.
VIC-20 Memory Map (Unexpanded):
$0000-$03FF 1KB RAM (zero page, stack, system variables)
$0400-$0FFF 3KB expansion RAM area
$1000-$1FFF 4KB RAM (screen memory, BASIC program)
$2000-$7FFF Expansion RAM/ROM area
$8000-$8FFF Character ROM
$9000-$93FF VIC chip and VIA registers
$9400-$95FF Color RAM (alternate location, used with expansion RAM)
$9600-$97FF Color RAM (for screen characters)
$A000-$BFFF Cartridge ROM area
$C000-$DFFF BASIC ROM
$E000-$FFFF KERNAL ROM
Screen RAM (the characters to display) is at $1000. Color RAM (the color of each character) is at $9600. They’re not contiguous, which means filling the screen requires two separate loops:
; Fill screen with solid blocks ($A0 = reverse space)
LDX #$00
FILL1: LDA #$A0
STA $1000,X ; First 253 screen cells
STA $10FD,X ; Second 253 screen cells
INX
CPX #$FD ; 22 columns * 23 rows = 506 bytes = 2 * 253
BNE FILL1
; Set colors for each character
LDX #$00
COLORS1:
TXA
AND #$07 ; Colors 0-7 only
STA $9600,X ; First 253 color cells
STA $96FD,X ; Second 253 color cells
INX
CPX #$FD
BNE COLORS1
The VIC-20 only supports 8 foreground colors (the high bit of color RAM has other uses), so the AND #$07 masks out any higher bits.
On the C64, the layout is different:
C64 Memory Map (Simplified):
$0000-$00FF Zero page
$0100-$01FF Stack
$0200-$03FF BASIC input buffer, misc
$0400-$07FF Screen RAM (40x25 = 1000 bytes)
$0800-$9FFF BASIC program area
$A000-$BFFF BASIC ROM
$C000-$CFFF RAM (upper)
$D000-$D3FF VIC-II registers
$D400-$D7FF SID (sound) registers
$D800-$DBFF Color RAM
$E000-$FFFF KERNAL ROM
Screen RAM at $0400, color RAM at $D800. Both video systems use memory-mapped I/O, but the addresses are completely different. Code written for one machine won’t work on the other without modification.
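One way to keep those differences out of the rest of the code is to describe each machine's video layout as data. The following is a sketch under assumptions: `VideoLayout` and its constants are hypothetical, and the base addresses are simply the ones used in this article's examples.

```rust
/// Where a machine keeps its character cells and their colors.
struct VideoLayout {
    screen_base: u16,
    color_base: u16,
    columns: u16,
    rows: u16,
}

/// Addresses as used in the VIC-20 examples above.
const VIC20: VideoLayout = VideoLayout { screen_base: 0x1000, color_base: 0x9600, columns: 22, rows: 23 };

/// Addresses from the simplified C64 map above.
const C64: VideoLayout = VideoLayout { screen_base: 0x0400, color_base: 0xD800, columns: 40, rows: 25 };

impl VideoLayout {
    /// Screen and color RAM addresses for one character cell,
    /// or `None` if the cell is off-screen.
    fn cell_addresses(&self, col: u16, row: u16) -> Option<(u16, u16)> {
        if col >= self.columns || row >= self.rows {
            return None;
        }
        let offset = row * self.columns + col;
        Some((self.screen_base + offset, self.color_base + offset))
    }
}
```

The bottom-right cell is offset 505 on the 22x23 layout and offset 999 on the 40x25 one, so the same (column, row) pair maps to entirely different addresses on each machine.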
The CC-40 Memory Architecture
The Texas Instruments CC-40 is an interesting case because it’s a handheld computer from 1983 with a custom architecture. It uses a variant of the TMS70xx CPU family, not a 6502 or Z80.
The memory system is simpler than the Commodore machines:
/// CC-40 Memory
pub struct Memory {
/// RAM storage
ram: Vec<u8>,
/// RAM size in bytes
size: usize,
}
impl Default for Memory {
fn default() -> Self {
Self::new(6 * 1024) // 6KB default
}
}
6KB of RAM, expandable to 18KB with cartridges. No separate video memory (the LCD controller is memory-mapped but not directly writable). No ROM that you can peek into for fun.
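The struct above leaves out its constructor and accessors. Here is a minimal sketch of what they might look like; `read`, `write`, and `size` are hypothetical methods, and returning 0xFF for reads outside RAM is an assumption for illustration, not documented CC-40 behavior.

```rust
/// CC-40 RAM, repeated from above so the sketch is self-contained.
pub struct Memory {
    ram: Vec<u8>,
    size: usize,
}

impl Memory {
    /// Allocate `size` bytes of zeroed RAM.
    pub fn new(size: usize) -> Self {
        Self { ram: vec![0; size], size }
    }

    /// RAM size in bytes.
    pub fn size(&self) -> usize {
        self.size
    }

    /// Read a byte. Addresses beyond the installed RAM return 0xFF
    /// (an assumed placeholder value for unmapped memory).
    pub fn read(&self, addr: u16) -> u8 {
        self.ram.get(addr as usize).copied().unwrap_or(0xFF)
    }

    /// Write a byte. Returns false if the address is outside installed RAM.
    pub fn write(&mut self, addr: u16, value: u8) -> bool {
        match self.ram.get_mut(addr as usize) {
            Some(cell) => {
                *cell = value;
                true
            }
            None => false,
        }
    }
}
```

With the 6KB default, the last valid address is $17FF; a write to $1800 is rejected rather than panicking on an out-of-bounds index.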
The CC-40 uses a HexBus interface for peripherals, which is even weirder. Peripherals have their own address space accessible through I/O operations, not memory-mapped addresses. The printer, cassette drive, and disk drive all communicate through this bus.
This is why I have three different assemblers: each target machine has different conventions, different memory layouts, and different expectations about where code lives.
The Linking Problem (We Don’t Have a Linker)
Modern development uses separate compilation: you write multiple source files, compile each one to an object file, then link them together into an executable. The linker handles address assignment, symbol resolution, and section placement.
Retro assemblers… don’t do that. Not really.
The 6502 assembler produces a flat binary. Whatever bytes come out of the assembler are the bytes that go into memory. If you want to combine multiple source files, you include them with a directive (some assemblers support .INCLUDE) or concatenate the outputs manually.
The .ORG directive is the only “linking” we have. It tells the assembler where the code should live:
.ORG $8000
; BIOS code starts here
BIOS_INIT:
; ...
.ORG $FFFA
; 6502 hardware vectors: NMI at $FFFA, RESET at $FFFC, IRQ/BRK at $FFFE
NMI_VECTOR:
.WORD NMI_HANDLER
RESET_VECTOR:
.WORD BIOS_INIT
IRQ_VECTOR:
.WORD IRQ_HANDLER
If you get the addresses wrong, the CPU will jump to garbage on reset. If you accidentally place code where the stack will grow into it, the stack will corrupt your code. If you load a program at the wrong address, it will execute random memory as instructions.
There’s no symbol table, no relocation, no error checking beyond “did the bytes fit in the output buffer.” This is both liberating (no complex build system) and terrifying (one wrong digit and nothing works).
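A little checking goes a long way even without a linker. The sketch below shows one possible way to place several .ORG segments into a flat 64KB image while rejecting overlaps and overruns. `Segment` and `build_image` are hypothetical names; this is not how the assembler described here behaves.

```rust
/// Bytes assembled after one `.ORG` directive.
struct Segment {
    origin: u16,
    bytes: Vec<u8>,
}

/// Place every segment into a 64KB image, filling gaps with `fill`.
/// Fails if a segment runs past $FFFF or overlaps an earlier segment.
fn build_image(segments: &[Segment], fill: u8) -> Result<Vec<u8>, String> {
    let mut image = vec![fill; 0x10000];
    let mut used = vec![false; 0x10000];
    for seg in segments {
        let start = seg.origin as usize;
        let end = start + seg.bytes.len();
        if end > image.len() {
            return Err(format!("segment at ${:04X} runs past $FFFF", seg.origin));
        }
        if used[start..end].iter().any(|&taken| taken) {
            return Err(format!("segment at ${:04X} overlaps an earlier segment", seg.origin));
        }
        image[start..end].copy_from_slice(&seg.bytes);
        used[start..end].iter_mut().for_each(|taken| *taken = true);
    }
    Ok(image)
}
```

Two segments, one at $8000 and a two-byte vector at $FFFC, produce a full 64KB image with everything else set to the fill byte. A second segment that lands on top of the first is reported instead of silently winning.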
Testing Assemblers
How do you know an assembler is correct? You compare its output to known-good results.
I have a test suite with 64 test cases that verify bytecode output against reference implementations. Each test assembles a snippet and checks the exact bytes:
#[test]
fn test_lda_immediate() {
// LDA #$42 -> A9 42
assemble_and_check("LDA #$42", &[0xA9, 0x42]);
}
#[test]
fn test_backward_branch_with_label() {
let source = "
.ORG $0200
loop:
INX
BNE loop
";
// At $0200: INX -> E8
// At $0201: BNE loop -> D0 FD
// The branch is at $0201, instruction ends at $0203
// Target is $0200, so offset = $0200 - $0203 = -3 = $FD
assemble_and_check(
source,
&[
0xE8, // INX
0xD0, 0xFD, // BNE -3 (back to loop)
],
);
}
The comments document the expected behavior, so when a test fails, you know exactly what went wrong.
I also have integration tests that assemble real programs (fibonacci sequence, array summation, nested loops) and verify the output matches what xa65 produces. If my assembler and a well-tested reference assembler produce identical bytes, I have confidence the implementation is correct.
Round-trip testing is another approach: assemble source to bytecode, disassemble bytecode back to source, assemble again, and verify the bytecode matches. This catches cases where the disassembler and assembler have different ideas about instruction encoding.
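The round-trip idea fits in a few dozen lines if the instruction set is cut down far enough. This sketch uses a toy four-instruction table (LDA and LDX immediate, INX, RTS) and hypothetical `assemble`, `disassemble`, and `round_trips` functions; it stands in for the real assembler and disassembler only to show the shape of the test.

```rust
/// (mnemonic, takes a one-byte immediate operand, opcode)
const TABLE: [(&str, bool, u8); 4] = [
    ("LDA", true, 0xA9),
    ("LDX", true, 0xA2),
    ("INX", false, 0xE8),
    ("RTS", false, 0x60),
];

/// Assemble one instruction per line; `None` on anything unrecognized.
fn assemble(source: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for line in source.lines().map(str::trim).filter(|l| !l.is_empty()) {
        let mut parts = line.split_whitespace();
        let mnemonic = parts.next()?;
        let operand = parts.next();
        let &(_, has_imm, opcode) = TABLE.iter().find(|(m, _, _)| *m == mnemonic)?;
        out.push(opcode);
        match (has_imm, operand) {
            (true, Some(text)) => {
                let hex = text.strip_prefix("#$")?;
                out.push(u8::from_str_radix(hex, 16).ok()?);
            }
            (false, None) => {}
            _ => return None,
        }
    }
    Some(out)
}

/// Disassemble back to one instruction per line.
fn disassemble(bytes: &[u8]) -> Option<String> {
    let mut lines = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let &(mnemonic, has_imm, _) = TABLE.iter().find(|(_, _, op)| *op == bytes[i])?;
        if has_imm {
            lines.push(format!("{} #${:02X}", mnemonic, bytes.get(i + 1)?));
            i += 2;
        } else {
            lines.push(mnemonic.to_string());
            i += 1;
        }
    }
    Some(lines.join("\n"))
}

/// Assemble, disassemble, assemble again, and compare the two byte streams.
fn round_trips(source: &str) -> bool {
    let first = match assemble(source) {
        Some(bytes) => bytes,
        None => return false,
    };
    match disassemble(&first).and_then(|text| assemble(&text)) {
        Some(second) => first == second,
        None => false,
    }
}
```

Note that the comparison is on bytes, not text: `LDX #$0a` disassembles to `LDX #$0A`, which differs as a string but assembles to the same two bytes.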
Lessons Learned
After building two-and-a-half assemblers (the x86 one is incomplete), here’s what I know now that I wish I’d known at the start:
1. Use a parser generator. The Pest migration for the 6502 assembler was the best decision I made. The grammar file documents the syntax, the generated parser handles edge cases correctly, and adding new features is straightforward. Hand-written parsers are fine for prototypes but become maintenance nightmares.
2. Know your memory map before you write code. The load address bug cost me three days. I should have drawn a diagram showing where the program would live, where the stack would grow, and where the KERNAL uses RAM. This is basic stuff, but it’s easy to forget when you’re focused on getting the assembler working.
3. Test against reference implementations. I could have saved time by writing the test suite first. Every time I found a bug, I added a test case. Now the test suite is comprehensive enough that I can refactor with confidence.
4. Instruction encoding is irregular. Don’t try to be clever with opcode generation. The 6502 and Z80 both have patterns in their instruction encoding, but the exceptions outnumber the rules. A big match statement is ugly but correct.
5. Relative branches are trickier than they look. The offset is calculated from the address after the branch instruction, not from the branch instruction itself. Every assembler author learns this the hard way.
6. Memory-mapped I/O means knowing the hardware. The assembler doesn’t care whether address $D800 is color RAM or not. It just emits bytes. You have to care, or your program will write to the wrong place and produce garbage on screen.
Related Files
- 6502 Assembler: assemblers/mos6502-asm/src/assembler.rs
- 6502 Pest Grammar: assemblers/mos6502-asm/src/grammar.pest
- 6502 Pest Parser: assemblers/mos6502-asm/src/pest_parser.rs
- 6502 Tests: assemblers/mos6502-asm/src/tests.rs
- Z80 Assembler: assemblers/z80-asm/src/assembler.rs
- Z80 Parser: assemblers/z80-asm/src/parser.rs
- CC-40 Memory: cores/cc40/src/memory.rs
- VIC-20 Test Programs: web/src/backend/vic20-video/test-programs/
The 6502 was designed in 1975. The Z80 in 1976. We’re still writing code for them in 2026, and the same problems that confused programmers fifty years ago still confuse us today. The difference is that now we have Rust to catch our off-by-one errors at compile time, and test frameworks to catch them at runtime. Progress, I suppose.