Three Languages, One Weekend: Building BASIC, Forth, and Scheme Interpreters in Rust
Implementing three classic programming language interpreters in Rust, compiled to WebAssembly. BASIC for nostalgia, Forth for stack machines, Scheme for elegance—and what I learned along the way.
The title is a bit of a lie. It was more like three weekends, spread across several months, with plenty of “just one more feature” detours that turned into multi-day rabbit holes. But the spirit is accurate: I set out to build three fundamentally different language interpreters in Rust, compile them to WebAssembly, and run them in a browser as part of the emulator.ca BBS simulation.
The result is that you can now dial 555-0300 to get a BASIC prompt, 555-0400 for Forth, and eventually a Scheme REPL when I settle on a phone number. (I’m thinking 555-5477 for LISP on a phone keypad, but that feels too clever by half.)
Why three languages? Because each one represents a fundamentally different approach to programming:
- BASIC: The language of my childhood, of PEEK and POKE and GOTO. The language that taught a generation to program by being forgiving of mistakes and immediate in feedback.
- Forth: The language of stacks and concatenation, where programs are built by composing tiny words into larger ones. Minimal, powerful, and deeply satisfying once it clicks.
- Scheme: The language of elegance, where everything is an expression and functions are first-class citizens. The lambda calculus made practical.
Building all three in the same codebase taught me more about language design than any book could. Let me show you what I mean.
The Shared Infrastructure
Before diving into each language, I should explain the architecture that makes all of this possible. Each interpreter follows the same pattern:
- A Rust crate in languages/{language}/ that implements the interpreter
- WASM compilation via wasm-pack
- A Web Worker that loads the WASM module and handles I/O
- A TypeScript backend class that routes modem traffic to the worker
The WorkerBasedInterpreterBackend class is the glue:
// worker-based-interpreter-backend.ts
export abstract class WorkerBasedInterpreterBackend extends BackendInterface {
  protected worker: Worker | null = null;
  protected workerReady: boolean = false;
  // Stored so handleWorkerMessage can resolve initWorker() once the worker signals ready
  protected workerReadyResolve: (() => void) | null = null;
  protected abstract getWorkerPath(): string;
  protected abstract onWorkerReady(): void;
  async initWorker(): Promise<void> {
    if (this.workerReady) return;
    return new Promise<void>((resolve, reject) => {
      const workerPath = this.getWorkerPath();
      const workerUrl = new URL(workerPath, import.meta.url);
      this.worker = new Worker(workerUrl, { type: 'module' });
      this.workerReadyResolve = resolve;
      this.worker.addEventListener('message', (event) => {
        this.handleWorkerMessage(event.data);
      });
    });
  }
}
Each language backend just extends this and provides its worker path. The BASIC interpreter at 555-0300 uses interpreter.worker.ts; Forth at 555-0400 uses forth.worker.ts. The worker handles the WASM lifecycle—loading the module, passing characters in, collecting output—while the backend handles modem signals like DTR and CTS.
This separation turned out to be crucial. WASM modules block the main thread, and you really don’t want your entire UI freezing while someone types 10 PRINT "HELLO": GOTO 10. The Web Worker keeps everything responsive.
BASIC: The Language of Nostalgia
I’ll confess: BASIC is the one I started with because I wanted to play Star Trek. The original 1971 Mike Mayfield version, with its 8x8 quadrant grid and “ENERGY LOW” warnings. I’d played it on a PET as a kid, and the idea of running it in a browser through a modem simulation was too appealing to pass up.
The BASIC interpreter lives at languages/basic/ and implements something close to Commodore BASIC V2, with a few modern conveniences thrown in. It supports:
- Line numbers and GOTO/GOSUB
- FOR/NEXT loops with proper stack management
- Arrays, including 2D string arrays
- String manipulation (LEFT$, MID$, RIGHT$, LEN)
- Timer functions (TI, TI$)
- INPUT for user interaction
- PRINT with all its formatting quirks
The Parser Migration
The original parser was hand-rolled—a recursive descent tokenizer that worked fine but was brittle around edge cases. I eventually wrote a Pest-based parser as a replacement:
// pest_parser.rs - drop-in replacement for manual tokenizer
use pest::Parser;
use pest_derive::Parser;
#[derive(Parser)]
#[grammar = "grammar.pest"]
pub struct BasicParser;
pub fn tokenize(input: &str) -> Result<Vec<SpannedToken>, LangError> {
    let pairs = BasicParser::parse(Rule::line, input)
        .map_err(|e| LangError::syntax(
            format!("Parse error: {}", e),
            Span::point(0, 1, 1)
        ))?;
    let mut tokens = Vec::new();
    for pair in pairs {
        match pair.as_rule() {
            Rule::line => {
                for inner in pair.into_inner() {
                    match inner.as_rule() {
                        Rule::line_number => {
                            let span = span_from_pair(&inner);
                            let num: f64 = inner.as_str().parse().unwrap();
                            tokens.push(SpannedToken::new(Token::Number(num), span));
                        }
                        Rule::statement => {
                            for token_pair in inner.into_inner() {
                                if let Some(token) = parse_token(token_pair)? {
                                    tokens.push(token);
                                }
                            }
                        }
                        _ => {}
                    }
                }
            }
            _ => {}
        }
    }
    // End-of-input marker; assumes Span::point(offset, line, column)
    let eof_span = Span::point(input.len(), 1, input.len() + 1);
    tokens.push(SpannedToken::new(Token::Eof, eof_span));
    Ok(tokens)
}
The Pest parser isn’t currently the default—the manual tokenizer is still doing the work—but having both implementations means I can run comparison tests and gradually migrate. The test file test_pest_parser.rs verifies that both produce identical token streams for the same input.
(Why not just switch to Pest immediately? Because the manual parser has accumulated months of edge-case fixes that I don’t want to lose. “Works and tested” beats “elegant but new.”)
The Expression Evaluator
BASIC’s expression evaluator needs proper operator precedence, which is trickier than it sounds. The classic recursive descent approach works, but you have to be careful about the order of operations:
// Expression precedence (lowest to highest):
// 1. OR
// 2. AND
// 3. NOT
// 4. Comparisons (=, <>, <, >, <=, >=)
// 5. Addition/Subtraction (+, -)
// 6. Multiplication/Division (*, /)
// 7. Exponentiation (^)
// 8. Unary minus (-)
// 9. Functions (SIN, COS, RND, etc.)
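In a recursive descent evaluator, each of those levels becomes its own parse function that calls the next-tighter level and then folds its own operators left to right. Here is a minimal sketch of one level; the helper and variant names (parse_mul_div, peek_operator, Expr::Binary) are illustrative, not the actual names in expression.rs:
// Additive level: parse the tighter (mul/div) level, then fold + and -
fn parse_add_sub(&mut self) -> Result<Expr, LangError> {
    let mut left = self.parse_mul_div()?;              // next-higher precedence first
    while let Some(op) = self.peek_operator(&[Token::Plus, Token::Minus]) {
        self.advance();                                // consume the operator
        let right = self.parse_mul_div()?;
        left = Expr::Binary(op, Box::new(left), Box::new(right)); // left-associative fold
    }
    Ok(left)
}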
The RND function was particularly fun. Commodore BASIC’s RND has weird semantics—RND(0) returns the previous random number, RND(1) generates a new one, and negative arguments seed the generator. I implemented the useful subset (positive arguments generate new numbers) and called it a day.
Timer Functions: TI and TI$
One feature I added for Star Trek compatibility was the timer functions. TI returns “jiffies” since the interpreter started (1/60th of a second on a C64), and TI$ returns a formatted time string in HHMMSS format:
// expression.rs
use std::sync::OnceLock;
use std::time::Instant;
static TIMER_START: OnceLock<Instant> = OnceLock::new();
pub fn reset_timer() {
// Initialize the timer start point
let _ = TIMER_START.get_or_init(Instant::now);
}
pub fn get_ti() -> f64 {
let start = TIMER_START.get_or_init(Instant::now);
let elapsed = start.elapsed().as_secs_f64();
// Return jiffies (1/60th of a second)
(elapsed * 60.0).floor()
}
pub fn get_ti_string() -> String {
let start = TIMER_START.get_or_init(Instant::now);
let elapsed = start.elapsed().as_secs();
let hours = (elapsed / 3600) % 24;
let minutes = (elapsed % 3600) / 60;
let seconds = elapsed % 60;
format!("{:02}{:02}{:02}", hours, minutes, seconds)
}
This was one of those “simple feature, surprising complexity” situations. The timer needs to survive across multiple interpreter invocations, hence the OnceLock. And the HHMMSS format needs to wrap at 24 hours, which is why there’s a % 24 in there.
Running Star Trek
The Star Trek backend at 555-1702 (NCC-1701 + 1, obviously) auto-loads the game when you connect:
// startrek-basic/index.ts
export class StarTrekBasic extends WorkerBasedInterpreterBackend {
static description = 'Super Star Trek (1971) - BASIC Version';
protected handleWorkerMessage(message: any): void {
const { type, text } = message;
// Check if we're waiting for READY after program load
if (type === 'OUTPUT' && this.waitingForReady &&
text && text.includes('READY.')) {
this.waitingForReady = false;
if (this.shouldAutoRun) {
this.shouldAutoRun = false;
// Auto-run the program after a short delay
setTimeout(() => this.sendRunCommand(), 100);
}
}
super.handleWorkerMessage(message);
}
}
The 100ms delay before auto-running is there because the worker needs a moment to finish initializing. Without it, the RUN command arrives before the interpreter is fully ready, and nothing happens. Ask me how many hours I spent debugging that one.
Forth: The Language of Stacks
Forth is the opposite of BASIC in almost every way. Where BASIC is verbose, Forth is terse. Where BASIC hides the stack, Forth puts it front and center. Where BASIC has keywords, Forth has words that you define yourself.
The Forth interpreter at 555-0400 implements 156 words with 396 tests. (Yes, I counted. The test-to-word ratio is something like 2.5:1, which feels about right for a stack machine where off-by-one errors are catastrophic.)
The Dictionary
Forth’s core data structure is the dictionary—a mapping from word names to their definitions. A word can be a primitive (implemented in Rust), a user-defined sequence of other words, a variable, a constant, or a value:
// interpreter.rs
#[derive(Clone)]
pub enum Word {
Primitive(WordFunc),
/// Compiled definition with instruction list
UserDefined(Rc<Vec<Instruction>>),
/// Variable: stores memory address where value is kept
Variable(usize),
/// Constant: stores the constant value directly
Constant(i32),
/// Value: like Constant but mutable via TO, stores address
Value(usize),
/// Does: a CREATE'd word with DOES> behavior
Does(usize, Rc<Vec<Instruction>>),
}
The UserDefined variant is where colon definitions live. When you write : SQUARE DUP * ;, the interpreter compiles a sequence of Instruction values:
#[derive(Clone, Debug)]
pub enum Instruction {
Literal(i32),
Call(String),
BranchIfZero(usize),
Branch(usize),
DoSetup,
LoopEnd(usize),
// ... many more
}
The Call instruction looks up a word name at runtime. This is slower than storing a direct reference, but it means words can be redefined and the change takes effect immediately—which is the expected Forth behavior.
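Concretely, a definition like : SQUARE DUP * ; boils down to two Call instructions. A sketch of the resulting dictionary entry (the real compiler output may carry extra bookkeeping):
// Rough shape of the Word created by ": SQUARE DUP * ;"
let square = Word::UserDefined(Rc::new(vec![
    Instruction::Call("DUP".to_string()),
    Instruction::Call("*".to_string()),
]));
// A definition with a number in it, like ": DOUBLE 2 * ;", would compile to
// [Literal(2), Call("*")] instead.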
Control Flow: IF, THEN, ELSE
Forth’s control flow is stack-based, which means the compiler has to track where to patch branch targets. The ControlFlowMarker enum handles this:
#[derive(Clone, Debug)]
enum ControlFlowMarker {
If(usize), // Position of BranchIfZero to patch
Else(usize), // Position of Branch to patch
Begin(usize), // Position to loop back to
While(usize), // Position of BranchIfZero to patch
Do(usize), // Position after DoSetup
// ... more markers
}
When the compiler sees IF, it emits a BranchIfZero instruction with a placeholder target and pushes an If(position) marker onto the control stack. When it sees THEN, it pops the marker and patches the branch to point to the current instruction position.
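The patch itself is only a few lines. A sketch of the THEN case, assuming an instructions vector and a control_stack of markers as described above (names are illustrative):
// On THEN: pop the marker and point the recorded branch at the next instruction slot
match control_stack.pop() {
    Some(ControlFlowMarker::If(pos)) | Some(ControlFlowMarker::Else(pos)) => {
        let target = instructions.len();               // first instruction after THEN
        match &mut instructions[pos] {
            Instruction::BranchIfZero(t) | Instruction::Branch(t) => *t = target,
            _ => unreachable!("markers always point at branch instructions"),
        }
    }
    _ => return Err("THEN without matching IF".to_string()),
}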
This gets more complex with nested structures. IF ... IF ... THEN ... THEN requires careful stack management. The tests for nested control flow are extensive:
: NESTED-IF 10 > IF 5 > IF 1 ELSE 2 THEN ELSE 3 THEN ;
DO Loops and I/J
Counted loops in Forth use the return stack to store the loop index and limit. The I word pushes the current index, and J pushes the outer loop’s index (for nested loops):
Instruction::LoopIndex => {
// I - push loop index from return stack
// Return stack holds (bottom to top): [index, limit]
let limit = self.return_stack.pop()?;
let index = self.return_stack.pop()?;
self.data_stack.push(index);
self.return_stack.push(index);
self.return_stack.push(limit);
}
Instruction::LoopIndexJ => {
// J - push outer loop index
// Return stack holds (bottom to top): [outer_index, outer_limit, inner_index, inner_limit]
let inner_limit = self.return_stack.pop()?;
let inner_index = self.return_stack.pop()?;
let outer_limit = self.return_stack.pop()?;
let outer_index = self.return_stack.pop()?;
self.data_stack.push(outer_index);
// Restore return stack
self.return_stack.push(outer_index);
self.return_stack.push(outer_limit);
self.return_stack.push(inner_index);
self.return_stack.push(inner_limit);
}
The ?DO variant (QDoSetup in the instruction set) is the polite version that skips the loop body entirely if the limit equals the start. Standard DO always executes at least once, which is sometimes not what you want.
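A sketch of that difference, assuming QDoSetup carries the index of the instruction just past the matching LOOP and that ip is the instruction pointer (the actual layout in interpreter.rs may differ):
// ?DO consumes ( limit start -- ) and skips the body entirely when they're equal
Instruction::QDoSetup(after_loop) => {
    let start = self.data_stack.pop()?;
    let limit = self.data_stack.pop()?;
    if start == limit {
        ip = after_loop;                    // jump past LOOP without iterating
    } else {
        self.return_stack.push(start);      // same [index, limit] layout I and J expect
        self.return_stack.push(limit);
    }
}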
Number Base: HEX and DECIMAL
Forth lets you change the number base at any time with HEX and DECIMAL. The interpreter tracks this in a base field and uses it for both input parsing and output:
pub base: i32, // Default 10 (decimal)
// During output:
Instruction::Dot => {
let n = self.data_stack.pop()?;
let formatted = if self.base == 16 {
format!("{:X} ", n)
} else if self.base == 8 {
format!("{:o} ", n)
} else {
format!("{} ", n)
};
output.push_str(&formatted);
}
The base-aware parsing was trickier. Numbers like FF are valid in hex mode but undefined words in decimal. The parser checks self.base before treating something as a word reference.
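The check itself is a one-liner thanks to from_str_radix; a sketch of the idea (the real parsing path may handle more cases):
// Try to read a token as a number in the current base; if this returns None,
// the caller falls back to a dictionary lookup and finally an "unknown word" error
fn parse_number_in_base(&self, token: &str) -> Option<i32> {
    i32::from_str_radix(token, self.base as u32).ok()
}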
Scaled Arithmetic: */ and */MOD
Phase 7 of the Forth implementation added scaled arithmetic operations that use 64-bit intermediates to avoid overflow:
// */ - multiply then divide (n1 * n2 / n3) with 64-bit intermediate
let n3 = self.data_stack.pop()? as i64;
let n2 = self.data_stack.pop()? as i64;
let n1 = self.data_stack.pop()? as i64;
if n3 == 0 {
return Err("Division by zero".to_string());
}
let result = ((n1 * n2) / n3) as i32;
self.data_stack.push(result);
This is important for fixed-point arithmetic, which Forth programmers use extensively. Without the 64-bit intermediate, 1000000 1000000 */ would overflow before the division could bring it back to a reasonable range.
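*/MOD follows the same pattern but keeps the remainder as well, leaving it under the quotient. A sketch mirroring the */ code above:
// */MOD - ( n1 n2 n3 -- rem quot ) with a 64-bit intermediate product
let n3 = self.data_stack.pop()? as i64;
let n2 = self.data_stack.pop()? as i64;
let n1 = self.data_stack.pop()? as i64;
if n3 == 0 {
    return Err("Division by zero".to_string());
}
let product = n1 * n2;
self.data_stack.push((product % n3) as i32); // remainder
self.data_stack.push((product / n3) as i32); // quotient ends up on top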
Scheme: The Language of Elegance
Scheme is the minimalist Lisp. Where Common Lisp has hundreds of built-in functions, Scheme has a handful of special forms and lets you build everything else from lambdas. It’s beautiful in theory and a pain to implement in practice.
The Scheme interpreter at languages/scheme/ isn’t exposed via a phone number yet, but it’s functional. It handles:
- S-expression parsing
- Lexical scoping with closures
- Special forms: define, lambda, if, cond, let, let*, letrec, do
- Quoting and quasiquoting
- Vectors and strings
- Partial tail call optimization
S-Expression Parsing
Scheme’s syntax is laughably simple: everything is either an atom or a list. The parser just needs to handle parentheses, quotes, and the occasional string or number:
// pest_parser.rs for Scheme
pub fn parse(input: &str) -> Result<Vec<Value>, LangError> {
let pairs = SchemeParser::parse(Rule::program, input)
.map_err(|e| LangError::syntax(
format!("Parse error: {}", e),
Span::point(0, 1, 1)
))?;
let mut values = Vec::new();
for pair in pairs {
match pair.as_rule() {
Rule::expr => {
values.push(parse_expr(pair)?);
}
_ => {}
}
}
Ok(values)
}
The Value type is a discriminated union (Rust enum) that can hold numbers, symbols, strings, pairs, vectors, procedures, and the special nil and void values:
// value.rs
pub enum Value {
Number(f64),
Bool(bool),
String(String),
Char(char),
Symbol(String),
Pair(Box<Value>, Box<Value>),
Vector(Vec<Value>),
Procedure(Procedure),
Nil,
Void,
}
The Pair variant is the building block for lists. A proper list is a chain of pairs ending in Nil: (a . (b . (c . ()))). This representation makes cons, car, and cdr trivial to implement.
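With that layout, the three primitives are a few lines each. A sketch over the Value enum above (the real builtins are wired into the evaluator differently):
// cons, car, cdr over the boxed-pair representation
fn cons(a: Value, d: Value) -> Value {
    Value::Pair(Box::new(a), Box::new(d))
}
fn car(v: &Value) -> Result<Value, String> {
    match v {
        Value::Pair(a, _) => Ok((**a).clone()),
        _ => Err("car: expected a pair".to_string()),
    }
}
fn cdr(v: &Value) -> Result<Value, String> {
    match v {
        Value::Pair(_, d) => Ok((**d).clone()),
        _ => Err("cdr: expected a pair".to_string()),
    }
}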
Environments and Closures
Scheme’s lexical scoping requires each environment to have a reference to its parent:
// env.rs
pub type EnvRef = Rc<RefCell<Env>>;
pub struct Env {
parent: Option<EnvRef>,
values: HashMap<String, Value>,
}
impl Env {
pub fn with_parent(parent: EnvRef) -> EnvRef {
Rc::new(RefCell::new(Env {
parent: Some(parent),
values: HashMap::new(),
}))
}
pub fn get(env: &EnvRef, name: &str) -> Option<Value> {
if let Some(value) = env.borrow().values.get(name) {
return Some(value.clone());
}
if let Some(parent) = &env.borrow().parent {
return Env::get(parent, name);
}
None
}
}
When a lambda is created, it captures the current environment. When the lambda is called, a new environment is created with the captured environment as its parent, and the parameters are bound in the new environment. This is what makes closures work:
(define (make-counter)
(let ((count 0))
(lambda ()
(set! count (+ count 1))
count)))
(define counter (make-counter))
(counter) ; => 1
(counter) ; => 2
(counter) ; => 3
The inner lambda captures the environment where count is defined. Each call to counter modifies the same count binding.
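In Rust terms, applying a lambda looks roughly like this; the direct values.insert and the eval_in_env helper are illustrative stand-ins, not the actual API in eval.rs:
// Apply a lambda: fresh child of the *captured* environment, bind the
// parameters to the arguments, evaluate the body, return the last value
fn apply_lambda(
    params: &[String],
    body: &[Value],
    captured_env: &EnvRef,
    args: Vec<Value>,
) -> Result<Value, String> {
    let call_env = Env::with_parent(captured_env.clone());
    for (name, arg) in params.iter().zip(args) {
        call_env.borrow_mut().values.insert(name.clone(), arg); // hypothetical direct insert
    }
    let mut result = Value::Void;
    for expr in body {
        result = eval_in_env(expr, &call_env)?; // hypothetical evaluate-in-environment helper
    }
    Ok(result)
}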
Tail Call Optimization (Sort Of)
Scheme requires proper tail call optimization—a recursive call in tail position should not grow the stack. I implemented a partial version:
// eval.rs
// Tail Call Optimization (TCO) Implementation
//
// This is a PARTIAL TCO implementation that provides ~5-10x improvement
// over baseline (100-150 recursion depth vs. 10-20), but does not achieve
// R5RS-compliant unlimited depth.
//
// Tail positions handled:
// - if: both branches
// - cond: last expression in each clause
// - begin: last expression only
// - lambda: last expression in body
//
// Limitations:
// - Still builds Rust call stack through multiple layers
// - Deep recursion (1000+) causes stack overflow
The eval_tail_expr function uses an iterative loop instead of recursion for expressions in tail position:
fn eval_tail_expr(expr: &Value, interp: &mut Interpreter) -> Result<Value, String> {
let mut current_expr = expr.clone();
let mut iterations = 0;
loop {
iterations += 1;
if iterations > interp.max_recursion_depth {
return Err("recursion depth limit exceeded".to_string());
}
if let Value::Pair(_, _) = &current_expr {
if let Ok(items) = list_to_vec(&current_expr) {
if !items.is_empty() {
match &items[0] {
Value::Symbol(s) if s == "if" => {
let test = eval(&items[1], interp)?;
current_expr = if !test.is_false() {
items[2].clone()
} else if items.len() == 4 {
items[3].clone()
} else {
return Ok(Value::Void);
};
continue; // Loop instead of recursing
}
// ... more cases
}
}
}
}
return eval(&current_expr, interp);
}
}
For true R5RS compliance, I’d need a trampoline architecture where the evaluator returns a “continuation” that can be resumed, rather than directly computing the result. That’s a bigger change than I wanted to make, so the current implementation is a pragmatic compromise.
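For the record, the trampoline shape I have in mind looks something like this; it’s purely a sketch of the idea, not code that exists in the crate:
// A trampolined evaluator returns a Step instead of recursing; the driver loop
// keeps going until it gets a finished value, so the Rust stack never grows
enum Step {
    Done(Value),
    TailCall(Value, EnvRef), // "evaluate this expression in this environment next"
}
fn run(mut expr: Value, mut env: EnvRef) -> Result<Value, String> {
    loop {
        match eval_step(&expr, &env)? { // hypothetical single-step evaluator
            Step::Done(v) => return Ok(v),
            Step::TailCall(next_expr, next_env) => {
                expr = next_expr;
                env = next_env;
            }
        }
    }
}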
The Lambda Helper
One of the improvements in the Pest parser migration was a make_lambda helper that simplifies lambda creation:
// eval.rs
fn make_lambda(
params: Vec<String>,
body: Vec<Value>,
env: EnvRef,
) -> Value {
Value::Procedure(Procedure::Lambda {
params,
body,
env,
})
}
This gets used in several places—lambda expressions, define with function syntax, let (which desugars to a lambda application), etc. Having it as a helper reduced duplication and made the code easier to follow.
Lessons Learned
1. Start with Tests
The Forth implementation has 396 tests. That sounds excessive until you realize that stack machines are pathologically sensitive to off-by-one errors. Push one too many values? Stack overflow on the next operation. Pop one too few? Silent corruption that shows up five operations later.
For each word, I wrote tests for:
- Normal operation
- Edge cases (empty stack, maximum values)
- Error conditions (underflow, invalid arguments)
- Interaction with other words
The test suite is the specification. When in doubt, I ran the tests.
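For a concrete flavour, this is the shape of those tests; the eval and data_stack helpers here are illustrative, not the actual test API in the repo:
// Normal operation plus an error condition for a single word
#[test]
fn dup_duplicates_the_top_of_stack() {
    let mut forth = Interpreter::new();
    forth.eval("3 DUP").unwrap();            // hypothetical run-a-line helper
    assert_eq!(forth.data_stack(), &[3, 3]); // hypothetical stack accessor
}
#[test]
fn dup_underflows_on_an_empty_stack() {
    let mut forth = Interpreter::new();
    assert!(forth.eval("DUP").is_err());
}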
2. WASM Compilation Is Straightforward
The Rust-to-WASM pipeline via wasm-pack is remarkably smooth. The main gotchas are:
- No file I/O (use browser APIs instead)
- No threads (Web Workers are separate WASM instances)
- Limited stdlib (but most of what you need is there)
Each interpreter crate has a simple WASM interface:
#[wasm_bindgen]
pub struct InterpreterWasm {
interp: Interpreter,
}
#[wasm_bindgen]
impl InterpreterWasm {
#[wasm_bindgen(constructor)]
pub fn new() -> Self {
InterpreterWasm {
interp: Interpreter::new(),
}
}
pub fn input(&mut self, char_code: u8) -> String {
self.interp.input(char_code)
}
pub fn reset(&mut self) {
self.interp = Interpreter::new();
}
}
The TypeScript side loads the WASM module, creates an instance, and calls input() for each character. Output comes back as a string to be sent to the terminal.
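On the Rust side of that boundary, input() is essentially a tiny line editor: buffer printable characters, run the line when Enter arrives, and hand back whatever the interpreter printed. A sketch with illustrative field and method names:
// Character-at-a-time entry point behind the wasm_bindgen wrapper above
pub fn input(&mut self, char_code: u8) -> String {
    match char_code {
        b'\r' | b'\n' => {
            let line = std::mem::take(&mut self.line_buffer); // hypothetical String field
            self.run_line(&line)                              // hypothetical: execute and return output
        }
        0x08 | 0x7f => { // backspace / delete
            self.line_buffer.pop();
            String::new()
        }
        c => {
            self.line_buffer.push(c as char);
            String::new()
        }
    }
}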
3. The Worker Pattern Scales
Having a base WorkerBasedInterpreterBackend class meant that adding new interpreters was mostly copy-paste:
- Write the Rust interpreter
- Create a worker that loads it
- Extend the backend class with the worker path
- Map a phone number to the new backend
The Star Trek BASIC backend is literally 150 lines of TypeScript, most of which is handling the auto-run logic. The actual interpreter communication is all inherited.
4. Language Design Choices Have Consequences
BASIC’s line numbers seemed quaint until I needed to implement GOTO and GOSUB. Suddenly the program is no longer a linear sequence—it’s a random-access data structure.
Forth’s postfix notation felt awkward until I realized it completely eliminates parsing ambiguity. There’s no operator precedence because there are no operators—just words that consume and produce stack values.
Scheme’s minimal syntax is a double-edged sword. It’s easy to parse, but the semantics (proper tail calls, continuations, hygienic macros) are deeply complex. I implemented the subset I needed and stopped.
5. Nostalgia Is a Feature
The most-used backend in the emulator is Star Trek BASIC at 555-1702. People connect, play for a while, get destroyed by Klingons, and come back later. The game is objectively primitive—text-only, simple mechanics, no graphics—but it has something that more sophisticated games often lack: immediate accessibility.
You dial the number. You see ENTER 1,2, OR 3 FOR BEGINNER, INTERMEDIATE, OR EXPERT. You press 1. You’re playing. No tutorials, no cutscenes, no loading screens. Just you, a starship, and some Klingons to destroy.
That’s the magic of these old languages. They let you do things without asking permission.
What’s Next
The interpreters are functional, but there’s always more to do:
- BASIC: Better error messages, more functions, possibly SOUND support if I can figure out the Web Audio API integration
- Forth: The 156-word vocabulary covers most use cases, but INCLUDE and LOAD for multi-file programs would be nice
- Scheme: Full TCO via trampolining, plus hygienic macros if I’m feeling ambitious
And of course, there are more languages to consider. Logo for turtle graphics? Lua for lightweight scripting? REXX for IBM nostalgia? The architecture supports it—I just need the weekends.
For now, you can dial 555-0300 for BASIC, 555-0400 for Forth, and 555-1702 for Star Trek. Documentation for each is available at emulator.ca if you want to brush up on your GOSUB or >R R>.
Related Files
- BASIC Interpreter: languages/basic/src/
  - interpreter.rs - Main interpreter loop
  - pest_parser.rs - Pest-based tokenizer (alternate)
  - expression.rs - Expression evaluator with operator precedence
  - executor.rs - Statement execution
- Forth Interpreter: languages/forth/src/
  - interpreter.rs - Dictionary and instruction execution
  - stack.rs - Data and return stack implementations
- Scheme Interpreter: languages/scheme/src/
  - eval.rs - Evaluator with partial TCO
  - env.rs - Lexical environments
  - value.rs - Value representation
- Backend Infrastructure: web/src/backend/
  - worker-based-interpreter-backend.ts - Base class for WASM interpreters
  - basic-interpreter/index.ts - BASIC backend (555-0300)
  - forth-interpreter/index.ts - Forth backend (555-0400)
  - startrek-basic/index.ts - Star Trek game (555-1702)
The best way to understand a language is to implement it. The best way to appreciate a language is to use one you implemented.