Deep Dive: Kansas City Standard Cassette Storage

The bug was embarrassing: clicks at every bit boundary. The audio sounded like someone typing on a keyboard, not the smooth warble I remembered from 1982. I spent an evening staring at waveforms in Audacity before I found it—my encoder was resetting the sine wave phase to zero at every bit transition instead of letting it continue naturally.

That’s when I realized why I was building a cassette interface at all. Not for the storage—the emulator already has instant save/load. I was building it because the sound of saving a program was part of the experience. The modem warble from a Commodore Datasette told you data was moving. You developed an ear for it; you could tell a good recording from a bad one by the clarity of the tone. Modern storage is silent. Something is lost in that silence.

So I built a Kansas City Standard cassette peripheral that does FSK encoding for real, with a tape transport that moves at real time, and audio feedback that makes the whole thing feel physical.

The November 1975 Compromise

In Kansas City, Missouri, a group of hobbyists gathered to solve a problem: everyone was building cassette interfaces, but none were compatible. Your Altair couldn’t read tapes from your friend’s IMSAI. The result was the Kansas City Standard, also called the Byte Standard after the magazine that promoted it.

The spec is a clock contract between a byte stream and an audio channel:

Mark (binary 1): 8 cycles of 2400 Hz
Space (binary 0): 4 cycles of 1200 Hz
300 baud: 30 characters per second
Frame: 1 start bit, 8 data bits (LSB first), 2 stop bits

Why those frequencies? 2400 is exactly double 1200, which made the analog circuitry simpler—you could use a single oscillator with a frequency divider. Engineering decisions like this are why I love old standards.

At 300 baud with 11 bits per byte, you get about 27 bytes per second. That’s roughly 1.6 KB per minute. An 8 KB BASIC program takes five or six minutes to save. The constraint shapes everything: you don’t casually save your work. You save it once, at the end, after you’ve tested it.

FSK Encoding: Turning Bits into Tones

Frequency-shift keying turns bits into audio. Instead of voltage levels or sharp edges, you get tones. The cassette recorder doesn’t care about DC levels—it faithfully reproduces audio frequencies because that’s what it’s designed for.

The encoder is a direct translation of the spec:

export class FSKEncoder {
  private readonly MARK_FREQ = 2400;  // Binary 1
  private readonly SPACE_FREQ = 1200; // Binary 0

  encode(data: Uint8Array): Float32Array {
    const bits = this.bytesToBits(data);
    const samplesPerBit = Math.floor(this.config.sampleRate / this.config.baudRate);
    const samples = new Float32Array(bits.length * samplesPerBit);

    let sampleIndex = 0;
    for (const bit of bits) {
      const frequency = bit ? this.MARK_FREQ : this.SPACE_FREQ;

      for (let j = 0; j < samplesPerBit; j++) {
        const t = sampleIndex / this.config.sampleRate;
        samples[sampleIndex] = Math.sin(2 * Math.PI * frequency * t);
        sampleIndex++;
      }
    }

    return samples;
  }
}

At 44,100 Hz sample rate and 300 baud, each bit gets about 147 samples. At 2400 Hz that’s roughly 8 complete cycles per bit; at 1200 Hz it’s 4 cycles. These numbers match the original KCS spec exactly.

The framing is standard asynchronous serial:

private bytesToBits(data: Uint8Array): number[] {
  const bits: number[] = [];

  for (const byte of data) {
    bits.push(0);  // Start bit

    for (let j = 0; j < 8; j++) {
      bits.push((byte >> j) & 1);  // Data bits, LSB first
    }

    bits.push(1);  // Stop bit
    bits.push(1);  // Stop bit
  }

  return bits;
}

The start bit (always 0) tells the decoder a byte is coming. The stop bits (always 1) provide recovery time before the next byte. It’s RS-232 framing with a different physical layer.

The Phase Bug (And Why It Matters)

Back to those clicks. My first encoder generated each bit’s sine wave starting at phase zero:

// Broken: restarts phase at each bit
for (let j = 0; j < samplesPerBit; j++) {
  const t = j / this.config.sampleRate;  // t resets to 0
  samples[sampleIndex++] = Math.sin(2 * Math.PI * frequency * t);
}

The problem: when you jump from one frequency to another, the amplitude discontinuity produces a click. The fix is to use a continuously incrementing time index:

// Fixed: phase continues across bits
for (let j = 0; j < samplesPerBit; j++) {
  const t = sampleIndex / this.config.sampleRate;  // t never resets
  samples[sampleIndex++] = Math.sin(2 * Math.PI * frequency * t);
}

Now the waveform transitions smoothly. The frequency changes, but the amplitude doesn’t jump. It’s a small thing—maybe 50 microseconds per transition—but audible. And once I heard the difference, I couldn’t unhear it.

Decoding: Zero-Crossing Detection

Reading data back is where cassette systems earn their reputation for unreliability. Real recorders introduce wow, flutter, noise, and level variations. The decoder has to tolerate all of that.

Zero-crossing analysis works because it ignores amplitude:

private detectFrequency(samples: Float32Array): number {
  let zeroCrossings = 0;

  for (let i = 1; i < samples.length; i++) {
    if ((samples[i - 1] < 0 && samples[i] >= 0) ||
        (samples[i - 1] >= 0 && samples[i] < 0)) {
      zeroCrossings++;
    }
  }

  const duration = samples.length / this.config.sampleRate;
  const estimatedFreq = zeroCrossings / (2 * duration);

  const markDiff = Math.abs(estimatedFreq - this.MARK_FREQ);
  const spaceDiff = Math.abs(estimatedFreq - this.SPACE_FREQ);

  return markDiff < spaceDiff ? 1 : 0;
}

A 2400 Hz wave crosses zero about 4800 times per second; a 1200 Hz wave about 2400 times. Count the crossings over a bit window, estimate the frequency, pick the closer match. It’s remarkably robust—the frequencies are far enough apart that moderate drift doesn’t cause misreads.

The edge case is silence or very low amplitude. The waveform hovers near zero, creating spurious crossings. My fix was a small dead zone in the crossing detector—if the samples on either side of zero are both below a threshold, don’t count it. This matches how real analog detectors worked: they had hysteresis to reject noise.

Tape Transport: State Beyond Sound

A cassette peripheral isn’t just an encoder. It’s a mechanical system with position and state.

export interface TapeImage {
  label: string;
  tapeLength: number;      // Total length in seconds
  currentPosition: number; // Current position in seconds
  baudRate: BaudRate;
  programs: TapeProgram[];
}

export interface TapeProgram {
  name: string;
  position: number;   // Where on the tape
  duration: number;   // How long it takes to read
  data: string;       // Base64-encoded content
  baudRate: BaudRate;
  checksum?: string;
}

Programs live at specific tape offsets. If you want to load the third program, you have to fast-forward past the first two. That’s the mechanical reality that made “LOAD” feel different from a floppy’s instant seek.

The transport simulation runs two clocks. The position clock updates every 100ms with actual elapsed time:

private startTransport(speed: number): void {
  this.transportTimer = window.setInterval(() => {
    this.currentTape.currentPosition += speed * 0.1;

    if (this.currentTape.currentPosition < 0) {
      this.currentTape.currentPosition = 0;
      this.stop();
    } else if (this.currentTape.currentPosition > this.currentTape.tapeLength) {
      this.currentTape.currentPosition = this.currentTape.tapeLength;
      this.stop();
    }

    this.litElement.updateCounter(Math.floor(this.currentTape.currentPosition));
  }, 100);
}

Play and record run at speed 1.0. Rewind runs at -10.0. Fast-forward at +10.0. When you hit the tape’s beginning or end, the transport stops—just like a real deck.

The animation clock runs separately via requestAnimationFrame. The reels spin at a constant visual rate regardless of actual position changes. I tried coupling them early on, and the result felt wrong—the animation would stutter when the position timer fired. Decoupling made the peripheral feel smooth while keeping the position accurate.

Leader and Trailer: The Sync Protocol

Before data comes the leader: two seconds of continuous 2400 Hz tone.

encodeWithLeader(data: Uint8Array, leaderDuration: number = 2.0): Float32Array {
  const leaderSamples = this.generateLeaderTone(leaderDuration);
  const dataSamples = this.encode(data);
  const trailerSamples = this.generateTrailerTone(0.5);

  const result = new Float32Array(
    leaderSamples.length + dataSamples.length + trailerSamples.length
  );

  result.set(leaderSamples, 0);
  result.set(dataSamples, leaderSamples.length);
  result.set(trailerSamples, leaderSamples.length + dataSamples.length);

  return result;
}

That leader isn’t decoration. It’s a sync window for the decoder to lock onto the signal before data arrives. When you press PLAY on an old Datasette, the computer waits for that steady tone before it starts parsing bits. Without it, the decoder might misinterpret the first few bytes because it hasn’t settled on the correct frequency estimate.

The trailer is half a second of silence—time for the system to recognize the transmission has ended and stop the tape motor.

Why Keep It Slow

The cassette peripheral could have been a “save to JSON” button. I kept the analog model because it surfaces the original constraint: reliability versus speed under a weak physical layer.

At 300 baud, saving 8 KB takes six minutes. That’s not a bug—it’s the system working as designed. The warble isn’t nostalgia; it’s proof that bytes are actually moving, one clock tick at a time.

What surprised me was how much the sound matters. Motor hum when the transport engages. The click of the relay when you press PLAY. The rising pitch of fast-forward. These aren’t features I planned; they emerged because I was chasing authenticity and found that silent peripherals feel fake, even when they work correctly.

The cassette recreates a relationship with the machine that modern storage has lost. Whether that’s valuable or merely sentimental, I haven’t decided. But I can tell you this: the first time I saved a program and heard that warble play back correctly, I smiled. That’s enough justification for me.