Deep Dive: Bell 212A and V.32bis Handshakes

The first time I got the V.32bis handshake working, it was too quiet to hear. The timing was correct, the frequencies matched the spec, but the speaker simulation attenuated everything to the point where the screech was a whisper. I spent an afternoon adjusting volume curves before I realized the real problem: I had no mental model for why this particular sequence of tones existed. I was generating sounds, not simulating negotiation.

That distinction matters. A modem handshake is not a sound effect. It is two clocks teaching each other how to share a phone line. Each phase exists because the previous phase proved something about the connection. Once I understood that causal chain, debugging became straightforward: if the handshake sounded wrong, I could identify which phase was misbehaving by what it was supposed to prove.

The Jump from FSK to PSK

Bell 103 at 300 baud uses frequency-shift keying: the data is encoded in which frequency is playing, one bit per symbol. That works, but you burn bandwidth separating the originate and answer bands. Bell 212A moves to phase-shift keying: each carrier stays at a fixed frequency, and the data lives in the phase change from one symbol to the next, two bits per symbol at 600 baud. Same wire, four times the throughput.

The causal chain for Bell 212A looks like this:

1. Answer modem emits a 2400 Hz carrier to announce itself.
2. Originate modem responds with a 1200 Hz carrier, offset in time.
3. Both sides run a training sequence of regular phase shifts so the receiver can lock its timing reference.
4. Once phase lock is confirmed, data mode begins.

In the emulator’s HandshakeGenerator, that training sequence is explicit:

// Bell 212A training: 10 cycles, 80ms each, ±12 Hz phase modulation
for (let i = 0; i < cfg.trainingCycles; i++) {
  const oFreq = cfg.originateHz + (i % 2 === 0 ? cfg.phaseModulationHz : -cfg.phaseModulationHz);
  const aFreq = cfg.answerHz + (i % 2 === 0 ? -cfg.phaseModulationHz : cfg.phaseModulationHz);
  // ... create overlapping oscillators for both carriers
  time += cfg.trainingCycleMs / 1000;
}

The numbers (1200 Hz originate, 2400 Hz answer, 10 training cycles at 80 ms each, ±12 Hz modulation) come from the actual config. They produce a handshake that sounds like a real Bell 212A lock-in rather than a random warble. The ±12 Hz phase modulation is small enough to be inaudible as a pitch shift, but large enough that the emulator’s timing visibly steps through the training phases.
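To see what that loop schedules, here is a minimal sketch that computes the per-cycle frequencies without touching any audio API. The `bell212a` object and the `trainingSchedule` helper are hypothetical names for illustration; only the field names and values are taken from the loop and the numbers quoted above.

```javascript
// Hypothetical config object; the values are the ones quoted in the text.
const bell212a = {
  originateHz: 1200,
  answerHz: 2400,
  trainingCycles: 10,
  trainingCycleMs: 80,
  phaseModulationHz: 12,
};

// Returns one entry per training cycle: start time plus both carrier frequencies.
// The two carriers are nudged in opposite directions, alternating every cycle.
function trainingSchedule(cfg, startSec = 0) {
  const schedule = [];
  for (let i = 0; i < cfg.trainingCycles; i++) {
    const sign = i % 2 === 0 ? 1 : -1;
    schedule.push({
      startSec: startSec + (i * cfg.trainingCycleMs) / 1000,
      originateHz: cfg.originateHz + sign * cfg.phaseModulationHz,
      answerHz: cfg.answerHz - sign * cfg.phaseModulationHz,
    });
  }
  return schedule;
}

const schedule = trainingSchedule(bell212a);
console.log(schedule[0]); // first cycle: 1212 Hz originate, 2388 Hz answer
console.log(schedule[1]); // second cycle: 1188 Hz originate, 2412 Hz answer
```

With these values the training block lasts 10 × 80 ms = 0.8 seconds, and the two carriers always move in opposite directions.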

V.32bis: Training as a Diagnostic Stack

V.32bis pushed 14,400 bits per second over the same phone lines that Bell 103 used for 300. That forty-eight-fold improvement required solving problems that Bell 212A could ignore: echo cancellation, line equalization, and rate negotiation. Each problem gets its own training phase, and each phase produces a distinct sound.
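The speed jump is easier to follow when symbol rate (baud) and data bits per symbol are kept separate. The sketch below uses the standard figures for each modem at its top rate; the `modems` table and the `bitsPerSecond` helper are illustrative names, not part of the emulator.

```javascript
// Symbol rate and data bits per symbol for each modem at its top speed.
const modems = [
  { name: "Bell 103", baud: 300, dataBitsPerSymbol: 1 },  // FSK
  { name: "Bell 212A", baud: 600, dataBitsPerSymbol: 2 }, // phase-shift keying, one dibit per symbol
  { name: "V.32bis", baud: 2400, dataBitsPerSymbol: 6 },  // trellis-coded QAM
];

// Throughput is the symbol rate times the data bits carried by each symbol.
function bitsPerSecond(modem) {
  return modem.baud * modem.dataBitsPerSymbol;
}

const rates = modems.map((m) => ({ name: m.name, bps: bitsPerSecond(m) }));
for (const r of rates) {
  console.log(`${r.name}: ${r.bps} bit/s`);
}
```

Only an eightfold increase in symbol rate separates Bell 103 from V.32bis; the rest of the gain comes from packing more bits into each symbol, which is what the extra training phases make possible.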

The answer tone comes first: a 2100 Hz carrier with periodic phase reversals, defined in ITU-T V.25. Modems should recognize this as “I am a modem, not a fax machine or a human,” and the reversals also tell echo cancellers in the phone network to switch off so the modems can handle echo themselves. The emulator runs three cycles at 450 ms each:

// V.25 answer tone; each phase reversal is approximated by a small frequency shift on odd cycles
for (let i = 0; i < cfg.answerToneCycles; i++) {
  const freq = cfg.answerToneHz + (i % 2 === 0 ? 0 : cfg.answerTonePhaseShiftHz);
  const tone = this.createOscillator(freq, cfg.baseVolumeDb);
  tone.start(time).stop(time + cfg.answerToneCycleMs / 1000);
  time += cfg.answerToneCycleMs / 1000;
}

Next comes the echo canceller training. Real V.32bis modems need to measure how much of their own signal bounces back from the line, then subtract that echo from incoming data. The emulator does not implement actual echo cancellation—there is no physical phone line to echo—but it does generate the training sweep so the handshake sounds correct. Twelve steps from 1200 Hz upward, 100 Hz per step, 100 ms per step:

// Echo canceller training sweep
for (let i = 0; i < cfg.echoTrainingSweepSteps; i++) {
  const freq = cfg.echoTrainingSweepStartHz + i * cfg.echoTrainingSweepStepHz;
  const sweep = this.createOscillator(freq, cfg.baseVolumeDb - 1);
  sweep.start(time).stop(time + cfg.echoTrainingSweepStepMs / 1000);
  time += cfg.echoTrainingSweepStepMs / 1000;
}
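As a quick check on what the sweep covers, this sketch lists the frequencies using the values quoted above. `sweepFrequencies` is a hypothetical helper that mirrors the loop's arithmetic without creating any oscillators.

```javascript
// Mirrors the sweep loop: step i plays startHz + i * stepHz.
function sweepFrequencies(startHz, steps, stepHz) {
  return Array.from({ length: steps }, (_, i) => startHz + i * stepHz);
}

const sweep = sweepFrequencies(1200, 12, 100);
const sweepTotalMs = sweep.length * 100; // 100 ms per step

console.log(sweep[0], sweep[sweep.length - 1]); // 1200 2300
console.log(sweepTotalMs); // 1200
```

The sweep tops out at 2300 Hz, not 2400 Hz, because the first step sits at the start frequency itself.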

The equalizer training follows: ten dual-tone bursts at 150 ms each, starting at 1600 Hz and 2000 Hz, stepping in opposite directions. This is where a real modem measures frequency response distortion across the line. Finally, the QAM training block: six cycles of three overlapping tones that establish the phase/amplitude constellation for data encoding.
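The equalizer loop itself is not shown here, so the following is only a sketch of how the dual-tone schedule could be derived, assuming each burst steps both tones linearly from their start values. The field names and values come from the emulator's V.32bis profile; the `equalizerSchedule` function is hypothetical.

```javascript
// Equalizer fields as they appear in the V.32bis profile.
const eq = {
  equalizerBursts: 10,
  equalizerBurstMs: 150,
  equalizerFreq1StartHz: 1600,
  equalizerFreq1StepHz: 40,
  equalizerFreq2StartHz: 2000,
  equalizerFreq2StepHz: -30,
};

// Assumption: burst i plays both tones together, each stepped i times
// from its start frequency. The real loop may differ.
function equalizerSchedule(cfg) {
  return Array.from({ length: cfg.equalizerBursts }, (_, i) => ({
    startMs: i * cfg.equalizerBurstMs,
    freq1Hz: cfg.equalizerFreq1StartHz + i * cfg.equalizerFreq1StepHz,
    freq2Hz: cfg.equalizerFreq2StartHz + i * cfg.equalizerFreq2StepHz,
  }));
}

const bursts = equalizerSchedule(eq);
console.log(bursts[0]); // first burst: 1600 Hz and 2000 Hz
console.log(bursts[bursts.length - 1]); // last burst: 1960 Hz and 1730 Hz
```

Under that assumption the two tones cross between the sixth and seventh bursts, so the pair starts 400 Hz apart, converges, and ends with the first tone above the second.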

The full V.32bis profile in the emulator’s config:

V32bis: {
  baudRate: 14400,
  answerToneHz: 2100,
  answerToneCycles: 3,
  answerToneCycleMs: 450,
  answerTonePhaseShiftHz: 3,
  echoTrainingSweepStartHz: 1200,
  echoTrainingSweepSteps: 12,
  echoTrainingSweepStepHz: 100,
  echoTrainingSweepStepMs: 100,
  equalizerBursts: 10,
  equalizerBurstMs: 150,
  equalizerFreq1StartHz: 1600,
  equalizerFreq1StepHz: 40,
  equalizerFreq2StartHz: 2000,
  equalizerFreq2StepHz: -30,
  qamTrainingCycles: 6,
  qamTrainingCycleMs: 130,
  qamFreq1Hz: 1750,
  qamFreq2Hz: 1850,
  qamFreq3Hz: 1950,
  qamFreqStepHz: 50,
  baseVolumeDb: -18,
}

Summed phase by phase, those numbers produce a handshake that runs about 4.8 seconds, long enough to feel like negotiation, short enough not to stall the emulator. The -18 dB base volume was the fix for my original “too quiet” problem; the speaker simulation path has its own gain structure, and the generators need to output at a consistent level so everything mixes properly.
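A quick way to sanity-check the overall length is to sum the phase durations straight from the profile. This sketch assumes the four phases run back to back with no overlap and no gaps; `phaseDurationsMs` is a hypothetical helper, not part of the emulator.

```javascript
// Only the timing fields from the V.32bis profile are needed here.
const v32bisTiming = {
  answerToneCycles: 3,
  answerToneCycleMs: 450,
  echoTrainingSweepSteps: 12,
  echoTrainingSweepStepMs: 100,
  equalizerBursts: 10,
  equalizerBurstMs: 150,
  qamTrainingCycles: 6,
  qamTrainingCycleMs: 130,
};

// Each phase lasts (number of cycles) x (milliseconds per cycle).
function phaseDurationsMs(cfg) {
  return {
    answerTone: cfg.answerToneCycles * cfg.answerToneCycleMs,
    echoTraining: cfg.echoTrainingSweepSteps * cfg.echoTrainingSweepStepMs,
    equalizer: cfg.equalizerBursts * cfg.equalizerBurstMs,
    qamTraining: cfg.qamTrainingCycles * cfg.qamTrainingCycleMs,
  };
}

const durations = phaseDurationsMs(v32bisTiming);
const totalMs = Object.values(durations).reduce((sum, ms) => sum + ms, 0);

console.log(durations);
console.log(totalMs); // 4830
```

Under those assumptions the phases contribute 1350, 1200, 1500, and 780 ms, for 4830 ms in total.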

The Tempo Constraint

The hardest part of getting handshakes right is not the frequencies. It is the tempo.

If the answer tone is too short, the handshake sounds like a glitch instead of a greeting. If the training phases run long, the emulator feels stalled. The numbers in the config are not arbitrary; they were tuned by ear against recordings of real modems, then adjusted for emulator “feel.”

The constraint is not accuracy to the ITU-T spec. The constraint is believable causality. A handshake needs to sound like two devices agreeing to something. Each phase should feel like it accomplishes something before the next phase starts. When I added the 200 ms gap between echo training and equalizer training, the handshake suddenly felt more real—even though that gap does nothing computationally. The ear expects a pause between topics.
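One way to make such a pause explicit is to give each phase an optional trailing gap and lay the phases out in order. This is a sketch under assumptions: the `phases` table and `buildTimeline` are hypothetical, the durations follow from the V.32bis profile shown earlier, and the only gap is the 200 ms one after echo training.

```javascript
// Hypothetical phase table: durationMs is cycles x ms per cycle from the
// V.32bis profile; gapAfterMs is silence inserted after the phase ends.
const phases = [
  { name: "answerTone", durationMs: 3 * 450, gapAfterMs: 0 },
  { name: "echoTraining", durationMs: 12 * 100, gapAfterMs: 200 },
  { name: "equalizer", durationMs: 10 * 150, gapAfterMs: 0 },
  { name: "qamTraining", durationMs: 6 * 130, gapAfterMs: 0 },
];

// Lays the phases end to end, leaving each phase's gap before the next one.
function buildTimeline(phaseList) {
  let cursorMs = 0;
  return phaseList.map((p) => {
    const entry = { name: p.name, startMs: cursorMs, endMs: cursorMs + p.durationMs };
    cursorMs = entry.endMs + p.gapAfterMs;
    return entry;
  });
}

const timeline = buildTimeline(phases);
console.log(timeline);
```

Keeping the gap as data rather than a hard-coded `time += 0.2` means the pause can be tuned by ear alongside the other durations.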

What Handshakes Teach About System Design

Building these handshake sequences taught me three things I keep returning to:

Faster modems are not just “more data.” They are more training phases, tighter timing contracts, and more assumptions about line quality. Bell 212A assumes the line is stable enough for phase-lock. V.32bis assumes it can measure echo characteristics in real time. Each jump in speed is a jump in complexity, and that complexity shows up in the handshake duration.

The handshake only needs to match the system’s model. The emulator does not implement real echo cancellation, so the echo training sweep is purely cosmetic. But it still matters—because the audio is part of the emulator’s contract with the user. The screech says “this is a modem,” and skipping phases would break that contract.

Explicit timing tables beat clever DSP. I could have generated handshakes procedurally from a small set of rules, but that would have made debugging harder. The flat config with explicit durations means I can look at the numbers and predict exactly what the audio will sound like. When something goes wrong, I can identify the failing phase by timestamp.
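Finding the failing phase by timestamp can be as small as a lookup over a flat table. The sketch below is illustrative only: `timingTable` and `phaseAt` are hypothetical, and the boundaries assume the V.32bis phases run back to back with a single 200 ms gap after echo training.

```javascript
// Hypothetical flat timing table, in milliseconds from the start of the handshake.
const timingTable = [
  { name: "answerTone", startMs: 0, endMs: 1350 },
  { name: "echoTraining", startMs: 1350, endMs: 2550 },
  { name: "equalizer", startMs: 2750, endMs: 4250 },
  { name: "qamTraining", startMs: 4250, endMs: 5030 },
];

// Returns the name of the phase active at tMs, or null when tMs falls
// in a gap or outside the handshake entirely.
function phaseAt(table, tMs) {
  const hit = table.find((p) => tMs >= p.startMs && tMs < p.endMs);
  return hit ? hit.name : null;
}

console.log(phaseAt(timingTable, 3000)); // equalizer
console.log(phaseAt(timingTable, 2600)); // null (inside the gap)
```

If a glitch shows up three seconds into a recording, the table points at the equalizer bursts without any need to reason about generator state.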

The modem handshake is the sound of system coordination. Two devices that have never spoken before, agreeing on timing, encoding, and error correction in a few seconds of screech. When you hear that noise, you are hearing negotiation compressed into audio.