2026 / 02 | 5 min read

Deep Dive: Audio Sonification for Code Visualization

Turning code evolution into sound—Web Audio API oscillators, harmonic frequency mapping, and the pursuit of a THX-like evolving chord.

code-evolution-analyzer web-audio visualization javascript


Each programming language becomes a voice in an evolving chord


The Code Evolution Analyzer already had a compelling visual: animated charts showing language composition changing over time. But I wanted to add another dimension. What if you could hear the code evolving?

The idea: each programming language gets an oscillator. The more code in that language, the louder its voice. As the visualization plays through commits, you hear the codebase’s composition as a shifting chord. JavaScript growing? Its voice rises. Python shrinking? It fades.

This is called sonification—representing data through sound. It’s completely unnecessary for a code analyzer. I did it anyway.

The Basic Setup

Web Audio API provides the building blocks: oscillators, gain nodes, and filters. The initial implementation was straightforward:

// Initial audio setup
const FUNDAMENTAL_FREQ = 110;  // A2 - bass foundation
const MAX_VOICES = 16;         // One per language (matches color palette)

let audioCtx = null;
let voices = [];

function initAudio() {
  audioCtx = new (window.AudioContext || window.webkitAudioContext)();
  
  // Master gain for overall volume control
  const masterGain = audioCtx.createGain();
  masterGain.connect(audioCtx.destination);
  
  // Shared lowpass filter for warmth
  const filter = audioCtx.createBiquadFilter();
  filter.type = 'lowpass';
  filter.frequency.value = 2500;  // Cap the brightness
  filter.connect(masterGain);
  
  // Create oscillator pool
  for (let i = 0; i < MAX_VOICES; i++) {
    const osc = audioCtx.createOscillator();
    const gain = audioCtx.createGain();
    
    osc.type = 'sawtooth';
    osc.frequency.value = FUNDAMENTAL_FREQ * (i + 1);  // Harmonic series
    gain.gain.value = 0;  // Start silent
    
    osc.connect(gain);
    gain.connect(filter);
    osc.start();
    
    voices.push({ osc, gain });
  }
}

Sawtooth waves have rich harmonics, giving the sound presence. The lowpass filter at 2500Hz prevents harshness while keeping the character. Each language maps to a harmonic of the fundamental frequency—classic organ registration stuff.

The Mud Problem

This worked, but it sounded… bad. Really bad.

The issue: in most codebases, one or two languages dominate. Express.js repositories are 70-90% JavaScript. When JavaScript’s oscillator is that much louder than everything else, all you hear is a big bass drone with faint harmonics.

Worse, the harmonic series (1×, 2×, 3× the fundamental, and so on) puts all the major languages close together in pitch. JavaScript at 110Hz, Python at 220Hz, TypeScript at 330Hz—they all blur into one muddy low-frequency blob.

(I spent an embarrassing amount of time wondering why my speakers sounded broken. They weren’t.)

Multi-Octave Frequency Mapping

The fix was to spread languages across multiple octaves with carefully chosen intervals:

// Frequency mapping: Wide spacing for harmonic clarity
// Primary languages get very different registers
const VOICE_FREQUENCIES = [
  130.81,  // C3  - 1st language: Deep bass foundation
  329.63,  // E4  - 2nd: Bright, major third harmony
  392.00,  // G4  - 3rd: High, perfect fifth above C4 (completes the triad)
  261.63,  // C4  - 4th: Octave above primary
  196.00,  // G3  - 5th: Perfect fifth above C3
  440.00,  // A4  - 6th: Concert pitch, very bright
  293.66,  // D4  - 7th: Middle voice
  246.94,  // B3  - 8th: Leading tone feel
  174.61,  // F3  - 9th: Subdominant
  220.00,  // A3  - 10th: Low-mid range
  349.23,  // F4  - 11th: Upper-mid
  493.88,  // B4  - 12th: Highest voice
  155.56,  // Eb3 - 13th: Minor color
  311.13,  // Eb4 - 14th: Minor color octave
  277.18,  // C#4 - 15th: Sharp edge
  415.30   // G#4 - 16th: Sharp edge high
];

The top three languages (by code volume) now span an octave and a fifth: C3 at 131Hz, E4 at 330Hz, G4 at 392Hz. They’re musically pleasant (thirds and fifths) but clearly distinct. When JavaScript grows, you hear bass rising. When TypeScript grows, you hear brightness emerging.

The intervals aren’t arbitrary—the main voices are drawn from an extended C major chord, with a few chromatic colors (Eb, C#, G#) reserved for the lowest-ranked voices. Even when multiple languages are loud simultaneously, they form something approximating a chord rather than noise.
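
The assignment itself is simple: rank languages by how much code they contribute and hand out frequencies in that order, biggest first. Here is a minimal sketch of that mapping, assuming each language’s total line count is available; the helper name assignVoices is mine, not the analyzer’s:

// Rank languages by code volume and pair each with a voice frequency.
// Index 0 (the largest language) gets the C3 bass foundation.
function assignVoices(languageTotals) {
  return Object.entries(languageTotals)
    .sort(([, a], [, b]) => b - a)              // largest share first
    .slice(0, VOICE_FREQUENCIES.length)         // at most 16 voices
    .map(([lang], i) => ({ lang, freq: VOICE_FREQUENCIES[i] }));
}

// e.g. { JavaScript: 120000, TypeScript: 40000, Python: 9000 }
// → JavaScript sits on C3 (bass), TypeScript on E4, Python on G4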

The Thrumming Problem

Even with better frequency spacing, something was wrong. The sound had a rhythmic “thrumming”—a beating pattern that wasn’t in the data.

The culprit: phase interference. All oscillators started at the same moment, so their waveforms were phase-aligned. When two phase-locked oscillators sit at closely related frequencies, their peaks and troughs periodically reinforce and cancel, and you hear that interference as a slow pulsing.

// Before: All oscillators start at 0 phase
osc.frequency.value = frequency;
osc.detune.value = 0;

// After: Decorrelate phases with slight random detuning
osc.frequency.value = frequency;
// Add ±3 cents of random detuning (imperceptible pitch change)
const baseDetune = (Math.random() - 0.5) * 6;
osc.detune.value = baseDetune;

A few cents of random detuning is too small to hear as pitch variation, but it’s enough to decorrelate the phase relationships. The thrumming disappeared.
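
Since the growth-based detune added later gets layered on top of this offset, the per-voice value is worth keeping around. A sketch of how it might slot into the oscillator pool from the setup code; storing it as voice.baseDetune is my convention, not necessarily what the analyzer does:

for (const voice of voices) {
  // ±3 cents: inaudible as a pitch change, enough to keep phases from locking
  const baseDetune = (Math.random() - 0.5) * 6;
  voice.osc.detune.value = baseDetune;
  voice.baseDetune = baseDetune;  // later detune ramps add on top of this
}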

Perceptual Gain Scaling

Human hearing isn’t linear. A sound at 10% gain doesn’t sound half as loud as one at 20%—it sounds much quieter. To make smaller languages actually audible, I needed perceptual scaling:

// Linear gain: 10% codebase = 10% volume (nearly inaudible)
// With power curve (x^0.4): 10% codebase ≈ 40% volume (clearly audible)
const audioSettings = {
  gainCurvePower: 0.4  // Lower = more boost for quiet voices
};

function updateAudio() {
  // Raw gain is proportion of codebase (0-1)
  const rawGain = languageLines / totalLines;
  
  // Power curve boosts quiet languages
  // x^0.4 means 0.1 → 0.40, 0.2 → 0.53, 0.5 → 0.76
  const perceivedGain = Math.pow(rawGain, audioSettings.gainCurvePower);
  
  voice.gain.gain.linearRampToValueAtTime(perceivedGain, rampEnd);
}

This makes minor languages contribute meaningfully to the sound. You can hear when that 5% of Markdown documentation appears or disappears.
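
A quick sanity check of the curve, evaluating the same formula for a few proportions (values rounded to two decimals):

const gainCurvePower = 0.4;
for (const share of [0.05, 0.1, 0.3, 0.7]) {
  console.log(share, '→', Math.pow(share, gainCurvePower).toFixed(2));
}
// 0.05 → 0.30, 0.1 → 0.40, 0.3 → 0.62, 0.7 → 0.87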

Dynamic Pitch Variation

The sound was still too static. A language at 30% stays at constant volume—boring. I added dynamic pitch variation based on growth trend:

// Detune based on growth rate between commits
const prevValue = prevCommit?.languages[lang]?.code || 0;
const value = commit.languages[lang]?.code || 0;

let detune = 0;
if (prevValue > 0) {
  // Growth rate: -1 to +∞, but we cap at ±50 cents
  const growthRate = (value - prevValue) / prevValue;
  detune = Math.max(-50, Math.min(50, growthRate * 50));
} else if (value > 0) {
  // New language appeared - slight pitch up
  detune = 25;
}

voice.osc.detune.linearRampToValueAtTime(baseDetune + detune, rampEnd);

Growing languages pitch up slightly. Shrinking languages pitch down. New languages arrive with a subtle rising tone. It adds life to the sound without being obvious.

Stereo Field

Sixteen voices in mono get crowded. Adding stereo spread separates them:

// Create stereo panner for each voice
const panner = audioCtx.createStereoPanner();

// Spread across stereo field: first voice far left, last far right
const basePan = (i / (MAX_VOICES - 1)) * 2 - 1;  // -1 to +1
panner.pan.value = basePan * audioSettings.stereoWidth;  // stereoWidth defaults to 0.7

osc.connect(gain);
gain.connect(panner);
panner.connect(filter);

At 70% stereo width (the default), dominant languages spread across the soundstage. JavaScript on the left, Python right, TypeScript center-right. Your ears can now distinguish them even when they’re similarly loud.

The Pulsing Bug

Late in development, users reported “pulsing”—the sound would briefly dip every frame even when nothing changed.

The bug was in my update logic:

// BUGGY: Silence all voices, then set active ones
function updateAudio() {
  // First, silence everyone
  for (const voice of voices) {
    voice.gain.gain.linearRampToValueAtTime(0, rampEnd);
  }
  
  // Then set active voices
  for (const [lang, data] of activeLanguages) {
    const voice = voices[langIndex];
    voice.gain.gain.linearRampToValueAtTime(data.gain, rampEnd);
  }
}

See the problem? Every voice receives two ramp commands per frame: first to zero, then to its target. When frames update faster than the 50ms ramp time, voices briefly dip toward zero before recovering. You hear this as pulsing.

The fix: only send ramp commands to voices that are actually changing:

// FIXED: Send exactly one ramp command per voice per frame
function updateAudio() {
  const activeVoices = new Set();
  const activeGain = [];  // target gain per voice index
  
  // Collect active voice indices and their target gains
  for (const [lang, data] of activeLanguages) {
    activeVoices.add(langIndex);
    activeGain[langIndex] = data.gain;
  }
  
  // Update all voices: active get target, inactive get silence
  for (let i = 0; i < voices.length; i++) {
    const voice = voices[i];
    
    if (activeVoices.has(i)) {
      // Active: set gain directly (no conflicting silence command)
      voice.gain.gain.linearRampToValueAtTime(activeGain[i], rampEnd);
    } else {
      // Inactive: fade to silence
      voice.gain.gain.linearRampToValueAtTime(0, rampEnd);
    }
  }
}

Active voices receive one command. Inactive voices receive one command. No conflicts, no pulsing.

Pre-Computing Audio Data

All this processing—sorting languages, computing gains, calculating detunes—was happening in the browser at 30fps. That’s wasteful. The data doesn’t change; why recalculate it every frame?

I moved the audio calculations to build time:

// At build time: compute all audio state
function computeAudioData(results, allLanguages) {
  const audioData = [];
  
  for (const commit of results) {
    const frameData = {};
    
    // For each metric (lines, files, bytes)
    for (const metric of ['code', 'files', 'bytes']) {
      const total = /* sum of metric across languages */;
      const activeVoices = [];
      
      for (let v = 0; v < MAX_VOICES; v++) {
        const lang = allLanguages[v];
        const value = commit.languages[lang]?.[metric] || 0;
        const gain = total > 0 ? value / total : 0;
        
        if (gain === 0) continue;  // Sparse: skip silent voices
        
        const detune = /* calculated from growth rate */;
        activeVoices.push([v, 
          Math.round(gain * 100) / 100,      // 2 decimal precision
          Math.round(detune * 10) / 10       // 1 decimal precision
        ]);
      }
      
      frameData[metric] = [masterIntensity, ...activeVoices];
    }
    
    audioData.push(frameData);
  }
  
  return audioData;
}

The sparse format (only storing non-zero voices) reduces data size by 22%. The browser just reads pre-computed arrays at playback—no sorting, no division, no conditionals per frame.
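
For reference, here is roughly all the playback side has to do with one of those frames. This is a sketch under the frame layout above, not the analyzer’s actual player: applyFrame is my name, rampEnd and the voices pool come from the earlier snippets, voice.baseDetune is the per-voice offset from the detuning sketch, and I’m assuming masterIntensity simply scales overall loudness:

// Apply one pre-computed frame: each sparse entry is [voiceIndex, gain, detune]
function applyFrame(frameData, metric, rampEnd) {
  const [masterIntensity, ...activeEntries] = frameData[metric];
  const active = new Set();

  for (const [v, gain, detune] of activeEntries) {
    active.add(v);
    const voice = voices[v];
    voice.gain.gain.linearRampToValueAtTime(gain * masterIntensity, rampEnd);
    voice.osc.detune.linearRampToValueAtTime(voice.baseDetune + detune, rampEnd);
  }

  // One command per voice: anything absent from the sparse frame fades to silence
  for (let i = 0; i < voices.length; i++) {
    if (!active.has(i)) voices[i].gain.gain.linearRampToValueAtTime(0, rampEnd);
  }
}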

Was It Worth It?

From a practical standpoint? Absolutely not. Nobody needs audio to understand code history.

From a learning standpoint? Completely worth it. I now understand:

  • Web Audio API timing and parameter automation
  • Perceptual audio scaling
  • Phase interference and detuning
  • Sparse data formats for audio streams
  • Why musicians care about frequency ratios

And the result is legitimately cool. Watching a codebase grow while hearing it transform from a single bass note into a rich chord—that’s satisfying in a way charts alone can’t match.

Turn on the sound at analyze.devd.ca. You’ll see.


See also: Building a Code Evolution Analyzer in a Weekend — the full project story