Audio Pipeline Latency — Measure, Understand, Optimize
Time: 60 min | Prerequisites: I2S Audio Visualizer
This tutorial teaches how real-time audio pipelines work on Linux by measuring, calculating, and optimizing end-to-end latency in the Audio Visualizer Full demo. You'll learn where latency comes from, why it matters, and how to trade off latency vs. stability — a core embedded systems skill applicable far beyond audio.
1. The Pipeline
Sound travels through several stages between the microphone and the speaker. Each stage adds delay:
┌─────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐
│ I2S Mic │──▶│ ALSA │──▶│ Ring │──▶│ DSP │──▶│ ALSA │──▶│ Speaker │
│ │ │ Capture │ │ Buffer │ │ (filter, │ │ Playback │ │ │
│ │ │ Period │ │ │ │ EQ, FX) │ │ Buffer │ │ │
└─────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └─────────┘
│ │ │ │ │ │
└── analog ────┘── digital ───┘── software ──┘── digital ───┘── analog ────┘
| Stage | What happens | Latency source |
|---|---|---|
| I2S mic | ADC converts sound to digital | ~1 sample (negligible) |
| ALSA capture period | Kernel collects N samples before waking the app | N / sample_rate |
| Ring buffer | App picks up data on next render frame | 0–1 period (depends on timing) |
| DSP processing | Filters, FFT, EQ, effects | < 1 ms on Pi 4 |
| ALSA playback buffer | Kernel buffers M samples before sending to DAC | M / sample_rate |
| DAC + analog | Digital-to-analog conversion | ~1 sample (negligible) |
The dominant factors are the ALSA period sizes — the rest is negligible on modern hardware.
2. Calculate Theoretical Latency
The minimum latency is determined by the capture period plus the playback buffer:
Latency_min = T_capture + T_playback
T_capture = period_frames / sample_rate
T_playback = playback_buffer / sample_rate
≈ 2 × period_frames / sample_rate (ALSA uses 2 periods by default)
Latency_min ≈ 3 × period_frames / sample_rate
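For the default 1024 frames at 48 kHz:
T_capture = 1024 / 48000 ≈ 21.3 ms
T_playback ≈ 2 × 1024 / 48000 ≈ 42.7 ms
Latency_min ≈ 3 × 1024 / 48000 = 64 ms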
Note
64 ms is perceptible. Clap your hands near the microphone — you'll hear the original clap, then the playback echo ~64 ms later. Musicians notice anything above 10 ms. For telephony, 150 ms is the threshold where conversation feels awkward.
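The formula above is easy to script, which helps when comparing period sizes. The following is a minimal sketch, not code from the demo: the function name `pipeline_latency_ms` and its parameters are made up for illustration, and it assumes a playback buffer of 2 periods as stated above.

```python
def pipeline_latency_ms(period_frames: int, sample_rate: int = 48_000,
                        playback_periods: int = 2) -> dict:
    """Return capture, playback, and total buffering latency in milliseconds.

    Hypothetical helper: assumes one full capture period plus a playback
    buffer of `playback_periods` periods, as in the formula above.
    """
    period_ms = 1000.0 * period_frames / sample_rate
    capture_ms = period_ms                      # wait for one full capture period
    playback_ms = playback_periods * period_ms  # depth of the playback buffer
    return {"capture": capture_ms,
            "playback": playback_ms,
            "total": capture_ms + playback_ms}


if __name__ == "__main__":
    for frames in (2048, 1024, 512, 256, 128, 64):
        lat = pipeline_latency_ms(frames)
        print(f"{frames:5d} frames: capture {lat['capture']:5.1f} ms, "
              f"playback {lat['playback']:5.1f} ms, total {lat['total']:6.1f} ms")
```

This only models buffering delay; scheduling jitter and the analog path are not included.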
3. Measure It
3.1 Run with default settings
# Build if needed
cd ~/embedded-linux/apps/i2s-audio-viz && make audio_viz_full
# Run in stereo with playback enabled
./audio_viz_full -d hw:1,0 -c 2 -f
Press PLAY to enable audio output. The display shows:
- Vis: capture-to-display latency (visual pipeline)
- FX: capture-to-speaker latency (audio pipeline)
Open the EQ overlay — the bottom status bar shows the latency breakdown:
3.2 Run with low-latency flag
The -l flag reduces the period to 256 samples:
Tip
Hear the difference: with playback ON, clap near the mic.
- Default (64 ms): you hear a distinct echo after your clap
- Low-latency (16 ms): the echo is barely noticeable
- Ultra-low (-n 64, ~4 ms): echo is gone, but you may hear clicks/pops
3.3 Sweep period sizes
Try each and note the latency and audio quality:
| Period (-n) | Capture | Playback buf | Total | Quality |
|---|---|---|---|---|
| 2048 | 42.7 ms | ~85 ms | ~128 ms | Perfect, obvious echo |
| 1024 (default) | 21.3 ms | ~43 ms | ~64 ms | Perfect, noticeable echo |
| 512 | 10.7 ms | ~21 ms | ~32 ms | Good, slight echo |
| 256 (-l) | 5.3 ms | ~11 ms | ~16 ms | Good, barely perceptible |
| 128 | 2.7 ms | ~5 ms | ~8 ms | May glitch on loaded system |
| 64 | 1.3 ms | ~3 ms | ~4 ms | Likely glitches (underruns) |
Warning
Below ~128 samples, the CPU has less than 2.7 ms to process each block. If the scheduler delays the audio thread (context switch, CPU spike from the display), the playback buffer runs empty → audible pop/click (underrun). This is the fundamental latency–reliability tradeoff.
4. Where the Time Goes
4.1 ALSA Period (Capture)
ALSA captures audio in periods — the kernel fills a buffer of N samples, then wakes the application. The app can't process audio until the period is complete:
Time ──────────────────────────────────────────▶
Mic: |sample|sample|sample| ... |sample| ← N samples
│◄──────── period ────────────►│
│
└─ kernel wakes app HERE
(N/48000 seconds after first sample)
Smaller period = lower latency but higher CPU overhead (more wakeups/second) and more risk of underruns.
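To put numbers on the "more wakeups/second" cost, here is a small sketch. The helper name `period_cost` is hypothetical; it simply inverts the period formula, so it says nothing about how expensive each wakeup actually is on your system.

```python
def period_cost(period_frames: int, sample_rate: int = 48_000) -> tuple[float, float]:
    """Return (wakeups per second, milliseconds available per period)."""
    wakeups_per_s = sample_rate / period_frames        # one wakeup per completed period
    deadline_ms = 1000.0 * period_frames / sample_rate  # time until the next period is due
    return wakeups_per_s, deadline_ms


if __name__ == "__main__":
    for frames in (1024, 256, 64):
        wakeups, deadline = period_cost(frames)
        print(f"{frames:4d} frames: {wakeups:6.1f} wakeups/s, "
              f"{deadline:5.2f} ms to finish each block")
```

Going from 1024 to 64 frames multiplies the wakeup rate by 16 while shrinking the time available per block by the same factor.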
4.2 ALSA Buffer (Playback)
The playback side has a buffer of typically 2 periods. The app writes a period, ALSA starts playing it. While that plays, the app writes the next period:
If the app doesn't write the next period before the period currently playing (Period A) finishes → underrun (silence gap, audible click).
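The underrun condition can be explored with a deliberately simplified model: with a buffer of B periods, each write has (B - 1) periods of slack, and any write delayed longer than that counts as an underrun. This is a sketch under that assumption, not a model of ALSA's real behavior (it ignores recovery after an underrun), and the delay values are invented for illustration.

```python
def count_underruns(delays_ms, period_frames, sample_rate=48_000, buffer_periods=2):
    """Count writes that arrive too late under a simplified slack model.

    Assumption: each write may be late by at most (buffer_periods - 1)
    periods before the playback buffer runs empty.
    """
    period_ms = 1000.0 * period_frames / sample_rate
    slack_ms = (buffer_periods - 1) * period_ms
    return sum(1 for delay in delays_ms if delay > slack_ms)


# Hypothetical scheduling delays in milliseconds (made-up values, not measurements)
delays = [0.2, 0.5, 3.0, 0.3, 6.0]

if __name__ == "__main__":
    for frames in (1024, 256, 128, 64):
        print(f"{frames:4d} frames: {count_underruns(delays, frames)} underrun(s)")
```

The same set of delays is harmless at 1024 frames but causes glitches at small periods, which is the tradeoff in a nutshell.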
4.3 Software Processing
All DSP (HP filter, EQ, FFT, effects) runs between capture and playback. At 1024 samples on a Pi 4:
| Operation | Typical time |
|---|---|
| High-pass filter | 0.01 ms |
| 8-band biquad EQ | 0.05 ms |
| FFT (1024-point) | 0.05 ms |
| GCC-PHAT | 0.2 ms |
| Voice FX | 0.02 ms |
| SDL2 rendering | 2–5 ms |
| Total | ~3 ms |
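The DSP stages add up to about 0.33 ms, under 2% of the 21 ms capture period, so processing is not the bottleneck. Even with SDL2 rendering included (~3 ms total, roughly 14% of the period), the latency is almost entirely ALSA buffering.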
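A quick budget check makes the proportions concrete. The sketch below just sums the typical times from the table and compares them with one 1024-frame period at 48 kHz; the variable names are illustrative, and the timings are the table's typical values rather than new measurements.

```python
# Typical per-block times from the table above, in milliseconds
STAGE_MS = {
    "High-pass filter": 0.01,
    "8-band biquad EQ": 0.05,
    "FFT (1024-point)": 0.05,
    "GCC-PHAT": 0.2,
    "Voice FX": 0.02,
}
RENDER_RANGE_MS = (2.0, 5.0)  # SDL2 rendering, low and high end of the table's range

period_ms = 1000.0 * 1024 / 48_000  # one capture period
dsp_ms = sum(STAGE_MS.values())     # DSP stages only, rendering excluded

if __name__ == "__main__":
    print(f"Period:    {period_ms:.1f} ms")
    print(f"DSP only:  {dsp_ms:.2f} ms ({100 * dsp_ms / period_ms:.1f}% of the period)")
    for render_ms in RENDER_RANGE_MS:
        total = dsp_ms + render_ms
        print(f"With {render_ms:.0f} ms rendering: {total:.2f} ms "
              f"({100 * total / period_ms:.1f}% of the period)")
```

Rerunning this with a smaller period shows how quickly the fixed rendering cost starts to eat into the budget.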
5. The Latency–Reliability Tradeoff
This is one of the most important concepts in real-time systems:
Low latency ◄─────────────────────────────────► High reliability
Small buffers Large buffers
Tight deadlines Lots of slack
Glitches possible Always smooth
Professional audio: Consumer audio: VoIP: Streaming:
~2-5 ms ~20-40 ms ~40-150 ms ~1-5 sec
Why can't we just use tiny buffers?
Linux is not a hard real-time OS. The scheduler might:
- Run a different process for a few milliseconds
- Handle an interrupt (network, USB, display)
- Trigger garbage collection or memory management
If the audio thread is delayed by just 3 ms and the period is only 2 ms → underrun.
How professionals solve it
| Technique | How | Used in |
|---|---|---|
| SCHED_FIFO | Real-time scheduler priority | JACK, PipeWire |
| CPU isolation | Pin audio thread to dedicated core | Pro audio |
| PREEMPT_RT | Real-time kernel patches | Low-latency distros |
| memlock | Prevent audio buffers from being swapped | All pro audio |
| Kernel tuning | Disable CPU frequency scaling, disable USB polling | Dedicated audio |
Tip
Try it: Compare latency with and without real-time scheduling:
# Normal scheduling (default)
./audio_viz_full -d hw:1,0 -c 2 -f -n 128
# Real-time scheduling (needs root or rtprio permission)
sudo chrt -f 50 ./audio_viz_full -d hw:1,0 -c 2 -f -n 128
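With chrt -f 50, the process (including its audio thread) runs under the SCHED_FIFO policy at priority 50, so it preempts all normally scheduled tasks, though not higher-priority real-time tasks or interrupt handling. You should be able to use smaller periods without glitches.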
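A program can also request real-time scheduling for itself instead of relying on chrt. The following Python sketch uses the standard library's `os.sched_setscheduler` (Linux only) to illustrate the idea; the function name `try_enable_fifo` is hypothetical, and nothing here reflects how audio_viz_full itself is implemented.

```python
import os


def try_enable_fifo(priority: int = 50) -> bool:
    """Try to switch the calling process to SCHED_FIFO.

    Returns True on success, False if the platform lacks the call or the
    process has no permission (no root and no rtprio limit configured).
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # scheduler API not available on this platform
    try:
        # pid 0 means "the caller"
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except OSError:
        return False  # typically EPERM without sufficient privileges


if __name__ == "__main__":
    if try_enable_fifo():
        print("Running with SCHED_FIFO priority 50")
    else:
        print("Real-time scheduling unavailable; staying on the default policy")
```

Falling back gracefully matters: the program should still run, just with a higher risk of underruns at small periods.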
6. Measuring with a Physical Setup
For a precise end-to-end measurement that includes the analog path:
6.1 Loopback Test
- Place the speaker close to the microphone (or use a cable)
- Generate a click/impulse (tap the desk)
- The visualizer shows the original impulse AND the playback impulse
- The time difference between them = total pipeline latency
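If you record the microphone signal during the loopback test, the time difference can also be extracted in software. This is a minimal sketch assuming a list of normalized samples and a simple threshold detector; the function names, threshold, and hold-off are illustrative choices, and the signal below is synthetic rather than a real recording.

```python
def impulse_onsets(samples, threshold, holdoff):
    """Return indices where |x| reaches the threshold, skipping `holdoff`
    samples after each onset so one click is not counted twice."""
    onsets = []
    next_allowed = 0
    for i, x in enumerate(samples):
        if i >= next_allowed and abs(x) >= threshold:
            onsets.append(i)
            next_allowed = i + holdoff
    return onsets


def loopback_latency_ms(samples, sample_rate=48_000, threshold=0.5, holdoff=480):
    """Time between the first two detected impulses, or None if fewer than two."""
    onsets = impulse_onsets(samples, threshold, holdoff)
    if len(onsets) < 2:
        return None
    return 1000.0 * (onsets[1] - onsets[0]) / sample_rate


# Synthetic example: original click at sample 1000, echo 3072 samples later
signal = [0.0] * 8000
signal[1000] = 1.0
signal[1000 + 3072] = 0.8

if __name__ == "__main__":
    print(f"Measured latency: {loopback_latency_ms(signal)} ms")
```

On real recordings the threshold needs tuning against background noise, and room reflections can produce extra onsets, so treat the result as an estimate.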
6.2 Oscilloscope Method
For the most accurate measurement:
1. Feed a square wave into the microphone (or tap to create an impulse)
2. Probe the I2S data line on the capture side (BCLK is a free-running clock and does not show the input event) and the audio output (playback) with an oscilloscope
3. Measure the time between the input edge and the output response
This reveals the true end-to-end latency including kernel scheduling jitter.
7. Connection to Real-Time Systems
This pipeline latency exercise connects directly to the PREEMPT_RT Latency and Jitter Measurement tutorials:
| Audio pipeline concept | General RT concept |
|---|---|
| ALSA period = deadline | Task period |
| Underrun = missed deadline | Deadline miss |
| Buffer size = slack | Worst-case execution time margin |
| SCHED_FIFO for audio thread | Real-time scheduling |
| CPU isolation for audio core | Core affinity |
| Measuring pipeline latency | Measuring control loop latency |
The same principles apply to any real-time embedded system:
- Motor control: sensor → compute → actuator pipeline
- Robotics: camera → vision → motor command pipeline
- Industrial: PLC scan cycle timing
- Automotive: CAN bus message → ECU processing → actuator response
Exercises
Tip
Exercise 1: Latency Budget Table
Fill in this table by running audio_viz_full with different -n values and PLAY enabled. Note the displayed latency and whether you hear glitches:
| Period | Calculated latency | Displayed latency | Glitches? |
|---|---|---|---|
| 2048 | ___ ms | ___ ms | Y/N |
| 1024 | ___ ms | ___ ms | Y/N |
| 512 | ___ ms | ___ ms | Y/N |
| 256 | ___ ms | ___ ms | Y/N |
| 128 | ___ ms | ___ ms | Y/N |
| 64 | ___ ms | ___ ms | Y/N |
Tip
Exercise 2: Real-Time Scheduling
Run with -n 128 both with and without sudo chrt -f 50. Does real-time scheduling eliminate the glitches? Why or why not?
Tip
Exercise 3: CPU Load Test
While audio_viz_full is running with -n 256 -l, open another terminal and run stress --cpu 4. Does the audio quality degrade? What about with chrt -f 50?
Tip
Exercise 4: Compare Visual vs Audio Latency
With PLAY enabled, clap near the mic and watch the waveform display. The visual response is nearly instant (just the capture period). The audio playback lags behind. Why is the visual latency lower than the audio latency? (Hint: the display doesn't need an output buffer.)
Tip
Exercise 5: Calculate for a Control System
A motor control loop reads an encoder, computes PID, and outputs a PWM signal. The loop runs at 10 kHz (100 µs period). If the computation takes 30 µs:
- What is the theoretical minimum latency from encoder read to PWM update?
- How much slack does the system have?
- What happens if an interrupt takes 80 µs?
Compare this to the audio pipeline: what's the equivalent of "period size" and "buffer size" in the motor controller?
See also: Audio Viz Challenges | Signal Processing Reference | PREEMPT_RT Latency | SDL2 UI Patterns