Sound Detection and Beat Recognition
How do we detect beats when we can only take snapshots of a continuous sound wave?
The Analog-to-Digital Problem
Sound is a continuous wave - pressure oscillations in air. But computers work with discrete numbers.
The Bridge: ADC
The Analog-to-Digital Converter (ADC) is the bridge between worlds. It measures voltage at specific moments and converts it to a number.
How Sampling Works
Taking Snapshots
The ADC doesn't see the whole wave. It takes periodic "snapshots", one reading per sample interval.
What happens between samples? We don't know! That information is lost.
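The snapshot idea can be sketched in a few lines. This is a minimal illustration, not the robot's actual driver: `sample_wave` and `wave` are hypothetical names, and a 5 Hz sine wave stands in for the microphone signal.

```python
import math

def sample_wave(signal, sample_rate_hz, duration_s):
    """Return (time, value) pairs taken every 1/sample_rate_hz seconds."""
    count = int(round(sample_rate_hz * duration_s))
    return [(n / sample_rate_hz, signal(n / sample_rate_hz)) for n in range(count)]

# Hypothetical test signal: a 5 Hz sine wave standing in for the sound.
def wave(t):
    return math.sin(2 * math.pi * 5 * t)

# 20 samples/second for 0.2 s gives 4 snapshots, 50 ms apart.
snapshots = sample_wave(wave, sample_rate_hz=20, duration_s=0.2)
for t, value in snapshots:
    print(f"t={t:.2f}s  value={value:+.2f}")
```

Everything the wave does between two consecutive `t` values never reaches the program, which is exactly the information loss described above.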
Sample Rate
How often we sample matters:
| Sample Rate | What We Can Catch |
|---|---|
| 10 Hz | Very slow changes only |
| 100 Hz | Hand claps, drum hits |
| 1000 Hz | Most music beats |
| 44100 Hz | Full audio (CD quality) |
For beat detection, 50-100 samples/second is usually enough - beats are slow compared to audio.
Nyquist Theorem (Simplified)
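To capture a frequency, you need to sample at more than 2× that frequency. Music beats repeat at about 1-3 Hz (60-180 BPM), so in principle even 10 samples/second covers the beat rate. In practice each hit is a brief spike that a slow sampler can miss between snapshots, so we sample faster for better accuracy.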
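The arithmetic behind this can be checked with a short sketch. The function names are hypothetical, and `aliased_frequency` uses the standard folding formula for a pure tone, which is a simplification of what real music does.

```python
def bpm_to_hz(bpm):
    """Beats per minute to beats per second."""
    return bpm / 60.0

def nyquist_min_rate(freq_hz):
    """Sample rate that the 2x rule says must be exceeded."""
    return 2.0 * freq_hz

def aliased_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling (folding formula)."""
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))

for bpm in (60, 120, 180):
    beat_hz = bpm_to_hz(bpm)
    print(f"{bpm} BPM = {beat_hz:.1f} Hz, sample faster than {nyquist_min_rate(beat_hz):.1f} Hz")

# A 7 Hz tone sampled at 10 Hz is indistinguishable from a 3 Hz tone.
print(aliased_frequency(7, 10))
```

The last line shows what goes wrong below the limit: the samples are still valid numbers, but they describe a different, slower wave.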
From Raw Samples to Amplitude
The Silence Problem
A microphone outputs voltage centered at a baseline (typically 1.65V = half of 3.3V):
Voltage
↑
3.3V ┤ ┌─ Loud sound
│ ╱╲ ╱╲ │
1.65V┤────────╱──╲────╱──╲│─── ← Baseline (silence)
│ ╱ ╲ ╱ ╲
0V ┤──────╱──────╲╱──────└─ Loud sound
└──────────────────────→ Time
Silence = constant 1.65V = ADC reads ~32768 (the midpoint of the 16-bit range 0-65535)
Sound = voltage swings above AND below baseline
Calculating Amplitude
Amplitude = distance from baseline, regardless of direction:
| Raw reading | Position | Amplitude | Swing |
|---|---|---|---|
| 28000 | low | 4768 | swing |
| 32768 | mid | 0 | none |
| 38000 | high | 5232 | swing |
| 25000 | low | 7768 | bigger swing |
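The same calculation in code, as a minimal sketch. `BASELINE = 32768` assumes a 16-bit ADC reading whose silent midpoint sits at half scale; a real microphone's baseline should be measured rather than assumed.

```python
BASELINE = 32768  # assumed midpoint of a 16-bit reading (silence)

def amplitude(raw):
    """Distance from the baseline, regardless of direction."""
    return abs(raw - BASELINE)

readings = [28000, 32768, 38000, 25000]
amplitudes = [amplitude(raw) for raw in readings]
print(amplitudes)  # [4768, 0, 5232, 7768]
```

Taking the absolute value is what makes a swing below the baseline count just as much as a swing above it.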
Smoothing
Single samples are noisy. Average a few for stability:
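BASELINE = 32768           # ADC reading at silence (midpoint of the 16-bit range)
history = [0, 0, 0, 0, 0]  # Last 5 amplitudes

def smoothed_amplitude(raw):
    amp = abs(raw - BASELINE)
    # Shift history, add new value
    history.pop(0)
    history.append(amp)
    # Return average (integer division keeps the result an int)
    return sum(history) // len(history)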
Peak Detection: Finding the Beat
The Challenge
A clap or drum hit creates a brief spike in amplitude. We need to detect it!
Simple Threshold
The simplest detector reports a beat whenever the level rises above a fixed threshold.
Problem: A single spike triggers multiple "beats" because the level stays high across several samples!
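A minimal sketch of that naive detector makes the problem visible. The threshold value, the 0-100 level scale, and the `clap_levels` data are all made up for illustration.

```python
THRESHOLD = 40  # illustrative value on an assumed 0-100 level scale

def is_beat_simple(level):
    """Naive detector: any sample above the threshold counts as a beat."""
    return level > THRESHOLD

# Hypothetical levels from ONE clap, sampled every 10 ms.
clap_levels = [5, 8, 70, 85, 60, 45, 20, 6]
triggers = [is_beat_simple(level) for level in clap_levels]
print(triggers.count(True))  # 4 "beats" reported for a single clap
```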
Adding Cooldown
Ignore beats that happen too close together:
THRESHOLD = 40
COOLDOWN_MS = 100  # Minimum time between beats
last_beat_time = 0

def is_beat(level, current_time):
    # current_time is a timestamp in milliseconds
    global last_beat_time
    if level > THRESHOLD:
        time_since_last = current_time - last_beat_time
        if time_since_last > COOLDOWN_MS:
            last_beat_time = current_time
            return True
    return False
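One possible variant, sketched under assumptions: it keeps the state in a closure instead of a global, and it uses `None` to mean "no beat yet" so that a beat arriving in the first 100 ms is not swallowed by the cooldown. `make_beat_detector` is a hypothetical helper name, and the clap data is invented.

```python
def make_beat_detector(threshold=40, cooldown_ms=100):
    """Build an is_beat(level, current_time) function with its own state."""
    last_beat_time = None  # None means no beat has been seen yet

    def is_beat(level, current_time):
        nonlocal last_beat_time
        if level <= threshold:
            return False
        # Inside the cooldown window: ignore the sample.
        if last_beat_time is not None and current_time - last_beat_time <= cooldown_ms:
            return False
        last_beat_time = current_time
        return True

    return is_beat

is_beat = make_beat_detector()
clap_levels = [5, 8, 70, 85, 60, 45, 20, 6]  # one clap, one sample every 10 ms
beats = [t for t, level in zip(range(0, 80, 10), clap_levels) if is_beat(level, t)]
print(beats)  # [20]
```

The four above-threshold samples now produce a single beat at t = 20 ms, and each detector created this way can be tuned independently.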
Adaptive Threshold
The Problem with Fixed Threshold
Different music, different volumes:
- Quiet jazz → level barely reaches 20
- Loud EDM → level constantly above 60

Fixed threshold fails for both!
Solution: Adapt to Environment
Track the average level and trigger when significantly above it:
class AdaptiveDetector:
    def __init__(self, sensitivity=1.5):
        self.avg_level = 20              # Running average
        self.sensitivity = sensitivity   # 1.5 = 50% above average

    def update(self, level):
        # Slow-moving average (adapts over time)
        self.avg_level = 0.95 * self.avg_level + 0.05 * level

    def is_beat(self, level):
        self.update(level)
        threshold = self.avg_level * self.sensitivity
        return level > threshold
Quiet environment:
Average: 15 → Threshold: 22.5 → Beat at level 25 ✓
Loud environment:
Average: 50 → Threshold: 75 → Beat at level 80 ✓
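To see the adaptation happen over time, here is a small simulation sketch. `run_adaptive` is a hypothetical function that applies the same running-average rule to a whole list of levels; the quiet and loud sequences are invented test data.

```python
def run_adaptive(levels, sensitivity=1.5, alpha=0.05, start_avg=20.0):
    """Return the indices where level exceeds sensitivity * running average."""
    avg = start_avg
    beats = []
    for i, level in enumerate(levels):
        avg = (1 - alpha) * avg + alpha * level  # slow-moving average
        if level > avg * sensitivity:
            beats.append(i)
    return beats

quiet = [15] * 200 + [25]  # steady quiet room, then one hit
loud = [50] * 200 + [80]   # steady loud room, then one hit

quiet_beats = run_adaptive(quiet)
loud_beats = run_adaptive(loud)
print(quiet_beats)     # [200]
print(loud_beats[-1])  # 200
print(len(loud_beats) - 1, "extra triggers while the average warms up")
```

Both final hits are detected. The loud run also triggers on its first few samples, because the average starts at 20 and needs a moment to catch up with a level of 50. Seeding the average from the first readings, or ignoring beats during a short warm-up, are two possible ways to handle that.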
Complete Signal Processing Pipeline
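The stages covered so far (amplitude, smoothing, adaptive threshold, cooldown) can be chained into one object. This is a sketch under stated assumptions, not a reference implementation: `BeatPipeline`, `min_level` (a noise floor so that silence never triggers), and all default values are hypothetical, and the code targets standard Python.

```python
from collections import deque

class BeatPipeline:
    def __init__(self, baseline=32768, window=5, sensitivity=1.5,
                 alpha=0.05, cooldown_ms=100, min_level=500):
        self.baseline = baseline
        self.history = deque([0] * window, maxlen=window)
        self.sensitivity = sensitivity
        self.alpha = alpha
        self.cooldown_ms = cooldown_ms
        self.min_level = min_level  # hypothetical noise floor
        self.avg_level = 0.0
        self.last_beat_ms = None

    def process(self, raw, now_ms):
        """Feed one raw ADC reading; return True if it starts a beat."""
        # 1. Amplitude: distance from the silent baseline.
        amp = abs(raw - self.baseline)
        # 2. Smoothing: simple moving average over the last few amplitudes.
        self.history.append(amp)
        level = sum(self.history) / len(self.history)
        # 3. Adaptive threshold, computed before this sample updates the average.
        threshold = max(self.avg_level * self.sensitivity, self.min_level)
        self.avg_level = (1 - self.alpha) * self.avg_level + self.alpha * level
        if level <= threshold:
            return False
        # 4. Cooldown: ignore triggers too close to the previous beat.
        if self.last_beat_ms is not None and now_ms - self.last_beat_ms <= self.cooldown_ms:
            return False
        self.last_beat_ms = now_ms
        return True

# Invented data: near-silence, one 4-sample clap, near-silence (10 ms per sample).
pipeline = BeatPipeline()
readings = [32868] * 50 + [45000, 20000, 44000, 22000] + [32868] * 50
beat_times = [i * 10 for i, raw in enumerate(readings) if pipeline.process(raw, i * 10)]
print(beat_times)  # [500]
```

The clap spans several samples and keeps the smoothed level high for a few more, yet the cooldown and the rising average reduce it to a single beat at 500 ms.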
Timing Considerations
How Fast Is Fast Enough?
| Stage | Typical Time | Notes |
|---|---|---|
| ADC read | 5-10 µs | Very fast |
| Amplitude calc | 1-2 µs | Simple math |
| Smoothing | 2-5 µs | Array operations |
| Beat detection | 1-2 µs | Comparisons |
| Total | ~15 µs | Can run 60,000+/sec! |
Even running at 50 Hz (every 20ms), processing takes <0.1% of available time.
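The budget can be recomputed from the table. The sketch below simply copies the stage ranges listed above; the names are hypothetical and the figures are the table's estimates, not measurements.

```python
# (low, high) estimates in microseconds, copied from the table above.
STAGE_TIMES_US = {
    "ADC read": (5, 10),
    "Amplitude calc": (1, 2),
    "Smoothing": (2, 5),
    "Beat detection": (1, 2),
}

best_case_us = sum(low for low, _ in STAGE_TIMES_US.values())
worst_case_us = sum(high for _, high in STAGE_TIMES_US.values())

def cpu_load_percent(processing_us, sample_rate_hz):
    """Share of each sample period spent processing, in percent."""
    period_us = 1_000_000 / sample_rate_hz
    return 100.0 * processing_us / period_us

print(best_case_us, worst_case_us)              # 9 19
print(cpu_load_percent(15, 50))                 # load at the ~15 µs estimate
print(cpu_load_percent(worst_case_us, 50))      # load in the worst case
```

Even the worst-case sum stays below 0.1% of a 20 ms sample period.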
Latency
From sound to detection:
| Source | Latency | Notes |
|---|---|---|
| Sound through air | 3ms per meter | Speed of sound |
| Microphone | <1ms | Negligible |
| ADC sampling | 5ms average, 10ms worst case | At 100 Hz sampling (one sample every 10ms) |
| Processing | <1ms | Fast calculations |
| LED response | <1ms | NeoPixel protocol |
| Total | ~15-20ms | Imperceptible to humans |
Human perception threshold for audio-visual sync is ~50ms. We're well under!
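A rough way to recompute this budget for other distances and sample rates is sketched below. `latency_budget_ms` is a hypothetical helper; it assumes sound travels at about 343 m/s and lumps the three "<1ms" rows into an `other_ms` allowance of 3 ms.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # dry air at about 20 °C

def latency_budget_ms(distance_m, sample_rate_hz, other_ms=3.0):
    """Break the sound-to-LED delay into its main parts (milliseconds)."""
    period_ms = 1000.0 / sample_rate_hz
    return {
        "acoustic": 1000.0 * distance_m / SPEED_OF_SOUND_M_PER_S,
        "sampling_avg": period_ms / 2,  # sound arrives at a random point in the period
        "sampling_worst": period_ms,    # sound arrives just after a sample was taken
        "other": other_ms,              # mic + processing + LED, each under 1 ms
    }

budget = latency_budget_ms(distance_m=2.0, sample_rate_hz=100)
worst_total = budget["acoustic"] + budget["sampling_worst"] + budget["other"]
print(f"worst case at 2 m: {worst_total:.1f} ms")
```

The sketch ignores the delay added by smoothing, which grows with the averaging window, so treat the result as a lower bound.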
Common Problems and Solutions
Problem: Missing Fast Beats
- Symptom: Some beats not detected
- Cause: Cooldown too long, or sample rate too low
- Solution: Decrease cooldown, increase sample rate
Problem: Double Triggers
- Symptom: One beat triggers multiple times
- Cause: Cooldown too short
- Solution: Increase cooldown to 100-150ms
Problem: False Triggers
- Symptom: Triggers on non-beat sounds
- Cause: Threshold too low
- Solution: Increase threshold, or use adaptive detection
Problem: No Triggers at All
- Symptom: Never detects beats
- Cause: Threshold too high, or mic not working
- Solution: Check raw values first, then lower threshold
Further Reading
- Robot Unboxing — First steps with the robot and buzzer
- ADC Basics — How analog-to-digital conversion works
- PWM for Sound — Generating tones with the buzzer
Physics Connection: Frequency vs Amplitude
- Amplitude = How loud (height of wave) — what we detect for beats
- Frequency = Pitch (waves per second) — would need FFT to detect
Beat detection uses amplitude only, which is much simpler than frequency analysis!
Math Connection: Moving Average
The smoothing we use is a Simple Moving Average (SMA):
$$\bar{x}_n = \frac{1}{k} \sum_{i=0}^{k-1} x_{n-i}$$
More advanced: Exponential Moving Average (EMA) gives more weight to recent samples:
$$\bar{x}_n = \alpha \cdot x_n + (1-\alpha) \cdot \bar{x}_{n-1}$$
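Both formulas are short enough to compare side by side. In this sketch the function names and the step-shaped test input are invented; `sma` averages over fewer than k samples at the very start (the formula assumes a full window), and `ema` is seeded with the first value.

```python
def sma(values, k):
    """Simple moving average over the last k samples (shorter at the start)."""
    out = []
    for n in range(len(values)):
        window = values[max(0, n - k + 1): n + 1]
        out.append(sum(window) / len(window))
    return out

def ema(values, alpha):
    """Exponential moving average, seeded with the first value."""
    out = []
    avg = values[0]
    for x in values:
        avg = alpha * x + (1 - alpha) * avg
        out.append(avg)
    return out

step = [0, 0, 0, 10, 10, 10, 10, 10]  # level jumps from 0 to 10
print(sma(step, 3))
print(ema(step, 0.5))
```

After the jump, the SMA reaches 10 as soon as its window holds only new samples, while the EMA keeps approaching 10 without needing any stored history. That makes the EMA attractive on small microcontrollers: it needs one number of state instead of a list.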