Sound Detection and Beat Recognition

How do we detect beats when we can only take snapshots of a continuous sound wave?

The Analog-to-Digital Problem

Sound is a continuous wave - pressure oscillations in air. But computers work with discrete numbers.

The Bridge: ADC

The Analog-to-Digital Converter (ADC) is the bridge between worlds. It measures voltage at specific moments and converts it to a number.
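To make "converts it to a number" concrete, here is a minimal sketch. It assumes a 3.3V reference and a 16-bit result (0-65535, the scale implied by the 32768 silence value used later in this section). `voltage_to_adc`, `VREF` and `MAX_READING` are hypothetical names, and a real ADC has its own resolution and reference voltage.

```python
VREF = 3.3           # assumed reference voltage (volts)
MAX_READING = 65535  # assumed 16-bit full scale

def voltage_to_adc(voltage):
    """Map a voltage to a 16-bit reading, clamped to the ADC's range."""
    # The ADC can't report anything outside 0..VREF
    voltage = min(max(voltage, 0.0), VREF)
    return round(voltage / VREF * MAX_READING)

print(voltage_to_adc(0.0))   # 0
print(voltage_to_adc(1.65))  # 32768 (half scale, rounded)
print(voltage_to_adc(3.3))   # 65535
```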

How Sampling Works

Taking Snapshots

The ADC doesn't see the whole wave. It takes periodic "snapshots", one reading at each sampling instant.

What happens between samples? We don't know! That information is lost.
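A short pulse that falls between two snapshots simply never shows up in the data. This sketch uses a hypothetical 5 ms click (`click` and `sample` are illustrative names, not part of any library) and samples it at two rates:

```python
def click(t):
    """Hypothetical signal: a 5 ms pulse from t = 12.5 ms to 17.5 ms."""
    return 1.0 if 0.0125 <= t < 0.0175 else 0.0

def sample(signal, rate_hz, duration_s):
    """Take evenly spaced snapshots of signal(t); nothing in between is recorded."""
    n = int(duration_s * rate_hz)
    return [signal(i / rate_hz) for i in range(n)]

print(max(sample(click, 100, 0.05)))   # 0.0: samples at 10 and 20 ms straddle the pulse
print(max(sample(click, 1000, 0.05)))  # 1.0: samples at 13-17 ms land inside it
```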

Sample Rate

How often we sample matters:

Sample Rate	What We Can Catch
10 Hz	Very slow changes only
100 Hz	Hand claps, drum hits
1000 Hz	Most music beats
44100 Hz	Full audio (CD quality)

For beat detection, 50-100 samples/second is usually enough - beats are slow compared to audio.

Nyquist Theorem (Simplified)

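To capture a frequency, you need to sample at more than 2× that frequency. Music beats are typically 1-3 Hz (60-180 BPM), so in principle even 10 samples/second could follow the beat rate. We use more for better accuracy: a beat can only be located to within one sample interval (100ms at 10 Hz, 10ms at 100 Hz). Note that this applies to changes in loudness, not to the sound wave itself, whose frequencies (roughly 20 Hz to 20,000 Hz) are far too high to capture at these rates. That's why we track loudness only.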
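The arithmetic behind these numbers, as a small sketch (the function names are illustrative). It also prints the timing resolution, because a beat can only be placed to within one sample period.

```python
def bpm_to_hz(bpm):
    """Beats per minute -> beats per second."""
    return bpm / 60

def nyquist_min_rate(freq_hz):
    """Sampling must be faster than twice the highest frequency of interest."""
    return 2 * freq_hz

def timing_resolution_ms(rate_hz):
    """A beat can only be placed to within one sample period."""
    return 1000 / rate_hz

for bpm in (60, 120, 180):
    hz = bpm_to_hz(bpm)
    print(f"{bpm} BPM = {hz:g} Hz -> sample faster than {nyquist_min_rate(hz):g} Hz")

for rate in (10, 50, 100):
    print(f"{rate} Hz sampling -> beat timing within {timing_resolution_ms(rate):g} ms")
```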

From Raw Samples to Amplitude

The Silence Problem

A microphone outputs a voltage centered on a baseline (typically 1.65V, half of 3.3V):

      Voltage
        ↑
   3.3V ┤                    ┌─ Loud sound
        │         ╱╲      ╱╲ │
   1.65V┤────────╱──╲────╱──╲│───  ← Baseline (silence)
        │       ╱    ╲  ╱    ╲
     0V ┤──────╱──────╲╱──────└─ Loud sound
        └──────────────────────→ Time

Silence = constant 1.65V = ADC reads ~32768 (the midpoint of a 16-bit 0-65535 reading)
Sound = voltage swings above AND below the baseline

Calculating Amplitude

Amplitude = distance from baseline, regardless of direction:

BASELINE = 32768  # Silence value

def get_amplitude(raw_sample):
    return abs(raw_sample - BASELINE)

Raw:       28000  32768  38000  25000
           (low)  (mid)  (high) (low)
             ↓      ↓      ↓      ↓
Amplitude:  4768     0   5232   7768
           (swing) (none) (swing) (bigger swing)

Smoothing

Single samples are noisy. Average a few for stability:

history = [0, 0, 0, 0, 0]  # Last 5 samples

def smoothed_amplitude(raw):
    amp = abs(raw - BASELINE)

    # Shift history, add new value
    history.pop(0)
    history.append(amp)

    # Return average
    return sum(history) // len(history)
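The beat detectors that follow work on a `level` from 0 to 100 rather than on raw amplitude. That conversion isn't shown here, so below is one plausible mapping: a clamped percentage of a hypothetical `MAX_AMPLITUDE`, set to the full 16-bit swing of 32768 as an assumption.

```python
MAX_AMPLITUDE = 32768  # hypothetical: the swing that should map to level 100

def amplitude_to_level(amp):
    """Scale an amplitude (0..MAX_AMPLITUDE) to a 0-100 level, clamped at 100."""
    level = amp * 100 // MAX_AMPLITUDE
    return min(level, 100)

print(amplitude_to_level(0))     # 0
print(amplitude_to_level(7768))  # 23
```

A real microphone rarely swings anywhere near full scale, so in practice you would probably lower `MAX_AMPLITUDE` to the largest amplitude you actually observe.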

Peak Detection: Finding the Beat

The Challenge

A clap or drum hit creates a brief spike in amplitude. We need to detect it!

Simple Threshold

THRESHOLD = 40  # 0-100 scale

def is_beat(level):
    return level > THRESHOLD

Problem: A single spike triggers multiple "beats" because level stays high across several samples!
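Running the simple threshold over one hypothetical clap shows the problem. The levels below are made up, one every 10 ms, and stay above 40 for three samples:

```python
THRESHOLD = 40  # 0-100 scale

def is_beat(level):
    return level > THRESHOLD

# Hypothetical levels from a single clap, sampled every 10 ms
levels = [10, 12, 55, 70, 48, 25, 11]
beats = [i for i, level in enumerate(levels) if is_beat(level)]
print(beats)  # [2, 3, 4]: one clap, three "beats"
```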

Adding Cooldown

Ignore beats that happen too close together:

THRESHOLD = 40
COOLDOWN_MS = 100  # Minimum time between beats

last_beat_time = 0

def is_beat(level, current_time):
    global last_beat_time

    if level > THRESHOLD:
        time_since_last = current_time - last_beat_time

        if time_since_last > COOLDOWN_MS:
            last_beat_time = current_time
            return True

    return False
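The same cooldown logic can keep its state in an object instead of a global, which makes it easy to run several detectors or reset one. This is a sketch: `CooldownDetector` is a hypothetical name, and it starts with `last_beat_time = None` so that a beat in the first 100 ms isn't suppressed the way it would be with `last_beat_time = 0`.

```python
THRESHOLD = 40
COOLDOWN_MS = 100

class CooldownDetector:
    """Threshold + cooldown beat test, with state kept per instance."""

    def __init__(self, threshold=THRESHOLD, cooldown_ms=COOLDOWN_MS):
        self.threshold = threshold
        self.cooldown_ms = cooldown_ms
        self.last_beat_time = None  # None = no beat yet

    def is_beat(self, level, current_time):
        if level <= self.threshold:
            return False
        # Too soon after the previous beat: treat as the same hit
        if (self.last_beat_time is not None
                and current_time - self.last_beat_time <= self.cooldown_ms):
            return False
        self.last_beat_time = current_time
        return True

# Hypothetical stream: one sample every 10 ms, two claps about 300 ms apart
levels = [10, 55, 70, 48, 12] + [10] * 25 + [60, 65, 20]
detector = CooldownDetector()
beats = [i * 10 for i, level in enumerate(levels) if detector.is_beat(level, i * 10)]
print(beats)  # [10, 300]
```

If your timestamps come from MicroPython's `time.ticks_ms()`, which wraps around, compute the gap with `time.ticks_diff()` rather than plain subtraction.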

Adaptive Threshold

The Problem with Fixed Threshold

Different music, different volumes:
- Quiet jazz → level barely reaches 20
- Loud EDM → level constantly above 60

Fixed threshold fails for both!

Solution: Adapt to Environment

Track the average level and trigger when significantly above it:

class AdaptiveDetector:
    def __init__(self, sensitivity=1.5):
        self.avg_level = 20             # Running average (starting guess)
        self.sensitivity = sensitivity  # 1.5 = 50% above average

    def update(self, level):
        # Slow-moving average (adapts over time)
        self.avg_level = 0.95 * self.avg_level + 0.05 * level

    def is_beat(self, level):
        # Use the average from *before* this sample,
        # so a spike doesn't raise its own threshold
        threshold = self.avg_level * self.sensitivity
        self.update(level)
        return level > threshold

Quiet environment:
    Average: 15  →  Threshold: 22.5  →  Beat at level 25 ✓

Loud environment:
    Average: 50  →  Threshold: 75  →  Beat at level 80 ✓

Complete Signal Processing Pipeline
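Here is one way the stages above might fit together: raw sample → amplitude → smoothing → 0-100 level → adaptive threshold → cooldown. It is a minimal sketch, not the original project's code. It assumes a 16-bit reading, a hypothetical `MAX_AMPLITUDE` for the 0-100 scaling, and one sample every 10 ms (100 Hz). `BeatPipeline` and `clap` are illustrative names, and the input is simulated rather than read from a real ADC.

```python
BASELINE = 32768       # 16-bit reading at silence
MAX_AMPLITUDE = 32768  # hypothetical: swing that maps to level 100
WINDOW = 5             # samples in the moving average
COOLDOWN_MS = 100

class BeatPipeline:
    """Sketch of the full chain, fed one raw ADC reading at a time."""

    def __init__(self, sensitivity=1.5, start_avg=20):
        self.history = [0] * WINDOW
        self.avg_level = start_avg
        self.sensitivity = sensitivity
        self.last_beat_time = None  # None = no beat yet

    def process(self, raw, now_ms):
        # 1. Amplitude: distance from the silence baseline
        amp = abs(raw - BASELINE)
        # 2. Smoothing: moving average of the last WINDOW amplitudes
        self.history.pop(0)
        self.history.append(amp)
        smoothed = sum(self.history) // WINDOW
        # 3. Scale to a 0-100 level
        level = min(smoothed * 100 // MAX_AMPLITUDE, 100)
        # 4. Adaptive threshold from the average *before* this sample
        threshold = self.avg_level * self.sensitivity
        self.avg_level = 0.95 * self.avg_level + 0.05 * level
        if level <= threshold:
            return False
        # 5. Cooldown: ignore triggers too soon after the last beat
        if (self.last_beat_time is not None
                and now_ms - self.last_beat_time <= COOLDOWN_MS):
            return False
        self.last_beat_time = now_ms
        return True

def clap(n):
    """Hypothetical clap: n samples swinging 30000 above and below baseline."""
    return [BASELINE + 30000 if i % 2 == 0 else BASELINE - 30000 for i in range(n)]

silence = [BASELINE] * 20
stream = silence + clap(5) + silence + clap(5) + [BASELINE] * 10

pipeline = BeatPipeline()
beat_times = [i * 10 for i, raw in enumerate(stream) if pipeline.process(raw, i * 10)]
print(beat_times)  # [200, 450]
```

On hardware you would call `process()` once per ADC reading with a millisecond timestamp. Each clap stays loud for several samples, but the cooldown collapses it into a single beat.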

Timing Considerations

How Fast Is Fast Enough?

Stage	Typical Time	Notes
ADC read	5-10 µs	Very fast
Amplitude calc	1-2 µs	Simple math
Smoothing	2-5 µs	Array operations
Beat detection	1-2 µs	Comparisons
Total	~15 µs	Can run 60,000+/sec!

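At 50 Hz (one sample every 20ms), ~15 µs of processing uses under 0.1% of each sample period (15 µs / 20 ms ≈ 0.075%). These timings are rough estimates. Interpreted Python on a microcontroller can be much slower, but even ten times slower would still use under 1% of the budget.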
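The table's figures are rough, so it may be worth measuring on your own setup. This sketch times the per-sample work (amplitude, smoothing, scaling, threshold) with CPython's `time.perf_counter_ns()`. On a microcontroller you would need your platform's own timer, so check its `time` module. The simulated input and the `process` helper are illustrative.

```python
import time

BASELINE = 32768
history = [0] * 5  # last 5 amplitudes

def process(raw):
    """Per-sample work: amplitude, smoothing, 0-100 scaling, threshold test."""
    amp = abs(raw - BASELINE)
    history.pop(0)
    history.append(amp)
    level = (sum(history) // len(history)) * 100 // 32768
    return level > 40

N = 10_000
start = time.perf_counter_ns()
for i in range(N):
    process(BASELINE + (i % 2000))  # simulated raw readings
elapsed_us = (time.perf_counter_ns() - start) / N / 1000

budget_us = 20_000  # one sample period at 50 Hz
print(f"{elapsed_us:.2f} us per sample = {100 * elapsed_us / budget_us:.3f}% of a 20 ms period")
```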

Latency

From sound to detection:

Source	Latency	Notes
Sound through air	3ms per meter	Speed of sound
Microphone	<1ms	Negligible
ADC sampling	Up to 10ms	At 100 Hz sampling (~5ms on average)
Processing	<1ms	Fast calculations
LED response	<1ms	NeoPixel protocol
Total	~15-20ms	Imperceptible to humans

Human perception threshold for audio-visual sync is ~50ms. We're well under!
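The total depends mostly on distance and sample rate. This small calculator estimates the rough worst case. It assumes 343 m/s for the speed of sound (air at about 20 °C), a full sample period of waiting, and a hypothetical 3 ms allowance covering the microphone, processing and LED rows together.

```python
SPEED_OF_SOUND_M_PER_S = 343  # air at about 20 °C

def latency_ms(distance_m, sample_rate_hz, other_ms=3):
    """Rough worst-case sound-to-LED latency in milliseconds.

    other_ms is an assumed allowance for mic, processing and LED update.
    """
    air = distance_m / SPEED_OF_SOUND_M_PER_S * 1000
    sampling = 1000 / sample_rate_hz  # worst case: sound arrives just after a sample
    return air + sampling + other_ms

for rate in (50, 100):
    for d in (1, 3, 5):
        print(f"{d} m at {rate} Hz: {latency_ms(d, rate):.1f} ms")
```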

Common Problems and Solutions

Problem: Missing Fast Beats

Symptom: Some beats not detected
Cause: Cooldown too long, or sample rate too low
Solution: Decrease cooldown, increase sample rate

Problem: Double Triggers

Symptom: One beat triggers multiple times
Cause: Cooldown too short
Solution: Increase cooldown to 100-150ms

Problem: False Triggers

Symptom: Triggers on non-beat sounds
Cause: Threshold too low
Solution: Increase threshold, or use adaptive detection

Problem: No Triggers at All

Symptom: Never detects beats
Cause: Threshold too high, or mic not working
Solution: Check raw values first, then lower the threshold
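"Check raw values first" can be as simple as summarizing a batch of readings. This sketch assumes 16-bit readings. The name `diagnose` and the 200-count "no variation" limit are illustrative guesses, not calibrated values.

```python
def diagnose(raw_samples, full_scale=65535, baseline=32768):
    """Summarize raw ADC readings to tell a dead or clipping mic from a threshold problem."""
    lo, hi = min(raw_samples), max(raw_samples)
    mean = round(sum(raw_samples) / len(raw_samples))
    swing = hi - lo
    if lo == 0 or hi == full_scale:
        verdict = "clipping: readings hit the ADC limits (too loud or gain too high)"
    elif swing < 200:  # illustrative limit
        verdict = "almost no variation: check wiring and power before the threshold"
    else:
        verdict = "mic looks alive: tune the threshold"
    # In silence, mean minus baseline shows whether BASELINE needs adjusting
    return {"min": lo, "max": hi, "swing": swing,
            "baseline_offset": mean - baseline, "verdict": verdict}

print(diagnose([32760, 32770, 32765, 32768]))
print(diagnose([30000, 36000, 28000, 38000]))
```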