
CPU Limits and DMA: When Software Isn't Fast Enough

Advanced Reference | For students who want to understand hardware data transfer

This document explains when the CPU becomes a bottleneck for data transfer and how DMA (Direct Memory Access) solves this problem.


The Problem: CPU as Data Mover

In simple embedded programs, the CPU handles everything:

# CPU reads sensor, stores value, repeats
while True:
    value = sensor.read()      # CPU fetches data
    buffer.append(value)       # CPU stores data
    # ... CPU does this thousands of times

This works fine at low speeds. But what happens when you need to:

- Sample an ADC at 44.1 kHz (audio)?
- Read IMU data at 1000 Hz continuously?
- Stream data to/from an SD card?

The CPU becomes the bottleneck.


Calculating CPU Load for Data Transfer

Example 1: IMU Data Collection

Your robot's BMI160 IMU communicates over I2C. Let's calculate the limits.

I2C Protocol Overhead:

I2C transaction to read 6 bytes (accel X, Y, Z):
┌─────────────────────────────────────────────────────────────┐
│  START │ ADDR+W │ ACK │ REG │ ACK │ RESTART │ ADDR+R │ ACK │
│   1    │   8    │  1  │  8  │  1  │    1    │   8    │  1  │
├─────────────────────────────────────────────────────────────┤
│  DATA0 │ ACK │ DATA1 │ ACK │ ... │ DATA5 │ NACK │ STOP    │
│   8    │  1  │   8   │  1  │ ... │   8   │   1  │   1     │
└─────────────────────────────────────────────────────────────┘

Total bits: ~90 bits per 6-byte read
At 400 kHz I2C: 90 / 400,000 = 225 µs hardware time
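
The bit count can be checked by adding up the fields in the diagram. The sketch below is plain Python arithmetic (nothing here talks to hardware), and the function names are made up for illustration. Counting each field exactly gives 84 bit times, which the estimate above rounds up to ~90.

```python
def i2c_read_bits(n_bytes: int) -> int:
    """Bit times on the bus for one register read of n_bytes (7-bit addressing)."""
    start = 1
    addr_write = 8 + 1          # address + W bit, then ACK
    reg = 8 + 1                 # register number, then ACK
    restart = 1
    addr_read = 8 + 1           # address + R bit, then ACK
    data = n_bytes * (8 + 1)    # each data byte, then ACK (NACK after the last)
    stop = 1
    return start + addr_write + reg + restart + addr_read + data + stop


def i2c_read_time_us(n_bytes: int, clock_hz: int) -> float:
    """Minimum bus time in microseconds, ignoring clock stretching."""
    return i2c_read_bits(n_bytes) * 1_000_000 / clock_hz


print(i2c_read_bits(6))                 # 84
print(i2c_read_time_us(6, 400_000))     # 210.0
```

Either way, the bus alone costs a little over 200 µs per read before any software overhead is added.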

MicroPython Overhead:

# Measured on RP2350 with MicroPython
import time
from machine import I2C, Pin

i2c = I2C(0, scl=Pin(15), sda=Pin(14), freq=400000)

# Measure single read
start = time.ticks_us()
for _ in range(1000):
    data = i2c.readfrom_mem(0x68, 0x12, 6)
elapsed = time.ticks_diff(time.ticks_us(), start)

print(f"Per read: {elapsed/1000:.1f} µs")
# Result: ~450-550 µs per read (interpreter overhead!)

CPU Load Calculation:

| Sample Rate | Time per Second | CPU Load | Feasible? |
|---|---|---|---|
| 100 Hz | 50 ms | 5% | ✅ Easy |
| 200 Hz | 100 ms | 10% | ✅ Good |
| 500 Hz | 250 ms | 25% | ⚠️ Tight |
| 1000 Hz | 500 ms | 50% | ⚠️ Marginal |
| 2000 Hz | 1000 ms | 100% | ❌ Impossible |

Key insight: At 1000 Hz, the CPU spends 50% of its time just moving IMU data, leaving only half of each second for control calculations and everything else.
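
The load figures in the table follow from one multiplication. This is a minimal sketch that assumes 500 µs per read, roughly the middle of the measured 450-550 µs range. The function name is illustrative, not part of any library.

```python
def cpu_load(sample_rate_hz: float, time_per_sample_us: float) -> float:
    """Fraction of each second spent on transfers (1.0 means 100%)."""
    return sample_rate_hz * time_per_sample_us / 1_000_000


TIME_PER_READ_US = 500  # assumed midpoint of the measured range

for rate in (100, 200, 500, 1000, 2000):
    load = cpu_load(rate, TIME_PER_READ_US)
    print(f"{rate:>5} Hz: {load * 1000:5.0f} ms busy per second ({load:.0%})")
```

Substituting your own measured time per read shows how quickly the budget disappears as the sample rate rises.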


Example 2: ADC Sampling

The RP2350's ADC can sample at 500 kHz in hardware. But MicroPython can't keep up:

from machine import ADC
import time

adc = ADC(26)

# Measure ADC read speed
start = time.ticks_us()
for _ in range(10000):
    val = adc.read_u16()
elapsed = time.ticks_diff(time.ticks_us(), start)

print(f"Per read: {elapsed/10000:.1f} µs")
# Result: ~25-50 µs per read

| Application | Required Rate | Python Achievable | Gap |
|---|---|---|---|
| Line sensor | 100 Hz | ✅ 20,000 Hz | OK |
| Battery monitor | 10 Hz | ✅ 20,000 Hz | OK |
| Audio (mono) | 44,100 Hz | ❌ 20,000 Hz | 2× short |
| Audio (stereo) | 88,200 Hz | ❌ 20,000 Hz | 4× short |
| Oscilloscope | 1 MHz | ❌ 20,000 Hz | 50× short |
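
The "Python Achievable" column appears to come from the slower end of the measured range (50 µs per read). The sketch below reproduces that arithmetic. The function names are illustrative only.

```python
def max_rate_hz(time_per_read_us: float) -> float:
    """Highest sample rate a loop can sustain if each read takes this long."""
    return 1_000_000 / time_per_read_us


def shortfall(required_hz: float, achievable_hz: float) -> float:
    """How many times too slow the loop is (values <= 1 mean it keeps up)."""
    return required_hz / achievable_hz


achievable = max_rate_hz(50)   # worst case from the measurement above
for name, required in (("Audio (mono)", 44_100),
                       ("Audio (stereo)", 88_200),
                       ("Oscilloscope", 1_000_000)):
    print(f"{name}: {shortfall(required, achievable):.1f}x short")
```

Even at the best-case 25 µs per read (40,000 Hz), mono audio at 44.1 kHz is still out of reach, and that is before the loop does anything with the samples.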

The Solution: DMA (Direct Memory Access)

DMA is a hardware peripheral that moves data without CPU involvement.

┌─────────────────────────────────────────────────────────────┐
│  WITHOUT DMA                                                 │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   ADC ──► CPU ──► RAM                                        │
│           │                                                  │
│           └── CPU busy for EVERY sample                      │
│                                                              │
├─────────────────────────────────────────────────────────────┤
│  WITH DMA                                                    │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   ADC ──────────► DMA ──────────► RAM                        │
│                    │                                         │
│   CPU: [other work] │ [other work] │ [process batch]        │
│                    │                                         │
│                    └── DMA handles transfer, CPU is FREE     │
│                                                              │
└─────────────────────────────────────────────────────────────┘

How DMA Works

  1. Configure: Tell DMA where to read from (peripheral), where to write (RAM), and how many transfers to make
  2. Start: Trigger the DMA channel
  3. Transfer: DMA moves data automatically, one element at a time (8, 16, or 32 bits per transfer), each time the peripheral signals that data is ready
  4. Interrupt: DMA signals CPU when done (optional)
  5. Process: CPU processes the collected batch

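// Conceptual C code using the Pico SDK (not MicroPython)
#include "hardware/adc.h"
#include "hardware/dma.h"
#define N_SAMPLES 1000
static uint16_t buffer[N_SAMPLES];

void capture_batch(void) {
    // ADC input 0 (GPIO 26): enable the FIFO and request one DMA transfer per sample
    adc_init();
    adc_gpio_init(26);
    adc_select_input(0);
    adc_fifo_setup(true, true, 1, false, false);

    int dma_chan = dma_claim_unused_channel(true);
    dma_channel_config cfg = dma_channel_get_default_config(dma_chan);
    channel_config_set_transfer_data_size(&cfg, DMA_SIZE_16);  // 16-bit samples
    channel_config_set_read_increment(&cfg, false);  // Always read the same FIFO register
    channel_config_set_write_increment(&cfg, true);  // Advance through the buffer
    channel_config_set_dreq(&cfg, DREQ_ADC);         // Paced by the ADC, not the CPU

    // Configure DMA to read N_SAMPLES ADC samples into buffer
    dma_channel_configure(dma_chan, &cfg,
        buffer,          // Write to this RAM address
        &adc_hw->fifo,   // Read from ADC FIFO
        N_SAMPLES,       // Number of transfers
        true);           // Start immediately
    adc_run(true);       // Begin free-running conversions

    // CPU is now FREE while DMA collects samples
    do_other_work();     // Application placeholder

    // Wait for DMA to finish
    dma_channel_wait_for_finish_blocking(dma_chan);
    adc_run(false);

    // Now process the samples
    process_buffer(buffer, N_SAMPLES);  // Application placeholder
}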
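
How long is the CPU actually free while the DMA fills the buffer? A rough estimate, assuming the ADC runs at a fixed rate and the DMA never stalls. The function names and the 1.5 ms processing time are hypothetical.

```python
def fill_time_ms(n_samples: int, sample_rate_hz: float) -> float:
    """Time for the peripheral to produce n_samples at a fixed rate."""
    return n_samples * 1000 / sample_rate_hz


def can_keep_up(process_time_ms: float, n_samples: int, sample_rate_hz: float) -> bool:
    """For continuous streaming, one batch must be processed before the next is full."""
    return process_time_ms < fill_time_ms(n_samples, sample_rate_hz)


print(fill_time_ms(1000, 500_000))       # 2.0 ms at the ADC's full 500 kHz
print(f"{fill_time_ms(1000, 44_100):.1f} ms at audio rate")
print(can_keep_up(1.5, 1000, 500_000))   # hypothetical 1.5 ms of processing per batch
```

A larger buffer buys more time between batches at the cost of RAM and latency.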

DMA Advantages

| Aspect | CPU Polling | DMA |
|---|---|---|
| CPU load during transfer | 100% | ~0% |
| Maximum throughput | Limited by software | Limited by hardware |
| Timing jitter | High (interrupts, GC) | Near zero (hardware-paced) |
| Power consumption | Higher | Lower |
| Complexity | Simple code | Requires configuration |

When to Use DMA

Decision Framework

┌─────────────────────────────────────────────────────────────┐
│  DO I NEED DMA?                                              │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Sample rate × Transfer time > 20% CPU?                     │
│                    │                                         │
│          ┌────────┴────────┐                                │
│          │                 │                                 │
│         YES               NO                                 │
│          │                 │                                 │
│          ▼                 ▼                                 │
│   Consider DMA       CPU polling OK                          │
│                                                              │
│   Also consider DMA if:                                      │
│   • Timing jitter is unacceptable                           │
│   • Need deterministic latency                              │
│   • Power consumption matters                               │
│   • Continuous streaming required                           │
│                                                              │
└─────────────────────────────────────────────────────────────┘
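
The flowchart can be written as a small helper. This is only a sketch of the rule of thumb above. The function name, the keyword arguments, and the default 20% threshold are chosen for illustration.

```python
def consider_dma(sample_rate_hz: float,
                 time_per_sample_us: float,
                 *,
                 jitter_sensitive: bool = False,
                 needs_deterministic_latency: bool = False,
                 power_sensitive: bool = False,
                 continuous_streaming: bool = False,
                 load_threshold: float = 0.20) -> bool:
    """Return True if the rule of thumb suggests looking at DMA."""
    load = sample_rate_hz * time_per_sample_us / 1_000_000
    other_reasons = (jitter_sensitive, needs_deterministic_latency,
                     power_sensitive, continuous_streaming)
    return load > load_threshold or any(other_reasons)


print(consider_dma(100, 500))                             # IMU at 100 Hz: 5% load
print(consider_dma(1000, 500))                            # IMU at 1000 Hz: 50% load
print(consider_dma(100, 500, continuous_streaming=True))  # low load, but streaming
```

The threshold is a judgment call. A project with a heavy control loop may want to act well before 20%.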

Common DMA Use Cases

| Application | Without DMA | With DMA |
|---|---|---|
| Audio playback | Glitchy, drops samples | Smooth, continuous |
| High-speed ADC | Limited to ~20 kHz | Up to 500 kHz |
| SD card access | Slow, blocks CPU | Fast, background |
| Display update | Slow refresh, tearing | Fast, tear-free |
| Motor encoder | May miss pulses | Captures all edges |

DMA on the RP2350

The RP2350 has 16 DMA channels. Each channel can:

- Read from any peripheral or memory
- Write to any peripheral or memory
- Move large blocks in a single transfer (in practice limited by RAM, not by the transfer counter)
- Chain to other channels for complex transfers
- Trigger interrupts on completion

RP2350 DMA-Capable Peripherals

| Peripheral | DMA Read | DMA Write | Common Use |
|---|---|---|---|
| ADC | | | High-speed sampling |
| SPI | | | SD cards, displays |
| I2C | | | Bulk sensor reads |
| UART | | | Serial streaming |
| PIO | | | Custom protocols |
| PWM | | | Waveform generation |

MicroPython and DMA

MicroPython uses DMA internally for some operations:

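| Operation | Uses DMA? | Notes |
|---|---|---|
| neopixel.write() | ❌ No | CPU bit-banging via machine.bitstream on the RP2 port |
| spi.write() | ✅ Yes | For large transfers |
| i2c.readfrom() | ❌ No | CPU polling |
| adc.read_u16() | ❌ No | Single sample |
| machine.I2S | ✅ Yes | Audio streaming |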

To use DMA directly, you need one of:

- C/C++ with the Pico SDK
- The rp2.DMA class available in recent MicroPython releases for RP2 boards
- Custom MicroPython modules
- PIO programs (which can use DMA)


Practical Limits for This Course

For ES101, we stay within Python's limits:

| Task | Our Rate | Python Limit | Margin |
|---|---|---|---|
| Line sensors | 100 Hz | 2000 Hz | 20× OK |
| IMU reading | 100 Hz | 1000 Hz | 10× OK |
| Ultrasonic | 10 Hz | 100 Hz | 10× OK |
| Battery voltage | 1 Hz | 1000 Hz | 1000× OK |
| Control loop | 100 Hz | 500 Hz | 5× OK |

We don't need DMA because our sample rates are well within Python's capabilities.

But understanding DMA helps you:

1. Know when Python won't be enough
2. Understand what the C SDK provides
3. Make informed decisions in future projects
4. Debug "impossible" timing problems


The Complete Picture

┌─────────────────────────────────────────────────────────────┐
│  DATA TRANSFER HIERARCHY                                     │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Speed        Method              CPU Load    Use When      │
│   ─────────────────────────────────────────────────────────  │
│   Slowest      Python polling      High        Prototyping   │
│                                                              │
│   ↓            C polling           Medium      Simple apps   │
│                                                              │
│   ↓            Interrupts          Low*        Events        │
│                                               (*short ISRs)  │
│                                                              │
│   ↓            DMA                 ~Zero       Streaming     │
│                                                              │
│   Fastest      DMA + PIO           ~Zero       Custom HW     │
│                                                              │
│   Each level trades complexity for performance.              │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Summary

CPU load = Sample rate × Time per sample

When this exceeds ~20-30%, consider hardware assistance (DMA).

| Concept | Key Point |
|---|---|
| CPU bottleneck | Software can't keep up with hardware speeds |
| DMA purpose | Move data without CPU involvement |
| When to use | High sample rates, continuous streaming, low jitter |
| In this course | Not needed, but understanding helps design decisions |


Further Reading