MCU Real-Time Controller (Pico 2 W + Linux Supervisor)
Time estimate: ~90 minutes
Prerequisites: 1D Ball Balancing, PREEMPT_RT Latency
Learning Objectives
By the end of this tutorial you will be able to:
- Offload a hard-real-time control loop to a Raspberry Pi Pico 2 W
- Communicate between Linux and an MCU over UART using a binary protocol
- Supervise and tune an MCU controller from a Linux host
- Compare three real-time approaches (standard Linux, PREEMPT_RT, external MCU) with measurements
Heterogeneous Real-Time Architecture
When a single processor cannot guarantee both general-purpose workloads (networking, UI, logging) and hard-real-time control, the standard solution is a heterogeneous architecture: a Linux SoC handles supervisory tasks while a dedicated MCU runs the time-critical loop on a hardware timer interrupt. The MCU's bare-metal (or RTOS) environment is immune to Linux scheduling jitter, GC pauses, and kernel preemption. Communication between the two uses a deterministic link (UART, SPI, or shared memory), and the Linux side acts as a supervisor — sending setpoints and gains, receiving telemetry, and logging data. This is the same pattern used in industrial PLCs (RTOS controller + SCADA workstation), automotive ECUs (Cortex-M + Cortex-A), and drone flight controllers (STM32 + Linux companion).
This tutorial walks through all three real-time approaches — standard Linux, PREEMPT_RT, and external MCU — so you can compare their jitter characteristics directly.
See also: Real-Time Systems reference
1. Architecture: Why an External MCU?
In 1D Ball Balancing you ran the PID loop on the Pi. The kernel's scheduler, Python's garbage collector, and other processes all compete for CPU time — causing jitter. PREEMPT_RT improves worst-case latency, but Linux is still a general-purpose OS.
The third option is to move the time-critical loop off Linux entirely:
Approach A (Pure Linux): Pi runs PID + reads sensor + drives servo
Approach B (PREEMPT_RT): Same as A, but on RT kernel
Approach C (External MCU): Pico runs PID + sensor + servo; Pi supervises
When to choose each:
| Approach | Best For | Typical Loop Rate |
|---|---|---|
| A: Standard Linux | Soft real-time, best-effort (<100 Hz) | 10–50 Hz |
| B: PREEMPT_RT | Firm real-time, bounded latency (<1 kHz) | 50–500 Hz |
| C: External MCU | Hard real-time, safety-critical (>1 kHz possible) | 200–10,000 Hz |
In Approach C the MCU owns the sensor and actuator — it reads the VL53L0X, computes PID, and drives the servo on a hardware timer interrupt. Linux becomes a supervisor: it sends setpoint and gain changes, receives telemetry, and logs data. A stall on the Pi (CPU load, kernel update, Python GC) cannot affect the control loop.
This is the same architecture used in industrial PLCs, automotive ECUs, and drone flight controllers.
2. Physical Connection — UART Wiring
UART is the primary link because it is deterministic and low-latency — no USB stack, no wireless stack, no buffering surprises.
Wiring
| Pi 4 (UART3) | Direction | Pico 2 W (UART1) |
|---|---|---|
| GPIO4 (pin 7) — TX | → | GP5 — RX |
| GPIO5 (pin 29) — RX | ← | GP4 — TX |
| GND (pin 9) | — | GND |
The VL53L0X and servo connect to the Pico, not the Pi — the Pico owns the sensor/actuator hardware:
- VL53L0X: SDA → GP0, SCL → GP1, VCC → 3V3, GND → GND
- Servo signal → GP15, servo power → external 5–6V supply, GND common
Enable UART3 on the Pi
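Add the following line to /boot/firmware/config.txt: dtoverlay=uart3
Reboot, then verify that the device node /dev/ttyAMA3 appears (for example, with ls -l /dev/ttyAMA3).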
UART Device Tree and tty Layer
Decompile the overlay to see what it does:
The overlay enables a UART peripheral and maps it to specific GPIO pins. On the Pi 4 (BCM2711), there are six UARTs:
| UART | Type | Default Device |
|---|---|---|
| UART0 | PL011 (arm,pl011) | /dev/ttyAMA0 (Bluetooth by default) |
| UART1 | Mini UART (brcm,bcm2835-aux-uart) | /dev/ttyS0 (serial console) |
| UART2–5 | PL011 | /dev/ttyAMA2–/dev/ttyAMA5 (disabled by default) |
The uart3 overlay enables UART3 as /dev/ttyAMA3, a PL011 peripheral — the same IP block used in Arm reference designs. PL011 is preferred over the mini UART because it has a larger FIFO (16 bytes vs 8), hardware flow control, and no baud rate dependency on the GPU core clock.
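The device path /dev/ttyAMA3 maps through the Linux tty subsystem: a read() or write() on the device node passes through the tty core and its line discipline to the PL011 driver (amba-pl011), which moves bytes between kernel buffers and the hardware FIFO.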
The termios settings configure the serial line: raw mode (no line buffering, no echo, no canonical processing) ensures bytes pass through unchanged. The baud rate must match both sides — 115200 baud means each byte takes ~87 µs (10 bits: start + 8 data + stop).
For custom images: Enable CONFIG_SERIAL_AMBA_PL011=y in your kernel config and add the UART + pinctrl nodes to your device tree.
3. Communication Protocol
A simple binary frame keeps parsing fast and deterministic on both sides.
Frame Format
[0xAA] [type] [len] [payload...] [checksum]
  │      │      │       │            └─ XOR of all bytes after 0xAA
  │      │      │       └─ len bytes of data
  │      │      └─ payload length (0–32)
  │      └─ message type
  └─ sync byte
Message Types
| Type | Direction | ID | Payload |
|---|---|---|---|
| CONFIG | Pi → Pico | 0x01 | setpoint (i16), Kp×1000 (u16), Ki×1000 (u16), Kd×1000 (u16); 8 bytes |
| START | Pi → Pico | 0x02 | (empty) |
| STOP | Pi → Pico | 0x03 | (empty) |
| ACK | Pico → Pi | 0x10 | echoed type (1 byte) |
| TELEMETRY | Pico → Pi | 0x20 | position_mm (i16), angle_deg×10 (i16), loop_dt_us (u16), timestamp_us (u32); 10 bytes |
| STATUS | Pico → Pi | 0x21 | state (u8): 0=idle, 1=running, 2=error |
CONFIG, START, and STOP are acknowledged: the Pico replies with an ACK frame that echoes the command's type byte. TELEMETRY is fire-and-forget at 200 Hz and is not acknowledged.
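As a hedged illustration of the payload layouts in the table, the sketch below packs a CONFIG payload and unpacks a TELEMETRY payload with Python's struct module. The big-endian format strings match the ones used later in this tutorial; the helper names pack_config and unpack_telemetry and the sample values are made up for the example.

```python
import struct

CONFIG_FMT = ">hHHH"  # setpoint (i16), Kp*1000, Ki*1000, Kd*1000 (u16 each)
TELEM_FMT = ">hhHI"   # position_mm (i16), angle*10 (i16), loop_dt_us (u16), timestamp_us (u32)

def pack_config(setpoint_mm, kp, ki, kd):
    """Scale the gains by 1000 and pack the 8-byte CONFIG payload."""
    # round() instead of int(): int() truncates, so a product that lands
    # just below a whole number would lose one count.
    return struct.pack(CONFIG_FMT, setpoint_mm,
                       round(kp * 1000), round(ki * 1000), round(kd * 1000))

def unpack_telemetry(payload):
    """Decode the 10-byte TELEMETRY payload into named fields."""
    pos, angle10, dt_us, ts_us = struct.unpack(TELEM_FMT, payload)
    return {"position_mm": pos, "angle_deg": angle10 / 10.0,
            "loop_dt_us": dt_us, "timestamp_us": ts_us}

cfg = pack_config(150, 0.08, 0.001, 0.05)
print(len(cfg), cfg.hex())  # 8 0096005000010032

sample = struct.pack(TELEM_FMT, 148, 902, 5001, 1234567)  # hypothetical reading
print(struct.calcsize(TELEM_FMT), unpack_telemetry(sample))
```

The ">" prefix selects big-endian byte order with no padding, which is why the payload sizes come out at exactly 8 and 10 bytes.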
Checksum
XOR of all bytes after the sync byte (type + len + payload):
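A minimal sketch of that calculation, worked through for two small frames (an empty START and an ACK for CONFIG). The function name matches the helper defined later in protocol.py; the frames are built by hand here purely for illustration.

```python
def checksum(body: bytes) -> int:
    """XOR of type + len + payload (everything after the sync byte)."""
    cs = 0
    for b in body:
        cs ^= b
    return cs

SYNC = 0xAA

# START: type 0x02, length 0, no payload
start_body = bytes([0x02, 0x00])
start_frame = bytes([SYNC]) + start_body + bytes([checksum(start_body)])
print(start_frame.hex(" "))  # aa 02 00 02

# ACK for CONFIG: type 0x10, length 1, payload 0x01
ack_body = bytes([0x10, 0x01, 0x01])
ack_frame = bytes([SYNC]) + ack_body + bytes([checksum(ack_body)])
print(ack_frame.hex(" "))    # aa 10 01 01 10

# Receiver-side check: XOR over body plus checksum is zero for a valid frame
print(checksum(ack_frame[1:]) == 0)  # True
```

An XOR checksum catches any single-bit error but misses some multi-bit errors (for example, the same bit flipped in two different bytes), which is an acceptable trade-off for a short wired link.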
4. Pico Firmware (MicroPython)
Protocol Helpers
Save as protocol.py on the Pico:
import struct

SYNC = 0xAA
MAX_PAYLOAD = 32  # payload length limit from the frame format

MSG_CONFIG = 0x01
MSG_START = 0x02
MSG_STOP = 0x03
MSG_ACK = 0x10
MSG_TELEM = 0x20
MSG_STATUS = 0x21

def checksum(data):
    """XOR of all bytes in data (type + len + payload)."""
    cs = 0
    for b in data:
        cs ^= b
    return cs

def build_frame(msg_type, payload=b""):
    body = bytes([msg_type, len(payload)]) + payload
    return bytes([SYNC]) + body + bytes([checksum(body)])

def parse_frame(buf):
    """Try to parse one frame from buf. Returns (msg_type, payload, remaining) or None."""
    while True:
        idx = buf.find(bytes([SYNC]))
        if idx < 0 or len(buf) < idx + 4:
            return None  # no sync byte yet, or header incomplete
        msg_type = buf[idx + 1]
        length = buf[idx + 2]
        end = idx + 3 + length + 1  # sync + type + len + payload + checksum
        if length <= MAX_PAYLOAD:
            if len(buf) < end:
                return None  # wait for the rest of the frame
            body = buf[idx + 1 : idx + 3 + length]
            if checksum(body) == buf[end - 1]:
                payload = buf[idx + 3 : idx + 3 + length]
                return (msg_type, payload, buf[end:])
        # Oversized length or bad checksum: treat as a false sync byte,
        # drop it, and rescan. (Returning None here would leave the bad
        # frame at the head of the buffer and stall the parser.)
        buf = buf[idx + 1:]
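As an alternative sketch (not the tutorial's parser), the same frame format can be decoded one byte at a time with a small state machine. The work per byte is constant and the buffer is never rescanned, which is one way to read "fast and deterministic". The class name FrameDecoder and its method names are hypothetical.

```python
SYNC = 0xAA
MAX_PAYLOAD = 32

class FrameDecoder:
    """Byte-at-a-time decoder for [0xAA][type][len][payload][checksum]."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.state = "sync"
        self.msg_type = 0
        self.length = 0
        self.payload = bytearray()
        self.cs = 0

    def feed(self, byte):
        """Consume one byte; return (msg_type, payload) when a frame completes, else None."""
        if self.state == "sync":
            if byte == SYNC:
                self.state = "type"
        elif self.state == "type":
            self.msg_type = byte
            self.cs = byte
            self.state = "len"
        elif self.state == "len":
            if byte > MAX_PAYLOAD:
                self._reset()          # impossible length: false sync
                if byte == SYNC:       # the rejected byte may itself be a sync
                    self.state = "type"
            else:
                self.length = byte
                self.cs ^= byte
                self.state = "payload" if byte else "checksum"
        elif self.state == "payload":
            self.payload.append(byte)
            self.cs ^= byte
            if len(self.payload) == self.length:
                self.state = "checksum"
        else:  # "checksum"
            ok = (byte == self.cs)
            frame = (self.msg_type, bytes(self.payload))
            self._reset()
            if ok:
                return frame
        return None

decoder = FrameDecoder()
stream = bytes([0x13, 0x37,                      # line noise
                0xAA, 0x02, 0x00, 0x02,          # START
                0xAA, 0x10, 0x01, 0x01, 0x10])   # ACK for CONFIG
frames = [f for b in stream if (f := decoder.feed(b)) is not None]
print(frames)  # [(2, b''), (16, b'\x01')]
```

One trade-off of this design: when a checksum fails, the bytes already consumed are discarded rather than rescanned, so a real frame that started inside a corrupted one is lost along with it.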
Control Loop
Save as main.py on the Pico:
"""Ball-balancing controller — Pico 2 W, 200 Hz hardware timer."""
import machine, struct, time
from protocol import *

# ── Hardware ──────────────────────────────────────
uart = machine.UART(1, baudrate=115200, tx=machine.Pin(4), rx=machine.Pin(5))
i2c = machine.I2C(0, sda=machine.Pin(0), scl=machine.Pin(1), freq=400_000)
VL53_ADDR = 0x29
servo_pwm = machine.PWM(machine.Pin(15))
servo_pwm.freq(50)

# ── VL53L0X minimal driver ───────────────────────
def vl53_init():
    """Basic init — sets continuous measurement, 20 ms budget."""
    i2c.writeto_mem(VL53_ADDR, 0x80, b'\x01')
    i2c.writeto_mem(VL53_ADDR, 0x80, b'\x00')
    # Set measurement timing budget register
    i2c.writeto_mem(VL53_ADDR, 0x01, b'\x00')
    # Start continuous mode
    i2c.writeto_mem(VL53_ADDR, 0x00, b'\x02')

def vl53_read_mm():
    """Read range in mm. Returns -1 if not ready."""
    status = i2c.readfrom_mem(VL53_ADDR, 0x13, 1)[0]
    if not (status & 0x01):
        return -1
    data = i2c.readfrom_mem(VL53_ADDR, 0x14, 2)
    dist = (data[0] << 8) | data[1]
    i2c.writeto_mem(VL53_ADDR, 0x0B, b'\x01')  # clear interrupt
    return dist if dist < 2000 else -1

# ── Servo ─────────────────────────────────────────
def set_servo_angle(degrees):
    """Clamp to the mechanical range, drive the servo, return the angle applied."""
    degrees = max(60, min(120, degrees))
    pulse_us = 1000 + (degrees / 180.0) * 1000
    servo_pwm.duty_ns(int(pulse_us * 1000))
    return degrees

# ── PID state ─────────────────────────────────────
setpoint = 150
Kp, Ki, Kd = 0.08, 0.001, 0.05
integral = 0.0
prev_error = 0.0
BASE_ANGLE = 90.0
running = False

# ── Watchdog: stop servo if no supervisor heartbeat ──
last_supervisor_us = time.ticks_us()
WATCHDOG_TIMEOUT_US = 2_000_000  # 2 seconds

# ── UART receive buffer ──────────────────────────
rx_buf = b""

def handle_commands():
    """Non-blocking read and process supervisor commands."""
    global rx_buf, setpoint, Kp, Ki, Kd, running, last_supervisor_us, integral, prev_error
    avail = uart.any()
    if avail:
        data = uart.read(avail)
        if data:
            rx_buf += data
    if len(rx_buf) > 256:
        rx_buf = rx_buf[-64:]  # cap memory use if only garbage arrives
    while True:
        result = parse_frame(rx_buf)
        if result is None:
            break
        msg_type, payload, rx_buf = result
        last_supervisor_us = time.ticks_us()  # any valid frame counts as a heartbeat
        if msg_type == MSG_CONFIG and len(payload) == 8:
            sp, kp, ki, kd = struct.unpack(">hHHH", payload)
            setpoint = sp
            Kp = kp / 1000.0
            Ki = ki / 1000.0
            Kd = kd / 1000.0
            integral = 0.0
            prev_error = 0.0
            uart.write(build_frame(MSG_ACK, bytes([MSG_CONFIG])))
        elif msg_type == MSG_START:
            running = True
            integral = 0.0
            prev_error = 0.0
            uart.write(build_frame(MSG_ACK, bytes([MSG_START])))
        elif msg_type == MSG_STOP:
            running = False
            set_servo_angle(90)
            uart.write(build_frame(MSG_ACK, bytes([MSG_STOP])))

# ── Control loop (called by hardware timer) ──────
last_tick_us = time.ticks_us()

def control_isr(timer):
    global integral, prev_error, last_tick_us, running
    now = time.ticks_us()
    dt_us = time.ticks_diff(now, last_tick_us)
    last_tick_us = now
    if not running:
        return
    # Watchdog check
    if time.ticks_diff(now, last_supervisor_us) > WATCHDOG_TIMEOUT_US:
        running = False
        set_servo_angle(90)
        uart.write(build_frame(MSG_STATUS, b'\x02'))  # error
        return
    dist = vl53_read_mm()
    if dist < 0:
        return  # sensor not ready this tick
    dt = dt_us / 1_000_000.0
    error = setpoint - dist
    integral += error * dt
    integral = max(-500, min(500, integral))  # anti-windup clamp
    derivative = (error - prev_error) / dt if dt > 0 else 0
    prev_error = error
    output = Kp * error + Ki * integral + Kd * derivative
    angle = set_servo_angle(BASE_ANGLE + output)  # report the clamped angle
    # Send telemetry (loop_dt_us is a u16, so saturate instead of wrapping)
    payload = struct.pack(">hhHI", dist, int(angle * 10),
                          min(dt_us, 0xFFFF), now & 0xFFFFFFFF)
    uart.write(build_frame(MSG_TELEM, payload))

# ── Main ──────────────────────────────────────────
vl53_init()
set_servo_angle(90)
timer = machine.Timer()
timer.init(freq=200, mode=machine.Timer.PERIODIC, callback=control_isr)
print("Pico controller ready — waiting for supervisor START")
uart.write(build_frame(MSG_STATUS, b'\x00'))  # idle
while True:
    handle_commands()
    time.sleep_ms(10)
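The PID arithmetic inside control_isr can be exercised on a desktop Python interpreter before flashing anything. The sketch below is a host-side stand-in, not part of the firmware: the class name PID and the helper clamp are hypothetical, while the gains, the ±500 integral clamp, the 90° base angle, and the 60–120° servo limits are taken from main.py.

```python
class PID:
    """PID step with a clamped integral, mirroring the arithmetic in control_isr."""

    def __init__(self, kp, ki, kd, integral_limit=500.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral_limit = integral_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        # Anti-windup: keep the accumulated error within fixed bounds
        self.integral = max(-self.integral_limit,
                            min(self.integral_limit, self.integral))
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def clamp(value, lo, hi):
    return max(lo, min(hi, value))


pid = PID(kp=0.08, ki=0.001, kd=0.05)
for tick in range(3):
    # Ball held 10 mm short of the 150 mm setpoint, 5 ms between samples
    out = pid.step(setpoint=150, measurement=140, dt=0.005)
    angle = clamp(90.0 + out, 60.0, 120.0)
    print(f"tick {tick}: output={out:8.3f}  servo={angle:6.2f} deg")
```

On the first tick the derivative term dominates, because prev_error starts at 0 and the error jumps to 10 mm within one 5 ms step, so the servo command saturates at 120°. From the second tick onward the output settles to roughly 0.8°. The firmware behaves the same way after START or CONFIG, since both reset prev_error.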
RP2350 Hardware Timer
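The RP2350 (Pico 2 W) has two timer blocks (TIMER0 and TIMER1), each a 64-bit counter with 1 µs resolution and four independent alarm comparators. machine.Timer() uses one of these alarms to trigger the callback at a nominal 5 ms interval (200 Hz), regardless of what the main loop is doing.
Contrast with time.sleep() in the main loop: MicroPython's garbage collector can pause execution for 1–5 ms unpredictably, and a sleep-based loop also accumulates its own execution time into the period. The timer alarm is re-armed against the hardware counter, so the period does not drift. One caveat: on the rp2 port, machine.Timer callbacks are by default scheduled (soft) callbacks that run between bytecodes, which is why control_isr is allowed to allocate floats and byte strings. A garbage collection that is already in progress can therefore still delay an individual callback, even though it does not shift the ticks that follow.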
At 200 Hz the ISR has a 5 ms budget. The VL53L0X I2C read takes ~0.5 ms, PID math ~0.1 ms, UART write ~0.2 ms — leaving ample margin. If you push to 1 kHz (1 ms budget), the I2C read becomes the bottleneck (see Challenge 1).
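The budget arithmetic above can be written out as a short calculation. The per-step costs are the estimates quoted in this section, not measurements, and the function name budget is made up for the sketch.

```python
# Estimated cost of each step in the callback, in milliseconds (from the text above)
STEP_COST_MS = {
    "VL53L0X I2C read": 0.5,
    "PID math": 0.1,
    "UART write": 0.2,
}

def budget(loop_hz):
    """Return (period_ms, used_ms, margin_ms) for a given loop rate."""
    period_ms = 1000.0 / loop_hz
    used_ms = sum(STEP_COST_MS.values())
    return period_ms, used_ms, period_ms - used_ms

for hz in (200, 1000):
    period, used, margin = budget(hz)
    print(f"{hz:>5} Hz: period {period:.1f} ms, used {used:.1f} ms, "
          f"margin {margin:.1f} ms ({used / period:.0%} of budget)")
```

With these estimates the callback uses about 16% of the period at 200 Hz and about 80% at 1 kHz, leaving roughly 0.2 ms of slack. That is why the I2C read is the first thing to optimize when raising the loop rate.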
Warning
The VL53L0X I2C address is 0x29 by default. If you also have an MCP9808 or OLED on the same I2C bus, verify addresses don't conflict with i2c.scan().
5. Linux Supervisor (Python)
Save as ~/ball-balance/supervisor.py on the Pi:
#!/usr/bin/env python3
"""Supervisor for Pico ball-balance controller — UART link."""
import csv
import struct
import time

import serial  # pyserial

# ── Protocol (same constants as Pico side) ────────
SYNC = 0xAA
MAX_PAYLOAD = 32
MSG_CONFIG, MSG_START, MSG_STOP = 0x01, 0x02, 0x03
MSG_ACK, MSG_TELEM, MSG_STATUS = 0x10, 0x20, 0x21
# Assumption: heartbeat type 0x04 is not in the message table. The firmware
# refreshes its watchdog on any valid frame and ignores unknown types.
MSG_HEARTBEAT = 0x04
HEARTBEAT_S = 0.5  # must stay well below the Pico's 2 s watchdog timeout

def checksum(data):
    cs = 0
    for b in data:
        cs ^= b
    return cs

def build_frame(msg_type, payload=b""):
    body = bytes([msg_type, len(payload)]) + payload
    return bytes([SYNC]) + body + bytes([checksum(body)])

def parse_frame(buf):
    """Returns (msg_type, payload, remaining) or None if no complete frame yet."""
    while True:
        idx = buf.find(bytes([SYNC]))
        if idx < 0 or len(buf) < idx + 4:
            return None
        msg_type = buf[idx + 1]
        length = buf[idx + 2]
        end = idx + 3 + length + 1
        if length <= MAX_PAYLOAD:
            if len(buf) < end:
                return None
            body = buf[idx + 1 : idx + 3 + length]
            if checksum(body) == buf[end - 1]:
                payload = buf[idx + 3 : idx + 3 + length]
                return (msg_type, payload, buf[end:])
        buf = buf[idx + 1:]  # false sync or bad checksum: drop it and rescan

# ── UART setup ────────────────────────────────────
PORT = "/dev/ttyAMA3"
BAUD = 115200
ser = serial.Serial(PORT, BAUD, timeout=0.01)

# ── Send configuration ────────────────────────────
SETPOINT = 150
Kp, Ki, Kd = 0.08, 0.001, 0.05

def send_config():
    # round(), not int(): int() truncates a product such as 79.999... to 79
    payload = struct.pack(">hHHH", SETPOINT,
                          round(Kp * 1000), round(Ki * 1000), round(Kd * 1000))
    ser.write(build_frame(MSG_CONFIG, payload))

def send_start():
    ser.write(build_frame(MSG_START))

def send_stop():
    ser.write(build_frame(MSG_STOP))

# ── Data logging ──────────────────────────────────
LOG_FILE = "/tmp/mcu_balance_log.csv"
log = open(LOG_FILE, "w", newline="")
writer = csv.writer(log)
writer.writerow(["time_s", "position_mm", "angle_deg", "loop_dt_us", "pico_ts_us"])

# ── Main loop ─────────────────────────────────────
print(f"Supervisor starting — port={PORT}, baud={BAUD}")
print(f"Config: setpoint={SETPOINT}mm, PID=({Kp}, {Ki}, {Kd})")
send_config()
time.sleep(0.1)
send_start()
print("START sent — receiving telemetry (Ctrl+C to stop)")

rx_buf = b""
t_start = time.monotonic()
last_heartbeat = t_start
count = 0
dt_sum = 0
dt_max = 0

try:
    while True:
        # Without periodic traffic the Pico watchdog stops the loop after 2 s
        now = time.monotonic()
        if now - last_heartbeat >= HEARTBEAT_S:
            ser.write(build_frame(MSG_HEARTBEAT))
            last_heartbeat = now
        data = ser.read(256)
        if data:
            rx_buf += data
        while True:
            result = parse_frame(rx_buf)
            if result is None:
                break
            msg_type, payload, rx_buf = result
            if msg_type == MSG_TELEM and len(payload) == 10:
                pos, angle10, dt_us, ts = struct.unpack(">hhHI", payload)
                angle = angle10 / 10.0
                elapsed = time.monotonic() - t_start
                writer.writerow([f"{elapsed:.3f}", pos, f"{angle:.1f}",
                                 dt_us, ts])
                count += 1
                dt_sum += dt_us
                if dt_us > dt_max:
                    dt_max = dt_us
                if count % 200 == 0:
                    avg_dt = dt_sum / count
                    print(f" [{elapsed:7.1f}s] pos={pos:4d}mm "
                          f"angle={angle:5.1f}° "
                          f"avg_dt={avg_dt:.0f}µs max_dt={dt_max}µs "
                          f"n={count}")
            elif msg_type == MSG_ACK:
                acked = payload[0] if payload else 0
                print(f" ACK for 0x{acked:02X}")
            elif msg_type == MSG_STATUS:
                state = payload[0] if payload else 0xFF
                states = {0: "idle", 1: "running", 2: "error"}
                print(f" Pico status: {states.get(state, 'unknown')}")
        time.sleep(0.001)
except KeyboardInterrupt:
    send_stop()

ser.close()
log.close()
avg_dt = dt_sum / count if count else 0
print(f"\nStopped. Received {count} telemetry frames.")
print(f" Average loop dt: {avg_dt:.0f} µs")
print(f" Maximum loop dt: {dt_max} µs")
print(f"Log saved to {LOG_FILE}")
UART Device Path and the tty Subsystem
serial.Serial("/dev/ttyAMA3", 115200) opens the PL011 UART3 peripheral through the Linux tty subsystem. Under the hood, pyserial sets the termios structure to raw mode:
- No canonical processing (no line buffering, no backspace handling)
- No echo (bytes are not sent back)
- No signal characters (Ctrl+C does not generate SIGINT on the serial port)
- VMIN=0, VTIME=0 (reads never block inside the tty driver; pyserial enforces the timeout argument, 10 ms in supervisor.py, with select())
These settings ensure the binary protocol bytes pass through unchanged. Without raw mode, the tty layer would interpret 0x0A as a newline and 0x03 as Ctrl+C — corrupting the protocol.
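The effect of raw mode can be checked without any hardware. In the sketch below a pseudo-terminal stands in for /dev/ttyAMA3 (an assumption made so the example runs anywhere on Linux), and the standard-library tty.setraw() applies roughly the same flags pyserial sets. The helper name raw_passthrough is hypothetical.

```python
import os
import tty

def raw_passthrough(payload):
    """Push payload into a pseudo-terminal and read it back through a raw-mode tty."""
    master_fd, slave_fd = os.openpty()
    try:
        tty.setraw(slave_fd)          # clears ICANON, ECHO, ISIG, ICRNL, IXON, ...
        os.write(master_fd, payload)  # plays the role of bytes arriving on the wire
        received = b""
        while len(received) < len(payload):
            received += os.read(slave_fd, len(payload) - len(received))
        return received
    finally:
        os.close(master_fd)
        os.close(slave_fd)

# A STOP frame (its type and checksum are both 0x03, i.e. Ctrl+C) followed by
# bytes a cooked tty would treat specially: LF, CR, XON, XOFF
stop_frame = bytes([0xAA, 0x03, 0x00, 0x03])
tricky = stop_frame + bytes([0x0A, 0x0D, 0x11, 0x13])
echoed = raw_passthrough(tricky)
print(echoed.hex(" "), echoed == tricky)
```

If the bytes come back identical, none of the line-discipline translations touched them, which is the property the binary protocol depends on.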
The baud rate (115200) must match on both sides. At 115200 baud with 8N1 framing, each byte takes ~87 µs. A 14-byte telemetry frame takes ~1.2 ms to transmit, which is about a quarter of the 5 ms control period: the link has headroom at 200 Hz but would saturate before 1 kHz.
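The transmission-time estimate generalizes to any frame size and loop rate. This is a small sketch assuming 8N1 framing (10 bits on the wire per byte); the function names are made up for illustration.

```python
BITS_PER_BYTE = 10  # 8N1: start bit + 8 data bits + stop bit

def frame_time_us(n_bytes, baud=115200):
    """Time on the wire for n_bytes, in microseconds."""
    return n_bytes * BITS_PER_BYTE * 1_000_000 / baud

def link_utilization(n_bytes, frames_per_s, baud=115200):
    """Fraction of the link capacity used by a periodic frame."""
    return n_bytes * BITS_PER_BYTE * frames_per_s / baud

def min_baud(n_bytes, frames_per_s):
    """Lowest baud rate that can carry the frame at the given rate."""
    return n_bytes * BITS_PER_BYTE * frames_per_s

TELEM_FRAME = 1 + 1 + 1 + 10 + 1  # sync + type + len + payload + checksum

print(f"frame time: {frame_time_us(TELEM_FRAME):.0f} us")
for hz in (200, 1000):
    print(f"{hz} Hz telemetry: {link_utilization(TELEM_FRAME, hz):.0%} of the link, "
          f"needs at least {min_baud(TELEM_FRAME, hz)} baud")
```

At 200 Hz telemetry occupies about 24% of a 115200 baud link. At 1 kHz it would need at least 140000 baud, so a faster loop requires either a higher baud rate or telemetry on only every Nth tick.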
For custom images: The PL011 driver needs CONFIG_SERIAL_AMBA_PL011=y and the UART3 node enabled in the device tree.
6. Deploy and Run
Flash the Pico
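Install mpremote if needed (for example, with pip install mpremote).
Copy protocol.py and main.py to the Pico (for example, mpremote cp protocol.py :protocol.py, then mpremote cp main.py :main.py).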
Wire Everything
- UART: Pi GPIO4 → Pico GP5, Pi GPIO5 → Pico GP4, common GND
- VL53L0X: SDA → Pico GP0, SCL → Pico GP1, 3V3, GND
- Servo: signal → Pico GP15, power from external 5–6V supply, GND common with Pico
Start the Supervisor
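Run python3 ~/ball-balance/supervisor.py on the Pi. You should see output similar to the following (the ACK lines appear after "START sent" because the supervisor only reads replies once its main loop is running):
Supervisor starting — port=/dev/ttyAMA3, baud=115200
Config: setpoint=150mm, PID=(0.08, 0.001, 0.05)
START sent — receiving telemetry (Ctrl+C to stop)
ACK for 0x01
ACK for 0x02
[ 1.0s] pos= 148mm angle= 90.2° avg_dt=5001µs max_dt=5023µs n=200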
Checkpoint
The Pico reports telemetry at ~200 Hz, the ball balances, and the Pi logs data. The loop_dt_us should be very close to 5000 µs (5 ms) with minimal variation.
7. Three-Way RT Comparison
This is the core experiment. Run the same ball-balancing task with all three approaches and measure timing:
- A: Standard Linux — the Python PID from 1D Ball Balancing on a standard kernel
- B: PREEMPT_RT — the same Python PID on an RT kernel
- C: Pico 2 W MCU — this tutorial
Normal Conditions
| Metric | A: Standard Linux | B: PREEMPT_RT | C: Pico 2 W MCU |
|---|---|---|---|
| Target loop rate (Hz) | 50 | 50 | 200 |
| Average loop time | _ | _ | _ |
| Maximum loop time | _ | _ | _ |
| Jitter (max − min) | _ | _ | _ |
| Outliers (>2× target) | _ | _ | _ |
| Ball stability (1–5) | _ | _ | _ |
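To fill in the Linux columns (A and B) you need the same timing statistics the Pico reports. The sketch below is one possible way to collect them, assuming a simple deadline-based loop. The function names and the placeholder work callback are hypothetical, and the actual PID code from 1D Ball Balancing would replace the placeholder.

```python
import time

def measure_loop(rate_hz=50, n=250, work=lambda: None):
    """Run a fixed-rate loop and return the measured period of each iteration in µs."""
    period_ns = 1_000_000_000 // rate_hz
    last = time.monotonic_ns()
    deadline = last + period_ns
    dts_us = []
    for _ in range(n):
        work()  # sensor read + PID + servo update would go here
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        now = time.monotonic_ns()
        dts_us.append((now - last) / 1000.0)
        last = now
        deadline += period_ns  # absolute deadlines avoid cumulative drift
    return dts_us

def summarize(dts_us, target_us):
    """Compute the metrics used in the comparison table."""
    return {
        "avg_us": sum(dts_us) / len(dts_us),
        "min_us": min(dts_us),
        "max_us": max(dts_us),
        "jitter_us": max(dts_us) - min(dts_us),
        "outliers": sum(1 for d in dts_us if d > 2 * target_us),
    }

stats = summarize(measure_loop(rate_hz=50, n=50), target_us=20_000)
for name, value in stats.items():
    print(f"{name:>10}: {value:.0f}")
```

Run it once on the standard kernel and once on the PREEMPT_RT kernel, with and without stress-ng, and copy the numbers into the tables.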
Stress Test
Run stress-ng --cpu 4 --timeout 60s on the Pi during each approach. Approach C should be unaffected — the control loop runs on the Pico, not the Pi.
| Metric (under stress) | A: Standard | B: PREEMPT_RT | C: Pico MCU |
|---|---|---|---|
| Maximum loop time | _ | _ | _ |
| Jitter | _ | _ | _ |
| Ball drops? | _ | _ | _ |
Analyze the MCU Log
python3 -c "
import csv
data = list(csv.DictReader(open('/tmp/mcu_balance_log.csv')))
dts = [int(r['loop_dt_us']) for r in data]
avg = sum(dts) / len(dts)
mx, mn = max(dts), min(dts)
outliers = sum(1 for d in dts if d > 10000) # >2x target
print(f'MCU loop timing ({len(dts)} samples):')
print(f' Average: {avg:.0f} us')
print(f' Min: {mn} us')
print(f' Max: {mx} us')
print(f' Jitter: {mx - mn} us')
print(f' Outliers (>10ms): {outliers}')
"
Checkpoint
The Pico's timing should be nearly constant (~5000 µs ± a few µs), unaffected by stress-ng on the Pi. Standard Linux will show significant jitter increase under load. PREEMPT_RT will be better than standard but still affected.
8. UART vs WiFi Latency (Optional)
The Pico 2 W has WiFi. What if you replace the UART supervisor link with UDP over WiFi?
The control loop itself still runs on the Pico — unaffected. But supervisor commands (CONFIG, START, STOP) now travel over the wireless stack, and telemetry arrives at the Pi with additional delay.
Test Procedure
- Connect the Pico 2 W to the same WiFi network as the Pi
- Modify the supervisor to use UDP sockets instead of serial
- Measure command-response round-trip: send CONFIG with a timestamp, Pico echoes it back
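A hedged sketch of the round-trip measurement, using a local UDP echo thread as a stand-in for the Pico so it can be tried on the Pi alone. On real hardware the echo side would run on the Pico and the address would be the Pico's IP. The function names, the 12-byte probe layout, and the loopback address are all assumptions of this example.

```python
import socket
import struct
import threading
import time

PROBE_FMT = ">IQ"  # sequence number (u32) + send timestamp in ns (u64)

def echo_server(sock):
    """Stand-in for the Pico: echo every datagram back to its sender."""
    while True:
        try:
            data, addr = sock.recvfrom(64)
        except OSError:  # idle timeout or closed socket: stop serving
            return
        sock.sendto(data, addr)

def measure_udp_rtt(n=100, timeout_s=0.5):
    """Send n probes and return the round-trip time of each answered probe in µs."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.settimeout(0.2)         # lets the echo thread exit once probes stop
    server_addr = server.getsockname()
    thread = threading.Thread(target=echo_server, args=(server,), daemon=True)
    thread.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout_s)
    rtts_us = []
    for seq in range(n):
        t0 = time.perf_counter_ns()
        client.sendto(struct.pack(PROBE_FMT, seq, t0), server_addr)
        try:
            data, _ = client.recvfrom(64)
        except socket.timeout:
            continue  # lost probe: UDP gives no delivery guarantee
        t1 = time.perf_counter_ns()
        rseq, sent_ns = struct.unpack(PROBE_FMT, data)
        if rseq == seq:
            rtts_us.append((t1 - sent_ns) / 1000.0)
    client.close()
    thread.join()
    server.close()
    return rtts_us

rtts = measure_udp_rtt(n=50)
print(f"answered {len(rtts)}/50, avg {sum(rtts) / len(rtts):.0f} us, "
      f"max {max(rtts):.0f} us, jitter {max(rtts) - min(rtts):.0f} us")
```

Loopback numbers only validate the measurement code. The figures for the table have to come from probes that actually cross the WiFi link.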
Round-Trip Latency
| Metric | UART (115200) | WiFi UDP |
|---|---|---|
| Average round-trip | _ | _ |
| Maximum round-trip | _ | _ |
| Jitter | _ | _ |
Expected results:
- UART at 115200: ~0.5–1 ms per message (physical wire, no stack)
- WiFi UDP: ~5–20 ms per message (802.11 channel access and retries, UDP/IP stack, buffering)
The control loop timing on the Pico is identical in both cases; only the supervisor link changes. This shows two things: a wired link matters whenever the control loop itself has to cross it, and offloading the loop to the MCU makes supervisor-link latency largely irrelevant to control quality.
What Just Happened?
┌─────────────────────────────────────┐
│ Pi 4 (Linux Supervisor) │
│ - Sends CONFIG (setpoint, PID) │
│ - Receives TELEMETRY (200 Hz) │
│ - Logs to CSV │
│ - Displays status │
│ - Can run stress-ng without │
│ affecting control quality │
└──────────────┬──────────────────────┘
│ UART (115200, ~1ms)
┌──────────────┴──────────────────────┐
│ Pico 2 W (RT Controller) │
│ - Hardware timer ISR at 200 Hz │
│ - Reads VL53L0X (I2C) │
│ - Computes PID │
│ - Drives servo (PWM) │
│ - Sends telemetry frame │
│ - Watchdog: stops if supervisor │
│ disappears for 2 seconds │
└─────────────────────────────────────┘
Key insight: decoupling the real-time control from Linux means the Pi's CPU load, kernel updates, Python garbage collection, and display rendering cannot affect the control loop. The MCU runs on a hardware timer interrupt — deterministic to microsecond precision.
This is the same pattern used everywhere hard real-time meets complex software:
- Industrial PLC — RTOS controller + SCADA workstation
- Automotive ECU — Cortex-M actuator controller + Cortex-A infotainment
- Drone flight controller — STM32 running PX4 + Linux companion for navigation
Challenges
Challenge 1: Increase to 1 kHz
Change the Pico timer to freq=1000 (1 ms period). Does MicroPython keep up? Measure the actual loop_dt_us — if it exceeds 1000 µs, which operation is the bottleneck? (Hint: the VL53L0X measurement timing budget may need adjustment, and I2C at 400 kHz takes ~0.5 ms per read.)
Challenge 2: C Firmware
Rewrite the Pico control loop in C using the Pico SDK. Compare MicroPython vs C loop jitter on the Pico itself. The C version should achieve consistent sub-microsecond jitter at 1 kHz.
Challenge 3: SPI Shared Memory Bridge
Instead of UART, connect the Pico to the Pi via SPI with the Pico as SPI peripheral. Implement a shared register interface — the Pi reads/writes registers mapped to setpoint, gains, and telemetry. How much does latency improve compared to UART?
Deliverable
- [ ] Pico 2 W running PID control loop with UART telemetry
- [ ] Linux supervisor receiving and logging telemetry
- [ ] Three-way comparison table filled in (standard, RT, MCU)
- [ ] Stress test table showing MCU approach is unaffected by Pi load
- [ ] Brief note: when would you choose each approach?