Lecture 05: Engineering Practice
Obuda University -- Embedded Systems
Week 9 | Labs completed: State Machines, HW Abstraction
Your Learning Arc (Lectures)
- Lecture 1 (Week 1): What is Embedded?
- Lecture 2 (Week 3): Sensors, Signals & Actuators
- Lecture 3 (Week 5): Actuators & Control
- Lecture 4 (Week 7): Software Architecture
- Lecture 5 (Week 9): Engineering Practice -- you are here
- Lecture 6 (Week 11): Course Synthesis & Demo Prep
Learning Arc (Labs)
- Foundation (Weeks 1--2): What is Embedded?, GPIO, Sensors ✓
- Sensing & Signals (Weeks 3--4): ADC, I2C, PWM basics, Ultrasonic ✓
- Movement & Control (Weeks 5--6): Motor physics, P-control, IMU ✓
- Software (Weeks 7--8): State Machines, Abstraction ✓
- Engineering (Weeks 9--10): Software Engineering, Integration ← you are here
- Synthesis (Weeks 11--12): Course Synthesis, Demo
Recap: Labs 7--8
| Lab | What You Did | What You Discovered |
|---|---|---|
| Lab 7 | Implemented state machines (IDLE, FOLLOW, AVOID, SEARCH) | One variable replaces five booleans; transitions are explicit |
| Lab 8 | Opened picobot library source; traced abstraction layers; measured call overhead | Libraries are structured code with deliberate design decisions about what to expose, hide, and organize |
Core discovery: The library is just organized code. YOUR code needs the same organization.
The Open Question
You now understand how the picobot library is organized internally.
But what about YOUR code?
Your main.py has been growing week by week. Motor control, sensor reading, control logic, display updates, and configuration values -- all mixed together.
How do professional embedded engineers structure software that needs to grow and be maintained?
The Narrative
From working code to professional code to working systems.
Your code works → but it's messy → DRY/SRP clean it up → config.py centralizes constants → defensive coding handles failure → testing proves it works → but wait, everything worked alone, why does it break together? → integration challenges → timing budget → failure modes → watchdog → edge case testing.
The engineering practices that make the difference between a demo that works once and a product that works every time.
What You Already Know (From Other Courses)
| From... | You already know... | Today we use it for... |
|---|---|---|
| Programming | Functions, modules, DRY principle | Code organization, config.py, SRP |
| Programming | Debugging, testing, version control | Embedded testing strategies, git for firmware |
| Control Theory | System modeling, simulation before hardware | Test-driven approach: simulate → test → deploy |
| Electronics | Component tolerances, failure modes | Defensive coding, sensor validation, watchdogs |
| Lectures 1--4 | GPIO, ADC, PWM, P-control, state machines, abstraction | The building blocks you now need to organize professionally |
Today's Map
Part 1 -- Writing Better Code
1. DRY & SRP -- Code Organization
2. config.py & Defensive Coding
3. Testing
4. Version Control

Part 2 -- Making It All Work Together
5. The Integration Challenge
6. Failure Modes & Recovery
7. Real Failure Case Studies
8. Testing Strategies
9. Exercises
Part 1
DRY, SRP, and Code Organization
Why Software Engineering Matters in Embedded
In embedded systems, bad software architecture is not just ugly -- it's dangerous. A tangled codebase means you can't confidently change one thing without breaking another. In production, that means bugs that only appear in the field, recalls that cost millions, and safety incidents.
The software patterns we cover today -- DRY, SRP, separation of concerns -- are not academic rules. They're the minimum standard for shipping code that other engineers can read, test, and maintain. MISRA C (the automotive coding standard) defines well over a hundred guidelines. NASA's Power of 10 has strict rules about function length and global variables.
These exist because people died when software was "good enough."
DRY: Don't Repeat Yourself
Here's a pattern I see in every student project around Week 7. The same PWM clamping formula appears in three different functions. It works fine -- until you fix a bug in one copy and forget the other two. This is called a DRY violation, and it's the single most common source of "but I already fixed that" bugs in professional codebases.
The rule is simple: if you write the same logic twice, extract it into a function.
Bad -- same motor control logic copy-pasted:
# In line_follow():
pwm_left = max(0, min(65535, int(speed_l * 257)))
motor_left.duty_u16(pwm_left)
# In obstacle_avoid():
pwm_left = max(0, min(65535, int(speed_l * 257))) # Same code again!
motor_left.duty_u16(pwm_left)
# In turn_90():
pwm_left = max(0, min(65535, int(speed_l * 257))) # And again!
DRY: The Fix
Good -- one function, called from everywhere:
def set_motor_speed(left, right):
    """Set motor speeds with clamping and scaling."""
    pwm_l = max(0, min(65535, int(left * 257)))
    pwm_r = max(0, min(65535, int(right * 257)))
    motor_left.duty_u16(pwm_l)
    motor_right.duty_u16(pwm_r)
# Now everywhere just calls:
set_motor_speed(speed_l, speed_r)
Bug in clamping logic? Fix it in one place, not five. Change PWM scaling? Change it once. Every copy-paste is a future bug.
DRY Violation: A Real Robot Example
Spot the problem -- the same conversion logic appears in two separate places:
# DRY violation in a real robot project:
# File: main.py
if line_position < -0.5:
    left_motor.duty_u16(int(65535 * 0.3))
    right_motor.duty_u16(int(65535 * 0.8))
# ... 50 lines later ...
if obstacle_cleared:
    left_motor.duty_u16(int(65535 * 0.3))  # Same conversion!
    right_motor.duty_u16(int(65535 * 0.8))  # Copy-pasted!
What happens when you want to change 0.3 to 0.4?
You change it in one place and forget the other. Guaranteed.
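One possible refactor is to name each value once and wrap the conversion once. This is a sketch, not the course's reference solution: `TURN_SLOW`, `TURN_FAST`, `duty_from_fraction`, and `veer_right` are illustrative names, and the motor objects are assumed to come from your hardware setup.

```python
# Hypothetical refactor: name the values once, wrap the conversion once.
TURN_SLOW = 0.3  # inner-wheel duty fraction (illustrative value)
TURN_FAST = 0.8  # outer-wheel duty fraction (illustrative value)

def duty_from_fraction(fraction):
    """Convert a 0.0-1.0 duty fraction to a 16-bit PWM value."""
    return int(65535 * fraction)

def veer_right():
    # left_motor / right_motor assumed from hardware setup (not defined here)
    left_motor.duty_u16(duty_from_fraction(TURN_SLOW))
    right_motor.duty_u16(duty_from_fraction(TURN_FAST))
```

Now changing 0.3 to 0.4 is a one-line edit to `TURN_SLOW`, and every caller picks it up.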
SRP: Single Responsibility Principle
SRP says each module should have one reason to change. But here's where judgment matters: you can take this too far. I've seen codebases with 200 tiny files where you can't understand any single behavior without opening fifteen modules. The art is finding the right granularity. For your robot: sensor reading, motor control, control logic, and configuration are four natural boundaries. That's probably enough. Don't create a utils.py with one function in it.
Each module does one thing. Not "one thing and also this other thing because it was convenient."
| Module | Does | Does NOT |
|---|---|---|
| sensors.py | Read and normalize sensor values | Control motors |
| motors.py | Set motor speeds, handle deadband | Read sensors |
| controller.py | Compute corrections from error | Talk to hardware |
| main.py | Orchestrate the overall flow | Contain low-level I/O |
When the line sensor stops working, you know exactly where to look: sensors.py.
When motors behave strangely, you open motors.py.
Responsibility isolation = fast debugging.
The SRP Test
Can you describe what a module does in one sentence without using the word "and"?
- "sensors.py reads and normalizes sensor data" -- borderline OK (related tasks)
- "sensors.py reads sensors and controls motors" -- SRP violation
If you need "and" to describe a module, it has too many responsibilities.
SRP Violation: A Concrete Example
This function does TWO things -- and belongs in two different modules:
# SRP violation: sensors.py that also controls motors
def read_and_react(sensors, motors):  # Does TWO things!
    pos = sensors.read_line()
    if pos < 0:
        motors.turn_right()  # Sensor module controls motors!?
The fix: sensors.py reads data. main.py decides what to do with it.
# sensors.py -- only reads
def read_line_position(sensors):
    return sensors.read_line()

# main.py -- decides what to do
pos = read_line_position(sensors)
if pos < 0:
    motors.turn_right()
Project Structure for Embedded
As your code grows beyond a single file, use a consistent structure:
project/
+-- main.py # Main loop, state machine
+-- config.py # All tunable parameters
+-- lib/
| +-- sensors.py # Sensor reading module
| +-- motors.py # Motor control module
| +-- display.py # OLED display module
+-- data/ # Logged data, calibration
+-- README.md # What, why, how
Each file has a clear role. A new team member can understand the layout in seconds.
This is the structure you will implement in Lab 09.
The config.py Pattern and Defensive Coding
The Problem: Magic Numbers
Every embedded project accumulates constants scattered throughout the code with no explanation.
Bad -- magic numbers everywhere:
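A sketch of what this typically looks like (the function and its values are illustrative, not from any particular project):

```python
import time

def control_step(raw, error):
    # Magic numbers: why 30000? why 0.35? why 0.010?
    if raw > 30000:                # undocumented threshold
        correction = 0.35 * error  # undocumented gain
    else:
        correction = 0.0
    time.sleep(0.010)              # undocumented loop period
    return correction
```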
Six months from now, no one knows where these values came from.
The config.py Pattern
Good -- all constants in one place with documented reasoning:
# config.py -- All tunable parameters in one place
# === Timing (from Lab 05 measurements) ===
CONTROL_PERIOD_MS = 10 # 100 Hz, based on line crossing analysis
ULTRASONIC_PERIOD_MS = 100 # 10 Hz, limits blocking impact
# === Motor (from Lab 02 characterization) ===
MOTOR_DEADBAND = 80 # PWM below this = no movement
BASE_SPEED = 150 # Comfortable cruising speed
MAX_SPEED = 255 # Absolute maximum PWM value
# === Control (from Lab 06 tuning) ===
KP = 45 # Tuned 2024-01-15, see data/kp_tuning.csv
KD = 5 # Derivative gain, reduces overshoot
Each constant explains where the value came from.
Using config.py in Code

Clean import -- your main loop reads like a specification:
from config import CONTROL_PERIOD_MS, BASE_SPEED, KP
while True:
    if time.ticks_diff(now, last) >= CONTROL_PERIOD_MS:
        correction = KP * error
        set_motor_speed(
            BASE_SPEED + correction,
            BASE_SPEED - correction
        )
- Want to change the speed? Edit config.py.
- Want to retune PID? Edit config.py.
- Want to see all tunable values at once? Open config.py.
Benefits of Centralized Configuration
Why config.py is worth the effort:
- Change in ONE place -- update a pin number, speed, or threshold once, not across 5 files
- Tune without searching -- all Kp, deadband, and timing values in one spot
- New team member reads config.py -- understands the entire setup in minutes
- Version control shows parameter history -- a diff on config.py tells you exactly what was tuned and when
# config.py -- single source of truth
PIN_MOTOR_LEFT_PWM = 10
PIN_MOTOR_LEFT_DIR = 12
KP = 0.35
BASE_SPEED = 60
LOOP_PERIOD_MS = 10
If a value might change, it belongs in config.py. If it never changes, it still belongs there for documentation.
Defensive Coding: Validate Sensor Readings
Embedded systems do not have a user to click "OK" on an error dialog. Your code must handle problems gracefully.
Sanity-check every sensor read:
def read_distance():
    """Read ultrasonic distance with sanity check."""
    raw = ultrasonic.distance_cm()
    if raw is None or raw < 0 or raw > 400:
        return None  # Invalid reading
    return raw
Never trust raw sensor data. The ultrasonic can return None on timeout. The ADC can spike on noise. The IMU can report garbage during impact.
Defensive Coding: Last-Known-Good Pattern
When a sensor fails, use the last valid reading instead of crashing:
last_good_distance = 100  # Safe default

distance = read_distance()
if distance is not None:
    last_good_distance = distance
else:
    distance = last_good_distance  # Stale but sane
The robot keeps running with stale-but-safe data rather than crashing or passing None to motor control.
Defensive Coding: Clamp Outputs
Never send invalid values to hardware:
def safe_motor_speed(speed):
    """Clamp to valid range, handle edge cases."""
    if speed is None:
        return 0  # Safe default
    return max(0, min(255, int(speed)))
Three lines of defense:
1. Validate inputs -- catch bad sensor data early
2. Use fallbacks -- last-known-good when sensors fail
3. Clamp outputs -- never exceed hardware limits
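Composed together, the three defenses might look like this sketch. `safe_control_step` and its parameters are illustrative names; the 0-400 validity range is borrowed from the ultrasonic example above.

```python
def safe_control_step(read_sensor, last_good, limit=255):
    """Validate -> fallback -> clamp, in one pass.

    read_sensor: callable returning a reading or None
    last_good:   previous valid reading (the fallback)
    Returns (clamped_output, new_last_good).
    """
    reading = read_sensor()                    # 1. validate input
    if reading is None or not (0 <= reading <= 400):
        reading = last_good                    # 2. fallback to last-known-good
    output = max(0, min(limit, int(reading)))  # 3. clamp the output
    return output, reading
```

Passing the sensor as a callable keeps the function testable on a host PC with a lambda standing in for hardware.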
Defensive Coding: Safe Sensor Reading with Caching
Cache last valid values so you never feed garbage to your controller:
last_valid = [0] * 5  # cache last good values

def safe_read_sensors():
    raw = read_line_sensors()
    for i in range(5):
        if 0 < raw[i] < 4095:  # valid range
            last_valid[i] = raw[i]
        # else: keep last valid value
    return last_valid
If one sensor returns garbage from electrical noise, the cached value keeps the robot on track.
Defensive Coding: Line-Following Recovery
When the line is lost, do not stop -- use the last known direction:
def follow_line():
    global last_error  # module-level state, persists between calls
    error = get_line_error()
    if error is None:  # lost the line!
        # Don't stop -- use last known direction
        if last_error > 0:
            turn_right(recovery_speed)
        else:
            turn_left(recovery_speed)
    else:
        last_error = error
        apply_correction(error)
Never trust raw input blindly. Always have a fallback. Whether it is a cached sensor value, a recovery maneuver, or a safe default -- your code must have a plan for when reality does not match expectations.
Code Comments: Explain WHY, Not WHAT
The code already tells the reader WHAT it does. Comments explain WHY.
| Bad Comment | Good Comment |
|---|---|
| `PERIOD = 10  # Set period to 10` | `PERIOD_MS = 10  # 100 Hz -- must be >10x line crossing time` |
| `speed = 0  # Set speed to zero` | `speed = 0  # Safe state: stop if sensor returns None` |
| `x = x + 1  # Increment x` | `retry_count += 1  # Retry up to 3x before declaring sensor dead` |
If you need a comment to explain WHAT a line does, consider renaming the variable instead.
Testing
Test Levels for Embedded
In industry, embedded testing is a whole discipline. Unit tests run on the host PC (no hardware needed). Hardware-in-the-loop (HIL) tests run real code against simulated sensors. System tests run on the actual product. For a car ECU, there are often more lines of test code than product code.
Every hour of testing saves ten hours of debugging. Three natural test levels:
| Level | What You Test | Example | How |
|---|---|---|---|
| Unit | Single function in isolation | test_sensor_normalize() | Call function, check output |
| Integration | Components working together | Sensor + Controller + Motor | Run subsystem, verify behavior |
| System | Full mission on real hardware | Complete lap on the track | Run the robot, measure performance |
Most tests should be unit tests -- fast, isolated, and they tell you exactly what broke.
The Testing Pyramid
Most of your tests should be at the bottom -- fast and specific:
/\
/ \
/ Sys \ System: "Robot completes 10 laps"
/ tem \
/----------\
/ Integration\ Integration: "Sensors + Motors
/ \ work without timing conflicts"
/----------------\
/ Unit Tests \ Unit: "normalize() maps
/ \ 5000 -> 0, 50000 -> 100"
/______________________\
Unit tests are fast, isolated, and tell you exactly what broke. System tests tell you if it all works together -- but not where it fails.
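The "Unit" example at the base of the pyramid might look like this. The `normalize()` function here is a hypothetical sketch, assuming a sensor whose raw counts span 5000-50000 and should map onto a 0-100 scale:

```python
def normalize(raw, lo=5000, hi=50000):
    """Map raw ADC counts in [lo, hi] onto 0-100, clamped at both ends."""
    pct = (raw - lo) * 100 / (hi - lo)
    return max(0, min(100, pct))

def test_normalize():
    assert normalize(5000) == 0      # bottom of range
    assert normalize(50000) == 100   # top of range
    assert normalize(60000) == 100   # clamped above
    assert normalize(0) == 0         # clamped below
    print("normalize: ALL PASSED")

test_normalize()
```

Because `normalize()` touches no hardware, this test runs on your laptop in milliseconds.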
A Simple Test Script Pattern
You do not need a testing framework. A simple script is enough:
# test_motors.py -- Run on the Pico to verify motor behavior
from motors import set_motor_speed
from config import MOTOR_DEADBAND
import time

def test_motor_deadband():
    """Verify motor starts moving at expected PWM."""
    print("=== Motor Deadband Test ===")
    for pwm in range(0, 100, 5):
        set_motor_speed(pwm, pwm)
        time.sleep(0.5)
        moving = is_robot_moving()  # Check via IMU
        print(f"  PWM={pwm:3d}: {'MOVING' if moving else 'stopped'}")
    set_motor_speed(0, 0)
    print("=== Done ===\n")
Test scripts live in your project. They are not throwaway code.
Boundary and Edge Case Testing
The most valuable tests check boundaries -- where things go wrong:
def test_safe_motor_speed():
    """Verify clamping at boundaries."""
    assert safe_motor_speed(0) == 0        # Minimum
    assert safe_motor_speed(255) == 255    # Maximum
    assert safe_motor_speed(-10) == 0      # Below minimum
    assert safe_motor_speed(300) == 255    # Above maximum
    assert safe_motor_speed(None) == 0     # Invalid input
    assert safe_motor_speed(3.7) == 3      # Float truncation
    print("safe_motor_speed: ALL PASSED")
If it passes with normal inputs but fails at the edges, the bug is just waiting for a bad sensor read.
Sample Size Matters
How many trials do you need when testing your robot?
| Trials | What You Have |
|---|---|
| 1 | An anecdote -- proves nothing |
| 3 | A maybe -- could be luck |
| 5+ | Data you can start to trust |
| 10+ | Solid statistics with confidence |
Why? The uncertainty of your estimated mean shrinks as you add samples -- the standard error falls with the square root of the sample count.
The more you run, the closer your measured mean gets to the true performance.
"It worked once" is not evidence. "It worked 10 times with these measurements" is.
Visualizing Your Data
Good presentation vs bad presentation:
BAD: "Kp = 0.5 went faster."
GOOD:
Kp = 0.3: |||||||||||||||||||||||||||||| 28.6s (std = 0.6)
Kp = 0.5: ||||||||||||||||||||||||||||| 27.3s (std = 2.1)
The bar chart tells you:
- Kp = 0.5 is slightly faster on average
- Kp = 0.5 has 3.5x more variation
Numbers without context mislead. Always show spread, not just averages.
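Bars like the ones above take only a few lines to generate; `bar_line` is an illustrative helper, not part of any course library:

```python
def bar_line(label, mean, std, scale=1.0):
    """Render one ASCII bar: length proportional to the mean, spread annotated."""
    bar = "|" * round(mean * scale)
    return f"{label}: {bar} {mean:.1f}s (std = {std:.1f})"

print(bar_line("Kp = 0.3", 28.6, 0.6))
print(bar_line("Kp = 0.5", 27.3, 2.1))
```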
Version Control
Git for Embedded Projects
Version control is not optional for engineering work.
| Practice | Why |
|---|---|
| Commit messages reference measurements | "Tune Kp to 45 based on step response" |
| Tag working configurations | git tag v1.0-line-follow-working |
| Never commit credentials or secrets | WiFi passwords, API keys stay out |
| Commit config.py changes separately | Easy to revert parameter changes |
Git Workflow for Embedded Projects
Hardware state cannot be rolled back -- but your software state can.
- Commit working states -- after each successful test, commit. "Working line follow at Kp=0.35" is a valuable checkpoint.
- Branch for experiments -- try a new control algorithm on a branch. If it fails, return to the working state.
- Never commit broken code to main -- main should always be a working robot.
- Commit messages that help future you -- "Increased Kp from 0.3 to 0.5, oscillates at speed > 70" is useful. "Fixed stuff" is not.
Think of Git as your engineering notebook for code. Every commit is a dated entry. Main is your "known good." Branches are experiments. Tags are milestones.
Good vs Bad Commit Messages
Good -- specific, references data, explains WHY:
Tune Kp to 45 based on step response (see data/kp.csv)
Refactor motor control into motors.py (SRP)
Fix off-by-one in sensor normalization (white=0, not 1)
Bad -- vague, no context, useless to future-you:
fixed stuff
update
final version 2
A vague commit message is a message to your future self that says "figure it out yourself."
The Basic Workflow
Workflow loop: Edit → Test → git add → git commit → Push → repeat. If tests fail, loop back to Edit immediately.
- Make a change
- Run your test script on the Pico
- If it passes, stage and commit with a meaningful message
- Push when you have a stable version
Do not commit broken code. Do not push untested code.
What Belongs in Version Control
| Include | Exclude |
|---|---|
| Include | Exclude |
|---|---|
| Source code (.py) | Large data files (.csv > 1 MB) |
| Configuration (config.py) | Credentials, passwords, API keys |
| Test scripts | Compiled binaries (.mpy, .uf2) |
| Documentation (README.md) | IDE-specific settings |
Use .gitignore to prevent accidental commits:
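A minimal sketch of such a .gitignore, with illustrative entries matching the project layout above:

```
# .gitignore -- illustrative entries
data/*.csv      # large logged data stays local
secrets.py      # WiFi credentials, API keys
*.mpy           # compiled bytecode
*.uf2           # firmware images
.vscode/        # IDE-specific settings
```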
Part 1 Summary
Five Principles for Better Code
1. DRY -- Don't Repeat Yourself. Every duplicated block is a future bug. Extract into a function, call from everywhere.
2. SRP -- Single Responsibility. Each module does one thing. Describe it in one sentence without "and."
3. config.py centralizes constants with documented reasoning -- no more magic numbers scattered across files.
4. Defensive coding assumes sensors will fail -- validate inputs, use last-known-good fallbacks, clamp outputs.
5. Test and commit. Unit tests catch bugs before the robot hits a wall. Git commit messages are documentation for your future self.
Part 2
Making It All Work Together
The Transition
Each piece works alone. Sensors read correctly. Motors respond to commands. State machines switch cleanly. Modules are organized.
But put them together -- timing conflicts, resource contention, unexpected interactions.
The challenge is not the parts. It is the interactions between parts.
The Integration Challenge
Components Work Alone. Systems Fail Together.

You have tested each subsystem individually. Sensors read correctly. Motors respond. State machine works. Display shows data.
Now connect them all -- and watch things break.
Individual Testing: Integration Testing:
[ok] Sensors work ? Sensors + Motors together
[ok] Motors work ? Timing conflicts
[ok] Display works ? Resource contention
[ok] State machine works ? Edge cases between subsystems
This is not a failure of your code. This is the nature of integration.
The Integration Iceberg
Ask any engineer what the hardest part of a project is, and they'll say integration. Not because the pieces are hard -- but because the interactions between pieces are unpredictable. This is true at every scale: chip, board, system, system-of-systems. The skill you're practicing this week is the most valuable and hardest-to-teach skill in engineering.
When you demo your robot, you show the happy path.
Iceberg: above the waterline -- the happy path you demo. Below -- timing conflicts, resource contention, sensor noise, edge cases, race conditions, power issues, thermal drift, mechanical wear. The hidden mass is 4x larger.
Everything below the waterline is invisible until you systematically test under stress.
The Symptom vs The Cause
One visible symptom, multiple hidden causes:
What you see: "The robot oscillates on curves"
─────────────────────────────────
What's actually | Sensor readings are noisy |
wrong: | Motor response is nonlinear |
| Loop timing varies 8-15 ms |
| Battery voltage dropped 0.3V |
| One wheel has more friction |
The symptom (oscillation) is visible. The causes are hidden beneath the surface.
Integration debugging = looking below the waterline -- measuring timing, logging sensors, checking power levels.
Timing Conflicts
The ultrasonic sensor blocks for up to 25 ms. During that time, the line-following control loop does not run. The robot drifts off the line.
# This "works" in isolation but kills line-following
def check_obstacle():
    distance = read_ultrasonic()  # Blocks 0-25 ms!
    return distance < 15
The line-follower was not broken -- it was starved of CPU time.
What if the OLED update (10-50 ms) overlaps with a sensor read?
Resource Contention
Multiple devices sharing a bus:
Bus diagram: OLED, IMU, and Sensor all connected to the same I2C SDA/SCL bus. Simultaneous access = contention.
If you update the OLED display while the IMU is being read, one gets corrupted data -- or the I2C bus hangs entirely.
Shared resources need coordination. Schedule access so only one device uses the bus at a time.
Unexpected Physical Interactions
These are not software bugs. They are system-level interactions:
- Motor PWM creates electrical noise that corrupts ADC readings
- Vibration from motors causes gyro readings to spike
- IR from the sun overwhelms the IR line sensor
- Battery voltage drops under motor load, affecting sensor thresholds
They only appear when everything runs together.
When something breaks during integration, ask: "What interaction caused this?" The bug is usually between modules, not inside them.
Integration Testing Methodology
When combining subsystems, do NOT add everything at once. Follow this process:
Step 1: Test each component alone (unit test)
[Sensors] ok [Motors] ok [Display] ok [IMU] ok
Step 2: Add components ONE at a time
[Sensors + Motors] --> test
Step 3: After each addition, run ALL previous tests
[Sensors + Motors + Display] --> test sensors, test motors, test display
Step 4: If something breaks, the LAST addition caused it
[Sensors + Motors + Display + IMU] --> Motors jitter?
--> IMU is the suspect. Investigate the interaction.
Incremental integration gives you a bisection strategy for free.
Why Incremental Integration Works
Adding all subsystems at once:
Everything added at once --> 5 things broke
Which component caused which problem? Unknown.
Debugging time: hours
Adding subsystems one at a time:
+ Sensors --> OK
+ Motors --> OK
+ Display --> Motors jitter (FOUND IT)
+ IMU --> OK
+ Ultrasonic --> OK
Debugging time: minutes
Each step isolates exactly one variable. This is the scientific method applied to engineering.
Detailed Timing Budget Analysis

Control loop target: 10 ms (100 Hz)
| Operation | Time (us) | Cumulative | % Budget |
|---|---|---|---|
| Line sensors (x4) | 20 | 20 | 0.2% |
| IMU read (I2C) | 800 | 820 | 8.2% |
| PID calculation | 50 | 870 | 8.7% |
| Motor update | 100 | 970 | 9.7% |
| State machine | 30 | 1000 | 10.0% |
| Ultrasonic (NB) | 50 | 1050 | 10.5% |
| OLED (1 in 50) | 600 | 1650 | 16.5% |
| MARGIN | 8350 | 10000 | 83.5% |
A healthy system uses less than 50% of its timing budget. Margin absorbs worst-case spikes, garbage collection pauses, and future features.
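The budget table can be checked mechanically. This sketch reuses the per-operation costs from the table above (illustrative names for the dictionary keys):

```python
LOOP_BUDGET_US = 10_000  # 10 ms loop at 100 Hz

# Worst-case per-loop costs in microseconds, from the table above
costs_us = {
    "line_sensors": 20,
    "imu_read": 800,
    "pid": 50,
    "motor_update": 100,
    "state_machine": 30,
    "ultrasonic_nb": 50,
    "oled_amortized": 600,
}

used = sum(costs_us.values())
margin = LOOP_BUDGET_US - used
print(f"used {used} us ({100 * used / LOOP_BUDGET_US:.1f}%), margin {margin} us")
assert used < LOOP_BUDGET_US * 0.5, "healthy systems keep >50% margin"
```

Re-run the check whenever you add a feature; a new subsystem that blows the 50% rule is a design problem, not a tuning problem.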
What Eats Your Timing Budget
Not all operations cost the same. I/O dominates:
Fast (< 100 us):
+-- GPIO read (line sensors) ~1 us each
+-- PID calculation ~50 us
+-- State machine logic ~30 us
+-- GPIO write (motor PWM) ~10 us
Slow (100-1000 us):
+-- I2C IMU read ~800 us
+-- I2C OLED partial update ~600 us
+-- Non-blocking ultrasonic ~50 us (just check, no wait)
Dangerous (> 1000 us):
+-- Blocking ultrasonic read ~25,000 us !!
+-- Full OLED screen refresh ~45,000 us !!
+-- print() over USB serial ~10,000 us !!
If it talks to a bus or a wire, measure it. Assumptions about timing are usually wrong.
Measuring Your Timing Budget
One of the most important integration tests: does everything fit?
LOOP_PERIOD_US = 10_000  # 10 ms budget (100 Hz); could also live in config.py

while True:
    t_start = time.ticks_us()
    read_line_sensors()
    read_ultrasonic_nb()
    compute_pid()
    update_motors()
    check_state_transitions()
    update_display_if_due()
    t_total = time.ticks_diff(time.ticks_us(), t_start)
    if t_total > LOOP_PERIOD_US:
        print(f"WARNING: loop {t_total} us (budget: {LOOP_PERIOD_US})")
If your loop takes 15 ms but your PID assumes 10 ms, your gains are effectively wrong. The robot oscillates, and you waste time retuning when the real problem is a timing violation.
Failure Modes and Recovery
Common Failure Modes
A robust system is not one that never fails -- it is one that fails gracefully.
| Failure | Cause | Detection | Recovery |
|---|---|---|---|
| Line lost | Sharp turn, noise, gap | No sensors see line | Slow down, search pattern |
| Collision | Sensor fail, too close | IMU impact spike | Stop, reverse, reassess |
| Stuck | Obstacle, wheel trapped | No progress despite motors on | Turn, try alternate path |
| Drift | Motor mismatch, wheel slip | Gyro heading diverges | Gyro-assisted correction |
| Battery low | Normal use over time | ADC voltage drops | Warn user, reduce speed |
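One of these detections, "no progress despite motors on", can be sketched as a pure check. Everything here is illustrative: the function name, the thresholds, and the assumption that you can estimate heading change and travel from the gyro and wheel data over a short window.

```python
def is_stuck(commanded_speed, heading_change_deg, distance_delta_cm,
             min_speed=30, min_motion=0.5):
    """Detect 'motors on but nothing happens' over one check window.

    commanded_speed:    current motor command (PWM-ish units)
    heading_change_deg: gyro heading change over the window
    distance_delta_cm:  estimated travel over the same window
    """
    driving = abs(commanded_speed) >= min_speed
    moving = (abs(heading_change_deg) >= min_motion
              or abs(distance_delta_cm) >= min_motion)
    return driving and not moving
```

Keeping the check pure (no hardware access) means it is unit-testable on the host, in the spirit of Part 1.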
Defensive Coding Recap
Remember the defensive patterns from Part 1 -- validate, fallback, clamp -- they become critical during integration. Every sensor read should be validated. Every output should be clamped. Every failure should have a safe fallback.
During integration, these patterns catch the interaction bugs that never appear in unit testing.
The Watchdog Concept
Every commercial embedded product ships with a watchdog. Not because the engineers expect their code to crash, but because they know it will. Cosmic rays flip bits. Power glitches corrupt RAM. Timing races cause deadlocks. The watchdog is your last line of defense.
What if your main loop itself hangs? A try/except cannot catch an infinite loop or a deadlocked I2C bus.
from machine import WDT

# Enable watchdog with 5-second timeout
wdt = WDT(timeout=5000)

while True:
    read_sensors()
    update_motors()
    check_state()
    wdt.feed()  # "I'm still alive"
    # If this loop hangs for > 5 seconds,
    # hardware automatically resets the Pico
A hung robot that resets and recovers is far better than one that stays frozen.
Watchdog Timer: Key Points
- Enable the watchdog AFTER initialization -- startup may take longer than the timeout
- Feed it in the main loop -- if the loop stalls, the system resets automatically
- Standard practice in production -- every shipping embedded product uses a watchdog
- Better to reset and restart than stuck in unknown state -- a reset takes milliseconds; a stuck robot stays stuck forever
Normal operation: Hung system:
Loop runs Loop stuck
wdt.feed() (no feed)
Loop runs (no feed)
wdt.feed() TIMEOUT --> RESET
Loop runs System reboots and recovers
A try/except cannot catch an infinite loop or a deadlocked I2C bus. The watchdog is your last line of defense.
Real Failure Case Studies
Case 1: Motor PWM Noise Corrupts ADC
Symptom: Line sensor readings jump randomly when motors run at high speed. Robot follows the line when motors are slow but loses it at full speed.
Root cause: Motor PWM switching creates electromagnetic noise on the power rail. The ADC picks up this noise as false readings.
Mitigations:
Hardware fix:
+-- Add 100 nF capacitor across motor terminals
+-- Route sensor wires away from motor wires
Software fix:
+-- Read ADC multiple times and average
+-- Read ADC during PWM "off" phase (brief motor silence)
+-- Reject readings that change by > 30% in one cycle
When software filtering is not enough, the fix is in hardware. Integration problems often cross the hardware/software boundary.
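Two of the software mitigations, averaging and spike rejection, can be combined into one pure filter. This is a sketch with an illustrative name and thresholds, not the course's reference filter:

```python
def filtered_reading(samples, previous, max_jump=0.30):
    """Average several back-to-back ADC samples; reject >30% jumps.

    samples:  list of raw ADC reads taken in quick succession
    previous: last accepted reading (used as fallback on rejection)
    """
    avg = sum(samples) / len(samples)
    if previous and abs(avg - previous) / previous > max_jump:
        return previous          # reject the spike, keep last good value
    return avg
```

Averaging suppresses random noise; the jump check catches the occasional large PWM-induced spike that averaging alone would let through.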
Case 2: I2C Bus Hangs
Symptom: Robot freezes after 2-5 minutes of operation. The OLED display shows the last frame. Motors stay at their last speed. Only a power cycle recovers.
Root cause: OLED and IMU share the I2C bus. Occasionally, their access overlaps, and the bus enters an invalid state. The SDA line is held low, blocking all further communication.
Mitigations:
# Schedule access -- never overlap
if loop_count % 50 == 0:
    update_oled()   # Only every 50th loop
else:
    read_imu()      # All other loops

# Add timeout + bus recovery
try:
    data = i2c.readfrom(IMU_ADDR, 6)
except OSError:
    recover_i2c_bus()  # Toggle SCL to release SDA
Case 3: Works in Lab, Fails in Competition
Symptom: Robot follows the line perfectly on the lab bench. At the competition, it loses the line within seconds.
Root cause: Not one thing -- multiple environment differences:
Lab conditions: Competition conditions:
+-- Fluorescent lighting +-- Stage lighting (different IR)
+-- White paper track +-- Glossy poster track (reflections)
+-- Full battery +-- Battery at 70% after practice runs
+-- Room temperature 22C +-- Venue temperature 28C (sensor drift)
+-- Flat table +-- Slightly uneven surface
Mitigation: Test under varied conditions BEFORE the event.
- Different lighting (cover windows, use flashlight)
- Different surfaces (tape on cardboard, on wood, on plastic)
- Low battery (run until it fails, note the voltage)
Testing Strategies
Define Success Before You Test
The most common testing mistake: running the robot and declaring "it works" based on a feeling.
| Requirement | Criterion | How to Measure |
|---|---|---|
| Line following | < 5 line losses per lap | Count losses (sensor log) |
| Obstacle detection | Stop within 10 cm | Measure with ruler |
| Lap time | < 30 seconds | Stopwatch or timestamp log |
| Consistency | Std dev < 10% of mean | Multiple timed runs |
| Battery resilience | Works at 3.3V supply | Test with drained battery |
Write down what "pass" means before you power on the robot.
Edge Case Testing
The happy path is not enough. Plan for the unhappy paths:
test_cases = [
    ("Line lost briefly", "recover within 2 seconds"),
    ("Line lost permanently", "stop after timeout"),
    ("Obstacle appears suddenly", "stop within 10 cm"),
    ("Button pressed during turn", "stop immediately"),
    ("Low battery (3.3V)", "reduce speed, warn user"),
    ("Sensor returns garbage", "use last known good value"),
    ("I2C bus hangs", "timeout and recover"),
    ("Two failures at once", "enter safe state"),
]
How do you test "line lost"? Remove the tape. Pick up the robot. Cover the sensors. These are your edge case tools.
Edge Case Test Matrix
Beyond the happy path -- systematic boundary conditions that reveal hidden problems:
| Condition | What to Test | Why It Matters |
|---|---|---|
| Low battery | Run at 6.5V instead of 7.4V | Motor response changes, sensors affected |
| Sharp turn | 90-degree corner at full speed | Tests sensor-to-motor latency |
| Line gap | 5 cm break in the line | Tests recovery behavior |
| Surface change | Matte to glossy surface | Sensor calibration may be wrong |
| Cold start | Run immediately after power-on | Sensors may need warm-up time |
| Long run | 5+ minutes continuously | Memory leaks, battery drain, thermal effects |
| Interference | Another robot nearby | IR/light interference |
Pick three rows most relevant to YOUR robot. Test them explicitly. Document the results.
The Integration Checklist
Sensors:
- All line sensors read correctly (verify with test script)
- Ultrasonic returns valid distances (2-400 cm range)
- IMU readings are stable (no drift spikes at startup)
Control:
- State machine handles all transitions (test each one)
- No state has a "dead end" (every state can transition somewhere)
- Timing budget fits within loop period (measure with ticks_ms)
Edge Cases and Reliability:
- Line lost and recovered
- Obstacle detected and avoided
- Button stop works from any state
- Sensor failure handled
- 10 consecutive runs without failure
Worked Example: Diagnosing an Integration Bug
Symptom: "Motors jitter every 500 ms"
Step 1 -- Isolate: Comment out subsystems one at a time.
Result: the jitter disappears when the OLED update is commented out. The OLED is the suspect.
Worked Example (continued)
Step 2 -- Measure:
import time

t0 = time.ticks_us()
update_display()
t1 = time.ticks_us()
print(f"Display took {time.ticks_diff(t1, t0)} us")
Result: OLED update takes 45 ms. Control loop runs at 20 ms. Display blocks the loop for over two full cycles.
Step 3 -- Fix: Reduce the display update rate so the OLED no longer refreshes every loop.
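One way to reduce the rate is a simple loop counter. A sketch, assuming a 20 ms control loop; `maybe_update_display` and the stand-in callback are illustrative names:

```python
# Refresh the OLED only every Nth control-loop iteration.
DISPLAY_EVERY_N = 25   # 25 x 20 ms loop = one refresh every 500 ms

loop_count = 0

def maybe_update_display(update_fn):
    """Call update_fn once every DISPLAY_EVERY_N invocations."""
    global loop_count
    loop_count += 1
    if loop_count % DISPLAY_EVERY_N == 0:
        update_fn()

# Demonstration with a stand-in for the real display update:
calls = []
for _ in range(100):
    maybe_update_display(lambda: calls.append(1))
print(len(calls))  # 4 refreshes in 100 loop iterations
```

The 45 ms cost still happens, but only once per 25 loops, so the control loop recovers between refreshes instead of blocking every cycle.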
Step 4 -- Verify: Measure loop time again. Run 10 laps. Count jitter events -- should be zero.
Exercises
Exercise 1: Code Review -- Find the Problems
Here is a deliberately bad robot controller. Your task: find every DRY, SRP, and defensive coding violation.
# main.py - Robot controller (DELIBERATELY BAD)
from machine import Pin, PWM, ADC, I2C
import time
# Read left sensor and decide
left = ADC(Pin(26)).read_u16()
if left > 30000:
PWM(Pin(10)).duty_u16(int(65535 * 0.8))
PWM(Pin(11)).duty_u16(int(65535 * 0.3))
# Read right sensor and decide
right = ADC(Pin(27)).read_u16()
if right > 30000:
PWM(Pin(10)).duty_u16(int(65535 * 0.3))
PWM(Pin(11)).duty_u16(int(65535 * 0.8))
# Display on OLED
i2c = I2C(1, scl=Pin(15), sda=Pin(14))
# ... display code mixed with control logic ...
# Log data
print(f"L={left},R={right}") # In the control loop!
Code Review: Problem 1 -- Magic Numbers
How many unexplained constants can you count?
left = ADC(Pin(26)).read_u16() # Why pin 26?
if left > 30000: # Why 30000?
PWM(Pin(10)).duty_u16(int(65535 * 0.8)) # Why 0.8?
PWM(Pin(11)).duty_u16(int(65535 * 0.3)) # Why 0.3?
Answer: At least nine magic numbers: six pin numbers (26, 27, 10, 11, 15, 14), the threshold (30000), and two speed values (0.8, 0.3). The PWM full-scale value (65535) arguably makes ten.
Every one of these should be in config.py with a comment explaining why.
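What that config.py might look like, as a sketch -- the values are the ones from the bad example above, and the names and comments are illustrative:

```python
# config.py -- every tunable constant in one place, with its reasoning.

# --- Pins (match your wiring diagram) ---
LEFT_SENSOR_PIN = 26     # ADC0: left line sensor
RIGHT_SENSOR_PIN = 27    # ADC1: right line sensor
LEFT_MOTOR_PIN = 10      # PWM: left motor driver input
RIGHT_MOTOR_PIN = 11     # PWM: right motor driver input
I2C_SCL_PIN = 15         # OLED I2C clock
I2C_SDA_PIN = 14         # OLED I2C data

# --- Thresholds and speeds (measured, then documented) ---
LINE_THRESHOLD = 30000   # ADC counts; midpoint between tape and floor readings
TURN_SPEED_FAST = 0.8    # outer wheel speed during a correction turn
TURN_SPEED_SLOW = 0.3    # inner wheel speed during a correction turn
PWM_MAX = 65535          # duty_u16 full scale on the Pico
```

Now "why 30000?" has an answer written next to the number, and retuning means editing one file.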
Code Review: Problem 2 -- DRY Violations
The PWM conversion int(65535 * speed) appears four times:
PWM(Pin(10)).duty_u16(int(65535 * 0.8)) # Line 7
PWM(Pin(11)).duty_u16(int(65535 * 0.3)) # Line 8
PWM(Pin(10)).duty_u16(int(65535 * 0.3)) # Line 12
PWM(Pin(11)).duty_u16(int(65535 * 0.8)) # Line 13
If the conversion logic changes (say, you need to add clamping), you must change it in four places.
Fix: Extract a set_motor_speed(left, right) function.
Code Review: Problem 3 -- SRP Violations
main.py does everything:
- Reads sensors (ADC)
- Controls motors (PWM)
- Manages display (I2C, OLED)
- Logs data (print)
- Contains control logic (if/else)
That is at least 5 responsibilities in one file.
Can you describe it in one sentence without "and"? No.
Code Review: Problem 4 -- Missing Defensive Coding
left = ADC(Pin(26)).read_u16()
if left > 30000:
# What if left is garbage? No validation!
# What if ADC returns 65535 due to noise? No check!
PWM(Pin(10)).duty_u16(int(65535 * 0.8))
# What if PWM value exceeds range? No clamping!
Missing:
- No sensor validation (what if the ADC returns garbage?)
- No output clamping (what if a calculation overflows?)
- No try/except (if the code crashes, the motors keep running)
Code Review: Problem 5 -- Performance Issue
print() over serial can take 10-30 ms.
If your control loop runs at 100 Hz (10 ms period), a single print() can double the loop time.
Fix: Log only every Nth iteration, or use a non-blocking approach.
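The every-Nth-iteration fix, as a sketch. The list stands in for `print()` so the logic runs anywhere; `log_sensors` and `LOG_EVERY_N` are illustrative names:

```python
# Log every Nth iteration instead of every loop. At 100 Hz with
# LOG_EVERY_N = 50, serial output happens twice per second, not 100 times.
LOG_EVERY_N = 50
_iteration = 0
log_lines = []   # stand-in for print(); swap back on hardware

def log_sensors(left, right):
    global _iteration
    _iteration += 1
    if _iteration % LOG_EVERY_N == 0:
        log_lines.append(f"L={left},R={right}")

# Demonstration: 200 loop iterations produce only 4 log lines.
for i in range(200):
    log_sensors(i, i)
print(len(log_lines))  # 4
```

The worst-case cost of logging drops from "every loop" to a bounded, scheduled expense you can put in the timing budget.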
Exercise 2: Failure Mode Analysis (FMEA)
Your robot must perform ALL of the following simultaneously:
Six subsystems in a 2x3 grid: line following, obstacle detection, display status, data logging, stop button, and motor control -- all sharing one CPU, one I2C bus, one power supply, and one 10 ms control loop.
FMEA: Fill the Failure Mode Table
For each row, identify the cause, how you would detect it, and how the robot should respond.
| # | Failure Mode | Cause | Detection | Mitigation |
|---|---|---|---|---|
| 1 | Line lost on sharp curve | |||
| 2 | Ultrasonic timeout | |||
| 3 | | | | |
| 4 | | | | |
| 5 | | | | |
Rows 3-5 are yours to define. Think about: I2C conflicts, battery drop, flash write blocking, button debounce, motor noise on sensors.
FMEA: Example Answers
Compare your answers:
| # | Failure Mode | Cause | Detection | Mitigation |
|---|---|---|---|---|
| 1 | Line lost on sharp curve | Sensors overshoot the line at speed | All 4 sensors read "no line" | Slow down, arc back toward last known direction |
| 2 | Ultrasonic timeout | Sensor aimed at soft surface (absorbs sound) | No echo within 30 ms | Return last-known-good, set flag |
| 3 | OLED + IMU I2C conflict | Both accessed in same loop iteration | I2C OSError exception | Schedule OLED to non-critical loop iterations |
| 4 | Flash write blocks loop | Writing data log stalls CPU for 5-20 ms | Loop time exceeds budget | Buffer writes, flush during idle periods only |
| 5 | Battery sag under motor load | High current draw drops voltage | ADC voltage < 3.3V threshold | Reduce motor speed, warn on display |
Exercise 3: Timing Budget Check
Given the scenario, estimate whether everything fits in 10 ms:
Operation Estimated Time
------------------------------------------
Line sensors (x4) 20 us
Ultrasonic (non-blocking) 50 us
PID calculation 50 us
Motor update 100 us
State machine 30 us
Data log (buffered) 20 us
Button check 5 us
OLED update (1 in 50) 600 us
------------------------------------------
Total (worst case): 875 us
Budget: 10,000 us
Margin: 9,125 us (91%)
Does this look safe? Yes -- but only if the OLED does not run every loop and the ultrasonic is non-blocking.
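The budget check above is just arithmetic, so it can live in a script. A sketch using the estimates from the table (operation names are illustrative labels):

```python
# Timing budget check: do all per-loop operations fit in the loop period?
budget_us = 10_000   # 10 ms control loop
operations = {       # worst-case estimates from the table, in microseconds
    "line_sensors_x4": 20,
    "ultrasonic_nonblocking": 50,
    "pid": 50,
    "motor_update": 100,
    "state_machine": 30,
    "data_log_buffered": 20,
    "button_check": 5,
    "oled_1_in_50": 600,
}
total = sum(operations.values())
margin = budget_us - total
print(f"total={total} us, margin={margin} us ({100 * margin // budget_us}%)")
```

Re-run it whenever you add a subsystem: if the margin shrinks below roughly half the budget, you no longer have headroom for worst-case spikes.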
What Breaks the Timing Budget?
Now consider what happens if you are NOT careful:
Operation Careful Careless
-----------------------------------------------------
Ultrasonic read 50 us 25,000 us (blocking!)
OLED update 600 us 45,000 us (every loop!)
Data logging 20 us 15,000 us (flush every loop!)
print() for debugging 0 us 10,000 us (left in code!)
-----------------------------------------------------
Total 875 us 95,000 us
Budget: 10,000 us
Careful: 8.75% used --> smooth control
Careless: 950% used --> robot is uncontrollable
The difference between a working robot and a broken robot is often just scheduling discipline.
Exercise 4: Pre-Flight Checklist
Before running your integrated robot, work through this checklist. Add your own items.
Hardware:
- [ ] Battery charged above ___V
- [ ] All sensor cables connected and secured
- [ ] Wheels turn freely, no mechanical binding
- [ ] Track surface clean, tape visible under current lighting
Software:
- [ ] All debug print() statements removed or rate-limited
- [ ] Watchdog timer enabled
- [ ] Fail-safe try/except wraps main loop
- [ ] OLED update rate reduced (not every loop)
Integration:
- [ ] Each subsystem tested individually first
- [ ] Timing budget measured and within limits
- [ ] Ran 3+ consecutive laps without failure
- [ ] Tested with battery at 50% charge
Pre-Flight Checklist: Your Additions
Add at least 3 more items specific to YOUR robot design:
My pre-flight items:
- [ ] ______
- [ ] ______
- [ ] ______
Think about:
- What failed during YOUR lab sessions?
- What environment differences could affect YOUR sensors?
- What is the ONE thing that, if it fails, stops everything?
A checklist is not bureaucracy. Pilots use them because memory is unreliable under pressure. Your demo day is a high-pressure environment.
Summary & Quick Checks
Ten Things to Remember
Software Engineering (Part 1):
1. DRY -- Don't Repeat Yourself. Every duplicated block is a future bug. Extract into a function, call from everywhere.
2. SRP -- Single Responsibility. Each module does one thing. Describe it in one sentence without "and."
3. config.py centralizes constants with documented reasoning -- no more magic numbers scattered across files.
4. Defensive coding assumes sensors will fail -- validate inputs, use last-known-good fallbacks, clamp outputs.
5. Test and commit. Unit tests catch bugs before the robot hits a wall. Git commit messages are documentation for your future self.
Ten Things to Remember (continued)
System Integration (Part 2):
6. Integration reveals hidden interactions -- components that work alone often conflict when combined. The bug is usually between modules, not inside them.
7. Add one component at a time -- incremental integration gives you a bisection strategy for free. If something breaks, the last addition caused it.
8. Know your timing budget -- measure every operation. A healthy system uses less than 50% of its loop period. Margin absorbs worst-case spikes.
9. Anticipate real-world failures -- motor noise, I2C hangs, environment changes. Test under varied conditions, not just the lab bench.
10. Defend in depth -- validate inputs, use fallback values, clamp outputs, wrap in try/except, and feed the watchdog. Every layer catches what the previous one missed.
Quick Check 1
What does DRY stand for and why does it matter?
Don't Repeat Yourself. Duplicated code means duplicated bugs -- change one, forget the other.
How do you test if a module violates SRP?
The "one sentence without AND" test. If you need "and" to describe what it does, it has too many responsibilities.
Quick Check 2
Name three things that belong in config.py.
Pin assignments, sensor thresholds, PID gains, timing periods, speed limits -- any tunable constant with documented reasoning.
Why should you wrap your main loop in try/except on an embedded system?
Because an unhandled exception stops Python execution but PWM outputs keep running at their last duty cycle. The robot drives off the table.
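The fail-safe pattern can be sketched without hardware. `run_safely`, `stop_motors`, and the loop body are placeholders for your own functions; the key is that `finally` runs on any exit path:

```python
# Whatever happens inside the loop, the motors are commanded to zero
# on the way out -- including on an unexpected crash.
def run_safely(loop_body, stop_motors):
    try:
        while True:
            loop_body()
    except KeyboardInterrupt:
        pass                  # normal stop from the REPL
    finally:
        stop_motors()         # runs on ANY exit, crash included

# Demonstration with stubs:
stopped = []

def crashing_loop():
    raise RuntimeError("sensor glitch")

try:
    run_safely(crashing_loop, lambda: stopped.append(True))
except RuntimeError:
    pass                      # the exception still propagates after cleanup
print(stopped)  # [True] -- motors were stopped despite the crash
```

Note that `finally` does not hide the error: the exception still reaches you, but only after the robot is safe.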
Quick Check 3
Why do components that work individually often fail when combined?
Because integration introduces interactions that do not exist in isolation:
- Timing conflicts -- one component starves another of CPU time
- Resource contention -- shared buses, shared memory, shared power
- Physical coupling -- electrical noise, vibration, thermal effects
- State dependencies -- one module's output is another's input, and edge cases compound
The system is more than the sum of its parts -- and so are its bugs.
Quick Check 4
What is the purpose of a timing budget analysis?
To verify that ALL operations fit within the control loop period BEFORE you discover the problem on the track.
Without timing budget: With timing budget:
"Why does it oscillate?" "OLED takes 45 ms. That exceeds
"Maybe retune PID?" our 10 ms budget. Move OLED to
"Try different gains?" slower schedule."
Hours of wrong guesses. Fixed in 5 minutes.
A timing budget turns a mysterious behavioral problem into a simple arithmetic check.
Quick Check 5
Name the three defensive coding patterns.
- Validate -- Check every sensor reading against physically possible range. Reject garbage before it enters your control logic.
- Fallback -- When a sensor fails, use the last known good value. Stale data is better than no data or garbage data.
- Clamp -- Never send a value outside the hardware's valid range to an actuator. max(0, min(255, speed)) prevents damage.
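The three patterns together, as a sketch. The ranges, the seeded fallback value, and the function names are illustrative:

```python
# Validate, fall back, clamp -- one tiny example of each.
ADC_MIN, ADC_MAX = 0, 65535
last_good = 32000            # last-known-good fallback, seeded at startup

def read_line_sensor(raw):
    """Validate a raw reading; fall back to the last good value on garbage."""
    global last_good
    if raw is None or not (ADC_MIN <= raw <= ADC_MAX):
        return last_good     # fallback: stale beats garbage
    last_good = raw          # validation passed: remember it
    return raw

def clamp_speed(speed):
    """Clamp before the value ever reaches the motor driver."""
    return max(0, min(255, speed))

print(read_line_sensor(40000))   # 40000 -- valid, accepted
print(read_line_sensor(-5))      # 40000 -- invalid, last good returned
print(clamp_speed(300))          # 255 -- clamped to hardware range
```

Each layer is trivial on its own; their value is that together no single bad value can reach an actuator.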
Quick Check 6
What does a watchdog timer do and why is it important?
A watchdog timer is a hardware countdown. Your code must "feed" it periodically. If the code hangs and stops feeding, the watchdog resets the entire system.
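On the Pico, MicroPython exposes this as a hardware peripheral (`machine.WDT`, fed with `.feed()`). The idea can be sketched in plain Python with a hypothetical `SoftWatchdog` class -- an illustration of the concept, not the real hardware reset:

```python
import time

class SoftWatchdog:
    """Pure-Python illustration of the watchdog idea. The real one is
    machine.WDT, which resets the chip in hardware when starved."""
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.feed()

    def feed(self):
        self.last_fed = time.monotonic()

    def expired(self):
        return time.monotonic() - self.last_fed > self.timeout_s

wdt = SoftWatchdog(timeout_s=0.05)
wdt.feed()
print(wdt.expired())   # False -- just fed
time.sleep(0.08)       # simulate a hung loop that stops feeding
print(wdt.expired())   # True -- a hardware WDT would now reset the system
```

The design point: the watchdog never trusts the code to report that it hung; silence itself is the failure signal.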
How many trials do you need to trust a measurement?
At least 5-10. One trial is an anecdote. Three is a maybe. Five or more gives you data you can start to trust. Always compute mean, standard deviation, and range.
Bridge to Labs & Next Lecture
What You Now Have
After today and Lab 9, your robot has:
| Layer | Status |
|---|---|
| Hardware | Sensors, motors, display -- all connected |
| Architecture | config.py, modules, SRP, DRY |
| Control | P-control, state machines, defensive coding |
| Testing | Boundary tests, timing budget, pre-flight checklist |
| Integration | Incremental, measured, verified |
Your integrated robot works. You can prove it with data.
Hands-On Next
Lab 9 (This week): Project Integration
- Combine all subsystems into a working robot
- Apply incremental integration: add one component at a time
- Create config.py with all constants
- Measure your timing budget
- Define success criteria and validate with data
Lab 10 (Next week): Project Work
- Free working time on your project
- Refine, test, and polish your integrated robot
- Apply the pre-flight checklist before every run
- Prepare for demo day with evidence-based results
"You're integrating subsystems on a robot. The same discipline assembles satellites from components built on three continents."
Bridge to Next Lecture
Lecture 6 (Week 11) is the final lecture. We step back from the details.
Not "what did you build" but "what did you learn?"
You have spent 9 weeks building a robot from scratch -- from blinking an LED to a multi-sensor autonomous system with data-driven validation.
The robot was the vehicle -- the engineering mindset is the destination.
Tutorial: Project Integration