Lecture 05: Engineering Practice
Obuda University -- Embedded Systems
Week 9 | Labs completed: State Machines, HW Abstraction
Your Learning Arc (Lectures)
- Lecture 1 (Week 1): What is Embedded?
- Lecture 2 (Week 3): Sensors, Signals & Actuators
- Lecture 3 (Week 5): Actuators & Control
- Lecture 4 (Week 7): Software Architecture
- Lecture 5 (Week 9): Engineering Practice -- you are here
- Lecture 6 (Week 11): Course Synthesis & Demo Prep
Learning Arc (Labs)
- Foundation (Weeks 1--2): What is Embedded?, GPIO, Sensors ✓
- Sensing & Signals (Weeks 3--4): ADC, I2C, PWM basics, Ultrasonic ✓
- Movement & Control (Weeks 5--6): Motor physics, P-control, IMU ✓
- Software (Weeks 7--8): State Machines, Abstraction ✓
- Engineering (Weeks 9--10): Software Engineering, Integration ← you are here
- Synthesis (Weeks 11--12): Course Synthesis, Demo
Recap: Labs 7--8
| Lab | What You Did | What You Discovered |
|---|---|---|
| Lab 7 | Implemented state machines (IDLE, FOLLOW, AVOID, SEARCH) | One variable replaces five booleans; transitions are explicit |
| Lab 8 | Opened picobot library source; traced abstraction layers; measured call overhead | Libraries are structured code with deliberate design decisions about what to expose, hide, and organize |
Core discovery: The library is just organized code. YOUR code needs the same organization.
The Open Question
You now understand how the picobot library is organized internally.
But what about YOUR code?
Your main.py has been growing week by week. Motor control, sensor reading, control logic, display updates, and configuration values -- all mixed together.
How do professional embedded engineers structure software that needs to grow and be maintained?
The Narrative
From working code to professional code to working systems.
Your code works → but it's messy → DRY/SRP clean it up → config.py centralizes constants → defensive coding handles failure → testing proves it works → but wait, everything worked alone, why does it break together? → integration challenges → timing budget → failure modes → watchdog → edge case testing.
The engineering practices that make the difference between a demo that works once and a product that works every time.
What You Already Know (From Other Courses)
| From... | You already know... | Today we use it for... |
|---|---|---|
| Programming | Functions, modules, DRY principle | Code organization, config.py, SRP |
| Programming | Debugging, testing, version control | Embedded testing strategies, git for firmware |
| Control Theory | System modeling, simulation before hardware | Test-driven approach: simulate → test → deploy |
| Electronics | Component tolerances, failure modes | Defensive coding, sensor validation, watchdogs |
| Lectures 1--4 | GPIO, ADC, PWM, P-control, state machines, abstraction | The building blocks you now need to organize professionally |
Today's Map
Part 1 -- Writing Better Code
1. DRY & SRP -- Code Organization
2. config.py & Defensive Coding
3. Testing
4. Version Control

Part 2 -- Making It All Work Together
5. The Integration Challenge
6. Failure Modes & Recovery
7. Real Failure Case Studies
8. Testing Strategies
9. Exercises
Part 1
DRY, SRP, and Code Organization
Why Software Engineering Matters in Embedded
In embedded systems, bad software architecture is not just ugly -- it's dangerous. A tangled codebase means you can't confidently change one thing without breaking another. In production, that means bugs that only appear in the field, recalls that cost millions, and safety incidents.
The software patterns we cover today -- DRY, SRP, separation of concerns -- are not academic rules. They're the minimum standard for shipping code that other engineers can read, test, and maintain. MISRA C (the automotive coding standard) defines well over a hundred guidelines. NASA's Power of 10 has strict rules about function length and global variables.
These exist because people died when software was "good enough."
DRY: Don't Repeat Yourself
Here's a pattern I see in every student project around Week 7. The same PWM clamping formula appears in three different functions. It works fine -- until you fix a bug in one copy and forget the other two. This is called a DRY violation, and it's the single most common source of "but I already fixed that" bugs in professional codebases.
The rule is simple: if you write the same logic twice, extract it into a function.
Bad -- same motor control logic copy-pasted:
# In line_follow():
pwm_left = max(0, min(65535, int(speed_l * 257)))
motor_left.duty_u16(pwm_left)
# In obstacle_avoid():
pwm_left = max(0, min(65535, int(speed_l * 257))) # Same code again!
motor_left.duty_u16(pwm_left)
# In turn_90():
pwm_left = max(0, min(65535, int(speed_l * 257))) # And again!
DRY: The Fix
Good -- one function, called from everywhere:
def set_motor_speed(left, right):
    """Set motor speeds with clamping and scaling."""
    pwm_l = max(0, min(65535, int(left * 257)))
    pwm_r = max(0, min(65535, int(right * 257)))
    motor_left.duty_u16(pwm_l)
    motor_right.duty_u16(pwm_r)
# Now everywhere just calls:
set_motor_speed(speed_l, speed_r)
Bug in clamping logic? Fix it in one place, not five. Change PWM scaling? Change it once. Every copy-paste is a future bug.
DRY Violation: A Real Robot Example
Spot the problem -- the same conversion logic appears in two separate places:
# DRY violation in a real robot project:
# File: main.py
if line_position < -0.5:
    left_motor.duty_u16(int(65535 * 0.3))
    right_motor.duty_u16(int(65535 * 0.8))
# ... 50 lines later ...
if obstacle_cleared:
    left_motor.duty_u16(int(65535 * 0.3))  # Same conversion!
    right_motor.duty_u16(int(65535 * 0.8))  # Copy-pasted!
What happens when you want to change 0.3 to 0.4?
You change it in one place and forget the other. Guaranteed.
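One possible refactor is to name each value once and wrap the conversion once. This is a sketch, not the course's reference solution: `TURN_SLOW`, `TURN_FAST`, `duty_from_fraction`, and `veer_right` are illustrative names, and the motor objects are assumed to come from your hardware setup.

```python
# Hypothetical refactor: name the values once, wrap the conversion once.
TURN_SLOW = 0.3  # inner-wheel duty fraction (illustrative value)
TURN_FAST = 0.8  # outer-wheel duty fraction (illustrative value)

def duty_from_fraction(fraction):
    """Convert a 0.0-1.0 duty fraction to a 16-bit PWM value."""
    return int(65535 * fraction)

def veer_right():
    # left_motor / right_motor assumed from hardware setup (not defined here)
    left_motor.duty_u16(duty_from_fraction(TURN_SLOW))
    right_motor.duty_u16(duty_from_fraction(TURN_FAST))
```

Now changing 0.3 to 0.4 is a one-line edit to `TURN_SLOW`, and every caller picks it up.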
SRP: Single Responsibility Principle
SRP says each module should have one reason to change. But here's where judgment matters: you can take this too far. I've seen codebases with 200 tiny files where you can't understand any single behavior without opening fifteen modules. The art is finding the right granularity. For your robot: sensor reading, motor control, control logic, and configuration are four natural boundaries. That's probably enough. Don't create a utils.py with one function in it.
Each module does one thing. Not "one thing and also this other thing because it was convenient."
| Module | Does | Does NOT |
|---|---|---|
| sensors.py | Read and normalize sensor values | Control motors |
| motors.py | Set motor speeds, handle deadband | Read sensors |
| controller.py | Compute corrections from error | Talk to hardware |
| main.py | Orchestrate the overall flow | Contain low-level I/O |
When the line sensor stops working, you know exactly where to look: sensors.py.
When motors behave strangely, you open motors.py.
Responsibility isolation = fast debugging.
The SRP Test
Can you describe what a module does in one sentence without using the word "and"?
- "sensors.py reads and normalizes sensor data" -- borderline OK (related tasks)
- "sensors.py reads sensors and controls motors" -- SRP violation
If you need "and" to describe a module, it has too many responsibilities.
SRP Violation: A Concrete Example
This function does TWO things -- and belongs in two different modules:
# SRP violation: sensors.py that also controls motors
def read_and_react(sensors, motors):  # Does TWO things!
    pos = sensors.read_line()
    if pos < 0:
        motors.turn_right()  # Sensor module controls motors!?
The fix: sensors.py reads data. main.py decides what to do with it.
# sensors.py -- only reads
def read_line_position(sensors):
    return sensors.read_line()

# main.py -- decides what to do
pos = read_line_position(sensors)
if pos < 0:
    motors.turn_right()
Project Structure for Embedded
As your code grows beyond a single file, use a consistent structure:
project/
+-- main.py # Main loop, state machine
+-- config.py # All tunable parameters
+-- lib/
| +-- sensors.py # Sensor reading module
| +-- motors.py # Motor control module
| +-- display.py # OLED display module
+-- data/ # Logged data, calibration
+-- README.md # What, why, how
Each file has a clear role. A new team member can understand the layout in seconds.
This is the structure you will implement in Lab 09.
The config.py Pattern and Defensive Coding
The Problem: Magic Numbers
Every embedded project accumulates constants scattered throughout the code with no explanation.
Bad -- magic numbers everywhere:
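A sketch of what this typically looks like (the function and its values are illustrative, not from any particular project):

```python
import time

def control_step(raw, error):
    # Magic numbers: why 30000? why 0.35? why 0.010?
    if raw > 30000:                # undocumented threshold
        correction = 0.35 * error  # undocumented gain
    else:
        correction = 0.0
    time.sleep(0.010)              # undocumented loop period
    return correction
```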
Six months from now, no one knows where these values came from.
The config.py Pattern
Good -- all constants in one place with documented reasoning:
# config.py -- All tunable parameters in one place
# === Timing (from Lab 05 measurements) ===
CONTROL_PERIOD_MS = 10 # 100 Hz, based on line crossing analysis
ULTRASONIC_PERIOD_MS = 100 # 10 Hz, limits blocking impact
# === Motor (from Lab 02 characterization) ===
MOTOR_DEADBAND = 80 # PWM below this = no movement
BASE_SPEED = 150 # Comfortable cruising speed
MAX_SPEED = 255 # Absolute maximum PWM value
# === Control (from Lab 06 tuning) ===
KP = 45 # Tuned 2024-01-15, see data/kp_tuning.csv
KD = 5 # Derivative gain, reduces overshoot
Each constant explains where the value came from.
Using config.py in Code

Clean import -- your main loop reads like a specification:
from config import CONTROL_PERIOD_MS, BASE_SPEED, KP
while True:
    if time.ticks_diff(now, last) >= CONTROL_PERIOD_MS:
        correction = KP * error
        set_motor_speed(
            BASE_SPEED + correction,
            BASE_SPEED - correction
        )
- Want to change the speed? Edit config.py.
- Want to retune PID? Edit config.py.
- Want to see all tunable values at once? Open config.py.
Benefits of Centralized Configuration
Why config.py is worth the effort:
- Change in ONE place -- update a pin number, speed, or threshold once, not across 5 files
- Tune without searching -- all Kp, deadband, and timing values in one spot
- New team member reads config.py -- understands the entire setup in minutes
- Version control shows parameter history -- a diff on config.py tells you exactly what was tuned and when
# config.py -- single source of truth
PIN_MOTOR_LEFT_PWM = 10
PIN_MOTOR_LEFT_DIR = 12
KP = 0.35
BASE_SPEED = 60
LOOP_PERIOD_MS = 10
If a value might change, it belongs in config.py. If it never changes, it still belongs there for documentation.
Defensive Coding: Validate Sensor Readings
Embedded systems do not have a user to click "OK" on an error dialog. Your code must handle problems gracefully.
Sanity-check every sensor read:
def read_distance():
    """Read ultrasonic distance with sanity check."""
    raw = ultrasonic.distance_cm()
    if raw is None or raw < 0 or raw > 400:
        return None  # Invalid reading
    return raw
Never trust raw sensor data. The ultrasonic can return None on timeout. The ADC can spike on noise. The IMU can report garbage during impact.
Defensive Coding: Last-Known-Good Pattern
When a sensor fails, use the last valid reading instead of crashing:
last_good_distance = 100  # Safe default

distance = read_distance()
if distance is not None:
    last_good_distance = distance
else:
    distance = last_good_distance  # Stale but sane
The robot keeps running with stale-but-safe data rather than crashing or passing None to motor control.
Defensive Coding: Clamp Outputs
Never send invalid values to hardware:
def safe_motor_speed(speed):
    """Clamp to valid range, handle edge cases."""
    if speed is None:
        return 0  # Safe default
    return max(0, min(255, int(speed)))
Three lines of defense:
1. Validate inputs -- catch bad sensor data early
2. Use fallbacks -- last-known-good when sensors fail
3. Clamp outputs -- never exceed hardware limits
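Composed together, the three defenses might look like this sketch. `safe_control_step` and its parameters are illustrative names; the 0-400 validity range is borrowed from the ultrasonic example above.

```python
def safe_control_step(read_sensor, last_good, limit=255):
    """Validate -> fallback -> clamp, in one pass.

    read_sensor: callable returning a reading or None
    last_good:   previous valid reading (the fallback)
    Returns (clamped_output, new_last_good).
    """
    reading = read_sensor()                    # 1. validate input
    if reading is None or not (0 <= reading <= 400):
        reading = last_good                    # 2. fallback to last-known-good
    output = max(0, min(limit, int(reading)))  # 3. clamp the output
    return output, reading
```

Passing the sensor as a callable keeps the function testable on a host PC with a lambda standing in for hardware.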
Defensive Coding: Safe Sensor Reading with Caching
Cache last valid values so you never feed garbage to your controller:
last_valid = [0] * 5  # cache last good values

def safe_read_sensors():
    raw = read_line_sensors()
    for i in range(5):
        if 0 < raw[i] < 4095:  # valid range
            last_valid[i] = raw[i]
        # else: keep last valid value
    return last_valid
If one sensor returns garbage from electrical noise, the cached value keeps the robot on track.
Defensive Coding: Line-Following Recovery
When the line is lost, do not stop -- use the last known direction:
def follow_line():
    global last_error  # module-level state, persists between calls
    error = get_line_error()
    if error is None:  # lost the line!
        # Don't stop -- use last known direction
        if last_error > 0:
            turn_right(recovery_speed)
        else:
            turn_left(recovery_speed)
    else:
        last_error = error
        apply_correction(error)
Never trust raw input blindly. Always have a fallback. Whether it is a cached sensor value, a recovery maneuver, or a safe default -- your code must have a plan for when reality does not match expectations.
Code Comments: Explain WHY, Not WHAT
The code already tells the reader WHAT it does. Comments explain WHY.
| Bad Comment | Good Comment |
|---|---|
| `PERIOD = 10  # Set period to 10` | `PERIOD_MS = 10  # 100 Hz -- must be >10x line crossing time` |
| `speed = 0  # Set speed to zero` | `speed = 0  # Safe state: stop if sensor returns None` |
| `x = x + 1  # Increment x` | `retry_count += 1  # Retry up to 3x before declaring sensor dead` |
If you need a comment to explain WHAT a line does, consider renaming the variable instead.
Testing
Test Levels for Embedded
In industry, embedded testing is a whole discipline. Unit tests run on the host PC (no hardware needed). Hardware-in-the-loop (HIL) tests run real code against simulated sensors. System tests run on the actual product. For a car ECU, there are often more lines of test code than product code.
Every hour of testing saves ten hours of debugging. Three natural test levels:
| Level | What You Test | Example | How |
|---|---|---|---|
| Unit | Single function in isolation | test_sensor_normalize() | Call function, check output |
| Integration | Components working together | Sensor + Controller + Motor | Run subsystem, verify behavior |
| System | Full mission on real hardware | Complete lap on the track | Run the robot, measure performance |
Most tests should be unit tests -- fast, isolated, and they tell you exactly what broke.
The Testing Pyramid
Most of your tests should be at the bottom -- fast and specific:
/\
/ \
/ Sys \ System: "Robot completes 10 laps"
/ tem \
/----------\
/ Integration\ Integration: "Sensors + Motors
/ \ work without timing conflicts"
/----------------\
/ Unit Tests \ Unit: "normalize() maps
/ \ 5000 -> 0, 50000 -> 100"
/______________________\
Unit tests are fast, isolated, and tell you exactly what broke. System tests tell you if it all works together -- but not where it fails.
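The "Unit" example at the base of the pyramid might look like this. The `normalize()` function here is a hypothetical sketch, assuming a sensor whose raw counts span 5000-50000 and should map onto a 0-100 scale:

```python
def normalize(raw, lo=5000, hi=50000):
    """Map raw ADC counts in [lo, hi] onto 0-100, clamped at both ends."""
    pct = (raw - lo) * 100 / (hi - lo)
    return max(0, min(100, pct))

def test_normalize():
    assert normalize(5000) == 0      # bottom of range
    assert normalize(50000) == 100   # top of range
    assert normalize(60000) == 100   # clamped above
    assert normalize(0) == 0         # clamped below
    print("normalize: ALL PASSED")

test_normalize()
```

Because `normalize()` touches no hardware, this test runs on your laptop in milliseconds.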
A Simple Test Script Pattern
You do not need a testing framework. A simple script is enough:
# test_motors.py -- Run on the Pico to verify motor behavior
from motors import set_motor_speed
from config import MOTOR_DEADBAND
import time

def test_motor_deadband():
    """Verify motor starts moving at expected PWM."""
    print("=== Motor Deadband Test ===")
    for pwm in range(0, 100, 5):
        set_motor_speed(pwm, pwm)
        time.sleep(0.5)
        moving = is_robot_moving()  # Check via IMU
        print(f"  PWM={pwm:3d}: {'MOVING' if moving else 'stopped'}")
    set_motor_speed(0, 0)
    print("=== Done ===\n")
Test scripts live in your project. They are not throwaway code.
Boundary and Edge Case Testing
The most valuable tests check boundaries -- where things go wrong:
def test_safe_motor_speed():
    """Verify clamping at boundaries."""
    assert safe_motor_speed(0) == 0        # Minimum
    assert safe_motor_speed(255) == 255    # Maximum
    assert safe_motor_speed(-10) == 0      # Below minimum
    assert safe_motor_speed(300) == 255    # Above maximum
    assert safe_motor_speed(None) == 0     # Invalid input
    assert safe_motor_speed(3.7) == 3      # Float truncation
    print("safe_motor_speed: ALL PASSED")
If it passes with normal inputs but fails at the edges, the bug is just waiting for a bad sensor read.
Sample Size Matters
How many trials do you need when testing your robot?
| Trials | What You Have |
|---|---|
| 1 | An anecdote -- proves nothing |
| 3 | A maybe -- could be luck |
| 5+ | Data you can start to trust |
| 10+ | Solid statistics with confidence |
Why? The uncertainty of your estimated mean shrinks as you add samples -- the standard error falls with the square root of the sample count.
The more you run, the closer your measured mean gets to the true performance.
"It worked once" is not evidence. "It worked 10 times with these measurements" is.
Visualizing Your Data
Good presentation vs bad presentation:
BAD: "Kp = 0.5 went faster."
GOOD:
Kp = 0.3: |||||||||||||||||||||||||||||| 28.6s (std = 0.6)
Kp = 0.5: ||||||||||||||||||||||||||||| 27.3s (std = 2.1)
The bar chart tells you:
- Kp = 0.5 is slightly faster on average
- Kp = 0.5 has 3.5x more variation
Numbers without context mislead. Always show spread, not just averages.
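Bars like the ones above take only a few lines to generate; `bar_line` is an illustrative helper, not part of any course library:

```python
def bar_line(label, mean, std, scale=1.0):
    """Render one ASCII bar: length proportional to the mean, spread annotated."""
    bar = "|" * round(mean * scale)
    return f"{label}: {bar} {mean:.1f}s (std = {std:.1f})"

print(bar_line("Kp = 0.3", 28.6, 0.6))
print(bar_line("Kp = 0.5", 27.3, 2.1))
```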
Version Control
Git for Embedded Projects
Version control is not optional for engineering work.
| Practice | Why |
|---|---|
| Commit messages reference measurements | "Tune Kp to 45 based on step response" |
| Tag working configurations | git tag v1.0-line-follow-working |
| Never commit credentials or secrets | WiFi passwords, API keys stay out |
| Commit config.py changes separately | Easy to revert parameter changes |
Git Workflow for Embedded Projects
Hardware state cannot be rolled back -- but your software state can.
- Commit working states -- after each successful test, commit. "Working line follow at Kp=0.35" is a valuable checkpoint.
- Branch for experiments -- try a new control algorithm on a branch. If it fails, return to the working state.
- Never commit broken code to main -- main should always be a working robot.
- Commit messages that help future you -- "Increased Kp from 0.3 to 0.5, oscillates at speed > 70" is useful. "Fixed stuff" is not.
Think of Git as your engineering notebook for code. Every commit is a dated entry. Main is your "known good." Branches are experiments. Tags are milestones.
Good vs Bad Commit Messages
Good -- specific, references data, explains WHY:
Tune Kp to 45 based on step response (see data/kp.csv)
Refactor motor control into motors.py (SRP)
Fix off-by-one in sensor normalization (white=0, not 1)
Bad -- vague, no context, useless to future-you:
fixed stuff
update
final version 2
A vague commit message is a message to your future self that says "figure it out yourself."
The Basic Workflow
Workflow loop: Edit → Test → git add → git commit → Push → repeat. If tests fail, loop back to Edit immediately.
- Make a change
- Run your test script on the Pico
- If it passes, stage and commit with a meaningful message
- Push when you have a stable version
Do not commit broken code. Do not push untested code.
What Belongs in Version Control
| Include | Exclude |
|---|---|
| Include | Exclude |
|---|---|
| Source code (.py) | Large data files (.csv > 1 MB) |
| Configuration (config.py) | Credentials, passwords, API keys |
| Test scripts | Compiled binaries (.mpy, .uf2) |
| Documentation (README.md) | IDE-specific settings |
Use .gitignore to prevent accidental commits:
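A minimal sketch of such a .gitignore, with illustrative entries matching the project layout above:

```
# .gitignore -- illustrative entries
data/*.csv      # large logged data stays local
secrets.py      # WiFi credentials, API keys
*.mpy           # compiled bytecode
*.uf2           # firmware images
.vscode/        # IDE-specific settings
```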
Part 1 Summary
Five Principles for Better Code
1. DRY -- Don't Repeat Yourself. Every duplicated block is a future bug. Extract into a function, call from everywhere.
2. SRP -- Single Responsibility. Each module does one thing. Describe it in one sentence without "and."
3. config.py centralizes constants with documented reasoning -- no more magic numbers scattered across files.
4. Defensive coding assumes sensors will fail -- validate inputs, use last-known-good fallbacks, clamp outputs.
5. Test and commit. Unit tests catch bugs before the robot hits a wall. Git commit messages are documentation for your future self.
Part 2
Making It All Work Together
The Transition
Each piece works alone. Sensors read correctly. Motors respond to commands. State machines switch cleanly. Modules are organized.
But put them together -- timing conflicts, resource contention, unexpected interactions.
The challenge is not the parts. It is the interactions between parts.
The Integration Challenge
Components Work Alone. Systems Fail Together.

You have tested each subsystem individually. Sensors read correctly. Motors respond. State machine works. Display shows data.
Now connect them all -- and watch things break.
Individual Testing: Integration Testing:
[ok] Sensors work ? Sensors + Motors together
[ok] Motors work ? Timing conflicts
[ok] Display works ? Resource contention
[ok] State machine works ? Edge cases between subsystems
This is not a failure of your code. This is the nature of integration.
The Integration Iceberg
Ask any engineer what the hardest part of a project is, and they'll say integration. Not because the pieces are hard -- but because the interactions between pieces are unpredictable. This is true at every scale: chip, board, system, system-of-systems. The skill you're practicing this week is the most valuable and hardest-to-teach skill in engineering.
When you demo your robot, you show the happy path.
Iceberg: above the waterline -- the happy path you demo. Below -- timing conflicts, resource contention, sensor noise, edge cases, race conditions, power issues, thermal drift, mechanical wear. The hidden mass is 4x larger.
Everything below the waterline is invisible until you systematically test under stress.
The Symptom vs The Cause
One visible symptom, multiple hidden causes:
What you see: "The robot oscillates on curves"
─────────────────────────────────
What's actually | Sensor readings are noisy |
wrong: | Motor response is nonlinear |
| Loop timing varies 8-15 ms |
| Battery voltage dropped 0.3V |
| One wheel has more friction |
The symptom (oscillation) is visible. The causes are hidden beneath the surface.
Integration debugging = looking below the waterline -- measuring timing, logging sensors, checking power levels.
Timing Conflicts
The ultrasonic sensor blocks for up to 25 ms. During that time, the line-following control loop does not run. The robot drifts off the line.
# This "works" in isolation but kills line-following
def check_obstacle():
    distance = read_ultrasonic()  # Blocks 0-25 ms!
    return distance < 15
The line-follower was not broken -- it was starved of CPU time.
What if the OLED update (10-50 ms) overlaps with a sensor read?
Resource Contention
Multiple devices sharing a bus:
Bus diagram: OLED, IMU, and Sensor all connected to the same I2C SDA/SCL bus. Simultaneous access = contention.
If you update the OLED display while the IMU is being read, one gets corrupted data -- or the I2C bus hangs entirely.
Shared resources need coordination. Schedule access so only one device uses the bus at a time.
Unexpected Physical Interactions
These are not software bugs. They are system-level interactions:
- Motor PWM creates electrical noise that corrupts ADC readings
- Vibration from motors causes gyro readings to spike
- IR from the sun overwhelms the IR line sensor
- Battery voltage drops under motor load, affecting sensor thresholds
They only appear when everything runs together.
When something breaks during integration, ask: "What interaction caused this?" The bug is usually between modules, not inside them.
Integration Testing Methodology
When combining subsystems, do NOT add everything at once. Follow this process:
Step 1: Test each component alone (unit test)
[Sensors] ok [Motors] ok [Display] ok [IMU] ok
Step 2: Add components ONE at a time
[Sensors + Motors] --> test
Step 3: After each addition, run ALL previous tests
[Sensors + Motors + Display] --> test sensors, test motors, test display
Step 4: If something breaks, the LAST addition caused it
[Sensors + Motors + Display + IMU] --> Motors jitter?
--> IMU is the suspect. Investigate the interaction.
Incremental integration gives you a bisection strategy for free.
Why Incremental Integration Works
Adding all subsystems at once:
Everything added at once --> 5 things broke
Which component caused which problem? Unknown.
Debugging time: hours
Adding subsystems one at a time:
+ Sensors --> OK
+ Motors --> OK
+ Display --> Motors jitter (FOUND IT)
+ IMU --> OK
+ Ultrasonic --> OK
Debugging time: minutes
Each step isolates exactly one variable. This is the scientific method applied to engineering.
Detailed Timing Budget Analysis

Control loop target: 10 ms (100 Hz)
| Operation | Time (us) | Cumulative | % Budget |
|---|---|---|---|
| Line sensors (x4) | 20 | 20 | 0.2% |
| IMU read (I2C) | 800 | 820 | 8.2% |
| PID calculation | 50 | 870 | 8.7% |
| Motor update | 100 | 970 | 9.7% |
| State machine | 30 | 1000 | 10.0% |
| Ultrasonic (NB) | 50 | 1050 | 10.5% |
| OLED (1 in 50) | 600 | 1650 | 16.5% |
| MARGIN | 8350 | 10000 | 83.5% |
A healthy system uses less than 50% of its timing budget. Margin absorbs worst-case spikes, garbage collection pauses, and future features.
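The budget table can be checked mechanically. This sketch reuses the per-operation costs from the table above (illustrative names for the dictionary keys):

```python
LOOP_BUDGET_US = 10_000  # 10 ms loop at 100 Hz

# Worst-case per-loop costs in microseconds, from the table above
costs_us = {
    "line_sensors": 20,
    "imu_read": 800,
    "pid": 50,
    "motor_update": 100,
    "state_machine": 30,
    "ultrasonic_nb": 50,
    "oled_amortized": 600,
}

used = sum(costs_us.values())
margin = LOOP_BUDGET_US - used
print(f"used {used} us ({100 * used / LOOP_BUDGET_US:.1f}%), margin {margin} us")
assert used < LOOP_BUDGET_US * 0.5, "healthy systems keep >50% margin"
```

Re-run the check whenever you add a feature; a new subsystem that blows the 50% rule is a design problem, not a tuning problem.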
What Eats Your Timing Budget
Not all operations cost the same. I/O dominates:
Fast (< 100 us):
+-- GPIO read (line sensors) ~1 us each
+-- PID calculation ~50 us
+-- State machine logic ~30 us
+-- GPIO write (motor PWM) ~10 us
Slow (100-1000 us):
+-- I2C IMU read ~800 us
+-- I2C OLED partial update ~600 us
+-- Non-blocking ultrasonic ~50 us (just check, no wait)
Dangerous (> 1000 us):
+-- Blocking ultrasonic read ~25,000 us !!
+-- Full OLED screen refresh ~45,000 us !!
+-- print() over USB serial ~10,000 us !!
If it talks to a bus or a wire, measure it. Assumptions about timing are usually wrong.
Measuring Your Timing Budget
One of the most important integration tests: does everything fit?
LOOP_PERIOD_US = 10_000  # 10 ms budget (100 Hz); could also live in config.py

while True:
    t_start = time.ticks_us()
    read_line_sensors()
    read_ultrasonic_nb()
    compute_pid()
    update_motors()
    check_state_transitions()
    update_display_if_due()
    t_total = time.ticks_diff(time.ticks_us(), t_start)
    if t_total > LOOP_PERIOD_US:
        print(f"WARNING: loop {t_total} us (budget: {LOOP_PERIOD_US})")
If your loop takes 15 ms but your PID assumes 10 ms, your gains are effectively wrong. The robot oscillates, and you waste time retuning when the real problem is a timing violation.
Failure Modes and Recovery
Common Failure Modes
A robust system is not one that never fails -- it is one that fails gracefully.
| Failure | Cause | Detection | Recovery |
|---|---|---|---|
| Line lost | Sharp turn, noise, gap | No sensors see line | Slow down, search pattern |
| Collision | Sensor fail, too close | IMU impact spike | Stop, reverse, reassess |
| Stuck | Obstacle, wheel trapped | No progress despite motors on | Turn, try alternate path |
| Drift | Motor mismatch, wheel slip | Gyro heading diverges | Gyro-assisted correction |
| Battery low | Normal use over time | ADC voltage drops | Warn user, reduce speed |
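One of these detections, "no progress despite motors on", can be sketched as a pure check. Everything here is illustrative: the function name, the thresholds, and the assumption that you can estimate heading change and travel from the gyro and wheel data over a short window.

```python
def is_stuck(commanded_speed, heading_change_deg, distance_delta_cm,
             min_speed=30, min_motion=0.5):
    """Detect 'motors on but nothing happens' over one check window.

    commanded_speed:    current motor command (PWM-ish units)
    heading_change_deg: gyro heading change over the window
    distance_delta_cm:  estimated travel over the same window
    """
    driving = abs(commanded_speed) >= min_speed
    moving = (abs(heading_change_deg) >= min_motion
              or abs(distance_delta_cm) >= min_motion)
    return driving and not moving
```

Keeping the check pure (no hardware access) means it is unit-testable on the host, in the spirit of Part 1.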
Defensive Coding Recap
Remember the defensive patterns from Part 1 -- validate, fallback, clamp -- they become critical during integration. Every sensor read should be validated. Every output should be clamped. Every failure should have a safe fallback.
During integration, these patterns catch the interaction bugs that never appear in unit testing.
The Watchdog Concept
Every commercial embedded product ships with a watchdog. Not because the engineers expect their code to crash, but because they know it will. Cosmic rays flip bits. Power glitches corrupt RAM. Timing races cause deadlocks. The watchdog is your last line of defense.
What if your main loop itself hangs? A try/except cannot catch an infinite loop or a deadlocked I2C bus.
from machine import WDT

# Enable watchdog with 5-second timeout
wdt = WDT(timeout=5000)

while True:
    read_sensors()
    update_motors()
    check_state()
    wdt.feed()  # "I'm still alive"
    # If this loop hangs for > 5 seconds,
    # hardware automatically resets the Pico
A hung robot that resets and recovers is far better than one that stays frozen.
Watchdog Timer: Key Points
- Enable the watchdog AFTER initialization -- startup may take longer than the timeout
- Feed it in the main loop -- if the loop stalls, the system resets automatically
- Standard practice in production -- every shipping embedded product uses a watchdog
- Better to reset and restart than stuck in unknown state -- a reset takes milliseconds; a stuck robot stays stuck forever
Normal operation: Hung system:
Loop runs Loop stuck
wdt.feed() (no feed)
Loop runs (no feed)
wdt.feed() TIMEOUT --> RESET
Loop runs System reboots and recovers
A try/except cannot catch an infinite loop or a deadlocked I2C bus. The watchdog is your last line of defense.
Real Failure Case Studies
Case 1: Motor PWM Noise Corrupts ADC
Symptom: Line sensor readings jump randomly when motors run at high speed. Robot follows the line when motors are slow but loses it at full speed.
Root cause: Motor PWM switching creates electromagnetic noise on the power rail. The ADC picks up this noise as false readings.
Mitigations:
Hardware fix:
+-- Add 100 nF capacitor across motor terminals
+-- Route sensor wires away from motor wires
Software fix:
+-- Read ADC multiple times and average
+-- Read ADC during PWM "off" phase (brief motor silence)
+-- Reject readings that change by > 30% in one cycle
When software filtering is not enough, the fix is in hardware. Integration problems often cross the hardware/software boundary.
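Two of the software mitigations, averaging and spike rejection, can be combined into one pure filter. This is a sketch with an illustrative name and thresholds, not the course's reference filter:

```python
def filtered_reading(samples, previous, max_jump=0.30):
    """Average several back-to-back ADC samples; reject >30% jumps.

    samples:  list of raw ADC reads taken in quick succession
    previous: last accepted reading (used as fallback on rejection)
    """
    avg = sum(samples) / len(samples)
    if previous and abs(avg - previous) / previous > max_jump:
        return previous          # reject the spike, keep last good value
    return avg
```

Averaging suppresses random noise; the jump check catches the occasional large PWM-induced spike that averaging alone would let through.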
Case 2: I2C Bus Hangs
Symptom: Robot freezes after 2-5 minutes of operation. The OLED display shows the last frame. Motors stay at their last speed. Only a power cycle recovers.
Root cause: OLED and IMU share the I2C bus. Occasionally, their access overlaps, and the bus enters an invalid state. The SDA line is held low, blocking all further communication.
Mitigations:
# Schedule access -- never overlap
if loop_count % 50 == 0:
    update_oled()   # Only every 50th loop
else:
    read_imu()      # All other loops

# Add timeout + bus recovery
try:
    data = i2c.readfrom(IMU_ADDR, 6)
except OSError:
    recover_i2c_bus()  # Toggle SCL to release SDA
Case 3: Works in Lab, Fails in Competition
Symptom: Robot follows the line perfectly on the lab bench. At the competition, it loses the line within seconds.
Root cause: Not one thing -- multiple environment differences:
Lab conditions: Competition conditions:
+-- Fluorescent lighting +-- Stage lighting (different IR)
+-- White paper track +-- Glossy poster track (reflections)
+-- Full battery +-- Battery at 70% after practice runs
+-- Room temperature 22C +-- Venue temperature 28C (sensor drift)
+-- Flat table +-- Slightly uneven surface
Mitigation: Test under varied conditions BEFORE the event.
- Different lighting (cover windows, use flashlight)
- Different surfaces (tape on cardboard, on wood, on plastic)
- Low battery (run until it fails, note the voltage)
Testing Strategies
Define Success Before You Test
The most common testing mistake: running the robot and declaring "it works" based on a feeling.
| Requirement | Criterion | How to Measure |
|---|---|---|
| Line following | < 5 line losses per lap | Count losses (sensor log) |
| Obstacle detection | Stop within 10 cm | Measure with ruler |
| Lap time | < 30 seconds | Stopwatch or timestamp log |
| Consistency | Std dev < 10% of mean | Multiple timed runs |
| Battery resilience | Works at 3.3V supply | Test with drained battery |
Write down what "pass" means before you power on the robot.
Edge Case Testing
The happy path is not enough. Plan for the unhappy paths:
test_cases = [
    ("Line lost briefly", "recover within 2 seconds"),
    ("Line lost permanently", "stop after timeout"),
    ("Obstacle appears suddenly", "stop within 10 cm"),
    ("Button pressed during turn", "stop immediately"),
    ("Low battery (3.3V)", "reduce speed, warn user"),
    ("Sensor returns garbage", "use last known good value"),
    ("I2C bus hangs", "timeout and recover"),
    ("Two failures at once", "enter safe state"),
]
How do you test "line lost"? Remove the tape. Pick up the robot. Cover the sensors. These are your edge case tools.
Edge Case Test Matrix
Beyond the happy path -- systematic boundary conditions that reveal hidden problems:
| Condition | What to Test | Why It Matters |
|---|---|---|
| Low battery | Run at 6.5V instead of 7.4V | Motor response changes, sensors affected |
| Sharp turn | 90-degree corner at full speed | Tests sensor-to-motor latency |
| Line gap | 5 cm break in the line | Tests recovery behavior |
| Surface change | Matte to glossy surface | Sensor calibration may be wrong |
| Cold start | Run immediately after power-on | Sensors may need warm-up time |
| Long run | 5+ minutes continuously | Memory leaks, battery drain, thermal effects |
| Interference | Another robot nearby | IR/light interference |
Pick three rows most relevant to YOUR robot. Test them explicitly. Document the results.
The Integration Checklist
Sensors:
- All line sensors read correctly (verify with test script)
- Ultrasonic returns valid distances (2-400 cm range)
- IMU readings are stable (no drift spikes at startup)
Control:
- State machine handles all transitions (test each one)
- No state has a "dead end" (every state can transition somewhere)
- Timing budget fits within loop period (measure with ticks_ms)
Edge Cases and Reliability:
- Line lost and recovered
- Obstacle detected and avoided
- Button stop works from any state
- Sensor failure handled
- 10 consecutive runs without failure
Worked Example: Diagnosing an Integration Bug
Symptom: "Motors jitter every 500 ms"
Step 1 -- Isolate: Comment out subsystems one at a time.
Result: the jitter disappears when the OLED update is commented out. The OLED is the suspect.
Worked Example (continued)
Step 2 -- Measure:
import time

t0 = time.ticks_us()
update_display()
t1 = time.ticks_us()
print(f"Display took {time.ticks_diff(t1, t0)} us")
Result: OLED update takes 45 ms. Control loop runs at 20 ms. Display blocks the loop for over two full cycles.
Step 3 -- Fix: Reduce the display update rate so the OLED no longer refreshes every loop.
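One way to reduce the rate is a simple loop counter. A sketch, assuming a 20 ms control loop; `maybe_update_display` and the stand-in callback are illustrative names:

```python
# Refresh the OLED only every Nth control-loop iteration.
DISPLAY_EVERY_N = 25   # 25 x 20 ms loop = one refresh every 500 ms

loop_count = 0

def maybe_update_display(update_fn):
    """Call update_fn once every DISPLAY_EVERY_N invocations."""
    global loop_count
    loop_count += 1
    if loop_count % DISPLAY_EVERY_N == 0:
        update_fn()

# Demonstration with a stand-in for the real display update:
calls = []
for _ in range(100):
    maybe_update_display(lambda: calls.append(1))
print(len(calls))  # 4 refreshes in 100 loop iterations
```

The 45 ms cost still happens, but only once per 25 loops, so the control loop recovers between refreshes instead of blocking every cycle.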
Step 4 -- Verify: Measure loop time again. Run 10 laps. Count jitter events -- should be zero.
Exercises
Exercise 1: Code Review -- Find the Problems
Here is a deliberately bad robot controller. Your task: find every DRY, SRP, and defensive coding violation.
# main.py - Robot controller (DELIBERATELY BAD)
from machine import Pin, PWM, ADC, I2C
import time
# Read left sensor and decide
left = ADC(Pin(26)).read_u16()
if left > 30000:
PWM(Pin(10)).duty_u16(int(65535 * 0.8))
PWM(Pin(11)).duty_u16(int(65535 * 0.3))
# Read right sensor and decide
right = ADC(Pin(27)).read_u16()
if right > 30000:
PWM(Pin(10)).duty_u16(int(65535 * 0.3))
PWM(Pin(11)).duty_u16(int(65535 * 0.8))
# Display on OLED
i2c = I2C(1, scl=Pin(15), sda=Pin(14))
# ... display code mixed with control logic ...
# Log data
print(f"L={left},R={right}") # In the control loop!
Code Review: Problem 1 -- Magic Numbers
How many unexplained constants can you count?
left = ADC(Pin(26)).read_u16() # Why pin 26?
if left > 30000: # Why 30000?
PWM(Pin(10)).duty_u16(int(65535 * 0.8)) # Why 0.8?
PWM(Pin(11)).duty_u16(int(65535 * 0.3)) # Why 0.3?
Answer: At least nine magic numbers: six pin numbers (26, 27, 10, 11, 15, 14), the threshold (30000), and two speed values (0.8, 0.3). The PWM full-scale value (65535) arguably makes ten.
Every one of these should be in config.py with a comment explaining why.
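What that config.py might look like, as a sketch -- the values are the ones from the bad example above, and the names and comments are illustrative:

```python
# config.py -- every tunable constant in one place, with its reasoning.

# --- Pins (match your wiring diagram) ---
LEFT_SENSOR_PIN = 26     # ADC0: left line sensor
RIGHT_SENSOR_PIN = 27    # ADC1: right line sensor
LEFT_MOTOR_PIN = 10      # PWM: left motor driver input
RIGHT_MOTOR_PIN = 11     # PWM: right motor driver input
I2C_SCL_PIN = 15         # OLED I2C clock
I2C_SDA_PIN = 14         # OLED I2C data

# --- Thresholds and speeds (measured, then documented) ---
LINE_THRESHOLD = 30000   # ADC counts; midpoint between tape and floor readings
TURN_SPEED_FAST = 0.8    # outer wheel speed during a correction turn
TURN_SPEED_SLOW = 0.3    # inner wheel speed during a correction turn
PWM_MAX = 65535          # duty_u16 full scale on the Pico
```

Now "why 30000?" has an answer written next to the number, and retuning means editing one file.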
Code Review: Problem 2 -- DRY Violations
The PWM conversion int(65535 * speed) appears four times:
PWM(Pin(10)).duty_u16(int(65535 * 0.8)) # Line 7
PWM(Pin(11)).duty_u16(int(65535 * 0.3)) # Line 8
PWM(Pin(10)).duty_u16(int(65535 * 0.3)) # Line 12
PWM(Pin(11)).duty_u16(int(65535 * 0.8)) # Line 13
If the conversion logic changes (say, you need to add clamping), you must change it in four places.
Fix: Extract a set_motor_speed(left, right) function.
Code Review: Problem 3 -- SRP Violations
main.py does everything:
- Reads sensors (ADC)
- Controls motors (PWM)
- Manages display (I2C, OLED)
- Logs data (print)
- Contains control logic (if/else)
That is at least 5 responsibilities in one file.
Can you describe it in one sentence without "and"? No.
Code Review: Problem 4 -- Missing Defensive Coding
left = ADC(Pin(26)).read_u16()
if left > 30000:
# What if left is garbage? No validation!
# What if ADC returns 65535 due to noise? No check!
PWM(Pin(10)).duty_u16(int(65535 * 0.8))
# What if PWM value exceeds range? No clamping!
Missing:
- No sensor validation (what if the ADC returns garbage?)
- No output clamping (what if a calculation overflows?)
- No try/except (if the code crashes, the motors keep running)
Code Review: Problem 5 -- Performance Issue
print() over serial can take 10-30 ms.
If your control loop runs at 100 Hz (10 ms period), a single print() can double the loop time.
Fix: Log only every Nth iteration, or use a non-blocking approach.
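The every-Nth-iteration fix, as a sketch. The list stands in for `print()` so the logic runs anywhere; `log_sensors` and `LOG_EVERY_N` are illustrative names:

```python
# Log every Nth iteration instead of every loop. At 100 Hz with
# LOG_EVERY_N = 50, serial output happens twice per second, not 100 times.
LOG_EVERY_N = 50
_iteration = 0
log_lines = []   # stand-in for print(); swap back on hardware

def log_sensors(left, right):
    global _iteration
    _iteration += 1
    if _iteration % LOG_EVERY_N == 0:
        log_lines.append(f"L={left},R={right}")

# Demonstration: 200 loop iterations produce only 4 log lines.
for i in range(200):
    log_sensors(i, i)
print(len(log_lines))  # 4
```

The worst-case cost of logging drops from "every loop" to a bounded, scheduled expense you can put in the timing budget.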
Exercise 2: Failure Mode Analysis (FMEA)
Your robot must perform ALL of the following simultaneously:
Six subsystems in a 2x3 grid: line following, obstacle detection, display status, data logging, stop button, and motor control -- all sharing one CPU, one I2C bus, one power supply, and one 10 ms control loop.
FMEA: Fill the Failure Mode Table
For each row, identify the cause, how you would detect it, and how the robot should respond.
| # | Failure Mode | Cause | Detection | Mitigation |
|---|---|---|---|---|
| 1 | Line lost on sharp curve | |||
| 2 | Ultrasonic timeout | |||
| 3 | | | | |
| 4 | | | | |
| 5 | | | | |
Rows 3-5 are yours to define. Think about: I2C conflicts, battery drop, flash write blocking, button debounce, motor noise on sensors.
FMEA: Example Answers
Compare your answers:
| # | Failure Mode | Cause | Detection | Mitigation |
|---|---|---|---|---|
| 1 | Line lost on sharp curve | Sensors overshoot the line at speed | All 4 sensors read "no line" | Slow down, arc back toward last known direction |
| 2 | Ultrasonic timeout | Sensor aimed at soft surface (absorbs sound) | No echo within 30 ms | Return last-known-good, set flag |
| 3 | OLED + IMU I2C conflict | Both accessed in same loop iteration | I2C OSError exception | Schedule OLED to non-critical loop iterations |
| 4 | Flash write blocks loop | Writing data log stalls CPU for 5-20 ms | Loop time exceeds budget | Buffer writes, flush during idle periods only |
| 5 | Battery sag under motor load | High current draw drops voltage | ADC voltage < 3.3V threshold | Reduce motor speed, warn on display |
Exercise 3: Timing Budget Check
Given the scenario, estimate whether everything fits in 10 ms:
Operation Estimated Time
------------------------------------------
Line sensors (x4) 20 us
Ultrasonic (non-blocking) 50 us
PID calculation 50 us
Motor update 100 us
State machine 30 us
Data log (buffered) 20 us
Button check 5 us
OLED update (1 in 50) 600 us
------------------------------------------
Total (worst case): 875 us
Budget: 10,000 us
Margin: 9,125 us (91%)
Does this look safe? Yes -- but only if the OLED does not run every loop and the ultrasonic is non-blocking.
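The budget check above is just arithmetic, so it can live in a script. A sketch using the estimates from the table (operation names are illustrative labels):

```python
# Timing budget check: do all per-loop operations fit in the loop period?
budget_us = 10_000   # 10 ms control loop
operations = {       # worst-case estimates from the table, in microseconds
    "line_sensors_x4": 20,
    "ultrasonic_nonblocking": 50,
    "pid": 50,
    "motor_update": 100,
    "state_machine": 30,
    "data_log_buffered": 20,
    "button_check": 5,
    "oled_1_in_50": 600,
}
total = sum(operations.values())
margin = budget_us - total
print(f"total={total} us, margin={margin} us ({100 * margin // budget_us}%)")
```

Re-run it whenever you add a subsystem: if the margin shrinks below roughly half the budget, you no longer have headroom for worst-case spikes.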
What Breaks the Timing Budget?
Now consider what happens if you are NOT careful:
Operation Careful Careless
-----------------------------------------------------
Ultrasonic read 50 us 25,000 us (blocking!)
OLED update 600 us 45,000 us (every loop!)
Data logging 20 us 15,000 us (flush every loop!)
print() for debugging 0 us 10,000 us (left in code!)
-----------------------------------------------------
Total 875 us 95,000 us
Budget: 10,000 us
Careful: 8.75% used --> smooth control
Careless: 950% used --> robot is uncontrollable
The difference between a working robot and a broken robot is often just scheduling discipline.
Exercise 4: Pre-Flight Checklist
Before running your integrated robot, work through this checklist. Add your own items.
Hardware:
- [ ] Battery charged above ___V
- [ ] All sensor cables connected and secured
- [ ] Wheels turn freely, no mechanical binding
- [ ] Track surface clean, tape visible under current lighting
Software:
- [ ] All debug print() statements removed or rate-limited
- [ ] Watchdog timer enabled
- [ ] Fail-safe try/except wraps main loop
- [ ] OLED update rate reduced (not every loop)
Integration:
- [ ] Each subsystem tested individually first
- [ ] Timing budget measured and within limits
- [ ] Ran 3+ consecutive laps without failure
- [ ] Tested with battery at 50% charge
Pre-Flight Checklist: Your Additions
Add at least 3 more items specific to YOUR robot design:
My pre-flight items:
- [ ] ______
- [ ] ______
- [ ] ______
Think about:
- What failed during YOUR lab sessions?
- What environment differences could affect YOUR sensors?
- What is the ONE thing that, if it fails, stops everything?
A checklist is not bureaucracy. Pilots use them because memory is unreliable under pressure. Your demo day is a high-pressure environment.
Summary & Quick Checks
Ten Things to Remember
Software Engineering (Part 1):
1. DRY -- Don't Repeat Yourself. Every duplicated block is a future bug. Extract into a function, call from everywhere.
2. SRP -- Single Responsibility. Each module does one thing. Describe it in one sentence without "and."
3. config.py centralizes constants with documented reasoning -- no more magic numbers scattered across files.
4. Defensive coding assumes sensors will fail -- validate inputs, use last-known-good fallbacks, clamp outputs.
5. Test and commit. Unit tests catch bugs before the robot hits a wall. Git commit messages are documentation for your future self.
Ten Things to Remember (continued)
System Integration (Part 2):
6. Integration reveals hidden interactions -- components that work alone often conflict when combined. The bug is usually between modules, not inside them.
7. Add one component at a time -- incremental integration gives you a bisection strategy for free. If something breaks, the last addition caused it.
8. Know your timing budget -- measure every operation. A healthy system uses less than 50% of its loop period. Margin absorbs worst-case spikes.
9. Anticipate real-world failures -- motor noise, I2C hangs, environment changes. Test under varied conditions, not just the lab bench.
10. Defend in depth -- validate inputs, use fallback values, clamp outputs, wrap in try/except, and feed the watchdog. Every layer catches what the previous one missed.
Quick Check 1
What does DRY stand for and why does it matter?
Don't Repeat Yourself. Duplicated code means duplicated bugs -- change one, forget the other.
How do you test if a module violates SRP?
The "one sentence without AND" test. If you need "and" to describe what it does, it has too many responsibilities.
Quick Check 2
Name three things that belong in config.py.
Pin assignments, sensor thresholds, PID gains, timing periods, speed limits -- any tunable constant with documented reasoning.
Why should you wrap your main loop in try/except on an embedded system?
Because an unhandled exception stops Python execution but PWM outputs keep running at their last duty cycle. The robot drives off the table.
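The fail-safe pattern can be sketched without hardware. `run_safely`, `stop_motors`, and the loop body are placeholders for your own functions; the key is that `finally` runs on any exit path:

```python
# Whatever happens inside the loop, the motors are commanded to zero
# on the way out -- including on an unexpected crash.
def run_safely(loop_body, stop_motors):
    try:
        while True:
            loop_body()
    except KeyboardInterrupt:
        pass                  # normal stop from the REPL
    finally:
        stop_motors()         # runs on ANY exit, crash included

# Demonstration with stubs:
stopped = []

def crashing_loop():
    raise RuntimeError("sensor glitch")

try:
    run_safely(crashing_loop, lambda: stopped.append(True))
except RuntimeError:
    pass                      # the exception still propagates after cleanup
print(stopped)  # [True] -- motors were stopped despite the crash
```

Note that `finally` does not hide the error: the exception still reaches you, but only after the robot is safe.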
Quick Check 3
Why do components that work individually often fail when combined?
Because integration introduces interactions that do not exist in isolation:
- Timing conflicts -- one component starves another of CPU time
- Resource contention -- shared buses, shared memory, shared power
- Physical coupling -- electrical noise, vibration, thermal effects
- State dependencies -- one module's output is another's input, and edge cases compound
The system is more than the sum of its parts -- and so are its bugs.
Quick Check 4
What is the purpose of a timing budget analysis?
To verify that ALL operations fit within the control loop period BEFORE you discover the problem on the track.
Without timing budget: With timing budget:
"Why does it oscillate?" "OLED takes 45 ms. That exceeds
"Maybe retune PID?" our 10 ms budget. Move OLED to
"Try different gains?" slower schedule."
Hours of wrong guesses. Fixed in 5 minutes.
A timing budget turns a mysterious behavioral problem into a simple arithmetic check.
Quick Check 5
Name the three defensive coding patterns.
- Validate -- Check every sensor reading against physically possible range. Reject garbage before it enters your control logic.
- Fallback -- When a sensor fails, use the last known good value. Stale data is better than no data or garbage data.
- Clamp -- Never send a value outside the hardware's valid range to an actuator. max(0, min(255, speed)) prevents damage.
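The three patterns together, as a sketch. The ranges, the seeded fallback value, and the function names are illustrative:

```python
# Validate, fall back, clamp -- one tiny example of each.
ADC_MIN, ADC_MAX = 0, 65535
last_good = 32000            # last-known-good fallback, seeded at startup

def read_line_sensor(raw):
    """Validate a raw reading; fall back to the last good value on garbage."""
    global last_good
    if raw is None or not (ADC_MIN <= raw <= ADC_MAX):
        return last_good     # fallback: stale beats garbage
    last_good = raw          # validation passed: remember it
    return raw

def clamp_speed(speed):
    """Clamp before the value ever reaches the motor driver."""
    return max(0, min(255, speed))

print(read_line_sensor(40000))   # 40000 -- valid, accepted
print(read_line_sensor(-5))      # 40000 -- invalid, last good returned
print(clamp_speed(300))          # 255 -- clamped to hardware range
```

Each layer is trivial on its own; their value is that together no single bad value can reach an actuator.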
Quick Check 6
What does a watchdog timer do and why is it important?
A watchdog timer is a hardware countdown. Your code must "feed" it periodically. If the code hangs and stops feeding, the watchdog resets the entire system.
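On the Pico, MicroPython exposes this as a hardware peripheral (`machine.WDT`, fed with `.feed()`). The idea can be sketched in plain Python with a hypothetical `SoftWatchdog` class -- an illustration of the concept, not the real hardware reset:

```python
import time

class SoftWatchdog:
    """Pure-Python illustration of the watchdog idea. The real one is
    machine.WDT, which resets the chip in hardware when starved."""
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.feed()

    def feed(self):
        self.last_fed = time.monotonic()

    def expired(self):
        return time.monotonic() - self.last_fed > self.timeout_s

wdt = SoftWatchdog(timeout_s=0.05)
wdt.feed()
print(wdt.expired())   # False -- just fed
time.sleep(0.08)       # simulate a hung loop that stops feeding
print(wdt.expired())   # True -- a hardware WDT would now reset the system
```

The design point: the watchdog never trusts the code to report that it hung; silence itself is the failure signal.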
How many trials do you need to trust a measurement?
At least 5-10. One trial is an anecdote. Three is a maybe. Five or more gives you data you can start to trust. Always compute mean, standard deviation, and range.
Bridge to Labs & Next Lecture
What You Now Have
After today and Lab 9, your robot has:
| Layer | Status |
|---|---|
| Hardware | Sensors, motors, display -- all connected |
| Architecture | config.py, modules, SRP, DRY |
| Control | P-control, state machines, defensive coding |
| Testing | Boundary tests, timing budget, pre-flight checklist |
| Integration | Incremental, measured, verified |
Your integrated robot works. You can prove it with data.
Hands-On Next
Lab 9 (This week): Project Integration
- Combine all subsystems into a working robot
- Apply incremental integration: add one component at a time
- Create config.py with all constants
- Measure your timing budget
- Define success criteria and validate with data
Lab 10 (Next week): Project Work
- Free working time on your project
- Refine, test, and polish your integrated robot
- Apply the pre-flight checklist before every run
- Prepare for demo day with evidence-based results
"You're integrating subsystems on a robot. The same discipline assembles satellites from components built on three continents."
Bridge to Next Lecture
Lecture 6 (Week 11) is the final lecture. We step back from the details.
Not "what did you build" but "what did you learn?"
You have spent 9 weeks building a robot from scratch -- from blinking an LED to a multi-sensor autonomous system with data-driven validation.
The robot was the vehicle -- the engineering mindset is the destination.
Tutorial: Project Integration