Software Architectures

How to Design Embedded Software

When designing embedded software, architecture is one of the most critical—but often overlooked—elements. Unlike desktop or web applications, embedded systems must respond to real-world events in real-time, using limited memory and processing power. The software must not only be correct—it must be reliable, responsive, and efficient.

But here's the thing: software architectures in embedded systems aren't laws—they're tools. They're meant to guide your design, not restrict it. In practice, blending multiple approaches or bending the rules a little often leads to more practical, maintainable code. Simplicity should always win over unnecessary complexity.

Bare-Metal Programming

This guide focuses on bare-metal (also called firmware-level) programming—where your code runs directly on the microcontroller hardware without an operating system. There's no scheduler, no threads, no OS services. Just your code and the hardware. This is the foundation of embedded systems and essential knowledge for electrical engineers.

What about the bootloader? Even in bare-metal systems, there's usually a small piece of code called a bootloader that runs first when the microcontroller powers on. For example, the RP2040/RP2350 on your Pico has a built-in bootloader in ROM that handles USB programming (UF2 mode). The bootloader then hands control to your application code. Think of it as the "starter motor" that gets your main code running—you don't write it, but it's good to know it exists.

Let's walk through the most common architectural patterns in bare-metal embedded systems, understand their benefits and limitations, and see how to evolve from basic loops to more structured, reactive designs.

Architecture Overview

As your embedded application grows in complexity, you'll naturally progress through different architectural patterns. Here's the typical evolution:

┌─────────────────┐     ┌─────────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Round-Robin   │ ──► │  Round-Robin + ISR  │ ──► │  State Machine  │ ──► │      RTOS       │
│  (Super Loop)   │     │   (Foreground/      │     │  (Event-Driven) │     │  (Multi-task)   │
│                 │     │    Background)      │     │                 │     │                 │
└─────────────────┘     └─────────────────────┘     └─────────────────┘     └─────────────────┘
     Simplest              More Responsive            Most Structured         OS-managed tasks
     Polling-based         Hardware-triggered         Organized behavior      FreeRTOS, Zephyr

Each pattern builds on the previous one. You don't always need the most complex solution—choose the simplest architecture that meets your requirements.

What About RTOS and Other Operating Systems?

Beyond bare-metal, there's the RTOS (Real-Time Operating System), which provides task scheduling:

  • Cooperative multitasking: Tasks voluntarily yield control
  • Preemptive multitasking: The OS interrupts tasks based on priority (FreeRTOS, Zephyr)
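
To make "tasks voluntarily yield control" concrete, here is a minimal sketch of cooperative multitasking using Python generators. It is not how FreeRTOS or Zephyr are implemented; the task names and the `run_cooperative` helper are hypothetical, and the "LED" and "sensor" are plain variables so the example runs on any Python interpreter.

```python
from collections import deque

def blink_task(log):
    """Hypothetical task: toggles a (simulated) LED state forever."""
    state = False
    while True:
        state = not state
        log.append(("led", state))
        yield  # voluntarily hand control back to the scheduler

def sensor_task(log):
    """Hypothetical task: takes one (fake) sensor reading per turn."""
    reading = 0
    while True:
        reading += 1
        log.append(("sensor", reading))
        yield  # voluntarily hand control back to the scheduler

def run_cooperative(tasks, turns):
    """Round-robin over generator-based tasks for a fixed number of turns."""
    ready = deque(tasks)
    for _ in range(turns):
        task = ready.popleft()
        next(task)          # run the task until its next yield
        ready.append(task)  # put it at the back of the queue

log = []
run_cooperative([blink_task(log), sensor_task(log)], turns=4)
print(log)
```

If one task never reaches its `yield`, every other task stalls. That is the weakness a preemptive scheduler removes.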

An RTOS still runs on microcontrollers (MCUs), which have no MMU and offer at most a simpler memory protection unit. For more details, see RTOS Introduction.

Full operating systems (Linux, Windows) require microprocessors (MPUs) with an MMU (Memory Management Unit) for memory protection and virtual memory. That's covered in the Embedded Linux course, not here.

Think Before You Code

Before writing a single line of code, take a step back. Good embedded software design starts with understanding the problem and planning the system behavior. Code should be a translation of logic, not a place to invent it as you go.

Starting Simple: Round-Robin (Super Loop)

In bare-metal systems—where no operating system is involved—the most common and intuitive structure is called the round-robin or super loop. Your program starts by initializing hardware, and then enters an infinite loop where it repeatedly checks sensors, processes data, and controls outputs.

Let’s start with a simple line detection example using a single front optocoupler sensor. The robot is programmed to detect when it crosses a dark line (black tape) using this sensor. The LED on the board will turn on whenever the black surface is detected.

import time
import machine

# Define onboard LED
onboard_led = machine.Pin("LED", machine.Pin.OUT)

# Use a single front-facing optocoupler sensor
tracking_rc = machine.Pin(4, machine.Pin.IN)  # Right-center/front sensor

OPTO_BLACK_SURFACE = 0  # non-reflective (black line)
OPTO_WHITE_SURFACE = 1  # reflective (floor)

def tracking_handler(pin=None):
    onboard_led.on()
    print(">>> Line detected!")

while True:
    print("Searching for line...")

    # Poll the sensor
    if tracking_rc.value() == OPTO_BLACK_SURFACE:
        tracking_handler()

    time.sleep(0.3)  # polling delay
    onboard_led.off()

In this example, we’re continuously polling the sensor inside a loop. This follows a simple embedded software pattern:
Read → Process → Write

  • Read: Check the sensor state
  • Process: If black surface is detected, call the handler
  • Write: Turn on/off the LED or log output

Try it!

Try moving the robot forward and backward over the black line, first slowly and then quickly. You’ll notice:

  • The onboard LED lights up when the sensor detects the line.

  • The onboard sensor’s blue LED (hardware indicator) may light up immediately, while the onboard LED controlled by software is a bit slower to respond—especially at higher speeds.

This \(Δt\) delay occurs because the loop samples the sensor only once per iteration, so a change on the input goes unnoticed until the next poll, up to one polling interval (0.3 s here) later. As a result, the robot may register the line at an inaccurate position. To improve accuracy, we need to detect the edges of the line (the transitions between high and low on the optocoupler output) as soon as they occur.

However, if the robot moves fast enough that the duration of the low state is short enough to fall between polling intervals, it might miss the line detection entirely.
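
The two effects above (position error and missed lines) can be checked with a small simulation that runs on any Python interpreter. This is a simplified model, not a measurement: it assumes the loop polls exactly every 0.3 s (the real loop is slightly slower because of `print`), and the tape width and robot speeds are made-up numbers for illustration.

```python
def polling_sees_pulse(pulse_start, pulse_width, poll_interval, polls=100):
    """Return True if any poll instant falls inside the low pulse.

    Polls happen at t = 0, poll_interval, 2 * poll_interval, ...
    The sensor reads low during [pulse_start, pulse_start + pulse_width).
    """
    pulse_end = pulse_start + pulse_width
    for k in range(polls):
        t = k * poll_interval
        if pulse_start <= t < pulse_end:
            return True
    return False

def worst_case_position_error(speed_m_per_s, poll_interval):
    """Distance the robot travels during one full polling interval."""
    return speed_m_per_s * poll_interval

POLL_INTERVAL = 0.3   # seconds, from time.sleep(0.3) in the polling loop
TAPE_WIDTH = 0.02     # metres (assumed)

# The pulse lasts as long as the sensor is over the tape: width / speed.
slow = polling_sees_pulse(0.35, TAPE_WIDTH / 0.05, POLL_INTERVAL)  # 0.05 m/s
fast = polling_sees_pulse(0.35, TAPE_WIDTH / 0.5, POLL_INTERVAL)   # 0.5 m/s

print("Slow pass detected:", slow)
print("Fast pass detected:", fast)
print("Worst-case error at 0.5 m/s (m):",
      round(worst_case_position_error(0.5, POLL_INTERVAL), 3))
```

At the slow speed the sensor stays over the tape for 0.4 s, longer than the polling interval, so at least one poll lands inside the pulse. At the fast speed the pulse lasts only 0.04 s and falls between two polls.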


Responding in Real Time: Enter Interrupts

What if the robot could respond to the line detection the moment it happens, without waiting for the next round in the loop?

That's where interrupts come in.

Interrupts are like a doorbell for your microcontroller. You tell the hardware:

"If this input changes, ring this bell, and run this little function immediately."

That little function is called an Interrupt Service Routine (ISR).

Foreground/Background Architecture

When you combine a main loop with interrupts, you create what's called a Foreground/Background architecture:

  • Background: The main while True loop that runs continuously (lower priority)
  • Foreground: The ISRs that interrupt the background when events occur (higher priority)

This is the most common architecture in bare-metal embedded systems. The ISRs handle the timing-critical work (foreground), while the main loop handles everything else (background).

In our case, instead of constantly checking the optocoupler sensor in a loop, we can ask the hardware to watch it for us, and call our line-detection function whenever the signal falls—which means the robot just crossed onto a black line.

Here’s the same line detection code, but now using an interrupt instead of polling:

import time
import machine

# Define onboard LED
onboard_led = machine.Pin("LED", machine.Pin.OUT)

# Use a single front-facing optocoupler sensor
tracking_rc = machine.Pin(4, machine.Pin.IN)

OPTO_BLACK_SURFACE = 0  # black line
OPTO_WHITE_SURFACE = 1  # white floor

def tracking_handler(pin=None):
    onboard_led.on()
    print(">>> Line detected!")

# Set up interrupt: trigger on falling edge (from 1 to 0)
tracking_rc.irq(handler=tracking_handler, trigger=machine.Pin.IRQ_FALLING)

while True:
    time.sleep(0.3)
    onboard_led.off()
    print("Searching for line...")

Try it!

Try moving the robot forward and backward over the black line, first slowly and then quickly. You’ll notice that the onboard LED now lights up at the same time as the sensor’s blue LED.


How does it work?

We use the pin's irq (interrupt request) method to register a callback function. When the trigger event occurs, that callback runs immediately as the ISR (Interrupt Service Routine).

Now, when the optocoupler output falls from 1 to 0, the tracking_handler() function runs right away, without polling. The main loop no longer needs to constantly check the input; the hardware notifies it as soon as something happens.

[Figure: round-robin main loop with an interrupt service routine (ISR)]

Note that the ISR will be executed asynchronously from the main execution loop, as the interrupt can occur at any time. This asynchronous nature is why the ISR is not depicted as connected in the diagram.

❓Question:
What part of the code controls how long the onboard LED stays on, and why might the LED appear to stay on for shorter or longer periods?

Tip

The green onboard LED is switched on inside the interrupt callback and turned off in the main loop after each 0.3-second sleep. Because the ISR (Interrupt Service Routine) runs asynchronously, the interrupt can arrive at any point during that sleep, so the LED stays on for anywhere between almost 0 and 0.3 seconds.
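
The LED on-time can be estimated with a simplified model that runs on any Python interpreter. It assumes the main loop wakes exactly every 300 ms and switches the LED off at each wake-up, ignoring the time spent in `print`. The function name `led_on_time_ms` is hypothetical.

```python
def led_on_time_ms(interrupt_ms, loop_period_ms=300):
    """Time in ms the LED stays lit if the ISR fires at interrupt_ms.

    Simplified model: the main loop wakes at 0, 300, 600, ... ms and
    turns the LED off at each wake-up.
    """
    time_into_sleep = interrupt_ms % loop_period_ms
    return loop_period_ms - time_into_sleep

print(led_on_time_ms(310))  # interrupt early in the sleep: long flash
print(led_on_time_ms(590))  # interrupt late in the sleep: short flash
```

An interrupt that arrives just before the loop wakes up produces a flash that is easy to miss by eye, even though the event itself was detected.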


Why Interrupts matter

Interrupts solve several critical problems in embedded systems:

  • Immediate response: The ISR is called instantly when the sensor sees a black surface.
  • Missed events: No more losing input between polls
  • Faster response time: ISRs are triggered within microseconds
  • Lower CPU usage: The processor can sleep instead of looping endlessly
  • Better scalability: You don’t need to check dozens of inputs every loop

This is the power of interrupts: hardware-driven, real-time response.

| Feature | Round-Robin | Round-Robin + Interrupts |
|---|---|---|
| Simplicity | ✅ Very Simple | ✅ Simple, a bit more complex |
| Responsiveness | ❌ Poor (polling) | ✅ Good |
| Use of Hardware Features | ❌ None | ✅ Timers, GPIO interrupts, etc. |
| Timing Precision | ❌ Low | ✅ Medium (depends on ISR timing) |
| Response Time | Depends on loop timing and polling delay | Near-instantaneous (low interrupt latency) |
| Input Handling | Polling-based, may miss fast changes | Event-driven, reacts immediately |
| CPU Usage | High: constant polling keeps CPU always active | Low: CPU can enter sleep modes between interrupt events |
| Complexity | Low: easy to understand and implement | Moderate: requires managing ISRs and shared resources |
| Risk of Missed Input | High for short pulses | Very low if ISR is short and reliable |
| Ideal For | Simple applications, education, basic systems | Responsive tasks, real-time inputs, power-sensitive designs |

Important

But there's a catch: ISRs must be short and simple. You shouldn't do complex processing inside an interrupt. Instead, set a flag or update a variable, and let the main loop handle the heavy lifting.


Best Practice: Flag-Based Event Handling

The golden rule for ISRs is: get in, set a flag, get out. Here's why and how:

import machine

# Event flag - shared between ISR and main loop
line_detected = False

def tracking_handler(pin):
    global line_detected
    line_detected = True  # Just set the flag, nothing else!

# Set up interrupt
tracking_rc = machine.Pin(4, machine.Pin.IN)
tracking_rc.irq(handler=tracking_handler, trigger=machine.Pin.IRQ_FALLING)

while True:
    if line_detected:
        line_detected = False  # Clear the flag
        # Now do the heavy work in the main loop
        print("Line detected! Processing...")
        # Complex calculations, motor control, etc.

    # Other background tasks...

This pattern separates event detection (ISR) from event processing (main loop):

| In the ISR (Foreground) | In the Main Loop (Background) |
|---|---|
| Set flags | Check and clear flags |
| Update counters | Process the events |
| Store timestamps | Control motors |
| Keep it fast! | Handle complex logic |

Why Keep ISRs Short?
  • While an ISR runs, other interrupts may be blocked
  • Long ISRs cause missed events and timing problems
  • The main loop freezes until the ISR completes
  • Rule of thumb: ISRs should complete in microseconds, not milliseconds
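
A boolean flag records only that something happened, not how often. If several interrupts can arrive before the main loop gets around to them, a counter is one option, as in this sketch. It runs on any Python interpreter because the "interrupts" are simulated by calling the handler directly; `take_pending_events` is a hypothetical helper. On the Pico, the read-and-clear step would likely need to run with interrupts briefly disabled (MicroPython provides `machine.disable_irq()` and `machine.enable_irq(state)` for this), which is not shown here.

```python
pending_events = 0  # shared between the "ISR" and the main loop

def tracking_handler(pin=None):
    """ISR body: count the event and return immediately."""
    global pending_events
    pending_events += 1

def take_pending_events():
    """Main-loop side: read and clear the counter in one place.

    On real hardware an interrupt could fire between the read and the
    clear, so this short section would need interrupts disabled.
    """
    global pending_events
    count = pending_events
    pending_events = 0
    return count

# Simulate three interrupts arriving while the main loop was busy.
for _ in range(3):
    tracking_handler()

handled = take_pending_events()
print("Events to process:", handled)  # a boolean flag would report only one
```

The ISR still does almost nothing; the extra bookkeeping lives in the main loop, where time is less critical.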

But Wait—Now We Have a New Problem

Let’s say detecting the line with two optocouplers should change the robot’s direction. You now need to keep track of which mode you’re in and respond accordingly.

In small programs, developers often resort to a mix of global variables, nested if statements, and switch-case chains. But this quickly leads to messy, hard-to-maintain code.

What we need is a better way to manage the robot’s behavior. A way to model its modes, and define how it should transition between them in response to events.


Finite State Machines (FSMs)

A Finite State Machine is a model that represents all the modes (or "states") your system can be in, and defines how it transitions between them.

Instead of asking “What do I do now?”, your code starts by asking:
“What state am I in?”

From there, it decides what to do next based on the current state and the incoming event.

For example, in an FSM, you might define:

  • State: STOPPED

    • On button press → Transition to MOVING
  • State: MOVING

    • On button press → Transition to STOPPED

This structure is predictable, traceable, and easy to extend. If you later want to add a third state (like "PAUSED"), it slots naturally into the design.
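
The STOPPED/MOVING example can be sketched as a transition table. This is one of several common ways to code an FSM, shown here in plain Python without hardware; the names `TRANSITIONS` and `next_state` are illustrative, not part of any library.

```python
STOPPED = "STOPPED"
MOVING = "MOVING"
BUTTON_PRESS = "BUTTON_PRESS"

# Transition table: (current state, event) -> next state
TRANSITIONS = {
    (STOPPED, BUTTON_PRESS): MOVING,
    (MOVING, BUTTON_PRESS): STOPPED,
}

def next_state(state, event):
    """Look up the transition; stay in the current state if none is defined."""
    return TRANSITIONS.get((state, event), state)

state = STOPPED
history = [state]
for event in [BUTTON_PRESS, BUTTON_PRESS, "UNKNOWN_EVENT", BUTTON_PRESS]:
    state = next_state(state, event)
    history.append(state)

print(history)
```

Adding a PAUSED state then means adding rows to the table rather than threading another flag through nested if statements.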

You can even model it visually using UML state diagrams, which help you understand complex behaviors at a glance.


Choosing the Right Architecture

Here's a quick decision guide to help you choose the appropriate architecture for your project:

| Your Situation | Recommended Architecture |
|---|---|
| Learning the basics, simple blinking LED | Round-Robin |
| Need to respond to button presses or sensors | Round-Robin + Interrupts |
| Multiple inputs that can't be missed | Foreground/Background with flags |
| Robot has different modes (stopped, moving, turning) | State Machine |
| Complex behavior with many states and transitions | State Machine + Interrupts |
| Need precise timing for multiple tasks | Consider RTOS |

Start Simple

Always start with the simplest architecture that works. You can always add complexity later. A working simple solution is better than a broken complex one.

The Bare-Metal Advantage

Why learn bare-metal programming when operating systems exist?

  1. Understanding: You learn how things really work at the hardware level
  2. Control: Complete control over timing and resources
  3. Efficiency: No OS overhead—your code runs faster with less memory
  4. Predictability: You know exactly what your code does and when
  5. Foundation: Essential knowledge before moving to RTOS or complex systems

Real-World Examples

| Application | Why Bare-Metal? | Trade-off |
|---|---|---|
| TV Remote Control | Runs on tiny battery for years, needs instant button response | Simple functionality, no complex features |
| Motor Controller | Microsecond-precise PWM timing for smooth motor control | Single dedicated task, can't multitask easily |
| ECG Heart Monitor | Predictable sampling rate critical for accurate readings | Requires careful timing design |
| Car Airbag System | Must deploy in <15ms, no time for OS overhead | Safety-critical, extensively tested |
| LED Light Bulb | Costs must be minimal, runs on cheapest possible chip | Limited memory, basic features only |
| Industrial Sensor | Runs for 10+ years on battery, sleeps 99% of time | Complex networking needs external modules |

When Bare-Metal Makes Sense

Use bare-metal when:

  • Cost per unit is critical (mass production)
  • Battery life is paramount
  • Timing must be predictable and fast
  • The system does one thing well
  • Memory is extremely limited (< 32KB RAM)

Consider an RTOS when:

  • You need to run multiple independent tasks
  • Communication stacks are complex (TCP/IP, Bluetooth)
  • Tasks have different timing requirements
  • Code maintainability is more important than raw performance

As an electrical engineer, understanding bare-metal firmware gives you insight into how hardware and software interact—knowledge that's valuable whether you're debugging a circuit, optimizing power consumption, or designing a new system.


Professional Context: Industrial & Automotive Architectures

Your super loop and state machine work for a robot. Professional systems use standardized, layered architectures designed for safety, certification, and team collaboration across multiple suppliers. Here's how they compare:

Architecture Comparison

| Feature | Bare-Metal (yours) | Industrial PLC | AUTOSAR Classic | Safety-Critical |
|---|---|---|---|---|
| Structure | Super loop + ISR | Cyclic tasks | Layered components | Partitioned, isolated |
| Scheduling | Manual / none | Cyclic, priority | OS tasks + runnables | Time-partitioned |
| Configuration | Code only | IDE/ladder logic | XML (ARXML) | Formal specification |
| Code reuse | Copy-paste | Function blocks | Standardized interfaces | Certified libraries |
| Multi-vendor | No standard | IEC 61131-3 | AUTOSAR standard | DO-178C/ISO 26262 |
| Verification | Manual test | PLC simulation | Config validation | Formal methods |
| Safety | None | PLe/SIL 3 optional | ASIL-D capable | Required, certified |
| Development | 1 developer | Team | Multiple suppliers | Certified process |

AUTOSAR: Automotive Software Architecture

Modern cars don't use super loops. They use AUTOSAR—a standardized architecture for automotive ECUs:

Your robot (monolithic):
    main.py
    └── Everything in one file
        ├── Sensor reading
        ├── Motor control
        ├── State machine
        └── Display update

AUTOSAR Classic (layered):
    ┌─────────────────────────────────────────────────────────┐
    │              Application Layer (SWC)                    │
    │   ┌─────────┐  ┌─────────┐  ┌─────────┐               │
    │   │ Sensor  │  │  Motor  │  │ Diag    │  ← Your logic │
    │   │   SWC   │  │   SWC   │  │  SWC    │               │
    │   └────┬────┘  └────┬────┘  └────┬────┘               │
    ├────────┼────────────┼───────────┼──────────────────────┤
    │        └────────────┴───────────┘                      │
    │              RTE (Runtime Environment)                  │
    │              ← Virtual bus, generated code              │
    ├─────────────────────────────────────────────────────────┤
    │              Basic Software (BSW)                       │
    │   ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐         │
    │   │  COM   │ │  NvM   │ │  Dem   │ │  Os    │         │
    │   │ (Comm) │ │(Memory)│ │(Diag)  │ │(Sched) │         │
    │   └────────┘ └────────┘ └────────┘ └────────┘         │
    ├─────────────────────────────────────────────────────────┤
    │              MCAL (Microcontroller Abstraction)         │
    │   ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐         │
    │   │  DIO   │ │  ADC   │ │  PWM   │ │  CAN   │         │
    │   └────────┘ └────────┘ └────────┘ └────────┘         │
    └─────────────────────────────────────────────────────────┘
                            Hardware

    Key benefits:
    - SWC (Software Component) from Supplier A
    - BSW (Basic Software) from Supplier B
    - MCAL from chip vendor
    - All integrate via standardized interfaces (RTE)

Safety Architecture: Freedom From Interference

Safety-critical systems must prove components don't interfere:

Your robot:
    Motor code bug → robot crashes → annoying

    No isolation:
    ┌─────────────────────────────────┐
    │  Motor │ Sensor │ Display │ ... │ ← All share memory
    └─────────────────────────────────┘
    Bug in display code could corrupt motor variables!

Automotive ASIL-D (ISO 26262):
    Must prove: Steering software unaffected by infotainment bugs

    Memory protection:
    ┌──────────────────┐  ┌──────────────────┐
    │  ASIL-D Partition │  │  QM Partition    │
    │  (Safety-critical)│  │  (Non-safety)    │
    │  ├── Steering     │  │  ├── Radio       │
    │  ├── Braking      │  │  ├── Display     │
    │  └── Airbag       │  │  └── Navigation  │
    └──────────────────┘  └──────────────────┘
           │                      │
           └──────┬───────────────┘
         Hardware Memory Protection Unit (MPU)

    Infotainment crash → only infotainment restarts
    Steering keeps working (proven by analysis)

Aerospace: Time Partitioning (ARINC 653)

Aircraft systems use strict time separation:

Your robot (cooperative):
    while True:
        read_sensors()   # Takes unknown time
        update_motors()  # Could be delayed
        # No guarantees!

ARINC 653 (time-partitioned):
    ┌────────────────────────────────────────────────────┐
    │  Time    │ 0ms  10ms  20ms  30ms  40ms  50ms ...   │
    ├──────────┼─────────────────────────────────────────┤
    │ Flight   │ ████       ████       ████              │
    │ Control  │                                         │
    ├──────────┼─────────────────────────────────────────┤
    │ Engine   │      ████       ████       ████         │
    │ Monitor  │                                         │
    ├──────────┼─────────────────────────────────────────┤
    │ Display  │           ██         ██         ██      │
    │          │                                         │
    └──────────┴─────────────────────────────────────────┘

    Each partition:
    - Gets guaranteed time slice (can't be stolen)
    - Isolated memory (can't corrupt others)
    - Independent failure (one crash doesn't affect others)

    Flight control ALWAYS runs at 0ms, 30ms, 60ms...
    Even if display partition crashes
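
The fixed schedule in the diagram can be modelled as a lookup table. The sketch below is a toy illustration in plain Python, not an ARINC 653 API: the 30 ms major frame follows from "0ms, 30ms, 60ms" above, while the three equal 10 ms windows are an assumption read off the diagram.

```python
# Assumed static schedule: (window start in ms, window length in ms, partition)
MAJOR_FRAME_MS = 30
SCHEDULE = [
    (0, 10, "Flight Control"),
    (10, 10, "Engine Monitor"),
    (20, 10, "Display"),
]

def partition_at(t_ms):
    """Return the partition that owns the CPU at absolute time t_ms."""
    offset = t_ms % MAJOR_FRAME_MS
    for start, length, name in SCHEDULE:
        if start <= offset < start + length:
            return name
    return None  # not reached while the windows cover the whole frame

for t in (0, 15, 25, 30, 60):
    print(t, "ms ->", partition_at(t))
```

The owner of each window depends only on the clock, never on what another partition is doing. That is why a crashed display partition cannot delay flight control.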

Model-Based Development

Professional systems are designed in models, not just code:

Your approach:
    1. Think about behavior
    2. Write Python code
    3. Test manually
    4. Debug → modify code → repeat

Model-Based Development (Simulink/Stateflow):
    1. Design in graphical model
    2. Simulate behavior (no hardware needed)
    3. Generate C code automatically
    4. Code matches model (by construction)

    ┌─────────────────────────────────────────────────┐
    │  Simulink Model                                 │
    │  ┌─────┐    ┌─────┐    ┌─────┐                │
    │  │Sensor├───►│Control├───►│Motor│               │
    │  └─────┘    └─────┘    └─────┘                │
    │             ← Click to simulate                 │
    └─────────────────────────────────────────────────┘
                      ▼ Code Generation
    ┌─────────────────────────────────────────────────┐
    │  /* Auto-generated - DO NOT EDIT */            │
    │  void control_step(void) {                     │
    │      rtY.motor = rtP.gain * rtU.sensor;        │
    │  }                                             │
    └─────────────────────────────────────────────────┘

    Benefits:
    - Simulate before building hardware
    - Code guaranteed correct by construction
    - Certified code generators (DO-178C, ISO 26262)
    - Model IS the documentation

Industrial PLC Architecture (IEC 61131-3)

Factory automation uses standardized programming:

Your robot:
    while True:
        if button and not running:
            start_motor()
            running = True
        # ... more spaghetti

IEC 61131-3 PLC (Structured Text / Function Blocks):
    ┌─────────────────────────────────────────────────┐
    │  Program Organization Units (POUs)              │
    ├─────────────────────────────────────────────────┤
    │  PROGRAM MainControl                            │
    │  VAR                                           │
    │      StartButton: BOOL;                        │
    │      Motor: FB_Motor;                          │
    │      Sequence: FB_Sequence;                    │
    │  END_VAR                                       │
    │                                                │
    │  (* Cyclic execution every 10ms *)             │
    │  Sequence(Start := StartButton,               │
    │           Motor := Motor);                     │
    │  END_PROGRAM                                   │
    └─────────────────────────────────────────────────┘

    Task configuration:
    ├── Task_Fast (1ms cycle) ─── Motion control
    ├── Task_Normal (10ms cycle) ─── Logic, I/O
    └── Task_Slow (100ms cycle) ─── HMI, logging

    Key differences from your code:
    - Deterministic cycle times (guaranteed)
    - Standardized function blocks (reusable)
    - Visual programming available (Ladder, FBD, SFC)
    - Built-in I/O mapping, diagnostics
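
The multi-rate task configuration above can be mimicked with a tick counter. This is a rough sketch in plain Python of the scheduling idea only: a real PLC runtime enforces the cycle times, while this loop merely counts simulated 1 ms ticks. The helper names `tasks_due` and `count_runs` are hypothetical.

```python
# (period in ms, task name), taken from the task configuration above
TASKS = [
    (1, "Task_Fast"),
    (10, "Task_Normal"),
    (100, "Task_Slow"),
]

def tasks_due(tick_ms):
    """Return the tasks whose cycle starts at this 1 ms tick."""
    return [name for period, name in TASKS if tick_ms % period == 0]

def count_runs(duration_ms):
    """Count how often each task runs over duration_ms simulated ticks."""
    runs = {name: 0 for _, name in TASKS}
    for tick in range(duration_ms):
        for name in tasks_due(tick):
            runs[name] += 1
    return runs

runs = count_runs(1000)
print(runs)
```

The same pattern works in a super loop on the Pico if a hardware timer provides the tick, although nothing then prevents a slow task from overrunning its slot.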

Mixed-Criticality Systems

Modern systems combine different safety levels:

Your robot: Everything same priority, same reliability needs

Real system (car door ECU):
    ┌────────────────────────────────────────────────┐
    │  Component          │ ASIL   │ Consequence     │
    ├─────────────────────┼────────┼─────────────────┤
    │  Child lock control │ ASIL-B │ Child injury    │
    │  Window pinch detect│ ASIL-B │ Finger injury   │
    │  Mirror adjustment  │ QM     │ Inconvenience   │
    │  Ambient lighting   │ QM     │ Annoying        │
    └────────────────────────────────────────────────┘

    Challenge: Run all on ONE microcontroller
    Solution: Software partitioning + MPU + certified OS

    ASIL-B code: Certified compiler, full coverage, reviewed
    QM code: Normal development process
    Both coexist safely (proven by architecture analysis)

Development Process Comparison

| Aspect | Your Project | Industrial | Automotive (ASPICE) | Aerospace (DO-178C) |
|---|---|---|---|---|
| Requirements | "Make it work" | Documented | Formal, traced | Formal, verified |
| Design | In your head | Diagrams | ARXML + SysML | Formal models |
| Coding | Just write it | Standards | MISRA-C, AUTOSAR | Certified subset |
| Testing | Manual | Automated | Coverage targets | MC/DC 100% |
| Review | Maybe yourself | Peer review | Independent review | Independent team |
| Traceability | None | Partial | Full | Bidirectional |
| Change control | Git commit | Tickets | Impact analysis | Formal process |

What the Industry Uses

| Manufacturer | Product | Application |
|---|---|---|
| Vector | MICROSAR | AUTOSAR basic software |
| ETAS | ISOLAR/RTA | AUTOSAR tools and OS |
| dSPACE | TargetLink | Production code generation |
| MathWorks | Embedded Coder | Simulink to C code |
| Wind River | VxWorks | Safety-critical RTOS |
| Green Hills | INTEGRITY | Separation kernel, DO-178C |
| Siemens | TIA Portal | PLC engineering |
| Elektrobit | EB tresos | Automotive software platform |
| ANSYS | SCADE | Certified model-based dev |

Hardware Limits Principle

What Software Can and Cannot Fix

Software CAN improve:

  • Code organization → layers, modules, components
  • Maintainability → separation of concerns, interfaces
  • Reusability → standardized APIs, function blocks
  • Testability → modular design, dependency injection
  • Team collaboration → clear interfaces between components

Software CANNOT fix:

  • No memory protection → MPU/MMU hardware required for isolation
  • No deterministic timing → need RTOS or time-partitioned OS
  • No formal verification → requires specialized tools and process
  • No safety certification → requires qualified tools, process evidence
  • Python GC pauses → use C/C++ for real-time guarantees
  • Single point of failure → need hardware redundancy

The lesson: A well-structured super loop can handle a robot. A car with 100+ ECUs from 50+ suppliers needs AUTOSAR standardization. An aircraft with DO-178C Level A certification needs proven separation, certified tools, and formal methods. The principles (modularity, separation) are the same—but the rigor, tooling, and hardware support are completely different.

Real Example: Adding a Feature

| Task | Your Robot | AUTOSAR Vehicle |
|---|---|---|
| "Add temperature monitoring" | 10 lines of code, done | |
| Requirements | None | Formal requirement in DOORS |
| Architecture | Add to main.py | New SWC, update ARXML |
| Interfaces | Direct function call | RTE port definition |
| Implementation | Write Python | Generate code + implement |
| Testing | Try it | Unit test + integration test |
| Review | Self | Peer review + safety review |
| Documentation | Comment maybe | Updated design docs |
| Time | 30 minutes | 2-4 weeks |

Your 30-minute change is fine for a robot. The 2-week process ensures a car's temperature monitoring won't interfere with braking or steering—ever, in any condition, for 15 years.

