Software Architectures
How to Design Embedded Software
When designing embedded software, architecture is one of the most critical—but often overlooked—elements. Unlike desktop or web applications, embedded systems must respond to real-world events in real-time, using limited memory and processing power. The software must not only be correct—it must be reliable, responsive, and efficient.
But here's the thing: software architectures in embedded systems aren't laws—they're tools. They're meant to guide your design, not restrict it. In practice, blending multiple approaches or bending the rules a little often leads to more practical, maintainable code. Simplicity should always win over unnecessary complexity.
Bare-Metal Programming
This guide focuses on bare-metal (also called firmware-level) programming—where your code runs directly on the microcontroller hardware without an operating system. There's no scheduler, no threads, no OS services. Just your code and the hardware. This is the foundation of embedded systems and essential knowledge for electrical engineers.
What about the bootloader? Even in bare-metal systems, there's usually a small piece of code called a bootloader that runs first when the microcontroller powers on. For example, the RP2040/RP2350 on your Pico has a built-in bootloader in ROM that handles USB programming (UF2 mode). The bootloader then hands control to your application code. Think of it as the "starter motor" that gets your main code running—you don't write it, but it's good to know it exists.
Let's walk through the most common architectural patterns in bare-metal embedded systems, understand their benefits and limitations, and see how to evolve from basic loops to more structured, reactive designs.
Architecture Overview
As your embedded application grows in complexity, you'll naturally progress through different architectural patterns. Here's the typical evolution:
┌─────────────────┐     ┌─────────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Round-Robin   │ ──► │  Round-Robin + ISR  │ ──► │  State Machine  │ ──► │      RTOS       │
│  (Super Loop)   │     │    (Foreground/     │     │ (Event-Driven)  │     │  (Multi-task)   │
│                 │     │     Background)     │     │                 │     │                 │
└─────────────────┘     └─────────────────────┘     └─────────────────┘     └─────────────────┘
     Simplest               More Responsive           Most Structured        OS-managed tasks
   Polling-based          Hardware-triggered        Organized behavior       FreeRTOS, Zephyr
Each pattern builds on the previous one. You don't always need the most complex solution—choose the simplest architecture that meets your requirements.
What About RTOS and Other Operating Systems?
Beyond bare-metal, there's the RTOS (Real-Time Operating System), which provides task scheduling:
- Cooperative multitasking: Tasks voluntarily yield control
- Preemptive multitasking: The OS interrupts tasks based on priority (FreeRTOS, Zephyr)
An RTOS still runs on microcontrollers (MCUs), which typically have no MMU and therefore no virtual memory. For more details, see RTOS Introduction.
Full operating systems (Linux, Windows) require microprocessors (MPUs) with an MMU (Memory Management Unit) for memory protection and virtual memory. That's covered in the Embedded Linux course, not here.
Think Before You Code
Before writing a single line of code, take a step back. Good embedded software design starts with understanding the problem and planning the system behavior. Code should be a translation of logic, not a place to invent it as you go.
Starting Simple: Round-Robin (Super Loop)
In bare-metal systems—where no operating system is involved—the most common and intuitive structure is called the round-robin or super loop. Your program starts by initializing hardware, and then enters an infinite loop where it repeatedly checks sensors, processes data, and controls outputs.
Let’s start with a simple line detection example using a single front optocoupler sensor. The robot is programmed to detect when it crosses a dark line (black tape) using this sensor. The LED on the board will turn on whenever the black surface is detected.
import time
import machine

# Define onboard LED
onboard_led = machine.Pin("LED", machine.Pin.OUT)

# Use a single front-facing optocoupler sensor
tracking_rc = machine.Pin(4, machine.Pin.IN)  # Right-center/front sensor

OPTO_BLACK_SURFACE = 0  # non-reflective (black line)
OPTO_WHITE_SURFACE = 1  # reflective (floor)

def tracking_handler(pin=None):
    onboard_led.on()
    print(">>> Line detected!")

while True:
    print("Searching for line...")
    # Poll the sensor
    if tracking_rc.value() == OPTO_BLACK_SURFACE:
        tracking_handler()
    time.sleep(0.3)  # polling delay
    onboard_led.off()
In this example, we’re continuously polling the sensor inside a loop. This follows a simple embedded software pattern:
Read → Process → Write
- Read: Check the sensor state
- Process: If black surface is detected, call the handler
- Write: Turn on/off the LED or log output
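One way to make the Read → Process → Write split explicit is to give each phase its own function, so the decision logic can be exercised on a desktop without hardware. The sketch below is illustrative only: `process`, `loop_once`, `fake_samples`, and `led_log` are hypothetical names, and the made-up readings stand in for `tracking_rc.value()`.

```python
# Hypothetical sketch: one super-loop pass split into Read -> Process -> Write.
OPTO_BLACK_SURFACE = 0  # non-reflective (black line)
OPTO_WHITE_SURFACE = 1  # reflective (floor)

def process(sensor_value):
    """Pure decision step: return True when the LED should be on."""
    return sensor_value == OPTO_BLACK_SURFACE

def loop_once(read_sensor, write_led):
    """Run a single Read -> Process -> Write pass and return the LED state."""
    value = read_sensor()    # Read
    led_on = process(value)  # Process
    write_led(led_on)        # Write
    return led_on

# Desktop stand-ins for the hardware (assumptions, not Pico APIs)
led_log = []
fake_samples = iter([1, 1, 0, 1])  # made-up sensor readings
for _ in range(4):
    loop_once(lambda: next(fake_samples), led_log.append)

print(led_log)
```

On the Pico, the same `loop_once` could probably be called with `tracking_rc.value` as the reader and `onboard_led.value` as the writer, assuming the pin objects defined in the example above.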
Try it!
Try moving the robot forward and backward over the black line, first slowly and then quickly. You’ll notice:
- The onboard LED lights up when the sensor detects the line.
- The sensor’s own blue LED (hardware indicator) may light up immediately, while the onboard LED controlled by software is a bit slower to respond, especially at higher speeds.
This delay, \(Δt\), arises because the loop samples the sensor only once per iteration, so a change is noticed up to one polling interval after it happens. As a result, the robot can register the line at an inaccurate position. To improve accuracy, we should detect the edges of the line, that is, the transitions between high and low sensor values, as soon as they occur. Comparing each reading with the previous one during polling reveals those transitions, and a shorter polling interval makes the detected position more precise.
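Edge detection inside a polling loop boils down to remembering the previous sample. The following is a minimal desktop sketch of that idea, not code from the robot: `find_edges` is a hypothetical helper and `polled` is a made-up list of readings, one per poll.

```python
OPTO_BLACK_SURFACE = 0  # non-reflective (black line)
OPTO_WHITE_SURFACE = 1  # reflective (floor)

def find_edges(samples, initial=OPTO_WHITE_SURFACE):
    """Return (index, kind) for every change between consecutive polled samples."""
    edges = []
    previous = initial
    for i, current in enumerate(samples):
        if previous == OPTO_WHITE_SURFACE and current == OPTO_BLACK_SURFACE:
            edges.append((i, "enter_line"))  # falling edge: white -> black
        elif previous == OPTO_BLACK_SURFACE and current == OPTO_WHITE_SURFACE:
            edges.append((i, "leave_line"))  # rising edge: black -> white
        previous = current
    return edges

polled = [1, 1, 0, 0, 0, 1, 1]  # made-up readings, one per poll
print(find_edges(polled))
```

The position error of each detected edge is still bounded by the polling interval, because a transition can only be noticed at the next sample.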
However, if the robot moves fast enough that the duration of the low state is short enough to fall between polling intervals, it might miss the line detection entirely.
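A rough calculation shows when a miss becomes possible: the sensor sees the line for roughly the line width divided by the robot's speed, and if that time is shorter than the polling interval, the whole low pulse can fall between two samples. The tape width and speeds below are assumed values for illustration, and the helper names are hypothetical; the model also ignores the time the loop body itself takes.

```python
POLL_INTERVAL_S = 0.3  # matches time.sleep(0.3) in the polling loop
TAPE_WIDTH_M = 0.019   # assumed 19 mm wide tape

def time_on_line_s(line_width_m, speed_m_s):
    """How long the sensor sees the line when driving straight across it."""
    return line_width_m / speed_m_s

def can_miss_line(line_width_m, speed_m_s, poll_interval_s):
    """True if the low pulse is short enough to fit between two polls."""
    return time_on_line_s(line_width_m, speed_m_s) < poll_interval_s

def max_safe_speed_m_s(line_width_m, poll_interval_s):
    """Fastest speed at which at least one poll still lands on the line."""
    return line_width_m / poll_interval_s

for speed in (0.02, 0.05, 0.2):  # assumed speeds in m/s
    on_line_ms = time_on_line_s(TAPE_WIDTH_M, speed) * 1000
    missed = can_miss_line(TAPE_WIDTH_M, speed, POLL_INTERVAL_S)
    print(f"{speed} m/s: on line for {on_line_ms:.0f} ms, can miss: {missed}")

print(f"max safe speed: {max_safe_speed_m_s(TAPE_WIDTH_M, POLL_INTERVAL_S):.3f} m/s")
```

Under these assumptions, shortening the polling interval or slowing the robot are the only ways to make a pure polling design reliable.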
Responding in Real Time: Enter Interrupts
What if the robot could respond to the line detection the moment it happens, without waiting for the next round in the loop?
That's where interrupts come in.
Interrupts are like a doorbell for your microcontroller. You tell the hardware:
"If this input changes, ring this bell, and run this little function immediately."
That little function is called an Interrupt Service Routine (ISR).
Foreground/Background Architecture
When you combine a main loop with interrupts, you create what's called a Foreground/Background architecture:
- Background: The main while True loop that runs continuously (lower priority)
- Foreground: The ISRs that interrupt the background when events occur (higher priority)
This is the most common architecture in bare-metal embedded systems. The hardware handles the timing-critical work (foreground), while the main loop handles everything else (background).
In our case, instead of constantly checking the optocoupler sensor in a loop, we can ask the hardware to watch it for us, and call our line-detection function whenever the signal falls—which means the robot just crossed onto a black line.
Here’s the same line detection code, but now using an interrupt instead of polling:
import time
import machine

# Define onboard LED
onboard_led = machine.Pin("LED", machine.Pin.OUT)

# Use a single front-facing optocoupler sensor
tracking_rc = machine.Pin(4, machine.Pin.IN)

OPTO_BLACK_SURFACE = 0  # black line
OPTO_WHITE_SURFACE = 1  # white floor

def tracking_handler(pin=None):
    onboard_led.on()
    print(">>> Line detected!")

# Set up interrupt: trigger on falling edge (from 1 to 0)
tracking_rc.irq(handler=tracking_handler, trigger=machine.Pin.IRQ_FALLING)

while True:
    time.sleep(0.3)
    onboard_led.off()
    print("Searching for line...")
Try it!
Try moving the robot forward and backward over the black line, first slowly and then quickly. You’ll notice that the onboard LED now lights up at the same time as the blue LEDs.
How does it work?
We use the pin's irq() (interrupt request) method to register a callback function. When the trigger event occurs, the hardware raises an interrupt and the callback runs immediately as the Interrupt Service Routine (ISR).
Now, when the optocoupler changes its value, the tracking_handler() function runs right away—without polling. The robot no longer needs to constantly check for input; the hardware will notify it as soon as something happens.
Note that the ISR will be executed asynchronously from the main execution loop, as the interrupt can occur at any time. This asynchronous nature is why the ISR is not depicted as connected in the diagram.
❓Question:
What part of the code controls how long the onboard LED stays on, and why might the LED appear to stay on for shorter or longer periods?
Tip
The green onboard LED is switched on inside the interrupt callback and turned off in the main loop. Because the ISR (Interrupt Service Routine) runs asynchronously, the exact moment the green LED turns on depends on when the interrupt occurs during the 0.3-second sleep interval.
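The arithmetic behind this tip can be sketched in a few lines. This is a simplified model that assumes the loop body itself takes no time; `SLEEP_MS` mirrors the `time.sleep(0.3)` call, and `led_on_time_ms` is a hypothetical helper.

```python
SLEEP_MS = 300  # mirrors time.sleep(0.3) in the main loop

def led_on_time_ms(interrupt_offset_ms):
    """LED on-time when the interrupt fires this many ms into the sleep."""
    if not 0 <= interrupt_offset_ms <= SLEEP_MS:
        raise ValueError("offset must lie within one sleep interval")
    # The LED stays on until the sleep ends and the loop switches it off.
    return SLEEP_MS - interrupt_offset_ms

for offset in (10, 150, 290):
    on_time = led_on_time_ms(offset)
    print(f"interrupt at {offset} ms -> LED on for about {on_time} ms")
```

An interrupt that arrives just before the sleep ends produces only a brief flash, while one that arrives right after the LED was switched off keeps it lit for almost the whole interval.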
Why Interrupts matter
Interrupts solve several critical problems in embedded systems:
- Immediate response: The ISR is called instantly when the sensor sees a black surface.
- Missed events: No more losing input between polls
- Faster response time: ISRs are triggered within microseconds
- Lower CPU usage: The processor can sleep instead of looping endlessly
- Better scalability: You don’t need to check dozens of inputs every loop
This is the power of interrupts: hardware-driven, real-time response.
| Feature | Round-Robin | Round-Robin + Interrupts |
|---|---|---|
| Simplicity | ✅ Very Simple | ✅ Simple, a bit more complex |
| Responsiveness | ❌ Poor (polling) | ✅ Good |
| Use of Hardware Features | ❌ None | ✅ Timers, GPIO interrupts, etc. |
| Timing Precision | ❌ Low | ✅ Medium (depends on ISR timing) |
| Response Time | Depends on loop timing and polling delay | Near-instantaneous (low interrupt latency) |
| Input Handling | Polling-based, may miss fast changes | Event-driven, reacts immediately |
| CPU Usage | High — constant polling keeps CPU always active | Low — CPU can enter sleep modes between interrupt events |
| Complexity | Low — easy to understand and implement | Moderate — requires managing ISRs and shared resources |
| Risk of Missed Input | High for short pulses | Very low if ISR is short and reliable |
| Ideal For | Simple applications, education, basic systems | Responsive tasks, real-time inputs, power-sensitive designs |
Important
But there's a catch: ISRs must be short and simple. You shouldn't do complex processing inside an interrupt. Instead, set a flag or update a variable, and let the main loop handle the heavy lifting.
Best Practice: Flag-Based Event Handling
The golden rule for ISRs is: get in, set a flag, get out. Here's why and how:
import machine

# Event flag - shared between ISR and main loop
line_detected = False

def tracking_handler(pin):
    global line_detected
    line_detected = True  # Just set the flag, nothing else!

# Set up interrupt
tracking_rc = machine.Pin(4, machine.Pin.IN)
tracking_rc.irq(handler=tracking_handler, trigger=machine.Pin.IRQ_FALLING)

while True:
    if line_detected:
        line_detected = False  # Clear the flag
        # Now do the heavy work in the main loop
        print("Line detected! Processing...")
        # Complex calculations, motor control, etc.
    # Other background tasks...
This pattern separates event detection (ISR) from event processing (main loop):
| In the ISR (Foreground) | In the Main Loop (Background) |
|---|---|
| Set flags | Check and clear flags |
| Update counters | Process the events |
| Store timestamps | Control motors |
| Keep it fast! | Handle complex logic |
Why Keep ISRs Short?
- While an ISR runs, other interrupts may be blocked
- Long ISRs cause missed events and timing problems
- The main loop freezes until the ISR completes
- Rule of thumb: ISRs should complete in microseconds, not milliseconds
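A boolean flag records that something happened, but not how often. When several interrupts can arrive before the main loop gets around to them, a counter keeps the ISR just as short without collapsing the events into one. The sketch below simulates this on a desktop by calling the handler directly; `EventCounter` is a hypothetical class, not part of MicroPython.

```python
class EventCounter:
    """Counts events in the 'ISR' and hands them to the main loop in one step."""

    def __init__(self):
        self.pending = 0

    def isr(self, pin=None):
        # Foreground: keep it tiny, just count the event
        self.pending += 1

    def take(self):
        # Background: read and clear together.
        # On real hardware this pair should not be interrupted; MicroPython
        # provides machine.disable_irq()/machine.enable_irq() for that purpose.
        count = self.pending
        self.pending = 0
        return count

events = EventCounter()
flag = False

# Simulate three interrupts arriving while the main loop is busy
for _ in range(3):
    events.isr()
    flag = True

print("flag sees:", 1 if flag else 0, "event(s)")
print("counter sees:", events.take(), "event(s)")
```

The counter costs one extra addition in the ISR, which is usually an acceptable price when every event matters.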
But Wait—Now We Have a New Problem
Let’s say detecting the line with two optocouplers should change the robot’s direction. You now need to keep track of which mode you’re in and respond accordingly.
In small programs, developers often resort to a mix of global variables, nested if statements, and switch-case chains. But this quickly leads to messy, hard-to-maintain code.
What we need is a better way to manage the robot’s behavior. A way to model its modes, and define how it should transition between them in response to events.
Finite State Machines (FSMs)
A Finite State Machine is a model that represents all the modes (or "states") your system can be in, and defines how it transitions between them.
Instead of asking “What do I do now?”, your code starts by asking:
“What state am I in?”
From there, it decides what to do next based on the current state and the incoming event.
For example, in an FSM, you might define:
- State: STOPPED
  - On button press → Transition to MOVING
- State: MOVING
  - On button press → Transition to STOPPED
This structure is predictable, traceable, and easy to extend. If you later want to add a third state (like "PAUSED"), it slots naturally into the design.
You can even model it visually using UML state diagrams, which help you understand complex behaviors at a glance.
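The STOPPED/MOVING example above can be written as a transition table. This is a minimal sketch of one common way to code an FSM, not the only one; the names `TRANSITIONS`, `next_state`, and `BUTTON_PRESS` are assumptions made for illustration.

```python
STOPPED = "STOPPED"
MOVING = "MOVING"
BUTTON_PRESS = "BUTTON_PRESS"

# (current state, event) -> next state
TRANSITIONS = {
    (STOPPED, BUTTON_PRESS): MOVING,
    (MOVING, BUTTON_PRESS): STOPPED,
}

def next_state(state, event):
    """Look up the transition; unknown pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = STOPPED
history = [state]
for event in (BUTTON_PRESS, BUTTON_PRESS, "UNKNOWN_EVENT"):
    state = next_state(state, event)
    history.append(state)

print(history)
```

Adding a PAUSED state would then mean adding rows to `TRANSITIONS` rather than threading new `if` branches through the main loop.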
Choosing the Right Architecture
Here's a quick decision guide to help you choose the appropriate architecture for your project:
| Your Situation | Recommended Architecture |
|---|---|
| Learning the basics, simple blinking LED | Round-Robin |
| Need to respond to button presses or sensors | Round-Robin + Interrupts |
| Multiple inputs that can't be missed | Foreground/Background with flags |
| Robot has different modes (stopped, moving, turning) | State Machine |
| Complex behavior with many states and transitions | State Machine + Interrupts |
| Need precise timing for multiple tasks | Consider RTOS |
Start Simple
Always start with the simplest architecture that works. You can always add complexity later. A working simple solution is better than a broken complex one.
The Bare-Metal Advantage
Why learn bare-metal programming when operating systems exist?
- Understanding: You learn how things really work at the hardware level
- Control: Complete control over timing and resources
- Efficiency: No OS overhead—your code runs faster with less memory
- Predictability: You know exactly what your code does and when
- Foundation: Essential knowledge before moving to RTOS or complex systems
Real-World Examples
| Application | Why Bare-Metal? | Trade-off |
|---|---|---|
| TV Remote Control | Runs on tiny battery for years, needs instant button response | Simple functionality, no complex features |
| Motor Controller | Microsecond-precise PWM timing for smooth motor control | Single dedicated task, can't multitask easily |
| ECG Heart Monitor | Predictable sampling rate critical for accurate readings | Requires careful timing design |
| Car Airbag System | Must deploy in <15ms, no time for OS overhead | Safety-critical, extensively tested |
| LED Light Bulb | Costs must be minimal, runs on cheapest possible chip | Limited memory, basic features only |
| Industrial Sensor | Runs for 10+ years on battery, sleeps 99% of time | Complex networking needs external modules |
When Bare-Metal Makes Sense
✅ Use bare-metal when:
- Cost per unit is critical (mass production)
- Battery life is paramount
- Timing must be predictable and fast
- The system does one thing well
- Memory is extremely limited (< 32KB RAM)
❌ Consider an RTOS when:
- You need to run multiple independent tasks
- Communication stacks are complex (TCP/IP, Bluetooth)
- Tasks have different timing requirements
- Code maintainability is more important than raw performance
As an electrical engineer, understanding bare-metal firmware gives you insight into how hardware and software interact—knowledge that's valuable whether you're debugging a circuit, optimizing power consumption, or designing a new system.
Professional Context: Industrial & Automotive Architectures
Your super loop and state machine work for a robot. Professional systems use standardized, layered architectures designed for safety, certification, and team collaboration across multiple suppliers. Here's how they compare:
Architecture Comparison
| Feature | Bare-Metal (yours) | Industrial PLC | AUTOSAR Classic | Safety-Critical |
|---|---|---|---|---|
| Structure | Super loop + ISR | Cyclic tasks | Layered components | Partitioned, isolated |
| Scheduling | Manual / none | Cyclic, priority | OS tasks + runnables | Time-partitioned |
| Configuration | Code only | IDE/ladder logic | XML (ARXML) | Formal specification |
| Code reuse | Copy-paste | Function blocks | Standardized interfaces | Certified libraries |
| Multi-vendor | No standard | IEC 61131-3 | AUTOSAR standard | DO-178C/ISO 26262 |
| Verification | Manual test | PLC simulation | Config validation | Formal methods |
| Safety | None | PLe/SIL 3 optional | ASIL-D capable | Required, certified |
| Development | 1 developer | Team | Multiple suppliers | Certified process |
AUTOSAR: Automotive Software Architecture
Modern cars don't use super loops. They use AUTOSAR—a standardized architecture for automotive ECUs:
Your robot (monolithic):
main.py
└── Everything in one file
    ├── Sensor reading
    ├── Motor control
    ├── State machine
    └── Display update
AUTOSAR Classic (layered):
┌─────────────────────────────────────────────────────────┐
│ Application Layer (SWC) │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Sensor │ │ Motor │ │ Diag │ ← Your logic │
│ │ SWC │ │ SWC │ │ SWC │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
├────────┼────────────┼───────────┼──────────────────────┤
│ └────────────┴───────────┘ │
│ RTE (Runtime Environment) │
│ ← Virtual bus, generated code │
├─────────────────────────────────────────────────────────┤
│ Basic Software (BSW) │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ COM │ │ NvM │ │ Dem │ │ Os │ │
│ │ (Comm) │ │(Memory)│ │(Diag) │ │(Sched) │ │
│ └────────┘ └────────┘ └────────┘ └────────┘ │
├─────────────────────────────────────────────────────────┤
│ MCAL (Microcontroller Abstraction) │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ DIO │ │ ADC │ │ PWM │ │ CAN │ │
│ └────────┘ └────────┘ └────────┘ └────────┘ │
└─────────────────────────────────────────────────────────┘
Hardware
Key benefits:
- SWC (Software Component) from Supplier A
- BSW (Basic Software) from Supplier B
- MCAL from chip vendor
- All integrate via standardized interfaces (RTE)
Safety Architecture: Freedom From Interference
Safety-critical systems must prove components don't interfere:
Your robot:
Motor code bug → robot crashes → annoying
No isolation:
┌─────────────────────────────────┐
│ Motor │ Sensor │ Display │ ... │ ← All share memory
└─────────────────────────────────┘
Bug in display code could corrupt motor variables!
Automotive ASIL-D (ISO 26262):
Must prove: Steering software unaffected by infotainment bugs
Memory protection:
┌──────────────────┐ ┌──────────────────┐
│ ASIL-D Partition │ │ QM Partition │
│ (Safety-critical)│ │ (Non-safety) │
│ ├── Steering │ │ ├── Radio │
│ ├── Braking │ │ ├── Display │
│ └── Airbag │ │ └── Navigation │
└──────────────────┘ └──────────────────┘
│ │
└──────┬───────────────┘
│
Hardware Memory Protection Unit (MPU)
Infotainment crash → only infotainment restarts
Steering keeps working (proven by analysis)
Aerospace: Time Partitioning (ARINC 653)
Aircraft systems use strict time separation:
Your robot (cooperative):
while True:
    read_sensors()   # Takes unknown time
    update_motors()  # Could be delayed
    # No guarantees!
ARINC 653 (time-partitioned):
┌────────────────────────────────────────────────────┐
│ Time │ 0ms 10ms 20ms 30ms 40ms 50ms ... │
├──────────┼─────────────────────────────────────────┤
│ Flight │ ████ ████ ████ │
│ Control │ │
├──────────┼─────────────────────────────────────────┤
│ Engine │ ████ ████ ████ │
│ Monitor │ │
├──────────┼─────────────────────────────────────────┤
│ Display │ ██ ██ ██ │
│ │ │
└──────────┴─────────────────────────────────────────┘
Each partition:
- Gets guaranteed time slice (can't be stolen)
- Isolated memory (can't corrupt others)
- Independent failure (one crash doesn't affect others)
Flight control ALWAYS runs at 0ms, 30ms, 60ms...
Even if display partition crashes
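The idea of a fixed time-partition schedule can be illustrated with a small lookup table. This is only a conceptual sketch: the 30 ms major frame and the three 10 ms windows are assumptions chosen to match "0ms, 30ms, 60ms..." above, and in a real ARINC 653 system the operating system enforces the windows, not application code.

```python
MAJOR_FRAME_MS = 30  # assumed length of one repeating schedule

# (window start, window length, partition) within one major frame
SCHEDULE = [
    (0, 10, "flight_control"),
    (10, 10, "engine_monitor"),
    (20, 10, "display"),
]

def partition_at(t_ms):
    """Return the partition that owns the CPU at absolute time t_ms."""
    offset = t_ms % MAJOR_FRAME_MS
    for start, length, name in SCHEDULE:
        if start <= offset < start + length:
            return name
    return None  # idle gap (this table has none)

for t in (0, 12, 25, 30, 60):
    print(t, "ms ->", partition_at(t))
```

Because the table depends only on time, nothing the display partition does can shift the moment at which flight control next receives the CPU.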
Model-Based Development
Professional systems are designed in models, not just code:
Your approach:
1. Think about behavior
2. Write Python code
3. Test manually
4. Debug → modify code → repeat
Model-Based Development (Simulink/Stateflow):
1. Design in graphical model
2. Simulate behavior (no hardware needed)
3. Generate C code automatically
4. Code matches model (by construction)
┌─────────────────────────────────────────────────┐
│ Simulink Model │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │Sensor├───►│Control├───►│Motor│ │
│ └─────┘ └─────┘ └─────┘ │
│ ← Click to simulate │
└─────────────────────────────────────────────────┘
│
▼ Code Generation
┌─────────────────────────────────────────────────┐
│ /* Auto-generated - DO NOT EDIT */ │
│ void control_step(void) { │
│ rtY.motor = rtP.gain * rtU.sensor; │
│ } │
└─────────────────────────────────────────────────┘
Benefits:
- Simulate before building hardware
- Code guaranteed correct by construction
- Certified code generators (DO-178C, ISO 26262)
- Model IS the documentation
Industrial PLC Architecture (IEC 61131-3)
Factory automation uses standardized programming:
Your robot:
while True:
    if button and not running:
        start_motor()
        running = True
    # ... more spaghetti
IEC 61131-3 PLC (Structured Text / Function Blocks):
┌─────────────────────────────────────────────────┐
│ Program Organization Units (POUs) │
├─────────────────────────────────────────────────┤
│ PROGRAM MainControl │
│ VAR │
│ StartButton: BOOL; │
│ Motor: FB_Motor; │
│ Sequence: FB_Sequence; │
│ END_VAR │
│ │
│ (* Cyclic execution every 10ms *) │
│ Sequence(Start := StartButton, │
│ Motor := Motor); │
│ END_PROGRAM │
└─────────────────────────────────────────────────┘
Task configuration:
├── Task_Fast (1ms cycle) ─── Motion control
├── Task_Normal (10ms cycle) ─── Logic, I/O
└── Task_Slow (100ms cycle) ─── HMI, logging
Key differences from your code:
- Deterministic cycle times (guaranteed)
- Standardized function blocks (reusable)
- Visual programming available (Ladder, FBD, SFC)
- Built-in I/O mapping, diagnostics
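The multi-rate task configuration above can be mimicked with a tick counter. The sketch below only simulates which task would be released at each 1 ms tick; it gives none of the timing guarantees of a real PLC runtime, and `TASKS` and `due_tasks` are hypothetical names.

```python
# (task name, cycle time in ms), taken from the task configuration above
TASKS = [
    ("Task_Fast", 1),
    ("Task_Normal", 10),
    ("Task_Slow", 100),
]

def due_tasks(tick_ms):
    """Names of the tasks whose cycle starts at this 1 ms tick."""
    return [name for name, period_ms in TASKS if tick_ms % period_ms == 0]

run_counts = {name: 0 for name, _ in TASKS}
for tick in range(200):  # simulate 200 ms
    for name in due_tasks(tick):
        run_counts[name] += 1

print(run_counts)
```

A real runtime additionally checks that each task finishes within its cycle and reports an overrun if it does not, which is where the determinism comes from.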
Mixed-Criticality Systems
Modern systems combine different safety levels:
Your robot: Everything same priority, same reliability needs
Real system (car door ECU):
┌────────────────────────────────────────────────┐
│ Component │ ASIL │ Consequence │
├─────────────────────┼──────┼───────────────────┤
│ Child lock control │ ASIL-B │ Child injury │
│ Window pinch detect│ ASIL-B │ Finger injury │
│ Mirror adjustment │ QM │ Inconvenience │
│ Ambient lighting │ QM │ Annoying │
└────────────────────────────────────────────────┘
Challenge: Run all on ONE microcontroller
Solution: Software partitioning + MPU + certified OS
ASIL-B code: Certified compiler, full coverage, reviewed
QM code: Normal development process
Both coexist safely (proven by architecture analysis)
Development Process Comparison
| Aspect | Your Project | Industrial | Automotive (ASPICE) | Aerospace (DO-178C) |
|---|---|---|---|---|
| Requirements | "Make it work" | Documented | Formal, traced | Formal, verified |
| Design | In your head | Diagrams | ARXML + SysML | Formal models |
| Coding | Just write it | Standards | MISRA-C, AUTOSAR | Certified subset |
| Testing | Manual | Automated | Coverage targets | MC/DC 100% |
| Review | Maybe yourself | Peer review | Independent review | Independent team |
| Traceability | None | Partial | Full | Bidirectional |
| Change control | Git commit | Tickets | Impact analysis | Formal process |
What the Industry Uses
| Manufacturer | Product | Application |
|---|---|---|
| Vector | MICROSAR | AUTOSAR basic software |
| ETAS | ISOLAR/RTA | AUTOSAR tools and OS |
| dSPACE | TargetLink | Production code generation |
| MathWorks | Embedded Coder | Simulink to C code |
| Wind River | VxWorks | Safety-critical RTOS |
| Green Hills | INTEGRITY | Separation kernel, DO-178C |
| Siemens | TIA Portal | PLC engineering |
| Elektrobit | EB tresos | Automotive software platform |
| ANSYS | SCADE | Certified model-based dev |
Hardware Limits Principle
What Software Can and Cannot Fix
Software CAN improve:
- Code organization → layers, modules, components
- Maintainability → separation of concerns, interfaces
- Reusability → standardized APIs, function blocks
- Testability → modular design, dependency injection
- Team collaboration → clear interfaces between components
Software CANNOT fix:
- No memory protection → MPU/MMU hardware required for isolation
- No deterministic timing → need RTOS or time-partitioned OS
- No formal verification → requires specialized tools and process
- No safety certification → requires qualified tools, process evidence
- Python GC pauses → use C/C++ for real-time guarantees
- Single point of failure → need hardware redundancy
The lesson: A well-structured super loop can handle a robot. A car with 100+ ECUs from 50+ suppliers needs AUTOSAR standardization. An aircraft with DO-178C Level A certification needs proven separation, certified tools, and formal methods. The principles (modularity, separation) are the same—but the rigor, tooling, and hardware support are completely different.
Real Example: Adding a Feature
| Task | Your Robot | AUTOSAR Vehicle |
|---|---|---|
| "Add temperature monitoring" | 10 lines of code, done | |
| Requirements | None | Formal requirement in DOORS |
| Architecture | Add to main.py | New SWC, update ARXML |
| Interfaces | Direct function call | RTE port definition |
| Implementation | Write Python | Generate code + implement |
| Testing | Try it | Unit test + integration test |
| Review | Self | Peer review + safety review |
| Documentation | Comment maybe | Updated design docs |
| Time | 30 minutes | 2-4 weeks |
Your 30-minute change is fine for a robot. The 2-4 week process ensures that a car's temperature monitoring won't ever interfere with braking or steering, in any condition, for 15 years.
Further Reading
- Industrial Architectures - Detailed AUTOSAR, ARINC 653, IEC 61131-3 reference
- State Machines - Structured behavior modeling
- Interrupts - Event-driven programming
- RTOS Introduction - When to use an OS
- AUTOSAR Overview - Automotive architecture standard
- IEC 61131-3 - Industrial PLC programming standard
- DO-178C Overview - Aerospace software certification