Lesson 8: Boot Flow & Architectures
Why does a Pi boot in 35 seconds but an STM32 in 50 ms?
Óbuda University — Linux in Embedded Systems
Today's Map
- Block 1 (45 min): Boot flow: five boot stages, dmesg walkthrough, systemd-analyze, boot optimization, debugging by stage.
- Block 2 (45 min): Boot architectures: MCU vs SoC vs PC, platform boot flows, secure boot chains, heterogeneous SoCs, design exercise.
Problem First
Your device boots in 35 seconds and occasionally hangs before the app starts.
You need to answer two questions:
- Which stage is slow?
- Which stage failed?
Without a staged boot model, this is mostly guesswork. You end up staring at a blank screen wondering whether the bootloader, kernel, or your application is the problem.
Block 1
Boot Stages, dmesg, systemd-analyze
Boot Is a Pipeline, Not a Single Step
An embedded Linux system boots in stages, each handing off to the next:
+----------+ +------------+ +--------+ +---------+ +-------+
| ROM Boot | -> | Bootloader | -> | Kernel | -> | systemd | -> | App |
| (SoC) | | (U-Boot) | | (Linux)| | (init) | | |
+----------+ +------------+ +--------+ +---------+ +-------+
~0.5s ~1-2s ~2-3s ~5-20s ~1-5s
*Times shown are typical for RPi 4 with stock Raspberry Pi OS.*
Each stage has its own failure modes, its own debugging tools, and its own optimization opportunities.
The Five Boot Stages
| # | Stage | What It Does |
|---|---|---|
| 1 | ROM / SoC boot | Hardwired in silicon; finds bootloader on storage |
| 2 | Bootloader (U-Boot) | Initializes DRAM, clocks, storage; loads kernel |
| 3 | Linux kernel | Probes HW via device tree, loads drivers, mounts rootfs |
| 4 | Init system (systemd) | Starts services in dependency order |
| 5 | User application | Your product code begins running |

The five boot stages from power-on to application: ROM, bootloader, kernel, init system, and user application.
Stage 1 — ROM Boot Code
- Hardcoded in silicon at the factory — you cannot change it
- Reads boot pins or fuses to determine boot source (SD, eMMC, USB, NAND)
- Loads the first external code (bootloader) into internal SRAM
- Duration: < 1 second typically
If this fails: nothing appears on any console. The board looks dead. Check power, boot pin configuration, and storage media.
Stage 2 — Bootloader (U-Boot)
The bootloader does the minimum to load the kernel:
- Initialize DRAM (the kernel needs memory to decompress into)
- Initialize clocks and basic storage (SD/eMMC/NAND)
- Load the kernel image + device tree into DRAM
- Jump to the kernel entry point
If this fails: you see partial output on the serial console, then silence. Typical causes: wrong DRAM timing, corrupt kernel image, wrong boot device.
Stage 3 — Linux Kernel
The kernel takes over and initializes the full hardware platform:
- Decompresses itself into memory
- Parses the device tree to discover hardware
- Probes drivers in dependency order
- Mounts the root filesystem
- Launches PID 1 (the init system)
If this fails: the kernel log records it. Read it on the serial console if boot never reaches a shell, or with dmesg once the system is up. Look for missing driver probes, device tree errors, or filesystem mount failures.
Stages 4 & 5 — Init System and Application
Stage 4 — systemd starts user-space services in dependency order:
- Networking, logging, D-Bus, time sync, udev
- Most boot time lives here — stock Pi OS spends 10-20 s in this stage
- Debug with: journalctl -b and systemctl status <service>
Stage 5 — Your application finally runs:
- All hardware initialized, all services available
- Design goal: start as soon as dependencies are met, not after everything else
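One way to act on the Stage 5 design goal is to keep the unit's ordering dependencies minimal. The sketch below prints a systemd unit for a hypothetical data-logger application; the binary path /usr/local/bin/data-logger and the description text are assumptions made for illustration.

```shell
# print_unit: emit a minimal service unit on stdout.
# Assumption: the application binary lives at /usr/local/bin/data-logger.
print_unit() {
    cat <<'EOF'
[Unit]
Description=Data logger (hypothetical example)
# Deliberately no After=network-online.target or similar:
# the app should not wait for services it does not use.

[Service]
ExecStart=/usr/local/bin/data-logger
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

A possible use is `print_unit | sudo tee /etc/systemd/system/data-logger.service` followed by `sudo systemctl enable data-logger.service`. Any ordering the application really needs would be added as explicit After= lines.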
Where Hardware Is Initialized
Each stage initializes only what the next stage needs:
| Stage | Initializes | For Whom |
|---|---|---|
| Bootloader | Clocks, DRAM, storage basics | Kernel |
| Kernel | Drivers, device tree probing, modules | User space |
| User space | Service config, app-level HW policy | Application |
Anti-pattern: putting I2C sensor init in the bootloader (kernel's job).
Anti-pattern: deciding which sensor readings to log in a kernel driver (application's job).
Live Demo: dmesg Walkthrough
dmesg shows every kernel message since boot with timestamps:
[ 0.000000] Booting Linux on physical CPU 0x0 <- kernel starts
[ 0.000000] Machine model: Raspberry Pi 4 B <- DT identified board
[ 0.524173] spi-bcm2835 fe204000.spi: CS0 <- SPI init
[ 1.023456] i2c_dev: i2c /dev entries driver <- I2C ready
[ 1.245678] EXT4-fs (mmcblk0p2): mounted <- rootfs mounted
Timestamps = seconds since kernel start. Gaps reveal where time is spent.
| Pattern in dmesg | Meaning |
|---|---|
| Large timestamp gap | A driver or subsystem is slow |
| error / timeout | Something went wrong |
| deferred | Driver waiting for a dependency |
Try It Now: Read the Boot Log (5 min)
Inspect your Pi's boot log and find where the kernel, init, and application stages begin, using real timestamps (ROM and bootloader finish before the kernel log starts):
# First 30 lines — find when the kernel started
dmesg | head -30
# Search for key milestones
dmesg | grep -i "kernel command line"
dmesg | grep -i "mounted"
dmesg | grep -i "systemd"
# List the first 50 timestamps and look for the largest gap (slow driver or subsystem)
dmesg | head -50 | cut -d']' -f1 | tr -d '[ '
Which stage takes the longest? Where is the biggest gap?
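Scanning timestamps by eye gets tedious on a long log. As a rough sketch, the function below reads dmesg-style lines on stdin and reports the largest jump between consecutive timestamps together with the line that follows it. The name largest_gap is made up for this example, and it assumes every relevant line starts with a bracketed seconds value as in plain dmesg output.

```shell
# largest_gap: read dmesg-style lines on stdin and print the biggest jump
# between consecutive timestamps, plus the line that comes after the jump.
# Assumes lines start with "[<seconds>]"; other lines are ignored.
largest_gap() {
    awk '
        /^\[ *[0-9]+\.[0-9]+\]/ {
            ts = $0
            sub(/^\[ */, "", ts)      # drop the opening bracket and padding
            sub(/\].*$/, "", ts)      # drop everything after the timestamp
            t = ts + 0
            if (seen && t - prev > gap) { gap = t - prev; line = $0 }
            prev = t; seen = 1
        }
        END { if (line != "") printf "%.3f s before: %s\n", gap, line }
    '
}
```

Typical use would be `dmesg | largest_gap`. The line it prints is the first message after the quiet period, so the slow component is usually whatever logged just before it.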
Tutorial: Boot Timing Lab, Section 1: Measure Boot Time
Theory: Section 1: Boot Stages
Live Demo: systemd-analyze
$ systemd-analyze time
Startup finished in 1.512s (kernel) + 12.345s (userspace) = 13.857s
graphical.target reached after 12.100s in userspace
This output shows graphical.target, a desktop target. Embedded systems typically use multi-user.target (no GUI) or a custom target, which boots faster.
$ systemd-analyze blame
5.012s apt-daily.service
3.456s NetworkManager-wait-online.service
1.234s dev-mmcblk0p2.device
0.987s bluetooth.service
0.543s avahi-daemon.service
Kernel: 1.5 s. Userspace: 12.3 s. The bottleneck is userspace. systemd-analyze blame lists services slowest first; disable what you do not need.
Try It Now: Find the Bottleneck (5 min)
Use systemd-analyze to find what is slowing down your boot:
# Overall boot time breakdown
systemd-analyze time
# Top 10 slowest services
systemd-analyze blame | head -10
# Critical path — what actually blocked boot
systemd-analyze critical-chain
Which service takes the most time? Could you disable it on an embedded device?
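To narrow the blame list to the services worth a closer look, a small filter helps. The sketch below is one possible approach: slow_units is a made-up name, and it assumes blame lines of the form "<time> <unit>" where the time is written with min, s, or ms suffixes (for example 1min 2.500s, 5.012s, or 543ms). Longer units such as hours are not handled.

```shell
# slow_units: read `systemd-analyze blame` lines on stdin and print the
# units whose start time is at least the given number of seconds.
# $1 = threshold in seconds (default 1)
slow_units() {
    awk -v limit="${1:-1}" '
        {
            secs = 0
            for (i = 1; i < NF; i++) {       # last field is the unit name
                v = $i + 0                    # numeric prefix of the token
                if      ($i ~ /min$/) secs += v * 60
                else if ($i ~ /ms$/)  secs += v / 1000
                else if ($i ~ /s$/)   secs += v
            }
            if (secs >= limit) printf "%.3f %s\n", secs, $NF
        }
    '
}
```

For example, `systemd-analyze blame | slow_units 2` would keep only units that took two seconds or more. Blame times can overlap because services start in parallel, so the critical-chain output remains the better guide to what actually delayed boot.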
Tutorial: Boot Timing Lab, Section 2: Analyze Services
Theory: Section 3: systemd and Init
Boot Time Optimization Strategy
Principle: measure first, optimize second. Never guess.
| Optimization | Time Saved | Effort |
|---|---|---|
| Disable Bluetooth service | ~0.5 s | Low |
| Disable avahi-daemon (mDNS) | ~1.0 s | Low |
| Remove desktop/GUI packages | ~5-15 s | Medium |
| Use Buildroot instead of stock OS | ~10-25 s | High |
| Kernel: disable unused drivers | ~1-3 s | High |
Stock Pi OS: 15-35 s. Tuned Buildroot: 3-10 s.
Additional strategies: parallelize non-dependent services, defer non-critical init, avoid blocking on optional devices.
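The low-effort rows of the table can be scripted. The helper below is a cautious sketch rather than a recommended tool: trim_services and the RUN variable are names invented here, and by default it only prints the commands so that nothing changes by accident.

```shell
# trim_services: print the commands that would disable the given units.
# Set RUN=1 in the environment to execute them instead (needs root).
trim_services() {
    for unit in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            systemctl disable --now "$unit"
        else
            echo "systemctl disable --now $unit"
        fi
    done
}
```

Calling `trim_services bluetooth.service avahi-daemon.service` shows what would be run. After applying changes, measure again with systemd-analyze time to confirm the saving.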
Debugging by Stage — Which Tool?
+------------------+---------------------------+
| Stage | Primary Tool |
+------------------+---------------------------+
| ROM / Bootloader | Serial console |
| | (nothing else is running) |
+------------------+---------------------------+
| Kernel | dmesg |
| | (kernel ring buffer) |
+------------------+---------------------------+
| User space | systemd-analyze |
| | journalctl -b |
+------------------+---------------------------+
Wrong tool = wasted time. If the bootloader failed, dmesg shows nothing — the kernel never ran.
Mini Exercise: Label the Boot Log
Which stage produced each line? Suggest one optimization.
[ 0.000000] Booting Linux on physical CPU 0x0 <- ???
[ 0.821432] i2c_dev: i2c /dev entries driver <- ???
[ 1.245678] EXT4-fs (mmcblk0p2): mounted filesystem <- ???
[ 12.345678] systemd[1]: Started Avahi mDNS/DNS-SD <- ???
[ 13.456789] systemd[1]: Started data-logger.service <- ???
[ 15.000000] data-logger: first measurement recorded <- ???
Mini Exercise: Answer
[ 0.000000] Booting Linux on physical CPU 0x0 <- KERNEL
[ 0.821432] i2c_dev: i2c /dev entries driver <- KERNEL
[ 1.245678] EXT4-fs (mmcblk0p2): mounted filesystem <- KERNEL
[ 12.345678] systemd[1]: Started Avahi mDNS/DNS-SD <- INIT
[ 13.456789] systemd[1]: Started data-logger.service <- INIT
[ 15.000000] data-logger: first measurement recorded <- APP
Optimization: disable Avahi (saves ~1 s). The 11 s gap between rootfs mount and Avahi indicates heavy init work — investigate with systemd-analyze blame.
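The labeling in this exercise follows a simple rule of thumb that can be written down as a filter. The sketch below is only a heuristic: label_stage is a made-up name, it treats any line mentioning systemd[ as init, any line carrying the application's own log tag as app, and everything else as kernel. The tag (data-logger here) has to be supplied by the caller.

```shell
# label_stage: prefix each log line on stdin with a guessed boot stage.
# $1 = application log tag, for example "data-logger"
label_stage() {
    awk -v app="$1" '
        index($0, "systemd[") > 0            { print "INIT   " $0; next }
        app != "" && index($0, app ":") > 0  { print "APP    " $0; next }
                                             { print "KERNEL " $0 }
    '
}
```

On a real system this could be fed from `dmesg | label_stage data-logger`. Lines from the ROM and bootloader never show up, since they run before the kernel log exists.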
Block 1 Key Takeaways
- Boot is a five-stage pipeline: ROM, bootloader, kernel, init, app
- Each stage has different failure modes and tools
- Most time is spent in the init/systemd stage
- Measure first with dmesg and systemd-analyze, then optimize
- Stock OS: 15-35 s. Minimal Buildroot: 3-10 s.
Block 2
Boot Architectures — MCU vs SoC vs PC
The Core Question
| Platform | Boot Time | Runs Linux? |
|---|---|---|
| STM32 (Cortex-M) | < 100 ms | No |
| STM32MP1 (A7+M4) | 5-10 s | Yes (A7 core) |
| Raspberry Pi 4 | 15-35 s | Yes |
| PC (x86-64) | 10-30 s | Yes |
Why the 700x difference between STM32 and Pi?
Not just "more software." It is a fundamentally different boot architecture driven by the presence of MMU, filesystem, and trust chain.
Team Activity: Draw the Boot Flow
Each team picks one platform (STM32 / RPi4 / PC / STM32MP1).
Task (5 min): Draw the boot flow from power-on to application-ready.
- What stages exist?
- Where does each stage live (ROM, flash, SD card)?
- What does each stage initialize?
- What is the handoff mechanism?
Compare your drawings. Where are the differences?
Generic Boot Stage Model (BL0-BL6)
All platforms follow staged boot. A unified naming helps compare:
+------+    +------+    +--------+    +------+    +--------+    +-------+    +-----+
| BL0  | -> | BL1  | -> | BL2    | -> | BL3  | -> | BL4    | -> | BL5   | -> | BL6 |
| Boot |    | FSBL |    | Secure |    | SSBL |    | Kernel |    | Init  |    | App |
| ROM  |    | DRAM |    | FW     |    |U-Boot|    | Linux  |    |systemd|    |     |
+------+    +------+    +--------+    +------+    +--------+    +-------+    +-----+
| Stage | Name | Role |
|---|---|---|
| BL0 | Boot ROM | Hardcoded in silicon, loads first external code |
| BL1 | FSBL | Initializes DRAM, clocks, loads next stage |
| BL2 | Secure FW | TF-A / TF-M / UEFI Secure Phase |
| BL3 | SSBL | U-Boot / GRUB / UEFI Boot Manager |
Not every platform uses all stages. An MCU jumps from BL0 to BL4.
Naming Confusion: FSBL / SSBL / TF-A / TF-M
The industry uses multiple names for the same concept:
| Generic | STM32 (M) | STM32MP1 | RPi 4 | PC (x86) |
|---|---|---|---|---|
| BL0: Boot ROM | Internal ROM | Boot ROM + OTP | GPU ROM | CPU microcode |
| BL1: FSBL | N/A | TF-A (BL2) | start4.elf | UEFI PEI |
| BL2: Secure FW | TF-M (opt) | OP-TEE (BL32) | N/A | UEFI DXE |
| BL3: SSBL | N/A | U-Boot (BL33) | N/A | GRUB |
| BL4: Kernel | main() | Linux/FreeRTOS | Linux | Linux |
Key rules: TF-M = Cortex-M secure FW. TF-A = Cortex-A secure FW. OP-TEE = trusted OS in the Cortex-A secure world, running alongside Linux.
STM32 (Cortex-M): ROM to main()
+----------+ +----------------+ +-------------+ +----------+
| Power On | --> | Boot ROM | --> | SystemInit | --> | main() |
| | | Check BOOT | | Clocks + | | App |
| | | pins | | PLL | | running |
+----------+ +----------------+ +-------------+ +----------+
~1 ms ~10 ms ~20 ms ~50 ms
- No OS. No filesystem. No bootloader.
- CPU fetches the reset vector from internal flash and runs.
- Total: under 100 ms. Uses only BL0 and BL4.
Raspberry Pi 4: GPU Boots the CPU
+---------+ +---------+ +-----------+ +--------+ +---------+ +-------+
| Power | -> | GPU ROM | -> | start4.elf| -> | Kernel | -> | systemd | -> | App |
| On | | Reads | | GPU FW + | | Linux | | Services| | Ready |
| | | SD card | | config.txt| | | | | | |
+---------+ +---------+ +-----------+ +--------+ +---------+ +-------+
~1 s ~1.5 s ~2-3 s ~5-20 s ~1-5 s
- Unique: the GPU boots first and initializes the ARM CPU
- No traditional U-Boot stage; GPU firmware reads config.txt
- Total: 15-35 seconds
STM32MP1: ARM Trusted Firmware Chain
+-------+ +--------+ +-------+ +--------+ +--------+ +---------+ +-----+
| Power | ->| Boot | ->| TF-A | ->| OP-TEE | ->| U-Boot | ->| Kernel | ->| App |
| On | | ROM | | FSBL | | Secure | | SSBL | | + Init | | |
| | | + OTP | | DDR | | World | | Load | | systemd | | |
+-------+ +--------+ +-------+ +--------+ +--------+ +---------+ +-----+
~0.5 s ~1 s ~0.5 s ~2 s ~5 s ~1 s
The most complete embedded boot chain: ROM, FSBL (TF-A), secure world (OP-TEE), SSBL (U-Boot), kernel, init, app. Total: 5-10 seconds.
PC (x86): UEFI Firmware Chain
+-------+ +-----------+ +-----------+ +--------+ +---------+ +-------+
| Power | ->| CPU ROM | ->| UEFI | ->| GRUB / | ->| Kernel | ->| App / |
| On | | Microcode | | PEI + DXE | | Boot | | + Init | | Desk- |
| | | + UEFI SEC| | DRAM+PCIe | | Mgr | | systemd | | top |
+-------+ +-----------+ +-----------+ +--------+ +---------+ +-------+
~1 s ~2 s ~1 s ~7-20 s ~2-5 s
- Most complex hardware enumeration (PCIe, USB, SATA)
- UEFI replaces the old BIOS; provides Secure Boot via signed bootloaders
- Total: 10-30 seconds
Platform Comparison Table
| Feature | STM32 (M) | RPi 4 (A72) | STM32MP1 | PC (x86) |
|---|---|---|---|---|
| MMU | No | Yes | Yes (A7) | Yes |
| OS | Bare/RTOS | Linux | Linux + RTOS | Linux |
| Boot ROM | Internal ROM (system memory) | GPU-based | ROM + OTP | UEFI ROM |
| Bootloader | None | start4.elf | TF-A + U-Boot | UEFI + GRUB |
| Filesystem | None | ext4 | ext4/squashfs | ext4/NTFS |
| Boot time | < 100 ms | 15-35 s | 5-10 s | 10-30 s |
Why So Different?
The boot time difference comes from architectural requirements:
| Requirement | Adds Time Because... |
|---|---|
| DRAM initialization | Must train memory timing (hundreds of ms) |
| Filesystem mounting | Must read and verify metadata structures |
| Device tree probing | Must discover and bind hundreds of devices |
| Service dependencies | Must resolve and start in correct order |
| Secure verification | Must check cryptographic signatures |
A typical MCU application skips all five: no DRAM training, no filesystem, no device tree, no service manager, and usually no signature checks at boot.
Secure Boot: Why It Matters
Without secure boot, an attacker with physical access can replace the bootloader or kernel — and the device will execute it.
Secure boot = every stage cryptographically verifies the next before handing over.

Secure boot flow: ROM verifies bootloader, bootloader verifies kernel, kernel verifies rootfs. A break at any point halts the chain.
Secure Boot Chain of Trust
+------------+ +----------+ +---------+ +--------+
| HW Root of | verify | Signed | verify | Signed | verify | Signed |
| Trust | -----> | Boot- | -----> | Kernel | -----> | Root |
| (OTP/TPM) | | loader | | | | FS |
+------------+ +----------+ +---------+ +--------+
|
| Cannot be modified after manufacturing
If any verification fails, boot halts. The strength depends on a hardware root of trust — a key in OTP fuses or TPM that cannot be modified.
| Feature | STM32 (M) | STM32MP1 | RPi 4 | PC (x86) |
|---|---|---|---|---|
| Root of trust | RDP + option bytes | OTP fuses | Limited | TPM 2.0 |
| Chain | Flash lock | ROM->TF-A->U-Boot->Kernel | Partial | ROM->UEFI->GRUB->Kernel |
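The halt-on-failure behaviour of a chain of trust can be illustrated in a few lines. The sketch below is a deliberate simplification: real secure boot verifies signatures against a key anchored in hardware, whereas here a trusted manifest of SHA-256 hashes stands in for that anchor. The name verify_chain and the manifest layout (the output format of sha256sum) are assumptions for this example.

```shell
# verify_chain: check each "<sha256>  <file>" entry of a manifest in order
# and stop at the first mismatch, so later stages are never "booted".
# Simplification: hashes from a trusted manifest replace real signatures.
verify_chain() {
    manifest=$1
    while read -r expected file; do
        actual=$(sha256sum "$file" | cut -d' ' -f1)
        if [ "$actual" != "$expected" ]; then
            echo "HALT: $file failed verification"
            return 1
        fi
        echo "OK: $file"
    done < "$manifest"
    echo "chain verified"
}
```

A manifest could be produced with `sha256sum u-boot.bin zImage > manifest` (file names are placeholders) and checked with `verify_chain manifest`. The order of entries matters: a tampered early stage stops the check before any later stage is examined.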
Secure Boot Chain Diagram

Chain of trust: each boot stage verifies the cryptographic signature of the next before handing over control.
Heterogeneous SoCs: The Bridge Architecture
The STM32MP1 combines two worlds on one chip:
+----------------------------------------------+
| STM32MP1 SoC |
| |
| +-------------------+ +------------------+ |
| | Cortex-A7 (Linux) | | Cortex-M4 (RTOS)| |
| | - Networking | | - Motor control | |
| | - UI, storage | | - Sensor sampling| |
| | - OTA updates | | - Safety loops | |
| | Boot: 5-10 s | | Boot: < 50 ms | |
| +--------+-----------+ +--------+---------+ |
| | RPMsg / Shared Memory | |
| +-----------------------------+ |
+----------------------------------------------+
Heterogeneous SoC: Role Split
| Concern | Cortex-A7 (Linux) | Cortex-M4 (RTOS) |
|---|---|---|
| Role | Networking, UI, storage | Motor control, sensors, safety |
| OS | Linux + systemd | FreeRTOS / bare-metal |
| Boot time | 5-10 seconds | < 50 ms |
| Latency | ms-level (non-deterministic) | us-level (deterministic) |
| Communication | RPMsg / shared memory | RPMsg / shared memory |
Design guideline: safety-critical and time-critical loops on the M4. Networking, storage, UI, and OTA on the A7. Define the interface early.
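On the Linux side, the M4 is usually managed through the kernel's remoteproc sysfs interface. The sketch below shows the general shape, with several assumptions: start_m4 and RPROC_DIR are names made up here, the M4 is assumed to appear as remoteproc0, and the firmware file name is a placeholder that the kernel looks up under /lib/firmware.

```shell
# start_m4: load and start Cortex-M4 firmware via remoteproc sysfs.
# $1 = firmware file name (placeholder; resolved under /lib/firmware)
# RPROC_DIR can override the sysfs directory (assumed remoteproc0).
start_m4() {
    fw=$1
    rproc=${RPROC_DIR:-/sys/class/remoteproc/remoteproc0}
    echo "$fw" > "$rproc/firmware" || return 1
    echo start > "$rproc/state"
}
```

It would be called as root, for example `start_m4 sensor_loop.elf`. Started this way, the M4 only runs once Linux is up; an earlier start needs the bootloader to load the firmware instead.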
When to Choose What?
     Need Linux (networking, filesystem, UI)?
                         |
                    +----+----+
                    No       Yes
                    |         |
                  [MCU]   Need hard real-time AND Linux?
                  STM32              |
                 < 100 ms       +----+----+
                                No       Yes
                                |         |
                           [SoC with   [Heterogeneous SoC]
                            Linux]      STM32MP1
                            RPi4        A7 = Linux
                            5-35 s      M4 = RTOS
The architecture decision drives the boot architecture, not the other way around.
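The decision tree is small enough to write as a function, which makes the two questions explicit. This is only a restatement of the diagram; choose_platform and its yes/no arguments are names chosen for this sketch.

```shell
# choose_platform: walk the decision tree.
# $1 = needs Linux (yes|no), $2 = needs hard real-time (yes|no)
choose_platform() {
    needs_linux=$1
    needs_hard_rt=$2
    if [ "$needs_linux" != "yes" ]; then
        echo "MCU"
    elif [ "$needs_hard_rt" = "yes" ]; then
        echo "Heterogeneous SoC"
    else
        echo "SoC with Linux"
    fi
}
```

A device that needs a web dashboard and a hard real-time loop would be `choose_platform yes yes`, which lands on the heterogeneous SoC branch.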
Mini Exercise: Smart Greenhouse
Design a smart greenhouse controller that must:
- Read soil moisture + temperature every 100 ms (hard real-time)
- Display status on 7-inch LCD with web dashboard
- Receive firmware updates over Wi-Fi
- Boot the sensor loop within 200 ms of power-on
Which platform? Which tasks on which core? What is the boot sequence for each?
Take 5 minutes. Then we compare solutions.
Mini Exercise: Solution Sketch
Platform: STM32MP1 (heterogeneous SoC)
| Task | Core | Reason |
|---|---|---|
| Sensor loop (100 ms) | M4 | Hard RT, boots in < 50 ms |
| LCD + web dashboard | A7 | Needs Linux, networking, graphics |
| Wi-Fi OTA updates | A7 | Needs TCP/IP stack, filesystem |
| Sensor -> display data | RPMsg | Inter-core shared memory |
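M4 starts sampling as soon as its firmware is loaded, so load it as early as possible (for example from the bootloader rather than from Linux). A7 boots Linux in 5-10 s, then starts the dashboard. Samples taken before the dashboard is up should be buffered on the M4 so that no sensor data is lost.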
Key Takeaways
- Boot is a pipeline (ROM, bootloader, kernel, init, app), each stage with its own tools
- Measure before optimizing: dmesg + systemd-analyze blame
- Architecture determines boot: MMU, filesystem, and trust chain add stages and time
- More stages = more complex HW + more trust decisions
- Heterogeneous SoCs combine Linux flexibility with MCU-level real-time
- Secure boot requires a hardware root of trust and cryptographic chain
Next Steps
Hands-on tutorials to reinforce these concepts:
- Boot Timing Lab — measure actual boot stages with timestamps
- Buildroot Mini-Linux — build a minimal image, compare boot time to stock OS
- Exploring Linux — navigate the running system, read dmesg, inspect services
The theory tells you why. The lab tells you how fast (or how slow).