Lesson 8: Boot Flow & Architectures

Why does a Pi boot in 35 seconds but an STM32 in 50 ms?

Óbuda University — Linux in Embedded Systems


Today's Map

  • Block 1 (45 min): Boot flow: five boot stages, dmesg walkthrough, systemd-analyze, boot optimization, debugging by stage.
  • Block 2 (45 min): Boot architectures: MCU vs SoC vs PC, platform boot flows, secure boot chains, heterogeneous SoCs, design exercise.

Problem First

Your device boots in 35 seconds and occasionally hangs before the app starts.

You need to answer two questions:

  • Which stage is slow?
  • Which stage failed?

Without a staged boot model, this is mostly guesswork. You end up staring at a blank screen wondering whether the bootloader, kernel, or your application is the problem.


Block 1

Boot Stages, dmesg, systemd-analyze


Boot Is a Pipeline, Not a Single Step

An embedded Linux system boots in stages, each handing off to the next:

  +----------+    +------------+    +--------+    +---------+    +-------+
  | ROM Boot | -> | Bootloader | -> | Kernel | -> | systemd | -> |  App  |
  |  (SoC)   |    |  (U-Boot)  |    | (Linux)|    | (init)  |    |       |
  +----------+    +------------+    +--------+    +---------+    +-------+
     ~0.5s           ~1-2s           ~2-3s          ~5-20s        ~1-5s

*Times shown are typical for RPi 4 with stock Raspberry Pi OS.*

Each stage has its own failure modes, its own debugging tools, and its own optimization opportunities.


The Five Boot Stages

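  +---+-----------------------+---------------------------------------------------------+
  | # | Stage                 | What It Does                                            |
  +---+-----------------------+---------------------------------------------------------+
  | 1 | ROM / SoC boot        | Hardwired in silicon; finds bootloader on storage       |
  | 2 | Bootloader (U-Boot)   | Initializes DRAM, clocks, storage; loads kernel         |
  | 3 | Linux kernel          | Probes HW via device tree, loads drivers, mounts rootfs |
  | 4 | Init system (systemd) | Starts services in dependency order                     |
  | 5 | User application      | Your product code begins running                        |
  +---+-----------------------+---------------------------------------------------------+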

The five boot stages from power-on to application: ROM, bootloader, kernel, init system, and user application.


Stage 1 — ROM Boot Code

  • Hardcoded in silicon at the factory — you cannot change it
  • Reads boot pins or fuses to determine boot source (SD, eMMC, USB, NAND)
  • Loads the first external code (bootloader) into internal SRAM
  • Duration: < 1 second typically

If this fails: nothing appears on any console. The board looks dead. Check power, boot pin configuration, and storage media.


Stage 2 — Bootloader (U-Boot)

The bootloader does the minimum to load the kernel:

  • Initialize DRAM (the kernel needs memory to decompress into)
  • Initialize clocks and basic storage (SD/eMMC/NAND)
  • Load the kernel image + device tree into DRAM
  • Jump to the kernel entry point

If this fails: you see partial output on the serial console, then silence. Typical causes: wrong DRAM timing, corrupt kernel image, wrong boot device.


Stage 3 — Linux Kernel

The kernel takes over and initializes the full hardware platform:

  • Decompresses itself into memory
  • Parses the device tree to discover hardware
  • Probes drivers in dependency order
  • Mounts the root filesystem
  • Launches PID 1 (the init system)

If this fails: the kernel log records it. Read it on the serial console if the system never reaches a shell, or with dmesg once it does. Look for missing driver probes, device tree errors, or filesystem mount failures.


Stages 4 & 5 — Init System and Application

Stage 4: systemd starts user-space services in dependency order:

  • Networking, logging, D-Bus, time sync, udev
  • Most boot time lives here: stock Pi OS spends 10-20 s in this stage
  • Debug with: journalctl -b and systemctl status <service>

Stage 5: your application finally runs:

  • All hardware initialized, all services available
  • Design goal: start as soon as dependencies are met, not after everything else


Where Hardware Is Initialized

Each stage initializes only what the next stage needs:

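  +------------+---------------------------------------+-------------+
  | Stage      | Initializes                           | For Whom    |
  +------------+---------------------------------------+-------------+
  | Bootloader | Clocks, DRAM, storage basics          | Kernel      |
  | Kernel     | Drivers, device tree probing, modules | User space  |
  | User space | Service config, app-level HW policy   | Application |
  +------------+---------------------------------------+-------------+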

  • Anti-pattern: putting I2C sensor init in the bootloader (kernel's job).
  • Anti-pattern: deciding which sensor readings to log in a kernel driver (application's job).


Live Demo: dmesg Walkthrough

dmesg shows every kernel message since boot with timestamps:

[    0.000000] Booting Linux on physical CPU 0x0  <- kernel starts
[    0.000000] Machine model: Raspberry Pi 4 B    <- DT identified board
[    0.524173] spi-bcm2835 fe204000.spi: CS0      <- SPI init
[    1.023456] i2c_dev: i2c /dev entries driver    <- I2C ready
[    1.245678] EXT4-fs (mmcblk0p2): mounted        <- rootfs mounted

Timestamps = seconds since kernel start. Gaps reveal where time is spent.

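  +---------------------+---------------------------------+
  | Pattern in dmesg    | Meaning                         |
  +---------------------+---------------------------------+
  | Large timestamp gap | A driver or subsystem is slow   |
  | error / timeout     | Something went wrong            |
  | deferred            | Driver waiting for a dependency |
  +---------------------+---------------------------------+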
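
As a minimal sketch, the three patterns in the table can be pulled out of a kernel log with one case-insensitive grep. The function name scan_boot_log is hypothetical, and the pattern list is only a starting point: real logs may word things differently (for example "timed out" or "failed").

```shell
#!/bin/sh
# Print kernel log lines that match the warning patterns from the table.
# Reads the log on stdin, so it works on a live system or on a saved file.
# Note: grep exits with status 1 when nothing matches.
scan_boot_log() {
    grep -iE 'error|timeout|deferred'
}

# Live system (may need sudo on some distributions):
#   dmesg | scan_boot_log
# Saved log:
#   scan_boot_log < boot.log
```

An empty result does not prove the boot was clean; it only means none of these three words appeared.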

Try It Now: Read the Boot Log (5 min)

Inspect your Pi's boot log and identify the five boot stages from real timestamps:

# First 30 lines — find when the kernel started
dmesg | head -30

# Search for key milestones
dmesg | grep -i "kernel command line"
dmesg | grep -i "mounted"
dmesg | grep -i "systemd"

# List the first 50 timestamps and scan them for the largest gap (slow driver or subsystem)
dmesg | awk '{print $1}' | head -50

Which stage takes the longest? Where is the biggest gap?
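
A small awk filter can compute the largest gap instead of finding it by eye. This is a sketch that assumes the default dmesg timestamp format ([seconds.microseconds] at the start of each line); the function name largest_gap is hypothetical.

```shell
#!/bin/sh
# Report the largest gap between consecutive kernel timestamps.
# Input: dmesg-style lines on stdin, e.g. "[    1.245678] EXT4-fs ...".
largest_gap() {
    awk '
        # Only consider lines that start with a bracketed timestamp.
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, 2, RLENGTH - 2) + 0   # strip the brackets
            if (seen && ts - prev > gap) {
                gap = ts - prev
                line = $0                          # line that ended the gap
            }
            prev = ts
            seen = 1
        }
        END {
            if (line != "") printf "%.3f s before: %s\n", gap, line
        }
    '
}

# Usage on a live system:
#   dmesg | largest_gap
```

Run on the five sample lines from the dmesg walkthrough, it reports the 0.524 s gap that ends at the SPI line. The line printed is the first message after the gap, so the slow component is usually whatever was initializing just before it.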

Tutorial: Boot Timing Lab, Section 1: Measure Boot Time
Theory: Section 1: Boot Stages


Live Demo: systemd-analyze

$ systemd-analyze time
Startup finished in 1.512s (kernel) + 12.345s (userspace) = 13.857s
graphical.target reached after 12.100s in userspace

This output shows graphical.target — a desktop target. Embedded systems typically use multi-user.target (no GUI) or a custom target, which boots faster.

$ systemd-analyze blame
  5.012s apt-daily.service
  3.456s NetworkManager-wait-online.service
  1.234s dev-mmcblk0p2.device
  0.987s bluetooth.service
  0.543s avahi-daemon.service

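systemd-analyze time: kernel 1.5 s, userspace 12.3 s, so the bottleneck is userspace. systemd-analyze blame lists services slowest first; disable what you do not need. Services start in parallel, so the blame times overlap and do not add up to the userspace total.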
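
To make the kernel/userspace split explicit, the summary line can be parsed with awk. This sketch assumes the simple form shown in the demo output ("Ns (kernel) + Ns (userspace) = Ns"); durations printed with minutes, such as "1min 2.345s", are not handled. The function name userspace_share is hypothetical.

```shell
#!/bin/sh
# Print the userspace share of total boot time from `systemd-analyze time`.
# Assumption: every duration on the summary line is a plain "<seconds>s" value.
userspace_share() {
    awk '
        /^Startup finished in/ {
            # Each duration is followed by its label, e.g. "1.512s (kernel)".
            for (i = 1; i < NF; i++) {
                if ($(i + 1) == "(kernel)")    kernel = $i + 0
                if ($(i + 1) == "(userspace)") user   = $i + 0
            }
            total = $NF + 0                    # last field is the total
            if (total > 0)
                printf "kernel %.1f s, userspace %.1f s (%.0f%% of total)\n", kernel, user, 100 * user / total
        }
    '
}

# Usage on a live system:
#   systemd-analyze time | userspace_share
```

For the demo line this prints a userspace share of 89%, which is why the optimization effort in this lesson goes into services rather than the kernel.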


Try It Now: Find the Bottleneck (5 min)

Use systemd-analyze to find what is slowing down your boot:

# Overall boot time breakdown
systemd-analyze time

# Top 10 slowest services
systemd-analyze blame | head -10

# Critical path — what actually blocked boot
systemd-analyze critical-chain

Which service takes the most time? Could you disable it on an embedded device?

Tutorial: Boot Timing Lab, Section 2: Analyze Services
Theory: Section 3: systemd and Init


Boot Time Optimization Strategy

Principle: measure first, optimize second. Never guess.

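  +-----------------------------------+------------+--------+
  | Optimization                      | Time Saved | Effort |
  +-----------------------------------+------------+--------+
  | Disable Bluetooth service         | ~0.5 s     | Low    |
  | Disable avahi-daemon (mDNS)       | ~1.0 s     | Low    |
  | Remove desktop/GUI packages       | ~5-15 s    | Medium |
  | Use Buildroot instead of stock OS | ~10-25 s   | High   |
  | Kernel: disable unused drivers    | ~1-3 s     | High   |
  +-----------------------------------+------------+--------+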

Stock Pi OS: 15-35 s. Tuned Buildroot: 3-10 s.

Additional strategies: parallelize non-dependent services, defer non-critical init, avoid blocking on optional devices.
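
The low-effort rows can be scripted. The sketch below only prints the systemctl commands unless DRY_RUN=0 is set. The helper names run and trim_boot are hypothetical, and the unit names are taken from the blame output shown earlier, so they may differ on your image.

```shell
#!/bin/sh
# Dry-run by default: print the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

trim_boot() {
    # Stop and disable services the product does not use.
    # Socket-activated services may also need their .socket unit disabled.
    for unit in "$@"; do
        run systemctl disable --now "$unit"
    done
    # Boot to the non-graphical target instead of graphical.target.
    run systemctl set-default multi-user.target
}

trim_boot bluetooth.service avahi-daemon.service
```

Re-run systemd-analyze time after each change so that every saving is measured rather than assumed.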


Debugging by Stage — Which Tool?

  +------------------+---------------------------+
  | Stage            | Primary Tool              |
  +------------------+---------------------------+
  | ROM / Bootloader | Serial console            |
  |                  | (nothing else is running) |
  +------------------+---------------------------+
  | Kernel           | dmesg                     |
  |                  | (kernel ring buffer)      |
  +------------------+---------------------------+
  | User space       | systemd-analyze           |
  |                  | journalctl -b             |
  +------------------+---------------------------+

Wrong tool = wasted time. If the bootloader failed, dmesg shows nothing — the kernel never ran.


Mini Exercise: Label the Boot Log

Which stage produced each line? Suggest one optimization.

[    0.000000] Booting Linux on physical CPU 0x0          <- ???
[    0.821432] i2c_dev: i2c /dev entries driver           <- ???
[    1.245678] EXT4-fs (mmcblk0p2): mounted filesystem    <- ???
[   12.345678] systemd[1]: Started Avahi mDNS/DNS-SD      <- ???
[   13.456789] systemd[1]: Started data-logger.service     <- ???
[   15.000000] data-logger: first measurement recorded     <- ???

Mini Exercise: Answer

[    0.000000] Booting Linux on physical CPU 0x0          <- KERNEL
[    0.821432] i2c_dev: i2c /dev entries driver           <- KERNEL
[    1.245678] EXT4-fs (mmcblk0p2): mounted filesystem    <- KERNEL
[   12.345678] systemd[1]: Started Avahi mDNS/DNS-SD      <- INIT
[   13.456789] systemd[1]: Started data-logger.service     <- INIT
[   15.000000] data-logger: first measurement recorded     <- APP

Optimization: disable Avahi (saves ~1 s). The 11 s gap between rootfs mount and Avahi indicates heavy init work — investigate with systemd-analyze blame.


Block 1 Key Takeaways

  • Boot is a five-stage pipeline: ROM, bootloader, kernel, init, app
  • Each stage has different failure modes and tools
  • Most time is spent in the init/systemd stage
  • Measure first with dmesg and systemd-analyze, then optimize
  • Stock OS: 15-35 s. Minimal Buildroot: 3-10 s.

Block 2

Boot Architectures — MCU vs SoC vs PC


The Core Question

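  +------------------+-----------+---------------+
  | Platform         | Boot Time | Runs Linux?   |
  +------------------+-----------+---------------+
  | STM32 (Cortex-M) | < 100 ms  | No            |
  | STM32MP1 (A7+M4) | 5-10 s    | Yes (A7 core) |
  | Raspberry Pi 4   | 15-35 s   | Yes           |
  | PC (x86-64)      | 10-30 s   | Yes           |
  +------------------+-----------+---------------+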

Why the 700x difference between STM32 and Pi (50 ms vs 35 s)?

Not just "more software." It is a fundamentally different boot architecture, driven by the presence of an MMU, a filesystem, and a trust chain.


Team Activity: Draw the Boot Flow

Each team picks one platform (STM32 / RPi4 / PC / STM32MP1).

Task (5 min): Draw the boot flow from power-on to application-ready.

  • What stages exist?
  • Where does each stage live (ROM, flash, SD card)?
  • What does each stage initialize?
  • What is the handoff mechanism?

Compare your drawings. Where are the differences?


Generic Boot Stage Model (BL0-BL6)

All platforms follow staged boot. A unified naming helps compare:

  +------+    +------+    +--------+    +------+    +--------+    +---------+    +-----+
  | BL0  | -> | BL1  | -> |  BL2   | -> | BL3  | -> |  BL4   | -> |   BL5   | -> | BL6 |
  | Boot |    | FSBL |    | Secure |    | SSBL |    | Kernel |    |  Init   |    | App |
  | ROM  |    | DRAM |    |   FW   |    |U-Boot|    | Linux  |    | systemd |    |     |
  +------+    +------+    +--------+    +------+    +--------+    +---------+    +-----+

  +-------+-----------+-------------------------------------------------+
  | Stage | Name      | Role                                            |
  +-------+-----------+-------------------------------------------------+
  | BL0   | Boot ROM  | Hardcoded in silicon, loads first external code |
  | BL1   | FSBL      | Initializes DRAM, clocks, loads next stage      |
  | BL2   | Secure FW | TF-A / TF-M / UEFI Secure Phase                 |
  | BL3   | SSBL      | U-Boot / GRUB / UEFI Boot Manager               |
  +-------+-----------+-------------------------------------------------+

Not every platform uses all stages. An MCU jumps from BL0 to BL4.


Naming Confusion: FSBL / SSBL / TF-A / TF-M

The industry uses multiple names for the same concept (FSBL = first-stage bootloader, SSBL = second-stage bootloader):

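  +----------------+--------------+----------------+------------+---------------+
  | Generic        | STM32 (M)    | STM32MP1       | RPi 4      | PC (x86)      |
  +----------------+--------------+----------------+------------+---------------+
  | BL0: Boot ROM  | Internal ROM | Boot ROM + OTP | GPU ROM    | CPU microcode |
  | BL1: FSBL      | N/A          | TF-A (BL2)     | start4.elf | UEFI PEI      |
  | BL2: Secure FW | TF-M (opt)   | OP-TEE (BL32)  | N/A        | UEFI DXE      |
  | BL3: SSBL      | N/A          | U-Boot (BL33)  | N/A        | GRUB          |
  | BL4: Kernel    | main()       | Linux/FreeRTOS | Linux      | Linux         |
  +----------------+--------------+----------------+------------+---------------+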

Key rules: TF-M = Cortex-M secure FW. TF-A = Cortex-A secure FW. OP-TEE = secure-world OS (trusted execution environment) that runs alongside Linux on Cortex-A.


STM32 (Cortex-M): ROM to main()

  +----------+     +----------------+     +-------------+     +----------+
  | Power On | --> | Boot ROM       | --> | SystemInit  | --> | main()   |
  |          |     | Check BOOT     |     | Clocks +    |     | App      |
  |          |     | pins           |     | PLL         |     | running  |
  +----------+     +----------------+     +-------------+     +----------+
     ~1 ms              ~10 ms                ~20 ms             ~50 ms

  • No OS. No filesystem. No bootloader.
  • CPU fetches the reset vector from internal flash and runs.
  • Total: under 100 ms. Uses only BL0 and BL4.

Raspberry Pi 4: GPU Boots the CPU

  +---------+    +---------+    +-----------+    +--------+    +---------+    +-------+
  | Power   | -> | GPU ROM | -> | start4.elf| -> | Kernel | -> | systemd | -> |  App  |
  |   On    |    | Reads   |    | GPU FW +  |    | Linux  |    | Services|    | Ready |
  |         |    | SD card |    | config.txt|    |        |    |         |    |       |
  +---------+    +---------+    +-----------+    +--------+    +---------+    +-------+
                    ~1 s           ~1.5 s          ~2-3 s       ~5-20 s       ~1-5 s

  • Unique: the GPU boots first and initializes the ARM CPU
  • No traditional U-Boot stage; on the Pi 4 a second-stage bootloader in SPI EEPROM loads start4.elf, and the GPU firmware reads config.txt
  • Total: 15-35 seconds

STM32MP1: ARM Trusted Firmware Chain

  +-------+   +--------+   +-------+   +--------+   +--------+   +---------+   +-----+
  | Power | ->| Boot   | ->| TF-A  | ->| OP-TEE | ->| U-Boot | ->| Kernel  | ->| App |
  |  On   |   | ROM    |   | FSBL  |   | Secure |   | SSBL   |   | + Init  |   |     |
  |       |   | + OTP  |   | DDR   |   | World  |   | Load   |   | systemd |   |     |
  +-------+   +--------+   +-------+   +--------+   +--------+   +---------+   +-----+
                ~0.5 s        ~1 s        ~0.5 s       ~2 s         ~5 s        ~1 s

The most complete embedded boot chain: ROM, FSBL (TF-A), secure world (OP-TEE), SSBL (U-Boot), kernel, init, app. Total: 5-10 seconds.


PC (x86): UEFI Firmware Chain

  +-------+   +-----------+   +-----------+   +--------+   +---------+   +-------+
  | Power | ->| CPU ROM   | ->| UEFI      | ->| GRUB / | ->| Kernel  | ->| App / |
  |  On   |   | Microcode |   | PEI + DXE |   | Boot   |   | + Init  |   | Desk- |
  |       |   | + UEFI SEC|   | DRAM+PCIe |   | Mgr    |   | systemd |   | top   |
  +-------+   +-----------+   +-----------+   +--------+   +---------+   +-------+
                  ~1 s             ~2 s           ~1 s         ~7-20 s      ~2-5 s

  • Most complex hardware enumeration (PCIe, USB, SATA)
  • UEFI replaces the old BIOS; provides Secure Boot via signed bootloaders
  • Total: 10-30 seconds

Platform Comparison Table

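  +------------+----------------+-------------+---------------+-------------+
  | Feature    | STM32 (M)      | RPi 4 (A72) | STM32MP1      | PC (x86)    |
  +------------+----------------+-------------+---------------+-------------+
  | MMU        | No             | Yes         | Yes (A7)      | Yes         |
  | OS         | Bare/RTOS      | Linux       | Linux + RTOS  | Linux       |
  | Boot ROM   | Internal flash | GPU-based   | ROM + OTP     | UEFI ROM    |
  | Bootloader | None           | start4.elf  | TF-A + U-Boot | UEFI + GRUB |
  | Filesystem | None           | ext4        | ext4/squashfs | ext4/NTFS   |
  | Boot time  | < 100 ms       | 15-35 s     | 5-10 s        | 10-30 s     |
  +------------+----------------+-------------+---------------+-------------+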

Why So Different?

The boot time difference comes from architectural requirements:

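  +----------------------+--------------------------------------------+
  | Requirement          | Adds Time Because...                       |
  +----------------------+--------------------------------------------+
  | DRAM initialization  | Must train memory timing (hundreds of ms)  |
  | Filesystem mounting  | Must read and verify metadata structures   |
  | Device tree probing  | Must discover and bind hundreds of devices |
  | Service dependencies | Must resolve and start in correct order    |
  | Secure verification  | Must check cryptographic signatures        |
  +----------------------+--------------------------------------------+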

A bare-metal MCU typically skips all five: no DRAM, no filesystem, no device tree, no services, and (unless optional secure firmware such as TF-M is enabled) no secure boot verification.


Secure Boot: Why It Matters

Without secure boot, an attacker with physical access can replace the bootloader or kernel — and the device will execute it.

Secure boot = every stage cryptographically verifies the next before handing over.

Secure boot flow: ROM verifies bootloader, bootloader verifies kernel, kernel verifies rootfs. A break at any point halts the chain.


Secure Boot Chain of Trust

  +------------+        +----------+        +---------+        +--------+
  | HW Root of | verify | Signed   | verify | Signed  | verify | Signed |
  | Trust      | -----> | Boot-    | -----> | Kernel  | -----> | Root   |
  | (OTP/TPM)  |        | loader   |        |         |        | FS     |
  +------------+        +----------+        +---------+        +--------+
       |
       | Cannot be modified after manufacturing

If any verification fails, boot halts. The strength depends on a hardware root of trust — a key in OTP fuses or TPM that cannot be modified.
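
The verify-then-hand-over idea can be sketched with plain SHA-256 digests. This is a deliberate simplification: real secure boot checks cryptographic signatures against a key anchored in hardware, and each stage verifies only the next one, whereas here a single hypothetical function, verify_chain, walks a manifest of expected digests in boot order. The file names in the usage note are placeholders.

```shell
#!/bin/sh
# Simplified chain-of-trust check using SHA-256 digests.
# Assumption: the manifest stands in for the hardware root of trust.
# Manifest format (one stage per line, in boot order): "<sha256>  <path>"
verify_chain() {
    manifest=$1
    while read -r expected path; do
        actual=$(sha256sum "$path" | awk '{print $1}')
        if [ "$actual" != "$expected" ]; then
            # First mismatch stops the walk, like a halted boot.
            echo "HALT: $path failed verification"
            return 1
        fi
        echo "OK: $path"
    done < "$manifest"
    echo "chain verified, handing over"
}

# Usage (placeholder file names):
#   sha256sum bootloader.bin kernel.img rootfs.img > manifest
#   verify_chain manifest
```

The sketch is only as trustworthy as the manifest itself, which is the point of the hardware root of trust: if an attacker can rewrite the reference values, every later check is meaningless.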

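  +---------------+--------------------+---------------------------+---------+-------------------------+
  | Feature       | STM32 (M)          | STM32MP1                  | RPi 4   | PC (x86)                |
  +---------------+--------------------+---------------------------+---------+-------------------------+
  | Root of trust | RDP + option bytes | OTP fuses                 | Limited | TPM 2.0                 |
  | Chain         | Flash lock         | ROM->TF-A->U-Boot->Kernel | Partial | ROM->UEFI->GRUB->Kernel |
  +---------------+--------------------+---------------------------+---------+-------------------------+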

Secure Boot Chain Diagram

Chain of trust: each boot stage verifies the cryptographic signature of the next before handing over control.


Heterogeneous SoCs: The Bridge Architecture

The STM32MP1 combines two worlds on one chip:

  +----------------------------------------------------+
  |                    STM32MP1 SoC                    |
  |                                                    |
  |  +---------------------+  +---------------------+  |
  |  | Cortex-A7 (Linux)   |  | Cortex-M4 (RTOS)    |  |
  |  | - Networking        |  | - Motor control     |  |
  |  | - UI, storage       |  | - Sensor sampling   |  |
  |  | - OTA updates       |  | - Safety loops      |  |
  |  | Boot: 5-10 s        |  | Boot: < 50 ms       |  |
  |  +----------+----------+  +----------+----------+  |
  |             |  RPMsg / Shared Memory |             |
  |             +------------------------+             |
  +----------------------------------------------------+

Heterogeneous SoC: Role Split

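  +---------------+------------------------------+--------------------------------+
  | Concern       | Cortex-A7 (Linux)            | Cortex-M4 (RTOS)               |
  +---------------+------------------------------+--------------------------------+
  | Role          | Networking, UI, storage      | Motor control, sensors, safety |
  | OS            | Linux + systemd              | FreeRTOS / bare-metal          |
  | Boot time     | 5-10 seconds                 | < 50 ms                        |
  | Latency       | ms-level (non-deterministic) | us-level (deterministic)       |
  | Communication | RPMsg / shared memory        | RPMsg / shared memory          |
  +---------------+------------------------------+--------------------------------+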

Design guideline: safety-critical and time-critical loops on the M4. Networking, storage, UI, and OTA on the A7. Define the interface early.


When to Choose What?

  Need Linux (networking, filesystem, UI)?
         |
    +----+----+
    No        Yes
    |         |
  [MCU]     Need hard real-time AND Linux?
  STM32       |
  < 100 ms  +----+----+
            No        Yes
            |         |
        [SoC with   [Heterogeneous SoC]
         Linux]      STM32MP1
         RPi4        A7 = Linux
         5-35 s      M4 = RTOS

The architecture decision drives the boot architecture, not the other way around.


Mini Exercise: Smart Greenhouse

Design a smart greenhouse controller that must:

  • Read soil moisture + temperature every 100 ms (hard real-time)
  • Display status on 7-inch LCD with web dashboard
  • Receive firmware updates over Wi-Fi
  • Boot the sensor loop within 200 ms of power-on

Which platform? Which tasks on which core? What is the boot sequence for each?

Take 5 minutes. Then we compare solutions.


Mini Exercise: Solution Sketch

Platform: STM32MP1 (heterogeneous SoC)

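  +------------------------+-------+-----------------------------------+
  | Task                   | Core  | Reason                            |
  +------------------------+-------+-----------------------------------+
  | Sensor loop (100 ms)   | M4    | Hard RT, boots in < 50 ms         |
  | LCD + web dashboard    | A7    | Needs Linux, networking, graphics |
  | Wi-Fi OTA updates      | A7    | Needs TCP/IP stack, filesystem    |
  | Sensor -> display data | RPMsg | Inter-core shared memory          |
  +------------------------+-------+-----------------------------------+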

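M4 starts sampling as soon as its firmware is running. A7 boots Linux in 5-10 s, then starts the dashboard. Sensor data is never lost. Note that on the STM32MP1 the M4 firmware is loaded by the A7 side (from U-Boot or Linux remoteproc), so start it from the earliest possible stage and measure it against the 200 ms target.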


Key Takeaways

  • Boot is a pipeline — ROM, bootloader, kernel, init, app — each with its own tools
  • Measure before optimizing: dmesg + systemd-analyze blame
  • Architecture determines boot — MMU, filesystem, trust chain add stages and time
  • More stages = more complex HW + more trust decisions
  • Heterogeneous SoCs combine Linux flexibility with MCU-level real-time
  • Secure boot requires a hardware root of trust and cryptographic chain

Next Steps

Hands-on tutorials to reinforce these concepts:

  • Boot Timing Lab — measure actual boot stages with timestamps
  • Buildroot Mini-Linux — build a minimal image, compare boot time to stock OS
  • Exploring Linux — navigate the running system, read dmesg, inspect services

The theory tells you why. The lab tells you how fast (or how slow).