
Lesson 7: Architecture & Integration

Óbuda University -- Linux in Embedded Systems

"Draw the whole system. Where does data flow? Where can it fail? How did we get here?"


Problem First

Your data logger starts before the network is ready -- the first cloud upload fails.

The display service starts before the sensor driver has probed -- stale data for 10 seconds.

Logs are split between journalctl, dmesg, a CSV file, and stdout -- nobody can reconstruct what happened when the device rebooted at 3 AM.

These are not coding bugs. They are architecture bugs.

No amount of fixing individual services will help if the startup order is fragile, the data flow is undefined, and the failure modes are unplanned.


Today's Map

  • Block 1 (45 min): Full-stack system view: stack from hardware to app, hardware contract, data flow as architecture, four architecture views, layering rules.
  • Block 2 (45 min): Draw your system plus history: architecture exercise, peer review, Unix/Linux timeline, obsolete vs emerging components, regulatory landscape.

What Is Architecture?

Architecture defines three things before code is written:

  • Which components exist -- services, drivers, hardware blocks
  • How they communicate -- protocols, interfaces, data formats
  • What happens when something breaks -- recovery, fallback, degradation

A system without documented architecture works by accident. It breaks the moment someone changes something that was never written down.


The Full Stack

 +-------------------------------------------------------+
 |                  Application Layer                     |
 |  Product logic: read sensors, render display, upload   |
 +--------------------------+----------------------------+
 |       Init System        |     System Services        |
 |  systemd: ordering,      |  logging, networking,      |
 |  supervision, restart    |  monitoring                |
 +--------------------------+----------------------------+
 |                  Linux Kernel                          |
 |  Drivers (I2C, SPI, GPIO, DRM)  |  Scheduler  |  FS  |
 +--------------------------+------+---------+---+-------+
 |                  Bootloader                            |
 |  U-Boot / RPi firmware: HW init, kernel loading       |
 +--------------------------+----------------------------+
 |                  Hardware                              |
 |  SoC (CPU, GPU, peripherals)  |  Sensors, Displays   |
 +-------------------------------------------------------+

Each layer has one job and communicates only with its neighbors.


Layer Responsibilities

 +----------------+------------------------------------------+-----------------------------------+
 | Layer          | Owns                                     | Stops At                          |
 +----------------+------------------------------------------+-----------------------------------+
 | Bootloader     | Clock init, RAM init, load kernel        | Kernel handoff                    |
 | Kernel         | Drivers, scheduling, memory, filesystems | Stable /dev/ and sysfs interfaces |
 | Init (systemd) | Service ordering, supervision, restart   | Application readiness             |
 | Application    | Product behavior, data processing        | User-visible output               |
 +----------------+------------------------------------------+-----------------------------------+

When a layer reaches into another layer's job, the system becomes fragile.



Operating system architecture layers: hardware, kernel (drivers, scheduler, memory management), system libraries, and user-space applications.


Block 1

Full-Stack System View


The Hardware Contract

Three layers form a contract:

 +---------------------+
 |   Application       |  reads /dev/mcp9808
 |   (Python script)   |  or /sys/bus/iio/...
 +----------+----------+
            |  stable interface (read/write syscalls)
 +----------v----------+
 |   Kernel Driver      |  knows register map,
 |   (mcp9808.ko)       |  timing, bus protocol
 +----------+----------+
            |  I2C bus address from Device Tree
 +----------v----------+
 |   Device Tree        |  declares: "MCP9808 at
 |   (.dtb / overlay)   |  address 0x18 on i2c-1"
 +---------------------+

The app does not know the I2C address. The driver does not know what the app does with the temperature. The Device Tree knows only where the device sits on the bus, not who talks to it.
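
In user space, this contract reduces to reading files. A minimal sketch of the application side, assuming the bound driver registers a hwmon device and reports the name "mcp9808" (the device name and sysfs layout depend on which driver binds; the `hwmon_root` parameter exists only so the lookup can be exercised off-target):

```python
import glob
import os

def read_temp_c(hwmon_root="/sys/class/hwmon"):
    """Read the sensor temperature through the kernel hwmon interface.

    The hwmon index (hwmon0, hwmon1, ...) is assigned at probe time and
    varies per board and per boot, so we search by device name instead
    of hard-coding a path. "mcp9808" as the reported name is an
    assumption; check your driver's actual name attribute.
    """
    for name_file in glob.glob(os.path.join(hwmon_root, "hwmon*", "name")):
        with open(name_file) as f:
            if f.read().strip() != "mcp9808":
                continue
        # temp1_input reports millidegrees Celsius as an integer string
        temp_file = os.path.join(os.path.dirname(name_file), "temp1_input")
        with open(temp_file) as f:
            return int(f.read()) / 1000.0
    raise RuntimeError("mcp9808 hwmon node not found -- did the driver probe?")
```

Note that the script never touches the I2C bus: the stable interface is the file, exactly as the contract demands.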


When the Contract Breaks

 +--------------+-----------------------------------+--------------------------------+
 | Broken Layer | Symptom                           | Example                        |
 +--------------+-----------------------------------+--------------------------------+
 | Device Tree  | Driver never probes               | Wrong I2C address in overlay   |
 | Driver       | Interface changes between kernels | sysfs attribute renamed in 6.x |
 | Application  | Bypasses driver                   | App reads /dev/mem directly    |
 +--------------+-----------------------------------+--------------------------------+

Respecting the contract means your application runs on any board that provides the same driver interface -- Raspberry Pi, custom PCB, or industrial SoM.

Portability comes from discipline, not from luck.


Data Flow as Architecture

The most useful way to view an embedded system is as a data pipeline:

 +--------+    +---------+    +----------+    +---------+    +---------+
 | Sensor |--->| Driver  |--->| Service  |--->| Storage |--->| Display |
 | (HW)   |    | (kernel)|    | (user)   |    | (CSV/DB)|    | (OLED)  |
 +--------+    +---------+    +----------+    +---------+    +---------+
    I2C bus      /dev/ or        Python        /data/         SPI/DRM
                 sysfs           process       log.csv

Every embedded product follows this pattern. Thinking in data flow reveals the questions that matter.
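
The pipeline view maps directly onto code structure: one function per stage, each owning exactly one job. A stubbed sketch (the hard-coded value stands in for the sysfs read; names are illustrative):

```python
def driver_read():
    # Kernel boundary stand-in: a real app would read this from sysfs
    return 23500  # millidegrees Celsius, integer

def service_convert(millideg):
    # The user-space service owns the single format conversion
    return millideg / 1000.0

def storage_write(value, sink):
    # Storage stage: format one CSV field; sink is any list-like object
    sink.append(f"{value:.3f}")

def run_pipeline(sink):
    # Sensor -> driver -> service -> storage, one sample per call
    storage_write(service_convert(driver_read()), sink)

rows = []
run_pipeline(rows)
```

Because each stage has one input and one output, any stage can be replaced or tested in isolation, which is the practical payoff of drawing the system as a pipeline.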


Data Flow Questions

At every arrow in the pipeline, ask:

 +---------------------------------------+-------------------------------------+
 | Question                              | Why It Matters                      |
 +---------------------------------------+-------------------------------------+
 | Where is data buffered?               | Determines memory usage and latency |
 | What format is the data?              | Raw ADC counts? Degrees C? JSON?    |
 | Who owns the format conversion?       | Avoids duplicate conversions        |
 | What happens when this link breaks?   | Defines failure behavior            |
 | How much latency does this stage add? | Sets end-to-end timing              |
 +---------------------------------------+-------------------------------------+

A pipeline that is documented can be debugged by anyone. A pipeline that exists only in the developer's head becomes unmaintainable.


Example: Temperature Monitoring

 +----------+      +-------------+      +--------------+
 | MCP9808  |--I2C-| mcp9808     |--/dev| Logger       |
 | sensor   |      | kernel drv  |      | (Python)     |
 +----------+      +-------------+      +---+-----+----+
                                            |     |
                              +-------------+     +----------+
                              |                              |
                        +-----v------+              +--------v-------+
                        | CSV file   |              | Cloud endpoint |
                        | /data/     |              | HTTP POST      |
                        +------------+              +----------------+

                        +-------------+      +--------------+
                        | mcp9808     |--/dev| Display svc  |---SPI---OLED
                        | kernel drv  |      | (updater)    |
                        +-------------+      +--------------+

Each arrow is a potential failure point. The architecture must define what happens when any link breaks.


 +---------------------+------------------------------------------------+
 | Link Breaks         | System Behavior                                |
 +---------------------+------------------------------------------------+
 | Sensor disconnected | Driver returns error; logger records "N/A"     |
 | Network down        | Logger buffers locally; retries on reconnect   |
 | Display fails       | System continues logging (display is optional) |
 | CSV disk full       | Rotate logs; alert via network                 |
 | Cloud rejects data  | Queue locally; log rejection reason            |
 +---------------------+------------------------------------------------+

Rule: No single component failure should take the whole system down. The architecture must define graceful degradation.
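
A sketch of that rule for the logger stage, with the sensor read and the cloud push injected as callables so each failure can be simulated (class and parameter names are illustrative, not a prescribed API):

```python
import collections

class Logger:
    """Graceful degradation sketch: no single link failure stops logging.

    read_sensor and push_cloud are injected; in a real logger they
    would wrap the driver read and the HTTP client.
    """
    def __init__(self, read_sensor, push_cloud, max_buffer=1000):
        self.read_sensor = read_sensor
        self.push_cloud = push_cloud
        self.rows = []                                       # CSV stand-in
        self.pending = collections.deque(maxlen=max_buffer)  # cloud retry queue

    def tick(self):
        try:
            value = f"{self.read_sensor():.3f}"
        except OSError:
            value = "N/A"        # sensor gone: record the gap, keep running
        self.rows.append(value)
        self.pending.append(value)
        try:
            while self.pending:
                self.push_cloud(self.pending[0])
                self.pending.popleft()   # drop only after a confirmed push
        except OSError:
            pass                 # network down: samples stay queued for retry
```

The bounded deque also answers the "disk full" class of failures in miniature: when the buffer is full, the oldest sample is dropped rather than the process crashing.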


Four Architecture Views

A single diagram cannot capture a system. Different questions need different views:

 +--------------------+-----------------------------+------------------------------------------+
 | View               | Answers                     | Example Question                         |
 +--------------------+-----------------------------+------------------------------------------+
 | Boot / Startup     | What runs when?             | "Why does the logger fail on cold boot?" |
 | Process / Service  | Who supervises what?        | "What restarts the display service?"     |
 | Data Flow          | How does data move?         | "Why is the OLED showing stale data?"    |
 | Failure / Recovery | What happens when X breaks? | "What caused the 3 AM reboot?"           |
 +--------------------+-----------------------------+------------------------------------------+

Without all four, debugging degrades to guessing.


View 1: Boot / Startup

 Power On
    |
    v
 U-Boot .............. 1 s
    |
    v
 Linux Kernel ........ 2 s
    |   (drivers probe, Device Tree parsed)
    v
 systemd ............. 5 s
    |   (mount filesystems, start networking)
    v
 data-logger.service . 0.5 s
    |
    v
 App Ready ........... Total: ~8.5 s
                       Budget: 10 s

Every stage has a time budget. If boot exceeds the budget, find which stage grew and fix it.


View 2: Process / Service

 systemd (PID 1)
    |
    +-- data-logger.service
    |     Restart=always, WatchdogSec=10
    |
    +-- display-updater.service
    |     Restart=always, After=data-logger.service
    |
    +-- sshd.service
    |     Restart=on-failure (remote access)
    |
    +-- NetworkManager.service
    |     Restart=on-failure
    |
    +-- systemd-watchdog
          HW watchdog, reboot if systemd hangs

Every service has an explicit restart policy and dependency ordering. No service starts "whenever it feels like it."
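
A minimal unit file matching the tree above might look like this (service name and paths are illustrative; WatchdogSec= only works if the process pings systemd with sd_notify("WATCHDOG=1") within each interval, which is why Type=notify is set):

```ini
# /etc/systemd/system/data-logger.service  (illustrative sketch)
[Unit]
Description=Sensor data logger
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/python3 /opt/logger/main.py
Restart=always
RestartSec=3
WatchdogSec=10

[Install]
WantedBy=multi-user.target
```

display-updater.service would then declare After=data-logger.service, making the ordering in the tree explicit instead of accidental.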


View 3: Data Flow

(See the temperature monitoring pipeline from earlier.)

Key additions to document:

  • Buffering: logger holds 1000 samples in memory before writing to CSV
  • Format: driver provides millidegrees (integer); logger converts to degrees (float)
  • Rate: sensor polled every 1 s; display updated every 2 s; cloud push every 60 s

Different rates at different stages create a natural fan-out that must be designed, not discovered.
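
One way to make that fan-out explicit is a single loop with per-stage due times instead of three independent timers. A simulation sketch with injected callbacks (a real service would sleep until the nearest due time rather than tick every second):

```python
def run_schedule(total_seconds, stages):
    """Drive stages with different periods from a single loop.

    stages maps a name to (period_seconds, callback). The loop
    simulates one tick per second and fires each stage when due.
    """
    due = {name: 0 for name in stages}
    for t in range(total_seconds):
        for name, (period, callback) in stages.items():
            if t >= due[name]:
                callback(t)
                due[name] = t + period

fired = {"poll": [], "display": [], "cloud": []}
run_schedule(6, {
    "poll":    (1,  fired["poll"].append),     # sensor poll every 1 s
    "display": (2,  fired["display"].append),  # display update every 2 s
    "cloud":   (60, fired["cloud"].append),    # cloud push every 60 s
})
```

Running six simulated seconds fires poll six times, display three times, and cloud once: the designed fan-out, visible in code rather than discovered in the field.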


View 4: Failure / Recovery

 +--------------------+-----------------+------------------+
 | Failure            | Detection       | Recovery         |
 +--------------------+-----------------+------------------+
 | Service crash      | systemd         | Restart (3 s)    |
 +--------------------+-----------------+------------------+
 | System hang        | HW watchdog     | Reboot (15 s)    |
 +--------------------+-----------------+------------------+
 | Bad firmware       | A/B partition   | Rollback (30 s)  |
 | update             | boot counter    |                  |
 +--------------------+-----------------+------------------+
 | Sensor failure     | Driver error    | Log "N/A",       |
 |                    | code            | continue running |
 +--------------------+-----------------+------------------+
 | Disk full          | inotify /       | Log rotation,    |
 |                    | threshold check | alert            |
 +--------------------+-----------------+------------------+

Why You Need All Four Views

 +------------------+--------------------------------------+
 | You Have Only... | You Cannot Debug...                  |
 +------------------+--------------------------------------+
 | Boot view        | Why sensor data stopped at 3 AM      |
 | Data flow view   | Why the system hangs during boot     |
 | Process view     | Where data gets corrupted in transit |
 | Failure view     | Why startup takes 30 seconds         |
 +------------------+--------------------------------------+

Each view is a lens on the same system. Together they give full coverage of engineering and operational questions.


Layering Rules

These rules exist because every team that violates them rediscovers the same problems:

  • Kernel drivers expose raw data -- not "temperature too high"
      A bug in threshold logic then crashes a process, not the kernel

  • Business logic stays in user space -- where crashes are recoverable

  • Service supervision stays in systemd -- not in application code
      Restart policies, ordering, and the watchdog belong to init

  • Cross-layer shortcuts are documented exceptions
      mmap for DMA? Fine -- but document why and isolate it

Layering Violations and Their Cost

 +---------------------------------+-----------------------------------------------+
 | Violation                       | Consequence                                   |
 +---------------------------------+-----------------------------------------------+
 | Business logic in kernel driver | Bug causes kernel panic, not app crash        |
 | Application supervises itself   | Must reimplement systemd restart logic        |
 | App reads /dev/mem directly     | Loses portability, risks security holes       |
 | Driver formats data as JSON     | Kernel does string processing (slow, fragile) |
 +---------------------------------+-----------------------------------------------+

The rule: each layer does its job and trusts the adjacent layers to do theirs.


View 5: Performance View

The four views tell you what runs, how data flows, and what happens when things break. The performance view tells you whether it runs fast enough.

 +----------------------------+-----------------------------+-------------------------------+
 | Metric                     | Where to Measure            | Tool                          |
 +----------------------------+-----------------------------+-------------------------------+
 | Pipeline latency per stage | Each arrow in the data flow | clock_gettime instrumentation |
 | CPU usage per process      | Process view                | mpstat -P ALL, pidstat -t     |
 | Hotspot functions          | Inside each service         | perf record → perf report     |
 | Blocking calls             | System boundary             | strace -c                     |
 | Scheduling jitter          | RT threads                  | cyclictest                    |
 +----------------------------+-----------------------------+-------------------------------+
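
Per-stage latency can be captured with a monotonic clock around each stage. A sketch in Python, where time.monotonic_ns() uses CLOCK_MONOTONIC on Linux so samples are immune to wall-clock jumps (the "convert" stage name is illustrative):

```python
import time

def timed(name, fn, stats):
    """Wrap one pipeline stage and record its latency in nanoseconds."""
    def wrapper(*args):
        start = time.monotonic_ns()
        result = fn(*args)
        # Append the elapsed time for this stage to the shared stats dict
        stats.setdefault(name, []).append(time.monotonic_ns() - start)
        return result
    return wrapper

stats = {}
# Hypothetical "convert" stage: millidegrees -> degrees
convert = timed("convert", lambda raw: raw / 1000.0, stats)
value = convert(23500)
# stats["convert"] now holds one latency sample in nanoseconds
```

Wrapping every stage the same way yields the per-arrow numbers the table above asks for, without changing any stage's behavior.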

Reference: Performance Profiling


Quick Architecture Checks

Five yes/no questions to validate any embedded Linux architecture:

  1. Can one service fail without taking the whole system down?
  2. Are all dependencies explicit in systemd unit files?
  3. Can you trace one sensor value through the full stack?
  4. Is each layer's ownership clear and documented?
  5. Have you measured latency at each pipeline stage?

If any answer is "no" or "I don't know," the architecture has a gap.



Linux kernel subsystems: process management, memory management, filesystems, networking, and device drivers — all running in privileged mode.


Block 1 Summary

  • The stack is layered -- hardware, bootloader, kernel, init, application
  • The hardware contract -- Device Tree + driver + app interface = portability
  • Data flow is the backbone -- every product is a sensor-to-output pipeline
  • Four views are needed -- boot, process, data flow, failure/recovery
  • Layering rules prevent cross-layer bugs -- each layer has one job

Block 2

Draw Your System + History


Exercise: Draw Your Level Display System (25 min)

Task: In your team, draw the complete architecture of your IMU level display system.

Include:

  • All hardware components (IMU, display, SoC, buses)
  • All software layers (driver, service, application)
  • Data flow with latency at each stage
  • Failure modes at each link

Use all four views: boot, process, data flow, failure/recovery.

Paper, whiteboard, or digital -- your choice.


Exercise Template

 +--------+       +-----------+       +------------+
 | BMI160 |--SPI--| spi driver|--/dev-| level_app  |
 | IMU    |       | (kernel)  |       | (Python)   |
 +--------+       +-----------+       +-----+------+
                                            |
                                     +------v-------+
                                     | SDL2 / DRM   |
                                     | framebuffer  |
                                     +------+-------+
                                            |
                                     +------v-------+
                                     | OLED / LCD   |
                                     | display (HW) |
                                     +--------------+

Add: latency per arrow, failure mode per component, boot order, restart policy.


Peer Review (10 min)

Swap diagrams with another team.

Review checklist:

  • [ ] All four views present (boot, process, data flow, failure)?
  • [ ] Latency annotated at each pipeline stage?
  • [ ] Failure mode defined for each component?
  • [ ] Service dependencies explicit?
  • [ ] Can you trace a single IMU sample from sensor to pixel?

Write two strengths and two gaps on the diagram. Return it.


How Did We Get Here?

Embedded Linux did not appear overnight. Understanding history explains why certain tools exist -- and which ones are disappearing.

 1969  Unix (Bell Labs, PDP-7)
   |
 1972  Rewritten in C --> first portable OS
   |
 1977  BSD adds TCP/IP, sockets --> the Internet
   |
 1983  GNU project (Stallman) --> free tools
   |
 1988  POSIX standard --> portable API
   |
 1991  Linux 0.01 (Torvalds) --> the kernel
   |
 1999  BusyBox + uClibc --> Linux on small ARM
   |
 2003  Buildroot --> reproducible embedded builds

From Hobby to Everywhere

 2006  Android announced --> Linux goes mobile
   |
 2010  Yocto Project --> industrial embedded Linux
   |
 2015  Device Tree mandatory for ARM --> no more board files
   |
 2017  100% of Top 500 supercomputers run Linux
   |
 2022  Rust support merged into mainline (Linux 6.1)
   |
 2024  PREEMPT_RT fully merged into mainline (Linux 6.12)
   |
 Today Linux runs on everything from $0.50 MCUs
       to Mars rovers

From a Finnish student's hobby to the OS that runs the world -- in 33 years.



Unix/Linux timeline: from Bell Labs PDP-7 (1969) through BSD, GNU, POSIX, and Linux (1991) to today's embedded and cloud deployments.


What's Obsolete

 +------------------+--------------+-------+------------------------------------------+
 | Old Technology   | Replaced By  | Year  | Why                                      |
 +------------------+--------------+-------+------------------------------------------+
 | devfs            | udev         | 2004  | Dynamic, rules-based device management   |
 | fbdev (/dev/fb0) | DRM/KMS      | ~2012 | Atomic page flips, VSync, multi-plane    |
 | sysvinit         | systemd      | ~2014 | Parallel boot, dependencies, watchdog    |
 | HAL daemon       | udev + sysfs | ~2010 | Simpler, kernel-integrated               |
 | Board files (C)  | Device Tree  | 2015  | HW description separated from code       |
 +------------------+--------------+-------+------------------------------------------+

If you find tutorials built on the "old" column, the approach may still work, but it is not recommended for new designs.


What's Emerging

 +----------------+------------------------------------------------------+--------------------------------------------+
 | Technology     | What It Is                                           | Why It Matters                             |
 +----------------+------------------------------------------------------+--------------------------------------------+
 | eBPF           | Attach programs to kernel events without recompiling | Production tracing, zero overhead when off |
 | Rust in kernel | Memory-safe driver language (since Linux 6.1)        | Fewer use-after-free, buffer overflow bugs |
 | Zephyr RTOS    | Modern RTOS for MCUs                                 | Covers what PREEMPT_RT cannot reach        |
 | RISC-V         | Open ISA, no license fees                            | Buildroot/Yocto support; gaining momentum  |
 +----------------+------------------------------------------------------+--------------------------------------------+

These are not speculative -- they are in mainline kernels and production systems today.


Regulatory Landscape

Embedded devices increasingly face mandatory regulations:

 +-------------------------+-----------------------------------+------------------------------------------+
 | Standard                | Scope                             | Key Requirement                          |
 +-------------------------+-----------------------------------+------------------------------------------+
 | IEC 62443               | Industrial cybersecurity          | Security lifecycle management with a     |
 |                         |                                   | zone/conduit model, covering the entire  |
 |                         |                                   | product life from design to decommission |
 +-------------------------+-----------------------------------+------------------------------------------+
 | ETSI EN 303 645         | Consumer IoT                      | No default passwords, mandatory security |
 |                         |                                   | updates for a minimum of 5 years,        |
 |                         |                                   | vulnerability disclosure process         |
 +-------------------------+-----------------------------------+------------------------------------------+
 | EU Cyber Resilience Act | All connected products sold       | Security by design, 5-year update        |
 |                         | in the EU                         | obligation, applies to all connected     |
 |                         |                                   | products starting 2027                   |
 +-------------------------+-----------------------------------+------------------------------------------+

These are not optional. The EU CRA affects what you can legally sell starting 2027.

Design for compliance from the start -- retrofitting security is expensive.


Connecting History to Today

 +--------------------------------+--------------------------------------------------+
 | Historical Decision            | Impact on Your Work Today                        |
 +--------------------------------+--------------------------------------------------+
 | Unix rewritten in C (1972)     | Linux is portable across ARM, x86, RISC-V        |
 | "Everything is a file" (1970s) | You read sensors via /dev/ and /sys/             |
 | POSIX standard (1988)          | Same API on embedded Linux and desktop Linux     |
 | GPL license (1991)             | Linux is free; you must share kernel changes     |
 | Device Tree (2015)             | Hardware described in data, not compiled in code |
 | PREEMPT_RT (2024)              | Real-time is no longer a separate patchset       |
 +--------------------------------+--------------------------------------------------+

History is not trivia -- it explains why things are the way they are.


Key Takeaways

  • Architecture is a system view, not a code file -- define components, communication, and failure modes before writing code

  • Data flow is the backbone -- every embedded product is a pipeline from sensor to output

  • Four views give full coverage -- boot, process, data flow, failure/recovery

  • Respect the layers -- kernel drivers expose raw data; business logic lives in user space

  • 30 years of history shaped today's tools -- learn from what was replaced and why

  • Regulations are becoming mandatory -- design for compliance from the start


Hands-On Next

Apply these architecture concepts in practice:

Capstone: Level Display System Build a complete sensor-to-display pipeline that exercises all four architecture views.

You will:

  • Document the boot sequence with timing budgets
  • Define systemd services with explicit dependencies
  • Trace data from IMU sensor to rendered pixels
  • Plan failure modes and recovery for each component

This is where architecture stops being theory and becomes engineering.