Debugging Embedded Linux
Time: 60 min | Prerequisites: SSH Login | Theory companion: Linux Fundamentals, Section 11
Learning Objectives
By the end of this tutorial you will be able to:
- Follow a structured debugging flowchart to diagnose boot, service, device, and application failures
- Use strace to trace system calls and identify permission or path errors
- Debug a C program interactively with GDB (breakpoints, backtrace, variable inspection)
- Set up remote debugging with gdbserver for cross-development
- Run and debug ARM binaries on x86 using QEMU
- Read kernel logs with dmesg and journalctl at different severity levels
- Choose the right debugging approach (GDB/SSH, QEMU, JTAG) for a given failure mode
The Debugging Tools Hierarchy
Debugging embedded Linux follows a natural escalation path, from least invasive to most powerful:
- printf / logging -- the simplest tool. Add printf statements or structured log output to narrow down where the problem occurs. In kernel code, use dev_err, dev_warn, dev_info, and dev_dbg instead.
- strace -- intercepts every system call your program makes. Invaluable for finding permission errors, missing files, or unexpected I/O patterns without modifying the source code.
- GDB -- interactive debugging with breakpoints, variable inspection, and call stack examination. Use it locally on the Pi, remotely via gdbserver, or with QEMU for hardware-free testing.
- perf / ftrace -- profiling and kernel tracing. When the bug is a performance problem (latency spikes, CPU hogging), these tools show where time is actually spent.
- JTAG / SWD -- hardware debug interfaces that work even when the OS is not running. Required for bootloader debugging, kernel panics with no serial output, and hard lockups.
The key principle: start simple and escalate only when needed. Most embedded bugs are found with dmesg, strace, and a few printf calls. GDB and JTAG are powerful but add setup overhead -- reserve them for crashes, race conditions, and hardware bring-up.
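In C, the printf step is easier to live with if the extra output can be switched off at build time and records where it came from. The sketch below assumes a GCC or Clang toolchain. DBG, DEBUG_ENABLED, debug_stream and debug_printf are hypothetical names chosen for this example, not a standard API.

```c
#include <stdarg.h>
#include <stdio.h>

/* Build with -DDEBUG_ENABLED=0 to turn every DBG() call into dead code. */
#ifndef DEBUG_ENABLED
#define DEBUG_ENABLED 1
#endif

/* Hypothetical sink for debug output; NULL means stderr. */
static FILE *debug_stream = NULL;

/* The format attribute lets GCC/Clang check DBG() arguments like printf's. */
static void debug_printf(const char *file, int line, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

/* Print "[file:line] message" followed by a newline. */
static void debug_printf(const char *file, int line, const char *fmt, ...)
{
    FILE *out = debug_stream ? debug_stream : stderr;
    va_list ap;

    fprintf(out, "[%s:%d] ", file, line);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
    fflush(out);
}

/* "if (DEBUG_ENABLED)" keeps the arguments type-checked even when disabled;
 * the compiler then drops the call as unreachable code. */
#define DBG(...) \
    do { \
        if (DEBUG_ENABLED) \
            debug_printf(__FILE__, __LINE__, __VA_ARGS__); \
    } while (0)
```

For example, adding DBG("raw=%d", raw); after the atoi() call in sensor_reader.c (Section 8) prints the file name, line number, and value on stderr. Rebuilding with -DDEBUG_ENABLED=0 silences it without editing the source.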
For the full conceptual framework, see Linux Fundamentals, Section 11.
Debugging Flowchart
When something fails, follow this decision tree:
graph TD
A[System won't boot?] -->|Yes| B[Check serial console / dmesg]
A -->|No| C[Service won't start?]
B --> B1[Kernel panic → check DT / driver]
B --> B2[Hangs at init → check systemd deps]
C -->|Yes| D[journalctl -u SERVICE]
C -->|No| E[Device not detected?]
D --> D1[Check ExecStart path and permissions]
E -->|Yes| F[Check dmesg + i2cdetect/lsmod]
E -->|No| G[App misbehaving?]
F --> F1[Driver not loaded → check DT overlay]
F --> F2[Wrong address → check wiring]
G -->|Yes| H[strace / GDB / log output]
1. Boot and Service Diagnostics
Concept: Most failures are visible in boot logs or service status.
Example — finding a failed service:
$ systemctl list-units --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● data-logger.service loaded failed failed Data Logger Appliance
$ journalctl -u data-logger.service
-- No entries --
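# ← No output from the program at all: it most likely never started. Check the ExecStart path.
# systemctl status data-logger.service shows the exit status (203/EXEC means the binary could not be executed).
# If you are not root or in the systemd-journal group, rerun journalctl with sudo before concluding there are no entries.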
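Before you edit the unit file, you can sanity-check the ExecStart= path with a small helper. The C sketch below is rough, and check_exec_start is a hypothetical helper that only approximates systemd's own checks. It insists on an absolute path, which ExecStart= always accepts. It also relies on access(), which tests permissions for the user running the check, and that user may differ from the service's User=.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return NULL if path looks usable as an ExecStart= binary,
 * otherwise a short human-readable reason. */
static const char *check_exec_start(const char *path)
{
    struct stat st;

    if (path == NULL || path[0] != '/')
        return "not an absolute path";
    if (stat(path, &st) != 0)              /* stat() follows symlinks */
        return "missing, or a parent directory is not searchable";
    if (!S_ISREG(st.st_mode))
        return "not a regular file";
    if (access(path, X_OK) != 0)           /* checked for the calling user */
        return "not executable by this user";
    return NULL;
}
```

Pass it the exact string from the unit's ExecStart= line, for example a hypothetical /usr/local/bin/data-logger, and print the reason if the result is not NULL.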
2. Driver and Device Checks
Concept: Drivers expose hardware via /dev and sysfs.
Example — driver not loaded:
$ lsmod | grep mcp
# (empty output means the module is not loaded)
$ dmesg | grep mcp
[ 12.345] mcp9808: probe failed with error -5
# ← Error -5 is EIO (I/O error). Check wiring and I2C address.
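Kernel functions report failures as negative errno values, so a probe error of -5 is -EIO. To get the text, negate the code and pass it to strerror(), as in this small sketch. kernel_err_str is a hypothetical helper, and the exact wording comes from your C library. On I2C devices, -121 (EREMOTEIO) is another common value and is often caused by a missing ACK. Kernel-internal codes such as -517 (EPROBE_DEFER, "try again later") have no user-space name and typically print as an unknown error.

```c
#include <errno.h>
#include <string.h>

/* Translate a kernel-style return value (0 or a negative errno) to text. */
static const char *kernel_err_str(int ret)
{
    if (ret >= 0)
        return "no error";
    return strerror(-ret);   /* -5 -> strerror(EIO) */
}
```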
3. Process and Resource Monitoring
Concept: Embedded systems often fail due to CPU, memory, or I/O pressure.
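A process can watch its own memory use without any external tool. The sketch below assumes Linux procfs, where the VmRSS line in /proc/<pid>/status holds the resident set size in kB. read_vmrss_kb is a hypothetical helper. If you log its value periodically, for example from a data logger's main loop, a slow leak shows up as a steadily growing number.

```c
#include <stdio.h>
#include <string.h>

/* Return VmRSS (resident memory, in kB) from a /proc/<pid>/status file,
 * or -1 if the file or the field cannot be read. */
static long read_vmrss_kb(const char *status_path)
{
    char line[256];
    long kb = -1;
    FILE *fp = fopen(status_path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* The line looks like "VmRSS:\t    1234 kB" */
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(fp);
    return kb;
}
```

Call it with "/proc/self/status" to monitor the current process.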
4. Tracing and System Calls
Concept: strace shows what your program actually does.
Example — finding why a program fails to open a device:
$ strace -e openat cat /dev/mcp9808
openat(AT_FDCWD, "/dev/mcp9808", O_RDONLY) = -1 EACCES (Permission denied)
# ← Quick test: sudo chmod 666 /dev/mcp9808 (reset at the next boot). Permanent fix: a udev rule that sets the device's mode or group
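strace is the right tool when you cannot change the program. When you can, have the program report the two facts strace revealed: the path it tried and the errno it got. This is a sketch; open_device_ro is a hypothetical helper, and /dev/mcp9808 is just the example node used above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open a device node read-only. On failure, print the path and the errno
 * text, then return -1 with errno preserved for the caller. */
static int open_device_ro(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        int err = errno;   /* fprintf() may overwrite errno */
        fprintf(stderr, "open(%s) failed: %s (errno %d)\n",
                path, strerror(err), err);
        errno = err;
    }
    return fd;
}
```

Run as an unprivileged user, open_device_ro("/dev/mcp9808") would then report Permission denied together with the path, so you don't need strace.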
5. Network Debugging
Concept: Many embedded apps fail due to network misconfiguration.
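A useful first question is whether the failure lies in name resolution or in connectivity. The sketch below keeps the two apart:

- getaddrinfo() failures come back as gai_strerror() text. These point at DNS or /etc/hosts.
- connect() failures come back as strerror() text, such as Connection refused, which means nothing is listening on that port.

tcp_probe is a hypothetical helper. It uses a blocking connect(), so an unreachable host can take the kernel's full TCP timeout before it fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Try a TCP connection to host:port. Returns NULL if it succeeds,
 * otherwise a reason string (resolver or socket error text). */
static const char *tcp_probe(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    const char *reason = "no usable address";
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;       /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;   /* TCP */

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0)
        return gai_strerror(rc);       /* name resolution failed */

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) {
            reason = strerror(errno);
            continue;
        }
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            close(fd);
            freeaddrinfo(res);
            return NULL;               /* reachable */
        }
        reason = strerror(errno);      /* e.g. refused or unreachable */
        close(fd);
    }
    freeaddrinfo(res);
    return reason;
}
```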
6. I2C/SPI Debugging
Concept: Bus errors are often electrical or addressing mistakes.
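The i2c-dev interface (/dev/i2c-N, provided by the i2c-dev module) lets user space talk to a sensor directly, which separates bus problems from driver problems. The sketch assumes an MCP9808 at its default address 0x18, with the ambient temperature in register 0x05. Confirm both against your datasheet and wiring. i2c_read_reg16 and mcp9808_to_mC are hypothetical helpers.

- If a kernel driver has already claimed the address, the I2C_SLAVE ioctl fails with EBUSY.
- An I/O error on the write or read usually means the chip did not acknowledge. Check the wiring, pull-ups, and address.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/i2c-dev.h>   /* I2C_SLAVE */
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed MCP9808 details: 7-bit address 0x18 (A2..A0 low),
 * ambient temperature register 0x05, 16 bits, MSB first. */
#define MCP9808_ADDR   0x18
#define MCP9808_REG_TA 0x05

/* Read a 16-bit register via /dev/i2c-N. Returns 0 on success,
 * or -1 with errno set by the failing call. */
static int i2c_read_reg16(const char *bus, int addr, uint8_t reg, uint8_t out[2])
{
    int fd = open(bus, O_RDWR);
    int rc = -1, saved_errno;

    if (fd < 0)
        return -1;
    if (ioctl(fd, I2C_SLAVE, addr) == 0 &&
        write(fd, &reg, 1) == 1 &&    /* set the register pointer */
        read(fd, out, 2) == 2)        /* then read MSB, LSB */
        rc = 0;
    saved_errno = errno;              /* close() must not hide the error */
    close(fd);
    errno = saved_errno;
    return rc;
}

/* Convert Ta (13-bit two's complement, 1/16 deg C per LSB; the top three
 * bits are alert flags) to milli-degrees Celsius. */
static int mcp9808_to_mC(uint8_t msb, uint8_t lsb)
{
    int raw = ((msb & 0x1F) << 8) | lsb;   /* drop the three flag bits */

    if (raw & 0x1000)                      /* bit 12 is the sign bit */
        raw -= 0x2000;
    return raw * 1000 / 16;
}
```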
7. A Minimal Debug Checklist
- Is the device visible in /dev?
- Does the driver load cleanly (dmesg)?
- Are permissions correct?
- Is the service running (systemctl)?
- Is the process consuming CPU or memory unexpectedly?
Driver Debugging Checklist
- Confirm the device tree entry or overlay is loaded.
- Check dmesg for probe errors.
- Verify the I2C address or SPI chip select is correct.
- Ensure power and, for I2C, pull-up resistors are present.
- Use strace on user-space tools to confirm their I/O calls.
8. GDB Basics
GDB (GNU Debugger) lets you pause a running program, inspect variables, and step through code one line at a time. This section uses the sensor_reader.c program from the ELF tutorial.
8.1 Compile with Debug Symbols
# -g adds debug info, -O0 disables optimization (variables won't be "optimized out")
gcc -g -O0 -o sensor_reader sensor_reader.c
If you don't have the file yet, create sensor_reader.c:
// sensor_reader.c — Read CPU temperature from sysfs
#include <stdio.h>
#include <stdlib.h>
int main(void) {
FILE *fp = fopen("/sys/class/thermal/thermal_zone0/temp", "r");
if (fp == NULL) {
perror("Failed to open thermal sensor");
return 1;
}
char buf[16];
if (fgets(buf, sizeof(buf), fp) == NULL) {
perror("Failed to read temperature");
fclose(fp);
return 1;
}
fclose(fp);
int raw = atoi(buf);
printf("CPU temperature: %d.%d °C\n", raw / 1000, (raw % 1000) / 100);
return 0;
}
8.2 GDB Walkthrough
| Command | What It Does |
|---|---|
| break main | Set a breakpoint at the start of main() |
| run | Start the program (stops at breakpoint) |
| next | Execute one line, stepping over function calls |
| step | Execute one line, stepping into function calls |
| print fp | Print the value of variable fp |
| print buf | Print the contents of buf |
| print raw | Print the integer value |
| info locals | Show all local variables |
| backtrace | Show the call stack (where am I?) |
| continue | Resume execution until next breakpoint or exit |
| quit | Exit GDB |
8.3 Catch a Bug with GDB
Create a buggy version — remove the NULL check so a bad path causes a segfault:
// sensor_reader_buggy.c
#include <stdio.h>
#include <stdlib.h>
int main(void) {
FILE *fp = fopen("/nonexistent/path", "r");
// BUG: no NULL check — fp is NULL
char buf[16];
fgets(buf, sizeof(buf), fp); // segfault: dereferencing NULL
int raw = atoi(buf);
printf("CPU temperature: %d.%d °C\n", raw / 1000, (raw % 1000) / 100);
return 0;
}
gcc -g -O0 -o sensor_buggy sensor_reader_buggy.c
gdb ./sensor_buggy
(gdb) run
# Program received signal SIGSEGV, Segmentation fault.
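(gdb) backtrace
# Frame #0 is inside the C library's fgets(); frame #1 is main() at the fgets() line
(gdb) frame 1
# Select main()'s frame so its local variables are in scope
(gdb) print fp
# $1 = (FILE *) 0x0 ← NULL pointer!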
(gdb) quit
GDB tells you exactly where the crash happened and why (NULL pointer dereference).
Checkpoint 8
You can compile with debug symbols, set breakpoints, step through code, inspect variables, and diagnose a segfault with GDB.
9. Remote Debugging with gdbserver
In embedded development, you compile on your laptop (fast, lots of RAM) and debug on the Pi (limited resources). gdbserver bridges the two.
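9.1 Install
# On the Pi
sudo apt install gdbserver
# On your laptop (Debian/Ubuntu)
sudo apt install gdb-multiarch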
9.2 Start gdbserver on the Pi
# On the Pi — start the program under gdbserver, listening on port 9000
gdbserver :9000 ./sensor_reader
# Process ./sensor_reader created; pid = 1234
# Listening on port 9000
9.3 Connect from Your Laptop
# On your laptop — you need a copy of the binary (same build)
gdb-multiarch ./sensor_reader
(gdb) target remote <pi-ip>:9000
(gdb) break main
(gdb) continue
# Breakpoint 1, main () at sensor_reader.c:6
(gdb) next
(gdb) print fp
(gdb) continue
(gdb) quit
Replace <pi-ip> with your Pi's IP address (e.g., 192.168.1.42).
9.4 Why Remote Debugging?
| | Local GDB (on Pi) | Remote GDB (gdbserver) |
|---|---|---|
| Compile | On Pi (slow) | On laptop (fast) |
| Debug UI | Terminal only | Can use IDE (VS Code, CLion) |
| Pi resources | GDB uses CPU/RAM | Only gdbserver (lightweight) |
| Workflow | Single machine | Cross-development (industry standard) |
Tip
VS Code can also connect to gdbserver, for example with the "Native Debug" extension or Microsoft's C/C++ extension (via its miDebuggerServerAddress launch setting). You get graphical breakpoints, variable watches, and the call stack on your laptop while the program runs on the Pi.
Checkpoint 9
You can start gdbserver on the Pi, connect with gdb-multiarch from your laptop, and debug remotely with breakpoints and variable inspection.
10. QEMU for Testing and Debugging
QEMU lets you run ARM binaries on your x86 laptop without a Pi. Combined with GDB, it gives you full debugging without any target hardware.
10.1 User-Mode Emulation
# On your laptop — run an ARM binary on x86
# First, cross-compile (see ELF tutorial Section 5)
arm-linux-gnueabihf-gcc -static -g -O0 -o sensor_reader_arm sensor_reader.c
# Run with QEMU
qemu-arm ./sensor_reader_arm
Note
The sysfs thermal path likely doesn't exist on your laptop, so the program will print an error. This is expected — QEMU emulates the CPU, not the Pi's hardware.
10.2 QEMU + GDB
# Terminal 1: Start under QEMU, waiting for GDB
qemu-arm -g 1234 ./sensor_reader_arm
# Terminal 2: Attach GDB
gdb-multiarch ./sensor_reader_arm
(gdb) target remote :1234
(gdb) break main
(gdb) continue
(gdb) next
(gdb) print fp
(gdb) quit
This is identical to the gdbserver workflow — GDB doesn't care whether the remote is a real Pi, QEMU, or an FPGA.
10.3 System Emulation (Overview)
For testing kernel changes or boot sequences, QEMU can emulate an entire ARM system:
# Boot a full ARM Linux in QEMU (example — paths vary)
qemu-system-arm -M vexpress-a9 -kernel zImage \
-dtb vexpress-v2p-ca9.dtb -initrd rootfs.cpio.gz \
-append "console=ttyAMA0" -nographic
This boots a complete Linux system — kernel, init, services — all emulated on your laptop. Useful for testing Buildroot images without flashing an SD card.
10.4 Comparison: When to Use Each Approach
| Scenario | Best Approach |
|---|---|
| Normal application debugging on Pi | GDB (local) or gdbserver (remote) |
| No Pi available / CI testing | QEMU user-mode |
| Need to debug with full IDE on laptop | gdbserver (remote) or QEMU + GDB |
| Testing kernel/boot changes | QEMU system emulation |
| Kernel panic, no dmesg output | JTAG (see Section 12) |
| Hardware-specific bug (timing, electrical) | Must use real hardware + oscilloscope |
Checkpoint 10
You can run ARM binaries on x86 with QEMU, attach GDB for debugging, and choose the right approach for different scenarios.
11. Kernel Logging Deep-Dive
Kernel messages are your primary tool for diagnosing driver and boot problems.
11.1 printk and Device Logging
In kernel code, printk writes to the kernel ring buffer:
// In a kernel driver
printk(KERN_INFO "mcp9808: probe successful, temp=%d\n", temp);
// Better: use dev_* functions — automatically include device name
dev_info(&client->dev, "probe successful, temp=%d\n", temp);
dev_err(&client->dev, "failed to read register: %d\n", ret);
| Function | Level | Use For |
|---|---|---|
| dev_err | Error | Failures that prevent operation |
| dev_warn | Warning | Recoverable issues |
| dev_info | Info | Successful initialization, key events |
| dev_dbg | Debug | Verbose tracing (off by default) |
| pr_debug | Debug | Module-level debug (no device context) |
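User-space programs can write into the same ring buffer. A line written to /dev/kmsg that starts with <N> is logged at level N, using the kernel's numbering (3 = KERN_ERR, 4 = KERN_WARNING, 6 = KERN_INFO, 7 = KERN_DEBUG). This is a convenient way to drop a marker such as "starting sensor read" next to the driver's messages in dmesg. The sketch assumes root privileges, since /dev/kmsg is normally writable only by root. kmsg_format and kmsg_write are hypothetical helpers.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build one /dev/kmsg line: "<level>tag: message\n". */
static int kmsg_format(char *dst, size_t n, int level, const char *tag, const char *msg)
{
    return snprintf(dst, n, "<%d>%s: %s\n", level, tag, msg);
}

/* Append one message to the kernel log. Returns 0 on success, -1 on error. */
static int kmsg_write(int level, const char *tag, const char *msg)
{
    char line[256];
    int len = kmsg_format(line, sizeof line, level, tag, msg);
    int fd, rc;

    if (len < 0 || (size_t)len >= sizeof line)
        return -1;                    /* too long for this sketch */
    fd = open("/dev/kmsg", O_WRONLY);
    if (fd < 0)
        return -1;
    rc = (write(fd, line, (size_t)len) == (ssize_t)len) ? 0 : -1;
    close(fd);
    return rc;
}
```

After kmsg_write(3, "sensor_reader", "read failed") runs as root, dmesg --level=err should list the line.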
11.2 Reading Kernel Logs
# Human-readable timestamps
dmesg -T
# Only errors and warnings
dmesg --level=err,warn
# Follow in real time (like tail -f for the kernel)
dmesg -w
# Filter by keyword
dmesg | grep -i spi
dmesg | grep -i "probe\|error\|fail"
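Filters like dmesg --level=err,warn work because every kernel log record carries a syslog priority whose low three bits are the level. Each read() from /dev/kmsg returns one record in the form priority,sequence,timestamp,flags;message, where the timestamp is in microseconds since boot. The sketch below parses one such record. kmsg_parse, struct kmsg_record, and level_names are hypothetical names; the level names match those dmesg --level accepts. Reading /dev/kmsg may require root when kernel.dmesg_restrict is set.

```c
#include <stdio.h>
#include <string.h>

/* Index = log level; same names as dmesg --level. */
static const char *const level_names[8] = {
    "emerg", "alert", "crit", "err", "warn", "notice", "info", "debug"
};

struct kmsg_record {
    int level;                 /* 0 (emerg) .. 7 (debug) */
    unsigned long long usec;   /* time since boot, microseconds */
    const char *text;          /* points into the input line */
};

/* Parse "<prio>,<seq>,<usec>,<flags>[,...];<message>".
 * Returns 0 on success, -1 if the line does not match. */
static int kmsg_parse(const char *line, struct kmsg_record *rec)
{
    unsigned int prio;
    unsigned long long seq, usec;
    const char *semi = strchr(line, ';');

    if (semi == NULL || sscanf(line, "%u,%llu,%llu,", &prio, &seq, &usec) != 3)
        return -1;
    rec->level = (int)(prio & 7);   /* higher bits hold the facility */
    rec->usec = usec;
    rec->text = semi + 1;
    return 0;
}
```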
11.3 journalctl for System Logs
# Kernel messages only (like dmesg, but kept across reboots when the journal uses persistent storage)
journalctl -k
# Specific service
journalctl -u data-logger.service
# Last 5 minutes
journalctl --since "5 min ago"
# Only errors and above
journalctl -p err
# Follow in real time
journalctl -f
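journalctl -p err can filter your application's messages only if they carry a priority. Assume a program started by systemd with stdout/stderr connected to the journal, which is the default. A line that begins with a syslog priority in angle brackets, such as <3>, is stored at that priority with the prefix removed. The SD_ERR-style macros in <systemd/sd-daemon.h> use the same convention. This is a sketch; log_prio is a hypothetical helper, and LOG_ERR and related constants come from <syslog.h>.

```c
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>   /* LOG_ERR, LOG_WARNING, LOG_INFO, ... */

/* Write "<priority>message\n" so journald records the given priority. */
static void log_prio(FILE *out, int priority, const char *fmt, ...)
{
    va_list ap;

    fprintf(out, "<%d>", priority);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
    fflush(out);
}
```

If the service calls log_prio(stderr, LOG_ERR, "sensor read failed"), journalctl -u data-logger.service -p err shows the message. Lines without a prefix stay at the default info level.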
11.4 Structured Logging
For automated log parsing, use key=value format in your applications:
# In your application or script
echo "ts=$(date +%s) sensor=mcp9808 temp_mC=45200 status=ok" >> /var/log/sensor.log
# Parse with standard tools
grep "status=error" /var/log/sensor.log | awk -F'temp_mC=' '{print $2}' | cut -d' ' -f1
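This mirrors the systemd journal, which stores every entry as KEY=VALUE fields (journalctl --output=json displays them). It is also the key=value style used by many production logging frameworks.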
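If the logger is written in C instead of shell, the same key=value format is easy to produce and to read back without awk. This is a sketch; format_sensor_log and kv_get are hypothetical helpers, and it assumes values contain no spaces (quoting is out of scope).

```c
#include <stdio.h>
#include <string.h>

/* Format one record in the same layout as the echo line above. */
static int format_sensor_log(char *dst, size_t n, long long ts,
                             const char *sensor, int temp_mC, const char *status)
{
    return snprintf(dst, n, "ts=%lld sensor=%s temp_mC=%d status=%s",
                    ts, sensor, temp_mC, status);
}

/* Copy the value of "key=" (matched at line start or after a space, up to
 * the next space or newline) into val. Returns 0 if found, -1 otherwise. */
static int kv_get(const char *line, const char *key, char *val, size_t n)
{
    size_t klen = strlen(key);
    const char *p = line;

    if (klen == 0)
        return -1;
    while ((p = strstr(p, key)) != NULL) {
        int at_start = (p == line || p[-1] == ' ');
        if (at_start && p[klen] == '=') {
            size_t vlen = strcspn(p + klen + 1, " \n");
            if (vlen >= n)
                return -1;              /* value does not fit */
            memcpy(val, p + klen + 1, vlen);
            val[vlen] = '\0';
            return 0;
        }
        p += klen;                      /* partial match: keep searching */
    }
    return -1;
}
```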
Checkpoint 11
You can read kernel logs at different severity levels, filter with dmesg and journalctl, and understand the dev_* logging functions used in drivers.
12. JTAG/SWD Awareness
Note
This section is informational — no hardware setup required. JTAG debugging requires a debug probe (e.g., Segger J-Link, FTDI-based adapter) which is not part of the standard kit.
12.1 What Is JTAG/SWD?
JTAG (Joint Test Action Group) and SWD (Serial Wire Debug) are hardware debug interfaces — physical pins on the processor that let an external tool:
- Halt the CPU at any point
- Read/write memory and registers directly
- Set hardware breakpoints (no software instrumentation needed)
- Debug without any OS — works from first instruction after reset
12.2 When You Need It
| Symptom | Why GDB/SSH Won't Work | JTAG Can Help |
|---|---|---|
| Board doesn't boot at all | No OS = no SSH, no GDB | Halt at reset vector, step through bootloader |
| Kernel panic with no serial output | dmesg buffer may be lost | Read memory directly, inspect crash state |
| Driver causes hard lockup | CPU is frozen, no response | Halt CPU, inspect registers and call stack |
| Hardware bring-up (new board) | Nothing works yet | Verify CPU runs, test memory, load first code |
12.3 Raspberry Pi 4 JTAG
The Pi 4 exposes JTAG on GPIO pins 22-27 when enable_jtag_gpio=1 is set in config.txt. OpenOCD (Open On-Chip Debugger) is the software bridge between the debug probe and GDB.
OpenOCD translates GDB commands into JTAG signals. From GDB's perspective, it looks the same as target remote — but the target is raw hardware, not a running OS.
12.4 Decision Table: Which Debug Approach?
| Situation | Start With | Escalate To |
|---|---|---|
| App produces wrong output | printf / logging | GDB |
| App crashes (segfault) | GDB + backtrace | strace (if syscall-related) |
| Service won't start | journalctl + systemctl | strace on ExecStart |
| Device not detected | dmesg + lsmod | I2C/SPI bus scan |
| Driver probe fails | dmesg + dev_err output | GDB on kernel module (advanced) |
| System won't boot | Serial console + dmesg | JTAG |
| Intermittent timing issue | Logging + timestamps | oscilloscope + logic analyzer |
Checkpoint 12
You can explain when JTAG is needed, how it differs from software debugging, and choose the right debugging approach for different failure modes.
When to Escalate
| Symptom | First Tool | Second Tool | Third Tool |
|---|---|---|---|
| App crashes silently | strace | GDB + backtrace | dmesg |
| App produces wrong values | printf / logging | GDB + breakpoints | valgrind |
| Intermittent failures | journalctl --since | GDB + watchpoints | Logic analyzer |
| Hardware not responding | i2cdetect / lsmod | dmesg + strace | Oscilloscope |
| Kernel oops/panic | Serial console, dmesg | journalctl -k -b -1 | JTAG |
| Performance degradation | top, htop | perf stat, iostat | perf record + flamegraph |
| Service won't start | systemctl status | journalctl -u | strace on ExecStart |
| Missing library at runtime | ldd | readelf -d | LD_DEBUG=libs |
Challenge
Create a program debug_challenge.c with three deliberate bugs. Use the tools from this tutorial to find and fix each one:
- Bug 1 (strace): The program tries to open /dev/thermal instead of /sys/class/thermal/thermal_zone0/temp. Find it with strace.
- Bug 2 (GDB): An off-by-one error in a loop. Find it with GDB breakpoints and variable inspection.
- Bug 3 (strace + dmesg): The program tries to open /dev/mem without root. Find the EACCES (Permission denied) result with strace, then confirm that dmesg shows nothing for it, because the kernel does not log ordinary permission denials.
Deliverable: For each bug, document:
- Which tool you used to find it
- The exact command you ran
- What the tool output showed
- How you fixed it
Summary
| Section | Tool | What You Learned |
|---|---|---|
| 1-7 | dmesg, journalctl, strace, systemctl | System-level diagnostics |
| 8 | GDB | Interactive debugging — breakpoints, backtrace, variable inspection |
| 9 | gdbserver | Remote cross-debugging (laptop → Pi) |
| 10 | QEMU | Run and debug ARM binaries without hardware |
| 11 | dmesg, journalctl | Kernel logging — severity levels, filtering, structured logs |
| 12 | JTAG/OpenOCD | Hardware debugging awareness — when software tools aren't enough |