Lesson 5: Graphics Stack
Óbuda University — Linux in Embedded Systems
Starting Point: What You Already Know
Every time you use your PC — Ubuntu, Windows, macOS — this happens:
Browser Editor Terminal File Manager
│ │ │ │
└───────────┴───────────┴──────────────┘
│
┌──────▼──────┐
│ Desktop │ Ubuntu GNOME / Windows DWM / macOS
│ Environment │
│ - arranges windows on screen
│ - routes keyboard and mouse to the right app
│ - draws shadows, taskbar, animations
└──────┬──────┘
┌──────▼──────┐
│ Display │ kernel graphics driver
│ Driver │
└──────┬──────┘
┌──────▼──────┐
│ Monitor │ HDMI / laptop panel
└─────────────┘
This is the full desktop stack — multiple windows, taskbar, animations, drag-and-drop. You use it every day.
Now imagine: your product has one fullscreen app, no file manager, no desktop, no overlapping windows. Do you still need all those layers?
The Embedded Question
Desktop (what you know): Embedded (what you're building):
Firefox VS Code Terminal Your single app
│ │ │ │
┌──▼────────▼────────▼───┐ ┌──────▼──────┐
│ Compositor │ │ DRM/KMS │ ← direct, no compositor
└──────────┬─────────────┘ └──────┬──────┘
┌──────────▼─────────────┐ ┌──────▼──────┐
│ DRM/KMS │ │ Display │
└──────────┬─────────────┘ └─────────────┘
┌──────────▼─────────────┐
│ Display │ Removed: compositor, window manager,
└────────────────────────┘ desktop environment, login screen
On embedded, you strip away layers until only the essential path remains. The question is: how far can you strip?
The goal of this lecture: understand what each layer does, so you can decide which ones to keep and which to remove.
Today's Map
- Block 1 (45 min): The display hardware pipeline, three graphics levels (from simplest to desktop), fbdev vs DRM/KMS architecture, GPU stack, display interfaces.
- Block 2 (45 min): Tearing experiment: display scan-out, tearing mechanism, VSync and page flipping, write-and-fix exercise.
What the Display Hardware Actually Does
Before comparing the three levels, understand the hardware that all of them sit on top of.
Every display system has the same pipeline — from pixels in memory to light on the screen:

The display controller reads from buffers in memory and scans out pixels row by row, synchronized to the pixel clock. This happens continuously — 60 times per second at 60 Hz.
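The 60 Hz cadence is easy to sanity-check. A small sketch, assuming the standard CEA-861 1080p60 timing (2200 × 1125 total pixels including blanking):

```python
# Scan-out arithmetic for a 1080p60 mode (2200 x 1125 total incl. blanking).
# The display controller clocks out every pixel, blanking included,
# 60 times per second.
h_active, v_active = 1920, 1080
h_total, v_total = 2200, 1125          # active + blanking intervals
fps = 60

pixel_clock_hz = h_total * v_total * fps       # 148.5 MHz for this mode
frame_time_ms = 1000 / fps                     # ~16.67 ms per frame
line_time_us = 1_000_000 / (v_total * fps)     # time to scan one line

print(f"pixel clock: {pixel_clock_hz / 1e6:.1f} MHz")
print(f"frame budget: {frame_time_ms:.2f} ms")
print(f"line time: {line_time_us:.2f} us")
```

The controller never pauses: every ~16.7 ms it walks the whole buffer again, whether or not anything changed.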
The key question: how does your application tell the display controller which buffer to read?
The Display Hardware Pipeline — In Detail
| Stage | Hardware | What it does |
|---|---|---|
| Framebuffer | Memory buffer(s) | Stores pixel data — one or more buffers in RAM |
| Plane | Pixel mixer | Rotation, scaling, format conversion, layer blending |
| CRTC | Timing generator | Generates pixel clock, HSync, VSync — drives scan-out |
| Encoder | Interface adapter | Physical adaptation — converts to the wire protocol (TMDS, DSI, LVDS) |
| Bridge | Interface transcoder | Converts between display interfaces (e.g., DSI → DPI). Optional — not all paths have one. |
| Connector | Physical port | The socket: HDMI, DSI ribbon, SPI pins |
| Panel / Monitor | Display surface | Emits or reflects light. A panel is just the LCD; a monitor integrates a panel + housing + EDID. |
Pi 4 Examples
HDMI path: Framebuffer → Plane → CRTC → HDMI Encoder ──────────────► HDMI Monitor
(no bridge — direct TMDS)
DSI path: Framebuffer → Plane → CRTC → DSI Encoder → TC358762 bridge → 7" LCD Panel
(DSI → DPI transcoding)
The TC358762 is a bridge chip that converts DSI packets to parallel DPI signals — the LCD panel cannot speak DSI directly. This is common in embedded: the SoC outputs DSI, but the panel expects DPI, LVDS, or eDP, so a bridge chip translates.
This hardware chain exists whether you use fbdev, DRM, or a compositor. The difference is how much the software models it.
Inside the CRTC: Where Planes Become Pixels
The CRTC is the most complex stage. Here is what happens inside:
Plane 0 (primary) ──► DMA read ──┐
Plane 1 (overlay) ──► DMA read ──┤──► Compositor ──► Sync generator ──► Encoder
Plane 2 (cursor) ──► DMA read ──┘ (blend) (HSync, VSync,
pixel clock)
| Internal stage | What it does |
|---|---|
| Pixel fetch (DMA) | Reads pixel data from each plane's buffer in memory. The display controller has its own DMA engine — no CPU involvement. |
| Compositor | Blends all active planes together — alpha blending, z-ordering, scaling, color conversion. This is hardware compositing, not software. |
| Sync generator | Produces the timing signals: pixel clock, HSync (end of line), VSync (end of frame). These drive the encoder and ultimately the display. |
Why planes matter: A video player puts the video stream on one plane and subtitles on an overlay plane. The CRTC composites them in hardware — zero CPU work, zero memory copies. Without planes, the CPU would have to alpha-blend every frame.
Without planes (CPU compositing): With planes (HW compositing):
CPU reads video buffer Plane 0 → video buffer
CPU reads subtitle buffer Plane 1 → subtitle buffer
CPU blends pixel-by-pixel CRTC blends in hardware
CPU writes to display buffer → zero CPU work per frame
→ CPU busy every frame
Hardware planes are the reason DRM/KMS can display video + UI overlay at 60 FPS on a low-power SoC without breaking a sweat.
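To see what the CPU inherits when no overlay plane exists, here is a minimal sketch of standard per-channel alpha blending (the 800 × 480 figure matches the 7" DSI panel used later in this lecture):

```python
# Per-pixel alpha blending: the work the CPU must do for every pixel,
# every frame, when no hardware overlay plane is available.
def blend(dst, src, alpha):
    """Blend one 8-bit channel: out = src*a + dst*(1 - a), a in [0, 255]."""
    return (src * alpha + dst * (255 - alpha)) // 255

# One 800x480 layer at 60 FPS means ~23 million blends per channel per second
ops_per_sec = 800 * 480 * 60
print(f"{ops_per_sec:,} pixel blends/s without hardware planes")

# Example: a half-transparent white subtitle pixel over a dark video pixel
print(blend(dst=32, src=255, alpha=128))
```

The CRTC's hardware compositor performs exactly this operation, but in dedicated silicon during scan-out, so the CPU does none of it.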
Three Ways to Talk to This Hardware
Now you know what the display hardware does. Linux gives you three software paths to control it — from simplest to most capable:
| Level | Approach | What it hides | What it gives you |
|---|---|---|---|
| A | Raw Framebuffer (fbdev) | Everything — one flat buffer | open(), mmap(), write pixels |
| B | DRM/KMS | Nothing — full pipeline exposed | Planes, CRTC, page flip, VSync |
| C | Full Compositor (Wayland/X11) | DRM details — apps just render | Multiple windows, input routing |
You started at Level C (your laptop desktop). Embedded systems work at Level A or B. Let's look at each.
Level A — Framebuffer (fbdev)
fbdev gives you the simplest possible view of this hardware: one flat buffer.
Your Application
│
│ open("/dev/fb0")
│ mmap() → pointer to pixel memory
│ write pixels directly
│
▼
┌──────────────────────────────────────────────────┐
│ fbdev kernel driver │
│ │
│ ┌──────────┐ │
│ │ Buffer │ ← your pixels go here │
│ └────┬─────┘ │
│ │ │
│ ▼ (everything below is hidden from you) │
│ Plane → CRTC → Encoder → Connector → Panel │
└──────────────────────────────────────────────────┘
fbdev hides the display pipeline. You get one buffer, one resolution (set at boot or by fbset), no timing control, no page flipping. The driver handles everything internally.
The simplicity is the point. open(), mmap(), write pixels. Done.
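A minimal sketch of that flow. In real code the buffer comes from mmap() on /dev/fb0; here a bytearray stands in so the snippet runs anywhere, and the 800 × 480, 32 bpp geometry is an assumption that real code must query from the driver:

```python
# fbdev in a nutshell: the mapped region IS the screen. Writing bytes at
# offset y*stride + x*bytes_per_pixel changes that pixel on the next scan.
# Assumed geometry: 800x480, 32 bpp (XRGB8888), no row padding.
WIDTH, HEIGHT, BPP = 800, 480, 4
STRIDE = WIDTH * BPP

# Real code: fb = mmap.mmap(fd, STRIDE * HEIGHT) on an opened /dev/fb0.
fb = bytearray(STRIDE * HEIGHT)

def put_pixel(x, y, xrgb):
    off = y * STRIDE + x * BPP
    fb[off:off + BPP] = xrgb.to_bytes(4, "little")   # XRGB8888, little-endian

put_pixel(10, 20, 0x00FF0000)   # one red pixel
```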
fbdev — What You Can and Cannot Do
| Capability | fbdev | Notes |
|---|---|---|
| Write pixels | Yes | mmap() + direct memory writes |
| Read resolution | Yes | ioctl(FBIOGET_VSCREENINFO) |
| Change resolution | Fragile | fbset — not all drivers support it |
| VSync / page flip | No | You write while the display reads → tearing |
| Multiple planes | No | One buffer, one layer |
| Multi-display | No | Each /dev/fbN is independent, no coordination |
| GPU acceleration | No | CPU draws every pixel |
fbdev is excellent for quick experiments and simple displays (OLED, e-ink, small LCDs). For production on modern hardware, prefer DRM/KMS.
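Querying the real geometry looks roughly like this. The ioctl number and leading field order come from linux/fb.h; the snippet unpacks a synthetic reply so it runs without hardware, with the real device access shown only in comments:

```python
import struct

# struct fb_var_screeninfo (linux/fb.h) begins with eight __u32 fields:
# xres, yres, xres_virtual, yres_virtual, xoffset, yoffset,
# bits_per_pixel, grayscale. FBIOGET_VSCREENINFO fills in the whole struct.
FBIOGET_VSCREENINFO = 0x4600

def parse_vscreeninfo(raw):
    fields = struct.unpack_from("<8I", raw)
    return {"xres": fields[0], "yres": fields[1], "bpp": fields[6]}

# On real hardware (needs permissions on /dev/fb0):
#   import os, fcntl
#   fd = os.open("/dev/fb0", os.O_RDWR)
#   buf = bytearray(160)                 # the full struct is 160 bytes
#   fcntl.ioctl(fd, FBIOGET_VSCREENINFO, buf)
#   info = parse_vscreeninfo(buf)
# Synthetic reply standing in for an 800x480 @ 32 bpp panel:
raw = struct.pack("<8I", 800, 480, 800, 480, 0, 0, 32, 0)
info = parse_vscreeninfo(raw)
print(info)
```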
Level B — DRM/KMS (Kernel Mode Setting)
DRM/KMS exposes the full hardware pipeline to userspace:
Your Application
│
│ open("/dev/dri/card0")
│ enumerate connectors → find display
│ set mode → resolution + timing
│ allocate dumb buffer → draw pixels
│ page flip at VBlank → tear-free
│
▼
┌────────────────────────────────────────────────────────────┐
│ DRM/KMS kernel subsystem │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Buffer A │ │ Buffer B │ │ Buffer C │ (GEM objects) │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Primary │ │ Overlay │ │ Cursor │ (drm_plane) │
│ │ Plane │ │ Plane │ │ Plane │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ └──────────────┴─────────────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │ CRTC │ timing + pixel streaming │
│ └─────┬─────┘ (drm_crtc) │
│ ┌─────▼─────┐ │
│ │ Encoder │ protocol adaptation │
│ └─────┬─────┘ (drm_encoder) │
│ ┌─────▼─────┐ │
│ │ Bridge │ interface transcoding (opt.) │
│ └─────┬─────┘ (drm_bridge) │
│ ┌─────▼─────┐ │
│ │ Connector │ physical port │
│ └─────┬─────┘ (drm_connector) │
└──────────────────────┼─────────────────────────────────────┘
▼
┌──────────────┐
│ Panel/Monitor│ (drm_panel)
└──────────────┘
You see every stage and its kernel struct. You choose which buffer maps to which plane, when the page flip happens, which connector to use. The hardware pipeline is no longer hidden — DRM models it directly.
fbdev vs DRM/KMS — Architecture Comparison
fbdev: DRM/KMS:
┌─────────────┐ ┌─────────────┐
│ Application │ │ Application │
└──────┬──────┘ └──────┬──────┘
│ │
open("/dev/fb0") open("/dev/dri/card0")
mmap() libdrm / ioctl
write pixels enumerate, configure, flip
│ │
┌──────▼──────┐ ┌──────▼──────┐
│ fb driver │ │ DRM core │
│ (one buffer │ │ (buffers, │
│ one mode │ │ planes, │
│ hidden HW) │ │ CRTCs, │
└──────┬──────┘ │ encoders, │
│ │ connectors)│
▼ └──────┬──────┘
Display HW │
▼
Display HW
| Aspect | fbdev | DRM/KMS |
|---|---|---|
| Hardware model | Flat buffer — pipeline hidden | Full pipeline — planes, CRTC, encoder, bridge, connector |
| Buffer management | One buffer, driver-managed | Multiple buffers, app-managed (GEM) |
| Mode setting | fbset (fragile) | drmModeSetCrtc() (reliable) |
| VSync / page flip | Not supported | drmModePageFlip() at VBlank |
| Multiple displays | Separate /dev/fbN, no coordination | Single /dev/dri/card0, coordinated |
| Hardware planes | Not exposed | Primary, overlay, cursor — HW compositing |
| API stability | Deprecated since ~2015 | Current kernel standard |
| Kernel code path | Many fbdev drivers are DRM wrappers now | Native |
| Device node | /dev/fb0 | /dev/dri/card0 |
The takeaway: fbdev pretends the hardware is a flat buffer. DRM/KMS models what the hardware actually is.
DRM Objects on Real Hardware
On a Raspberry Pi 4 with HDMI and DSI connected:
Framebuffer A ──► Primary Plane 0 ──┐
Framebuffer B ──► Overlay Plane 0 ──┤
Framebuffer C ──► Cursor Plane 0 ───┤
▼
CRTC 0 ──► HDMI Encoder ───────────────► HDMI-A-1 ──► Monitor
(no bridge — direct TMDS)
Framebuffer D ──► Primary Plane 1 ──┐
▼
CRTC 1 ──► DSI Encoder ──► TC358762 ──► DSI-1 ──► 7" LCD
(bridge) (connector)
The HDMI path has no bridge — the encoder outputs TMDS directly to the connector. The DSI path has a bridge chip (TC358762) that converts DSI to DPI for the LCD panel.
Inspect on your Pi:
# List all DRM objects the vc4 driver exposes
sudo modetest -M vc4          # connectors, encoders, CRTCs, planes
# Connectors only — modes and connection status
sudo modetest -M vc4 -c
Full Software Stack — Without GPU
For CPU-rendered applications (fbdev-style drawing through DRM):
┌─────────────────────────────────────────────────────────┐
│ Your Application (C / Python / SDL2) │
│ draw_pixel(x, y, color) │
├─────────────────────────────────────────────────────────┤
│ libdrm (user-space library) │
│ drmModeSetCrtc(), drmModePageFlip() │
│ drmIoctl() → /dev/dri/card0 │
├─────────────────────────────────────────────────────────┤
│ DRM/KMS Core (kernel) │
│ mode setting, buffer management, VBlank events │
├─────────────────────────────────────────────────────────┤
│ GEM (Graphics Execution Manager) │
│ allocates "dumb buffers" in video/system memory │
├─────────────────────────────────────────────────────────┤
│ Display Controller Hardware (vc4 / v3d on Pi) │
│ reads buffer → CRTC → encoder → connector → panel │
└─────────────────────────────────────────────────────────┘
No GPU involved. The CPU writes pixels to a dumb buffer. The display controller hardware scans it out. This is what modetest and our DRM/KMS tutorials use.
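One piece of arithmetic this path forces on you: the dumb-buffer creation ioctl hands back a pitch (stride) that the driver may have aligned, and that pitch, not width × bytes-per-pixel, must be used for addressing. A sketch of the bookkeeping, assuming a hypothetical 64-byte alignment:

```python
# DRM dumb-buffer bookkeeping: DRM_IOCTL_MODE_CREATE_DUMB takes
# width/height/bpp and the driver replies with pitch (stride) and size.
# Drivers may align the pitch; 64 bytes here is an assumed example value.
def dumb_buffer_geometry(width, height, bpp, align=64):
    row_bytes = width * bpp // 8
    pitch = (row_bytes + align - 1) // align * align   # round up to alignment
    return pitch, pitch * height

# 800 px * 4 B is already 64-aligned; 798 px is not, so the pitch grows
print(dumb_buffer_geometry(800, 480, 32))
print(dumb_buffer_geometry(798, 480, 32))
```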
Full Software Stack — With GPU (OpenGL / Vulkan)
For GPU-accelerated rendering (3D, animations, Qt QML):
┌─────────────────────────────────────────────────────────┐
│ Your Application │
│ glDrawArrays(), SDL_RenderPresent() │
├─────────────────────────────────────────────────────────┤
│ OpenGL ES / Vulkan API │
├─────────────────────────────────────────────────────────┤
│ Mesa (user-space GPU driver) │
│ translates GL calls → GPU commands │
│ manages shader compilation, state tracking │
├─────────────────────────────────────────────────────────┤
│ GBM (Generic Buffer Manager) │
│ allocates GPU-accessible render targets │
├─────────────────────────────────────────────────────────┤
│ EGL (platform glue) │
│ connects GL context to DRM display surface │
│ EGL_PLATFORM_GBM → no compositor needed │
├─────────────────────────────────────────────────────────┤
│ DRM/KMS Core (kernel) │
│ page flip rendered buffer to display │
├─────────────────────────────────────────────────────────┤
│ GPU Hardware (V3D on Pi) Display Controller │
│ executes shaders, scans out buffer │
│ rasterizes triangles CRTC → encoder → out │
└─────────────────────────────────────────────────────────┘
Mesa is the open-source GPU driver stack. On the Pi 4, it uses the V3D driver for the VideoCore VI GPU. Mesa translates OpenGL/Vulkan calls into GPU hardware commands.
EGL is the glue between the rendering API (OpenGL) and the display system (DRM). On embedded Linux without a compositor, EGL binds directly to GBM/DRM — this is what Qt EGLFS and SDL2 KMSDRM use.
Where SDL2 and Qt Fit
SDL2 and Qt are application toolkits — they sit on top of these stacks and choose the right path:
SDL2 Qt
│ │
┌──────────┼──────────┐ ┌──────────┼──────────┐
│ │ │ │ │ │
KMSDRM fbcon Wayland EGLFS Wayland XCB
backend backend backend plugin plugin plugin
│ │ │ │ │ │
DRM/KMS fbdev Compositor EGL+DRM Compositor X11
(direct) (legacy) (desktop) (direct) (desktop) (legacy)
For embedded (no compositor): - SDL2 → KMSDRM backend → DRM/KMS directly - Qt → EGLFS plugin → EGL + DRM directly
For desktop: - SDL2 → Wayland backend → compositor → DRM/KMS - Qt → Wayland plugin → compositor → DRM/KMS
The application code does not change. The backend/plugin selection decides the display path.
# Force SDL2 to use DRM directly (no compositor)
export SDL_VIDEODRIVER=kmsdrm
./my_sdl2_app
# Force Qt to use EGLFS (no compositor)
export QT_QPA_PLATFORM=eglfs
./my_qt_app
Level C — Full Graphics Stack (Wayland/X11)
A compositor sits between your application and the display:
┌───────────────────────────────────────────┐
│ App 1 App 2 App 3 Cursor │
│ │ │ │ │ │
│ └─────────┴─────────┴─────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Compositor │ (Weston, Mutter)│
│ │ - window placement │
│ │ - input routing │
│ │ - GPU compositing │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ DRM/KMS │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Display │ │
│ └─────────────┘ │
└───────────────────────────────────────────┘
The compositor manages window placement, input routing, and GPU-accelerated compositing. This is desktop Linux.
Full Stack Trade-offs
Pros:
- Rich UI — multiple windows, drag-and-drop, tooltips, cursor
- Hardware acceleration — GPU compositing, OpenGL/Vulkan
- UI toolkits — Qt, GTK, Flutter work out of the box
- Standard input handling — keyboard, mouse, touch, gestures
Cons:
- Adds 2-15 seconds to boot time (depending on stack)
- Consumes 50-200+ MB of RAM
- Extra buffering layer between app and display
- More components to configure, update, and debug
- More failure points — compositor crash = black screen
For a single fullscreen embedded app, the compositor manages windows that will never appear. You pay the full cost for zero benefit.
Comparison Table
| Approach | Boot Impact | Memory | CPU Overhead | Complexity |
|---|---|---|---|---|
| Raw framebuffer (fbdev) | None | ~1 MB | Minimal | Low |
| DRM/KMS (dumb buffer) | None | ~2-4 MB | Low | Medium |
| Wayland + Weston | +2-5 s | ~50-100 MB | Medium | High |
| X11 + Desktop | +5-15 s | ~200+ MB | High | Very High |
The difference between fbdev/DRM and a full compositor is not incremental — it is an order-of-magnitude jump in resource consumption and complexity.
On a 256 MB device with a 10-second boot budget, a compositor consumes half your RAM and half your boot time before your application even starts.
Wayland vs X11

Embedded vs Desktop Mindset
On desktop Linux, the graphics stack is chosen for you. Your distribution ships GNOME or KDE with a Wayland compositor. You never think about it.
On embedded Linux, you choose explicitly. Every component is a decision.
| Resource | Desktop (8 GB RAM, SSD) | Embedded (256 MB RAM, eMMC) |
|---|---|---|
| 100 MB for compositor | 1.25% of RAM | 39% of RAM |
| 5 s for compositor boot | Unnoticeable | 50% of boot budget |
| 20 packages to maintain | Lost in 2000+ packages | 20% of total image |
What is invisible on desktop dominates on embedded. This is why embedded engineers must understand the graphics stack — not just use it.
Four Decision Factors
When choosing your graphics level, evaluate these four factors:
1. Boot time — heavier stacks take longer to initialize. A compositor adds seconds. fbdev/DRM add nothing.
2. Reliability — more components = more failure points. A compositor crash means black screen. Direct DRM means one less thing to break.
3. Maintenance cost — the compositor needs configuration, updates, and debugging. Direct rendering has fewer moving parts to maintain over a 10-year product lifecycle.
4. UI complexity — only use a heavy stack if you actually need its features. Multiple overlapping windows? You need a compositor. Single fullscreen app? You do not.
Start from the lightest option that meets requirements. Move up only when you hit a concrete limitation.
"No GUI" Still Needs Graphics
A common misconception: removing the desktop environment means giving up display output.
Wrong. Most embedded Linux products with displays run without a desktop but still draw to screen.
Key distinction:
| Concept | What It Means |
|---|---|
| Desktop GUI | Window manager, taskbar, file manager, system tray |
| Display output | Application renders directly to hardware |
Removing the desktop removes window management — not the ability to put pixels on a screen. The vast majority of embedded displays (kiosks, HMIs, dashboards, digital signage) have no desktop environment at all.
Common Headless Display Patterns
Real products that render to display without a desktop:
- PIL/Pillow → fbi → framebuffer — industrial panels, point-of-sale terminals. Generate an image in Python, push it to /dev/fb0.
- OpenCV → framebuffer — machine vision HMIs. Process camera frames, render results directly to display.
- DRM dumb buffer — kiosks, digital signage, transportation displays. Allocate a buffer, draw pixels, page flip.
- Custom fb driver — LED matrices, e-ink displays, segment LCDs. Write a minimal kernel driver that exposes /dev/fb0 for non-standard display hardware.
- SDL2 + DRM backend — games, simulators, status dashboards. SDL2 can render directly via DRM/KMS without any compositor.
All of these produce display output. None of them need a window manager.
Decision Flowchart
Need display output?
|
+----+----+
No Yes
| |
[Done] Multiple windows needed?
|
+----+----+
Yes No
| |
[Wayland/ Need HW acceleration (GPU)?
X11] |
+----+----+
Yes No
| |
[DRM/KMS Quick prototype / simple display?
+ GPU] |
+----+----+
Yes No
| |
[fbdev] [DRM/KMS
dumb buffer]
Follow this flowchart from top to bottom. Most embedded products land on DRM/KMS (dumb buffer) or fbdev. Only products with genuine multi-window needs should reach for a compositor.
From Software to Wire: The Physical Display Pipeline
The graphics stack (fbdev/DRM/KMS/compositor) is the software side. Below it, the SoC's display controller pushes pixels over a physical interface to the panel:
Application ──► DRM/KMS ──► Display Controller ──► Physical Interface ──► Panel
(SoC HW)
┌─────────────────────────────────────────────────┐
│ Which interface? │
│ │
│ HDMI: TMDS encoding → 3 data + 1 clk pair │
│ DSI: D-PHY packets → 2 data + 1 clk lane │
│ SPI: CPU-driven → 1 data line (no GPU!) │
└─────────────────────────────────────────────────┘
HDMI and DSI are GPU-driven — the display controller reads from the DRM buffer and clocks pixels out automatically. SPI is CPU-driven — your code (or DMA) must push every pixel through the SPI bus.
Physical Interface Bandwidth on the Pi
| Interface | Bandwidth | Max Resolution | GPU Driven? | Cable |
|---|---|---|---|---|
| HDMI 2.0 | 18 Gbit/s | 4K @ 60 FPS | Yes | Micro-HDMI |
| MIPI DSI (2-lane) | ~2 Gbit/s | 800×480 @ 60 FPS | Yes | 15-pin FPC ribbon |
| SPI | ~32 Mbit/s | 320×240 @ 25 FPS | No (CPU) | GPIO wires |
Why does DSI use so much less bandwidth than HDMI? Smaller resolution. The 7" DSI panel (800×480) needs ~553 Mbit/s. A 4K HDMI monitor (3840×2160) needs ~12 Gbit/s. The interface matches the panel.
Quick bandwidth formula:
BW = Width × Height × BitsPerPixel × FPS × overhead
800 × 480 × 24 × 60 × 1.2 ≈ 663 Mbit/s (DSI, 7" panel — blanking overhead)
1920 × 1080 × 24 × 60 × 1.2 × 1.25 ≈ 4.5 Gbit/s (HDMI, 1080p — blanking × TMDS 8b/10b coding)
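The same formula as a function, with the overhead factor made explicit (roughly 1.2 for blanking on DSI, and roughly 1.2 × 1.25 on HDMI once TMDS 8b/10b coding is included):

```python
# Display bandwidth estimate from the formula above. The overhead factor
# bundles blanking intervals and, for HDMI, TMDS 8b/10b line coding.
def bandwidth_mbps(width, height, bpp, fps, overhead):
    return width * height * bpp * fps * overhead / 1e6

dsi  = bandwidth_mbps(800, 480, 24, 60, 1.2)     # 7" DSI panel
hdmi = bandwidth_mbps(1920, 1080, 24, 60, 1.5)   # 1.2 blanking x 1.25 TMDS
spi  = bandwidth_mbps(320, 240, 16, 25, 1.0)     # near the SPI ceiling

print(f"DSI {dsi:.0f} Mbit/s, HDMI {hdmi:.0f} Mbit/s, SPI {spi:.1f} Mbit/s")
```

Running the numbers this way makes the interface table above easy to re-derive for any panel.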
Theory: Camera and Display Interfaces — D-PHY signaling, CSI-2, DSI packets, EDID, bandwidth math
UI Toolkit: Qt vs SDL2
You've chosen DRM/KMS — now pick your application-level toolkit.
The kernel display path decides how pixels reach the screen. The toolkit decides how your application produces those pixels.
| | Qt + EGLFS | SDL2 + KMS/DRM |
|---|---|---|
| What it is | Full UI framework, renders via EGL directly on KMS | Minimal render loop, you draw everything |
| Runtime footprint | ~30-80 MB | ~2-5 MB |
| GPU required? | Yes (EGL/OpenGL) | Optional |
| Best for | Dashboards, menus, touch HMIs | Gauges, data viz, custom rendering |
| UI components | Widgets, QML, animations built-in | None — bring your own |
| Cross-SoC portability | Excellent | Good (but UI is custom) |
Qt + EGLFS = invest upfront in framework, get layout/touch/animations for free. SDL2 + KMS/DRM = minimal footprint, maximum control, build UI yourself.
The Hybrid Sweet Spot
Many production HMIs combine both approaches:
┌─────────────────────────────────┐
│ Qt Quick (QML) │
│ ┌───────┐ ┌───────┐ ┌───────┐ │
│ │ Menu │ │Status │ │ Nav │ │ ← QML handles UI chrome
│ └───────┘ └───────┘ └───────┘ │
│ ┌───────────────────────────┐ │
│ │ Custom OpenGL/Vulkan │ │ ← GPU renders gauges,
│ │ render area │ │ waveforms, 3D views
│ └───────────────────────────┘ │
└─────────────────────────────────┘
- QML for menus, status bars, touch navigation — saves development time
- OpenGL/Vulkan scene node for real-time gauges and data visualization — full GPU control
- Both run in a single process on EGLFS — no compositor needed
For the labs: start with SDL2 (smallest footprint, teaches the hardware path). Move to Qt + EGLFS for the dashboard project.
Pitfall 1 — fbdev Compatibility Shim on DRM Systems
Modern kernels may expose /dev/fb0 as a compatibility layer over a DRM driver. This looks like fbdev but does not behave identically.
Your app thinks: Reality:
┌────────────┐ ┌────────────┐
│ /dev/fb0 │ │ /dev/fb0 │ (compat shim)
│ (fbdev) │ │ | │
└─────┬──────┘ │ DRM/KMS │ (actual driver)
| │ | │
Hardware │ Hardware │
└────────────┘
Page flipping, mode setting, and buffer management behave differently through the shim. Double buffering may not work. Mode changes may be ignored.
Rule: If the kernel uses a DRM driver, use the DRM API directly. Do not rely on the fbdev compatibility layer for anything beyond quick tests.
Pitfall 2 — Pixel Format and Stride Mismatch
Display hardware expects pixels in a specific format. Your renderer may produce a different one.
| Format | Bytes/pixel | Layout |
|---|---|---|
| RGB565 | 2 | 5 red, 6 green, 5 blue |
| RGB888 | 3 | 8 red, 8 green, 8 blue |
| ARGB8888 | 4 | 8 alpha, 8 red, 8 green, 8 blue |
| BGR888 | 3 | 8 blue, 8 green, 8 red (swapped) |
Stride (bytes per row) may include padding for alignment. A 800-pixel-wide RGB888 display might have stride = 2400 or stride = 2432 (padded to 64-byte boundary).
If you assume the wrong format or stride, the image appears garbled, color-shifted, or diagonally skewed. Always query the actual format and stride from the driver — never hardcode them.
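Both failure modes are easy to demonstrate in a few lines. The RGB565 packing follows directly from the table above, and the stride numbers are the slide's own 2400 vs 2432 example:

```python
# RGB565 packing, and why a wrong stride assumption skews the image.
def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit channels into 5-6-5 bits: RRRRRGGG GGGBBBBB."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

print(hex(rgb888_to_rgb565(255, 0, 0)))   # pure red   -> 0xf800
print(hex(rgb888_to_rgb565(0, 255, 0)))   # pure green -> 0x7e0
print(hex(rgb888_to_rgb565(0, 0, 255)))   # pure blue  -> 0x1f

# Stride mismatch: renderer assumes 2400 bytes/row, panel wants 2432.
# Every row lands 32 bytes short of where it belongs -> diagonal skew.
assumed, actual = 2400, 2432
drift_after_100_rows = 100 * (actual - assumed)
print(f"{drift_after_100_rows} bytes of drift after 100 rows")
```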
Pitfall 3 — Adding a Compositor When Not Needed
Every additional layer is a potential failure point:
Without compositor: With compositor:
┌──────────┐ ┌──────────┐
│ App │ │ App │
└────┬─────┘ └────┬─────┘
| |
┌────▼─────┐ ┌────▼─────┐
│ DRM/KMS │ │ Wayland │ ← can crash
└────┬─────┘ └────┬─────┘
| ┌────▼─────┐
┌────▼─────┐ │ Weston │ ← can crash
│ Display │ └────┬─────┘
└──────────┘ ┌────▼─────┐
│ DRM/KMS │
└────┬─────┘
┌────▼─────┐
│ Display │
└──────────┘
Single fullscreen app + compositor = longer boot time + more complexity + more failure modes, with zero functional benefit. The compositor manages windows that will never appear.
Start minimal. Add complexity only when you hit a concrete limitation.
Pitfall 4 — No Fallback if Display Init Fails
If the display is unplugged, the cable is damaged, or the driver probe fails at boot, what happens to your application?
Bad design: Application blocks on display init, never starts, product appears dead.
Good design: Display is treated as an optional output, not a hard dependency.
App starts
|
+---> Try to open display
| |
| +----+----+
| OK FAIL
| | |
| Render Log warning, continue without display
| to (network, logging, control still work)
| display
|
+---> Core logic runs regardless
Design the display as one of several outputs. The product should still function (logging, network, control) even if the screen is missing.
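The pattern in code, as a minimal sketch: the device path and the always-failing probe are placeholders, and the point is that the core loop never depends on the display having opened:

```python
import logging

def init_display(path="/dev/dri/card0"):
    """Stand-in for a real display probe; here it always fails."""
    raise FileNotFoundError(path)

def main():
    display = None
    try:
        display = init_display()
    except OSError as err:                 # display is optional, not fatal
        logging.warning("display unavailable (%s), continuing headless", err)
    for reading in (21.5, 21.7):           # core logic runs regardless
        if display is not None:
            pass                           # render to screen here
        logging.info("sensor reading: %.1f", reading)  # logging still works
    return display is None                 # True means running headless

main()
```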
Understanding the Full Desktop Stack
Before we dismiss the compositor approach for embedded, let's understand what it actually does — this helps you recognize when you truly need it and when you don't.
Key Terms — What Is What?
These terms are often confused. Here's exactly what each one is:
| Name | What it IS | What it DOES | What it does NOT do |
|---|---|---|---|
| Wayland | A protocol (not software) | Defines how apps talk to the compositor — buffer sharing, input events, window surfaces | Does not draw anything. Does not manage windows. It's a specification, like HTTP. |
| Weston | A compositor (reference implementation) | Implements the Wayland protocol. Combines app buffers, routes input, outputs to DRM. Minimal, used in embedded. | Not a toolkit. Not a desktop environment. No taskbar, no app launcher. |
| Mutter | A compositor (GNOME's) | Same job as Weston but part of GNOME. Adds desktop features: workspaces, overview, animations. | Not standalone — needs GNOME Shell. Too heavy for embedded. |
| Sway | A compositor (tiling) | Wayland compositor inspired by i3. Tiling window layout, keyboard-driven. | No embedded profile. Desktop-focused. |
| KWin | A compositor (KDE's) | KDE Plasma's compositor. Rich effects, desktop integration. | Very heavy. Not for embedded. |
| Xorg | A display server (X11) | The X11 server. Receives draw commands from apps, renders to screen, routes input. | Does not composite by itself — needs a separate compositor (picom, compton) for transparency. |
Wayland Is a Protocol, Not a Program
This is the most common misconception. You don't "install Wayland" — you install a compositor that speaks the Wayland protocol.
"I use Wayland" actually means:
┌─────────────────────────────────────────────────────┐
│ │
│ App (GTK/Qt) ──── Wayland protocol ──── Weston │
│ │
│ The protocol ← this is "Wayland" │
│ The compositor ← this is "Weston" │
│ │
└─────────────────────────────────────────────────────┘
Like saying "I use HTTP" — you mean you use a browser (Chrome)
that speaks HTTP to a server (Nginx). HTTP is the protocol.
Wayland is the protocol. Weston/Mutter/Sway is the "browser."
Why this matters for embedded: If someone says "use Wayland on the Pi," the real question is: which compositor? Weston is the lightweight choice. Mutter (GNOME) would be far too heavy.
Weston — The Embedded Compositor
Weston is the reference implementation of the Wayland protocol, maintained by the same team. It's designed to be minimal:
What Weston does:
- Accepts connections from Wayland client apps
- Receives their rendered buffers (shared GPU memory)
- Composites all buffers into the final screen image
- Routes input events (touch, keyboard, mouse) to the focused app
- Outputs the composited image via DRM/KMS
- Handles display hotplug (HDMI connected/disconnected)
What Weston does NOT do:
- No taskbar, no app launcher, no system tray
- No window decorations (no title bars, close buttons)
- No file manager, no settings panel
- No login screen (use a separate program for that)
Weston provides the plumbing — apps appear on screen and receive input. Everything else (UI, layout, interaction) is the application's responsibility.
Embedded use: Weston's "kiosk shell" plugin runs a single app fullscreen with no chrome — essentially a Wayland-speaking DRM wrapper. Some products use this instead of direct DRM access when they want Wayland protocol compatibility (e.g., for Flutter or Chromium).
Xorg — The Legacy Display Server
Xorg is the implementation of the X11 protocol — a single large process (~500K lines of code).
What Xorg does:
- Owns the display — apps cannot draw directly, they ask Xorg to draw for them
- Manages a shared 2D canvas (the "root window")
- Routes keyboard/mouse events to the focused window
- Provides network transparency — apps can run on one machine, display on another (ssh -X)
- Loads input drivers (keyboard, mouse, touchpad) and display drivers
What Xorg does NOT do:
- Does not decide where windows go — that's the window manager (a separate process)
- Does not composite (blend) overlapping windows by default — needs a compositor (picom, compton)
- Does not provide a desktop environment — that's GNOME/KDE/XFCE running on top
Why X11 is declining: The "apps can't draw directly" design adds latency. The "any app can read any window" design is a security hole. The shared-canvas model doesn't work well with GPUs. Wayland fixes all three by letting apps render to their own buffers.
Compositor vs Window Manager vs Desktop Environment
These three concepts are often conflated. They are different layers:
┌───────────────────────────────────────────────┐
│ Desktop Environment (GNOME, KDE, XFCE) │ ← the "experience"
│ Taskbar, app launcher, file manager, │ (apps + config + theme)
│ settings, notifications, lock screen │
│ │
│ ┌─────────────────────────────────────────┐ │
│ │ Window Manager (Mutter, KWin, i3) │ │ ← window placement
│ │ Position, size, stacking, focus, │ │ rules and policy
│ │ keyboard shortcuts, tiling/floating │ │
│ │ │ │
│ │ ┌───────────────────────────────────┐ │ │
│ │ │ Compositor (built-in or picom) │ │ │ ← pixel blending
│ │ │ Transparency, shadows, blur, │ │ │ the "how" of
│ │ │ animations, buffer management │ │ │ putting it on screen
│ │ └───────────────────────────────────┘ │ │
│ └─────────────────────────────────────────┘ │
└───────────────────────────────────────────────┘
On Wayland: the compositor and window manager are the same process (Mutter, Sway, Weston). You can't mix and match.
On X11: they're separate. You can run i3 (tiling WM) + picom (compositor) on Xorg. Or Openbox (floating WM) with no compositor at all.
On embedded: you skip all three. Your app talks to DRM directly.
Putting It All Together — Who Uses What?
| Product / Use case | Graphics approach | Why |
|---|---|---|
| Your laptop (Ubuntu) | Mutter (Wayland compositor) + GNOME | Multiple apps, desktop experience |
| Raspberry Pi Desktop | Wayfire (Wayland compositor) + RPi Desktop | Full desktop for education |
| Automotive HMI | Weston kiosk + Qt EGLFS | Single app, Wayland protocol for IVI |
| Industrial panel | Qt EGLFS on DRM | Single app, no compositor overhead |
| Our course labs | SDL2 on DRM/KMS | Minimal, teaches hardware path |
| Our Qt launcher | Qt EGLFS on DRM | Rich UI without compositor |
| Digital signage | DRM dumb buffer | Static content, minimal CPU |
| ATM / POS terminal | Weston kiosk or DRM direct | Security + single app |
Notice: even commercial products that "use Wayland" often use Weston's kiosk shell — which is essentially a thin layer over DRM that adds Wayland protocol compatibility for the app framework.
X11 Architecture (Legacy Desktop)
X11 (1987) uses a client-server model where the display server owns the screen:
┌────────┐ ┌────────┐ ┌────────┐
│ App 1 │ │ App 2 │ │ App 3 │ ← X11 clients
└───┬────┘ └───┬────┘ └───┬────┘
│ │ │ X11 protocol
└───────────┴───────────┘ (network-transparent)
│
┌──────▼──────┐
│ X Server │ ← owns the screen
│ (Xorg) │
│ ┌────────┐ │
│ │ Window │ │ ← decides where windows go
│ │ Manager│ │
│ └────────┘ │
│ ┌────────┐ │
│ │ Compos-│ │ ← blends overlapping windows
│ │ itor │ │
│ └────────┘ │
└──────┬──────┘
│
┌──────▼──────┐
│ DRM/KMS │ ← hardware
└─────────────┘
Key idea: Apps don't touch the display — they send draw commands to the X Server, which renders on their behalf. The Window Manager is a separate process that tells the X Server where to position each window.
Wayland Architecture (Modern Desktop)
Wayland (2012) merges the server, window manager, and compositor into one process:
┌────────┐ ┌────────┐ ┌────────┐
│ App 1 │ │ App 2 │ │ App 3 │ ← Wayland clients
└───┬────┘ └───┬────┘ └───┬────┘
│ │ │
│ Each app renders │ Apps render to their OWN
│ to a buffer (GPU) │ buffer — not shared
│ │ │
└───────────┴───────────┘ Wayland protocol
│
┌──────▼──────┐
│ Compositor │ ← ONE process does everything:
│ (Weston, │ window placement
│ Mutter, │ input routing
│ Sway) │ GPU compositing
└──────┬──────┘ output to display
│
┌──────▼──────┐
│ DRM/KMS │
└─────────────┘
Key difference from X11: Apps render to their own buffers (not through the server). The compositor only combines the finished buffers into the final screen image. This is simpler, more secure (apps can't snoop on each other's pixels), and lower latency.
What Each Layer Does
| Layer | Role | Desktop example | Can you skip it? |
|---|---|---|---|
| Window Manager | Decides window position, size, decorations (title bar, close button), stacking order | GNOME Shell, KWin, i3, Sway | Yes — if you have one fullscreen app |
| Compositor | Blends multiple app buffers into one image (transparency, shadows, animations), outputs to display | Mutter, Weston, Picom | Yes — if you have one fullscreen app |
| Display Server | Routes input events (keyboard, mouse) to the correct app, manages shared display access | Xorg (X11) or built into compositor (Wayland) | Yes — handle input yourself (evdev/SDL2) |
| Toolkit | Draws widgets (buttons, text, lists), handles layout | GTK, Qt, Flutter | Optional — you can draw pixels directly |
On embedded: you typically skip the first three layers entirely. Your single app opens DRM directly, renders with SDL2 or Qt EGLFS, and reads input from /dev/input/ or SDL2's event system.
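Handling input yourself is less work than it sounds: each `read()` from an event device under `/dev/input/` returns one fixed-size `input_event` record. A minimal parsing sketch in Python (the `'llHHi'` layout assumes 64-bit Linux; the device path is an example):

```python
import struct

# struct input_event on 64-bit Linux: timeval (two longs), type, code, value
EVENT_FORMAT = 'llHHi'
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)   # 24 bytes on 64-bit

def parse_event(data):
    tv_sec, tv_usec, etype, code, value = struct.unpack(EVENT_FORMAT, data)
    return {'time': tv_sec + tv_usec / 1e6,
            'type': etype, 'code': code, 'value': value}

# Reading from hardware (needs permissions, e.g. membership in the 'input' group):
# with open('/dev/input/event0', 'rb') as dev:
#     while True:
#         ev = parse_event(dev.read(EVENT_SIZE))
#         if ev['type'] == 1:          # EV_KEY: key or button press/release
#             print(ev['code'], ev['value'])
```

SDL2 wraps exactly this interface for you, but knowing the raw record layout helps when debugging a touchscreen with `evtest` or hexdump.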
X11 vs Wayland — Quick Comparison
| | X11 | Wayland |
|---|---|---|
| Age | 1987 (almost 40 years) | 2012 (~13 years) |
| Architecture | Client → Server renders | Client renders → Compositor combines |
| Network transparency | Built-in (forward over SSH) | Not built-in (use waypipe, or PipeWire/RDP for remote desktop) |
| Security | Any app can read any window (screen capture is trivial) | Apps are isolated by default |
| Tearing | Common (requires compositor hacks) | Solved by design |
| Code complexity | ~500K lines (Xorg) | ~50K lines (Weston) |
| Embedded use | Mostly legacy | Weston has an embedded profile |
For this course: we skip both X11 and Wayland. Our apps use DRM/KMS directly (SDL2, Qt EGLFS). But knowing the layers helps when you debug a desktop system or explain to a manager why the kiosk doesn't need a desktop environment.
Qt EGLFS — The Embedded Shortcut
Qt's EGLFS platform plugin lets you run a Qt application fullscreen on DRM/KMS without any compositor:
Desktop Qt: Embedded Qt (EGLFS):
┌─────────────┐ ┌─────────────┐
│ Qt App │ │ Qt App │
└──────┬──────┘ └──────┬──────┘
│ │
┌──────▼──────┐ ┌──────▼──────┐
│ Wayland / │ │ EGL + DRM │ ← direct GPU
│ X11 │ │ (no compositor)
└──────┬──────┘ └──────┬──────┘
┌──────▼──────┐ ┌──────▼──────┐
│ Compositor │ │ Display │
└──────┬──────┘ └─────────────┘
┌──────▼──────┐
│ DRM/KMS │
└──────┬──────┘
┌──────▼──────┐
│ Display │
└─────────────┘
3 extra layers 0 extra layers
EGLFS = "EGL Full Screen." It gives you all of Qt's UI power (QML, touch, animations) with the performance of direct DRM access. This is what the Qt App Launcher uses.
FAQ — Common Student Questions
Q: Why can't I run two SDL2 apps at the same time? Because both try to open DRM and become "DRM master" — only one process can control the display at a time. This is by design. The Qt launcher solves this by releasing DRM master before spawning a child app, and reclaiming it after. A compositor would also solve it, but at the cost we discussed.
Q: Why does my app work in SSH but show a black screen on the Pi?
Graphics apps need access to /dev/dri/card0 (DRM) and possibly /dev/fb0. Over SSH, you're on a different TTY. Make sure the app runs on the correct VT, or use SDL_VIDEODRIVER=kmsdrm to force DRM mode. Also check permissions: the user needs to be in the video group.
Q: Can I use OpenGL without a compositor?
Yes. EGL can bind directly to a DRM device (EGL_PLATFORM_GBM). This is what SDL2's KMSDRM backend and Qt's EGLFS do. You get full GPU acceleration without any windowing system.
Q: Why is the Pi's display upside down / rotated?
The display panel's physical scanning direction may not match the expected orientation. Fix with display_rotate=2 in config.txt (fbdev) or video=DSI-1:panel_orientation=upside_down (DRM). KMS also supports rotation via the rotation plane property.
Q: Why does my framebuffer app work but the colors are wrong?
Pixel format mismatch. The display might expect BGR888 but you're writing RGB888 (red and blue swapped). Always query the format with ioctl(FBIOGET_VSCREENINFO) or check the DRM plane's format list. Common formats: ARGB8888, XRGB8888, RGB565.
Q: What's the difference between a window manager and a compositor? A window manager decides where windows go (position, size, stacking). A compositor blends all windows into the final image (handles transparency, shadows, animations). On Wayland, these are the same process. On X11, they can be separate (e.g., i3 window manager + picom compositor).
Q: Do I need a GPU for embedded graphics? Not necessarily. DRM "dumb buffers" are CPU-rendered. SDL2 can software-render via DRM. For simple UIs (status displays, dashboards), CPU rendering is fast enough. You need a GPU when: rendering complex 3D (OpenGL/Vulkan), running Qt QML with animations, or compositing multiple layers at high frame rates.
Block 1 Summary
Framebuffer (fbdev): simple, direct, legacy. Best for quick prototypes and simple displays. Deprecated — new drivers target DRM.
DRM/KMS: modern, hardware-aware, tear-free. The right default for embedded products. Works without GPU using dumb buffers.
Full stack (Wayland/X11): powerful, heavy, complex. Only justified when you need multiple windows or rich desktop-style UI. Wayland is simpler and more secure than X11. Both are overkill for single-app embedded.
Qt EGLFS: the embedded sweet spot when you need rich UI — full Qt power with zero compositor overhead.
Most embedded products: single-app fullscreen pipeline using DRM/KMS or framebuffer. No compositor, no window manager, no desktop.
Decision principle: start from the simplest option that meets your requirements. Move up only when you hit a concrete limitation that the simpler approach cannot solve.
Block 2
Experiment: "Write Pixels Faster Than Refresh"
The Question
What happens when your application writes pixels to the framebuffer faster than the display can show them?
The framebuffer is shared memory: your application writes to it, and the display controller reads from it — simultaneously, without coordination.
If the application writes faster than the display scans, the display will read partially updated data.
Let's understand why, and then see it happen.
Display Scan-Out Explained
The display controller reads the framebuffer line by line, top to bottom, at a fixed rate:
Framebuffer memory Display panel
┌──────────────────┐ ┌──────────────────┐
│ Row 0 │ -------> │ Row 0 │ <- scan position
│ Row 1 │ │ Row 1 │
│ Row 2 │ │ │
│ Row 3 │ │ │
│ ... │ │ │
│ Row 479 │ │ │
└──────────────────┘ └──────────────────┘
At 60 Hz: full scan every 16.7 ms
The controller reads ~29 rows per millisecond (for 480 rows)
After row 479, it returns to row 0 (VBlank interval)
The scan-out is a continuous, periodic process driven by the display hardware clock. It does not wait for your application. It does not check if the buffer is "ready."
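The numbers above follow from simple arithmetic, which is worth sanity-checking once:

```python
# Scan-out budget for the panel used in this lesson: 480 rows at 60 Hz
refresh_hz = 60
rows = 480

frame_time_ms = 1000 / refresh_hz      # 16.67 ms per full scan
rows_per_ms = rows / frame_time_ms     # ~28.8 rows scanned every millisecond

print(f"{frame_time_ms:.2f} ms per frame, {rows_per_ms:.1f} rows/ms")
```

So if your application takes longer than ~16.7 ms to repaint the buffer, the scan-out is guaranteed to catch it mid-write.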
The Tearing Mechanism
When the app overwrites the buffer while the display is reading it, you see parts of two different frames:
Frame N Frame N+1
┌──────────────┐ ┌──────────────┐
│ AAAAAAAAAAAA │ │ BBBBBBBBBBBB │
│ AAAAAAAAAAAA │ │ BBBBBBBBBBBB │
│ AAAAAAAAAAAA │ │ BBBBBBBBBBBB │
│ AAAAAAAAAAAA │ │ BBBBBBBBBBBB │
└──────────────┘ └──────────────┘
What you SEE (scan-out catches the switch mid-frame):
┌──────────────┐
│ AAAAAAAAAAAA │ <- scanned before app started writing
│ AAAAAAAAAAAA │
│ BBBBBBBBBBBB │ <- scanned after app wrote these rows (TEAR LINE)
│ BBBBBBBBBBBB │
└──────────────┘
The horizontal boundary where two frames meet is the tear line. Its position moves because the write speed and scan-out speed are not synchronized.
VSync and Page Flipping
Solution: do not write to the buffer the display is currently reading.
Double buffering with page flipping:
Back buffer (app draws here) Front buffer (display reads here)
┌──────────────────┐ ┌───────────────────┐
│ Frame N+1 │ │ Frame N │
│ (being drawn) │ │ (being displayed)│
└──────────────────┘ └───────────────────┘
| |
| At VBlank: SWAP |
+----------->--------------------+
pointers swap, display now reads Frame N+1
VBlank = the brief interval between the last row and the first row
of the next scan. Safe moment to switch buffers.
The app writes to the back buffer. When drawing is complete, it requests a page flip at VBlank. The display switches to the new buffer only between frames. Every displayed frame is complete — no tearing.
DRM/KMS supports this natively. fbdev does not.
Experiment Setup
Write solid colors to the framebuffer as fast as possible and observe the display:
# framebuffer_flood.py - write solid colors as fast as possible
import mmap, os, struct

fb = os.open('/dev/fb0', os.O_RDWR)
mm = mmap.mmap(fb, 800 * 480 * 4)                  # adjust to your resolution and bpp
colors = [0x00FF0000, 0x0000FF00, 0x000000FF]      # red, green, blue (XRGB8888)

while True:
    for color in colors:
        row = struct.pack('<I', color) * 800       # one full row of pixels
        mm.seek(0)
        for y in range(480):                       # repaint the screen top to bottom
            mm.write(row)
This script writes red, then green, then blue — as fast as the CPU can go, with no synchronization to the display refresh.
Run it on a device with a connected display and look at the screen.
What to Observe
When you run framebuffer_flood.py, you will see:
- Horizontal tear lines where two colors meet mid-screen
- The tear position moves — sometimes near the top, sometimes near the bottom
- On fast CPUs, you may see multiple tear lines (three colors visible at once)
What you expect: What you see:
┌──────────────┐ ┌──────────────┐
│ RRRRRRRRRRRR │ │ RRRRRRRRRRRR │
│ RRRRRRRRRRRR │ │ RRRRRRRRRRRR │
│ RRRRRRRRRRRR │ │ GGGGGGGGGGGG │ <- tear
│ RRRRRRRRRRRR │ │ GGGGGGGGGGGG │
└──────────────┘ └──────────────┘
This is tearing — the display reads the buffer while the application is writing to it. The tear line appears wherever the scan-out position and the write position cross.
This is exactly why DRM/KMS with page flipping exists.
The Fix — DRM Page Flip
The proper solution uses double buffering with VSync through DRM/KMS:
Step 1: Allocate two dumb buffers (front and back)
Step 2: Draw to the back buffer (the display is not reading it)
Step 3: Request atomic page flip with DRM_MODE_PAGE_FLIP_EVENT
Step 4: Wait for VBlank event (kernel signals when flip completes)
Step 5: Swap front/back buffer pointers
Time -->
| Draw to back | Flip | Draw to back | Flip |
| buffer | at | buffer | at |
| (invisible) | VBlank | (invisible) | VBlank |
^ ^
| |
Display switches Display switches
to new buffer to new buffer
Result: every frame is fully drawn before the display reads it. No tearing. No timing hacks.
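The buffer discipline in steps 1-5 can be sketched without real DRM calls. In this simulation the flip is a plain pointer swap standing in for the `DRM_MODE_PAGE_FLIP_EVENT` ioctl, and the "display" (front buffer) only ever holds complete frames:

```python
# Double buffering simulated with two byte arrays.
front = bytearray(b'A' * 8)    # scanned out by the display
back  = bytearray(b'A' * 8)    # the app draws here, invisible until the flip

def draw(buf, ch):
    for i in range(len(buf)):  # write "row by row", as the real app would
        buf[i] = ord(ch)

def page_flip():
    global front, back         # in real DRM this happens at VBlank, in the kernel
    front, back = back, front

for ch in 'BCD':
    draw(back, ch)             # complete the frame off-screen first
    page_flip()                # then make it visible in one atomic step
    print(front.decode())     # always a full frame, never a mix of two
```

No matter when the simulated display "reads" `front`, it sees only all-A, all-B, all-C, or all-D, never a torn mixture; that invariant is exactly what the real page flip guarantees.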
Fallback Pitfall — sleep() Is Not VSync
Some teams try to "fix" tearing by throttling write speed:
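A typical attempt looks like this sketch (`draw_frame` is a hypothetical stand-in for the actual framebuffer writes; the timing measurement shows why the approach cannot hold):

```python
import time

def draw_frame():
    pass                        # hypothetical stand-in for the framebuffer writes

# ANTI-PATTERN: assume sleep() wakes up exactly every 1/60 s. It doesn't.
start = time.monotonic()
for _ in range(60):             # "one second" of frames at 60 Hz
    draw_frame()
    time.sleep(1 / 60)
elapsed = time.monotonic() - start

# elapsed always exceeds 1.0 s: sleep() never undershoots, so the app falls
# steadily behind the display clock and the tear line drifts across the screen.
print(f"60 'frames' took {elapsed:.3f} s (ideal: 1.000 s)")
```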
This is fragile and will fail because:
- CPU speed varies (thermal throttling, load changes)
- `sleep()` precision is ~1-10 ms on Linux (not exact)
- System load affects scheduling — your process may not wake on time
- Display refresh rate may not be exactly 60 Hz
- The sleep duration and scan-out timing drift relative to each other
The proper fix is VSync synchronization through DRM, where the kernel signals the exact VBlank moment. Sleep hacks create the illusion of working on a quiet system and break under real-world conditions.
Quick Checks
Before shipping a product with display output, answer these questions:
1. Is your display path deterministic at boot? Do you know exactly when the display initializes and what appears first? Or does it depend on service startup order?
2. Do you control mode setting explicitly? Are resolution, refresh rate, and pixel format set by your code? Or do you hope the defaults are correct?
3. Can the app recover if the display disconnects and reconnects? Hot-plug events happen — cable gets bumped, connector oxidizes, display power-cycles. Does your app handle this, or does it crash?
4. Does startup still meet your boot budget? After adding the display stack, is boot time still within the product requirement? Measure it — do not assume.
Mini Exercise
Given:
- Single fullscreen UI application
- Boot time requirement: < 10 seconds
- Remote update capability required
- Product lifecycle: 5+ years
Task: Select your graphics stack and justify your choice in 5 lines. Consider:
- Boot impact of your chosen approach
- Memory usage on a 256 MB system
- Maintenance cost over the product lifecycle
- What happens when the display cable is unplugged
Write your answer before looking at the next slide. There is no single correct answer, but there are answers that ignore constraints.
Key Takeaways
- Framebuffer is simple but legacy — good for prototyping and simple displays, but deprecated and lacking VSync support.
- DRM/KMS is the modern low-level choice — hardware-aware, tear-free, and the current kernel standard. Use dumb buffers when you do not need GPU acceleration.
- Full stacks are powerful but heavy — only justified when you genuinely need multiple windows or rich desktop-style UI.
- Embedded systems use single-app fullscreen pipelines — no compositor, no window manager, no desktop. This is the norm, not the exception.
- Tearing is solved by VSync + page flipping, not by sleep hacks — DRM provides proper synchronization; `time.sleep()` provides false confidence.
Hands-On Next
Put this theory into practice with the following tutorials:
- Framebuffer Basics — draw pixels directly to `/dev/fb0`, understand pixel formats and stride, render shapes and text from Python.
- OLED Framebuffer Driver — write a kernel framebuffer driver for the SSD1306 OLED over I2C. Implements `fb_info`, `fb_ops`, and deferred I/O.
- Pong on Framebuffer — build a user-space game that opens `/dev/fbN`, queries resolution via `ioctl`, and draws with `mmap()`. Works on both OLED and BUSE displays.
- DRM/KMS Test — use the modern graphics API: enumerate connectors, set display modes, allocate dumb buffers, perform tear-free page flips.
- Display Applications — create interactive applications with OpenCV and evdev for touch/button input, rendering directly to the display without a compositor.