Lesson 5: Graphics Stack
Óbuda University — Linux in Embedded Systems
Starting Point: What You Already Know
Every time you use your PC — Ubuntu, Windows, macOS — this happens:
Browser Editor Terminal File Manager
│ │ │ │
└───────────┴───────────┴──────────────┘
│
┌──────▼──────┐
│ Desktop │ Ubuntu GNOME / Windows DWM / macOS
│ Environment │
│ - arranges windows on screen
│ - routes keyboard and mouse to the right app
│ - draws shadows, taskbar, animations
└──────┬──────┘
┌──────▼──────┐
│ Display │ kernel graphics driver
│ Driver │
└──────┬──────┘
┌──────▼──────┐
│ Monitor │ HDMI / laptop panel
└─────────────┘
This is the full desktop stack — multiple windows, taskbar, animations, drag-and-drop. You use it every day.
Now imagine: your product has one fullscreen app, no file manager, no desktop, no overlapping windows. Do you still need all those layers?
The Embedded Question
Desktop (what you know): Embedded (what you're building):
Firefox VS Code Terminal Your single app
│ │ │ │
┌──▼────────▼────────▼───┐ ┌──────▼──────┐
│ Compositor │ │ DRM/KMS │ ← direct, no compositor
└──────────┬─────────────┘ └──────┬──────┘
┌──────────▼─────────────┐ ┌──────▼──────┐
│ DRM/KMS │ │ Display │
└──────────┬─────────────┘ └─────────────┘
┌──────────▼─────────────┐
│ Display │ Removed: compositor, window manager,
└────────────────────────┘ desktop environment, login screen
On embedded, you strip away layers until only the essential path remains. The question is: how far can you strip?
The goal of this lecture: understand what each layer does, so you can decide which ones to keep and which to remove.
Today's Map
- Block 1 (45 min): The display hardware pipeline, three graphics levels (from simplest to desktop), fbdev vs DRM/KMS architecture, GPU stack, display interfaces.
- Block 2 (45 min): Tearing experiment: display scan-out, tearing mechanism, VSync and page flipping, write-and-fix exercise.
What the Display Hardware Actually Does
Before comparing the three levels, understand the hardware that all of them sit on top of.
Every display system has the same pipeline — from pixels in memory to light on the screen:

The display controller reads from buffers in memory and scans out pixels row by row, synchronized to the pixel clock. This happens continuously — 60 times per second at 60 Hz.
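The 60 Hz cadence is easy to sanity-check. A small sketch, assuming the standard CEA-861 1080p60 timing (2200 × 1125 total pixels including blanking):

```python
# Scan-out arithmetic for a 1080p60 mode (2200 x 1125 total incl. blanking).
# The display controller clocks out every pixel, blanking included,
# 60 times per second.
h_active, v_active = 1920, 1080
h_total, v_total = 2200, 1125          # active + blanking intervals
fps = 60

pixel_clock_hz = h_total * v_total * fps       # 148.5 MHz for this mode
frame_time_ms = 1000 / fps                     # ~16.67 ms per frame
line_time_us = 1_000_000 / (v_total * fps)     # time to scan one line

print(f"pixel clock: {pixel_clock_hz / 1e6:.1f} MHz")
print(f"frame budget: {frame_time_ms:.2f} ms")
print(f"line time: {line_time_us:.2f} us")
```

The controller never pauses: every ~16.7 ms it walks the whole buffer again, whether or not anything changed.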
The key question: how does your application tell the display controller which buffer to read?
The Display Hardware Pipeline — In Detail
| Stage | Hardware | What it does |
|---|---|---|
| Framebuffer | Memory buffer(s) | Stores pixel data — one or more buffers in RAM |
| Plane | Pixel mixer | Rotation, scaling, format conversion, layer blending |
| CRTC | Timing generator | Generates pixel clock, HSync, VSync — drives scan-out |
| Encoder | Interface adapter | Physical adaptation — converts to the wire protocol (TMDS, DSI, LVDS) |
| Bridge | Interface transcoder | Converts between display interfaces (e.g., DSI → DPI). Optional — not all paths have one. |
| Connector | Physical port | The socket: HDMI, DSI ribbon, SPI pins |
| Panel / Monitor | Display surface | Emits or reflects light. A panel is just the LCD; a monitor integrates a panel + housing + EDID. |
Pi 4 Examples
HDMI path: Framebuffer → Plane → CRTC → HDMI Encoder ──────────────► HDMI Monitor
(no bridge — direct TMDS)
DSI path: Framebuffer → Plane → CRTC → DSI Encoder → TC358762 bridge → 7" LCD Panel
(DSI → DPI transcoding)
The TC358762 is a bridge chip that converts DSI packets to parallel DPI signals — the LCD panel cannot speak DSI directly. This is common in embedded: the SoC outputs DSI, but the panel expects DPI, LVDS, or eDP, so a bridge chip translates.
This hardware chain exists whether you use fbdev, DRM, or a compositor. The difference is how much the software models it.
Inside the CRTC: Where Planes Become Pixels
The CRTC is the most complex stage. Here is what happens inside:
Plane 0 (primary) ──► DMA read ──┐
Plane 1 (overlay) ──► DMA read ──┤──► Compositor ──► Sync generator ──► Encoder
Plane 2 (cursor) ──► DMA read ──┘ (blend) (HSync, VSync,
pixel clock)
| Internal stage | What it does |
|---|---|
| Pixel fetch (DMA) | Reads pixel data from each plane's buffer in memory. The display controller has its own DMA engine — no CPU involvement. |
| Compositor | Blends all active planes together — alpha blending, z-ordering, scaling, color conversion. This is hardware compositing, not software. |
| Sync generator | Produces the timing signals: pixel clock, HSync (end of line), VSync (end of frame). These drive the encoder and ultimately the display. |
Why planes matter: A video player puts the video stream on one plane and subtitles on an overlay plane. The CRTC composites them in hardware — zero CPU work, zero memory copies. Without planes, the CPU would have to alpha-blend every frame.
Without planes (CPU compositing): With planes (HW compositing):
CPU reads video buffer Plane 0 → video buffer
CPU reads subtitle buffer Plane 1 → subtitle buffer
CPU blends pixel-by-pixel CRTC blends in hardware
CPU writes to display buffer → zero CPU work per frame
→ CPU busy every frame
Hardware planes are the reason DRM/KMS can display video + UI overlay at 60 FPS on a low-power SoC without breaking a sweat.
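To see what the CPU inherits when no overlay plane exists, here is a minimal sketch of standard per-channel alpha blending (the 800 × 480 figure matches the 7" DSI panel used later in this lecture):

```python
# Per-pixel alpha blending: the work the CPU must do for every pixel,
# every frame, when no hardware overlay plane is available.
def blend(dst, src, alpha):
    """Blend one 8-bit channel: out = src*a + dst*(1 - a), a in [0, 255]."""
    return (src * alpha + dst * (255 - alpha)) // 255

# One 800x480 layer at 60 FPS means ~23 million blends per channel per second
ops_per_sec = 800 * 480 * 60
print(f"{ops_per_sec:,} pixel blends/s without hardware planes")

# Example: a half-transparent white subtitle pixel over a dark video pixel
print(blend(dst=32, src=255, alpha=128))
```

The CRTC's hardware compositor performs exactly this operation, but in dedicated silicon during scan-out, so the CPU does none of it.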
Three Ways to Talk to This Hardware
Now you know what the display hardware does. Linux gives you three software paths to control it — from simplest to most capable:
| Level | Approach | What it hides | What it gives you |
|---|---|---|---|
| A | Raw Framebuffer (fbdev) | Everything — one flat buffer | open(), mmap(), write pixels |
| B | DRM/KMS | Nothing — full pipeline exposed | Planes, CRTC, page flip, VSync |
| C | Full Compositor (Wayland/X11) | DRM details — apps just render | Multiple windows, input routing |
You started at Level C (your laptop desktop). Embedded systems work at Level A or B. Let's look at each.
Level A — Framebuffer (fbdev)
fbdev gives you the simplest possible view of this hardware: one flat buffer.
Your Application
│
│ open("/dev/fb0")
│ mmap() → pointer to pixel memory
│ write pixels directly
│
▼
┌──────────────────────────────────────────────────┐
│ fbdev kernel driver │
│ │
│ ┌──────────┐ │
│ │ Buffer │ ← your pixels go here │
│ └────┬─────┘ │
│ │ │
│ ▼ (everything below is hidden from you) │
│ Plane → CRTC → Encoder → Connector → Panel │
└──────────────────────────────────────────────────┘
fbdev hides the display pipeline. You get one buffer, one resolution (set at boot or by fbset), no timing control, no page flipping. The driver handles everything internally.
The simplicity is the point. open(), mmap(), write pixels. Done.
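A minimal sketch of that flow. In real code the buffer comes from mmap() on /dev/fb0; here a bytearray stands in so the snippet runs anywhere, and the 800 × 480, 32 bpp geometry is an assumption that real code must query from the driver:

```python
# fbdev in a nutshell: the mapped region IS the screen. Writing bytes at
# offset y*stride + x*bytes_per_pixel changes that pixel on the next scan.
# Assumed geometry: 800x480, 32 bpp (XRGB8888), no row padding.
WIDTH, HEIGHT, BPP = 800, 480, 4
STRIDE = WIDTH * BPP

# Real code: fb = mmap.mmap(fd, STRIDE * HEIGHT) on an opened /dev/fb0.
fb = bytearray(STRIDE * HEIGHT)

def put_pixel(x, y, xrgb):
    off = y * STRIDE + x * BPP
    fb[off:off + BPP] = xrgb.to_bytes(4, "little")   # XRGB8888, little-endian

put_pixel(10, 20, 0x00FF0000)   # one red pixel
```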
fbdev — What You Can and Cannot Do
| Capability | fbdev | Notes |
|---|---|---|
| Write pixels | Yes | mmap() + direct memory writes |
| Read resolution | Yes | ioctl(FBIOGET_VSCREENINFO) |
| Change resolution | Fragile | fbset — not all drivers support it |
| VSync / page flip | No | You write while the display reads → tearing |
| Multiple planes | No | One buffer, one layer |
| Multi-display | No | Each /dev/fbN is independent, no coordination |
| GPU acceleration | No | CPU draws every pixel |
fbdev is excellent for quick experiments and simple displays (OLED, e-ink, small LCDs). For production on modern hardware, prefer DRM/KMS.
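Querying the real geometry looks roughly like this. The ioctl number and leading field order come from linux/fb.h; the snippet unpacks a synthetic reply so it runs without hardware, with the real device access shown only in comments:

```python
import struct

# struct fb_var_screeninfo (linux/fb.h) begins with eight __u32 fields:
# xres, yres, xres_virtual, yres_virtual, xoffset, yoffset,
# bits_per_pixel, grayscale. FBIOGET_VSCREENINFO fills in the whole struct.
FBIOGET_VSCREENINFO = 0x4600

def parse_vscreeninfo(raw):
    fields = struct.unpack_from("<8I", raw)
    return {"xres": fields[0], "yres": fields[1], "bpp": fields[6]}

# On real hardware (needs permissions on /dev/fb0):
#   import os, fcntl
#   fd = os.open("/dev/fb0", os.O_RDWR)
#   buf = bytearray(160)                 # the full struct is 160 bytes
#   fcntl.ioctl(fd, FBIOGET_VSCREENINFO, buf)
#   info = parse_vscreeninfo(buf)
# Synthetic reply standing in for an 800x480 @ 32 bpp panel:
raw = struct.pack("<8I", 800, 480, 800, 480, 0, 0, 32, 0)
info = parse_vscreeninfo(raw)
print(info)
```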
Level B — DRM/KMS (Kernel Mode Setting)
DRM/KMS exposes the full hardware pipeline to userspace:
Your Application
│
│ open("/dev/dri/card0")
│ enumerate connectors → find display
│ set mode → resolution + timing
│ allocate dumb buffer → draw pixels
│ page flip at VBlank → tear-free
│
▼
┌────────────────────────────────────────────────────────────┐
│ DRM/KMS kernel subsystem │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Buffer A │ │ Buffer B │ │ Buffer C │ (GEM objects) │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Primary │ │ Overlay │ │ Cursor │ (drm_plane) │
│ │ Plane │ │ Plane │ │ Plane │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ └──────────────┴─────────────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │ CRTC │ timing + pixel streaming │
│ └─────┬─────┘ (drm_crtc) │
│ ┌─────▼─────┐ │
│ │ Encoder │ protocol adaptation │
│ └─────┬─────┘ (drm_encoder) │
│ ┌─────▼─────┐ │
│ │ Bridge │ interface transcoding (opt.) │
│ └─────┬─────┘ (drm_bridge) │
│ ┌─────▼─────┐ │
│ │ Connector │ physical port │
│ └─────┬─────┘ (drm_connector) │
└──────────────────────┼─────────────────────────────────────┘
▼
┌──────────────┐
│ Panel/Monitor│ (drm_panel)
└──────────────┘
You see every stage and its kernel struct. You choose which buffer maps to which plane, when the page flip happens, which connector to use. The hardware pipeline is no longer hidden — DRM models it directly.
fbdev vs DRM/KMS — Architecture Comparison
fbdev: DRM/KMS:
┌─────────────┐ ┌─────────────┐
│ Application │ │ Application │
└──────┬──────┘ └──────┬──────┘
│ │
open("/dev/fb0") open("/dev/dri/card0")
mmap() libdrm / ioctl
write pixels enumerate, configure, flip
│ │
┌──────▼──────┐ ┌──────▼──────┐
│ fb driver │ │ DRM core │
│ (one buffer │ │ (buffers, │
│ one mode │ │ planes, │
│ hidden HW) │ │ CRTCs, │
└──────┬──────┘ │ encoders, │
│ │ connectors)│
▼ └──────┬──────┘
Display HW │
▼
Display HW
| Aspect | fbdev | DRM/KMS |
|---|---|---|
| Hardware model | Flat buffer — pipeline hidden | Full pipeline — planes, CRTC, encoder, bridge, connector |
| Buffer management | One buffer, driver-managed | Multiple buffers, app-managed (GEM) |
| Mode setting | fbset (fragile) | drmModeSetCrtc() (reliable) |
| VSync / page flip | Not supported | drmModePageFlip() at VBlank |
| Multiple displays | Separate /dev/fbN, no coordination | Single /dev/dri/card0, coordinated |
| Hardware planes | Not exposed | Primary, overlay, cursor — HW compositing |
| API stability | Deprecated since ~2015 | Current kernel standard |
| Kernel code path | Many fbdev drivers are DRM wrappers now | Native |
| Device node | /dev/fb0 | /dev/dri/card0 |
The takeaway: fbdev pretends the hardware is a flat buffer. DRM/KMS models what the hardware actually is.
DRM Objects on Real Hardware
On a Raspberry Pi 4 with HDMI and DSI connected:
Framebuffer A ──► Primary Plane 0 ──┐
Framebuffer B ──► Overlay Plane 0 ──┤
Framebuffer C ──► Cursor Plane 0 ───┤
▼
CRTC 0 ──► HDMI Encoder ───────────────► HDMI-A-1 ──► Monitor
(no bridge — direct TMDS)
Framebuffer D ──► Primary Plane 1 ──┐
▼
CRTC 1 ──► DSI Encoder ──► TC358762 ──► DSI-1 ──► 7" LCD
(bridge) (connector)
The HDMI path has no bridge — the encoder outputs TMDS directly to the connector. The DSI path has a bridge chip (TC358762) that converts DSI to DPI for the LCD panel.
Inspect on your Pi:
# List all DRM objects the vc4 driver exposes
sudo modetest -M vc4          # connectors, encoders, CRTCs, planes
# Connectors only — modes and connection status
sudo modetest -M vc4 -c
Full Software Stack — Without GPU
For CPU-rendered applications (fbdev-style drawing through DRM):
┌─────────────────────────────────────────────────────────┐
│ Your Application (C / Python / SDL2) │
│ draw_pixel(x, y, color) │
├─────────────────────────────────────────────────────────┤
│ libdrm (user-space library) │
│ drmModeSetCrtc(), drmModePageFlip() │
│ drmIoctl() → /dev/dri/card0 │
├─────────────────────────────────────────────────────────┤
│ DRM/KMS Core (kernel) │
│ mode setting, buffer management, VBlank events │
├─────────────────────────────────────────────────────────┤
│ GEM (Graphics Execution Manager) │
│ allocates "dumb buffers" in video/system memory │
├─────────────────────────────────────────────────────────┤
│ Display Controller Hardware (vc4 / v3d on Pi) │
│ reads buffer → CRTC → encoder → connector → panel │
└─────────────────────────────────────────────────────────┘
No GPU involved. The CPU writes pixels to a dumb buffer. The display controller hardware scans it out. This is what modetest and our DRM/KMS tutorials use.
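One piece of arithmetic this path forces on you: the dumb-buffer creation ioctl hands back a pitch (stride) that the driver may have aligned, and that pitch, not width × bytes-per-pixel, must be used for addressing. A sketch of the bookkeeping, assuming a hypothetical 64-byte alignment:

```python
# DRM dumb-buffer bookkeeping: DRM_IOCTL_MODE_CREATE_DUMB takes
# width/height/bpp and the driver replies with pitch (stride) and size.
# Drivers may align the pitch; 64 bytes here is an assumed example value.
def dumb_buffer_geometry(width, height, bpp, align=64):
    row_bytes = width * bpp // 8
    pitch = (row_bytes + align - 1) // align * align   # round up to alignment
    return pitch, pitch * height

# 800 px * 4 B is already 64-aligned; 798 px is not, so the pitch grows
print(dumb_buffer_geometry(800, 480, 32))
print(dumb_buffer_geometry(798, 480, 32))
```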
Full Software Stack — With GPU (OpenGL / Vulkan)
For GPU-accelerated rendering (3D, animations, Qt QML):
┌─────────────────────────────────────────────────────────┐
│ Your Application │
│ glDrawArrays(), SDL_RenderPresent() │
├─────────────────────────────────────────────────────────┤
│ OpenGL ES / Vulkan API │
├─────────────────────────────────────────────────────────┤
│ Mesa (user-space GPU driver) │
│ translates GL calls → GPU commands │
│ manages shader compilation, state tracking │
├─────────────────────────────────────────────────────────┤
│ GBM (Generic Buffer Manager) │
│ allocates GPU-accessible render targets │
├─────────────────────────────────────────────────────────┤
│ EGL (platform glue) │
│ connects GL context to DRM display surface │
│ EGL_PLATFORM_GBM → no compositor needed │
├─────────────────────────────────────────────────────────┤
│ DRM/KMS Core (kernel) │
│ page flip rendered buffer to display │
├─────────────────────────────────────────────────────────┤
│ GPU Hardware (V3D on Pi) Display Controller │
│ executes shaders, scans out buffer │
│ rasterizes triangles CRTC → encoder → out │
└─────────────────────────────────────────────────────────┘
Mesa is the open-source GPU driver stack. On the Pi 4, it uses the V3D driver for the VideoCore VI GPU. Mesa translates OpenGL/Vulkan calls into GPU hardware commands.
EGL is the glue between the rendering API (OpenGL) and the display system (DRM). On embedded Linux without a compositor, EGL binds directly to GBM/DRM — this is what Qt EGLFS and SDL2 KMSDRM use.
Where SDL2 and Qt Fit
SDL2 and Qt are application toolkits — they sit on top of these stacks and choose the right path:
SDL2 Qt
│ │
┌──────────┼──────────┐ ┌──────────┼──────────┐
│ │ │ │ │ │
KMSDRM fbcon Wayland EGLFS Wayland XCB
backend backend backend plugin plugin plugin
│ │ │ │ │ │
DRM/KMS fbdev Compositor EGL+DRM Compositor X11
(direct) (legacy) (desktop) (direct) (desktop) (legacy)
For embedded (no compositor): - SDL2 → KMSDRM backend → DRM/KMS directly - Qt → EGLFS plugin → EGL + DRM directly
For desktop: - SDL2 → Wayland backend → compositor → DRM/KMS - Qt → Wayland plugin → compositor → DRM/KMS
The application code does not change. The backend/plugin selection decides the display path.
# Force SDL2 to use DRM directly (no compositor)
export SDL_VIDEODRIVER=kmsdrm
./my_sdl2_app
# Force Qt to use EGLFS (no compositor)
export QT_QPA_PLATFORM=eglfs
./my_qt_app
Level C — Full Graphics Stack (Wayland/X11)
A compositor sits between your application and the display:
┌───────────────────────────────────────────┐
│ App 1 App 2 App 3 Cursor │
│ │ │ │ │ │
│ └─────────┴─────────┴─────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Compositor │ (Weston, Mutter)│
│ │ - window placement │
│ │ - input routing │
│ │ - GPU compositing │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ DRM/KMS │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Display │ │
│ └─────────────┘ │
└───────────────────────────────────────────┘
The compositor manages window placement, input routing, and GPU-accelerated compositing. This is desktop Linux.
Full Stack Trade-offs
Pros:
- Rich UI — multiple windows, drag-and-drop, tooltips, cursor
- Hardware acceleration — GPU compositing, OpenGL/Vulkan
- UI toolkits — Qt, GTK, Flutter work out of the box
- Standard input handling — keyboard, mouse, touch, gestures
Cons:
- Adds 2-15 seconds to boot time (depending on stack)
- Consumes 50-200+ MB of RAM
- Extra buffering layer between app and display
- More components to configure, update, and debug
- More failure points — compositor crash = black screen
For a single fullscreen embedded app, the compositor manages windows that will never appear. You pay the full cost for zero benefit.
Comparison Table
| Approach | Boot Impact | Memory | CPU Overhead | Complexity |
|---|---|---|---|---|
| Raw framebuffer (fbdev) | None | ~1 MB | Minimal | Low |
| DRM/KMS (dumb buffer) | None | ~2-4 MB | Low | Medium |
| Wayland + Weston | +2-5 s | ~50-100 MB | Medium | High |
| X11 + Desktop | +5-15 s | ~200+ MB | High | Very High |
The difference between fbdev/DRM and a full compositor is not incremental — it is an order-of-magnitude jump in resource consumption and complexity.
On a 256 MB device with a 10-second boot budget, a compositor consumes half your RAM and half your boot time before your application even starts.
Wayland vs X11

Embedded vs Desktop Mindset
On desktop Linux, the graphics stack is chosen for you. Your distribution ships GNOME or KDE with a Wayland compositor. You never think about it.
On embedded Linux, you choose explicitly. Every component is a decision.
| Resource | Desktop (8 GB RAM, SSD) | Embedded (256 MB RAM, eMMC) |
|---|---|---|
| 100 MB for compositor | 1.25% of RAM | 39% of RAM |
| 5 s for compositor boot | Unnoticeable | 50% of boot budget |
| 20 packages to maintain | Lost in 2000+ packages | 20% of total image |
What is invisible on desktop dominates on embedded. This is why embedded engineers must understand the graphics stack — not just use it.
Four Decision Factors
When choosing your graphics level, evaluate these four factors:
1. Boot time — heavier stacks take longer to initialize. A compositor adds seconds. fbdev/DRM add nothing.
2. Reliability — more components = more failure points. A compositor crash means black screen. Direct DRM means one less thing to break.
3. Maintenance cost — the compositor needs configuration, updates, and debugging. Direct rendering has fewer moving parts to maintain over a 10-year product lifecycle.
4. UI complexity — only use a heavy stack if you actually need its features. Multiple overlapping windows? You need a compositor. Single fullscreen app? You do not.
Start from the lightest option that meets requirements. Move up only when you hit a concrete limitation.
"No GUI" Still Needs Graphics
A common misconception: removing the desktop environment means giving up display output.
Wrong. Most embedded Linux products with displays run without a desktop but still draw to screen.
Key distinction:
| Concept | What It Means |
|---|---|
| Desktop GUI | Window manager, taskbar, file manager, system tray |
| Display output | Application renders directly to hardware |
Removing the desktop removes window management — not the ability to put pixels on a screen. The vast majority of embedded displays (kiosks, HMIs, dashboards, digital signage) have no desktop environment at all.
Common Headless Display Patterns
Real products that render to display without a desktop:
- PIL/Pillow → fbi → framebuffer — industrial panels, point-of-sale terminals. Generate an image in Python, push it to /dev/fb0.
- OpenCV → framebuffer — machine vision HMIs. Process camera frames, render results directly to display.
- DRM dumb buffer — kiosks, digital signage, transportation displays. Allocate a buffer, draw pixels, page flip.
- Custom fb driver — LED matrices, e-ink displays, segment LCDs. Write a minimal kernel driver that exposes /dev/fb0 for non-standard display hardware.
- SDL2 + DRM backend — games, simulators, status dashboards. SDL2 can render directly via DRM/KMS without any compositor.
All of these produce display output. None of them need a window manager.
Decision Flowchart
Need display output?
|
+----+----+
No Yes
| |
[Done] Multiple windows needed?
|
+----+----+
Yes No
| |
[Wayland/ Need HW acceleration (GPU)?
X11] |
+----+----+
Yes No
| |
[DRM/KMS Quick prototype / simple display?
+ GPU] |
+----+----+
Yes No
| |
[fbdev] [DRM/KMS
dumb buffer]
Follow this flowchart from top to bottom. Most embedded products land on DRM/KMS (dumb buffer) or fbdev. Only products with genuine multi-window needs should reach for a compositor.
From Software to Wire: The Physical Display Pipeline
The graphics stack (fbdev/DRM/KMS/compositor) is the software side. Below it, the SoC's display controller pushes pixels over a physical interface to the panel:
Application ──► DRM/KMS ──► Display Controller ──► Physical Interface ──► Panel
(SoC HW)
┌─────────────────────────────────────────────────┐
│ Which interface? │
│ │
│ HDMI: TMDS encoding → 3 data + 1 clk pair │
│ DSI: D-PHY packets → 2 data + 1 clk lane │
│ SPI: CPU-driven → 1 data line (no GPU!) │
└─────────────────────────────────────────────────┘
HDMI and DSI are GPU-driven — the display controller reads from the DRM buffer and clocks pixels out automatically. SPI is CPU-driven — your code (or DMA) must push every pixel through the SPI bus.
Physical Interface Bandwidth on the Pi
| Interface | Bandwidth | Max Resolution | GPU Driven? | Cable |
|---|---|---|---|---|
| HDMI 2.0 | 18 Gbit/s | 4K @ 60 FPS | Yes | Micro-HDMI |
| MIPI DSI (2-lane) | ~2 Gbit/s | 800×480 @ 60 FPS | Yes | 15-pin FPC ribbon |
| SPI | ~32 Mbit/s | 320×240 @ 25 FPS | No (CPU) | GPIO wires |
Why does DSI use so much less bandwidth than HDMI? Smaller resolution. The 7" DSI panel (800×480) needs ~553 Mbit/s. A 4K HDMI monitor (3840×2160) needs ~12 Gbit/s. The interface matches the panel.
Quick bandwidth formula:
BW = Width × Height × BitsPerPixel × FPS × overhead
800 × 480 × 24 × 60 × 1.2 ≈ 663 Mbit/s (DSI, 7" panel — blanking overhead)
1920 × 1080 × 24 × 60 × 1.2 × 1.25 ≈ 4.5 Gbit/s (HDMI, 1080p — blanking × TMDS 8b/10b coding)
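The same formula as a function, with the overhead factor made explicit (roughly 1.2 for blanking on DSI, and roughly 1.2 × 1.25 on HDMI once TMDS 8b/10b coding is included):

```python
# Display bandwidth estimate from the formula above. The overhead factor
# bundles blanking intervals and, for HDMI, TMDS 8b/10b line coding.
def bandwidth_mbps(width, height, bpp, fps, overhead):
    return width * height * bpp * fps * overhead / 1e6

dsi  = bandwidth_mbps(800, 480, 24, 60, 1.2)     # 7" DSI panel
hdmi = bandwidth_mbps(1920, 1080, 24, 60, 1.5)   # 1.2 blanking x 1.25 TMDS
spi  = bandwidth_mbps(320, 240, 16, 25, 1.0)     # near the SPI ceiling

print(f"DSI {dsi:.0f} Mbit/s, HDMI {hdmi:.0f} Mbit/s, SPI {spi:.1f} Mbit/s")
```

Running the numbers this way makes the interface table above easy to re-derive for any panel.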
Theory: Camera and Display Interfaces — D-PHY signaling, CSI-2, DSI packets, EDID, bandwidth math
UI Toolkit: Qt vs SDL2
You've chosen DRM/KMS — now pick your application-level toolkit.
The kernel display path decides how pixels reach the screen. The toolkit decides how your application produces those pixels.
| | Qt + EGLFS | SDL2 + KMS/DRM |
|---|---|---|
| What it is | Full UI framework, renders via EGL directly on KMS | Minimal render loop, you draw everything |
| Runtime footprint | ~30-80 MB | ~2-5 MB |
| GPU required? | Yes (EGL/OpenGL) | Optional |
| Best for | Dashboards, menus, touch HMIs | Gauges, data viz, custom rendering |
| UI components | Widgets, QML, animations built-in | None — bring your own |
| Cross-SoC portability | Excellent | Good (but UI is custom) |
Qt + EGLFS = invest upfront in framework, get layout/touch/animations for free. SDL2 + KMS/DRM = minimal footprint, maximum control, build UI yourself.
The Hybrid Sweet Spot
Many production HMIs combine both approaches:
┌─────────────────────────────────┐
│ Qt Quick (QML) │
│ ┌───────┐ ┌───────┐ ┌───────┐ │
│ │ Menu │ │Status │ │ Nav │ │ ← QML handles UI chrome
│ └───────┘ └───────┘ └───────┘ │
│ ┌───────────────────────────┐ │
│ │ Custom OpenGL/Vulkan │ │ ← GPU renders gauges,
│ │ render area │ │ waveforms, 3D views
│ └───────────────────────────┘ │
└─────────────────────────────────┘
- QML for menus, status bars, touch navigation — saves development time
- OpenGL/Vulkan scene node for real-time gauges and data visualization — full GPU control
- Both run in a single process on EGLFS — no compositor needed
For the labs: start with SDL2 (smallest footprint, teaches the hardware path). Move to Qt + EGLFS for the dashboard project.
Pitfall 1 — fbdev Compatibility Shim on DRM Systems
Modern kernels may expose /dev/fb0 as a compatibility layer over a DRM driver. This looks like fbdev but does not behave identically.
Your app thinks: Reality:
┌────────────┐ ┌────────────┐
│ /dev/fb0 │ │ /dev/fb0 │ (compat shim)
│ (fbdev) │ │ | │
└─────┬──────┘ │ DRM/KMS │ (actual driver)
| │ | │
Hardware │ Hardware │
└────────────┘
Page flipping, mode setting, and buffer management behave differently through the shim. Double buffering may not work. Mode changes may be ignored.
Rule: If the kernel uses a DRM driver, use the DRM API directly. Do not rely on the fbdev compatibility layer for anything beyond quick tests.
Pitfall 2 — Pixel Format and Stride Mismatch
Display hardware expects pixels in a specific format. Your renderer may produce a different one.
| Format | Bytes/pixel | Layout |
|---|---|---|
| RGB565 | 2 | 5 red, 6 green, 5 blue |
| RGB888 | 3 | 8 red, 8 green, 8 blue |
| ARGB8888 | 4 | 8 alpha, 8 red, 8 green, 8 blue |
| BGR888 | 3 | 8 blue, 8 green, 8 red (swapped) |
Stride (bytes per row) may include padding for alignment. A 800-pixel-wide RGB888 display might have stride = 2400 or stride = 2432 (padded to 64-byte boundary).
If you assume the wrong format or stride, the image appears garbled, color-shifted, or diagonally skewed. Always query the actual format and stride from the driver — never hardcode them.
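Both failure modes are easy to demonstrate in a few lines. The RGB565 packing follows directly from the table above, and the stride numbers are the slide's own 2400 vs 2432 example:

```python
# RGB565 packing, and why a wrong stride assumption skews the image.
def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit channels into 5-6-5 bits: RRRRRGGG GGGBBBBB."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

print(hex(rgb888_to_rgb565(255, 0, 0)))   # pure red   -> 0xf800
print(hex(rgb888_to_rgb565(0, 255, 0)))   # pure green -> 0x7e0
print(hex(rgb888_to_rgb565(0, 0, 255)))   # pure blue  -> 0x1f

# Stride mismatch: renderer assumes 2400 bytes/row, panel wants 2432.
# Every row lands 32 bytes short of where it belongs -> diagonal skew.
assumed, actual = 2400, 2432
drift_after_100_rows = 100 * (actual - assumed)
print(f"{drift_after_100_rows} bytes of drift after 100 rows")
```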
Pitfall 3 — Adding a Compositor When Not Needed
Every additional layer is a potential failure point:
Without compositor: With compositor:
┌──────────┐ ┌──────────┐
│ App │ │ App │
└────┬─────┘ └────┬─────┘
| |
┌────▼─────┐ ┌────▼─────┐
│ DRM/KMS │ │ Wayland │ ← can crash
└────┬─────┘ └────┬─────┘
| ┌────▼─────┐
┌────▼─────┐ │ Weston │ ← can crash
│ Display │ └────┬─────┘
└──────────┘ ┌────▼─────┐
│ DRM/KMS │
└────┬─────┘
┌────▼─────┐
│ Display │
└──────────┘
Single fullscreen app + compositor = longer boot time + more complexity + more failure modes, with zero functional benefit. The compositor manages windows that will never appear.
Start minimal. Add complexity only when you hit a concrete limitation.
Pitfall 4 — No Fallback if Display Init Fails
If the display is unplugged, the cable is damaged, or the driver probe fails at boot, what happens to your application?
Bad design: Application blocks on display init, never starts, product appears dead.
Good design: Display is treated as an optional output, not a hard dependency.
App starts
|
+---> Try to open display
| |
| +----+----+
| OK FAIL
| | |
| Render Log warning, continue without display
| to (network, logging, control still work)
| display
|
+---> Core logic runs regardless
Design the display as one of several outputs. The product should still function (logging, network, control) even if the screen is missing.
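The pattern in code, as a minimal sketch: the device path and the always-failing probe are placeholders, and the point is that the core loop never depends on the display having opened:

```python
import logging

def init_display(path="/dev/dri/card0"):
    """Stand-in for a real display probe; here it always fails."""
    raise FileNotFoundError(path)

def main():
    display = None
    try:
        display = init_display()
    except OSError as err:                 # display is optional, not fatal
        logging.warning("display unavailable (%s), continuing headless", err)
    for reading in (21.5, 21.7):           # core logic runs regardless
        if display is not None:
            pass                           # render to screen here
        logging.info("sensor reading: %.1f", reading)  # logging still works
    return display is None                 # True means running headless

main()
```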
Understanding the Full Desktop Stack
Before we dismiss the compositor approach for embedded, let's understand what it actually does — this helps you recognize when you truly need it and when you don't.
Key Terms — What Is What?
These terms are often confused. Here's exactly what each one is:
| Name | What it IS | What it DOES | What it does NOT do |
|---|---|---|---|
| Wayland | A protocol (not software) | Defines how apps talk to the compositor — buffer sharing, input events, window surfaces | Does not draw anything. Does not manage windows. It's a specification, like HTTP. |
| Weston | A compositor (reference implementation) | Implements the Wayland protocol. Combines app buffers, routes input, outputs to DRM. Minimal, used in embedded. | Not a toolkit. Not a desktop environment. No taskbar, no app launcher. |
| Mutter | A compositor (GNOME's) | Same job as Weston but part of GNOME. Adds desktop features: workspaces, overview, animations. | Not standalone — needs GNOME Shell. Too heavy for embedded. |
| Sway | A compositor (tiling) | Wayland compositor inspired by i3. Tiling window layout, keyboard-driven. | No embedded profile. Desktop-focused. |
| KWin | A compositor (KDE's) | KDE Plasma's compositor. Rich effects, desktop integration. | Very heavy. Not for embedded. |
| Xorg | A display server (X11) | The X11 server. Receives draw commands from apps, renders to screen, routes input. | Does not composite by itself — needs a separate compositor (picom, compton) for transparency. |
Wayland Is a Protocol, Not a Program
This is the most common misconception. You don't "install Wayland" — you install a compositor that speaks the Wayland protocol.
"I use Wayland" actually means:
┌─────────────────────────────────────────────────────┐
│ │
│ App (GTK/Qt) ──── Wayland protocol ──── Weston │
│ │
│ The protocol ← this is "Wayland" │
│ The compositor ← this is "Weston" │
│ │
└─────────────────────────────────────────────────────┘
Like saying "I use HTTP" — you mean you use a browser (Chrome)
that speaks HTTP to a server (Nginx). HTTP is the protocol.
Wayland is the protocol. Weston/Mutter/Sway is the "browser."
Why this matters for embedded: If someone says "use Wayland on the Pi," the real question is: which compositor? Weston is the lightweight choice. Mutter (GNOME) would be far too heavy.
Weston — The Embedded Compositor
Weston is the reference implementation of the Wayland protocol, maintained by the same team. It's designed to be minimal:
What Weston does:
- Accepts connections from Wayland client apps
- Receives their rendered buffers (shared GPU memory)
- Composites all buffers into the final screen image
- Routes input events (touch, keyboard, mouse) to the focused app
- Outputs the composited image via DRM/KMS
- Handles display hotplug (HDMI connected/disconnected)
What Weston does NOT do:
- No taskbar, no app launcher, no system tray
- No window decorations (no title bars, close buttons)
- No file manager, no settings panel
- No login screen (use a separate program for that)
Weston provides the plumbing — apps appear on screen and receive input. Everything else (UI, layout, interaction) is the application's responsibility.
Embedded use: Weston's "kiosk shell" plugin runs a single app fullscreen with no chrome — essentially a Wayland-speaking DRM wrapper. Some products use this instead of direct DRM access when they want Wayland protocol compatibility (e.g., for Flutter or Chromium).
Xorg — The Legacy Display Server
Xorg is the implementation of the X11 protocol — a single large process (~500K lines of code).
What Xorg does:
- Owns the display — apps cannot draw directly, they ask Xorg to draw for them
- Manages a shared 2D canvas (the "root window")
- Routes keyboard/mouse events to the focused window
- Provides network transparency — apps can run on one machine, display on another (ssh -X)
- Loads input drivers (keyboard, mouse, touchpad) and display drivers
What Xorg does NOT do:
- Does not decide where windows go — that's the window manager (a separate process)
- Does not composite (blend) overlapping windows by default — needs a compositor (picom, compton)
- Does not provide a desktop environment — that's GNOME/KDE/XFCE running on top
Why X11 is declining: The "apps can't draw directly" design adds latency. The "any app can read any window" design is a security hole. The shared-canvas model doesn't work well with GPUs. Wayland fixes all three by letting apps render to their own buffers.
Compositor vs Window Manager vs Desktop Environment
These three concepts are often conflated. They are different layers:
┌───────────────────────────────────────────────┐
│ Desktop Environment (GNOME, KDE, XFCE) │ ← the "experience"
│ Taskbar, app launcher, file manager, │ (apps + config + theme)
│ settings, notifications, lock screen │
│ │
│ ┌─────────────────────────────────────────┐ │
│ │ Window Manager (Mutter, KWin, i3) │ │ ← window placement
│ │ Position, size, stacking, focus, │ │ rules and policy
│ │ keyboard shortcuts, tiling/floating │ │
│ │ │ │
│ │ ┌───────────────────────────────────┐ │ │
│ │ │ Compositor (built-in or picom) │ │ │ ← pixel blending
│ │ │ Transparency, shadows, blur, │ │ │ the "how" of
│ │ │ animations, buffer management │ │ │ putting it on screen
│ │ └───────────────────────────────────┘ │ │
│ └─────────────────────────────────────────┘ │
└───────────────────────────────────────────────┘
On Wayland: the compositor and window manager are the same process (Mutter, Sway, Weston). You can't mix and match.
On X11: they're separate. You can run i3 (tiling WM) + picom (compositor) on Xorg. Or Openbox (floating WM) with no compositor at all.
On embedded: you skip all three. Your app talks to DRM directly.
Putting It All Together — Who Uses What?
| Product / Use case | Graphics approach | Why |
|---|---|---|
| Your laptop (Ubuntu) | Mutter (Wayland compositor) + GNOME | Multiple apps, desktop experience |
| Raspberry Pi Desktop | Wayfire (Wayland compositor) + RPi Desktop | Full desktop for education |
| Automotive HMI | Weston kiosk + Qt EGLFS | Single app, Wayland protocol for IVI |
| Industrial panel | Qt EGLFS on DRM | Single app, no compositor overhead |
| Our course labs | SDL2 on DRM/KMS | Minimal, teaches hardware path |
| Our Qt launcher | Qt EGLFS on DRM | Rich UI without compositor |
| Digital signage | DRM dumb buffer | Static content, minimal CPU |
| ATM / POS terminal | Weston kiosk or DRM direct | Security + single app |
Notice: even commercial products that "use Wayland" often use Weston's kiosk shell — which is essentially a thin layer over DRM that adds Wayland protocol compatibility for the app framework.
X11 Architecture (Legacy Desktop)
X11 (1987) uses a client-server model where the display server owns the screen:
┌────────┐ ┌────────┐ ┌────────┐
│ App 1 │ │ App 2 │ │ App 3 │ ← X11 clients
└───┬────┘ └───┬────┘ └───┬────┘
│ │ │ X11 protocol
└───────────┴───────────┘ (network-transparent)
│
┌──────▼──────┐
│ X Server │ ← owns the screen
│ (Xorg) │
│ ┌────────┐ │
│ │ Window │ │ ← decides where windows go
│ │ Manager│ │
│ └────────┘ │
│ ┌────────┐ │
│ │ Compos-│ │ ← blends overlapping windows
│ │ itor │ │
│ └────────┘ │
└──────┬──────┘
│
┌──────▼──────┐
│ DRM/KMS │ ← hardware
└─────────────┘
Key idea: Apps don't touch the display — they send draw commands to the X Server, which renders on their behalf. The Window Manager is a separate process that tells the X Server where to position each window.
Wayland Architecture (Modern Desktop)
Wayland (2012) merges the server, window manager, and compositor into one process:
┌────────┐ ┌────────┐ ┌────────┐
│ App 1 │ │ App 2 │ │ App 3 │ ← Wayland clients
└───┬────┘ └───┬────┘ └───┬────┘
│ │ │
│ Each app renders │ Apps render to their OWN
│ to a buffer (GPU) │ buffer — not shared
│ │ │
└───────────┴───────────┘ Wayland protocol
│
┌──────▼──────┐
│ Compositor │ ← ONE process does everything:
│ (Weston, │ window placement
│ Mutter, │ input routing
│ Sway) │ GPU compositing
└──────┬──────┘ output to display
│
┌──────▼──────┐
│ DRM/KMS │
└─────────────┘
Key difference from X11: Apps render to their own buffers (not through the server). The compositor only combines the finished buffers into the final screen image. This is simpler, more secure (apps can't snoop on each other's pixels), and lower latency.
What Each Layer Does
| Layer | Role | Desktop example | Can you skip it? |
|---|---|---|---|
| Window Manager | Decides window position, size, decorations (title bar, close button), stacking order | GNOME Shell, KWin, i3, Sway | Yes — if you have one fullscreen app |
| Compositor | Blends multiple app buffers into one image (transparency, shadows, animations), outputs to display | Mutter, Weston, Picom | Yes — if you have one fullscreen app |
| Display Server | Routes input events (keyboard, mouse) to the correct app, manages shared display access | Xorg (X11) or built into compositor (Wayland) | Yes — handle input yourself (evdev/SDL2) |
| Toolkit | Draws widgets (buttons, text, lists), handles layout | GTK, Qt, Flutter | Optional — you can draw pixels directly |
On embedded: you typically skip the first three layers entirely. Your single app opens DRM directly, renders with SDL2 or Qt EGLFS, and reads input from /dev/input/ or SDL2's event system.
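Handling input yourself is less work than it sounds: each `read()` from an event device under `/dev/input/` returns one fixed-size `input_event` record. A minimal parsing sketch in Python (the `'llHHi'` layout assumes 64-bit Linux; the device path is an example):

```python
import struct

# struct input_event on 64-bit Linux: timeval (two longs), type, code, value
EVENT_FORMAT = 'llHHi'
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)   # 24 bytes on 64-bit

def parse_event(data):
    tv_sec, tv_usec, etype, code, value = struct.unpack(EVENT_FORMAT, data)
    return {'time': tv_sec + tv_usec / 1e6,
            'type': etype, 'code': code, 'value': value}

# Reading from hardware (needs permissions, e.g. membership in the 'input' group):
# with open('/dev/input/event0', 'rb') as dev:
#     while True:
#         ev = parse_event(dev.read(EVENT_SIZE))
#         if ev['type'] == 1:          # EV_KEY: key or button press/release
#             print(ev['code'], ev['value'])
```

SDL2 wraps exactly this interface for you, but knowing the raw record layout helps when debugging a touchscreen with `evtest` or hexdump.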
X11 vs Wayland — Quick Comparison
| | X11 | Wayland |
|---|---|---|
| Age | 1987 (almost 40 years) | 2012 (~13 years) |
| Architecture | Client → Server renders | Client renders → Compositor combines |
| Network transparency | Built-in (forward over SSH) | Not built-in (use waypipe, or PipeWire/RDP for remote desktop) |
| Security | Any app can read any window (screen capture is trivial) | Apps are isolated by default |
| Tearing | Common (requires compositor hacks) | Solved by design |
| Code complexity | ~500K lines (Xorg) | ~50K lines (Weston) |
| Embedded use | Mostly legacy | Weston has an embedded profile |
For this course: we skip both X11 and Wayland. Our apps use DRM/KMS directly (SDL2, Qt EGLFS). But knowing the layers helps when you debug a desktop system or explain to a manager why the kiosk doesn't need a desktop environment.
Qt EGLFS — The Embedded Shortcut
Qt's EGLFS platform plugin lets you run a Qt application fullscreen on DRM/KMS without any compositor:
Desktop Qt: Embedded Qt (EGLFS):
┌─────────────┐ ┌─────────────┐
│ Qt App │ │ Qt App │
└──────┬──────┘ └──────┬──────┘
│ │
┌──────▼──────┐ ┌──────▼──────┐
│ Wayland / │ │ EGL + DRM │ ← direct GPU
│ X11 │ │ (no compositor)
└──────┬──────┘ └──────┬──────┘
┌──────▼──────┐ ┌──────▼──────┐
│ Compositor │ │ Display │
└──────┬──────┘ └─────────────┘
┌──────▼──────┐
│ DRM/KMS │
└──────┬──────┘
┌──────▼──────┐
│ Display │
└─────────────┘
3 extra layers 0 extra layers
EGLFS = "EGL Full Screen." It gives you all of Qt's UI power (QML, touch, animations) with the performance of direct DRM access. This is what the Qt App Launcher uses.
FAQ — Common Student Questions
Q: Why can't I run two SDL2 apps at the same time? Because both try to open DRM and become "DRM master" — only one process can control the display at a time. This is by design. The Qt launcher solves this by releasing DRM master before spawning a child app, and reclaiming it after. A compositor would also solve it, but at the cost we discussed.
Q: Why does my app work in SSH but show a black screen on the Pi?
Graphics apps need access to /dev/dri/card0 (DRM) and possibly /dev/fb0. Over SSH, you're on a different TTY. Make sure the app runs on the correct VT, or use SDL_VIDEODRIVER=kmsdrm to force DRM mode. Also check permissions: the user needs to be in the video group.
Q: Can I use OpenGL without a compositor?
Yes. EGL can bind directly to a DRM device (EGL_PLATFORM_GBM). This is what SDL2's KMSDRM backend and Qt's EGLFS do. You get full GPU acceleration without any windowing system.
Q: Why is the Pi's display upside down / rotated?
The display panel's physical scanning direction may not match the expected orientation. Fix with display_rotate=2 in config.txt (fbdev) or video=DSI-1:panel_orientation=upside_down (DRM). KMS also supports rotation via the rotation plane property.
Q: Why does my framebuffer app work but the colors are wrong?
Pixel format mismatch. The display might expect BGR888 but you're writing RGB888 (red and blue swapped). Always query the format with ioctl(FBIOGET_VSCREENINFO) or check the DRM plane's format list. Common formats: ARGB8888, XRGB8888, RGB565.
Q: What's the difference between a window manager and a compositor? A window manager decides where windows go (position, size, stacking). A compositor blends all windows into the final image (handles transparency, shadows, animations). On Wayland, these are the same process. On X11, they can be separate (e.g., i3 window manager + picom compositor).
Q: Do I need a GPU for embedded graphics? Not necessarily. DRM "dumb buffers" are CPU-rendered. SDL2 can software-render via DRM. For simple UIs (status displays, dashboards), CPU rendering is fast enough. You need a GPU when: rendering complex 3D (OpenGL/Vulkan), running Qt QML with animations, or compositing multiple layers at high frame rates.
Block 1 Summary
Framebuffer (fbdev): simple, direct, legacy. Best for quick prototypes and simple displays. Deprecated — new drivers target DRM.
DRM/KMS: modern, hardware-aware, tear-free. The right default for embedded products. Works without GPU using dumb buffers.
Full stack (Wayland/X11): powerful, heavy, complex. Only justified when you need multiple windows or rich desktop-style UI. Wayland is simpler and more secure than X11. Both are overkill for single-app embedded.
Qt EGLFS: the embedded sweet spot when you need rich UI — full Qt power with zero compositor overhead.
Most embedded products: single-app fullscreen pipeline using DRM/KMS or framebuffer. No compositor, no window manager, no desktop.
Decision principle: start from the simplest option that meets your requirements. Move up only when you hit a concrete limitation that the simpler approach cannot solve.
Block 2
Experiment: "Write Pixels Faster Than Refresh"
The Question
What happens when your application writes pixels to the framebuffer faster than the display can show them?
The framebuffer is shared memory: your application writes to it, and the display controller reads from it — simultaneously, without coordination.
If the application writes faster than the display scans, the display will read partially updated data.
Let's understand why, and then see it happen.
Display Scan-Out Explained
The display controller reads the framebuffer line by line, top to bottom, at a fixed rate:
Framebuffer memory Display panel
┌──────────────────┐ ┌──────────────────┐
│ Row 0 │ -------> │ Row 0 │ <- scan position
│ Row 1 │ │ Row 1 │
│ Row 2 │ │ │
│ Row 3 │ │ │
│ ... │ │ │
│ Row 479 │ │ │
└──────────────────┘ └──────────────────┘
At 60 Hz: full scan every 16.7 ms
The controller reads ~29 rows per millisecond (for 480 rows)
After row 479, it returns to row 0 (VBlank interval)
The scan-out is a continuous, periodic process driven by the display hardware clock. It does not wait for your application. It does not check if the buffer is "ready."
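The numbers above follow from simple arithmetic, which is worth sanity-checking once:

```python
# Scan-out budget for the panel used in this lesson: 480 rows at 60 Hz
refresh_hz = 60
rows = 480

frame_time_ms = 1000 / refresh_hz      # 16.67 ms per full scan
rows_per_ms = rows / frame_time_ms     # ~28.8 rows scanned every millisecond

print(f"{frame_time_ms:.2f} ms per frame, {rows_per_ms:.1f} rows/ms")
```

So if your application takes longer than ~16.7 ms to repaint the buffer, the scan-out is guaranteed to catch it mid-write.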
The Tearing Mechanism
When the app overwrites the buffer while the display is reading it, you see parts of two different frames:
Frame N Frame N+1
┌──────────────┐ ┌──────────────┐
│ AAAAAAAAAAAA │ │ BBBBBBBBBBBB │
│ AAAAAAAAAAAA │ │ BBBBBBBBBBBB │
│ AAAAAAAAAAAA │ │ BBBBBBBBBBBB │
│ AAAAAAAAAAAA │ │ BBBBBBBBBBBB │
└──────────────┘ └──────────────┘
What you SEE (scan-out catches the switch mid-frame):
┌──────────────┐
│ AAAAAAAAAAAA │ <- scanned before app started writing
│ AAAAAAAAAAAA │
│ BBBBBBBBBBBB │ <- scanned after app wrote these rows (TEAR LINE)
│ BBBBBBBBBBBB │
└──────────────┘
The horizontal boundary where two frames meet is the tear line. Its position moves because the write speed and scan-out speed are not synchronized.
VSync and Page Flipping
Solution: do not write to the buffer the display is currently reading.
Double buffering with page flipping:
Back buffer (app draws here) Front buffer (display reads here)
┌──────────────────┐ ┌───────────────────┐
│ Frame N+1 │ │ Frame N │
│ (being drawn) │ │ (being displayed)│
└──────────────────┘ └───────────────────┘
| |
| At VBlank: SWAP |
+----------->--------------------+
pointers swap, display now reads Frame N+1
VBlank = the brief interval between the last row and the first row
of the next scan. Safe moment to switch buffers.
The app writes to the back buffer. When drawing is complete, it requests a page flip at VBlank. The display switches to the new buffer only between frames. Every displayed frame is complete — no tearing.
DRM/KMS supports this natively. fbdev does not.
Experiment Setup
Write solid colors to the framebuffer as fast as possible and observe the display:
# framebuffer_flood.py - write solid colors as fast as possible
import mmap, os, struct

fb = os.open('/dev/fb0', os.O_RDWR)
mm = mmap.mmap(fb, 800 * 480 * 4)                  # adjust to your resolution and bpp
colors = [0x00FF0000, 0x0000FF00, 0x000000FF]      # red, green, blue (XRGB8888)

while True:
    for color in colors:
        row = struct.pack('<I', color) * 800       # one full row of pixels
        mm.seek(0)
        for y in range(480):                       # repaint the screen top to bottom
            mm.write(row)
This script writes red, then green, then blue — as fast as the CPU can go, with no synchronization to the display refresh.
Run it on a device with a connected display and look at the screen.
What to Observe
When you run framebuffer_flood.py, you will see:
- Horizontal tear lines where two colors meet mid-screen
- The tear position moves — sometimes near the top, sometimes near the bottom
- On fast CPUs, you may see multiple tear lines (three colors visible at once)
What you expect: What you see:
┌──────────────┐ ┌──────────────┐
│ RRRRRRRRRRRR │ │ RRRRRRRRRRRR │
│ RRRRRRRRRRRR │ │ RRRRRRRRRRRR │
│ RRRRRRRRRRRR │ │ GGGGGGGGGGGG │ <- tear
│ RRRRRRRRRRRR │ │ GGGGGGGGGGGG │
└──────────────┘ └──────────────┘
This is tearing — the display reads the buffer while the application is writing to it. The tear line appears wherever the scan-out position and the write position cross.
This is exactly why DRM/KMS with page flipping exists.
The Fix — DRM Page Flip
The proper solution uses double buffering with VSync through DRM/KMS:
Step 1: Allocate two dumb buffers (front and back)
Step 2: Draw to the back buffer (the display is not reading it)
Step 3: Request atomic page flip with DRM_MODE_PAGE_FLIP_EVENT
Step 4: Wait for VBlank event (kernel signals when flip completes)
Step 5: Swap front/back buffer pointers
Time -->
| Draw to back | Flip | Draw to back | Flip |
| buffer | at | buffer | at |
| (invisible) | VBlank | (invisible) | VBlank |
^ ^
| |
Display switches Display switches
to new buffer to new buffer
Result: every frame is fully drawn before the display reads it. No tearing. No timing hacks.
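The buffer discipline in steps 1-5 can be sketched without real DRM calls. In this simulation the flip is a plain pointer swap standing in for the `DRM_MODE_PAGE_FLIP_EVENT` ioctl, and the "display" (front buffer) only ever holds complete frames:

```python
# Double buffering simulated with two byte arrays.
front = bytearray(b'A' * 8)    # scanned out by the display
back  = bytearray(b'A' * 8)    # the app draws here, invisible until the flip

def draw(buf, ch):
    for i in range(len(buf)):  # write "row by row", as the real app would
        buf[i] = ord(ch)

def page_flip():
    global front, back         # in real DRM this happens at VBlank, in the kernel
    front, back = back, front

for ch in 'BCD':
    draw(back, ch)             # complete the frame off-screen first
    page_flip()                # then make it visible in one atomic step
    print(front.decode())     # always a full frame, never a mix of two
```

No matter when the simulated display "reads" `front`, it sees only all-A, all-B, all-C, or all-D, never a torn mixture; that invariant is exactly what the real page flip guarantees.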
Fallback Pitfall — sleep() Is Not VSync
Some teams try to "fix" tearing by throttling write speed:
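A typical attempt looks like this sketch (`draw_frame` is a hypothetical stand-in for the actual framebuffer writes; the timing measurement shows why the approach cannot hold):

```python
import time

def draw_frame():
    pass                        # hypothetical stand-in for the framebuffer writes

# ANTI-PATTERN: assume sleep() wakes up exactly every 1/60 s. It doesn't.
start = time.monotonic()
for _ in range(60):             # "one second" of frames at 60 Hz
    draw_frame()
    time.sleep(1 / 60)
elapsed = time.monotonic() - start

# elapsed always exceeds 1.0 s: sleep() never undershoots, so the app falls
# steadily behind the display clock and the tear line drifts across the screen.
print(f"60 'frames' took {elapsed:.3f} s (ideal: 1.000 s)")
```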
This is fragile and will fail because:
- CPU speed varies (thermal throttling, load changes)
- `sleep()` precision is ~1-10 ms on Linux (not exact)
- System load affects scheduling — your process may not wake on time
- Display refresh rate may not be exactly 60 Hz
- The sleep duration and scan-out timing drift relative to each other
The proper fix is VSync synchronization through DRM, where the kernel signals the exact VBlank moment. Sleep hacks create the illusion of working on a quiet system and break under real-world conditions.
Quick Checks
Before shipping a product with display output, answer these questions:
1. Is your display path deterministic at boot? Do you know exactly when the display initializes and what appears first? Or does it depend on service startup order?
2. Do you control mode setting explicitly? Are resolution, refresh rate, and pixel format set by your code? Or do you hope the defaults are correct?
3. Can the app recover if the display disconnects and reconnects? Hot-plug events happen — cable gets bumped, connector oxidizes, display power-cycles. Does your app handle this, or does it crash?
4. Does startup still meet your boot budget? After adding the display stack, is boot time still within the product requirement? Measure it — do not assume.
Mini Exercise
Given:
- Single fullscreen UI application
- Boot time requirement: < 10 seconds
- Remote update capability required
- Product lifecycle: 5+ years
Task: Select your graphics stack and justify your choice in 5 lines. Consider:
- Boot impact of your chosen approach
- Memory usage on a 256 MB system
- Maintenance cost over the product lifecycle
- What happens when the display cable is unplugged
Write your answer before looking at the next slide. There is no single correct answer, but there are answers that ignore constraints.
Key Takeaways
- Framebuffer is simple but legacy — good for prototyping and simple displays, but deprecated and lacking VSync support.
- DRM/KMS is the modern low-level choice — hardware-aware, tear-free, and the current kernel standard. Use dumb buffers when you do not need GPU acceleration.
- Full stacks are powerful but heavy — only justified when you genuinely need multiple windows or rich desktop-style UI.
- Embedded systems use single-app fullscreen pipelines — no compositor, no window manager, no desktop. This is the norm, not the exception.
- Tearing is solved by VSync + page flipping, not by sleep hacks — DRM provides proper synchronization; `time.sleep()` provides false confidence.
Hands-On Next
Put this theory into practice with the following tutorials:
- Framebuffer Basics — draw pixels directly to `/dev/fb0`, understand pixel formats and stride, render shapes and text from Python.
- OLED Framebuffer Driver — write a kernel framebuffer driver for the SSD1306 OLED over I2C. Implements `fb_info`, `fb_ops`, and deferred I/O.
- Pong on Framebuffer — build a user-space game that opens `/dev/fbN`, queries resolution via `ioctl`, and draws with `mmap()`. Works on both OLED and BUSE displays.
- DRM/KMS Test — use the modern graphics API: enumerate connectors, set display modes, allocate dumb buffers, perform tear-free page flips.
- Display Applications — create interactive applications with OpenCV and evdev for touch/button input, rendering directly to the display without a compositor.