Framebuffer Driver Tutorial

Implement a SPI-Controlled LED Matrix Driver with GPIO-Based CS Control

Time estimate: ~120+ minutes
Prerequisites: MCP9808 Kernel Driver, Graphics Stack

Don't have the BUSE LED matrix?

The OLED Framebuffer Driver tutorial teaches the same fbdev concepts (framebuffer registration, fb_info, fb_ops, deferred I/O) using an I2C OLED display — no special hardware required beyond the SSD1306 module from the OLED Display tutorial.

Learning Objectives

By the end of this tutorial you will be able to:

Explain the Linux framebuffer subsystem architecture
Understand SPI-controlled display driving with manual CS timing
Build and load a framebuffer kernel driver
Test a framebuffer driver with a user-space demo application
Analyze timing constraints in display drivers

Framebuffer Drivers and the Linux Graphics Stack

The Linux framebuffer subsystem (fbdev) provides a simple, hardware-independent interface for graphics output. Applications write pixel data to /dev/fbN, and the kernel driver translates those writes into hardware-specific operations -- in this case, SPI transfers to shift registers that drive an LED matrix.

A framebuffer driver implements the fb_ops structure: fb_read/fb_write for data access, fb_ioctl for control, and fb_mmap for zero-copy memory mapping. The driver allocates a kernel buffer (fb_info.screen_base) and continuously refreshes the hardware display from this buffer.

For displays without built-in frame memory (like the BUSE LED matrix), the driver must actively refresh the display at a fixed rate. This creates a real-time constraint: the driver uses hrtimer for microsecond-precision timing and workqueue for deferred SPI transfers that cannot run in interrupt context.

Manual chip-select (CS) control is required here because the LED drivers use CS hold time to control brightness -- a timing-sensitive operation that the SPI controller's automatic CS cannot handle. This is a common pattern in industrial display drivers where hardware timing constraints exceed the standard bus protocol.

The fbdev interface is the embedded-friendly alternative to the full DRM/KMS graphics stack used on desktop Linux. For simple displays (OLEDs, LED matrices, small TFTs), fbdev is sufficient and much simpler to implement.

For background theory, see Graphics Stack and Device Tree and Drivers. The BUSE LED matrix hardware is documented in the Hardware Reference.

Understanding the Hardware

[!tip]+ A detailed description of the BUSE LED display is available in the Hardware Reference.

Display Architecture

Resolution: 128×19 monochrome LEDs
Panel Grouping: 4 Panels horizontally
Column Grouping: 4 Groups per Panel
Data Loading: Serial via SPI into shift registers
Display Activation: Controlled via a Chip Select (CS) line
Timing-Sensitive: The duration CS is LOW determines brightness.

Problem:

Hardware CS cannot be used because the LED drivers require manual reassertion of CS after a fixed time to control brightness.
Therefore, we must control CS via a GPIO pin, not by relying on SPI hardware CS.

Defining System Requirements

Functional Requirements

Provide a Linux framebuffer interface (/dev/fbX).
Convert framebuffer content to hardware-specific SPI frame format.
Control CS line timing manually for consistent brightness.

Non-Functional Requirements

Timing Accuracy: CS hold time must be precise (microseconds range).
Low CPU Overhead: Avoid busy-wait loops.
Responsiveness: Avoid blocking critical system tasks.

Evaluating Linux Kernel Features for the Driver

Mechanism	Purpose	Suitability for Our Case
hrtimer	High-resolution timing	✅ Accurate microsecond-level timing
workqueue	Deferred execution in process context	✅ Allows SPI and GPIO operations safely
spinlock	Atomic protection of framebuffer	✅ Prevents race conditions during frame capture
fbdev (framebuffer API)	Provides `/dev/fbX` interface	✅ Standard Linux display interface

Choosing the Design Strategy

Framebuffer Registration
Provide /dev/fbX access for user applications.
Frame Conversion Engine
Convert Linux framebuffer data into panel-specific SPI data format.
Controlled Group-by-Group Transmission
Divide data into groups.
Send one group at a time via SPI.
Manual CS Control with Accurate Timing
Pull CS LOW after SPI transfer.
Use hrtimer to hold CS LOW for a fixed time (e.g., 50 µs).
Reassert CS and queue next group or new frame.
Continuous Refresh Loop
Repeat the process to maintain display state.
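The order of operations in this strategy can be sketched as a small user-space simulation. This is only an illustration of the sequencing, not the driver's code: the class name `RefreshSim`, the group count of 4, and the 200 µs transfer time are assumptions chosen for the example, and no real SPI or GPIO is touched.

```python
from dataclasses import dataclass, field

GROUPS_PER_FRAME = 4   # assumption: one SPI burst per column group
HOLD_US = 50           # example hold time taken from the text above


@dataclass
class RefreshSim:
    """Simulates only the ordering of events in the refresh loop."""
    groups: int = GROUPS_PER_FRAME
    hold_us: int = HOLD_US
    now_us: int = 0
    log: list = field(default_factory=list)

    def send_group(self, group: int, transfer_us: int) -> None:
        # In the driver this would be a work item: shift the group's data out.
        self.now_us += transfer_us
        self.log.append((self.now_us, "spi_done", group))
        # CS is pulled LOW once the data sits in the shift registers.
        self.log.append((self.now_us, "cs_low", group))

    def timer_expired(self, group: int) -> int:
        # In the driver this would be the hrtimer callback: the hold time
        # is over, so CS is released and the next group is selected.
        self.now_us += self.hold_us
        self.log.append((self.now_us, "cs_release", group))
        return (group + 1) % self.groups

    def run_frame(self, transfer_us: int) -> None:
        group = 0
        for _ in range(self.groups):
            self.send_group(group, transfer_us)
            group = self.timer_expired(group)


sim = RefreshSim()
sim.run_frame(transfer_us=200)   # 200 µs per group is an illustrative value
for entry in sim.log:
    print(entry)
```

Each group contributes its transfer time plus the hold time to the frame period, which is why the hold time affects both brightness and refresh rate.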

Key Trade-offs and Limitations

GPIO-Controlled CS Pros and Cons

✅ Advantages	❌ Limitations
Full control over CS timing	Higher CPU involvement
Supports brightness adjustment	GPIO operations may have unpredictable latency on non-RT kernels
Hardware-independent CS control	Not synchronized with SPI controller automatically

SPI Bus Speed Considerations

Maximum Speed: Limited by hardware and signal integrity (set to 3 MHz in this setup).
Trade-off: Higher speed reduces refresh latency but increases signal integrity risks.
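A rough timing budget helps to see how bus speed and hold time interact. The sketch below is a back-of-the-envelope estimate under stated assumptions: exactly one bit is shifted out per LED, there is no protocol overhead or scheduling delay, and a frame is sent as 4 groups. The function name `refresh_budget` and the group count are hypothetical, so treat the result as an upper bound on the achievable refresh rate rather than a measured value.

```python
def refresh_budget(width=128, height=19, groups=4,
                   spi_hz=3_000_000, hold_us=50.0):
    """Estimate the frame period from SPI speed and CS hold time."""
    frame_bits = width * height                 # one bit per LED (assumption)
    transfer_us = frame_bits / spi_hz * 1e6     # time spent clocking data out
    hold_total_us = groups * hold_us            # CS hold time over all groups
    frame_us = transfer_us + hold_total_us
    return {
        "frame_bits": frame_bits,
        "transfer_us": transfer_us,
        "hold_total_us": hold_total_us,
        "frame_us": frame_us,
        "max_refresh_hz": 1e6 / frame_us,
    }


for key, value in refresh_budget().items():
    print(f"{key:>15}: {value:.1f}")
```

Doubling `spi_hz` only shortens the transfer part of the budget; the hold time stays fixed because it is dictated by the desired brightness.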

CPU and Task Scheduling Constraints

Workqueue allows safe SPI access, but may be delayed if CPU is heavily loaded.
CPU Pinning (optional) can help reduce variability by isolating tasks to specific cores.

Step-by-Step Development Plan

Step 1: Register Framebuffer

Provide /dev/fbX for user-space access.
Allocate framebuffer memory with kzalloc.

Step 2: Framebuffer Processing Logic

Copy framebuffer memory under spinlock to prevent data races.
Convert bitmap to hardware-specific SPI frame format.
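The "copy under lock, convert outside the lock" idea can be illustrated in user space with a `threading.Lock` standing in for the kernel spinlock. This is an analogy, not the driver's implementation: `FrameStore` and `convert_for_panel` are hypothetical names, and the conversion is left as a placeholder because the panel's real SPI layout is not described here.

```python
import threading

WIDTH, HEIGHT = 128, 19
STRIDE = (WIDTH + 7) // 8            # bytes per row at 1 bit per pixel


class FrameStore:
    """Holds the live framebuffer and hands out consistent snapshots."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pixels = bytearray(STRIDE * HEIGHT)

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self._pixels):
            raise ValueError("write outside the framebuffer")
        with self._lock:                      # writers and readers exclude each other
            self._pixels[offset:offset + len(data)] = data

    def snapshot(self) -> bytes:
        with self._lock:                      # hold the lock only for the copy
            return bytes(self._pixels)


def convert_for_panel(snapshot: bytes) -> bytes:
    # Placeholder: the real conversion depends on the panel wiring.
    # It runs on the private copy, so it never needs the lock.
    return snapshot


store = FrameStore()
store.write(0, b"\xff")
frame = store.snapshot()
store.write(0, b"\x00")               # later writes do not affect the snapshot
print(frame[0], len(convert_for_panel(frame)))
```

Keeping the locked region down to a plain copy is the point of the pattern: the slower conversion and the SPI transfer then work on a frame that cannot change underneath them.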

Step 3: SPI Frame Transmission

Implement process_next_group() to send one group at a time via SPI.

Step 4: Manual CS Control with hrtimer

After SPI transfer, pull CS LOW.
Start hrtimer to hold CS LOW for DISPLAY_BRIGHTNESS_USEC.
After timer expires, reassert CS and queue next group or frame.

Step 5: Continuous Loop

Schedule next group or frame using workqueue and hrtimer callback.

Verification Checklist

Device Tree Overlay correctly sets up SPI and GPIO.
/dev/fbX becomes available.
SPI traffic can be observed (e.g., with a logic analyzer).
Brightness control works as expected.
Frame content reflects /dev/fbX writes.

Optional Improvements to Consider

Dynamic Brightness Control: Expose brightness as a sysfs attribute.
Power Management: Pause updates when framebuffer is idle.
Use skeletonfb.c: Align with kernel standards for maintainability.
Multi-Threaded Workqueue: Improve responsiveness under CPU load.

Example Use-Case Recap

Why GPIO + hrtimer + Workqueue?

Requirement	Solution
Manual CS timing	GPIO + hrtimer
Safe SPI operations	Workqueue in process context
Accurate timing	hrtimer in HRTIMER_MODE_REL_PINNED mode
Standard Linux interface	fbdev `/dev/fbX`

Next Steps

Implement the driver as described.
Test with static and animated content.
Observe timing behavior under CPU load.
Evaluate whether to refactor using skeletonfb.c.

Framebuffer Architecture

The framebuffer subsystem (fbdev) provides a hardware-independent API for graphics output, primarily used in embedded systems or scenarios where full GPU acceleration isn't required. It abstracts the complexities of video hardware, enabling applications to interact with a simple memory buffer representing the display.

Each framebuffer device is represented by a struct fb_info, which includes:

fb_ops: Function pointers defining operations like fb_read, fb_write, fb_ioctl, and fb_mmap.
fix and var screen info: Structures (fb_fix_screeninfo and fb_var_screeninfo) that describe fixed and variable display parameters, such as resolution and color depth.
Framebuffer memory pointer: A reference to the memory region used for the framebuffer.

These structures are defined in the kernel header include/linux/fb.h.
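From user space, the variable screen information can be read with the `FBIOGET_VSCREENINFO` ioctl. The sketch below decodes only the first eight fields of `fb_var_screeninfo`; the helper names are hypothetical, and the 160-byte size assumes the structure layout of current kernels (40 32-bit fields).

```python
import fcntl
import struct

FBIOGET_VSCREENINFO = 0x4600      # ioctl number from the kernel's uapi fb.h
VAR_SCREENINFO_SIZE = 160         # assumption: 40 x __u32 in fb_var_screeninfo

FIELDS = ("xres", "yres", "xres_virtual", "yres_virtual",
          "xoffset", "yoffset", "bits_per_pixel", "grayscale")


def parse_var_screeninfo(raw: bytes) -> dict:
    """Decode the leading eight __u32 fields of fb_var_screeninfo."""
    values = struct.unpack_from("=8I", raw)
    return dict(zip(FIELDS, values))


def query_var_screeninfo(path: str = "/dev/fb0") -> dict:
    """Ask the framebuffer driver for its variable screen info."""
    buf = bytearray(VAR_SCREENINFO_SIZE)
    with open(path, "rb") as fb:
        fcntl.ioctl(fb, FBIOGET_VSCREENINFO, buf)   # kernel fills buf in place
    return parse_var_screeninfo(bytes(buf))


# Offline demonstration with synthetic data shaped like a 128x19, 1 bpp display
sample = struct.pack("=8I", 128, 19, 128, 19, 0, 0, 1, 0) + bytes(128)
print(parse_var_screeninfo(sample))
```

On the target, calling `query_var_screeninfo()` after the module is loaded shows what the driver actually reports, which is useful when a demo application disagrees with the driver about the display size.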

Hands-on

Download the prepared BUSE driver from GitHub

git clone https://github.com/gsebik/buse_fb_driver.git

# open directory
cd buse_fb_driver

Note

If the repository is unavailable, the driver source is also provided in src/embedded-linux/drivers/ within the course repository.

Compile DT overlay

# compile overlay
dtc -@ -I dts -O dtb -o busefb.dtbo busefb-overlay.dts
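# copy to the overlays directory next to config.txt
# (use /boot/overlays/ on older images where config.txt lives in /boot)
sudo cp busefb.dtbo /boot/firmware/overlays/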

# add to /boot/firmware/config.txt
dtoverlay=busefb

# reboot
sudo reboot

Checkpoint — Overlay Installed

After reboot, verify the overlay is loaded by checking the live device tree:

ls /proc/device-tree/soc/spi@7e204000/busefb@0

You should see the node's properties (compatible, reg, spi-max-frequency, etc.). If the directory does not exist, the overlay was not applied. Check that dtoverlay=busefb is in /boot/firmware/config.txt and that busefb.dtbo is in the overlays directory next to it (/boot/firmware/overlays/, or /boot/overlays/ on older images).

Also check dmesg for SPI driver messages:

dmesg | grep busefb

Build kernel module

# open directory
cd buse_fb_driver

# build
make
# install module
sudo insmod busefb.ko

Check /dev

ls /dev/fb*

There should be something like:

linux@eslinux:~ $ ls /dev/fb*
/dev/fb0

Check the kernel log

dmesg | tail

[    8.230723] NET: Registered PF_ALG protocol family
[    8.380470] bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay)
[    8.381229] bcmgenet fd580000.ethernet eth0: Link is Down
[    8.401801] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[   11.453561] bcmgenet fd580000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[   57.352977] systemd[762]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[   74.067208] busefb: loading out-of-tree module taints kernel.
[   74.067931] SPI driver busefb has no spi_device_id for buse,buse128x19
[   74.068751] Console: switching to mono frame buffer device 4x2
[   74.068865] busefb spi0.0: busefb registered as /dev/fb

Checkpoint — Module Loaded

ls /dev/fb* should show /dev/fb0. The dmesg output should confirm busefb registered as /dev/fb.
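The registered framebuffers can also be inspected through sysfs. The sketch below reads a few standard fbdev attributes for every `fbN` node; the function name is hypothetical, and the values printed depend on what the driver reports, so no particular output is assumed here.

```python
from pathlib import Path


def describe_framebuffers(sysfs_root: str = "/sys/class/graphics") -> dict:
    """Collect name, size and depth for each registered framebuffer."""
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result                       # no fbdev support or not on Linux
    for node in sorted(root.glob("fb[0-9]*")):
        info = {}
        for attr in ("name", "virtual_size", "bits_per_pixel"):
            attr_file = node / attr
            if attr_file.is_file():
                info[attr] = attr_file.read_text().strip()
        result[node.name] = info
    return result


for fb, info in describe_framebuffers().items():
    print(fb, info)
```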

Remove the driver for testing

echo 0 | sudo tee /sys/class/vtconsole/vtcon1/bind; sudo rmmod busefb

[!caution]+ If the driver is not working properly, check the parameters in the driver and in the device tree!

Testing the driver with a demo application

To test the driver, use the following demo application. If the driver is working properly, it shows a bouncing ball on the display.

import numpy as np
import time
import random

WIDTH = 128
HEIGHT = 19

def clear_frame():
    return np.zeros((HEIGHT, WIDTH), dtype=np.uint8)

def draw_pixel(frame, x, y):
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        frame[y, x] = 1

def draw_ball(frame, cx, cy, radius=1):
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy <= radius * radius:
                draw_pixel(frame, cx + dx, cy + dy)

def render_frame(frame):
    packed = np.packbits(frame, axis=1, bitorder='little')
    with open("/dev/fb0", "wb") as fb:
        fb.write(packed.tobytes())

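# Ball state: start far enough from every edge that the bounce checks
# below cannot trigger on consecutive frames and trap the ball
radius = 3
x = random.randint(radius + 1, WIDTH - radius - 2)
y = random.randint(radius + 1, HEIGHT - radius - 2)
vx = random.choice([-1, 1])
vy = random.choice([-1, 1])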

try:
    while True:
        frame = clear_frame()
        draw_ball(frame, x, y, radius)
        render_frame(frame)

        # Update ball position
        x += vx
        y += vy

        # Bounce off edges
        if x - radius <= 0 or x + radius >= WIDTH - 1:
            vx = -vx
        if y - radius <= 0 or y + radius >= HEIGHT - 1:
            vy = -vy

        time.sleep(0.05)
except KeyboardInterrupt:
    pass

Save as ball.py and execute

python3 ball.py

[!tip] If you get an error like OSError: [Errno 27] File too large, the frame you write is larger than the framebuffer. Set WIDTH and HEIGHT to the proper display size!
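A size check before writing makes this kind of mismatch easier to spot. The sketch assumes the layout the demo produces: 1 bit per pixel with each row padded to a whole number of bytes. `frame_bytes` and `write_frame` are hypothetical helper names.

```python
def frame_bytes(width: int, height: int) -> int:
    """Bytes in one packed monochrome frame (rows padded to full bytes)."""
    stride = (width + 7) // 8
    return stride * height


def write_frame(path: str, payload: bytes, width: int, height: int) -> None:
    """Write a frame only if its size matches the expected geometry."""
    expected = frame_bytes(width, height)
    if len(payload) != expected:
        raise ValueError(
            f"frame is {len(payload)} bytes, expected {expected} "
            f"for a {width}x{height} display"
        )
    with open(path, "wb") as fb:
        fb.write(payload)


print(frame_bytes(128, 19))   # 16 bytes per row * 19 rows
```

In the demo, `render_frame()` could call `write_frame("/dev/fb0", packed.tobytes(), WIDTH, HEIGHT)` so that a wrong geometry fails with a descriptive message instead of an `OSError` from the kernel.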

Checkpoint — Demo Working

You should see a bouncing ball animation on the LED matrix display. If the display shows garbage or nothing, check the driver parameters and device tree configuration.

Install the driver

Caution

Install the driver only after it has been tested and confirmed to be working properly!

# build
make
# copy
sudo cp busefb.ko /lib/modules/$(uname -r)/kernel/drivers/misc/
# create module dependencies to enable driver
sudo depmod -a

# and restart
sudo reboot

Create a custom application for the display

Create a custom application for the display, like this classic ping-pong game:

https://github.com/gsebik/buse-pingpong.git

[!tip] Check the README.md

How Do hrtimer and workqueue Work?

High-Resolution Timers (hrtimer)

hrtimer is a high-precision kernel timer whose expiry times are specified in nanoseconds; the achievable precision depends on the hardware. It is built on top of the Linux kernel's timekeeping framework, and its callback runs in hardirq or softirq context, depending on the timer mode.

Property	Description
High precision	Microsecond or nanosecond level timing, depending on hardware capabilities
Clock sources	CLOCK_MONOTONIC, CLOCK_REALTIME, etc.
Modes	REL (Relative), ABS (Absolute), with or without CPU pinning
Context	Runs in interrupt context, not process context
Non-blocking	You cannot call blocking functions like `spi_sync()` inside timer callback

Linux Kernel Clock Sources Comparison

Clock Source	Behavior	Use Cases
CLOCK_MONOTONIC	Always increasing, unaffected by jumps in system time (`settimeofday`, NTP steps); NTP may still slew its rate	Accurate relative timing, scheduling
CLOCK_REALTIME	Represents the wall-clock time (e.g., `date` command), can jump if time is adjusted	Timestamps, user-visible clocks
CLOCK_BOOTTIME	Like `CLOCK_MONOTONIC` but includes system suspend time	Measuring real elapsed time
CLOCK_TAI	Like `CLOCK_REALTIME`, but on the TAI scale, so it has no leap-second discontinuities	Precision scientific measurements

1. CLOCK_MONOTONIC

Always increases, even if the system time is corrected by NTP.
Does not represent real-world time, but measures durations reliably.
Example use:
- Timers
- Relative delays
- Measuring uptime

2. CLOCK_REALTIME

Represents actual wall-clock time (calendar time).
Can jump forward or backward if the user or NTP adjusts system time.
Example use:
- Timestamps in logs
- Real-time clock display

3. CLOCK_BOOTTIME

Same as CLOCK_MONOTONIC, but includes suspend-to-RAM time.
Example use:
- Tracking true elapsed time even across suspend/resume cycles.

4. CLOCK_TAI

International Atomic Time (TAI): tracks wall-clock time but has no leap-second discontinuities.

- Less common, used in scientific or distributed systems where precise timekeeping is critical.

For more details:

https://www.kernel.org/doc/Documentation/timers/hrtimers.txt
https://elixir.bootlin.com/linux/v6.11/source/include/linux/hrtimer.h

Example Comparison

Action	CLOCK_MONOTONIC	CLOCK_REALTIME
System boot	Starts at 0	Shows wall-clock time
NTP adjusts system clock	Unchanged	Changes (jumps)
Measuring duration	Accurate	Inaccurate if clock changed
Showing current date & time	Not applicable	Applicable
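The same clocks are visible from user space, which makes the difference easy to observe. The sketch below reads whichever of the four clocks the running platform exposes and measures a duration with the monotonic clock; the helper names are hypothetical.

```python
import time


def sample_clocks() -> dict:
    """Read each clock that this platform exposes through the time module."""
    readings = {}
    for name in ("CLOCK_MONOTONIC", "CLOCK_REALTIME",
                 "CLOCK_BOOTTIME", "CLOCK_TAI"):
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue                      # not available on this platform
        try:
            readings[name] = time.clock_gettime(clock_id)
        except OSError:
            continue                      # known to Python but not to the kernel
    return readings


def measure_ns(fn) -> int:
    """Time a call with the monotonic clock, immune to wall-clock jumps."""
    start = time.monotonic_ns()
    fn()
    return time.monotonic_ns() - start


for name, value in sample_clocks().items():
    print(f"{name:>16}: {value:.3f} s")
print("sleep(0.01) took", measure_ns(lambda: time.sleep(0.01)), "ns")
```

`CLOCK_REALTIME` prints seconds since the Unix epoch, while `CLOCK_MONOTONIC` prints a much smaller number counted from an arbitrary starting point, typically boot.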

Why we need a workqueue after hrtimer

Since hrtimer callbacks run in interrupt context, you cannot:
- Sleep or yield
- Call spi_sync(), schedule(), mutex_lock(), or anything that can block

So, we use queue_work() inside the timer callback to defer execution to process context, which can safely block.
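The hand-off from a timer callback to a worker can be imitated in user space. This is an analogy under loose assumptions, not kernel code: `threading.Timer` plays the hrtimer, a `queue.Queue` plays the workqueue, and `time.sleep()` stands in for the blocking `spi_sync()` call.

```python
import queue
import threading
import time


def run_demo(groups=(0, 1, 2)):
    work_q = queue.Queue()        # unbounded, so put_nowait() never blocks
    done = []

    def worker():
        # "Process context": allowed to block for as long as it needs.
        while True:
            group = work_q.get()
            if group is None:     # sentinel: shut the worker down
                break
            time.sleep(0.005)     # stands in for a blocking SPI transfer
            done.append(group)

    def timer_callback(group):
        # "Interrupt context": only enqueue the work, never wait for it.
        work_q.put_nowait(group)

    thread = threading.Thread(target=worker)
    thread.start()
    for group in groups:
        timer = threading.Timer(0.01, timer_callback, args=(group,))
        timer.start()
        timer.join()              # wait until this timer has fired
    work_q.put(None)
    thread.join()
    return done


print(run_demo())   # prints [0, 1, 2]
```

The callback returns immediately after enqueuing, so the timer is never held up by the slow transfer; the price is that the transfer starts only when the worker gets scheduled.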

How hrtimer works internally

Initialization
You call hrtimer_init() specifying:
- The clock source (e.g., CLOCK_MONOTONIC)
- The mode (e.g., HRTIMER_MODE_REL_PINNED)
You then set the timer's function field to the callback to invoke when the timer expires.
Starting the Timer
You call hrtimer_start() with a relative or absolute time.
Expiration and Callback Execution
When the time expires, the timer interrupt fires and the kernel executes your callback in interrupt context.
You cannot block here.
Rescheduling (Optional)
To restart the timer from inside the callback, advance its expiry (e.g., with hrtimer_forward_now()) and return HRTIMER_RESTART; otherwise return HRTIMER_NORESTART.
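When a periodic timer is restarted from its callback, the new expiry is usually computed from the old expiry rather than from "now", so that a late callback does not shift all later periods. The sketch below is a simplified model of that arithmetic, similar in spirit to the kernel's `hrtimer_forward()` helper; the function `forward` is hypothetical and is not the kernel implementation.

```python
def forward(expires_ns: int, now_ns: int, interval_ns: int):
    """Advance an expiry in whole intervals until it lies after now.

    Returns (new_expires_ns, overruns), where overruns counts how many
    intervals were added.
    """
    if interval_ns <= 0:
        raise ValueError("interval must be positive")
    if now_ns < expires_ns:
        return expires_ns, 0              # timer has not expired yet
    overruns = (now_ns - expires_ns) // interval_ns + 1
    return expires_ns + overruns * interval_ns, overruns


# Callback ran 150 ns late on a 100 ns period: skip ahead two periods.
print(forward(expires_ns=100, now_ns=250, interval_ns=100))   # (300, 2)
```

An overrun count above 1 means at least one period was missed entirely, which is a useful thing to log when investigating timing problems.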

Why Use `CLOCK_MONOTONIC`?

The driver uses CLOCK_MONOTONIC for its hrtimer because:

It never goes backward and is unaffected by wall-clock adjustments (e.g., NTP steps, manual time changes).
It gives stable, non-jumping timing behavior.
It is ideal for relative delays, such as the brightness hold time.

Why can GPIO timing still vary, even with an accurate hrtimer?

Root causes

GPIO Library Layer and Abstraction
- gpiod_set_value() goes through the GPIO subsystem. How fast it completes depends on how the GPIO is implemented: memory-mapped (MMIO) pins are toggled directly, while pins on an external expander need a bus transaction and must be set with gpiod_set_value_cansleep() instead.
Process Context vs. Interrupt Context
- A GPIO that is not memory-mapped or that has sleeping lock requirements (such as one behind an I2C or SPI expander) can only be toggled safely in process context.
GPIO Subsystem Can Sleep
- If gpiod_cansleep() returns true, setting the GPIO may sleep, so it has to be done from process context (for example in a work item), which introduces non-deterministic delays.
Non-Real-Time Kernel Scheduling
- On Raspberry Pi OS or general-purpose Linux:
  - CPU scheduling is non-deterministic.
  - Kernel preemption may delay execution.
  - Even if you have accurate hrtimer triggering, the actual GPIO access may get delayed if Linux schedules something else.
GPIO Driver Implementation
- Some GPIOs are direct memory-mapped (fast).
- Some are over slower buses (e.g., I2C GPIO expanders).
CPU and Thread Scheduling Contention
- If other high-priority tasks compete for CPU, your workqueue or GPIO set may wait its turn, adding variable latency.

How to Detect If Your GPIO May Be Delayed

1. Check with `gpiod_cansleep()`

if (gpiod_cansleep(par->cs_gpio)) {     
    dev_warn(&spi->dev, "GPIO may introduce variable latency due to sleeping capability.\n"); 
}

True: GPIO may sleep, meaning timing is not guaranteed.
False: Likely direct memory-mapped and fast.

2. Observe with Oscilloscope or Logic Analyzer

Measure the actual waveform timing.
Compare expected vs. actual timing. On a non-real-time kernel, expect the CS pulse width to vary from cycle to cycle.
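Once pulse widths have been exported from the logic analyzer, a few statistics summarize the jitter. The sketch below is one way to do that; the function name is hypothetical and the sample values are made-up placeholders for illustration, not measurements from this display.

```python
import statistics


def jitter_report(pulse_widths_us, target_us):
    """Summarize how far measured CS pulse widths stray from the target."""
    if len(pulse_widths_us) < 2:
        raise ValueError("need at least two measurements")
    return {
        "mean_us": statistics.mean(pulse_widths_us),
        "min_us": min(pulse_widths_us),
        "max_us": max(pulse_widths_us),
        "peak_to_peak_us": max(pulse_widths_us) - min(pulse_widths_us),
        "stdev_us": statistics.stdev(pulse_widths_us),
        "worst_error_us": max(abs(w - target_us) for w in pulse_widths_us),
    }


# Placeholder values only: replace with widths exported from your analyzer.
example_widths_us = [50.2, 49.8, 51.0, 50.1, 57.4, 50.0]
for key, value in jitter_report(example_widths_us, target_us=50.0).items():
    print(f"{key:>16}: {value:.2f}")
```

The worst-case error matters more than the mean here, because a single long CS pulse shows up as a visible brightness flicker.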

3. Analyze CPU Scheduling with `trace-cmd` or `ftrace`

Check when your workqueue or hrtimer callback is scheduled.
See if other tasks block your driver.

Typical GPIO Delay Sources

Source	Effect on GPIO Timing
GPIO driver uses sleepable backend	Delay due to kernel scheduling
Workqueue scheduling delay	Work waits behind other kernel tasks
CPU preemption	Higher priority tasks preempt your work
Non-memory-mapped GPIO (e.g., I2C)	Extra latency due to bus transactions
Interrupt handler running elsewhere	Delay until your code gets scheduled

✅ Example: Why You Observe Less Variability on 2 CPUs

Less CPU contention
You restrict your task to only 2 CPUs, so:
- Fewer context switches
- More predictable timing
Lower system load
- Less interference from other kernel or user-space threads.

Example usage:

taskset -c 0,1 python3 your_demo.py
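The same restriction can be applied from inside a Python program with `os.sched_setaffinity()`, which is available on Linux. The helper name `pin_to_cpus` is hypothetical; it only keeps the CPUs the process is actually allowed to use.

```python
import os


def pin_to_cpus(wanted):
    """Restrict the current process to the requested CPUs (Linux only)."""
    allowed = os.sched_getaffinity(0)          # 0 means "this process"
    target = allowed & set(wanted)
    if not target:
        raise ValueError(f"none of {sorted(wanted)} is in {sorted(allowed)}")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


print("currently allowed CPUs:", sorted(os.sched_getaffinity(0)))
```

Calling `pin_to_cpus({0, 1})` at the top of the demo script has the same effect as launching it through `taskset -c 0,1`.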

✅ How to Improve or Mitigate GPIO Variability

Use Direct MMIO GPIOs
- Avoid GPIO expanders or bus-controlled GPIOs.
Reduce CPU Load or Isolate CPU
- Use CPU affinity (taskset) or cgroups to reserve CPU time.
Use Preempt-RT Kernel
- If true real-time is required, consider running a PREEMPT-RT patched kernel.
Optimize Driver Scheduling
- Ensure you use pinned hrtimers.
- Keep workqueue tasks short and efficient.
Tune Workqueue Priority
- Consider using high-priority workqueues or RT threads.

What Just Happened?

You built, loaded, and tested a complete framebuffer display driver, one of the more complex kernel driver types. The key engineering challenges were:

SPI timing — shift register data must be clocked in at the right speed
CS control — manual GPIO control with hrtimer for precise brightness timing
Framebuffer API — implementing the standard /dev/fbX interface so any Linux application can draw to the display
Continuous refresh — the display has no built-in memory, so the driver must constantly retransmit frame data

This is the same architecture used in industrial display drivers, albeit with more complex hardware interfaces (LVDS, MIPI-DSI) in production systems.
