Framebuffer Driver Tutorial
Implement a SPI-Controlled LED Matrix Driver with GPIO-Based CS Control
Time estimate: ~120+ minutes
Prerequisites: MCP9808 Kernel Driver, Graphics Stack
Don't have the BUSE LED matrix?
The OLED Framebuffer Driver tutorial teaches the same fbdev concepts (framebuffer registration, fb_info, fb_ops, deferred I/O) using an I2C OLED display — no special hardware required beyond the SSD1306 module from the OLED Display tutorial.
Learning Objectives
By the end of this tutorial you will be able to:
- Explain the Linux framebuffer subsystem architecture
- Understand SPI-controlled display driving with manual CS timing
- Build and load a framebuffer kernel driver
- Test a framebuffer driver with a user-space demo application
- Analyze timing constraints in display drivers
Framebuffer Drivers and the Linux Graphics Stack
The Linux framebuffer subsystem (fbdev) provides a simple, hardware-independent interface for graphics output. Applications write pixel data to /dev/fbN, and the kernel driver translates those writes into hardware-specific operations -- in this case, SPI transfers to shift registers that drive an LED matrix.
A framebuffer driver implements the fb_ops structure: fb_read/fb_write for data access, fb_ioctl for control, and fb_mmap for zero-copy memory mapping. The driver allocates a kernel buffer (fb_info.screen_base) and continuously refreshes the hardware display from this buffer.
For displays without built-in frame memory (like the BUSE LED matrix), the driver must actively refresh the display at a fixed rate. This creates a real-time constraint: the driver uses hrtimer for microsecond-precision timing and workqueue for deferred SPI transfers that cannot run in interrupt context.
Manual chip-select (CS) control is required here because the LED drivers use CS hold time to control brightness -- a timing-sensitive operation that the SPI controller's automatic CS cannot handle. This is a common pattern in industrial display drivers where hardware timing constraints exceed the standard bus protocol.
The fbdev interface is the embedded-friendly alternative to the full DRM/KMS graphics stack used on desktop Linux. For simple displays (OLEDs, LED matrices, small TFTs), fbdev is sufficient and much simpler to implement.
For background theory, see Graphics Stack and Device Tree and Drivers. The BUSE LED matrix hardware is documented in the Hardware Reference.
Understanding the Hardware
[!tip]+ You can find a detailed description of the BUSE LED display here
Display Architecture
- Resolution: 128×19 monochrome LEDs
- Panel Grouping: 4 Panels horizontally
- Column Grouping: 4 Groups per Panel
- Data Loading: Serial via SPI into shift registers
- Display Activation: Controlled via a Chip Select (CS) line
- Timing-Sensitive: The duration CS is LOW determines brightness.
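These figures fix the size of the framebuffer the driver has to expose. The quick calculation below assumes 1 bit per pixel with rows packed without padding, which is the layout the demo application later in this tutorial writes. The variable names are illustrative and are not taken from the driver.

```python
# Display geometry from the list above (names are illustrative, not the driver's)
WIDTH, HEIGHT = 128, 19          # pixels
PANELS = 4                       # panels side by side
GROUPS_PER_PANEL = 4             # column groups per panel
BITS_PER_PIXEL = 1               # monochrome

cols_per_panel = WIDTH // PANELS                      # 32 columns
cols_per_group = cols_per_panel // GROUPS_PER_PANEL   # 8 columns
groups_per_frame = PANELS * GROUPS_PER_PANEL          # 16 groups
bytes_per_row = WIDTH * BITS_PER_PIXEL // 8           # 16 bytes
fb_size = bytes_per_row * HEIGHT                      # 304 bytes per frame

print(f"{cols_per_panel=} {cols_per_group=} {groups_per_frame=}")
print(f"{bytes_per_row=} {fb_size=}")
```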
Problem:
- Hardware CS cannot be used because the LED drivers require manual reassertion of CS after a fixed time to control brightness.
- Therefore, we must control CS via a GPIO pin, not by relying on SPI hardware CS.
Defining System Requirements
Functional Requirements
- Provide a Linux framebuffer interface (/dev/fbX).
- Convert framebuffer content to hardware-specific SPI frame format.
- Control CS line timing manually for consistent brightness.
Non-Functional Requirements
- Timing Accuracy: CS hold time must be precise (microseconds range).
- Low CPU Overhead: Avoid busy-wait loops.
- Responsiveness: Avoid blocking critical system tasks.
Evaluating Linux kernel features for the driver
| Mechanism | Purpose | Suitability for Our Case |
|---|---|---|
| hrtimer | High-resolution timing | ✅ Accurate microsecond-level timing |
| workqueue | Deferred execution in process context | ✅ Allows SPI and GPIO operations safely |
| spinlock | Atomic protection of framebuffer | ✅ Prevents race conditions during frame capture |
| fbdev (framebuffer API) | Provides /dev/fbX interface | ✅ Standard Linux display interface |
Choosing the Design Strategy
- Framebuffer Registration
  - Provide /dev/fbX access for user applications.
- Frame Conversion Engine
  - Convert Linux framebuffer data into panel-specific SPI data format.
- Controlled Group-by-Group Transmission
  - Divide data into groups.
  - Send one group at a time via SPI.
- Manual CS Control with Accurate Timing
  - Pull CS LOW after SPI transfer.
  - Use hrtimer to hold CS LOW for a fixed time (e.g., 50 µs).
  - Reassert CS and queue next group or new frame.
- Continuous Refresh Loop
  - Repeat the process to maintain display state.
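The conversion and grouping steps can be sketched in user space. The layout here is hypothetical. It assumes the 1-bpp, LSB-first row packing that the demo application uses, and that each group is simply 8 adjacent columns, so group g is byte g of every row. The real BUSE shift-register order is defined by the hardware and the driver source, so treat this as an illustration of the idea, not the driver's format.

```python
WIDTH, HEIGHT = 128, 19
BYTES_PER_ROW = WIDTH // 8      # 16 bytes per row at 1 bpp
GROUPS = 16                     # 4 panels x 4 groups of 8 columns

def split_into_groups(fb: bytes) -> list[bytes]:
    """Split a packed 1-bpp frame into per-group payloads (hypothetical layout).

    With 8 columns per group and 8 pixels per byte, group g covers exactly
    byte g of every row, so its payload is that byte from all 19 rows.
    """
    if len(fb) != BYTES_PER_ROW * HEIGHT:
        raise ValueError(f"expected {BYTES_PER_ROW * HEIGHT} bytes, got {len(fb)}")
    return [bytes(fb[row * BYTES_PER_ROW + g] for row in range(HEIGHT))
            for g in range(GROUPS)]

frame = bytearray(BYTES_PER_ROW * HEIGHT)
frame[0] = 0x01                 # pixel (x=0, y=0) with LSB-first packing
groups = split_into_groups(bytes(frame))
print(len(groups), len(groups[0]), groups[0][0])   # 16 19 1
```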
Key Trade-offs and Limitations
GPIO-Controlled CS Pros and Cons
| ✅ Advantages | ❌ Limitations |
|---|---|
| Full control over CS timing | Higher CPU involvement |
| Supports brightness adjustment | GPIO operations may have unpredictable latency on non-RT kernels |
| Hardware-independent CS control | Not synchronized with SPI controller automatically |
SPI Bus Speed Considerations
- Maximum Speed: Limited by hardware and signal integrity (this driver is configured for 3 MHz).
- Trade-off: Higher speed reduces refresh latency but increases signal integrity risks.
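Here is a back-of-the-envelope refresh budget that puts numbers on this trade-off. It assumes a hypothetical 19-byte payload per group (one byte per row, as in a 1-bpp layout with 8 columns per group) and the 50 µs example hold time. It ignores GPIO, workqueue, and scheduling latency, so the result is a best case.

```python
SPI_HZ = 3_000_000     # SPI clock used by this driver
GROUPS = 16            # 4 panels x 4 groups
GROUP_BYTES = 19       # hypothetical payload: one byte per row per group
CS_HOLD_US = 50        # example CS-low (brightness) time

transfer_us = GROUP_BYTES * 8 / SPI_HZ * 1e6     # time to shift one group out
frame_us = GROUPS * (transfer_us + CS_HOLD_US)   # best case: no GPIO/scheduling delay
print(f"per group: {transfer_us:.1f} us, per frame: >= {frame_us:.0f} us, "
      f"refresh: <= {1e6 / frame_us:.0f} Hz")
```

Under these assumptions, one group takes roughly 51 µs to clock out, so a frame needs at least about 1.6 ms (a refresh rate of at most roughly 620 Hz). Doubling the SPI clock shortens the frame by only about a quarter, because the CS hold time does not shrink.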
CPU and Task Scheduling Constraints
- Workqueue allows safe SPI access, but may be delayed if CPU is heavily loaded.
- CPU Pinning (optional) can help reduce variability by isolating tasks to specific cores.
Step-by-Step Development Plan
Step 1: Register Framebuffer
- Provide /dev/fbX for user-space access.
- Allocate framebuffer memory with kzalloc.
Step 2: Framebuffer Processing Logic
- Copy framebuffer memory under spinlock to prevent data races.
- Convert bitmap to hardware-specific SPI frame format.
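The copy-then-convert pattern from Step 2 can be illustrated in user space. In this sketch, a Python threading.Lock stands in for the kernel spinlock, and FrameStore is a hypothetical helper. Only the copy happens under the lock; the slower conversion then runs on the private copy.

```python
import threading

class FrameStore:
    """User-space stand-in for the driver's shared framebuffer (hypothetical).

    A threading.Lock plays the role of the kernel spinlock: writers modify the
    buffer under the lock, and the refresh path copies it under the same lock.
    """

    def __init__(self, size: int) -> None:
        self._buf = bytearray(size)
        self._lock = threading.Lock()

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self._buf):
            raise ValueError("write outside the framebuffer")
        with self._lock:
            self._buf[offset:offset + len(data)] = data

    def snapshot(self) -> bytes:
        with self._lock:
            return bytes(self._buf)     # short critical section: copy only

store = FrameStore(304)                 # 128 x 19 pixels at 1 bpp
store.write(0, b"\xff")                 # e.g., what a write() to /dev/fbX does
frame = store.snapshot()                # copy under the lock ...
print(len(frame), frame[0])             # ... then convert `frame` without the lock
```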
Step 3: SPI Frame Transmission
- Implement process_next_group() to send one group at a time via SPI.
Step 4: Manual CS Control with hrtimer
- After SPI transfer, pull CS LOW.
- Start hrtimer to hold CS LOW for DISPLAY_BRIGHTNESS_USEC.
- After timer expires, reassert CS and queue next group or frame.
Step 5: Continuous Loop
- Schedule next group or frame using workqueue and hrtimer callback.
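Steps 3 to 5 form a small state machine. The sketch below models it without any real SPI, GPIO, or timer, only to show the order of events. work() stands in for the workqueue handler and timer_expired() for the hrtimer callback. These names, like RefreshLoop, are hypothetical and are not the driver's functions.

```python
from dataclasses import dataclass, field

GROUPS = 16   # 4 panels x 4 column groups

@dataclass
class RefreshLoop:
    """Order-of-events model of the refresh cycle (no real SPI, GPIO or timer)."""
    group: int = 0
    frames: int = 0
    log: list = field(default_factory=list)

    def work(self) -> None:
        """Workqueue handler (process context): may block on SPI."""
        if self.group == 0:
            self.log.append("snapshot")       # copy framebuffer under the lock
        self.log.append(f"spi {self.group}")  # send this group's data
        self.log.append("cs low")             # then start the hrtimer (hold time)

    def timer_expired(self) -> None:
        """hrtimer callback (interrupt context): must not block."""
        self.log.append("cs high")            # end of the brightness window
        self.group = (self.group + 1) % GROUPS
        if self.group == 0:
            self.frames += 1                  # whole frame shown
        # the real callback would queue the next work item (queue_work()) here

loop = RefreshLoop()
for _ in range(2 * GROUPS):                   # two complete frames
    loop.work()
    loop.timer_expired()
print(loop.frames, loop.log[:4])
```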
Verification Checklist
- Device Tree Overlay correctly sets up SPI and GPIO.
- /dev/fbX becomes available.
- SPI traffic can be observed (e.g., with a logic analyzer).
- Brightness control works as expected.
- Frame content reflects /dev/fbX writes.
Optional Improvements to Consider
- Dynamic Brightness Control: Expose brightness as a sysfs attribute.
- Power Management: Pause updates when framebuffer is idle.
- Use skeletonfb.c: Align with kernel standards for maintainability.
- Multi-Threaded Workqueue: Improve responsiveness under CPU load.
Example Use-Case Recap
Why GPIO + hrtimer + Workqueue?
| Requirement | Solution |
|---|---|
| Manual CS timing | GPIO + hrtimer |
| Safe SPI operations | Workqueue in process context |
| Accurate timing | hrtimer with REL_PINNED mode |
| Standard Linux interface | fbdev /dev/fbX |
Next Steps
- Implement the driver as described.
- Test with static and animated content.
- Observe timing behavior under CPU load.
- Evaluate whether to refactor using skeletonfb.c.
Framebuffer Architecture
The framebuffer subsystem (fbdev) provides a hardware-independent API for graphics output, primarily used in embedded systems or scenarios where full GPU acceleration isn't required. It abstracts the complexities of video hardware, enabling applications to interact with a simple memory buffer representing the display.
Each framebuffer device is represented by a struct fb_info, which includes:
- fb_ops: Function pointers defining operations like fb_read, fb_write, fb_ioctl, and fb_mmap.
- fix and var screen info: Structures (fb_fix_screeninfo and fb_var_screeninfo) that describe fixed and variable display parameters, such as resolution and color depth.
- Framebuffer memory pointer: A reference to the memory region used for the framebuffer (Kernel.org).
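struct fb_info and struct fb_ops are defined in the kernel header include/linux/fb.h; the screeninfo structures come from the user-space API header include/uapi/linux/fb.h.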
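The variable screen info is also visible from user space. The FBIOGET_VSCREENINFO ioctl (0x4600 in include/uapi/linux/fb.h) copies struct fb_var_screeninfo (160 bytes) into a caller-supplied buffer. The minimal Python sketch below decodes only the first eight fields. It assumes the device node is /dev/fb0 and that you have permission to open it.

```python
import fcntl
import struct

FBIOGET_VSCREENINFO = 0x4600   # from include/uapi/linux/fb.h
VAR_SCREENINFO_SIZE = 160      # sizeof(struct fb_var_screeninfo)
FIELDS = ("xres", "yres", "xres_virtual", "yres_virtual",
          "xoffset", "yoffset", "bits_per_pixel", "grayscale")

def parse_var_screeninfo(buf: bytes) -> dict:
    """Decode the first eight __u32 fields of struct fb_var_screeninfo."""
    return dict(zip(FIELDS, struct.unpack_from("=8I", buf)))

def read_var_screeninfo(path: str = "/dev/fb0") -> dict:
    """Query the framebuffer driver for its current variable screen info."""
    buf = bytearray(VAR_SCREENINFO_SIZE)
    with open(path, "rb") as fb:
        fcntl.ioctl(fb.fileno(), FBIOGET_VSCREENINFO, buf)   # kernel fills buf
    return parse_var_screeninfo(buf)
```

If the driver is configured as described above, calling read_var_screeninfo() on the target should report xres 128, yres 19, and bits_per_pixel 1.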
Hands-on
Download the prepared BUSE driver from GitHub
Note
If the repository is unavailable, the driver source is also provided in src/embedded-linux/drivers/ within the course repository.
Compile DT overlay
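```bash
# compile overlay
dtc -@ -I dts -O dtb -o busefb.dtbo busefb-overlay.dts
# copy to overlays
# (Raspberry Pi OS before Bookworm: /boot/overlays/ and /boot/config.txt)
sudo cp busefb.dtbo /boot/firmware/overlays/
# add the overlay to /boot/firmware/config.txt
echo "dtoverlay=busefb" | sudo tee -a /boot/firmware/config.txt
# reboot
sudo reboot
```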
Checkpoint — Overlay Installed
After reboot, verify the overlay is loaded by checking the live device tree:
You should see the node's properties (compatible, reg, spi-max-frequency, etc.). If the directory does not exist, the overlay was not applied. Check that dtoverlay=busefb is in /boot/firmware/config.txt and that busefb.dtbo is in the overlays directory (/boot/firmware/overlays/, or /boot/overlays/ on older releases).
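Also check dmesg for SPI driver messages, for example with `dmesg | grep -i spi`.
Build kernel module
In the driver directory, run `make`, then load the module with `sudo insmod busefb.ko`.
Check dev
Run `ls /dev/fb*`. There should be something like /dev/fb0.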
Check the kernel log
```text
[ 8.230723] NET: Registered PF_ALG protocol family
[ 8.380470] bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay)
[ 8.381229] bcmgenet fd580000.ethernet eth0: Link is Down
[ 8.401801] brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
[ 11.453561] bcmgenet fd580000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 57.352977] systemd[762]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[ 74.067208] busefb: loading out-of-tree module taints kernel.
[ 74.067931] SPI driver busefb has no spi_device_id for buse,buse128x19
[ 74.068751] Console: switching to mono frame buffer device 4x2
[ 74.068865] busefb spi0.0: busefb registered as /dev/fb
```
Checkpoint — Module Loaded
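ls /dev/fb* should show /dev/fb0, and the dmesg output should confirm busefb registered as /dev/fb. The line "SPI driver busefb has no spi_device_id for buse,buse128x19" is a warning, not an error. The driver is matched through the device tree compatible string, and adding an spi_device_id table entry for buse128x19 would silence the warning.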
Remove the driver for testing
To unload the module again (for example, before rebuilding it), run `sudo rmmod busefb`.
[!caution]+ If the driver is not working properly, check the parameters in the driver and the device tree!
Testing the driver with a demo application
To test the driver, you can use the following demo application. If the driver works properly, it shows a bouncing ball on the display.
```python
import random
import time

import numpy as np

WIDTH = 128
HEIGHT = 19

def clear_frame():
    return np.zeros((HEIGHT, WIDTH), dtype=np.uint8)

def draw_pixel(frame, x, y):
    # Ignore pixels outside the display
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        frame[y, x] = 1

def draw_ball(frame, cx, cy, radius=1):
    # Fill every pixel within `radius` of the centre
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dx * dx + dy * dy <= radius * radius:
                draw_pixel(frame, cx + dx, cy + dy)

def render_frame(frame):
    # Pack 8 horizontal pixels per byte (LSB = leftmost pixel): 16 bytes per row
    packed = np.packbits(frame, axis=1, bitorder='little')
    with open("/dev/fb0", "wb") as fb:
        fb.write(packed.tobytes())

# Ball state: start fully inside the display
radius = 3
x = random.randint(radius, WIDTH - 1 - radius)
y = random.randint(radius, HEIGHT - 1 - radius)
vx = random.choice([-1, 1])
vy = random.choice([-1, 1])

try:
    while True:
        frame = clear_frame()
        draw_ball(frame, x, y, radius)
        render_frame(frame)
        # Update ball position
        x += vx
        y += vy
        # Bounce off edges
        if x - radius <= 0 or x + radius >= WIDTH - 1:
            vx = -vx
        if y - radius <= 0 or y + radius >= HEIGHT - 1:
            vy = -vy
        time.sleep(0.05)
except KeyboardInterrupt:
    pass
```
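Save the script and run it with python3. Writing to /dev/fb0 usually requires root privileges or membership in the video group.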
[!tip] If you get an error like OSError: [Errno 27] File too large, the script is writing more data than the framebuffer holds. Set WIDTH and HEIGHT to the display size configured in the driver!
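To catch a size mismatch before the first write, the demo could compare its frame size with what the kernel reports in sysfs. This sketch assumes the framebuffer is fb0 and that rows are packed without padding; if they are padded, the stride attribute in the same directory gives the exact line length. check_frame_size is a hypothetical helper and is not part of the demo.

```python
from pathlib import Path

def expected_fb_bytes(virtual_size: str, bits_per_pixel: str) -> int:
    """Bytes in one frame, from the sysfs 'virtual_size' ("W,H") and
    'bits_per_pixel' strings; assumes rows are packed without padding."""
    width, height = (int(v) for v in virtual_size.strip().split(","))
    bpp = int(bits_per_pixel)
    return (width * bpp + 7) // 8 * height

def check_frame_size(frame_bytes: int, fb: str = "fb0") -> None:
    """Raise a clear error if the frame would not fit /dev/<fb> exactly."""
    base = Path("/sys/class/graphics") / fb
    expected = expected_fb_bytes((base / "virtual_size").read_text(),
                                 (base / "bits_per_pixel").read_text())
    if frame_bytes != expected:
        raise ValueError(f"frame is {frame_bytes} bytes, /dev/{fb} expects {expected}")
```

For example, calling check_frame_size(HEIGHT * ((WIDTH + 7) // 8)) once before the main loop would report the mismatch directly instead of failing with Errno 27.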
Checkpoint — Demo Working
You should see a bouncing ball animation on the LED matrix display. If the display shows garbage or nothing, check the driver parameters and device tree configuration.
Install the driver
Caution
Install the driver only after it has been tested and confirmed to be working properly!
```bash
# build
make
# copy
sudo cp busefb.ko /lib/modules/$(uname -r)/kernel/drivers/misc/
# create module dependencies to enable driver
sudo depmod -a
# and restart
sudo reboot
```
Create a custom application for the display
Create a custom application for the display like this classic ping-pong game
[!tip] Check the README.md
How do hrtimer and workqueue work?
High-Resolution Timers (hrtimer)
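hrtimer is a high-resolution kernel timer. Its API works in nanoseconds, and the achievable accuracy (typically microseconds) depends on the hardware timer. It is built on top of the Linux kernel's clock event framework and runs its callback in hard-interrupt or softirq context, depending on its mode (e.g., HRTIMER_MODE_REL vs. HRTIMER_MODE_REL_SOFT).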
| Property | Description |
|---|---|
| High precision | Microsecond or nanosecond level timing, depending on hardware capabilities |
| Clock sources | CLOCK_MONOTONIC, CLOCK_REALTIME, etc. |
| Modes | REL (Relative), ABS (Absolute), with or without CPU pinning |
| Context | Runs in interrupt context, not process context |
| Non-blocking | You cannot call blocking functions like spi_sync() inside timer callback |
Linux Kernel Clock Sources Comparison
| Clock Source | Behavior | Use Cases |
|---|---|---|
| CLOCK_MONOTONIC | Always increasing; never stepped by system time changes (NTP steps, settimeofday, etc.), although NTP may slew its rate | Accurate relative timing, scheduling |
| CLOCK_REALTIME | Represents the wall-clock time (e.g., date command); can jump if time is adjusted | Timestamps, user-visible clocks |
| CLOCK_BOOTTIME | Like CLOCK_MONOTONIC but includes system suspend time | Measuring real elapsed time |
| CLOCK_TAI | Like CLOCK_REALTIME but in International Atomic Time, so it has no leap-second discontinuities; it can still jump when the system time is set | Precision scientific measurements |
1. CLOCK_MONOTONIC
- Always increases and is never stepped backward, even if the system time is corrected by NTP.
- Does not represent real-world time, but measures durations reliably.
- Example use:
  - Timers
  - Relative delays
  - Measuring uptime
2. CLOCK_REALTIME
- Represents actual wall-clock time (calendar time).
- Can jump forward or backward if the user or NTP adjusts system time.
- Example use:
  - Timestamps in logs
  - Real-time clock display
3. CLOCK_BOOTTIME
- Same as CLOCK_MONOTONIC, but includes suspend-to-RAM time.
- Example use:
  - Tracking true elapsed time even across suspend/resume cycles.
4. CLOCK_TAI
- International Atomic Time (TAI): like CLOCK_REALTIME but without leap-second jumps. It still follows changes to the system time, so it is not monotonic.
- Less common, used in scientific or distributed systems where precise timekeeping is critical.
More details:
- https://www.kernel.org/doc/Documentation/timers/hrtimers.txt
- https://elixir.bootlin.com/linux/v6.11/source/include/linux/hrtimer.h
Example Comparison
| Action | CLOCK_MONOTONIC | CLOCK_REALTIME |
|---|---|---|
| System boot | Starts near 0 (on Linux) | Shows wall-clock time |
| NTP or user steps the system clock | Unchanged (no jump) | Changes (jumps) |
| Measuring duration | Accurate | Inaccurate if clock changed |
| Showing current date & time | Not applicable | Applicable |
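The same clock IDs are available to user space through clock_gettime(2). Python exposes them as time.clock_gettime() with time.CLOCK_* constants (CLOCK_BOOTTIME is Linux-only). The short sketch below applies the rule of thumb from the table: it measures a duration with CLOCK_MONOTONIC and reads the wall clock only for display.

```python
import time

# Measure a duration with CLOCK_MONOTONIC: unaffected by clock steps.
start = time.clock_gettime(time.CLOCK_MONOTONIC)
time.sleep(0.05)
elapsed = time.clock_gettime(time.CLOCK_MONOTONIC) - start

# Wall-clock time (seconds since the Unix epoch): for display/timestamps only.
wall = time.clock_gettime(time.CLOCK_REALTIME)
# Time since boot including suspend (Linux only).
boot = time.clock_gettime(time.CLOCK_BOOTTIME)

print(f"slept {elapsed * 1000:.1f} ms; epoch {wall:.0f} s; boottime {boot:.0f} s")
```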
Why we need a workqueue after hrtimer
Since hrtimer callbacks run in interrupt context, you cannot:
- Sleep or yield
- Call spi_sync(), schedule(), mutex_lock(), or anything that can block
So, we use queue_work() inside the timer callback to defer execution to process context, which can safely block.
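This hand-off pattern can be mimicked in user space, but only as an analogy: threading.Timer callbacks run in an ordinary thread and could block, whereas an hrtimer callback must not. In the sketch, the callback only enqueues an item, which corresponds to queue_work(). A worker thread then performs the slow, blocking part, which corresponds to the work handler calling spi_sync().

```python
import queue
import threading
import time

jobs = queue.Queue()   # hand-off queue between "callback" and "worker"
done = []

def timer_callback(group):
    """Plays the hrtimer callback: never block here, only hand the job off."""
    jobs.put_nowait(group)              # the kernel equivalent is queue_work()

def worker():
    """Plays the work handler in process context: blocking is fine here."""
    while True:
        group = jobs.get()              # blocks until work arrives
        if group is None:               # sentinel: shut down
            break
        time.sleep(0.001)               # placeholder for a blocking spi_sync()
        done.append(group)

consumer = threading.Thread(target=worker)
consumer.start()
timers = [threading.Timer(0.001 * g, timer_callback, args=(g,)) for g in range(4)]
for t in timers:
    t.start()
for t in timers:
    t.join()                            # all "interrupts" have fired
jobs.put(None)
consumer.join()
print(sorted(done))                     # [0, 1, 2, 3]
```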
How hrtimer works internally
- Initialization
  - You call hrtimer_init() specifying:
    - The clock source (e.g., CLOCK_MONOTONIC)
    - The mode (e.g., HRTIMER_MODE_REL_PINNED)
  - You then assign the callback function to invoke when the timer expires to the timer's function field (newer kernels provide hrtimer_setup(), which takes the callback together with the clock and mode).
- Starting the Timer
  - You call hrtimer_start() with a relative or absolute time.
- Expiration and Callback Execution
  - When the time expires, the clock event device raises an interrupt, and the hrtimer interrupt handler executes your callback (in softirq context for the *_SOFT modes).
  - You cannot block here.
- Rescheduling (Optional)
  - You can restart the timer inside the callback (advance the expiry, e.g., with hrtimer_forward_now(), and return HRTIMER_RESTART), or return HRTIMER_NORESTART.
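A callback that restarts a periodic timer typically advances the expiry with hrtimer_forward_now() before returning HRTIMER_RESTART. The return value is the number of intervals that elapsed. A value greater than 1 means the callback ran late and whole periods were skipped, which is a handy way to detect late callbacks. Below is a pure-Python model of that arithmetic, written from the logic of the kernel's hrtimer_forward(). It omits one detail of the kernel version: the clamping of the interval to the timer resolution.

```python
def hrtimer_forward_model(expires: int, now: int, interval: int) -> tuple[int, int]:
    """Return (new_expires, overruns) like the kernel's hrtimer_forward():
    push the expiry forward by whole intervals until it lies after `now`
    and count the intervals that elapsed. All values in nanoseconds."""
    delta = now - expires
    if delta < 0:
        return expires, 0                  # not expired yet: nothing to do
    overruns = 1
    if delta >= interval:
        overruns = delta // interval
        expires += interval * overruns
        if expires > now:
            return expires, overruns
        overruns += 1                      # correction for exact multiples
    return expires + interval, overruns

# Expiry at 100 ns, period 100 ns, callback runs late at 350 ns:
print(hrtimer_forward_model(100, 350, 100))   # (400, 3): 100, 200, 300 elapsed
```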
Why Use CLOCK_MONOTONIC?
In our driver, we use CLOCK_MONOTONIC for the hrtimer because:
- It never goes backward and is not stepped by time adjustments (e.g., NTP corrections, system time changes).
- It gives stable, non-jumping timing behavior.
- It is ideal for relative delays, like the brightness timing.
But why can GPIO set timing vary even with an accurate hrtimer?
Root causes
- GPIO Library Layer and Abstraction
  - gpiod_set_value() goes through the GPIO subsystem (descriptor handling, active-low conversion, locking) before it reaches the GPIO controller. How long this takes depends on how the GPIO is implemented (MMIO, i.e. memory-mapped, vs. an external expander).
- Process Context vs. Interrupt Context
  - GPIOs that are not memory-mapped, such as those on I2C or SPI expanders, can only be toggled in process context, because accessing them requires a bus transfer that may sleep.
- GPIO Subsystem Can Sleep
  - If gpiod_cansleep() returns true, the GPIO must be set with gpiod_set_value_cansleep() from process context. That call may sleep, introducing non-deterministic delays.
- Non-Real-Time Kernel Scheduling
  - On Raspberry Pi OS or general-purpose Linux:
    - CPU scheduling is non-deterministic.
    - Kernel preemption may delay execution.
    - Even if you have accurate hrtimer triggering, the actual GPIO access may get delayed if Linux schedules something else.
- GPIO Driver Implementation
  - Some GPIOs are directly memory-mapped (fast).
  - Some are on slower buses (e.g., I2C GPIO expanders).
- CPU and Thread Scheduling Contention
  - If other high-priority tasks compete for the CPU, your workqueue or GPIO set may have to wait its turn, adding variable latency.
How to detect whether your GPIO may be delayed
1. Check with gpiod_cansleep()
```c
/* Warn at probe time if the CS GPIO may sleep (e.g., it sits behind an I2C expander) */
if (gpiod_cansleep(par->cs_gpio)) {
	dev_warn(&spi->dev, "GPIO may introduce variable latency due to sleeping capability.\n");
}
```
- True: GPIO may sleep, meaning timing is not guaranteed.
- False: Likely direct memory-mapped and fast.
2. Observe with Oscilloscope or Logic Analyzer
- Measure the actual waveform timing.
- Compare expected vs. actual timing.
Yes, the timing can vary.


3. Analyze CPU Scheduling with trace-cmd or ftrace
- Check when your workqueue or hrtimer callback is scheduled.
- See if other tasks block your driver.
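After you capture a series of CS-low pulse widths with a scope or logic analyzer (for example, exported as a list of numbers), a few summary statistics make the variation easy to compare between configurations, such as before and after CPU pinning. The sample values in the last line of the sketch are made up for illustration and are not measurements.

```python
from statistics import mean, pstdev

def cs_timing_report(widths_us: list[float], expected_us: float) -> dict:
    """Summarise measured CS-low pulse widths against the configured hold time."""
    errors = [w - expected_us for w in widths_us]
    return {
        "min_us": min(widths_us),
        "max_us": max(widths_us),
        "mean_us": mean(widths_us),
        "stdev_us": pstdev(widths_us),
        "jitter_pp_us": max(widths_us) - min(widths_us),   # peak-to-peak spread
        "worst_error_us": max(errors, key=abs),             # largest deviation, signed
    }

# Made-up example values, for illustration only.
print(cs_timing_report([50.2, 51.0, 49.8, 63.5, 50.1], expected_us=50.0))
```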
Typical GPIO Delay Sources
| Source | Effect on GPIO Timing |
|---|---|
| GPIO driver uses sleepable backend | Delay due to kernel scheduling |
| Workqueue scheduling delay | Work waits behind other kernel tasks |
| CPU preemption | Higher priority tasks preempt your work |
| Non-memory-mapped GPIO (e.g., I2C) | Extra latency due to bus transactions |
| Interrupt handler running elsewhere | Delay until your code gets scheduled |
✅ Example: Why You Observe Less Variability on 2 CPUs
- Less CPU contention
  - You restrict your task to only 2 CPUs, so:
    - Fewer context switches
    - More predictable timing
- Lower system load
  - Less interference from other kernel or user-space threads.
Example usage:
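The usual tool is taskset (e.g., taskset -c 2,3 <command>). Inside a Python program such as the demo application, os.sched_setaffinity() does the same for the calling process. Note that this pins user-space code only; it does not move the driver's kernel work. Below is a minimal, Linux-only sketch; pin_to_cpus is a hypothetical helper.

```python
import os

def pin_to_cpus(count: int = 2) -> set[int]:
    """Restrict the calling process to the first `count` CPUs it may use,
    similar to starting it with `taskset -c`. Linux only."""
    allowed = sorted(os.sched_getaffinity(0))
    os.sched_setaffinity(0, allowed[:count])
    return os.sched_getaffinity(0)

print("now running on CPUs", sorted(pin_to_cpus(2)))
```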
✅ How to Improve or Mitigate GPIO Variability
- Use Direct MMIO GPIOs
  - Avoid GPIO expanders or bus-controlled GPIOs.
- Reduce CPU Load or Isolate CPU
  - Use CPU affinity (taskset) or cgroups to reserve CPU time.
- Use Preempt-RT Kernel
  - If true real-time behavior is required, consider running a PREEMPT-RT patched kernel.
- Optimize Driver Scheduling
  - Ensure you use pinned hrtimers.
  - Keep workqueue tasks short and efficient.
- Tune Workqueue Priority
  - Consider using high-priority workqueues (WQ_HIGHPRI) or RT threads.
What Just Happened?
You implemented a complete framebuffer display driver, one of the more complex kernel driver types. The key engineering challenges were:
- SPI timing: shift register data must be clocked in at the right speed.
- CS control: manual GPIO control with hrtimer for precise brightness timing.
- Framebuffer API: implementing the standard /dev/fbX interface so any Linux application can draw to the display.
- Continuous refresh: the display has no built-in memory, so the driver must constantly retransmit frame data.
This is the same architecture used in industrial display drivers, albeit with more complex hardware interfaces (LVDS, MIPI-DSI) in production systems.
