Camera Pipeline: Capture, Process, and Display
Time estimate: ~45 minutes
Prerequisites: SSH Login, Framebuffer Basics
Warning
Camera permissions: Your user must be in the video and render groups to access the camera:
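```shell
# Add the current user to both groups (takes effect at next login)
sudo usermod -aG video,render $USER
```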
The setup_pi.sh script does this automatically.
On Raspberry Pi OS Trixie/Bookworm, the camera is enabled by default — no raspi-config step needed. On older images (Bullseye and earlier), enable it with sudo raspi-config → Interface Options → Legacy Camera → Enable, then reboot.
Learning Objectives
By the end of this tutorial you will be able to:
- Stream live camera frames using picamera2 (no desktop environment needed)
- Process frames in real time using OpenCV edge detection
- Display original and processed video side by side on the framebuffer
- Measure pipeline throughput (ms/frame and FPS)
- Explain the difference between libcamera and V4L2
Data Flow Architecture in Embedded Vision
Embedded vision systems follow a pipeline architecture: data flows through a series of processing stages, each transforming the output of the previous one. For the hardware interface details (MIPI CSI-2, D-PHY signaling, ISP, bandwidth math), see Camera and Display Interfaces. The canonical pattern is:
```
Sensor    -->  Capture   -->  Preprocessing    -->  Processing          -->  Output
(camera)       (driver)       (resize, color)      (detect, classify)       (file, network, display)
```
Each stage has a latency budget. If the total pipeline latency exceeds the frame interval (e.g., 100 ms for 10 FPS), the system drops frames. On embedded hardware, the bottleneck is usually the processing stage -- edge detection, object recognition, or compression.
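The budget arithmetic can be made concrete with a quick sketch. The per-stage timings below are illustrative placeholders, not measured values:

```python
# Frame-budget check: at a target frame rate, the sum of all stage
# latencies must fit inside one frame interval, or frames are dropped.
TARGET_FPS = 10
frame_budget_ms = 1000 / TARGET_FPS  # 100 ms per frame at 10 FPS

# Hypothetical per-stage latencies in milliseconds
stage_ms = {"capture": 5, "preprocess": 10, "process": 70, "output": 10}

total_ms = sum(stage_ms.values())
print(f"Budget: {frame_budget_ms:.0f} ms, pipeline: {total_ms} ms")
if total_ms > frame_budget_ms:
    print("Over budget -- frames will be dropped")
else:
    print(f"Headroom: {frame_budget_ms - total_ms:.0f} ms/frame")
```

Note that the processing stage dominates the total, which matches the bottleneck described above.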
On the Raspberry Pi, the camera pipeline involves two subsystems: libcamera (which manages the ISP -- Image Signal Processor -- for debayering, white balance, and exposure) and V4L2 (the traditional Linux video API for simpler USB cameras). The ISP performs the heavy per-pixel processing -- debayering, noise reduction, color correction -- that turns raw sensor data into usable images.
In production systems, file I/O is replaced with in-memory ring buffers, and processing runs in a dedicated thread or on a hardware accelerator (GPU, NPU) to meet real-time constraints.
This pipeline pattern is a specific instance of the general sensor-to-actuator data flow common to all embedded systems -- whether the output is a saved image, a network stream, or a motor command.
Introduction
Many embedded Linux devices run real-time vision pipelines — security cameras, quality inspection systems, agricultural drones, and operator displays all capture, process, and display camera frames continuously.
On the Raspberry Pi, camera access uses libcamera rather than the older V4L2 interface:
- V4L2 (Video4Linux2) — the traditional Linux camera API, works with many USB cameras
- libcamera — a newer framework that handles the complex ISP (Image Signal Processor) pipeline on modern cameras like the RPi Camera Module
For this tutorial, we use picamera2 (the Python library for libcamera) for live capture, OpenCV for processing, and pygame to display the results on screen — all streaming in real time.
Warning
On Raspberry Pi OS Bookworm and newer, the libcamera-* CLI commands have been renamed to rpicam-*. Use rpicam-still instead of libcamera-still if you need the command-line tools. The picamera2 Python library works on both Bullseye and Bookworm.
1. Install Tools
Concept: Use the minimal tools needed for capture, processing, and display without a desktop environment.
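A likely minimal install set (package names as on Raspberry Pi OS; `python3-opencv` and `python3-numpy` are assumed here because the pipeline below uses cv2 and numpy):

```shell
sudo apt update
sudo apt install -y python3-picamera2 python3-opencv python3-pygame python3-numpy
```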
Stuck?
- `python3-picamera2` is pre-installed on Raspberry Pi OS Bookworm. On Bullseye, install it with `sudo apt install -y python3-picamera2`.
- If `python3-pygame` is not available, try `pip3 install pygame`.
2. Verify the Camera
Concept: Verify the camera pipeline works before writing code.
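A single still capture is enough to exercise the sensor, ribbon cable, and driver stack (use `libcamera-still` on Bullseye):

```shell
# Capture one JPEG; -n skips the preview window (headless-friendly)
rpicam-still -n -o test.jpg
```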
This captures a single image. If test.jpg is created, the camera hardware is working.
Checkpoint
Verify test.jpg exists and is not empty:
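```shell
ls -lh test.jpg                               # should show a non-zero size
test -s test.jpg && echo OK || echo "missing or empty"
```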
View live video over the network (headless setup):
If the Pi has no display connected, stream the camera feed to your laptop:
```shell
# On the Pi — start an H.264 TCP stream
rpicam-vid -n -t 0 --width 640 --height 480 --framerate 30 \
    --codec h264 --inline --listen --flush -o tcp://0.0.0.0:8888
```
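On the laptop, any H.264-capable player can consume the stream; with VLC, the `tcp/h264://` prefix selects the raw-H.264-over-TCP demuxer:

```shell
# On your laptop — connect to the Pi's stream
vlc tcp/h264://RASPBERRY_PI_IP:8888
```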
Replace RASPBERRY_PI_IP with the Pi's actual IP address (e.g., 192.168.1.42). You should see a live 640×480 video stream. Press Ctrl+C on the Pi to stop.
| Flag | Purpose |
|---|---|
| `-n` | No preview window on the Pi (headless) |
| `-t 0` | Run indefinitely (0 = no timeout) |
| `--inline` | Include SPS/PPS headers in stream (needed for VLC to decode) |
| `--listen` | Wait for a client to connect before streaming |
| `--flush` | Flush output after each frame (reduces latency) |
Stuck?
- "Camera not detected" — check the ribbon cable connection. On Bullseye: enable in `raspi-config`. On Trixie/Bookworm: ensure your user is in the `video` and `render` groups (`groups` to check, `sudo usermod -aG video,render $USER` to fix)
- "rpicam-still not found" — install with `sudo apt install -y rpicam-apps` (on Bullseye: `libcamera-apps`)
- Black image — the camera lens cap may still be on, or the sensor needs a moment to adjust exposure
3. Live Edge Detection Pipeline
Concept: A real embedded vision pipeline streams frames continuously — capture, process, and display in a loop. No intermediate files.
Create camera_edges.py:
```python
import cv2, numpy as np, time
from picamera2 import Picamera2
import pygame

# --- Camera setup ---
picam2 = Picamera2()
config = picam2.create_preview_configuration(main={"size": (640, 480), "format": "RGB888"})
picam2.configure(config)
picam2.start()

# --- Display setup ---
pygame.init()
info = pygame.display.Info()
screen_w, screen_h = info.current_w, info.current_h
screen = pygame.display.set_mode((screen_w, screen_h), pygame.FULLSCREEN)
pygame.mouse.set_visible(False)
font = pygame.font.SysFont("monospace", 20)
half_w = screen_w // 2

running = True
while running:
    # --- Capture ---
    frame = picam2.capture_array()  # numpy array, RGB, no file I/O

    # --- Process ---
    t0 = time.time()
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 80, 160)
    edges_rgb = cv2.cvtColor(edges, cv2.COLOR_GRAY2RGB)
    dt_ms = (time.time() - t0) * 1000

    # --- Display: original (left) | edges (right) ---
    screen.fill((0, 0, 0))
    for i, img in enumerate([frame, edges_rgb]):
        h, w = img.shape[:2]
        scale = min(half_w / w, screen_h / h)
        resized = cv2.resize(img, (int(w * scale), int(h * scale)))
        surface = pygame.surfarray.make_surface(np.transpose(resized, (1, 0, 2)))
        x = i * half_w + (half_w - resized.shape[1]) // 2
        y = (screen_h - resized.shape[0]) // 2
        screen.blit(surface, (x, y))

    # --- Overlay timing ---
    label = font.render(f"Process: {dt_ms:.1f} ms", True, (0, 255, 0))
    screen.blit(label, (10, 10))
    pygame.display.flip()

    # --- Events ---
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        if event.type == pygame.KEYDOWN and event.key == pygame.K_q:
            running = False

picam2.stop()
pygame.quit()
```
Run it:
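```shell
python3 camera_edges.py
```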
Press q to quit.
Checkpoint
You should see a live split-screen: the camera feed on the left, edge detection on the right, with processing time shown in the top-left corner.
Stuck?
- "No cameras available" — make sure `rpicam-still` worked in step 2 first
- "No video device" — stop any running display service: `sudo systemctl stop getty@tty1`
- Low FPS — reduce the resolution in `create_preview_configuration` (e.g., `320x240`)
4. Measure Pipeline Throughput
Concept: Embedded vision is often limited by processing time, not accuracy. The timing overlay in the live view shows per-frame processing cost, but to get a proper throughput measurement, run the pipeline for a fixed number of frames:
```python
import cv2, time
from picamera2 import Picamera2

picam2 = Picamera2()
config = picam2.create_preview_configuration(main={"size": (640, 480), "format": "RGB888"})
picam2.configure(config)
picam2.start()

N = 50
t0 = time.time()
for _ in range(N):
    frame = picam2.capture_array()
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 80, 160)
elapsed = time.time() - t0
picam2.stop()

print(f"{N} frames in {elapsed:.2f} s")
print(f"Average: {elapsed/N*1000:.1f} ms/frame ({N/elapsed:.1f} FPS)")
```
Checkpoint
Typical results on a Raspberry Pi 4 at 640x480: 5-15 FPS depending on the processing load.
What Just Happened?
You built a real-time embedded vision pipeline — capture, process, and display with no intermediate files.
This is the same architecture used in production embedded vision systems. The key difference from saving to files: frames stay in memory as numpy arrays, eliminating the JPEG encode/decode bottleneck.
Challenges
Challenge 1: Gaussian Blur Preprocessing
Add Gaussian blur before edge detection and compare the visual result:
Does blurring increase or decrease the processing time? By how much?
Challenge 2: Adjustable Thresholds
Add keyboard controls to adjust the Canny thresholds live:
- Up/Down arrows — change the high threshold
- Left/Right arrows — change the low threshold
Display the current threshold values on screen alongside the timing.
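The keyboard logic can be factored into a small pure function, which keeps it testable outside the event loop. The step size of 10 and the string key names are illustrative choices; in the real loop they would map from `pygame.K_UP` and friends:

```python
def adjust_thresholds(low, high, key, step=10):
    """Update the Canny (low, high) thresholds for an arrow-key press.

    `key` is one of "up", "down", "left", "right". The pair is kept
    ordered (0 <= low < high) so cv2.Canny always gets valid input.
    """
    if key == "up":
        high += step
    elif key == "down":
        high -= step
    elif key == "right":
        low += step
    elif key == "left":
        low -= step
    low = max(0, low)
    high = max(low + 1, high)
    return low, high

# In the pygame event loop this would be driven by pygame.K_UP etc.
print(adjust_thresholds(80, 160, "up"))    # (80, 170)
print(adjust_thresholds(80, 160, "left"))  # (70, 160)
```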
Challenge 3: Save a Snapshot
Add a key (e.g., s) that saves the current frame pair (original + edges) to files. This is useful for documenting results without stopping the pipeline.
Deliverable
- `camera_edges.py` — working live pipeline script
- Screenshot or photo of the split-screen display (original vs edges)
- Throughput measurement (average ms/frame and FPS)
- Brief explanation: What would change if this pipeline needed to run at 30 FPS?
Course Overview | Next: Ball Position Detection → | Single-App UI →