
Camera Pipeline: Capture, Process, and Display

Time estimate: ~45 minutes · Prerequisites: SSH Login, Framebuffer Basics

Warning

Camera permissions: Your user must be in the video and render groups to access the camera:

sudo usermod -aG video,render $USER
Log out and back in for the change to take effect. The course setup_pi.sh script does this automatically.

On Raspberry Pi OS Trixie/Bookworm, the camera is enabled by default — no raspi-config step needed. On older images (Bullseye and earlier), enable it with sudo raspi-config → Interface Options → Legacy Camera → Enable, then reboot.

Learning Objectives

By the end of this tutorial you will be able to:

  • Stream live camera frames using picamera2 (no desktop environment needed)
  • Process frames in real time using OpenCV edge detection
  • Display original and processed video side by side on the framebuffer
  • Measure pipeline throughput (ms/frame and FPS)
  • Explain the difference between libcamera and V4L2
Data Flow Architecture in Embedded Vision

Embedded vision systems follow a pipeline architecture: data flows through a series of processing stages, each transforming the output of the previous one. For the hardware interface details (MIPI CSI-2, D-PHY signaling, ISP, bandwidth math), see Camera and Display Interfaces. The canonical pattern is:

Sensor    -->  Capture   -->  Preprocessing    -->  Processing          -->  Output
(camera)       (driver)       (resize, color)       (detect, classify)       (file, network, display)

Each stage has a latency budget. If the total pipeline latency exceeds the frame interval (e.g., 100 ms for 10 FPS), the system drops frames. On embedded hardware, the bottleneck is usually the processing stage -- edge detection, object recognition, or compression.
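To make the budget concrete, here is a quick back-of-the-envelope check. The per-stage latencies below are illustrative numbers, not measurements:

```python
# Frame-interval budget: at a target frame rate, each frame must finish
# the whole pipeline within 1000/fps milliseconds, or frames drop.
def frame_budget_ms(fps):
    return 1000.0 / fps

# Illustrative per-stage latencies in ms (not real measurements)
stages = {"capture": 5.0, "preprocess": 8.0, "process": 95.0, "display": 10.0}
total = sum(stages.values())      # 118 ms -- processing dominates
budget = frame_budget_ms(10)      # 100 ms per frame at 10 FPS
print(f"total={total:.0f} ms, budget={budget:.0f} ms, drops frames: {total > budget}")
```

With these numbers the pipeline overshoots the 100 ms budget, so it drops frames; shrinking the processing stage (lower resolution, simpler algorithm) is the usual fix.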

On the Raspberry Pi, the camera pipeline involves two subsystems: libcamera (which manages the ISP -- Image Signal Processor -- for debayering, white balance, and exposure) and V4L2 (the traditional Linux video API for simpler USB cameras). The ISP performs the heavy per-pixel processing that turns raw Bayer data into usable images; the analog-to-digital conversion itself happens on the sensor.

In production systems, file I/O is replaced with in-memory ring buffers, and processing runs in a dedicated thread or on a hardware accelerator (GPU, NPU) to meet real-time constraints.
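A minimal sketch of that producer/consumer pattern, using Python's standard queue as a bounded buffer that drops the oldest frame when full so capture never stalls (the frame strings stand in for numpy arrays; all names are illustrative):

```python
import queue
import threading

# Bounded buffer: when full, drop the oldest frame instead of blocking capture
buf = queue.Queue(maxsize=4)
results = []

def capture_loop(n_frames):
    for i in range(n_frames):
        frame = f"frame-{i}"              # stand-in for a captured numpy array
        try:
            buf.put_nowait(frame)
        except queue.Full:
            try:
                buf.get_nowait()          # drop the oldest frame
            except queue.Empty:
                pass                      # consumer drained it first
            buf.put_nowait(frame)
    buf.put(None)                         # sentinel: end of stream

def process_loop():
    while True:
        frame = buf.get()
        if frame is None:
            break
        results.append(frame.upper())     # stand-in for edge detection

worker = threading.Thread(target=process_loop)
worker.start()
capture_loop(20)
worker.join()
print(f"processed {len(results)} of 20 frames (the rest were dropped)")
```

Dropping old frames keeps latency bounded: the consumer always works on recent data instead of falling further and further behind.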

This pipeline pattern is a specific instance of the general sensor-to-actuator data flow common to all embedded systems -- whether the output is a saved image, a network stream, or a motor command.


Introduction

Many embedded Linux devices run real-time vision pipelines — security cameras, quality inspection systems, agricultural drones, and operator displays all capture, process, and display camera frames continuously.

On the Raspberry Pi, camera access uses libcamera rather than the older V4L2 interface:

  • V4L2 (Video4Linux2) — the traditional Linux camera API, works with many USB cameras
  • libcamera — a newer framework that handles the complex ISP (Image Signal Processor) pipeline on modern cameras like the RPi Camera Module
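You can see both subsystems from the shell: V4L2 cameras show up as /dev/video* device nodes, while libcamera enumerates sensors through its own stack. A quick check (the rpicam-hello line assumes rpicam-apps is installed, so it is left commented; the fallback message covers machines with no camera):

```shell
# V4L2 devices appear as device nodes
ls /dev/video* 2>/dev/null || echo "no V4L2 device nodes"

# libcamera-managed sensors (rpicam-apps; use libcamera-hello on Bullseye)
# rpicam-hello --list-cameras
```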

For this tutorial, we use picamera2 (the Python library for libcamera) for live capture, OpenCV for processing, and pygame to display the results on screen — all streaming in real time.

Warning

On Raspberry Pi OS Bookworm and newer, the libcamera-* CLI commands have been renamed to rpicam-*. Use rpicam-still instead of libcamera-still if you need the command-line tools. The picamera2 Python library works on both Bullseye and Bookworm.
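If a script needs to run on both releases, it can pick whichever command exists. A small sketch (the variable name is arbitrary):

```shell
# Prefer the Bookworm name, fall back to the Bullseye name
if command -v rpicam-still >/dev/null 2>&1; then
    CAM_STILL=rpicam-still
else
    CAM_STILL=libcamera-still
fi
echo "using $CAM_STILL"
# $CAM_STILL -o test.jpg
```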


1. Install Tools

Concept: Use the minimal tools needed for capture, processing, and display without a desktop environment.

sudo apt-get update
sudo apt-get install -y python3-picamera2 python3-opencv python3-pygame
Stuck?
  • python3-picamera2 is pre-installed on Raspberry Pi OS Bookworm. On Bullseye, install it with sudo apt install -y python3-picamera2.
  • If python3-pygame is not available, try pip3 install pygame.

2. Verify the Camera

Concept: Verify the camera pipeline works before writing code.

rpicam-still -o test.jpg       # Bookworm and newer
# libcamera-still -o test.jpg  # Bullseye and older

This captures a single image. If test.jpg is created, the camera hardware is working.

Checkpoint

Verify test.jpg exists and is not empty:

ls -la test.jpg
file test.jpg
The file should be a valid JPEG, typically 1-5 MB.

View live video over the network (headless setup):

If the Pi has no display connected, stream the camera feed to your laptop:

# On the Pi — start H.264 TCP stream
rpicam-vid -n -t 0 --width 640 --height 480 --framerate 30 \
    --codec h264 --inline --listen --flush -o tcp://0.0.0.0:8888
# On your laptop — open in VLC
vlc tcp/h264://RASPBERRY_PI_IP:8888

Replace RASPBERRY_PI_IP with the Pi's actual IP address (e.g., 192.168.1.42). You should see a live 640×480 video stream. Press Ctrl+C on the Pi to stop.

Flag       Purpose
-n         No preview window on the Pi (headless)
-t 0       Run indefinitely (0 = no timeout)
--inline   Include SPS/PPS headers in stream (needed for VLC to decode)
--listen   Wait for a client to connect before streaming
--flush    Flush output after each frame (reduces latency)
Stuck?
  • "Camera not detected" — check the ribbon cable connection. On Bullseye: enable in raspi-config. On Trixie/Bookworm: ensure your user is in the video and render groups (run groups to check; sudo usermod -aG video,render $USER to fix)
  • "rpicam-still not found" — install with sudo apt install -y rpicam-apps (on Bullseye: libcamera-apps)
  • Black image — the camera lens cap may still be on, or the sensor needs a moment to adjust exposure

3. Live Edge Detection Pipeline

Concept: A real embedded vision pipeline streams frames continuously — capture, process, and display in a loop. No intermediate files.

Create camera_edges.py:

import cv2, numpy as np, time
from picamera2 import Picamera2
import pygame

# --- Camera setup ---
picam2 = Picamera2()
config = picam2.create_preview_configuration(main={"size": (640, 480), "format": "RGB888"})
picam2.configure(config)
picam2.start()

# --- Display setup ---
pygame.init()
info = pygame.display.Info()
screen_w, screen_h = info.current_w, info.current_h
screen = pygame.display.set_mode((screen_w, screen_h), pygame.FULLSCREEN)
pygame.mouse.set_visible(False)
font = pygame.font.SysFont("monospace", 20)

half_w = screen_w // 2

running = True
while running:
    # --- Capture ---
    frame = picam2.capture_array()  # numpy array, RGB, no file I/O

    # --- Process ---
    t0 = time.time()
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 80, 160)
    edges_rgb = cv2.cvtColor(edges, cv2.COLOR_GRAY2RGB)
    dt_ms = (time.time() - t0) * 1000

    # --- Display: original (left) | edges (right) ---
    screen.fill((0, 0, 0))

    for i, img in enumerate([frame, edges_rgb]):
        h, w = img.shape[:2]
        scale = min(half_w / w, screen_h / h)
        resized = cv2.resize(img, (int(w * scale), int(h * scale)))
        surface = pygame.surfarray.make_surface(np.transpose(resized, (1, 0, 2)))
        x = i * half_w + (half_w - resized.shape[1]) // 2
        y = (screen_h - resized.shape[0]) // 2
        screen.blit(surface, (x, y))

    # --- Overlay timing ---
    label = font.render(f"Process: {dt_ms:.1f} ms", True, (0, 255, 0))
    screen.blit(label, (10, 10))

    pygame.display.flip()

    # --- Events ---
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        if event.type == pygame.KEYDOWN and event.key == pygame.K_q:
            running = False

picam2.stop()
pygame.quit()

Run it:

python3 camera_edges.py

Press q to quit.

Checkpoint

You should see a live split-screen: the camera feed on the left, edge detection on the right, with processing time shown in the top-left corner.

Stuck?
  • "No cameras available" — make sure rpicam-still worked in step 2 first
  • "No video device" — stop any running display service: sudo systemctl stop getty@tty1
  • Low FPS — reduce the resolution in create_preview_configuration (e.g., 320x240)

4. Measure Pipeline Throughput

Concept: Embedded vision is often limited by processing time, not accuracy. The timing overlay in the live view shows per-frame processing cost, but to get a proper throughput measurement, run the pipeline for a fixed number of frames:

import cv2, time
from picamera2 import Picamera2

picam2 = Picamera2()
config = picam2.create_preview_configuration(main={"size": (640, 480), "format": "RGB888"})
picam2.configure(config)
picam2.start()

N = 50
t0 = time.time()
for _ in range(N):
    frame = picam2.capture_array()
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 80, 160)

elapsed = time.time() - t0
picam2.stop()

print(f"{N} frames in {elapsed:.2f} s")
print(f"Average: {elapsed/N*1000:.1f} ms/frame ({N/elapsed:.1f} FPS)")
Checkpoint

Typical results on a Raspberry Pi 4 at 640x480: 5-15 FPS depending on the processing load.


What Just Happened?

You built a real-time embedded vision pipeline — capture, process, and display with no intermediate files:

Camera → picamera2 (in-memory) → OpenCV edge detect → pygame → Display

This is the same architecture used in production embedded vision systems. The key difference from saving to files: frames stay in memory as numpy arrays, eliminating the JPEG encode/decode bottleneck.


Challenges

Challenge 1: Gaussian Blur Preprocessing

Add Gaussian blur before edge detection and compare the visual result:

blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blurred, 80, 160)
Does blurring increase or decrease the processing time? By how much?

Challenge 2: Adjustable Thresholds

Add keyboard controls to adjust the Canny thresholds live:

  • Up/Down arrows — change the high threshold
  • Left/Right arrows — change the low threshold

Display the current threshold values on screen alongside the timing.

Challenge 3: Save a Snapshot

Add a key (e.g., s) that saves the current frame pair (original + edges) to files. This is useful for documenting results without stopping the pipeline.


Deliverable

  • camera_edges.py — working live pipeline script
  • Screenshot or photo of the split-screen display (original vs edges)
  • Throughput measurement (average ms/frame and FPS)
  • Brief explanation: What would change if this pipeline needed to run at 30 FPS?

Course Overview | Next: Ball Position Detection → | Single-App UI →