
SDL2 + OpenGL ES Rotating Cube

Time estimate: ~45 minutes
Prerequisites: SSH Login, DRM/KMS Test Pattern

Learning Objectives

By the end of this tutorial you will be able to:

  • Build and run an OpenGL ES 2.0 application with SDL2 on the Raspberry Pi
  • Explain the GPU rendering pipeline: vertex shader, fragment shader, and draw call
  • Incrementally build a 3D scene — from a static triangle to a rotating cube
  • Use SDL2's KMS/DRM backend for tear-free fullscreen rendering without a compositor
  • Measure frame rate and GPU utilization on embedded hardware
GPU Rendering Pipeline and VSync

A rotating 3D cube exercises every layer of the embedded graphics stack. SDL2 opens a DRM/KMS display, OpenGL ES 2.0 compiles vertex and fragment shaders on the GPU, and geometry is rendered into a back buffer. SDL_GL_SwapWindow() triggers a DRM page flip synchronised to VBlank (VSync), preventing tearing by ensuring the display controller switches buffers only during the vertical blanking interval. With VSync enabled, the render loop naturally caps at the display refresh rate (typically 60 Hz); disabling it lets you measure raw GPU throughput but causes visible horizontal tear lines. The three-layer driver stack — kernel DRM driver (vc4), Mesa userspace driver (v3d_dri.so), and EGL platform — translates OpenGL API calls into GPU command streams and manages buffer allocation.

See also: Graphics Stack reference | Real-Time Graphics reference

Course Source Repository

This tutorial references source files from the course repository. If you haven't cloned it yet on your Pi:

cd ~
git clone https://github.com/OE-KVK-H2IoT/embedded-linux.git

Source files for this tutorial are in ~/embedded-linux/solutions/sdl2-rotating-cube/.


Introduction

A rotating 3D cube is the "hello world" of GPU programming. It exercises every layer of the embedded graphics stack: SDL2 creates a window on the DRM/KMS display, OpenGL ES 2.0 compiles shaders and renders geometry on the GPU, and VSync synchronizes frame presentation with the display refresh.

Instead of dumping all the code at once, this tutorial builds the cube step by step — each step adds one concept with a visible result:

Step | What You Build | New Concept
1 | Static triangle | SDL2 window, shaders, VBO, glDrawArrays
2 | Colored square | Index buffer (IBO), glDrawElements
3 | Rotating square | Matrix math, uniforms, animation
4 | 3D rotating cube | Perspective projection, depth buffer, full MVP chain
SDL2 Rendering: With and Without a GPU

SDL2 can render in three modes — understanding when the GPU is involved (and when it isn't) is essential:

1. Software renderer (CPU only — no GPU needed)

Your code → SDL_RenderDrawRect() → SDL software rasterizer → pixel buffer in RAM
          → SDL copies buffer to DRM framebuffer → display controller scans out

The CPU does all the drawing. Every SDL_RenderDrawLine(), SDL_RenderCopy(), etc. writes pixels in a CPU-side buffer. This works on any Linux system with a framebuffer — no GPU driver, no Mesa, no OpenGL. Use SDL_CreateRenderer(win, -1, SDL_RENDERER_SOFTWARE) to force this.

2. Hardware-accelerated 2D renderer (GPU assists)

Your code → SDL_RenderDrawRect() → SDL translates to OpenGL/Vulkan calls
          → Mesa GPU driver → GPU draws into render target → DRM page flip

Same SDL2 API, but the backend uses OpenGL internally. SDL_RenderCopy() becomes a GPU texture blit — much faster for image scaling, rotation, and alpha blending. Use SDL_CreateRenderer(win, -1, SDL_RENDERER_ACCELERATED). This is what the Level Display tutorial uses.

3. Direct OpenGL ES (full GPU control — this tutorial)

Your code → glDrawArrays() → Mesa compiles shaders → GPU executes pipeline
          → GPU writes to render target → SDL_GL_SwapWindow() → DRM page flip

You bypass SDL2's renderer entirely. You write shaders (vertex + fragment programs) in GLSL, upload geometry to GPU memory, and issue draw calls. The GPU runs your shaders on its massively parallel cores. This is what you need for 3D, custom visual effects, or maximum performance.

When does each make sense?

Mode | CPU load | GPU load | Use case
Software | High (draws every pixel) | None | Simple UI, no GPU driver, SPI displays
Accelerated 2D | Low (sends textures to GPU) | Low | Dashboards, image display, 2D games
Direct OpenGL ES | Minimal (uploads geometry) | High (runs shaders) | 3D, particles, data visualization

Can SDL2 work without a GPU? Yes — the software renderer works everywhere. But on the Pi 4, the VideoCore VI GPU is sitting idle when you use it. The hardware-accelerated path (SDL_RENDERER_ACCELERATED) uses the GPU for 2D operations automatically. For 3D, you need OpenGL ES shaders (this tutorial).

What Is a Shader, and Why Does the GPU Need One?

A CPU is designed for complex logic: branch prediction, out-of-order execution, deep caches — great for running your application, but it processes one thing at a time (per core).

A GPU is designed for simple math on massive data: many small cores that all execute the same program on different data simultaneously. A GPU core is much simpler than a CPU core (no branch predictor, small cache), but there are hundreds of them.

A shader is that "same program" the GPU cores run. You write it in GLSL (OpenGL Shading Language), and the GPU driver (Mesa) compiles it to GPU machine code at runtime:

Your GLSL source code (text)
Mesa shader compiler (CPU)
    → compiles to GPU machine code (binary)
GPU loads compiled shader
    → runs it on hundreds of cores simultaneously
    → each core processes one vertex (vertex shader)
       or one pixel (fragment shader) independently

Two shader types in OpenGL ES 2.0:

Shader | Runs per... | Job | Input | Output
Vertex shader | vertex (corner) | Position the vertex on screen | 3D coordinates, transform matrix | Clip-space position (gl_Position)
Fragment shader | pixel (fragment) | Determine the pixel color | Interpolated values from vertices | RGBA color

Between them, the rasterizer (fixed GPU hardware, not programmable) determines which pixels are covered by each triangle and interpolates vertex data across them.
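
The interpolation step can be reproduced on the CPU. The sketch below computes barycentric weights for a point inside a 2D triangle and blends one per-vertex value with them. The names vec2, edge and interpolate are made up for this illustration, and perspective correction is left out, which is only valid when all vertices share the same w (as in the clip-space triangle of Step 1).

```c
/* 2D point in screen or clip-space coordinates. */
typedef struct { float x, y; } vec2;

/* Twice the signed area of triangle (a, b, c). */
float edge(vec2 a, vec2 b, vec2 c)
{
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

/* Interpolate the values v0, v1, v2 attached to vertices p0, p1, p2
 * at point p. Each weight is the area of the sub-triangle opposite
 * the vertex, divided by the area of the whole triangle. */
float interpolate(vec2 p0, vec2 p1, vec2 p2,
                  float v0, float v1, float v2, vec2 p)
{
    float area = edge(p0, p1, p2);
    float w0 = edge(p1, p2, p) / area;  /* weight of vertex 0 */
    float w1 = edge(p2, p0, p) / area;  /* weight of vertex 1 */
    float w2 = edge(p0, p1, p) / area;  /* weight of vertex 2 */
    return w0 * v0 + w1 * v1 + w2 * v2;
}
```

For the Step 1 triangle, evaluating the red channel (1, 0, 0) at the top vertex gives 1, and evaluating the green channel (0, 1, 0) at the midpoint of the bottom edge gives 0.5, halfway between the green and blue corners.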

Why not just use the CPU? A 1080p frame has about 2.07 million pixels (1920 × 1080). If the fragment shader runs once per pixel per frame, 60 FPS means roughly 124 million shader executions per second. The Pi 4's GPU (VideoCore VI) has enough parallel cores to handle this; a single CPU core cannot.
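
The arithmetic behind that figure is a single multiplication. The helper below is a hypothetical illustration and assumes every pixel is shaded exactly once per frame (no overdraw, no multisampling), so real workloads can be higher.

```c
#include <stdint.h>

/* Fragment-shader invocations per second, assuming each pixel is
 * shaded exactly once per frame. */
uint64_t fragment_invocations_per_second(uint32_t width, uint32_t height,
                                         uint32_t fps)
{
    return (uint64_t)width * height * fps;
}
```

For 1920 × 1080 at 60 FPS this returns 124,416,000. The 800 × 480 window size used in this tutorial's listings gives 23,040,000.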


1. Install Dependencies

Concept: SDL2 provides the window and input layer. OpenGL ES 2.0 provides the GPU rendering API. Both are available as system packages.

sudo apt-get update
sudo apt-get install -y build-essential cmake libsdl2-dev libgles2-mesa-dev

Verify the OpenGL ES library exists:

ls /usr/lib/arm-linux-gnueabihf/libGLESv2.so* 2>/dev/null || \
ls /usr/lib/aarch64-linux-gnu/libGLESv2.so* 2>/dev/null
The GPU Driver Stack on the Pi

Three layers make GPU rendering work on the Pi 4:

  1. Kernel DRM driver (vc4) — manages the GPU hardware, display timing, and buffer allocation. Loaded by dtoverlay=vc4-kms-v3d in config.txt. Source: drivers/gpu/drm/vc4/ and drivers/gpu/drm/v3d/.
  2. Mesa userspace driver (v3d_dri.so) — translates OpenGL ES API calls into GPU command streams. The libgles2-mesa-dev package provides the headers; the runtime library (libGLESv2.so) comes from Mesa.
  3. EGL platform — connects OpenGL ES to the display. On KMS/DRM, this is mesa-egl using the egl_dri2 platform, which opens /dev/dri/renderD128 for GPU rendering and /dev/dri/card0 for display output.

SDL2's kmsdrm video backend ties these together: it opens the DRM device, creates a GBM (Generic Buffer Manager) surface, and binds an EGL context to it. When you call SDL_GL_SwapWindow(), EGL signals the DRM subsystem to page-flip the rendered buffer to the display at VSync.

For custom images: You need the kernel with CONFIG_DRM_VC4=y and CONFIG_DRM_V3D=y, the Mesa userspace with v3d Gallium driver, and the vc4-kms-v3d overlay enabled. In Buildroot, enable BR2_PACKAGE_MESA3D with the v3d and vc4 Gallium drivers.

Checkpoint

cmake --version and pkg-config --cflags sdl2 both return valid output.


2. Project Setup

Concept: A minimal SDL2 + GLES2 project needs only two files: a CMake build file and the C source.

Create the project directory:

mkdir -p ~/sdl2-cube && cd ~/sdl2-cube

CMakeLists.txt

cat > CMakeLists.txt << 'EOF'
cmake_minimum_required(VERSION 3.16)
project(sdl2_cube C)

set(CMAKE_C_STANDARD 11)

find_package(SDL2 REQUIRED)

add_executable(sdl2_cube main.c)
target_include_directories(sdl2_cube PRIVATE ${SDL2_INCLUDE_DIRS})
target_link_libraries(sdl2_cube PRIVATE ${SDL2_LIBRARIES} GLESv2 m)
EOF
Checkpoint

You have CMakeLists.txt in ~/sdl2-cube/. The source file main.c will be created in the next step.


3. Step 1 — Static Triangle

Goal: Get pixels on screen with the absolute minimum OpenGL ES code.

You will write: an SDL2 window, a vertex shader, a fragment shader, one vertex buffer, and a single glDrawArrays call.

What Is a Shader?

A shader is a small program that runs on the GPU. OpenGL ES 2.0 requires two:

Vertex Data (3 corners)
Vertex Shader (runs per vertex)
    → Positions each corner on screen
    → Passes color to the next stage
Rasterizer (GPU hardware)
    → Fills in pixels between vertices
    → Interpolates colors smoothly
Fragment Shader (runs per pixel)
    → Outputs the final color
Framebuffer → Display

You write the vertex and fragment shaders in GLSL (OpenGL Shading Language). The rasterizer is fixed hardware — you don't program it.

Clip-Space Coordinates

Without a projection matrix, the vertex shader outputs clip-space coordinates:

  • X: −1 (left edge) to +1 (right edge)
  • Y: −1 (bottom) to +1 (top)
  • Z: −1 to +1 (depth, ignored for now)

A triangle at (0, 0.5), (−0.5, −0.5), (0.5, −0.5) sits centered in the window. No matrix math needed — coordinates map directly to the screen.
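
The mapping from these coordinates to pixels is the viewport transform that glViewport(0, 0, width, height) configures. The sketch below reproduces it on the CPU; pixel_pos and ndc_to_window are names invented for this example, and it assumes w = 1 so that clip-space and normalized device coordinates are identical.

```c
/* Window position in pixels. OpenGL window coordinates have their
 * origin at the bottom-left corner of the viewport. */
typedef struct { float x, y; } pixel_pos;

/* Map normalized device coordinates (-1..+1) to window coordinates
 * for a viewport that starts at (0, 0). */
pixel_pos ndc_to_window(float x_ndc, float y_ndc, int width, int height)
{
    pixel_pos p;
    p.x = (x_ndc + 1.0f) * 0.5f * (float)width;
    p.y = (y_ndc + 1.0f) * 0.5f * (float)height;
    return p;
}
```

With an 800 × 480 viewport, the top vertex (0, 0.5) lands at pixel (400, 360) and the left vertex (−0.5, −0.5) at (200, 120), both measured from the bottom-left corner.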

Create main.c:

cat > main.c << 'MAIN_EOF'
#include <SDL2/SDL.h>
#include <GLES2/gl2.h>
#include <stdio.h>

/* ── Shader helpers ─────────────────────────────────── */

static GLuint compile_shader(GLenum type, const char *src)
{
    GLuint s = glCreateShader(type);
    glShaderSource(s, 1, &src, NULL);
    glCompileShader(s);
    GLint ok = 0;
    glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[512];
        glGetShaderInfoLog(s, sizeof(log), NULL, log);
        fprintf(stderr, "Shader error: %s\n", log);
        return 0;
    }
    return s;
}

static GLuint link_program(GLuint vs, GLuint fs)
{
    GLuint p = glCreateProgram();
    glAttachShader(p, vs);
    glAttachShader(p, fs);
    glLinkProgram(p);
    GLint ok = 0;
    glGetProgramiv(p, GL_LINK_STATUS, &ok);
    if (!ok) {
        char log[512];
        glGetProgramInfoLog(p, sizeof(log), NULL, log);
        fprintf(stderr, "Link error: %s\n", log);
        return 0;
    }
    return p;
}

/* ── Main ───────────────────────────────────────────── */

int main(int argc, char **argv)
{
    (void)argc; (void)argv;

    if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS) != 0) {
        fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
        return 1;
    }

    /* Request OpenGL ES 2.0 context */
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 2);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
    SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);

    int w = 800, h = 480;
    SDL_Window *win = SDL_CreateWindow("Step 1: Triangle",
        SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
        w, h, SDL_WINDOW_OPENGL | SDL_WINDOW_FULLSCREEN_DESKTOP);
    if (!win) {
        fprintf(stderr, "Window: %s\n", SDL_GetError());
        SDL_Quit(); return 1;
    }

    SDL_GLContext ctx = SDL_GL_CreateContext(win);
    if (!ctx) {
        fprintf(stderr, "GL context: %s\n", SDL_GetError());
        SDL_DestroyWindow(win); SDL_Quit(); return 1;
    }
    SDL_GL_SetSwapInterval(1);  /* VSync ON */
    SDL_GL_GetDrawableSize(win, &w, &h);  /* fullscreen size may differ from the requested 800x480 */

    printf("GL Renderer: %s\n", glGetString(GL_RENDERER));
    printf("GL Version:  %s\n", glGetString(GL_VERSION));

    /* ── Shaders (no matrix — clip-space coordinates) ── */
    const char *vs_src =
        "attribute vec3 aPos;\n"
        "attribute vec3 aCol;\n"
        "varying vec3 vCol;\n"
        "void main() {\n"
        "    vCol = aCol;\n"
        "    gl_Position = vec4(aPos, 1.0);\n"
        "}\n";

    const char *fs_src =
        "precision mediump float;\n"
        "varying vec3 vCol;\n"
        "void main() {\n"
        "    gl_FragColor = vec4(vCol, 1.0);\n"
        "}\n";

    GLuint vs = compile_shader(GL_VERTEX_SHADER, vs_src);
    GLuint fs = compile_shader(GL_FRAGMENT_SHADER, fs_src);
    GLuint prog = link_program(vs, fs);
    glDeleteShader(vs);
    glDeleteShader(fs);
    if (!prog) return 1;

    GLint loc_pos = glGetAttribLocation(prog, "aPos");
    GLint loc_col = glGetAttribLocation(prog, "aCol");

    /* ── Triangle geometry: 3 vertices × (position + color) ── */
    const float verts[] = {
        /* x     y     z       r    g    b  */
         0.0f,  0.5f, 0.0f,  1.0f, 0.0f, 0.0f,   /* top    — red   */
        -0.5f, -0.5f, 0.0f,  0.0f, 1.0f, 0.0f,   /* left   — green */
         0.5f, -0.5f, 0.0f,  0.0f, 0.0f, 1.0f,   /* right  — blue  */
    };

    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

    /* ── Render loop ── */
    int running = 1;
    while (running) {
        SDL_Event e;
        while (SDL_PollEvent(&e)) {
            if (e.type == SDL_QUIT) running = 0;
            if (e.type == SDL_KEYDOWN && e.key.keysym.sym == SDLK_ESCAPE)
                running = 0;
        }

        glViewport(0, 0, w, h);
        glClearColor(0.12f, 0.12f, 0.14f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);

        glUseProgram(prog);

        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glEnableVertexAttribArray((GLuint)loc_pos);
        glVertexAttribPointer((GLuint)loc_pos, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(float), (void *)0);
        glEnableVertexAttribArray((GLuint)loc_col);
        glVertexAttribPointer((GLuint)loc_col, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(float), (void *)(3 * sizeof(float)));

        glDrawArrays(GL_TRIANGLES, 0, 3);

        SDL_GL_SwapWindow(win);
    }

    glDeleteProgram(prog);
    glDeleteBuffers(1, &vbo);
    SDL_GL_DeleteContext(ctx);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
MAIN_EOF
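
The stride and offset arguments passed to glVertexAttribPointer in the listing above (6 * sizeof(float) and 3 * sizeof(float)) follow directly from the interleaved layout of verts[]. One way to make that explicit is to describe a vertex as a struct and let the compiler compute the numbers. The vertex type and helper names below are hypothetical and are not used by the tutorial code.

```c
#include <stddef.h>

/* One interleaved vertex as stored in verts[]:
 * three position floats followed by three color floats. */
typedef struct {
    float pos[3];
    float col[3];
} vertex;

/* Stride: bytes from the start of one vertex to the start of the next. */
size_t vertex_stride(void)     { return sizeof(vertex); }

/* Byte offset of each attribute inside one vertex. */
size_t vertex_pos_offset(void) { return offsetof(vertex, pos); }
size_t vertex_col_offset(void) { return offsetof(vertex, col); }
```

These helpers return 24, 0 and 12 bytes on platforms with 4-byte floats, matching the literal values in the listing. The struct form tends to be less error-prone once more attributes (normals, texture coordinates) are added.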

Build and Run

cd ~/sdl2-cube
cmake -S . -B build
cmake --build build -j$(nproc)

Run on KMS/DRM (no desktop):

export SDL_VIDEODRIVER=kmsdrm
./build/sdl2_cube

Or on a desktop session (X11/Wayland), just ./build/sdl2_cube. Press Escape to exit.

Checkpoint

A rainbow triangle (red/green/blue vertices with smooth gradient fill) is visible on the display.

Stuck?
  • "No available video device" — set SDL_VIDEODRIVER=kmsdrm and ensure no other application is using the DRM device
  • Black screen, no triangle — check GL Renderer output. If it says "llvmpipe" or "Software Rasterizer", the GPU driver is not working. Ensure dtoverlay=vc4-kms-v3d is in /boot/firmware/config.txt

4. Step 2 — Square from Two Triangles

Goal: Draw a square by reusing vertices with an index buffer.

The GPU can only draw triangles. A square needs two triangles — but they share two vertices. Without an index buffer, you'd duplicate those vertices. With an index buffer, you define 4 unique vertices and tell the GPU which three to use for each triangle.

Why Index Buffers?

A square has 4 corners, but two triangles need 6 vertex references:

0 ──── 1          Triangle 1: vertices 0, 1, 2
│ \    │          Triangle 2: vertices 2, 3, 0
│  \   │
│   \  │          Index buffer: [0, 1, 2, 2, 3, 0]
│    \ │
3 ──── 2          4 vertices stored, 6 indices reference them

For a cube with 8 vertices and 12 triangles (36 index entries), the savings are even larger. Index buffers also let the GPU cache transformed vertices — if index 2 appears in both triangles, the vertex shader runs only once for it.
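
The memory side of that saving can be estimated with a short calculation. The sketch below assumes the 6-float vertex layout used in this tutorial and 16-bit indices (GL_UNSIGNED_SHORT); the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define FLOATS_PER_VERTEX 6   /* x, y, z, r, g, b */

/* Bytes needed when every triangle corner is stored as a full vertex. */
size_t bytes_without_indices(size_t triangle_count)
{
    return triangle_count * 3 * FLOATS_PER_VERTEX * sizeof(float);
}

/* Bytes needed for unique vertices plus one 16-bit index per corner. */
size_t bytes_with_indices(size_t unique_vertices, size_t triangle_count)
{
    return unique_vertices * FLOATS_PER_VERTEX * sizeof(float)
         + triangle_count * 3 * sizeof(uint16_t);
}
```

With 4-byte floats, the square takes 144 bytes without indices and 108 bytes with them. The cube takes 864 bytes without indices and 264 bytes with them (8 vertices, 12 triangles).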

What changes from Step 1:

  1. +1 vertex: add the bottom-left corner (yellow)
  2. +index buffer: 6 indices defining two triangles
  3. glDrawArrays → glDrawElements: draw using the index buffer

Replace main.c:

cat > main.c << 'MAIN_EOF'
#include <SDL2/SDL.h>
#include <GLES2/gl2.h>
#include <stdio.h>

/* ── Shader helpers ─────────────────────────────────── */

static GLuint compile_shader(GLenum type, const char *src)
{
    GLuint s = glCreateShader(type);
    glShaderSource(s, 1, &src, NULL);
    glCompileShader(s);
    GLint ok = 0;
    glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[512];
        glGetShaderInfoLog(s, sizeof(log), NULL, log);
        fprintf(stderr, "Shader error: %s\n", log);
        return 0;
    }
    return s;
}

static GLuint link_program(GLuint vs, GLuint fs)
{
    GLuint p = glCreateProgram();
    glAttachShader(p, vs);
    glAttachShader(p, fs);
    glLinkProgram(p);
    GLint ok = 0;
    glGetProgramiv(p, GL_LINK_STATUS, &ok);
    if (!ok) {
        char log[512];
        glGetProgramInfoLog(p, sizeof(log), NULL, log);
        fprintf(stderr, "Link error: %s\n", log);
        return 0;
    }
    return p;
}

/* ── Main ───────────────────────────────────────────── */

int main(int argc, char **argv)
{
    (void)argc; (void)argv;

    if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS) != 0) {
        fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
        return 1;
    }

    SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 2);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
    SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);

    int w = 800, h = 480;
    SDL_Window *win = SDL_CreateWindow("Step 2: Square",
        SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
        w, h, SDL_WINDOW_OPENGL | SDL_WINDOW_FULLSCREEN_DESKTOP);
    if (!win) {
        fprintf(stderr, "Window: %s\n", SDL_GetError());
        SDL_Quit(); return 1;
    }

    SDL_GLContext ctx = SDL_GL_CreateContext(win);
    if (!ctx) {
        fprintf(stderr, "GL context: %s\n", SDL_GetError());
        SDL_DestroyWindow(win); SDL_Quit(); return 1;
    }
    SDL_GL_SetSwapInterval(1);
    SDL_GL_GetDrawableSize(win, &w, &h);  /* fullscreen size may differ from the requested 800x480 */

    printf("GL Renderer: %s\n", glGetString(GL_RENDERER));
    printf("GL Version:  %s\n", glGetString(GL_VERSION));

    /* ── Shaders (still no matrix) ── */
    const char *vs_src =
        "attribute vec3 aPos;\n"
        "attribute vec3 aCol;\n"
        "varying vec3 vCol;\n"
        "void main() {\n"
        "    vCol = aCol;\n"
        "    gl_Position = vec4(aPos, 1.0);\n"
        "}\n";

    const char *fs_src =
        "precision mediump float;\n"
        "varying vec3 vCol;\n"
        "void main() {\n"
        "    gl_FragColor = vec4(vCol, 1.0);\n"
        "}\n";

    GLuint vs = compile_shader(GL_VERTEX_SHADER, vs_src);
    GLuint fs = compile_shader(GL_FRAGMENT_SHADER, fs_src);
    GLuint prog = link_program(vs, fs);
    glDeleteShader(vs);
    glDeleteShader(fs);
    if (!prog) return 1;

    GLint loc_pos = glGetAttribLocation(prog, "aPos");
    GLint loc_col = glGetAttribLocation(prog, "aCol");

    /* ── Square geometry: 4 vertices × (position + color) ── */
    const float verts[] = {
        /* x     y     z       r    g    b  */
        -0.5f,  0.5f, 0.0f,  1.0f, 0.0f, 0.0f,   /* top-left     — red    */
         0.5f,  0.5f, 0.0f,  0.0f, 1.0f, 0.0f,   /* top-right    — green  */
         0.5f, -0.5f, 0.0f,  0.0f, 0.0f, 1.0f,   /* bottom-right — blue   */
        -0.5f, -0.5f, 0.0f,  1.0f, 1.0f, 0.0f,   /* bottom-left  — yellow */
    };

    /* Two triangles sharing vertices 0-1-2 and 2-3-0 */
    const GLushort indices[] = {
        0, 1, 2,
        2, 3, 0,
    };

    GLuint vbo, ibo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

    glGenBuffers(1, &ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

    /* ── Render loop ── */
    int running = 1;
    while (running) {
        SDL_Event e;
        while (SDL_PollEvent(&e)) {
            if (e.type == SDL_QUIT) running = 0;
            if (e.type == SDL_KEYDOWN && e.key.keysym.sym == SDLK_ESCAPE)
                running = 0;
        }

        glViewport(0, 0, w, h);
        glClearColor(0.12f, 0.12f, 0.14f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);

        glUseProgram(prog);

        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glEnableVertexAttribArray((GLuint)loc_pos);
        glVertexAttribPointer((GLuint)loc_pos, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(float), (void *)0);
        glEnableVertexAttribArray((GLuint)loc_col);
        glVertexAttribPointer((GLuint)loc_col, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(float), (void *)(3 * sizeof(float)));

        glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (void *)0);

        SDL_GL_SwapWindow(win);
    }

    glDeleteProgram(prog);
    glDeleteBuffers(1, &vbo);
    glDeleteBuffers(1, &ibo);
    SDL_GL_DeleteContext(ctx);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
MAIN_EOF

Rebuild and run:

cmake --build build -j$(nproc)
./build/sdl2_cube
Checkpoint

A colored square (red, green, blue, yellow corners with smooth gradient) fills the center of the screen. The diagonal seam where the two triangles meet may be faintly visible in the color gradient.


5. Step 3 — Rotating Square

Goal: Make the square spin by sending a rotation matrix to the GPU every frame.

Rotation Matrices

A 2D rotation by angle θ uses cosine and sine:

┌  cos θ   -sin θ  ┐     Rotates a point (x, y) around the origin.
│                   │
└  sin θ    cos θ   ┘

In 3D, we embed this into a 4×4 matrix. A Y-axis rotation spins the object left and right:

┌ cos θ   0   sin θ   0 ┐
│   0     1     0      0 │
│-sin θ   0   cos θ    0 │
└   0     0     0      1 ┘

OpenGL uses column-major layout, so m[0]=cos, m[2]=-sin, m[8]=sin, m[10]=cos.

Uniforms: CPU → GPU Per-Frame Data

Attributes (like aPos, aCol) vary per vertex — they come from the vertex buffer.

Uniforms (like uMVP) are constant for the entire draw call — set once by the CPU, read by every vertex shader invocation. The rotation matrix changes each frame, so we upload it as a uniform with glUniformMatrix4fv().

What changes from Step 2:

  1. +vertex shader line: gl_Position = vec4(aPos, 1.0) → gl_Position = uMVP * vec4(aPos, 1.0)
  2. +two matrix functions: mat4_identity and mat4_rotate_y
  3. +animation loop: compute angle from elapsed time, build matrix, upload to GPU

Replace main.c:

cat > main.c << 'MAIN_EOF'
#include <SDL2/SDL.h>
#include <GLES2/gl2.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

/* ── Shader helpers ─────────────────────────────────── */

static GLuint compile_shader(GLenum type, const char *src)
{
    GLuint s = glCreateShader(type);
    glShaderSource(s, 1, &src, NULL);
    glCompileShader(s);
    GLint ok = 0;
    glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[512];
        glGetShaderInfoLog(s, sizeof(log), NULL, log);
        fprintf(stderr, "Shader error: %s\n", log);
        return 0;
    }
    return s;
}

static GLuint link_program(GLuint vs, GLuint fs)
{
    GLuint p = glCreateProgram();
    glAttachShader(p, vs);
    glAttachShader(p, fs);
    glLinkProgram(p);
    GLint ok = 0;
    glGetProgramiv(p, GL_LINK_STATUS, &ok);
    if (!ok) {
        char log[512];
        glGetProgramInfoLog(p, sizeof(log), NULL, log);
        fprintf(stderr, "Link error: %s\n", log);
        return 0;
    }
    return p;
}

/* ── Matrix math (column-major, OpenGL convention) ──── */

static void mat4_identity(float m[16])
{
    memset(m, 0, 16 * sizeof(float));
    m[0] = m[5] = m[10] = m[15] = 1.0f;
}

static void mat4_rotate_y(float m[16], float a)
{
    mat4_identity(m);
    m[0] = cosf(a);  m[8]  = sinf(a);
    m[2] = -sinf(a); m[10] = cosf(a);
}

/* ── Main ───────────────────────────────────────────── */

int main(int argc, char **argv)
{
    (void)argc; (void)argv;

    if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS) != 0) {
        fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
        return 1;
    }

    SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 2);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
    SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);

    int w = 800, h = 480;
    SDL_Window *win = SDL_CreateWindow("Step 3: Rotating Square",
        SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
        w, h, SDL_WINDOW_OPENGL | SDL_WINDOW_FULLSCREEN_DESKTOP);
    if (!win) {
        fprintf(stderr, "Window: %s\n", SDL_GetError());
        SDL_Quit(); return 1;
    }

    SDL_GLContext ctx = SDL_GL_CreateContext(win);
    if (!ctx) {
        fprintf(stderr, "GL context: %s\n", SDL_GetError());
        SDL_DestroyWindow(win); SDL_Quit(); return 1;
    }
    SDL_GL_SetSwapInterval(1);
    SDL_GL_GetDrawableSize(win, &w, &h);  /* fullscreen size may differ from the requested 800x480 */

    printf("GL Renderer: %s\n", glGetString(GL_RENDERER));
    printf("GL Version:  %s\n", glGetString(GL_VERSION));

    /* ── Shaders (NOW with uMVP matrix) ── */
    const char *vs_src =
        "attribute vec3 aPos;\n"
        "attribute vec3 aCol;\n"
        "uniform mat4 uMVP;\n"
        "varying vec3 vCol;\n"
        "void main() {\n"
        "    vCol = aCol;\n"
        "    gl_Position = uMVP * vec4(aPos, 1.0);\n"
        "}\n";

    const char *fs_src =
        "precision mediump float;\n"
        "varying vec3 vCol;\n"
        "void main() {\n"
        "    gl_FragColor = vec4(vCol, 1.0);\n"
        "}\n";

    GLuint vs = compile_shader(GL_VERTEX_SHADER, vs_src);
    GLuint fs = compile_shader(GL_FRAGMENT_SHADER, fs_src);
    GLuint prog = link_program(vs, fs);
    glDeleteShader(vs);
    glDeleteShader(fs);
    if (!prog) return 1;

    GLint loc_pos = glGetAttribLocation(prog, "aPos");
    GLint loc_col = glGetAttribLocation(prog, "aCol");
    GLint loc_mvp = glGetUniformLocation(prog, "uMVP");

    /* ── Square geometry (same as Step 2) ── */
    const float verts[] = {
        -0.5f,  0.5f, 0.0f,  1.0f, 0.0f, 0.0f,
         0.5f,  0.5f, 0.0f,  0.0f, 1.0f, 0.0f,
         0.5f, -0.5f, 0.0f,  0.0f, 0.0f, 1.0f,
        -0.5f, -0.5f, 0.0f,  1.0f, 1.0f, 0.0f,
    };

    const GLushort indices[] = {
        0, 1, 2,
        2, 3, 0,
    };

    GLuint vbo, ibo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

    glGenBuffers(1, &ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

    /* ── Render loop ── */
    Uint64 t_start = SDL_GetPerformanceCounter();
    double freq = (double)SDL_GetPerformanceFrequency();

    int running = 1;
    while (running) {
        SDL_Event e;
        while (SDL_PollEvent(&e)) {
            if (e.type == SDL_QUIT) running = 0;
            if (e.type == SDL_KEYDOWN && e.key.keysym.sym == SDLK_ESCAPE)
                running = 0;
        }

        double sec = (double)(SDL_GetPerformanceCounter() - t_start) / freq;
        float angle = (float)sec;

        glViewport(0, 0, w, h);
        glClearColor(0.12f, 0.12f, 0.14f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);

        /* Build rotation matrix */
        float MVP[16];
        mat4_rotate_y(MVP, angle);

        glUseProgram(prog);
        glUniformMatrix4fv(loc_mvp, 1, GL_FALSE, MVP);

        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glEnableVertexAttribArray((GLuint)loc_pos);
        glVertexAttribPointer((GLuint)loc_pos, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(float), (void *)0);
        glEnableVertexAttribArray((GLuint)loc_col);
        glVertexAttribPointer((GLuint)loc_col, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(float), (void *)(3 * sizeof(float)));

        glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (void *)0);

        SDL_GL_SwapWindow(win);
    }

    glDeleteProgram(prog);
    glDeleteBuffers(1, &vbo);
    glDeleteBuffers(1, &ibo);
    SDL_GL_DeleteContext(ctx);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
MAIN_EOF

Rebuild and run:

cmake --build build -j$(nproc)
./build/sdl2_cube
Checkpoint

The square spins around the Y axis. When it rotates 90°, it appears as a thin line (you're seeing the square edge-on). This is real 3D rotation — the Y-rotation matrix compresses the X coordinates as the square turns.


6. Step 4 — 3D Rotating Cube

Goal: Turn the flat rotating square into a full 3D cube with proper perspective.

The Model-View-Projection Pipeline

3D rendering uses a chain of matrix multiplications:

Model matrix        View matrix           Projection matrix
(rotate object)  ×  (position camera)  ×  (3D → 2D perspective)

    Ry · Rx       ×      Translate       ×     Perspective
 (spin cube)         (move back -5)        (60° field of view)
                                  MVP matrix
                             gl_Position = uMVP * vec4(aPos, 1.0)

Each matrix transforms coordinates into a different "space":

  • Model space → World space (rotation positions the cube in the scene)
  • World space → View space (translation acts as camera placement)
  • View space → Clip space (perspective makes far objects smaller)

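We multiply all three into one MVP matrix on the CPU and send it to the GPU as a single uniform, so the vertex shader does just one matrix multiply per vertex. In code the product is written Projection × View × Model: with column vectors the rightmost matrix is applied to the vertex first, so the diagram above reads in order of application, not in order of multiplication.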
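
Matrix multiplication is not commutative, so the order of the factors changes the result. The sketch below provides a column-major matrix product and a point transform that can be used to compare "rotate, then translate" with "translate, then rotate" on the CPU. The function names are chosen for this example; the product uses the same column-major convention as the tutorial's matrix helpers.

```c
#include <string.h>

/* out = a * b for column-major 4x4 matrices. A temporary is used so
 * that out may alias a or b. */
void mat4_product(float out[16], const float a[16], const float b[16])
{
    float r[16];
    for (int col = 0; col < 4; col++)
        for (int row = 0; row < 4; row++) {
            r[col * 4 + row] = 0.0f;
            for (int k = 0; k < 4; k++)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
        }
    memcpy(out, r, sizeof(r));
}

/* Transform the point (x, y, z, 1) by m and store the resulting x, y, z. */
void mat4_transform_point(float out[3], const float m[16],
                          float x, float y, float z)
{
    for (int row = 0; row < 3; row++)
        out[row] = m[0 * 4 + row] * x + m[1 * 4 + row] * y
                 + m[2 * 4 + row] * z + m[3 * 4 + row];
}
```

Take T as a translation by (0, 0, −5) and R as a 90° rotation about Y. The product T × R applied to the point (1, 0, 0) rotates first and then moves the point back, giving (0, 0, −6). The product R × T translates first and then swings the displaced point around the origin, giving (−5, 0, −1). Only the first order keeps the object spinning in place in front of the camera.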

The Depth Buffer

A cube has 6 faces. When two faces overlap on screen, which one is in front? The depth buffer (also called Z-buffer) stores the depth of each pixel. Before writing a pixel, the GPU checks: "Is this pixel closer than what's already there?" If not, it's discarded.

Without glEnable(GL_DEPTH_TEST), back faces would draw over front faces depending on draw order — the cube would look broken. We also request a 24-bit depth buffer with SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, 24) and clear it each frame with GL_DEPTH_BUFFER_BIT.
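
The depth test itself is a per-pixel comparison, and a CPU model of it fits in a few lines. The sketch below uses a tiny 4-pixel framebuffer and the default GL_LESS comparison with a clear depth of 1.0. The framebuffer type and the fb_ functions are invented for this illustration; on the Pi the GPU performs this test in hardware.

```c
#include <stdbool.h>

#define FB_PIXELS 4   /* tiny framebuffer, enough to show the idea */

typedef struct {
    float    depth[FB_PIXELS];  /* smaller value = closer to the camera */
    unsigned color[FB_PIXELS];
} framebuffer;

/* Counterpart of glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT). */
void fb_clear(framebuffer *fb, unsigned clear_color)
{
    for (int i = 0; i < FB_PIXELS; i++) {
        fb->depth[i] = 1.0f;        /* farthest possible depth */
        fb->color[i] = clear_color;
    }
}

/* Write one fragment using a "less than" depth comparison.
 * Returns true if the fragment passed the test and was stored. */
bool fb_write(framebuffer *fb, int pixel, float depth, unsigned color)
{
    if (depth >= fb->depth[pixel])
        return false;               /* hidden behind what is already there */
    fb->depth[pixel] = depth;
    fb->color[pixel] = color;
    return true;
}
```

Writing a far fragment (depth 0.8) and then a near one (depth 0.3) to the same pixel leaves the near color in place, and a later far fragment is rejected. The result no longer depends on draw order, which is what makes the cube's faces sort correctly.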

What changes from Step 3:

  1. +4 matrix functions: mat4_mul, mat4_perspective, mat4_translate, mat4_rotate_x
  2. +depth buffer: SDL_GL_DEPTH_SIZE, glEnable(GL_DEPTH_TEST), clear GL_DEPTH_BUFFER_BIT
  3. 8 vertices, 36 indices: a full cube with 6 colored faces
  4. Full MVP chain: perspective × translate × rotate_y × rotate_x
  5. FPS counter: reports performance every 2 seconds
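
The FPS counter in the last item can be built from a frame count and a window start time. The sketch below is one possible shape for that logic, written so that the timestamp is passed in (for example from SDL_GetPerformanceCounter() divided by the counter frequency). The fps_counter type and fps_counter_tick are hypothetical names, and the course listing may structure this differently.

```c
typedef struct {
    double window_start;  /* time at which the current measurement window began */
    int    frames;        /* frames presented since window_start */
} fps_counter;

/* Call once per frame with the current time in seconds. Returns the
 * average FPS over the window once report_interval seconds have
 * elapsed, otherwise a negative value meaning "nothing to report yet". */
double fps_counter_tick(fps_counter *c, double now_sec, double report_interval)
{
    c->frames++;
    double elapsed = now_sec - c->window_start;
    if (elapsed < report_interval)
        return -1.0;
    double fps = c->frames / elapsed;
    c->window_start = now_sec;  /* start a new window */
    c->frames = 0;
    return fps;
}
```

A render loop would call fps_counter_tick(&counter, now, 2.0) after each buffer swap and print the value whenever it is non-negative. Averaging over a 2-second window smooths out single slow frames, at the cost of hiding short stutters.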

Replace main.c:

cat > main.c << 'MAIN_EOF'
#include <SDL2/SDL.h>
#include <GLES2/gl2.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

/* ── Shader helpers ─────────────────────────────────── */

static GLuint compile_shader(GLenum type, const char *src)
{
    GLuint s = glCreateShader(type);
    glShaderSource(s, 1, &src, NULL);
    glCompileShader(s);
    GLint ok = 0;
    glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
    if (!ok) {
        char log[512];
        glGetShaderInfoLog(s, sizeof(log), NULL, log);
        fprintf(stderr, "Shader error: %s\n", log);
        return 0;
    }
    return s;
}

static GLuint link_program(GLuint vs, GLuint fs)
{
    GLuint p = glCreateProgram();
    glAttachShader(p, vs);
    glAttachShader(p, fs);
    glLinkProgram(p);
    GLint ok = 0;
    glGetProgramiv(p, GL_LINK_STATUS, &ok);
    if (!ok) {
        char log[512];
        glGetProgramInfoLog(p, sizeof(log), NULL, log);
        fprintf(stderr, "Link error: %s\n", log);
        return 0;
    }
    return p;
}

/* ── Matrix math (column-major, OpenGL convention) ──── */

static void mat4_identity(float m[16])
{
    memset(m, 0, 16 * sizeof(float));
    m[0] = m[5] = m[10] = m[15] = 1.0f;
}

static void mat4_mul(float out[16], const float a[16], const float b[16])
{
    float r[16];
    for (int c = 0; c < 4; c++)
        for (int row = 0; row < 4; row++)
            r[c * 4 + row] = a[0 * 4 + row] * b[c * 4 + 0]
                            + a[1 * 4 + row] * b[c * 4 + 1]
                            + a[2 * 4 + row] * b[c * 4 + 2]
                            + a[3 * 4 + row] * b[c * 4 + 3];
    memcpy(out, r, sizeof(r));
}

static void mat4_perspective(float m[16], float fovy_rad,
                             float aspect, float znear, float zfar)
{
    float f = 1.0f / tanf(fovy_rad * 0.5f);
    memset(m, 0, 16 * sizeof(float));
    m[0]  = f / aspect;
    m[5]  = f;
    m[10] = (zfar + znear) / (znear - zfar);
    m[11] = -1.0f;
    m[14] = (2.0f * zfar * znear) / (znear - zfar);
}

static void mat4_translate(float m[16], float x, float y, float z)
{
    mat4_identity(m);
    m[12] = x; m[13] = y; m[14] = z;
}

static void mat4_rotate_y(float m[16], float a)
{
    mat4_identity(m);
    m[0] = cosf(a);  m[8]  = sinf(a);
    m[2] = -sinf(a); m[10] = cosf(a);
}

static void mat4_rotate_x(float m[16], float a)
{
    mat4_identity(m);
    m[5] = cosf(a);  m[9]  = -sinf(a);
    m[6] = sinf(a);  m[10] = cosf(a);
}

/* ── Main ───────────────────────────────────────────── */

int main(int argc, char **argv)
{
    (void)argc; (void)argv;

    if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS) != 0) {
        fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
        return 1;
    }

    /* Request OpenGL ES 2.0 context */
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 2);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
    SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
    SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, 24);

    int w = 800, h = 480;
    SDL_Window *win = SDL_CreateWindow("SDL2 GLES2 Cube",
        SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
        w, h, SDL_WINDOW_OPENGL | SDL_WINDOW_FULLSCREEN_DESKTOP);
    if (!win) {
        fprintf(stderr, "Window: %s\n", SDL_GetError());
        SDL_Quit(); return 1;
    }

    SDL_GLContext ctx = SDL_GL_CreateContext(win);
    if (!ctx) {
        fprintf(stderr, "GL context: %s\n", SDL_GetError());
        SDL_DestroyWindow(win); SDL_Quit(); return 1;
    }
    SDL_GL_SetSwapInterval(1);  /* VSync ON */
    SDL_GL_GetDrawableSize(win, &w, &h);  /* fullscreen size may differ from the 800x480 request */

    printf("GL Renderer: %s\n", glGetString(GL_RENDERER));
    printf("GL Version:  %s\n", glGetString(GL_VERSION));

    /* ── Shaders ── */
    const char *vs_src =
        "attribute vec3 aPos;\n"
        "attribute vec3 aCol;\n"
        "uniform mat4 uMVP;\n"
        "varying vec3 vCol;\n"
        "void main() {\n"
        "    vCol = aCol;\n"
        "    gl_Position = uMVP * vec4(aPos, 1.0);\n"
        "}\n";

    const char *fs_src =
        "precision mediump float;\n"
        "varying vec3 vCol;\n"
        "void main() {\n"
        "    gl_FragColor = vec4(vCol, 1.0);\n"
        "}\n";

    GLuint vs = compile_shader(GL_VERTEX_SHADER, vs_src);
    GLuint fs = compile_shader(GL_FRAGMENT_SHADER, fs_src);
    GLuint prog = link_program(vs, fs);
    glDeleteShader(vs);
    glDeleteShader(fs);
    if (!prog) return 1;

    GLint loc_pos = glGetAttribLocation(prog, "aPos");
    GLint loc_col = glGetAttribLocation(prog, "aCol");
    GLint loc_mvp = glGetUniformLocation(prog, "uMVP");

    /* ── Cube geometry: 8 vertices × (position + color) ── */
    const float verts[] = {
        /* x     y     z       r    g    b  */
        -1.f, -1.f, -1.f,   1.f, 0.f, 0.f,
         1.f, -1.f, -1.f,   0.f, 1.f, 0.f,
         1.f,  1.f, -1.f,   0.f, 0.f, 1.f,
        -1.f,  1.f, -1.f,   1.f, 1.f, 0.f,
        -1.f, -1.f,  1.f,   1.f, 0.f, 1.f,
         1.f, -1.f,  1.f,   0.f, 1.f, 1.f,
         1.f,  1.f,  1.f,   1.f, 1.f, 1.f,
        -1.f,  1.f,  1.f,   0.3f, 0.3f, 0.3f,
    };

    /* 12 triangles = 36 indices */
    const GLushort indices[] = {
        0,1,2, 2,3,0,   /* back   */
        4,5,6, 6,7,4,   /* front  */
        0,4,7, 7,3,0,   /* left   */
        1,5,6, 6,2,1,   /* right  */
        3,2,6, 6,7,3,   /* top    */
        0,1,5, 5,4,0,   /* bottom */
    };

    GLuint vbo, ibo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

    glGenBuffers(1, &ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

    glEnable(GL_DEPTH_TEST);

    /* ── Render loop ── */
    Uint64 t_start = SDL_GetPerformanceCounter();
    double freq = (double)SDL_GetPerformanceFrequency();
    int frames = 0;
    Uint64 t_fps = t_start;

    int running = 1;
    while (running) {
        SDL_Event e;
        while (SDL_PollEvent(&e)) {
            if (e.type == SDL_QUIT) running = 0;
            if (e.type == SDL_KEYDOWN && e.key.keysym.sym == SDLK_ESCAPE) running = 0;
            if (e.type == SDL_WINDOWEVENT &&
                e.window.event == SDL_WINDOWEVENT_SIZE_CHANGED) {
                w = e.window.data1;
                h = e.window.data2;
            }
        }

        double sec = (double)(SDL_GetPerformanceCounter() - t_start) / freq;
        float angle = (float)sec;

        glViewport(0, 0, w, h);
        glClearColor(0.12f, 0.12f, 0.14f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        /* Build Model-View-Projection matrix */
        float P[16], T[16], Rx[16], Ry[16], Rxy[16], M[16], MVP[16];
        mat4_perspective(P, 60.0f * (3.14159f / 180.0f),
                         (float)w / (float)h, 0.1f, 100.0f);
        mat4_translate(T, 0.f, 0.f, -5.0f);
        mat4_rotate_y(Ry, angle);
        mat4_rotate_x(Rx, angle * 0.7f);
        mat4_mul(Rxy, Ry, Rx);
        mat4_mul(M, T, Rxy);
        mat4_mul(MVP, P, M);

        glUseProgram(prog);
        glUniformMatrix4fv(loc_mvp, 1, GL_FALSE, MVP);

        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glEnableVertexAttribArray((GLuint)loc_pos);
        glVertexAttribPointer((GLuint)loc_pos, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(float), (void *)0);
        glEnableVertexAttribArray((GLuint)loc_col);
        glVertexAttribPointer((GLuint)loc_col, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(float), (void *)(3 * sizeof(float)));

        glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_SHORT, (void *)0);

        SDL_GL_SwapWindow(win);

        /* FPS reporting */
        frames++;
        Uint64 now = SDL_GetPerformanceCounter();
        double dt = (double)(now - t_fps) / freq;
        if (dt >= 2.0) {
            printf("FPS: %.1f  (%.2f ms/frame)\n",
                   frames / dt, dt / frames * 1000.0);
            frames = 0;
            t_fps = now;
        }
    }

    glDeleteProgram(prog);
    glDeleteBuffers(1, &vbo);
    glDeleteBuffers(1, &ibo);
    SDL_GL_DeleteContext(ctx);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
MAIN_EOF

Rebuild and run:

cmake --build build -j$(nproc)
./build/sdl2_cube

Checkpoint

A rotating rainbow-colored cube is visible on the display. It spins on two axes, with proper depth — near faces cover far faces. The terminal prints FPS approximately every 2 seconds.

Stuck?
  • Black screen, no cube: check the GL Renderer output. If it says "llvmpipe" or "Software Rasterizer", rendering has fallen back to the CPU and the GPU driver is not in use. Ensure dtoverlay=vc4-kms-v3d is in /boot/firmware/config.txt.
  • "EGL: No matching config": request a smaller depth buffer by changing the existing SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, 24); call to ask for 16 bits instead. The call must stay before the window is created.

7. VSync and Tearing

Concept: VSync locks frame presentation to the display refresh. Disabling it lets you measure raw GPU throughput but causes visible tearing.

Try disabling VSync by changing one line in main.c:

SDL_GL_SetSwapInterval(0);  /* VSync OFF — max speed */

Rebuild and run. Observe:

  • FPS jumps: without VSync, frames render as fast as the GPU allows (likely 200+ FPS for a simple cube)
  • Tearing: horizontal tear lines appear because the buffer swap lands while the display is partway through scanning out a frame

| VSync Setting | Expected FPS | Tearing | CPU Usage |
|---|---|---|---|
| SetSwapInterval(1) | ~60 | None | Low (idle between frames) |
| SetSwapInterval(0) | _ | _ | _ |

Fill in the table with your measurements.

The Three Swap Interval Values

SDL_GL_SetSwapInterval(1);   /* VSync ON — swap at next VBlank (60 FPS cap) */
SDL_GL_SetSwapInterval(0);   /* VSync OFF — swap immediately (max FPS, tearing) */
SDL_GL_SetSwapInterval(-1);  /* Adaptive VSync — VSync on if fast enough,
                                 tear if a frame misses the deadline.
                                 Returns -1 if not supported; there is no
                                 automatic fallback. */

When to Use Each

| Setting | Use case | Trade-off |
|---|---|---|
| 1 (VSync on) | Production display apps: dashboards, HMIs, games. | Always tear-free. Input latency up to 16.7 ms (one frame). GPU idles between frames. |
| 0 (VSync off) | Benchmarking: measure raw GPU throughput. Headless rendering: render offscreen as fast as possible. Latency-critical: VR, fast-response control displays where tearing is acceptable. | Visible tearing on screen. CPU burns at 100% in the render loop (no idle wait). |
| -1 (adaptive) | Games with variable load: stays tear-free when GPU keeps up, drops to tearing only on heavy frames instead of halving to 30 FPS. | Not supported on all drivers. Check return value: if SDL_GL_SetSwapInterval(-1) returns -1, fall back to 1. |

/* Adaptive VSync with fallback */
if (SDL_GL_SetSwapInterval(-1) == -1) {
    printf("Adaptive VSync not supported, using standard VSync\n");
    SDL_GL_SetSwapInterval(1);
}
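
The "halving to 30 FPS" effect mentioned in the table follows from simple arithmetic. The sketch below assumes a plain double-buffered setup in which the swap blocks until the next VBlank; drivers that queue extra buffers can behave differently. The function name `vsync_fps` is hypothetical.

```c
/* Effective frame rate with VSync on, assuming a blocking double-buffered
 * swap: a frame that misses one VBlank has to wait for the next one, so
 * every frame occupies a whole number of refresh intervals. */
static double vsync_fps(double render_ms, double refresh_hz)
{
    double interval_ms = 1000.0 / refresh_hz;
    int intervals = 1;

    /* Count how many refresh intervals the frame needs. */
    while (intervals * interval_ms < render_ms)
        intervals++;

    return refresh_hz / intervals;
}
```

On a 60 Hz display, a 5 ms frame runs at 60 FPS, but a 17 ms frame just misses the 16.7 ms deadline and drops to 30 FPS. Adaptive VSync avoids that step by tearing on the late frame instead of waiting.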
Checkpoint

You can toggle VSync on/off and observe the difference in FPS and tearing. Restore SetSwapInterval(1) when done.


8. Performance Measurement

Concept: On embedded hardware, GPU utilization matters as much as frame rate. A 60 FPS app that uses 95% of the GPU leaves no headroom for complexity.

GPU Utilization

Knowing GPU utilization tells you how much headroom you have. If the cube uses 5% of the GPU at 60 FPS, you can add complex shaders, more geometry, or post-processing effects. If it uses 90%, you're near the limit — any added complexity will drop frames.

# While the cube is running, open another SSH session:

# Method 1: V3D driver debug (most accurate on Pi 4; the file name and
# dri index can vary by kernel version, so list the directory if missing)
sudo cat /sys/kernel/debug/dri/0/v3d_usage

# Method 2: Query the GPU clock speed (higher = more load)
vcgencmd measure_clock v3d
# → frequency(29)=500000000  (500 MHz = busy)
# → frequency(29)=250000000  (250 MHz = idle, clock scaled down)

# Method 3: Overall system view
mpstat 1
# Low %sys + low %usr with VSync on = GPU doing the work, CPU idle

Why GPU Utilization Matters on Embedded

On a desktop, the GPU has massive headroom — a cube uses <0.1%. On the Pi 4's VideoCore VI, a simple cube might use 2–5%, but add textures, lighting, particles, and post-processing effects, and you can reach 80%+ quickly. Monitoring GPU utilization during development catches performance problems before they become frame drops in production.

Frame Timing

The app already prints FPS. For more detailed per-frame measurement, add timing around the render + swap:

Uint64 frame_start = SDL_GetPerformanceCounter();
/* ... render ... */
SDL_GL_SwapWindow(win);
Uint64 frame_end = SDL_GetPerformanceCounter();
double frame_ms = (double)(frame_end - frame_start) / freq * 1000.0;
printf("Frame: %.2f ms (render + vsync wait)\n", frame_ms);

With VSync on, frame_ms will be ~16.7 ms on a 60 Hz display (it includes the VSync wait). To measure render time only (CPU submission plus GPU execution, without the VSync wait), time from render start to just before SDL_GL_SwapWindow:

Uint64 render_start = SDL_GetPerformanceCounter();
/* ... glClear, glDrawElements ... */
glFinish();  /* Force GPU to complete before measuring */
Uint64 render_end = SDL_GetPerformanceCounter();
double render_ms = (double)(render_end - render_start) / freq * 1000.0;
SDL_GL_SwapWindow(win);
printf("Render: %.2f ms  (budget: 16.7 ms for 60 FPS)\n", render_ms);

glFinish() Is for Measurement Only

glFinish() blocks the CPU until the GPU completes all pending work. This defeats the purpose of GPU parallelism — in production, never call it. Here we use it only to get accurate render timing. Remove it after measuring.
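
Printing a line every frame produces a lot of output and, over SSH, the printing itself can disturb the timing. One option is to accumulate the per-frame values and print a summary every few seconds, as the existing FPS counter does. The sketch below is one possible shape for that; the `FrameStats` type and function names are hypothetical and not part of the course repository.

```c
#include <stdio.h>

/* Hypothetical accumulator for per-frame timings in milliseconds. */
typedef struct {
    double min_ms;
    double max_ms;
    double sum_ms;
    int    count;
} FrameStats;

static void stats_reset(FrameStats *s)
{
    s->min_ms = 1e9;   /* larger than any realistic frame time */
    s->max_ms = 0.0;
    s->sum_ms = 0.0;
    s->count  = 0;
}

/* Record one frame; call once per loop iteration with frame_ms or render_ms. */
static void stats_add(FrameStats *s, double ms)
{
    if (ms < s->min_ms) s->min_ms = ms;
    if (ms > s->max_ms) s->max_ms = ms;
    s->sum_ms += ms;
    s->count++;
}

static double stats_avg(const FrameStats *s)
{
    return s->count > 0 ? s->sum_ms / s->count : 0.0;
}

/* Print a one-line summary; call stats_reset() afterwards. */
static void stats_print(const FrameStats *s)
{
    printf("frames=%d  min=%.2f  avg=%.2f  max=%.2f ms\n",
           s->count, s->min_ms, stats_avg(s), s->max_ms);
}
```

In the render loop, `stats_add(&stats, render_ms)` would replace the per-frame `printf`, and `stats_print` followed by `stats_reset` would go inside the existing `if (dt >= 2.0)` block. The maximum is often the most useful number, because a single slow frame is what shows up as a visible stutter.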

Fill In Your Measurements

| Metric | Value |
|---|---|
| GL Renderer | _ |
| Display resolution | _ |
| FPS (VSync on) | _ |
| FPS (VSync off) | _ |
| Frame time (VSync on, ms) | _ |
| GPU utilization (%) | _ |

Checkpoint

Your measurement table is filled in. With VSync on, FPS should be ~60 and frame time ~16.7 ms.


What Just Happened?

You built a complete GPU-accelerated 3D application from scratch on an embedded Linux device, one step at a time:

Step 1: Triangle  → shaders + VBO + glDrawArrays
Step 2: Square    → index buffer + glDrawElements
Step 3: Spinning  → rotation matrix + uniform
Step 4: 3D Cube   → full MVP + depth buffer + 8 vertices

The rendering path is:

main.c render loop
    → OpenGL ES draw call → GPU renders into back buffer
    → SDL_GL_SwapWindow() → waits for VSync
    → DRM page flip → display scans out the new buffer

This is the same path used by car dashboards, industrial HMIs, and embedded games. The cube is trivial, but the pipeline — SDL2 + GLES2 + KMS/DRM + VSync — is production-grade.


Challenges

Challenge 1: Touch Rotation

Add touch/mouse input to control the cube's rotation instead of auto-rotating. Handle SDL_FINGERMOTION events and map finger delta X/Y to rotation angles. This prepares you for the IMU Controller tutorial.
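
As a starting point, the mapping from finger delta to rotation angles might look like the sketch below. It is only one possible design: the `CubeRotation` type, the function name, the sensitivity, and the pitch limit are all assumptions. It relies on SDL reporting `e.tfinger.dx` and `e.tfinger.dy` as normalized values, where 1.0 spans the whole touch surface.

```c
/* Hypothetical state that replaces the time-based angle in the render loop. */
typedef struct {
    float yaw;    /* rotation about Y, radians */
    float pitch;  /* rotation about X, radians */
} CubeRotation;

/* Apply one drag step. dx and dy are normalized deltas such as
 * e.tfinger.dx and e.tfinger.dy from an SDL_FINGERMOTION event. */
static void rotation_drag(CubeRotation *r, float dx, float dy)
{
    const float sensitivity = 3.14159265f; /* assumed: full swipe = half a turn */
    const float pitch_limit = 1.5f;        /* assumed: stop just short of 90 degrees */

    r->yaw   += dx * sensitivity;
    r->pitch += dy * sensitivity;

    /* Clamp pitch so the cube cannot be flipped upside down. */
    if (r->pitch >  pitch_limit) r->pitch =  pitch_limit;
    if (r->pitch < -pitch_limit) r->pitch = -pitch_limit;
}
```

In the event loop this would be called as `rotation_drag(&rot, e.tfinger.dx, e.tfinger.dy)` when `e.type == SDL_FINGERMOTION`, with `rot.yaw` and `rot.pitch` then passed to `mat4_rotate_y` and `mat4_rotate_x` in place of `angle`.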

Challenge 2: Add a Second Cube

Draw two cubes side by side, each with a different rotation speed. This requires a second mat4_translate offset and a second draw call. How does this affect FPS?

Challenge 3: Run on Different Displays

If you have both HDMI and a DSI display, run the cube on each and compare FPS. Both should be identical (same GPU path). What happens if you try to run on the SPI display?


Deliverable

  • [ ] Built the cube incrementally through all 4 steps (triangle → square → rotating → 3D)
  • [ ] Rotating cube running at 60 FPS on the Pi display
  • [ ] FPS and frame time measurements recorded
  • [ ] VSync on/off comparison table filled in
  • [ ] Brief note: one sentence explaining why SDL_GL_SwapWindow blocks when VSync is enabled
