SDL2 + OpenGL ES Rotating Cube
Time estimate: ~45 minutes
Prerequisites: SSH Login, DRM/KMS Test Pattern
Learning Objectives
By the end of this tutorial you will be able to:
- Build and run an OpenGL ES 2.0 application with SDL2 on the Raspberry Pi
- Explain the GPU rendering pipeline: vertex shader, fragment shader, and draw call
- Incrementally build a 3D scene — from a static triangle to a rotating cube
- Use SDL2's KMS/DRM backend for tear-free fullscreen rendering without a compositor
- Measure frame rate and GPU utilization on embedded hardware
GPU Rendering Pipeline and VSync
A rotating 3D cube exercises every layer of the embedded graphics stack. SDL2 opens a DRM/KMS display, OpenGL ES 2.0 compiles vertex and fragment shaders for the GPU, and geometry is rendered into a back buffer. SDL_GL_SwapWindow() triggers a DRM page flip synchronized to VBlank (VSync). This prevents tearing because the display controller switches buffers only during the vertical blanking interval. With VSync enabled, the render loop is capped at the display refresh rate (typically 60 Hz); disabling it lets you measure raw GPU throughput but causes visible horizontal tear lines. Three driver layers translate OpenGL ES calls into GPU command streams and manage buffer allocation: the kernel DRM drivers (vc4 for the display pipeline and, on the Pi 4, v3d for the 3D GPU), the Mesa userspace driver (v3d_dri.so), and the EGL platform.
See also: Graphics Stack reference | Real-Time Graphics reference
Course Source Repository
This tutorial references source files from the course repository. If you haven't cloned it yet on your Pi:
Source files for this tutorial are in ~/embedded-linux/solutions/sdl2-rotating-cube/.
Introduction
A rotating 3D cube is the "hello world" of GPU programming. It exercises every layer of the embedded graphics stack: SDL2 creates a window on the DRM/KMS display, OpenGL ES 2.0 compiles shaders and renders geometry on the GPU, and VSync synchronizes frame presentation with the display refresh.
Instead of dumping all the code at once, this tutorial builds the cube step by step — each step adds one concept with a visible result:
| Step | What You Build | New Concept |
|---|---|---|
| 1 | Static triangle | SDL2 window, shaders, VBO, glDrawArrays |
| 2 | Colored square | Index buffer (IBO), glDrawElements |
| 3 | Rotating square | Matrix math, uniforms, animation |
| 4 | 3D rotating cube | Perspective projection, depth buffer, full MVP chain |
SDL2 Rendering: With and Without a GPU
SDL2 can render in three modes — understanding when the GPU is involved (and when it isn't) is essential:
1. Software renderer (CPU only — no GPU needed)
Your code → SDL_RenderDrawRect() → SDL software rasterizer → pixel buffer in RAM
→ SDL copies buffer to DRM framebuffer → display controller scans out
The CPU does all the drawing. Every SDL_RenderDrawLine(), SDL_RenderCopy(), and similar call writes pixels into a CPU-side buffer. This works on any Linux system with a framebuffer: no GPU driver, no Mesa, no OpenGL. Use SDL_CreateRenderer(win, -1, SDL_RENDERER_SOFTWARE) to force this mode.
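To make "the CPU does all the drawing" concrete, here is a simplified illustration of what a software rasterizer does for a filled rectangle: it loops over the covered pixels and writes one 32-bit value per pixel into a buffer in RAM. The helper name `fill_rect` and the XRGB8888 pixel layout are assumptions for illustration; this is not SDL's own code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill an axis-aligned rectangle in a CPU-side
 * XRGB8888 pixel buffer, clipped to the buffer bounds.
 * pitch is the number of pixels per row (not bytes). */
static void fill_rect(uint32_t *pixels, int buf_w, int buf_h, int pitch,
                      int x, int y, int w, int h, uint32_t color)
{
    int x0 = x < 0 ? 0 : x;
    int y0 = y < 0 ? 0 : y;
    int x1 = x + w > buf_w ? buf_w : x + w;   /* exclusive right edge */
    int y1 = y + h > buf_h ? buf_h : y + h;   /* exclusive bottom edge */

    /* One CPU store per covered pixel: this is the per-pixel cost
     * that the GPU paths below take off the CPU. */
    for (int row = y0; row < y1; row++)
        for (int col = x0; col < x1; col++)
            pixels[(size_t)row * (size_t)pitch + (size_t)col] = color;
}
```

A full 1920×1080 fill means about two million of these stores per frame, which is why the software path is CPU-heavy.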
2. Hardware-accelerated 2D renderer (GPU assists)
Your code → SDL_RenderDrawRect() → SDL translates to OpenGL / OpenGL ES calls
→ Mesa GPU driver → GPU draws into render target → DRM page flip
Same SDL2 API, but the backend uses OpenGL or OpenGL ES internally. SDL_RenderCopy() becomes a GPU texture blit, which is much faster for image scaling, rotation, and alpha blending. Use SDL_CreateRenderer(win, -1, SDL_RENDERER_ACCELERATED). This is what the Level Display tutorial uses.
3. Direct OpenGL ES (full GPU control — this tutorial)
Your code → glDrawArrays() → Mesa compiles shaders → GPU executes pipeline
→ GPU writes to render target → SDL_GL_SwapWindow() → DRM page flip
You bypass SDL2's renderer entirely. You write shaders (vertex + fragment programs) in GLSL, upload geometry to GPU memory, and issue draw calls. The GPU runs your shaders on its massively parallel cores. This is what you need for 3D, custom visual effects, or maximum performance.
When does each make sense?
| Mode | CPU load | GPU load | Use case |
|---|---|---|---|
| Software | High (draws every pixel) | None | Simple UI, no GPU driver, SPI displays |
| Accelerated 2D | Low (sends textures to GPU) | Low | Dashboards, image display, 2D games |
| Direct OpenGL ES | Minimal (uploads geometry) | High (runs shaders) | 3D, particles, data visualization |
Can SDL2 work without a GPU? Yes, the software renderer works everywhere. But on the Pi 4 it leaves the VideoCore VI GPU sitting idle. The hardware-accelerated path (SDL_RENDERER_ACCELERATED) uses the GPU for 2D operations automatically. For 3D, you need OpenGL ES shaders (this tutorial).
What Is a Shader, and Why Does the GPU Need One?
A CPU is designed for complex logic: branch prediction, out-of-order execution, deep caches — great for running your application, but it processes one thing at a time (per core).
A GPU is designed for simple math on massive data: many small cores that all execute the same program on different data simultaneously. A GPU core is much simpler than a CPU core (no branch predictor, small caches), but there are many of them. Desktop GPUs have thousands; the Pi 4's VideoCore VI has far fewer, but it still processes many vertices or pixels at once.
A shader is that "same program" the GPU cores run. You write it in GLSL (OpenGL Shading Language), and the GPU driver (Mesa) compiles it to GPU machine code at runtime:
Your GLSL source code (text)
│
▼
Mesa shader compiler (CPU)
→ compiles to GPU machine code (binary)
│
▼
GPU loads compiled shader
→ runs it on many GPU cores in parallel
→ each core processes one vertex (vertex shader)
or one pixel (fragment shader) independently
Two shader types in OpenGL ES 2.0:
| Shader | Runs per... | Job | Input | Output |
|---|---|---|---|---|
| Vertex shader | vertex (corner) | Position the vertex on screen | 3D coordinates, transform matrix | Clip-space position (gl_Position) |
| Fragment shader | pixel (fragment) | Determine the pixel color | Interpolated values from vertices | RGBA color |
Between them, the rasterizer (fixed GPU hardware, not programmable) determines which pixels are covered by each triangle and interpolates vertex data across them.
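The interpolation the rasterizer performs can be sketched on the CPU with barycentric weights: each vertex's attribute is weighted by how close the pixel is to that vertex. The sketch below uses plain screen-space linear interpolation, which matches what the GPU produces when every vertex has w = 1 (as in the Step 1 triangle); with a perspective projection, real GPUs apply perspective-correct interpolation instead. The type and function names are hypothetical.

```c
/* Hypothetical types for the sketch. */
typedef struct { float x, y; } vec2;
typedef struct { float r, g, b; } rgb;

/* Twice the signed area of triangle (a, b, p). */
static float edge(vec2 a, vec2 b, vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Interpolate per-vertex colors at point p inside triangle v[0..2]. */
static rgb interpolate_color(const vec2 v[3], const rgb c[3], vec2 p)
{
    float area = edge(v[0], v[1], v[2]);
    float w0 = edge(v[1], v[2], p) / area;   /* weight of vertex 0 */
    float w1 = edge(v[2], v[0], p) / area;   /* weight of vertex 1 */
    float w2 = edge(v[0], v[1], p) / area;   /* weight of vertex 2 */
    rgb out = {
        w0 * c[0].r + w1 * c[1].r + w2 * c[2].r,
        w0 * c[0].g + w1 * c[1].g + w2 * c[2].g,
        w0 * c[0].b + w1 * c[1].b + w2 * c[2].b,
    };
    return out;
}
```

At a vertex, that vertex's weight is 1 and the others are 0, so the pixel takes the vertex color exactly; at the centroid, all three weights are 1/3, which is why the middle of a red/green/blue triangle looks greyish.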
Why not just use the CPU? A 1080p frame has ~2 million pixels. The fragment shader runs once per pixel per frame, at 60 FPS = 124 million shader executions per second. The Pi 4's GPU (VideoCore VI) has enough parallel cores to handle this; a single CPU core cannot.
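The arithmetic behind that claim, as a small worked sketch. It assumes a full-screen draw with no overdraw (one fragment per pixel) and, for the per-fragment budget, an assumed 1.5 GHz CPU clock; both helper names are hypothetical.

```c
#include <stdint.h>

/* Fragment-shader invocations per second for a full-screen draw,
 * assuming each pixel is covered exactly once (no overdraw). */
static uint64_t fragments_per_second(uint32_t width, uint32_t height,
                                     uint32_t fps)
{
    return (uint64_t)width * height * fps;
}

/* Clock cycles one CPU core would have per fragment if it did all the
 * shading alone (ignores everything else the core must do). */
static double cycles_per_fragment(double cpu_hz, uint64_t frags_per_sec)
{
    return cpu_hz / (double)frags_per_sec;
}
```

For 1920×1080 at 60 FPS, `fragments_per_second` gives 124,416,000, and at an assumed 1.5 GHz that leaves roughly 12 cycles per fragment, too few for even a trivial shader plus the memory traffic to write each pixel.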
1. Install Dependencies
Concept: SDL2 provides the window and input layer. OpenGL ES 2.0 provides the GPU rendering API. Both are available as system packages.
Verify the OpenGL ES library exists:
ls /usr/lib/arm-linux-gnueabihf/libGLESv2.so* 2>/dev/null || \
ls /usr/lib/aarch64-linux-gnu/libGLESv2.so* 2>/dev/null
The GPU Driver Stack on the Pi
Three layers make GPU rendering work on the Pi 4:
- Kernel DRM drivers (vc4 and v3d): vc4 drives the display pipeline (display timing, scanout, page flips), and on the Pi 4 v3d drives the 3D GPU and its buffer allocation. Both are loaded by dtoverlay=vc4-kms-v3d in config.txt. Source: drivers/gpu/drm/vc4/ and drivers/gpu/drm/v3d/.
- Mesa userspace driver (v3d_dri.so): translates OpenGL ES API calls into GPU command streams. The libgles2-mesa-dev package provides the headers; the runtime library (libGLESv2.so) comes from Mesa.
- EGL platform: connects OpenGL ES to the display. On KMS/DRM, this is Mesa's EGL using the egl_dri2 driver, which opens /dev/dri/renderD128 for GPU rendering and a /dev/dri/card* node (card0 or card1, depending on driver probe order) for display output.
SDL2's kmsdrm video backend ties these together: it opens the DRM device, creates a GBM (Generic Buffer Manager) surface, and binds an EGL context to it. When you call SDL_GL_SwapWindow(), SDL calls eglSwapBuffers(), takes the newly finished buffer from the GBM surface, and asks DRM to page-flip it to the display at the next VBlank.
For custom images: You need the kernel with CONFIG_DRM_VC4=y and CONFIG_DRM_V3D=y, the Mesa userspace with v3d Gallium driver, and the vc4-kms-v3d overlay enabled. In Buildroot, enable BR2_PACKAGE_MESA3D with the v3d and vc4 Gallium drivers.
Checkpoint
cmake --version and pkg-config --cflags sdl2 both return valid output.
2. Project Setup
Concept: A minimal SDL2 + GLES2 project needs only two files: a CMake build file and the C source.
Create the project directory:
mkdir -p ~/sdl2-cube && cd ~/sdl2-cube
CMakeLists.txt
cat > CMakeLists.txt << 'EOF'
cmake_minimum_required(VERSION 3.16)
project(sdl2_cube C)
set(CMAKE_C_STANDARD 11)
find_package(SDL2 REQUIRED)
add_executable(sdl2_cube main.c)
target_include_directories(sdl2_cube PRIVATE ${SDL2_INCLUDE_DIRS})
target_link_libraries(sdl2_cube PRIVATE ${SDL2_LIBRARIES} GLESv2 m)
EOF
Checkpoint
You have CMakeLists.txt in ~/sdl2-cube/. The source file main.c will be created in the next step.
3. Step 1 — Static Triangle
Goal: Get pixels on screen with the absolute minimum OpenGL ES code.
You will write: an SDL2 window, a vertex shader, a fragment shader, one vertex buffer, and a single glDrawArrays call.
What Is a Shader?
A shader is a small program that runs on the GPU. OpenGL ES 2.0 requires two:
Vertex Data (3 corners)
│
▼
Vertex Shader (runs per vertex)
→ Positions each corner on screen
→ Passes color to the next stage
│
▼
Rasterizer (GPU hardware)
→ Fills in pixels between vertices
→ Interpolates colors smoothly
│
▼
Fragment Shader (runs per pixel)
→ Outputs the final color
│
▼
Framebuffer → Display
You write the vertex and fragment shaders in GLSL (OpenGL Shading Language). The rasterizer is fixed hardware — you don't program it.
Clip-Space Coordinates
Without a projection matrix, the vertex shader outputs clip-space coordinates with w = 1, so they are already normalized device coordinates (NDC):
- X: −1 (left edge) to +1 (right edge)
- Y: −1 (bottom) to +1 (top)
- Z: −1 to +1 (depth, ignored for now)
A triangle at (0, 0.5), (−0.5, −0.5), (0.5, −0.5) sits centered in the window. No matrix math needed — coordinates map directly to the screen.
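The final step from these coordinates to pixels is the viewport transform configured by glViewport(). A minimal sketch of that mapping follows; `ndc_to_window` and `pixel_pos` are hypothetical names. OpenGL's window origin is the bottom-left corner, so +Y points up.

```c
/* Hypothetical type for the sketch. */
typedef struct { float x, y; } pixel_pos;

/* Map normalized device coordinates to window pixels for
 * glViewport(vx, vy, vw, vh): -1 maps to the left/bottom edge,
 * +1 to the right/top edge. */
static pixel_pos ndc_to_window(float ndc_x, float ndc_y,
                               int vx, int vy, int vw, int vh)
{
    pixel_pos p;
    p.x = (ndc_x + 1.0f) * 0.5f * (float)vw + (float)vx;
    p.y = (ndc_y + 1.0f) * 0.5f * (float)vh + (float)vy;
    return p;
}
```

On an 800×480 viewport at (0, 0), the top vertex (0, 0.5) lands at pixel (400, 360), and the bottom corners (−0.5, −0.5) and (0.5, −0.5) land at (200, 120) and (600, 120).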
Create main.c:
cat > main.c << 'MAIN_EOF'
#include <SDL2/SDL.h>
#include <GLES2/gl2.h>
#include <stdio.h>
/* ── Shader helpers ─────────────────────────────────── */
static GLuint compile_shader(GLenum type, const char *src)
{
GLuint s = glCreateShader(type);
glShaderSource(s, 1, &src, NULL);
glCompileShader(s);
GLint ok = 0;
glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
if (!ok) {
char log[512];
glGetShaderInfoLog(s, sizeof(log), NULL, log);
fprintf(stderr, "Shader error: %s\n", log);
return 0;
}
return s;
}
static GLuint link_program(GLuint vs, GLuint fs)
{
GLuint p = glCreateProgram();
glAttachShader(p, vs);
glAttachShader(p, fs);
glLinkProgram(p);
GLint ok = 0;
glGetProgramiv(p, GL_LINK_STATUS, &ok);
if (!ok) {
char log[512];
glGetProgramInfoLog(p, sizeof(log), NULL, log);
fprintf(stderr, "Link error: %s\n", log);
return 0;
}
return p;
}
/* ── Main ───────────────────────────────────────────── */
int main(int argc, char **argv)
{
(void)argc; (void)argv;
if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS) != 0) {
fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
return 1;
}
/* Request OpenGL ES 2.0 context */
SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 2);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
int w = 800, h = 480;
SDL_Window *win = SDL_CreateWindow("Step 1: Triangle",
SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
w, h, SDL_WINDOW_OPENGL | SDL_WINDOW_FULLSCREEN_DESKTOP);
if (!win) {
fprintf(stderr, "Window: %s\n", SDL_GetError());
SDL_Quit(); return 1;
}
SDL_GLContext ctx = SDL_GL_CreateContext(win);
if (!ctx) {
fprintf(stderr, "GL context: %s\n", SDL_GetError());
SDL_DestroyWindow(win); SDL_Quit(); return 1;
}
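SDL_GL_SetSwapInterval(1); /* VSync ON */
/* A fullscreen window takes the display's size, not the 800x480 requested */
SDL_GL_GetDrawableSize(win, &w, &h);
printf("GL Renderer: %s\n", (const char *)glGetString(GL_RENDERER));
printf("GL Version: %s\n", (const char *)glGetString(GL_VERSION));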
/* ── Shaders (no matrix — clip-space coordinates) ── */
const char *vs_src =
"attribute vec3 aPos;\n"
"attribute vec3 aCol;\n"
"varying vec3 vCol;\n"
"void main() {\n"
" vCol = aCol;\n"
" gl_Position = vec4(aPos, 1.0);\n"
"}\n";
const char *fs_src =
"precision mediump float;\n"
"varying vec3 vCol;\n"
"void main() {\n"
" gl_FragColor = vec4(vCol, 1.0);\n"
"}\n";
GLuint vs = compile_shader(GL_VERTEX_SHADER, vs_src);
GLuint fs = compile_shader(GL_FRAGMENT_SHADER, fs_src);
GLuint prog = link_program(vs, fs);
glDeleteShader(vs);
glDeleteShader(fs);
if (!prog) return 1;
GLint loc_pos = glGetAttribLocation(prog, "aPos");
GLint loc_col = glGetAttribLocation(prog, "aCol");
/* ── Triangle geometry: 3 vertices × (position + color) ── */
const float verts[] = {
/* x y z r g b */
0.0f, 0.5f, 0.0f, 1.0f, 0.0f, 0.0f, /* top — red */
-0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 0.0f, /* left — green */
0.5f, -0.5f, 0.0f, 0.0f, 0.0f, 1.0f, /* right — blue */
};
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
/* ── Render loop ── */
int running = 1;
while (running) {
SDL_Event e;
while (SDL_PollEvent(&e)) {
if (e.type == SDL_QUIT) running = 0;
if (e.type == SDL_KEYDOWN && e.key.keysym.sym == SDLK_ESCAPE)
running = 0;
}
glViewport(0, 0, w, h);
glClearColor(0.12f, 0.12f, 0.14f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
glUseProgram(prog);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableVertexAttribArray((GLuint)loc_pos);
glVertexAttribPointer((GLuint)loc_pos, 3, GL_FLOAT, GL_FALSE,
6 * sizeof(float), (void *)0);
glEnableVertexAttribArray((GLuint)loc_col);
glVertexAttribPointer((GLuint)loc_col, 3, GL_FLOAT, GL_FALSE,
6 * sizeof(float), (void *)(3 * sizeof(float)));
glDrawArrays(GL_TRIANGLES, 0, 3);
SDL_GL_SwapWindow(win);
}
glDeleteProgram(prog);
glDeleteBuffers(1, &vbo);
SDL_GL_DeleteContext(ctx);
SDL_DestroyWindow(win);
SDL_Quit();
return 0;
}
MAIN_EOF
Build and Run
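Build with CMake, then run on KMS/DRM (no desktop):
cmake -B build && cmake --build build
SDL_VIDEODRIVER=kmsdrm ./build/sdl2_cube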
Or on a desktop session (X11/Wayland), just ./build/sdl2_cube. Press Escape to exit.
Checkpoint
A rainbow triangle (red/green/blue vertices with smooth gradient fill) is visible on the display.
Stuck?
- "No available video device": set SDL_VIDEODRIVER=kmsdrm and ensure no other application is using the DRM device.
- Black screen, no triangle: check the GL Renderer output. If it says "llvmpipe" or "Software Rasterizer", the GPU driver is not working. Ensure dtoverlay=vc4-kms-v3d is in /boot/firmware/config.txt.
4. Step 2 — Square from Two Triangles
Goal: Draw a square by reusing vertices with an index buffer.
OpenGL ES rasterizes only points, lines, and triangles; there is no quad primitive. A square therefore needs two triangles, and they share two vertices. Without an index buffer, you'd duplicate those vertices. With an index buffer, you define 4 unique vertices and tell the GPU which three to use for each triangle.
Why Index Buffers?
A square has 4 corners, but two triangles need 6 vertex references:
0 ──────── 1     Triangle 1: vertices 0, 1, 2
│ \        │     Triangle 2: vertices 2, 3, 0
│   \      │
│     \    │     Index buffer: [0, 1, 2, 2, 3, 0]
│       \  │
3 ──────── 2     4 vertices stored, 6 indices reference them
For a cube with 8 vertices and 12 triangles (36 index entries), the savings are even larger. Index buffers also let the GPU reuse transformed vertices: vertices 0 and 2 appear in both triangles, and if a vertex's result is still in the GPU's post-transform cache, the vertex shader does not need to run again for it.
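To see what the index buffer saves, the sketch below "un-indexes" a mesh into the flat vertex array that glDrawArrays would need, and counts the bytes uploaded each way. `expand_indexed`, `indexed_bytes`, and `expanded_bytes` are hypothetical helpers, and `unsigned short` stands in for GLushort so the example compiles without GL headers.

```c
#include <stddef.h>
#include <string.h>

/* Copy verts[indices[i]] to out[i] for every index: the vertex array
 * you would need if there were no index buffer. */
static void expand_indexed(const float *verts, size_t floats_per_vertex,
                           const unsigned short *indices, size_t index_count,
                           float *out)
{
    for (size_t i = 0; i < index_count; i++)
        memcpy(out + i * floats_per_vertex,
               verts + (size_t)indices[i] * floats_per_vertex,
               floats_per_vertex * sizeof(float));
}

/* Bytes uploaded with an index buffer: unique vertices + 16-bit indices. */
static size_t indexed_bytes(size_t vertex_count, size_t floats_per_vertex,
                            size_t index_count)
{
    return vertex_count * floats_per_vertex * sizeof(float)
         + index_count * sizeof(unsigned short);
}

/* Bytes uploaded without one: every index becomes a full vertex copy. */
static size_t expanded_bytes(size_t floats_per_vertex, size_t index_count)
{
    return index_count * floats_per_vertex * sizeof(float);
}
```

For the Step 4 cube (8 vertices of 6 floats, 36 indices), that is 264 bytes indexed versus 864 bytes expanded, assuming 4-byte floats and 2-byte indices.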
What changes from Step 1:
- +1 vertex — add bottom-left corner (yellow)
- +index buffer — 6 indices defining two triangles
- glDrawArrays → glDrawElements: draw using the index buffer
Replace main.c:
cat > main.c << 'MAIN_EOF'
#include <SDL2/SDL.h>
#include <GLES2/gl2.h>
#include <stdio.h>
/* ── Shader helpers ─────────────────────────────────── */
static GLuint compile_shader(GLenum type, const char *src)
{
GLuint s = glCreateShader(type);
glShaderSource(s, 1, &src, NULL);
glCompileShader(s);
GLint ok = 0;
glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
if (!ok) {
char log[512];
glGetShaderInfoLog(s, sizeof(log), NULL, log);
fprintf(stderr, "Shader error: %s\n", log);
return 0;
}
return s;
}
static GLuint link_program(GLuint vs, GLuint fs)
{
GLuint p = glCreateProgram();
glAttachShader(p, vs);
glAttachShader(p, fs);
glLinkProgram(p);
GLint ok = 0;
glGetProgramiv(p, GL_LINK_STATUS, &ok);
if (!ok) {
char log[512];
glGetProgramInfoLog(p, sizeof(log), NULL, log);
fprintf(stderr, "Link error: %s\n", log);
return 0;
}
return p;
}
/* ── Main ───────────────────────────────────────────── */
int main(int argc, char **argv)
{
(void)argc; (void)argv;
if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS) != 0) {
fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
return 1;
}
SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 2);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
int w = 800, h = 480;
SDL_Window *win = SDL_CreateWindow("Step 2: Square",
SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
w, h, SDL_WINDOW_OPENGL | SDL_WINDOW_FULLSCREEN_DESKTOP);
if (!win) {
fprintf(stderr, "Window: %s\n", SDL_GetError());
SDL_Quit(); return 1;
}
SDL_GLContext ctx = SDL_GL_CreateContext(win);
if (!ctx) {
fprintf(stderr, "GL context: %s\n", SDL_GetError());
SDL_DestroyWindow(win); SDL_Quit(); return 1;
}
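SDL_GL_SetSwapInterval(1);
/* A fullscreen window takes the display's size, not the 800x480 requested */
SDL_GL_GetDrawableSize(win, &w, &h);
printf("GL Renderer: %s\n", (const char *)glGetString(GL_RENDERER));
printf("GL Version: %s\n", (const char *)glGetString(GL_VERSION));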
/* ── Shaders (still no matrix) ── */
const char *vs_src =
"attribute vec3 aPos;\n"
"attribute vec3 aCol;\n"
"varying vec3 vCol;\n"
"void main() {\n"
" vCol = aCol;\n"
" gl_Position = vec4(aPos, 1.0);\n"
"}\n";
const char *fs_src =
"precision mediump float;\n"
"varying vec3 vCol;\n"
"void main() {\n"
" gl_FragColor = vec4(vCol, 1.0);\n"
"}\n";
GLuint vs = compile_shader(GL_VERTEX_SHADER, vs_src);
GLuint fs = compile_shader(GL_FRAGMENT_SHADER, fs_src);
GLuint prog = link_program(vs, fs);
glDeleteShader(vs);
glDeleteShader(fs);
if (!prog) return 1;
GLint loc_pos = glGetAttribLocation(prog, "aPos");
GLint loc_col = glGetAttribLocation(prog, "aCol");
/* ── Square geometry: 4 vertices × (position + color) ── */
const float verts[] = {
/* x y z r g b */
-0.5f, 0.5f, 0.0f, 1.0f, 0.0f, 0.0f, /* top-left — red */
0.5f, 0.5f, 0.0f, 0.0f, 1.0f, 0.0f, /* top-right — green */
0.5f, -0.5f, 0.0f, 0.0f, 0.0f, 1.0f, /* bottom-right — blue */
-0.5f, -0.5f, 0.0f, 1.0f, 1.0f, 0.0f, /* bottom-left — yellow */
};
/* Two triangles sharing vertices 0-1-2 and 2-3-0 */
const GLushort indices[] = {
0, 1, 2,
2, 3, 0,
};
GLuint vbo, ibo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
glGenBuffers(1, &ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);
/* ── Render loop ── */
int running = 1;
while (running) {
SDL_Event e;
while (SDL_PollEvent(&e)) {
if (e.type == SDL_QUIT) running = 0;
if (e.type == SDL_KEYDOWN && e.key.keysym.sym == SDLK_ESCAPE)
running = 0;
}
glViewport(0, 0, w, h);
glClearColor(0.12f, 0.12f, 0.14f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
glUseProgram(prog);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glEnableVertexAttribArray((GLuint)loc_pos);
glVertexAttribPointer((GLuint)loc_pos, 3, GL_FLOAT, GL_FALSE,
6 * sizeof(float), (void *)0);
glEnableVertexAttribArray((GLuint)loc_col);
glVertexAttribPointer((GLuint)loc_col, 3, GL_FLOAT, GL_FALSE,
6 * sizeof(float), (void *)(3 * sizeof(float)));
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (void *)0);
SDL_GL_SwapWindow(win);
}
glDeleteProgram(prog);
glDeleteBuffers(1, &vbo);
glDeleteBuffers(1, &ibo);
SDL_GL_DeleteContext(ctx);
SDL_DestroyWindow(win);
SDL_Quit();
return 0;
}
MAIN_EOF
Rebuild and run:
Checkpoint
A colored square (red, green, blue, yellow corners with smooth gradient) fills the center of the screen. The diagonal seam where the two triangles meet may be faintly visible in the color gradient.
5. Step 3 — Rotating Square
Goal: Make the square spin by sending a rotation matrix to the GPU every frame.
Rotation Matrices
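A 2D rotation by angle θ uses cosine and sine:
x' = x·cos θ − y·sin θ
y' = x·sin θ + y·cos θ
In 3D, we embed this into a 4×4 matrix. A Y-axis rotation spins the object left and right: it mixes X and Z and leaves Y unchanged:
[  cos θ   0   sin θ   0 ]
[    0     1     0     0 ]
[ −sin θ   0   cos θ   0 ]
[    0     0     0     1 ]
glUniformMatrix4fv() reads matrices in column-major order (the element at row r, column c is m[c*4 + r]), so m[0]=cos, m[2]=-sin, m[8]=sin, m[10]=cos.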
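To check the column-major indexing, the sketch below applies a 4×4 matrix to a vector the same way the vertex shader's `uMVP * vec4(aPos, 1.0)` does. The Y rotation by 90° is written out by hand (cos 90° = 0, sin 90° = 1) in the same positions mat4_rotate_y uses. `mat4_mul_vec4` and `ry_90` are hypothetical names, not part of main.c.

```c
/* Multiply a column-major 4x4 matrix by a 4-component vector.
 * Element (row r, column c) lives at m[c * 4 + r]. out must not alias v. */
static void mat4_mul_vec4(float out[4], const float m[16], const float v[4])
{
    for (int r = 0; r < 4; r++)
        out[r] = m[0 * 4 + r] * v[0]
               + m[1 * 4 + r] * v[1]
               + m[2 * 4 + r] * v[2]
               + m[3 * 4 + r] * v[3];
}

/* Y rotation by 90 degrees: m[0]=cos=0, m[2]=-sin=-1,
 * m[8]=sin=1, m[10]=cos=0, identity elsewhere. */
static const float ry_90[16] = {
    0.f, 0.f, -1.f, 0.f,   /* column 0 */
    0.f, 1.f,  0.f, 0.f,   /* column 1 */
    1.f, 0.f,  0.f, 0.f,   /* column 2 */
    0.f, 0.f,  0.f, 1.f,   /* column 3 */
};
```

Applying `ry_90` to the +X axis (1, 0, 0, 1) yields (0, 0, −1, 1): every X coordinate is turned into depth, which is why the square in this step collapses to an edge-on sliver at 90°.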
Uniforms: CPU → GPU Per-Frame Data
Attributes (like aPos, aCol) vary per vertex — they come from the vertex buffer.
Uniforms (like uMVP) are constant for the entire draw call — set once by the CPU, read by every vertex shader invocation. The rotation matrix changes each frame, so we upload it as a uniform with glUniformMatrix4fv().
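OpenGL ES 2.0 requires the transpose argument of glUniformMatrix4fv() to be GL_FALSE, so the driver cannot transpose a matrix for you. If you keep a matrix in row-major order (for example, typed in the way it is written on paper), transpose it on the CPU before uploading. A minimal sketch; `mat4_transpose` is a hypothetical helper, not part of this tutorial's main.c.

```c
#include <string.h>

/* Convert row-major storage (in[r * 4 + c]) to the column-major storage
 * (out[c * 4 + r]) that glUniformMatrix4fv expects. Works through a
 * temporary, so out may equal in. */
static void mat4_transpose(float out[16], const float in[16])
{
    float t[16];
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            t[c * 4 + r] = in[r * 4 + c];
    memcpy(out, t, sizeof(t));
}
```

A translation written row-major has x, y, z in the last column of each row; after transposing, they sit at out[12], out[13], out[14], the layout the column-major helpers in this tutorial produce.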
What changes from Step 2:
- +vertex shader uniform: declare uniform mat4 uMVP and change gl_Position = vec4(aPos, 1.0) to gl_Position = uMVP * vec4(aPos, 1.0)
- +two matrix functions: mat4_identity and mat4_rotate_y
- +animation loop: compute the angle from elapsed time, build the matrix, upload it to the GPU
Replace main.c:
cat > main.c << 'MAIN_EOF'
#include <SDL2/SDL.h>
#include <GLES2/gl2.h>
#include <math.h>
#include <stdio.h>
#include <string.h>
/* ── Shader helpers ─────────────────────────────────── */
static GLuint compile_shader(GLenum type, const char *src)
{
GLuint s = glCreateShader(type);
glShaderSource(s, 1, &src, NULL);
glCompileShader(s);
GLint ok = 0;
glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
if (!ok) {
char log[512];
glGetShaderInfoLog(s, sizeof(log), NULL, log);
fprintf(stderr, "Shader error: %s\n", log);
return 0;
}
return s;
}
static GLuint link_program(GLuint vs, GLuint fs)
{
GLuint p = glCreateProgram();
glAttachShader(p, vs);
glAttachShader(p, fs);
glLinkProgram(p);
GLint ok = 0;
glGetProgramiv(p, GL_LINK_STATUS, &ok);
if (!ok) {
char log[512];
glGetProgramInfoLog(p, sizeof(log), NULL, log);
fprintf(stderr, "Link error: %s\n", log);
return 0;
}
return p;
}
/* ── Matrix math (column-major, OpenGL convention) ──── */
static void mat4_identity(float m[16])
{
memset(m, 0, 16 * sizeof(float));
m[0] = m[5] = m[10] = m[15] = 1.0f;
}
static void mat4_rotate_y(float m[16], float a)
{
mat4_identity(m);
m[0] = cosf(a); m[8] = sinf(a);
m[2] = -sinf(a); m[10] = cosf(a);
}
/* ── Main ───────────────────────────────────────────── */
int main(int argc, char **argv)
{
(void)argc; (void)argv;
if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS) != 0) {
fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
return 1;
}
SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 2);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
int w = 800, h = 480;
SDL_Window *win = SDL_CreateWindow("Step 3: Rotating Square",
SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
w, h, SDL_WINDOW_OPENGL | SDL_WINDOW_FULLSCREEN_DESKTOP);
if (!win) {
fprintf(stderr, "Window: %s\n", SDL_GetError());
SDL_Quit(); return 1;
}
SDL_GLContext ctx = SDL_GL_CreateContext(win);
if (!ctx) {
fprintf(stderr, "GL context: %s\n", SDL_GetError());
SDL_DestroyWindow(win); SDL_Quit(); return 1;
}
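SDL_GL_SetSwapInterval(1);
/* A fullscreen window takes the display's size, not the 800x480 requested */
SDL_GL_GetDrawableSize(win, &w, &h);
printf("GL Renderer: %s\n", (const char *)glGetString(GL_RENDERER));
printf("GL Version: %s\n", (const char *)glGetString(GL_VERSION));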
/* ── Shaders (NOW with uMVP matrix) ── */
const char *vs_src =
"attribute vec3 aPos;\n"
"attribute vec3 aCol;\n"
"uniform mat4 uMVP;\n"
"varying vec3 vCol;\n"
"void main() {\n"
" vCol = aCol;\n"
" gl_Position = uMVP * vec4(aPos, 1.0);\n"
"}\n";
const char *fs_src =
"precision mediump float;\n"
"varying vec3 vCol;\n"
"void main() {\n"
" gl_FragColor = vec4(vCol, 1.0);\n"
"}\n";
GLuint vs = compile_shader(GL_VERTEX_SHADER, vs_src);
GLuint fs = compile_shader(GL_FRAGMENT_SHADER, fs_src);
GLuint prog = link_program(vs, fs);
glDeleteShader(vs);
glDeleteShader(fs);
if (!prog) return 1;
GLint loc_pos = glGetAttribLocation(prog, "aPos");
GLint loc_col = glGetAttribLocation(prog, "aCol");
GLint loc_mvp = glGetUniformLocation(prog, "uMVP");
/* ── Square geometry (same as Step 2) ── */
const float verts[] = {
-0.5f, 0.5f, 0.0f, 1.0f, 0.0f, 0.0f,
0.5f, 0.5f, 0.0f, 0.0f, 1.0f, 0.0f,
0.5f, -0.5f, 0.0f, 0.0f, 0.0f, 1.0f,
-0.5f, -0.5f, 0.0f, 1.0f, 1.0f, 0.0f,
};
const GLushort indices[] = {
0, 1, 2,
2, 3, 0,
};
GLuint vbo, ibo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
glGenBuffers(1, &ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);
/* ── Render loop ── */
Uint64 t_start = SDL_GetPerformanceCounter();
double freq = (double)SDL_GetPerformanceFrequency();
int running = 1;
while (running) {
SDL_Event e;
while (SDL_PollEvent(&e)) {
if (e.type == SDL_QUIT) running = 0;
if (e.type == SDL_KEYDOWN && e.key.keysym.sym == SDLK_ESCAPE)
running = 0;
}
double sec = (double)(SDL_GetPerformanceCounter() - t_start) / freq;
float angle = (float)sec;
glViewport(0, 0, w, h);
glClearColor(0.12f, 0.12f, 0.14f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
/* Build rotation matrix */
float MVP[16];
mat4_rotate_y(MVP, angle);
glUseProgram(prog);
glUniformMatrix4fv(loc_mvp, 1, GL_FALSE, MVP);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glEnableVertexAttribArray((GLuint)loc_pos);
glVertexAttribPointer((GLuint)loc_pos, 3, GL_FLOAT, GL_FALSE,
6 * sizeof(float), (void *)0);
glEnableVertexAttribArray((GLuint)loc_col);
glVertexAttribPointer((GLuint)loc_col, 3, GL_FLOAT, GL_FALSE,
6 * sizeof(float), (void *)(3 * sizeof(float)));
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (void *)0);
SDL_GL_SwapWindow(win);
}
glDeleteProgram(prog);
glDeleteBuffers(1, &vbo);
glDeleteBuffers(1, &ibo);
SDL_GL_DeleteContext(ctx);
SDL_DestroyWindow(win);
SDL_Quit();
return 0;
}
MAIN_EOF
Rebuild and run:
Checkpoint
The square spins around the Y axis. When it rotates 90°, it appears as a thin line (you're seeing the square edge-on). This is real 3D rotation — the Y-rotation matrix compresses the X coordinates as the square turns.
6. Step 4 — 3D Rotating Cube
Goal: Turn the flat rotating square into a full 3D cube with proper perspective.
The Model-View-Projection Pipeline
3D rendering uses a chain of matrix multiplications:
Projection matrix          View matrix           Model matrix
(3D → 2D perspective)  ×  (position camera)  ×  (rotate object)
   Perspective         ×     Translate       ×     Ry · Rx
 (60° field of view)       (move back -5)         (spin cube)
                              ↓
                          MVP matrix
                              ↓
              gl_Position = uMVP * vec4(aPos, 1.0)
Read the product right to left: each vertex is rotated first, then moved away from the camera, then projected.
Each matrix transforms coordinates into a different "space":
- Model space → World space (rotation positions the cube in the scene)
- World space → View space (translation acts as camera placement)
- View space → Clip space (perspective makes far objects smaller)
We multiply all three into one MVP matrix on the CPU and send it to the GPU as a single uniform, so the vertex shader does only one matrix-vector multiply per vertex.
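Because matrix multiplication is associative, transforming a vertex by the single precomputed product gives the same result as applying each matrix in turn. The sketch below checks this for a hand-written 90° Y rotation and a translation of −5 along Z (illustrative values only). `mat4_mul` is copied from Step 4's main.c; `mat4_mul_vec4` is a hypothetical helper.

```c
#include <string.h>

/* out = a * b, column-major (copied from Step 4). */
static void mat4_mul(float out[16], const float a[16], const float b[16])
{
    float r[16];
    for (int c = 0; c < 4; c++)
        for (int row = 0; row < 4; row++)
            r[c * 4 + row] = a[0 * 4 + row] * b[c * 4 + 0]
                           + a[1 * 4 + row] * b[c * 4 + 1]
                           + a[2 * 4 + row] * b[c * 4 + 2]
                           + a[3 * 4 + row] * b[c * 4 + 3];
    memcpy(out, r, sizeof(r));
}

/* out = m * v, column-major; out must not alias v. */
static void mat4_mul_vec4(float out[4], const float m[16], const float v[4])
{
    for (int row = 0; row < 4; row++)
        out[row] = m[0 * 4 + row] * v[0] + m[1 * 4 + row] * v[1]
                 + m[2 * 4 + row] * v[2] + m[3 * 4 + row] * v[3];
}
```

With R the 90° Y rotation and T the translation, `mat4_mul(M, T, R)` followed by one `mat4_mul_vec4` on (1, 0, 0, 1) gives (0, 0, −6, 1), identical to rotating first and translating second. The GPU only ever sees the combined matrix.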
The Depth Buffer
A cube has 6 faces. When two faces overlap on screen, which one is in front? The depth buffer (also called Z-buffer) stores the depth of each pixel. Before writing a pixel, the GPU checks: "Is this pixel closer than what's already there?" If not, it's discarded.
Without glEnable(GL_DEPTH_TEST), back faces would draw over front faces depending on draw order — the cube would look broken. We also request a 24-bit depth buffer with SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, 24) and clear it each frame with GL_DEPTH_BUFFER_BIT.
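The per-pixel decision the depth test makes can be modeled in a few lines. This is a software sketch of OpenGL's default behavior, not GPU code: glClearDepth defaults to 1.0 (the far plane), glDepthFunc defaults to GL_LESS, and depth writes are enabled by default. The helper names are hypothetical.

```c
#include <stddef.h>

/* Equivalent of glClear(GL_DEPTH_BUFFER_BIT) with the default clear
 * depth of 1.0 (farthest possible). */
static void depth_clear(float *depth, size_t n)
{
    for (size_t i = 0; i < n; i++)
        depth[i] = 1.0f;
}

/* GL_LESS: keep the fragment only if it is strictly closer than what is
 * stored. Returns 1 (and records z) if it passes, 0 if discarded. */
static int depth_test_less(float *depth, size_t idx, float z)
{
    if (z < depth[idx]) {
        depth[idx] = z;   /* depth writes are on by default */
        return 1;
    }
    return 0;
}
```

Whichever order the faces arrive in, the closest fragment wins each pixel. Without the test, the face drawn last simply overwrites the pixel, which is exactly the broken-cube effect described above.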
What changes from Step 3:
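- +4 matrix functions: mat4_mul, mat4_perspective, mat4_translate, mat4_rotate_x
- +depth buffer: SDL_GL_DEPTH_SIZE, glEnable(GL_DEPTH_TEST), clear GL_DEPTH_BUFFER_BIT
- 8 vertices, 36 indices: a full cube whose faces blend the colors of their four corners (a solid color per face would need 24 vertices, because a shared corner can carry only one color)
- Full MVP chain: perspective × translate × rotate_y × rotate_x
- Resize handling: SDL_WINDOWEVENT_SIZE_CHANGED updates the viewport size and aspect ratio
- FPS counter: reports performance every 2 seconds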
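The FPS counter's arithmetic, pulled out into a standalone sketch. It mirrors the calculation in the render loop below: elapsed performance-counter ticks divided by the counter frequency give seconds, from which frames per second and milliseconds per frame follow. `fps_stats` is a hypothetical helper.

```c
#include <stdint.h>

/* elapsed_ticks: difference of two SDL_GetPerformanceCounter() readings;
 * freq: SDL_GetPerformanceFrequency() as a double. */
static void fps_stats(int frames, uint64_t elapsed_ticks, double freq,
                      double *fps, double *ms_per_frame)
{
    double dt = (double)elapsed_ticks / freq;   /* seconds */
    *fps = frames / dt;
    *ms_per_frame = dt / frames * 1000.0;
}
```

With VSync on, this reports roughly the display refresh rate (60 FPS and about 16.7 ms per frame at 60 Hz) as long as the GPU keeps up. Calling SDL_GL_SetSwapInterval(0) instead of 1 removes the cap, so the numbers reflect raw rendering throughput.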
Replace main.c:
cat > main.c << 'MAIN_EOF'
#include <SDL2/SDL.h>
#include <GLES2/gl2.h>
#include <math.h>
#include <stdio.h>
#include <string.h>
/* ── Shader helpers ─────────────────────────────────── */
static GLuint compile_shader(GLenum type, const char *src)
{
GLuint s = glCreateShader(type);
glShaderSource(s, 1, &src, NULL);
glCompileShader(s);
GLint ok = 0;
glGetShaderiv(s, GL_COMPILE_STATUS, &ok);
if (!ok) {
char log[512];
glGetShaderInfoLog(s, sizeof(log), NULL, log);
fprintf(stderr, "Shader error: %s\n", log);
return 0;
}
return s;
}
static GLuint link_program(GLuint vs, GLuint fs)
{
GLuint p = glCreateProgram();
glAttachShader(p, vs);
glAttachShader(p, fs);
glLinkProgram(p);
GLint ok = 0;
glGetProgramiv(p, GL_LINK_STATUS, &ok);
if (!ok) {
char log[512];
glGetProgramInfoLog(p, sizeof(log), NULL, log);
fprintf(stderr, "Link error: %s\n", log);
return 0;
}
return p;
}
/* ── Matrix math (column-major, OpenGL convention) ──── */
static void mat4_identity(float m[16])
{
memset(m, 0, 16 * sizeof(float));
m[0] = m[5] = m[10] = m[15] = 1.0f;
}
static void mat4_mul(float out[16], const float a[16], const float b[16])
{
float r[16];
for (int c = 0; c < 4; c++)
for (int row = 0; row < 4; row++)
r[c * 4 + row] = a[0 * 4 + row] * b[c * 4 + 0]
+ a[1 * 4 + row] * b[c * 4 + 1]
+ a[2 * 4 + row] * b[c * 4 + 2]
+ a[3 * 4 + row] * b[c * 4 + 3];
memcpy(out, r, sizeof(r));
}
static void mat4_perspective(float m[16], float fovy_rad,
float aspect, float znear, float zfar)
{
float f = 1.0f / tanf(fovy_rad * 0.5f);
memset(m, 0, 16 * sizeof(float));
m[0] = f / aspect;
m[5] = f;
m[10] = (zfar + znear) / (znear - zfar);
m[11] = -1.0f;
m[14] = (2.0f * zfar * znear) / (znear - zfar);
}
static void mat4_translate(float m[16], float x, float y, float z)
{
mat4_identity(m);
m[12] = x; m[13] = y; m[14] = z;
}
static void mat4_rotate_y(float m[16], float a)
{
mat4_identity(m);
m[0] = cosf(a); m[8] = sinf(a);
m[2] = -sinf(a); m[10] = cosf(a);
}
static void mat4_rotate_x(float m[16], float a)
{
mat4_identity(m);
m[5] = cosf(a); m[9] = -sinf(a);
m[6] = sinf(a); m[10] = cosf(a);
}
/* ── Main ───────────────────────────────────────────── */
int main(int argc, char **argv)
{
(void)argc; (void)argv;
if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_EVENTS) != 0) {
fprintf(stderr, "SDL_Init: %s\n", SDL_GetError());
return 1;
}
/* Request OpenGL ES 2.0 context */
SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_ES);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 2);
SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 0);
SDL_GL_SetAttribute(SDL_GL_DOUBLEBUFFER, 1);
SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, 24);
int w = 800, h = 480;
SDL_Window *win = SDL_CreateWindow("SDL2 GLES2 Cube",
SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
w, h, SDL_WINDOW_OPENGL | SDL_WINDOW_FULLSCREEN_DESKTOP);
if (!win) {
fprintf(stderr, "Window: %s\n", SDL_GetError());
SDL_Quit(); return 1;
}
SDL_GLContext ctx = SDL_GL_CreateContext(win);
if (!ctx) {
fprintf(stderr, "GL context: %s\n", SDL_GetError());
SDL_DestroyWindow(win); SDL_Quit(); return 1;
}
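SDL_GL_SetSwapInterval(1); /* VSync ON */
/* A fullscreen window takes the display's size, not the 800x480 requested */
SDL_GL_GetDrawableSize(win, &w, &h);
printf("GL Renderer: %s\n", (const char *)glGetString(GL_RENDERER));
printf("GL Version: %s\n", (const char *)glGetString(GL_VERSION));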
/* ── Shaders ── */
const char *vs_src =
"attribute vec3 aPos;\n"
"attribute vec3 aCol;\n"
"uniform mat4 uMVP;\n"
"varying vec3 vCol;\n"
"void main() {\n"
" vCol = aCol;\n"
" gl_Position = uMVP * vec4(aPos, 1.0);\n"
"}\n";
const char *fs_src =
"precision mediump float;\n"
"varying vec3 vCol;\n"
"void main() {\n"
" gl_FragColor = vec4(vCol, 1.0);\n"
"}\n";
GLuint vs = compile_shader(GL_VERTEX_SHADER, vs_src);
GLuint fs = compile_shader(GL_FRAGMENT_SHADER, fs_src);
GLuint prog = link_program(vs, fs);
glDeleteShader(vs);
glDeleteShader(fs);
if (!prog) return 1;
GLint loc_pos = glGetAttribLocation(prog, "aPos");
GLint loc_col = glGetAttribLocation(prog, "aCol");
GLint loc_mvp = glGetUniformLocation(prog, "uMVP");
    /* ── Cube geometry: 8 vertices × (position + color) ── */
    const float verts[] = {
        /*  x     y     z     r     g     b  */
        -1.f, -1.f, -1.f,  1.f,  0.f,  0.f,
         1.f, -1.f, -1.f,  0.f,  1.f,  0.f,
         1.f,  1.f, -1.f,  0.f,  0.f,  1.f,
        -1.f,  1.f, -1.f,  1.f,  1.f,  0.f,
        -1.f, -1.f,  1.f,  1.f,  0.f,  1.f,
         1.f, -1.f,  1.f,  0.f,  1.f,  1.f,
         1.f,  1.f,  1.f,  1.f,  1.f,  1.f,
        -1.f,  1.f,  1.f, 0.3f, 0.3f, 0.3f,
    };
    /* 12 triangles = 36 indices */
    const GLushort indices[] = {
        0,1,2, 2,3,0,   /* back   */
        4,5,6, 6,7,4,   /* front  */
        0,4,7, 7,3,0,   /* left   */
        1,5,6, 6,2,1,   /* right  */
        3,2,6, 6,7,3,   /* top    */
        0,1,5, 5,4,0,   /* bottom */
    };
    GLuint vbo, ibo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);
    glGenBuffers(1, &ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);
    glEnable(GL_DEPTH_TEST);
    /* ── Render loop ── */
    Uint64 t_start = SDL_GetPerformanceCounter();
    double freq = (double)SDL_GetPerformanceFrequency();
    int frames = 0;
    Uint64 t_fps = t_start;
    int running = 1;
    while (running) {
        SDL_Event e;
        while (SDL_PollEvent(&e)) {
            if (e.type == SDL_QUIT) running = 0;
            if (e.type == SDL_KEYDOWN && e.key.keysym.sym == SDLK_ESCAPE) running = 0;
            if (e.type == SDL_WINDOWEVENT &&
                e.window.event == SDL_WINDOWEVENT_SIZE_CHANGED) {
                w = e.window.data1;
                h = e.window.data2;
            }
        }
        double sec = (double)(SDL_GetPerformanceCounter() - t_start) / freq;
        float angle = (float)sec;
        glViewport(0, 0, w, h);
        glClearColor(0.12f, 0.12f, 0.14f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        /* Build Model-View-Projection matrix */
        float P[16], T[16], Rx[16], Ry[16], Rxy[16], M[16], MVP[16];
        mat4_perspective(P, 60.0f * (3.14159f / 180.0f),
                         (float)w / (float)h, 0.1f, 100.0f);
        mat4_translate(T, 0.f, 0.f, -5.0f);
        mat4_rotate_y(Ry, angle);
        mat4_rotate_x(Rx, angle * 0.7f);
        mat4_mul(Rxy, Ry, Rx);
        mat4_mul(M, T, Rxy);
        mat4_mul(MVP, P, M);
        glUseProgram(prog);
        glUniformMatrix4fv(loc_mvp, 1, GL_FALSE, MVP);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glEnableVertexAttribArray((GLuint)loc_pos);
        glVertexAttribPointer((GLuint)loc_pos, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(float), (void *)0);
        glEnableVertexAttribArray((GLuint)loc_col);
        glVertexAttribPointer((GLuint)loc_col, 3, GL_FLOAT, GL_FALSE,
                              6 * sizeof(float), (void *)(3 * sizeof(float)));
        glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_SHORT, (void *)0);
        SDL_GL_SwapWindow(win);
        /* FPS reporting */
        frames++;
        Uint64 now = SDL_GetPerformanceCounter();
        double dt = (double)(now - t_fps) / freq;
        if (dt >= 2.0) {
            printf("FPS: %.1f (%.2f ms/frame)\n",
                   frames / dt, dt / frames * 1000.0);
            frames = 0;
            t_fps = now;
        }
    }
    glDeleteProgram(prog);
    glDeleteBuffers(1, &vbo);
    glDeleteBuffers(1, &ibo);
    SDL_GL_DeleteContext(ctx);
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}
MAIN_EOF
Rebuild and run:
Checkpoint
A rotating rainbow-colored cube is visible on the display. It spins on two axes, with proper depth — near faces cover far faces. The terminal prints FPS approximately every 2 seconds.
Stuck?
- Black screen, no cube: check the GL Renderer output. If it says "llvmpipe" or "Software Rasterizer", the GPU driver is not working. Ensure dtoverlay=vc4-kms-v3d is in /boot/firmware/config.txt
- "EGL: No matching config": try reducing the depth buffer size by adding SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, 16); before creating the window
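The llvmpipe check in the first item can also be made inside main.c right after context creation, so the program warns loudly instead of silently rendering on the CPU. The sketch below assumes the renderer string contains one of the substrings listed above; softpipe (Mesa's other software rasterizer) is added as an extra case, and the helper name is_software_renderer is hypothetical, not part of the course source.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the GL_RENDERER string names a CPU (software) rasterizer,
 * 0 if it looks like real GPU hardware. A NULL string (no current context)
 * is treated as "not hardware" and also returns 1. */
int is_software_renderer(const char *renderer)
{
    static const char *const software[] = {
        "llvmpipe", "softpipe", "Software Rasterizer"
    };
    if (renderer == NULL)
        return 1;
    for (size_t i = 0; i < sizeof software / sizeof software[0]; i++) {
        if (strstr(renderer, software[i]) != NULL)
            return 1;
    }
    return 0;
}

/* Possible use in main.c, after SDL_GL_CreateContext():
 *
 *   const char *r = (const char *)glGetString(GL_RENDERER);
 *   if (is_software_renderer(r))
 *       fprintf(stderr, "Warning: software rendering (%s)\n", r ? r : "?");
 */
```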
7. VSync and Tearing
Concept: VSync locks frame presentation to the display refresh. Disabling it lets you measure raw GPU throughput but causes visible tearing.
Try disabling VSync by changing one line in main.c, from SDL_GL_SetSwapInterval(1); to SDL_GL_SetSwapInterval(0);
Rebuild and run. Observe:
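- FPS jumps: without VSync, frames render as fast as the GPU allows (likely 200+ FPS for a simple cube)
- Tearing: horizontal tear lines appear because the buffer swap happens while the display is still scanning out the previous frame, so parts of two different frames are shown at once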
| VSync Setting | Expected FPS | Tearing | CPU Usage |
|---|---|---|---|
| SetSwapInterval(1) | ~60 | None | Low (idle between frames) |
| SetSwapInterval(0) | _ | _ | _ |
Fill in the table with your measurements.
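For the CPU Usage column you can read top in a second SSH session, or have the program report its own usage next to the FPS line. A minimal sketch, assuming POSIX getrusage() (provided by glibc on Raspberry Pi OS); cpu_seconds and cpu_percent are hypothetical helper names. It divides user plus system CPU time by wall-clock time over the same window, so 100% means one core was busy for the whole window.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* User + system CPU time consumed by this process so far, in seconds.
 * Returns -1.0 if getrusage() fails. */
double cpu_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6 +
           (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
}

/* CPU usage over a window: 100.0 means one core busy for the whole window. */
double cpu_percent(double cpu_s, double wall_s)
{
    return wall_s > 0.0 ? 100.0 * cpu_s / wall_s : 0.0;
}

/* Possible use in the FPS block of main.c (cpu0 is a hypothetical variable
 * initialised with cpu_seconds() where t_fps is set):
 *
 *   double cpu_now = cpu_seconds();
 *   printf("CPU: %.0f%%\n", cpu_percent(cpu_now - cpu0, dt));
 *   cpu0 = cpu_now;
 */
```

With VSync on, the loop spends most of each frame blocked in SDL_GL_SwapWindow(), so the percentage should stay low; with VSync off you would expect it to approach 100% for one core.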
The Three Swap Interval Values
SDL_GL_SetSwapInterval(1);  /* VSync ON: swap at the next VBlank (caps FPS at the refresh rate, e.g. 60) */
SDL_GL_SetSwapInterval(0);  /* VSync OFF: swap immediately (max FPS, tearing) */
SDL_GL_SetSwapInterval(-1); /* Adaptive VSync: wait for VBlank if the frame is on time,
                               swap immediately (and tear) if it missed the deadline.
                               Returns -1 if unsupported; retry with 1. */
When to Use Each
| Setting | Use case | Trade-off |
|---|---|---|
| 1 (VSync on) | Production display apps: dashboards, HMIs, games. Always tear-free. | Input latency of up to one frame (16.7 ms at 60 Hz). GPU idles between frames. |
| 0 (VSync off) | Benchmarking: measure raw GPU throughput. Headless rendering: render offscreen as fast as possible. Latency-critical: VR or fast-response control displays where tearing is acceptable. | Visible tearing on screen. The render loop never waits, so one CPU core runs at 100%. |
| -1 (adaptive) | Games with variable load: stays tear-free while the GPU keeps up, and tears only on heavy frames instead of halving to 30 FPS. | Not supported on all drivers. Check the return value: if SDL_GL_SetSwapInterval(-1) returns -1, fall back to 1. |
/* Adaptive VSync with fallback */
if (SDL_GL_SetSwapInterval(-1) == -1) {
printf("Adaptive VSync not supported, using standard VSync\n");
SDL_GL_SetSwapInterval(1);
}
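Toggling VSync as described above means editing main.c and rebuilding for every run. One alternative is to pick the interval from the command line. The sketch assumes main.c declares main(int argc, char *argv[]) (the form SDL2 recommends); swap_interval_from_args, apply_swap_interval and the --swap= flag are hypothetical, not part of the course source. The setter is passed as a function pointer with the same signature as SDL_GL_SetSwapInterval, which keeps the fallback logic independent of SDL.

```c
#include <stdlib.h>
#include <string.h>

/* Same shape as SDL2's int SDL_GL_SetSwapInterval(int interval):
 * returns 0 on success, -1 on failure. */
typedef int (*set_swap_interval_fn)(int interval);

/* Reads "--swap=-1", "--swap=0" or "--swap=1" from argv.
 * Missing or invalid values default to 1 (VSync on). */
int swap_interval_from_args(int argc, char **argv)
{
    const char *prefix = "--swap=";
    size_t n = strlen(prefix);
    for (int i = 1; i < argc; i++) {
        if (strncmp(argv[i], prefix, n) == 0) {
            char *end;
            long v = strtol(argv[i] + n, &end, 10);
            if (end != argv[i] + n && *end == '\0' && v >= -1 && v <= 1)
                return (int)v;
        }
    }
    return 1;
}

/* Applies the requested interval. If adaptive (-1) is rejected, falls back
 * to standard VSync (1), as the snippet above does. Returns the interval
 * actually in effect, or -2 if the driver refused every attempt. */
int apply_swap_interval(set_swap_interval_fn set, int requested)
{
    if (set(requested) == 0)
        return requested;
    if (requested == -1 && set(1) == 0)
        return 1;
    return -2;
}

/* Possible use in main.c, after SDL_GL_CreateContext():
 *
 *   int want = swap_interval_from_args(argc, argv);
 *   int got  = apply_swap_interval(SDL_GL_SetSwapInterval, want);
 *   printf("Swap interval: requested %d, active %d\n", want, got);
 */
```

Running the program with --swap=0 then fills in the VSync-off row without touching the source.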
Checkpoint
You can toggle VSync on/off and observe the difference in FPS and tearing. Restore SetSwapInterval(1) when done.
8. Performance Measurement
Concept: On embedded hardware, GPU utilization matters as much as frame rate. A 60 FPS app that uses 95% of the GPU leaves no headroom for complexity.
GPU Utilization
Knowing GPU utilization tells you how much headroom you have. If the cube uses 5% of the GPU at 60 FPS, you can add complex shaders, more geometry, or post-processing effects. If it uses 90%, you're near the limit — any added complexity will drop frames.
# While the cube is running, open another SSH session:
# Method 1: V3D driver debug (most accurate on Pi 4)
sudo cat /sys/kernel/debug/dri/0/v3d_usage
# Method 2: Query the GPU clock speed (higher = more load)
vcgencmd measure_clock v3d
# → frequency(29)=500000000 (500 MHz = busy)
# → frequency(29)=250000000 (250 MHz = idle, clock scaled down)
# Method 3: Overall system view
mpstat 1
# Low %sys + low %usr with VSync on = GPU doing the work, CPU idle
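If you want to log the V3D clock from inside the program (for example next to each FPS line), the vcgencmd output shown in Method 2 can be parsed with sscanf. A sketch assuming the output has exactly the frequency(N)=HZ form shown above and that vcgencmd is on the PATH; parse_measure_clock and read_v3d_clock_hz are hypothetical names, and popen() is the standard POSIX call.

```c
#define _POSIX_C_SOURCE 200809L  /* expose popen()/pclose() under strict C modes */
#include <stdio.h>

/* Parses one line of "vcgencmd measure_clock v3d" output, such as
 * "frequency(29)=500000000", and returns the clock in Hz (-1 on error). */
long long parse_measure_clock(const char *line)
{
    long long hz;
    if (line == NULL || sscanf(line, "frequency(%*d)=%lld", &hz) != 1)
        return -1;
    return hz;
}

/* Runs vcgencmd and returns the current V3D clock in Hz (-1 on error). */
long long read_v3d_clock_hz(void)
{
    char buf[128];
    FILE *p = popen("vcgencmd measure_clock v3d", "r");
    if (p == NULL)
        return -1;
    long long hz = fgets(buf, sizeof buf, p) ? parse_measure_clock(buf) : -1;
    pclose(p);
    return hz;
}
```

Spawning a process costs a few milliseconds, so call read_v3d_clock_hz() only at the FPS reporting interval, never once per frame.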
Why GPU Utilization Matters on Embedded
On a desktop, the GPU has massive headroom — a cube uses <0.1%. On the Pi 4's VideoCore VI, a simple cube might use 2–5%, but add textures, lighting, particles, and post-processing effects, and you can reach 80%+ quickly. Monitoring GPU utilization during development catches performance problems before they become frame drops in production.
Frame Timing
The app already prints FPS. For more detailed per-frame measurement, add timing around the render + swap:
Uint64 frame_start = SDL_GetPerformanceCounter();
/* ... render ... */
SDL_GL_SwapWindow(win);
Uint64 frame_end = SDL_GetPerformanceCounter();
double frame_ms = (double)(frame_end - frame_start) / freq * 1000.0;
printf("Frame: %.2f ms (render + vsync wait)\n", frame_ms);
With VSync on, frame_ms will be ~16.7 ms (includes the VSync wait). To measure render time only (how long the GPU takes), time from render start to just before SDL_GL_SwapWindow:
Uint64 render_start = SDL_GetPerformanceCounter();
/* ... glClear, glDrawElements ... */
glFinish(); /* Force GPU to complete before measuring */
Uint64 render_end = SDL_GetPerformanceCounter();
double render_ms = (double)(render_end - render_start) / freq * 1000.0;
SDL_GL_SwapWindow(win);
printf("Render: %.2f ms (budget: 16.7 ms for 60 FPS)\n", render_ms);
glFinish() Is for Measurement Only
glFinish() blocks the CPU until the GPU completes all pending work. This defeats the purpose of GPU parallelism — in production, never call it. Here we use it only to get accurate render timing. Remove it after measuring.
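The render_ms value is easier to interpret as a share of the frame budget, which is 1000 / refresh rate milliseconds. A small sketch with hypothetical helper names; note that render_ms measured with glFinish() also includes CPU-side GL calls and driver overhead, so it tends to overstate pure GPU time and complements rather than replaces the utilization methods above.

```c
/* Time available per frame at a given refresh rate, in milliseconds. */
double frame_budget_ms(double refresh_hz)
{
    return 1000.0 / refresh_hz;
}

/* Share of the frame budget consumed by rendering, in percent.
 * Values near 100 mean any extra work will start dropping frames. */
double budget_used_pct(double render_ms, double refresh_hz)
{
    return 100.0 * render_ms / frame_budget_ms(refresh_hz);
}

/* Possible use after the glFinish() measurement above:
 *
 *   printf("Render: %.2f ms (%.0f%% of the 60 Hz budget)\n",
 *          render_ms, budget_used_pct(render_ms, 60.0));
 */
```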
Fill In Your Measurements
| Metric | Value |
|---|---|
| GL Renderer | _ |
| Display resolution | _ |
| FPS (VSync on) | _ |
| FPS (VSync off) | _ |
| Frame time (VSync on, ms) | _ |
| GPU utilization (%) | _ |
Checkpoint
Your measurement table is filled in. With VSync on, FPS should be ~60 and frame time ~16.7 ms.
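An average frame time can look perfect while individual frames stutter. Tracking min/avg/max and counting late frames over each reporting window gives a fuller picture for the frame-time row. A sketch with hypothetical names; the 1.5× budget threshold for a "missed" frame is an assumption based on the fact that, with VSync on, a frame that misses its VBlank is held for two refresh periods (about 33 ms at 60 Hz).

```c
/* Running frame-time statistics for one reporting window. */
typedef struct {
    double min_ms, max_ms, sum_ms;
    int count;
    int missed;   /* frames longer than 1.5x the budget */
} FrameStats;

void frame_stats_reset(FrameStats *s)
{
    s->min_ms = 1e9;   /* larger than any realistic frame time */
    s->max_ms = 0.0;
    s->sum_ms = 0.0;
    s->count = 0;
    s->missed = 0;
}

/* Records one frame. budget_ms is the frame budget (16.7 ms at 60 Hz). */
void frame_stats_add(FrameStats *s, double frame_ms, double budget_ms)
{
    if (frame_ms < s->min_ms) s->min_ms = frame_ms;
    if (frame_ms > s->max_ms) s->max_ms = frame_ms;
    s->sum_ms += frame_ms;
    s->count++;
    if (frame_ms > 1.5 * budget_ms)
        s->missed++;
}

double frame_stats_avg(const FrameStats *s)
{
    return s->count > 0 ? s->sum_ms / s->count : 0.0;
}

/* Possible use: call frame_stats_add(&stats, frame_ms, 16.7) every frame,
 * print min/avg/max/missed in the FPS block, then frame_stats_reset(&stats). */
```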
What Just Happened?
You built a complete GPU-accelerated 3D application from scratch on an embedded Linux device, one step at a time:
Step 1: Triangle → shaders + VBO + glDrawArrays
Step 2: Square → index buffer + glDrawElements
Step 3: Spinning → rotation matrix + uniform
Step 4: 3D Cube → full MVP + depth buffer + 8 vertices
The rendering path is:
main.c render loop
→ OpenGL ES draw call → GPU renders into back buffer
→ SDL_GL_SwapWindow() → waits for VSync
→ DRM page flip → display scans out the new buffer
This is the same path used by car dashboards, industrial HMIs, and embedded games. The cube is trivial, but the pipeline — SDL2 + GLES2 + KMS/DRM + VSync — is production-grade.
Challenges
Challenge 1: Touch Rotation
Add touch/mouse input to control the cube's rotation instead of auto-rotating. Handle SDL_FINGERMOTION events (or SDL_MOUSEMOTION for a mouse) and map the finger's delta X/Y to rotation angles. This prepares you for the IMU Controller tutorial.
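As a starting hint, the mapping itself can be kept separate from the event handling. A sketch assuming SDL2's normalized finger deltas (tfinger.dx and tfinger.dy, in the range -1..1); Orbit, orbit_drag, the gain and the pitch clamp are hypothetical design choices, not part of the course source. Clamping pitch just below 90° is optional and only keeps the drag from rolling the cube over.

```c
/* Accumulated rotation driven by drags (Challenge 1 helper). */
typedef struct {
    float yaw;    /* rotation about Y, radians */
    float pitch;  /* rotation about X, radians */
} Orbit;

#define ORBIT_PITCH_LIMIT 1.5f  /* just under pi/2; assumed limit */

/* dx, dy: normalized drag deltas. gain: radians per full-screen drag. */
void orbit_drag(Orbit *o, float dx, float dy, float gain)
{
    o->yaw += dx * gain;
    o->pitch += dy * gain;
    if (o->pitch > ORBIT_PITCH_LIMIT) o->pitch = ORBIT_PITCH_LIMIT;
    if (o->pitch < -ORBIT_PITCH_LIMIT) o->pitch = -ORBIT_PITCH_LIMIT;
}

/* Possible use in the event loop of main.c:
 *
 *   if (e.type == SDL_FINGERMOTION)
 *       orbit_drag(&orbit, e.tfinger.dx, e.tfinger.dy, 3.14159f);
 *   if (e.type == SDL_MOUSEMOTION && (e.motion.state & SDL_BUTTON_LMASK))
 *       orbit_drag(&orbit, (float)e.motion.xrel / w,
 *                  (float)e.motion.yrel / h, 3.14159f);
 *
 * and build the model matrix from mat4_rotate_y(Ry, orbit.yaw) and
 * mat4_rotate_x(Rx, orbit.pitch) instead of the time-based angle. */
```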
Challenge 2: Add a Second Cube
Draw two cubes side by side, each with a different rotation speed. This requires a second mat4_translate offset and a second draw call. How does this affect FPS?
Challenge 3: Run on Different Displays
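If you have both an HDMI and a DSI display, run the cube on each and compare FPS. Both go through the same GPU path, so with VSync on each should reach its own display's refresh rate (identical FPS if both refresh at 60 Hz). What happens if you try to run on the SPI display?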
Deliverable
- [ ] Built the cube incrementally through all 4 steps (triangle → square → rotating → 3D)
- [ ] Rotating cube running at 60 FPS on the Pi display
- [ ] FPS and frame time measurements recorded
- [ ] VSync on/off comparison table filled in
- [ ] Brief note: one sentence explaining why SDL_GL_SwapWindow blocks when VSync is enabled