Threads and Synchronization in C
Time: 90 min | Prerequisites: Processes and IPC | Theory companion: Linux Fundamentals, Sections 1.4–1.5
Learning Objectives
By the end of this tutorial you will be able to:
- Create POSIX threads with `pthread_create()` and protect shared data with `pthread_mutex_t`
- Identify thread-unsafe functions and replace them with reentrant alternatives
- Use condition variables for efficient producer-consumer communication
- Use POSIX semaphores to limit concurrent access to bounded resources
- Use C11 `_Atomic` types and operations for lock-free thread communication
- Build a multi-threaded system dashboard that reads from `/proc` and `/sys`
Before You Start
All exercises run on any Linux machine — your host laptop (Ubuntu, Fedora, Arch, etc.) or the Raspberry Pi via SSH. You need gcc installed:
You should have completed the Processes and IPC tutorial first. The thread concepts here build directly on the fork/pipe/signal patterns you learned there.
1. Threads with pthreads
Processes are isolated — each has its own memory. Threads share the same address space, which makes data sharing trivial but introduces race conditions.
Step 1: thread_demo.c
Question
Predict Before You Run
Two threads each increment a shared counter 1,000,000 times. What final value do you expect?
- If the threads ran sequentially: ___
- If two threads increment simultaneously without protection: ___
Write down your prediction, then compile and run.
/* thread_demo.c — shared counter with and without mutex */
#include <stdio.h>
#include <pthread.h>
#define ITERATIONS 1000000
int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
void *worker_unsafe(void *arg)
{
(void)arg;
for (int i = 0; i < ITERATIONS; i++)
counter++; /* Race condition! */
return NULL;
}
void *worker_safe(void *arg)
{
(void)arg;
for (int i = 0; i < ITERATIONS; i++) {
pthread_mutex_lock(&lock);
counter++;
pthread_mutex_unlock(&lock);
}
return NULL;
}
int main(void)
{
pthread_t t1, t2;
/* Round 1: without mutex */
counter = 0;
pthread_create(&t1, NULL, worker_unsafe, NULL);
pthread_create(&t2, NULL, worker_unsafe, NULL);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
printf("Without mutex: %d (expected %d)\n", counter, 2 * ITERATIONS);
/* Round 2: with mutex */
counter = 0;
pthread_create(&t1, NULL, worker_safe, NULL);
pthread_create(&t2, NULL, worker_safe, NULL);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
printf("With mutex: %d (expected %d)\n", counter, 2 * ITERATIONS);
return 0;
}
The "without mutex" result will almost always be less than 2,000,000 — both threads read, increment, and write the counter simultaneously, losing updates. The "with mutex" result is always exactly 2,000,000.


Exercise: Quantify the Race
Run the demo 20 times and record the spread:
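One way to script this (assuming the binary was built as `thread_demo` in the current directory, as in Step 1):

```shell
# Collect the unsafe counter value from 20 runs, sorted so min/max are easy to spot
for i in $(seq 20); do ./thread_demo; done | awk '/Without mutex/ {print $3}' | sort -n
```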
Question
Analyse the Results
- What is the lowest value you saw? The highest?
- Change `ITERATIONS` to `100`. Does the race still appear? Why or why not?
- Change `ITERATIONS` to `10000000`. Does the spread get wider or narrower?
Exercise: ThreadSanitizer
Compile with ThreadSanitizer to detect the race automatically:
TSan prints a data race report showing which threads access counter and from which source lines. Now comment out the worker_unsafe round (keep only worker_safe) and rebuild with TSan — the report disappears, confirming the mutex fixes the race.
Tip
ThreadSanitizer is your best friend for concurrent code. Get in the habit of compiling with -fsanitize=thread during development. It catches races that only manifest under specific timing — races you might never see in normal testing.
Exercise: Scale the Contention
Change thread_demo.c to use 4 threads instead of 2 (expected total: 4,000,000):
pthread_t threads[4];
for (int i = 0; i < 4; i++)
pthread_create(&threads[i], NULL, worker_unsafe, NULL);
for (int i = 0; i < 4; i++)
pthread_join(threads[i], NULL);
Question
Predict Before You Run
- Will the data loss be worse with 4 threads than with 2? Why?
- Run 10 times and compare the spread to the 2-thread version.
Exercise: Observe the Race Condition
Run the demo 10 times and count how many runs produce the wrong result:
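A sketch of one way to count the failures (again assuming the binary is `./thread_demo`):

```shell
# Count the runs whose unsafe counter is not exactly 2000000
for i in $(seq 10); do ./thread_demo; done \
  | awk '/Without mutex/ && $3 != 2000000 {bad++} END {print bad+0, "runs lost updates"}'
```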
On a multi-core Pi 4, most runs will show data loss. On a single-core system, the race is harder to trigger but still exists.
Exercise: Variable Scoping Across Threads
Threads share global and static variables but each has its own local (stack) variables. This program makes it visible:
/* thread_scope.c — local vs static vs global in threads */
#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>
int g = 0; /* Global — shared by all threads */
void *worker(void *arg)
{
(void)arg;
static int s = 0; /* Static — shared by all calls to this function */
int local = 0; /* Local — private to each thread (on its own stack) */
++local;
++s;
++g;
printf("PID: %d, TID: %d, Local: %d, Static: %d, Global: %d\n",
getpid(), gettid(), local, s, g);
return NULL;
}
int main(void)
{
printf("Main — PID: %d, TID: %d\n", getpid(), gettid());
pthread_t threads[5];
for (int i = 0; i < 5; i++)
pthread_create(&threads[i], NULL, worker, NULL);
for (int i = 0; i < 5; i++)
pthread_join(threads[i], NULL);
printf("Final — Global: %d (expected 5)\n", g);
return 0;
}
All threads share the same PID but each has a unique TID. The local variable is always 1 (private stack). The static and global variables accumulate across threads.
Warning
Classic bug: passing &i to threads.
A common mistake when creating threads in a loop:
All threads receive a pointer to the same i. By the time the thread reads *arg, the loop may have advanced — so multiple threads see the same value (often 5), and some values are skipped entirely. Fix: pass the value directly with (void *)(intptr_t)i or allocate a per-thread copy.
Note
Under the hood, Linux uses clone() for everything.
- `fork()` = `clone()` with a separate `mm_struct` (separate page tables, separate memory)
- `pthread_create()` = `clone()` with `CLONE_VM | CLONE_FS | CLONE_FILES` (shared memory, shared FDs)
This is why Linux process and thread creation share the same kernel code path. The flags determine the level of sharing.
Checkpoint 1
| Question | Your Answer |
|---|---|
| Counter value without mutex (one run) | |
| Counter value with mutex | |
| How many of 20 runs showed data loss? | |
| What was the min/max spread? | |
| In thread_scope, is `local` always 1? Why? | |
| Do all threads share the same PID or TID? | |
| Did TSan report a race for `worker_unsafe`? For `worker_safe`? | |
2. What Is Thread Safety?
A function (or data structure) is thread-safe if it produces correct results when called simultaneously from multiple threads. Not all C library functions are thread-safe — knowing how to check is a critical skill.
Four Levels of Safety
| Level | Meaning | Example |
|---|---|---|
| Thread-unsafe | Uses hidden shared state; breaks under concurrent access | strtok(), asctime(), rand() |
| Conditionally safe | Safe if each thread uses its own instance | strtok_r() with per-thread saveptr |
| Thread-safe | Safe to call from any thread at any time | printf() (internally locked), strlen() |
| Reentrant | No shared state at all; safe even in signal handlers | memcpy(), strlen(), pure computations |
Hands-on: strtok Is Broken Under Threads
strtok() uses an internal static pointer to track its position. When two threads call it on different strings, they corrupt each other's state.
/* strtok_threads.c — prove strtok is NOT thread-safe
*
* Build: gcc -Wall -pthread -o strtok_threads strtok_threads.c
* Run: ./strtok_threads
*/
#include <stdio.h>
#include <string.h>
#include <pthread.h>
void *tokenize(void *arg)
{
char *input = strdup((char *)arg); /* each thread gets its own copy */
char *token = strtok(input, ",");
while (token) {
printf("[Thread %s] token: '%s'\n", (char *)arg, token);
token = strtok(NULL, ",");
}
free(input);
return NULL;
}
int main(void)
{
pthread_t t1, t2;
pthread_create(&t1, NULL, tokenize, "alpha,beta,gamma");
pthread_create(&t2, NULL, tokenize, "one,two,three");
pthread_join(t1, NULL);
pthread_join(t2, NULL);
return 0;
}
Run it several times. You will see garbled output — tokens from one string appear in the other thread's output, or tokens are skipped entirely. The internal static variable inside strtok() is shared across both threads.
Fix: Use the Reentrant Version
Replace strtok() with strtok_r(), which takes an explicit saveptr instead of using hidden static state:
void *tokenize_safe(void *arg)
{
char *input = strdup((char *)arg);
char *saveptr; /* thread-local state */
char *token = strtok_r(input, ",", &saveptr);
while (token) {
printf("[Thread %s] token: '%s'\n", (char *)arg, token);
token = strtok_r(NULL, ",", &saveptr);
}
free(input);
return NULL;
}
Question
Exercise: Check the Man Page
Run man 3 strtok and look for the ATTRIBUTES section. You will see:
| Function | MT-Safety |
|---|---|
| `strtok()` | MT-Unsafe race:strtok |
| `strtok_r()` | MT-Safe |
Now check these functions the same way. Are they MT-Safe or MT-Unsafe?
- `printf()` → ___
- `asctime()` → ___
- `asctime_r()` → ___
- `rand()` → ___
- `rand_r()` → ___
Quick Checklist: Is This Function Thread-Safe?
Before using a C library function in threaded code, ask:
- Does it use static/global variables internally? → Likely unsafe
- Does
man 3 <function>say MT-Unsafe? → Definitely unsafe - Does a
_r(reentrant) variant exist? → Use that instead - Does it only operate on its arguments? → Likely safe
3. Condition Variables — Producer-Consumer
The dashboard (Section 6) uses usleep() to poll for new data. This works but wastes CPU cycles. In real embedded systems, you want threads to sleep until data arrives. That's what condition variables do.
The Problem with Polling
/* Busy-waiting — bad! */
while (1) {
pthread_mutex_lock(&lock);
if (data_ready) { process(); data_ready = 0; }
pthread_mutex_unlock(&lock);
usleep(1000); /* wastes 1ms between checks */
}
The Solution: pthread_cond_wait
/* Efficient — thread sleeps until signaled */
pthread_mutex_lock(&lock);
while (!data_ready)
pthread_cond_wait(&cond, &lock); /* atomically: unlock + sleep + relock */
process();
data_ready = 0;
pthread_mutex_unlock(&lock);
pthread_cond_wait does three things:
- Atomically unlocks the mutex and puts the thread to sleep (so the producer can write, and no signal can slip in between the unlock and the sleep)
- Sleeps using no CPU until signaled
- On wakeup, re-acquires the mutex before returning
Step 3: producer_consumer.c
Question
Predict the Buffer
The producer writes at 200 ms intervals (5 Hz). The consumer reads at 500 ms intervals (2 Hz). The buffer holds 8 items.
- After 4 seconds, approximately how many items has the producer written? ___
- How many has the consumer read? ___
- Is the buffer full, partially full, or empty? ___
Write down your prediction, then run the program and check.
A sensor producer generates data into a circular buffer, a consumer processes it. The consumer sleeps when the buffer is empty.
/* producer_consumer.c — circular buffer with condition variables
*
* Build: gcc -Wall -pthread -o producer_consumer producer_consumer.c
* Run: ./producer_consumer
*/
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
#include <time.h>
#define BUF_SIZE 8
/* ── Circular buffer ──────────────────────────────────── */
typedef struct {
int data[BUF_SIZE];
int head; /* next write position */
int tail; /* next read position */
int count; /* items in buffer */
pthread_mutex_t lock;
pthread_cond_t not_empty;
pthread_cond_t not_full;
} ringbuf_t;
ringbuf_t buf = {
.head = 0, .tail = 0, .count = 0,
.lock = PTHREAD_MUTEX_INITIALIZER,
.not_empty = PTHREAD_COND_INITIALIZER,
.not_full = PTHREAD_COND_INITIALIZER,
};
volatile sig_atomic_t running = 1;
void handle_int(int sig) { (void)sig; running = 0; }
/* ── Producer: simulated sensor ───────────────────────── */
void *producer(void *arg)
{
(void)arg;
int reading = 0;
while (running) {
/* Simulate a sensor reading (20-30 C with noise) */
int value = 20000 + (rand() % 10000); /* millidegrees */
reading++;
pthread_mutex_lock(&buf.lock);
/* Wait if buffer is full */
while (buf.count == BUF_SIZE && running)
pthread_cond_wait(&buf.not_full, &buf.lock);
if (!running) {
pthread_mutex_unlock(&buf.lock);
break;
}
/* Write to buffer */
buf.data[buf.head] = value;
buf.head = (buf.head + 1) % BUF_SIZE;
buf.count++;
printf("[Producer] #%d: wrote %d mC (buffer: %d/%d)\n",
reading, value, buf.count, BUF_SIZE);
/* Wake consumer */
pthread_cond_signal(&buf.not_empty);
pthread_mutex_unlock(&buf.lock);
usleep(200000); /* 200ms — sensor sample rate */
}
return NULL;
}
/* ── Consumer: process and log ────────────────────────── */
void *consumer(void *arg)
{
(void)arg;
int processed = 0;
long sum = 0;
while (running || buf.count > 0) {
pthread_mutex_lock(&buf.lock);
/* Wait if buffer is empty */
while (buf.count == 0 && running)
pthread_cond_wait(&buf.not_empty, &buf.lock);
if (buf.count == 0) {
pthread_mutex_unlock(&buf.lock);
break;
}
/* Read from buffer */
int value = buf.data[buf.tail];
buf.tail = (buf.tail + 1) % BUF_SIZE;
buf.count--;
/* Wake producer if it was waiting on a full buffer */
pthread_cond_signal(&buf.not_full);
pthread_mutex_unlock(&buf.lock);
/* Process outside the lock (simulate slow processing) */
processed++;
sum += value;
double avg = (double)sum / processed / 1000.0;
printf(" [Consumer] #%d: read %d mC, running avg: %.1f C\n",
processed, value, avg);
usleep(500000); /* 500ms — consumer is slower than producer */
}
printf(" [Consumer] Done: %d readings, final avg: %.1f C\n",
processed, processed > 0 ? (double)sum / processed / 1000.0 : 0);
return NULL;
}
int main(void)
{
srand(time(NULL));
struct sigaction sa = { .sa_handler = handle_int };
sigaction(SIGINT, &sa, NULL);
printf("Producer-Consumer demo (Ctrl+C to stop)\n");
printf("Producer: 200ms interval, Consumer: 500ms interval\n");
printf("Buffer size: %d — watch it fill up!\n\n", BUF_SIZE);
pthread_t t_prod, t_cons;
pthread_create(&t_prod, NULL, producer, NULL);
pthread_create(&t_cons, NULL, consumer, NULL);
/* Wait for Ctrl+C */
while (running)
pause();
/* Wake threads that might be waiting on conditions */
pthread_cond_broadcast(&buf.not_empty);
pthread_cond_broadcast(&buf.not_full);
pthread_join(t_prod, NULL);
pthread_join(t_cons, NULL);
return 0;
}
Build and Run
Watch the buffer fill up — the producer writes at 5 Hz but the consumer only reads at 2 Hz. The buffer acts as a shock absorber. When it fills to 8/8, the producer sleeps until the consumer catches up.
[Producer] #1: wrote 25431 mC (buffer: 1/8)
[Consumer] #1: read 25431 mC, running avg: 25.4 C
[Producer] #2: wrote 21087 mC (buffer: 1/8)
[Producer] #3: wrote 28943 mC (buffer: 2/8)
[Consumer] #2: read 21087 mC, running avg: 23.3 C
[Producer] #4: wrote 23156 mC (buffer: 2/8)
[Producer] #5: wrote 27891 mC (buffer: 3/8)
...
Note
Why while (!data_ready) and not if (!data_ready)? Condition variables can have spurious wakeups — the OS may wake the thread without a signal. The while loop re-checks the condition after waking. This is a universal rule: always use while with pthread_cond_wait, never if.
Exercise: What Happens When...
Try each of these modifications one at a time and observe the result:
Question
Experiment A — Swap the speeds
Make the producer slower (500 ms) and consumer faster (200 ms). What happens to the buffer fill level? Does the consumer ever wait?
Question
Experiment B — Tiny buffer
Set BUF_SIZE to 1. How does the output change? Is the producer blocked most of the time?
Question
Experiment C — Remove the signal
Comment out the pthread_cond_signal(&buf.not_empty) line in the producer. What happens to the consumer? Why?
Question
Experiment D — Spurious Wakeups
Change the while (buf.count == 0 && running) in the consumer to if (buf.count == 0 && running). Rebuild and run repeatedly. Can you observe incorrect behaviour? Note that this is a logic bug rather than a data race, so ThreadSanitizer will not necessarily flag it — you have to catch it by observation.
Tip
Detect race conditions automatically. Compile with ThreadSanitizer to catch bugs:
Or use Valgrind's Helgrind (slower but catches lock-order violations too):
Checkpoint 3
| Question | Your Answer |
|---|---|
| What happens when the buffer fills to 8/8? | |
| Why does the consumer use `while (buf.count == 0)`, not `if`? | |
| What would happen without `pthread_cond_broadcast` in `main`? | |
| When you swapped speeds, did the buffer ever fill up? | |
4. POSIX Semaphores — Bounded Resources
A counting semaphore is an integer counter with two atomic operations: wait (decrement, block if zero) and post (increment, wake a waiter). Think of it as "N permits available."
Concept: Parking Lot
Imagine a parking lot with 3 spaces. A car can enter (wait → decrement) only if a space is free. When a car leaves (post → increment), another can enter.
sem_init(&spots, 0, 3) → spots = 3
Car A arrives: sem_wait → spots = 2 (enters)
Car B arrives: sem_wait → spots = 1 (enters)
Car C arrives: sem_wait → spots = 0 (enters)
Car D arrives: sem_wait → spots = 0 (BLOCKS — lot full)
Car B leaves: sem_post → spots = 1 (Car D unblocks, enters)
Semaphore vs Condition Variable
| | `sem_t` (semaphore) | `pthread_cond_t` (condvar) |
|---|---|---|
| Has state | Yes — internal counter | No — stateless signal |
| Remembers signals | Yes — `sem_post` before `sem_wait` still counts | No — a signal is lost if nobody is waiting |
| Needs mutex | No — self-contained | Yes — always paired with a mutex |
| Spurious wakeups | No | Yes — must use a `while` loop |
| Async-signal-safe | `sem_post` is safe | No — cannot use in signal handlers |
| Best for | Limiting concurrency (N permits) | Notifying state changes |
Step 4: parking_lot.c
/* parking_lot.c — POSIX semaphore demo
*
* Build: gcc -Wall -pthread -o parking_lot parking_lot.c
* Run: ./parking_lot
*/
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>
#include <time.h>
#include <stdint.h> /* intptr_t */
#define NUM_CARS 8
#define CAPACITY 3
sem_t spots;
void *car(void *arg)
{
int id = (int)(intptr_t)arg;
printf("Car %d: arriving...\n", id);
struct timespec start, end;
clock_gettime(CLOCK_MONOTONIC, &start);
sem_wait(&spots); /* block if lot is full */
clock_gettime(CLOCK_MONOTONIC, &end);
double waited = (end.tv_sec - start.tv_sec)
+ (end.tv_nsec - start.tv_nsec) / 1e9;
printf("Car %d: PARKED (waited %.2f s)\n", id, waited);
/* Simulate being parked for 1-3 seconds */
usleep((1000 + rand() % 2000) * 1000);
printf("Car %d: leaving\n", id);
sem_post(&spots); /* free the spot */
return NULL;
}
int main(void)
{
srand(time(NULL));
sem_init(&spots, 0, CAPACITY);
printf("Parking lot: %d spots, %d cars\n\n", CAPACITY, NUM_CARS);
pthread_t threads[NUM_CARS];
for (int i = 0; i < NUM_CARS; i++) {
pthread_create(&threads[i], NULL, car, (void *)(intptr_t)(i + 1));
usleep(200000); /* stagger arrivals by 200ms */
}
for (int i = 0; i < NUM_CARS; i++)
pthread_join(threads[i], NULL);
sem_destroy(&spots);
printf("\nAll cars served.\n");
return 0;
}
You will see 3 cars park immediately, the rest wait. As each car leaves, the next one enters.
Experiments
Question
Experiment A — Semaphore as mutex
Set CAPACITY=1. How does the output change? A semaphore with count 1 behaves like a mutex — only one car at a time. This is called a binary semaphore.
Question
Experiment B — No contention
Set CAPACITY=8 (same as NUM_CARS). Does any car wait? What is the maximum wait time?
Question
Experiment C — Deadlock
Set CAPACITY=0. What happens? Every car blocks on sem_wait because the count starts at zero. No thread ever calls sem_post, so the program hangs forever. This is a deadlock. Press Ctrl+C to escape.
Question
Experiment D — Measure contention
Try CAPACITY=3 vs CAPACITY=5. Compare the average wait times. How does the number of spots affect throughput?
POSIX Semaphore API Reference
| Function | Purpose |
|---|---|
| `sem_init(&sem, 0, N)` | Initialize an unnamed semaphore with count N (0 = shared between threads of one process) |
| `sem_wait(&sem)` | Decrement; block if the count is 0 |
| `sem_trywait(&sem)` | Decrement if count > 0; otherwise fail immediately with errno EAGAIN (non-blocking) |
| `sem_post(&sem)` | Increment; wake one waiting thread (async-signal-safe) |
| `sem_getvalue(&sem, &val)` | Read the current count |
| `sem_destroy(&sem)` | Clean up |
Note
Named vs unnamed semaphores. sem_init() creates an unnamed semaphore (thread-level). For IPC between unrelated processes, use sem_open("/my_sem", O_CREAT, 0644, N) instead — this creates a named semaphore visible in /dev/shm/. We use unnamed semaphores here because all our threads share the same address space.
Checkpoint 4
| Question | Your Answer |
|---|---|
| With CAPACITY=3, how many cars parked simultaneously? | |
| With CAPACITY=1, did it behave like a mutex? | |
| With CAPACITY=0, what happened? | |
| Is `sem_post` async-signal-safe? Is `pthread_cond_signal`? | |
5. C11 Atomics — Lock-Free Shared Data
Mutexes work but carry overhead — locking, unlocking, and potentially blocking the thread if someone else holds the lock. For sharing a single value between threads (a sensor reading, a counter, a flag), C11 _Atomic types let the CPU update the value in a single atomic instruction — no lock and no blocking.
Why volatile is not enough
A common misconception: "volatile prevents caching, so it makes shared variables thread-safe." It doesn't. volatile only prevents the compiler from optimizing away reads/writes — it does nothing about CPU reordering or atomicity of read-modify-write operations.
Question
Predict Before You Run
The program below uses volatile int counter and two threads each incrementing it 1,000,000 times. Will the result be correct (2,000,000)? Why or why not?
Remember: counter++ compiles to three CPU operations: load, add, store.
Create this file to see the problem:
/* volatile_broken.c — prove that volatile does NOT fix data races */
#include <stdio.h>
#include <pthread.h>
static volatile int counter = 0;
static void *increment(void *arg)
{
(void)arg;
for (int i = 0; i < 1000000; i++)
counter++; /* NOT atomic: load, add, store — 3 steps */
return NULL;
}
int main(void)
{
pthread_t t1, t2;
pthread_create(&t1, NULL, increment, NULL);
pthread_create(&t2, NULL, increment, NULL);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
printf("counter = %d (expected 2000000)\n", counter);
return 0;
}
gcc -Wall -pthread -o volatile_broken volatile_broken.c
./volatile_broken # run 5 times — result varies, almost always < 2000000
Compare this to thread_demo.c from Section 1 — same bug, same root cause: counter++ is three operations (load, add, store) that can interleave between threads.
The synchronization spectrum
| | `volatile` | `volatile sig_atomic_t` | `_Atomic` (C11) | `pthread_mutex_t` |
|---|---|---|---|---|
| Prevents compiler reorder | Yes | Yes | Yes | Yes |
| Prevents CPU reorder | No | No | Yes | Yes |
| Atomic read-modify-write | No | No | Yes | Yes (via lock) |
| Safe between threads | No | No | Yes | Yes |
| Safe in signal handlers | No | Yes | No | No |
| Overhead | None | None | ~1 instruction | Syscall if contended |
| Use case | HW registers | Signal flags | Counters, sensor values | Multi-field structs |
Warning
_Atomic is NOT async-signal-safe. The C and POSIX standards only guarantee volatile sig_atomic_t for variables shared between a signal handler and normal code. Keep using volatile sig_atomic_t for signal handlers — use _Atomic for thread-to-thread sharing.
Exercise: See the CPU Instructions
Compile volatile_broken.c to assembly and compare the increment:
# Plain volatile increment
gcc -S -O2 -o volatile.s volatile_broken.c
grep -A5 'counter' volatile.s
# Now change 'volatile int' to '_Atomic int' and recompile
gcc -S -O2 -o atomic.s volatile_broken.c
grep -A5 'counter' atomic.s
Question
Compare the output
- On x86: look for `lock`-prefixed instructions (e.g., `lock addl`)
- On ARM: look for `ldxr`/`stxr` (load-exclusive/store-exclusive) loops
- How many instructions does the plain version use? The atomic version?
Guided reading: atomic_sensor.c
The course repository contains a complete program that demonstrates all four core atomic operations in a realistic sensor→display architecture. Instead of copying code from this page, you will read the source and answer questions about it.
Architecture overview:
Sensor Thread (10 Hz) Display Thread (2 Hz)
┌───────────────────────┐ ┌──────────────────────┐
│ read CPU temperature │ │ atomic_load(&g_temp) │
│ atomic_store(&g_temp) ├───────►│ print terminal bar │
│ atomic_fetch_add() │_Atomic │ show stats │
│ CAS max-temp update │ └──────────────────────┘
└───────────────────────┘
▲
│ atomic_exchange(&g_calibrate, 0)
│ (consumes flag set by main thread)
Read the source and answer these questions:
- Find `_Atomic float g_temp`. Which thread writes to it? Which reads? What operation does each use?
- Find `atomic_fetch_add`. What value does it return — the old value or the new value?
- Find the `atomic_compare_exchange_weak` loop. What pattern does it implement? Why does it need a loop?
- Find `g_running`. Why is it `volatile sig_atomic_t` instead of `_Atomic int`?
- Find `atomic_exchange(&g_calibrate, 0)`. What would happen if you used `atomic_load` + `atomic_store` instead of `atomic_exchange`?
Build and run:
You should see a live temperature bar updating in the terminal. Press c + Enter to trigger calibration.
C11 atomic operations reference
| C11 Function | Purpose | ARM Instructions | In atomic_sensor.c |
|---|---|---|---|
| `atomic_store` | Write value | `STR` + memory barrier | Sensor stores `g_temp` |
| `atomic_load` | Read value | `LDR` + memory barrier | Display reads `g_temp` |
| `atomic_fetch_add` | Increment, return old value | `LDXR` + `ADD` + `STXR` loop | Counting `g_readings` |
| `atomic_exchange` | Swap, return old value | `LDXR` + `STXR` loop | Consuming `g_calibrate` flag |
| `atomic_compare_exchange_weak` | CAS: replace if value equals expected | `LDXR` + `CMP` + `STXR` loop | Tracking `g_max_temp` |
On ARM, the LDXR/STXR (Load-Exclusive / Store-Exclusive) pair implements lock-free atomic updates: the CPU marks a cache line as "exclusive," performs the operation, and STXR only succeeds if no other core touched that line. If it fails, the loop retries. On x86, these map to LOCK-prefixed instructions or CMPXCHG.
Memory ordering
By default, all atomic operations use memory_order_seq_cst (sequentially consistent) — the safest and simplest ordering. This means:
- All threads see operations in the same order — like a single-lane bridge where everyone takes turns.
- No reordering — neither the compiler nor the CPU can move atomic operations past each other.
This is almost always what you want. But C11 provides weaker orderings for performance-critical code:
| Memory Order | Guarantee | Analogy | When to use |
|---|---|---|---|
| `memory_order_seq_cst` | Total order across all threads | Single-lane bridge | Default — always correct |
| `memory_order_acquire` / `release` | Producer/consumer ordering | One-way gate | Store/load pairs between two threads |
| `memory_order_relaxed` | Only atomicity, no ordering | Free-for-all | Standalone counters, statistics |
Tip
Rule of thumb: Start with the default (seq_cst). Only weaken ordering when profiling proves it is a bottleneck — and only if you fully understand the implications. Wrong memory ordering creates bugs that appear only under load, only on certain CPUs, and are nearly impossible to debug.
Exercise: Measure the Cost
Benchmark atomic_fetch_add vs pthread_mutex_lock/unlock for 10 million increments. Create bench_sync.c:
/* bench_sync.c — compare atomic vs mutex overhead
*
* Build: gcc -Wall -pthread -O2 -o bench_sync bench_sync.c
* Run: ./bench_sync
*/
#include <stdio.h>
#include <stdatomic.h>
#include <pthread.h>
#include <time.h>
#define ITERS 10000000
static _Atomic int atomic_counter = 0;
static int mutex_counter = 0;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static void *atomic_worker(void *arg)
{
(void)arg;
for (int i = 0; i < ITERS; i++)
atomic_fetch_add(&atomic_counter, 1);
return NULL;
}
static void *mutex_worker(void *arg)
{
(void)arg;
for (int i = 0; i < ITERS; i++) {
pthread_mutex_lock(&mtx);
mutex_counter++;
pthread_mutex_unlock(&mtx);
}
return NULL;
}
static double run_test(void *(*fn)(void *), int nthreads)
{
pthread_t t[8];
struct timespec start, end;
clock_gettime(CLOCK_MONOTONIC, &start);
for (int i = 0; i < nthreads; i++)
pthread_create(&t[i], NULL, fn, NULL);
for (int i = 0; i < nthreads; i++)
pthread_join(t[i], NULL);
clock_gettime(CLOCK_MONOTONIC, &end);
return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}
int main(void)
{
int threads[] = {1, 2, 4};
for (int t = 0; t < 3; t++) {
int n = threads[t];
atomic_counter = 0;
mutex_counter = 0;
double t_atomic = run_test(atomic_worker, n);
double t_mutex = run_test(mutex_worker, n);
printf("%d thread(s): atomic=%.3fs mutex=%.3fs (%.1fx)\n",
n, t_atomic, t_mutex, t_mutex / t_atomic);
}
return 0;
}
Question
Analyse the Results
- With 1 thread, is there much difference? Why?
- With 4 threads, how much faster are atomics?
- When would you still prefer mutex over atomics?
Exercise: relaxed counting
The g_readings counter is standalone — its value doesn't need to be ordered relative to g_temp or any other variable. This makes it safe to use memory_order_relaxed:
Open atomic_sensor.c and find the atomic_fetch_add(&g_readings, 1) call. Change it to:
Rebuild and run — the counter still works correctly. This is safe because g_readings is a standalone counter — no other variable depends on its ordering.
Exercise: break it on purpose
Remove _Atomic from g_temp (change it to plain float g_temp) and rebuild with ThreadSanitizer:
TSan will report a data race on g_temp — one thread writes while another reads without synchronization. This is exactly what _Atomic prevents. Revert your change after observing the report.
When atomics are NOT enough
Atomics work for single values. When you need to share a multi-field struct (e.g., CPU usage + temperature + memory in sys_dashboard.c), atomics cannot help — a reader could see the new CPU value with the old temperature.
| Pattern | Mechanism | Example |
|---|---|---|
| Signal flag (async) | `volatile sig_atomic_t` | `g_running` in signal handler |
| Single value between threads | `_Atomic` | `g_temp`, `g_readings`, `g_calibrate` |
| Multi-field struct | `pthread_mutex_t` | `sys_dashboard.c` shared stats |
| Producer-consumer queue | Mutex + `pthread_cond_t` | `producer_consumer.c` |
Modification exercises
These exercises modify atomic_sensor.c. Work on a copy or use git stash to save your changes.
Tip
Exercise A: Track minimum temperature
Add _Atomic float g_min_temp using the same CAS loop pattern as g_max_temp, but reversed — update when temp < cur_min. Initialize it to a high value (e.g., 1000.0f). Display it alongside the max in the display thread.
Hint: Copy the while (temp > cur_max) loop and change the comparison direction.
Tip
Exercise B: Temperature alert flag
Add _Atomic int g_alert. In the sensor thread, set it to 1 when temperature exceeds a threshold (e.g., 50°C). In the display thread, consume the alert with atomic_exchange(&g_alert, 0) and print *** ALERT *** when it fires.
Think: Why atomic_exchange instead of atomic_load + atomic_store? What race condition would the two-step version have?
Tip
Exercise C: Connect forward to level_sdl2.c
Open level_sdl2.c in the same apps/ directory. Find all _Atomic variables and for each one, identify:
- What type is it?
- Which thread writes it? Which reads?
- Which atomic operation pattern does it use (store/load, fetch-add, exchange, CAS)?
List your findings in your lab notebook. You will work with this code in the SDL2 Display Tutorial.
Checkpoint 5
| Question | Your Answer |
|---|---|
| What ARM instructions does `atomic_fetch_add` compile to? | |
| What is the difference between `volatile int` and `_Atomic int` for threads? | |
| When would you use `memory_order_relaxed` instead of the default? | |
| Why `volatile sig_atomic_t` for `g_running` but `_Atomic` for `g_calibrate`? | |
| How many `_Atomic` variables does `level_sdl2.c` declare? | |
| With 4 threads, how much faster were atomics than mutex? | |
What Just Happened?
You built concurrent C programs using four different synchronization mechanisms:
| Mechanism | What You Did | When to Use | Later in Course |
|---|---|---|---|
| `pthread_mutex_t` | Protected a shared counter | Multi-field structs, critical sections | Kernel `mutex_lock()` / `spin_lock()` |
| `pthread_cond_t` | Producer-consumer ring buffer | Thread needs to sleep until a condition holds | Kernel `wait_event()` / `wake_up()` |
| `sem_t` | Bounded parking lot | Limiting N concurrent accessors | Kernel `struct semaphore` |
| `_Atomic` (C11) | Lock-free sensor sharing | Single values, counters, flags | Kernel `WRITE_ONCE()` / `READ_ONCE()` |
| `strtok_r()` | Replaced a thread-unsafe function | Any `_r` variant in threaded code | Reentrant kernel APIs |
Forward references:
- Mutex → Kernel uses `mutex_lock()` and `spin_lock()` with the same purpose but stricter rules (cannot sleep in spinlock)
- Condition variables → Kernel wait queues (`wait_event()` / `wake_up()`) follow the same pattern
- Semaphores → Kernel `down()` / `up()` on `struct semaphore` — same counting semantics
- Atomics → Kernel provides `atomic_t` with `atomic_read()` / `atomic_set()` / `atomic_add()` — similar to C11 but kernel-specific API
6. Mini-Project: Live System Dashboard [Host/RPi]
Put everything together: build a multi-threaded system monitor that reads real data from /proc and /sys, shares it through a mutex-protected struct, and displays a live-updating terminal dashboard.
This exercise is designed to run on your host laptop — no RPi needed. It uses only standard Linux interfaces (/proc/stat, /proc/meminfo, /sys/class/thermal/).
Architecture
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│ CPU thread │ │ MEM thread │ │ TEMP thread │
│ reads │ │ reads │ │ reads │
│ /proc/stat │ │ /proc/meminfo│ │ thermal_zone│
└──────┬──────┘ └──────┬───────┘ └──────┬──────┘
│ │ │
▼ ▼ ▼
┌──────────────────────────────────────────────┐
│ shared struct (mutex) │
│ cpu_pct │ mem_used_mb │ temp_c │ ... │
└──────────────────┬───────────────────────────┘
│
▼
┌───────────────┐
│ Display thread│
│ prints every │
│ 500ms │
└───────────────┘
Three producer threads read system data. One consumer thread (or main) displays it. A pthread_mutex_t protects the shared data.
Step 6: Starter — sys_dashboard.c
This is a skeleton. The structure and the display thread are complete. You fill in the sensor-reading functions marked with TODO — a reference implementation follows each TODO comment, so try writing your own version before reading past the comment.
/* sys_dashboard.c — multi-threaded system monitor
*
* Build: gcc -Wall -pthread -o sys_dashboard sys_dashboard.c
* Run: ./sys_dashboard
* Stop: Ctrl+C
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
#include <time.h>
/* ── Shared data (protected by mutex) ─────────────────── */
typedef struct {
double cpu_pct; /* CPU usage 0-100% */
long mem_total_mb;
long mem_used_mb;
double temp_c; /* CPU temperature */
int readings; /* Total readings taken */
} sys_data_t;
sys_data_t shared = {0};
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
volatile sig_atomic_t running = 1;
void handle_int(int sig) { (void)sig; running = 0; }
/* ── Helper: draw a bar ──────────────────────────────── */
static void draw_bar(char *buf, int width, double pct)
{
int filled = (int)(pct / 100.0 * width);
if (filled > width) filled = width;
if (filled < 0) filled = 0;
for (int i = 0; i < width; i++)
buf[i] = (i < filled) ? '#' : '-';
buf[width] = '\0';
}
/* ── CPU thread ──────────────────────────────────────── */
/*
* /proc/stat first line: cpu <user> <nice> <system> <idle> ...
* CPU% = 100 * (total - idle) / total (delta between two reads)
*/
static void *cpu_thread(void *arg)
{
(void)arg;
long prev_total = 0, prev_idle = 0;
while (running) {
FILE *f = fopen("/proc/stat", "r");
if (!f) { sleep(1); continue; }
/* TODO: Read the first line starting with "cpu "
* Parse the numbers: user, nice, system, idle, iowait, irq, softirq
* Compute total = user+nice+system+idle+iowait+irq+softirq
* Compute delta_total and delta_idle from previous values
* cpu_pct = 100.0 * (delta_total - delta_idle) / delta_total
*
* Hint: use fscanf(f, "cpu %ld %ld %ld %ld %ld %ld %ld",
* &user, &nice, &sys, &idle, &iow, &irq, &sirq);
*/
long user, nice, sys, idle, iow, irq, sirq;
if (fscanf(f, "cpu %ld %ld %ld %ld %ld %ld %ld",
&user, &nice, &sys, &idle, &iow, &irq, &sirq) == 7) {
long total = user + nice + sys + idle + iow + irq + sirq;
long dt = total - prev_total;
long di = idle - prev_idle;
pthread_mutex_lock(&lock);
if (dt > 0)
shared.cpu_pct = 100.0 * (dt - di) / dt;
shared.readings++;
pthread_mutex_unlock(&lock);
prev_total = total;
prev_idle = idle;
}
fclose(f);
usleep(500000); /* 500 ms */
}
return NULL;
}
/* ── Memory thread ───────────────────────────────────── */
static void *mem_thread(void *arg)
{
(void)arg;
while (running) {
FILE *f = fopen("/proc/meminfo", "r");
if (!f) { sleep(1); continue; }
/* TODO: Read MemTotal and MemAvailable from /proc/meminfo
* Each line is like: MemTotal: 8000000 kB
* mem_used = mem_total - mem_available
*
* Hint: read line by line with fgets(), use sscanf() to match:
* sscanf(line, "MemTotal: %ld kB", &total_kb)
* sscanf(line, "MemAvailable: %ld kB", &avail_kb)
*/
char line[128];
long total_kb = 0, avail_kb = 0;
while (fgets(line, sizeof(line), f)) {
sscanf(line, "MemTotal: %ld kB", &total_kb);
sscanf(line, "MemAvailable: %ld kB", &avail_kb);
}
fclose(f);
pthread_mutex_lock(&lock);
shared.mem_total_mb = total_kb / 1024;
shared.mem_used_mb = (total_kb - avail_kb) / 1024;
pthread_mutex_unlock(&lock);
usleep(1000000); /* 1 s */
}
return NULL;
}
/* ── Temperature thread ──────────────────────────────── */
static void *temp_thread(void *arg)
{
(void)arg;
while (running) {
FILE *f = fopen("/sys/class/thermal/thermal_zone0/temp", "r");
if (!f) {
/* No thermal zone — set to -1 so display knows */
pthread_mutex_lock(&lock);
shared.temp_c = -1;
pthread_mutex_unlock(&lock);
sleep(2);
continue;
}
/* TODO: Read the millidegree value, convert to Celsius
*
* Hint: the file contains a single integer like 42000 (= 42.0 C)
* int mc; fscanf(f, "%d", &mc); temp = mc / 1000.0;
*/
int mc;
if (fscanf(f, "%d", &mc) == 1) {
pthread_mutex_lock(&lock);
shared.temp_c = mc / 1000.0;
pthread_mutex_unlock(&lock);
}
fclose(f);
usleep(2000000); /* 2 s */
}
return NULL;
}
/* ── Display (main loop) ─────────────────────────────── */
int main(void)
{
struct sigaction sa = { .sa_handler = handle_int };
sigaction(SIGINT, &sa, NULL);
pthread_t t_cpu, t_mem, t_temp;
pthread_create(&t_cpu, NULL, cpu_thread, NULL);
pthread_create(&t_mem, NULL, mem_thread, NULL);
pthread_create(&t_temp, NULL, temp_thread, NULL);
printf("\n System Dashboard (Ctrl+C to stop)\n");
while (running) {
/* Snapshot shared data under lock */
pthread_mutex_lock(&lock);
sys_data_t snap = shared;
pthread_mutex_unlock(&lock);
/* Build bars */
char cpu_bar[31], mem_bar[31];
draw_bar(cpu_bar, 30, snap.cpu_pct);
double mem_pct = snap.mem_total_mb > 0
? 100.0 * snap.mem_used_mb / snap.mem_total_mb : 0;
draw_bar(mem_bar, 30, mem_pct);
/* Print dashboard (ANSI escape to overwrite) */
printf("\033[2J\033[H"); /* clear screen, cursor home */
printf(" ╔══════════════════════════════════════════╗\n");
printf(" ║ SYSTEM DASHBOARD #%-5d ║\n", snap.readings);
printf(" ╠══════════════════════════════════════════╣\n");
printf(" ║ CPU [%s] %5.1f%% ║\n", cpu_bar, snap.cpu_pct);
printf(" ║ MEM [%s] %5.1f%% ║\n", mem_bar, mem_pct);
printf(" ║ used: %ld / %ld MB %*s║\n",
snap.mem_used_mb, snap.mem_total_mb,
(int)(28 - snprintf(NULL, 0, "%ld / %ld MB",
snap.mem_used_mb, snap.mem_total_mb)), "");
if (snap.temp_c >= 0)
printf(" ║ TEMP %.1f C %30s║\n", snap.temp_c, "");
else
printf(" ║ TEMP (no sensor) %23s║\n", "");
printf(" ╠══════════════════════════════════════════╣\n");
printf(" ║ Ctrl+C = stop kill -SIGUSR1 = TODO ║\n");
printf(" ╚══════════════════════════════════════════╝\n");
fflush(stdout);
usleep(500000); /* refresh at 2 Hz */
}
printf("\n Shutting down...\n");
pthread_join(t_cpu, NULL);
pthread_join(t_mem, NULL);
pthread_join(t_temp, NULL);
printf(" %d readings taken. Goodbye!\n", shared.readings);
return 0;
}
Build and Run
Compile with `gcc -Wall -pthread -o sys_dashboard sys_dashboard.c` (as given in the file header) and run `./sys_dashboard`. You should see a live-updating box with CPU usage, memory usage, and temperature. Open another terminal and generate some CPU load to see the bar move:
# In another terminal — watch the CPU bar jump
stress-ng --cpu 2 --timeout 10s
# or: while true; do :; done & (kill it after)
Your Tasks
The skeleton above is complete and runs. Now extend it — pick at least two:
Tip
Task A: Add SIGUSR1 for stats snapshot
Add a SIGUSR1 handler that prints a one-line stats summary to stderr (so it doesn't mess up the dashboard). Pattern: set a volatile sig_atomic_t flag, check it in the display loop.
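One way to sketch the pattern — the flag and handler names below are ours, not part of the skeleton, and the registration/print snippets in the comment show where the pieces would slot into `main()`:

```c
#include <signal.h>

/* The handler only sets a flag; all printing happens in the display
 * loop, so nothing async-signal-unsafe runs in signal context. */
volatile sig_atomic_t stats_requested = 0;

void handle_usr1(int sig) { (void)sig; stats_requested = 1; }

/* Register next to the SIGINT handler in main():
 *     struct sigaction sa2 = { .sa_handler = handle_usr1 };
 *     sigaction(SIGUSR1, &sa2, NULL);
 *
 * Then, in the display loop after taking the snapshot:
 *     if (stats_requested) {
 *         stats_requested = 0;
 *         fprintf(stderr, "cpu=%.1f%% mem=%ld/%ld MB temp=%.1f C\n",
 *                 snap.cpu_pct, snap.mem_used_mb,
 *                 snap.mem_total_mb, snap.temp_c);
 *     }
 */
```

Trigger it from another terminal with `kill -SIGUSR1 $(pidof sys_dashboard)`.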
Tip
Task B: CSV logging thread
Add a 4th thread that appends a CSV row every second. Open the file once; then, each second: lock the mutex, snapshot, unlock, fprintf, fflush. When the program exits (SIGINT), the file should be complete and valid.
Tip
Task C: High/low alerts
Track min/max for CPU and temperature in the shared struct. Display them in the dashboard. Add a flashing ** HIGH TEMP ** warning when temperature exceeds 70 C (use ANSI color: \033[31m red, \033[0m reset).
Tip
Task D: Load average from /proc/loadavg
Add a new field to the shared struct and read /proc/loadavg (format: 0.15 0.20 0.18 1/234 5678). Display the 1-minute load average in the dashboard. This one is easy — good warm-up.
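A possible sketch for the parsing half — the helper takes the line as a string so it can be exercised without the file, and `parse_load1` plus the `load1` struct field are names of our choosing:

```c
#include <stdio.h>

/* /proc/loadavg format: "0.15 0.20 0.18 1/234 5678" — we only
 * need the first field, the 1-minute load average */
static double parse_load1(const char *line)
{
    double load1 = 0.0;
    sscanf(line, "%lf", &load1);
    return load1;
}

/* in a dashboard thread:
 *     FILE *f = fopen("/proc/loadavg", "r");
 *     char line[64];
 *     if (f && fgets(line, sizeof(line), f)) {
 *         pthread_mutex_lock(&lock);
 *         shared.load1 = parse_load1(line);   // new struct field
 *         pthread_mutex_unlock(&lock);
 *     }
 *     if (f) fclose(f);
 */
```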
Checkpoint 6
| Question | Your Answer |
|---|---|
| Which tasks did you complete (A/B/C/D)? | |
| What happens if you remove the mutex lock in the display loop? | |
| Does the dashboard still work if `/sys/class/thermal/` doesn't exist? | |
| How many threads are running? (hint: `ps -eLf \| grep sys_dashboard`) | |
Challenges
Tip
Challenge 1: Thread Pool
Implement a thread pool with N worker threads and a shared task queue. Use a mutex and condition variable for the queue. Submit 20 tasks (each sleeps a random time and prints a message). Compare performance with N=1, N=4, and N=8 workers.
Tip
Challenge 2: Readers-Writers Lock
Implement a readers-writers lock using mutexes and condition variables. Multiple readers can access data simultaneously, but writers need exclusive access. Test with 5 reader threads and 2 writer threads. Compare with pthread_rwlock_t.
Tip
Challenge 3: Lock-Free Ring Buffer
Rewrite the producer-consumer ring buffer using only _Atomic variables (no mutex, no condvar). Use atomic_load and atomic_store on the head/tail indices. This is a classic lock-free pattern used in high-performance systems. Test with TSan to verify correctness.
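To see the shape of the index handling before you tackle the full rewrite, here is a minimal single-producer/single-consumer sketch (the variant matching the tutorial's one-producer, one-consumer setup). It relies on C11's default sequentially-consistent ordering; sizing, error handling, and the multi-producer case are left to you:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u                   /* power of two keeps % cheap */

static int buf[RING_SIZE];
static _Atomic unsigned head = 0;      /* written by the producer only */
static _Atomic unsigned tail = 0;      /* written by the consumer only */

bool ring_push(int v)                  /* producer side */
{
    unsigned h = atomic_load(&head);
    if (h - atomic_load(&tail) == RING_SIZE)
        return false;                  /* full */
    buf[h % RING_SIZE] = v;
    atomic_store(&head, h + 1);        /* publish only after the data write */
    return true;
}

bool ring_pop(int *out)                /* consumer side */
{
    unsigned t = atomic_load(&tail);
    if (atomic_load(&head) == t)
        return false;                  /* empty */
    *out = buf[t % RING_SIZE];
    atomic_store(&tail, t + 1);        /* free the slot after reading it */
    return true;
}
```

The indices deliberately never wrap to zero; `head - tail` gives the fill count even after the unsigned counters overflow, which is why `RING_SIZE` being a power of two matters.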
Deliverable
- [ ] `thread_demo.c` compiles and runs — demonstrates race condition without mutex, correct result with mutex
- [ ] `thread_scope.c` compiles and runs — shows local/static/global scoping and PID vs TID
- [ ] TSan report observed for `worker_unsafe`, clean run for `worker_safe`
- [ ] `strtok_threads.c` compiles and runs — garbled output with `strtok()`, correct with `strtok_r()`
- [ ] Man page MT-Safety attributes checked for at least 3 functions
- [ ] `producer_consumer.c` compiles and runs — circular buffer fills and drains, consumer sleeps when empty
- [ ] At least 2 of the condvar "What Happens When" experiments completed
- [ ] `parking_lot.c` compiles and runs — semaphore limits concurrent parking
- [ ] Semaphore experiments completed: CAPACITY=1 (binary), CAPACITY=8 (no wait), CAPACITY=0 (deadlock)
- [ ] `atomic_sensor.c` builds and runs — live temperature display with atomic variables
- [ ] `bench_sync.c` compiles and runs — atomic vs mutex benchmarked with 1/2/4 threads
- [ ] Exercise A or B completed — `g_min_temp` or `g_alert` added to `atomic_sensor.c`
- [ ] Exercise C completed — all `_Atomic` variables in `level_sdl2.c` identified and documented
- [ ] `sys_dashboard.c` compiles and runs — live dashboard with CPU, memory, temperature
- [ ] At least two of the dashboard extension tasks (A/B/C/D) completed
- [ ] (Optional) At least one advanced challenge completed
Next Steps
You now have the C fundamentals for kernel driver development. Next: write your first kernel module in Tutorial: MCP9808 Kernel Driver, where you will see probe(), read(), and interrupt handlers that follow the same patterns you practiced here.
For network programming, continue to Tutorial: Network Sockets.