Threads and Synchronization in C

Time: 90 min | Prerequisites: Processes and IPC | Theory companion: Linux Fundamentals, Sections 1.4–1.5


Learning Objectives

By the end of this tutorial you will be able to:

  • Create POSIX threads with pthread_create() and protect shared data with pthread_mutex_t
  • Identify thread-unsafe functions and replace them with reentrant alternatives
  • Use condition variables for efficient producer-consumer communication
  • Use POSIX semaphores to limit concurrent access to bounded resources
  • Use C11 _Atomic types and operations for lock-free thread communication
  • Build a multi-threaded system dashboard that reads from /proc and /sys

Before You Start

All exercises run on any Linux machine — your host laptop (Ubuntu, Fedora, Arch, etc.) or the Raspberry Pi via SSH. You need gcc installed:

gcc --version           # should print version info
mkdir -p ~/threads
cd ~/threads

You should have completed the Processes and IPC tutorial first. The thread concepts here build directly on the fork/pipe/signal patterns you learned there.


1. Threads with pthreads

Processes are isolated — each has its own memory. Threads share the same address space, which makes data sharing trivial but introduces race conditions.

Step 1: thread_demo.c

Question

Predict Before You Run

Two threads each increment a shared counter 1,000,000 times. What final value do you expect?

  • If the threads ran sequentially: ___
  • If two threads increment simultaneously without protection: ___

Write down your prediction, then compile and run.

/* thread_demo.c — shared counter with and without mutex */
#include <stdio.h>
#include <pthread.h>

#define ITERATIONS 1000000

int counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker_unsafe(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        counter++;          /* Race condition! */
    return NULL;
}

void *worker_safe(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    /* Round 1: without mutex */
    counter = 0;
    pthread_create(&t1, NULL, worker_unsafe, NULL);
    pthread_create(&t2, NULL, worker_unsafe, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("Without mutex: %d (expected %d)\n", counter, 2 * ITERATIONS);

    /* Round 2: with mutex */
    counter = 0;
    pthread_create(&t1, NULL, worker_safe, NULL);
    pthread_create(&t2, NULL, worker_safe, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("With mutex:    %d (expected %d)\n", counter, 2 * ITERATIONS);

    return 0;
}
gcc -Wall -pthread -o thread_demo thread_demo.c
./thread_demo

The "without mutex" result will be less than 2,000,000 — both threads read, increment, and write the counter simultaneously, losing updates. The "with mutex" result is always exactly 2,000,000.

Threads share code, data, and file descriptors — only the stack and registers are private per thread.

POSIX threads architecture — all threads share the heap, global data, and file descriptors.

Exercise: Quantify the Race

Run the demo 20 times and record the spread:

for i in $(seq 1 20); do ./thread_demo | head -1; done | sort -t: -k2 -n
Question

Analyse the Results

  • What is the lowest value you saw? The highest?
  • Change ITERATIONS to 100. Does the race still appear? Why or why not?
  • Change ITERATIONS to 10000000. Does the spread get wider or narrower?

Exercise: ThreadSanitizer

Compile with ThreadSanitizer to detect the race automatically:

gcc -Wall -pthread -fsanitize=thread -g -o thread_demo_tsan thread_demo.c
./thread_demo_tsan

TSan prints a data race report showing which threads access counter and from which source lines. Now comment out the worker_unsafe round (keep only worker_safe) and rebuild with TSan — the report disappears, confirming the mutex fixes the race.

Tip

ThreadSanitizer is your best friend for concurrent code. Get in the habit of compiling with -fsanitize=thread during development. It catches races that only manifest under specific timing — races you might never see in normal testing.

Exercise: Scale the Contention

Change thread_demo.c to use 4 threads instead of 2 (expected total: 4,000,000):

pthread_t threads[4];
for (int i = 0; i < 4; i++)
    pthread_create(&threads[i], NULL, worker_unsafe, NULL);
for (int i = 0; i < 4; i++)
    pthread_join(threads[i], NULL);
Question

Predict Before You Run

  • Will the data loss be worse with 4 threads than with 2? Why?
  • Run 10 times and compare the spread to the 2-thread version.

Exercise: Observe the Race Condition

Run the demo 10 times and count how many runs produce the wrong result:

for i in $(seq 1 10); do ./thread_demo | head -1; done

On a multi-core Pi 4, most runs will show data loss. On a single-core system, the race is harder to trigger but still exists.

Exercise: Variable Scoping Across Threads

Threads share global and static variables but each has its own local (stack) variables. This program makes it visible:

/* thread_scope.c — local vs static vs global in threads */
#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

int g = 0;   /* Global — shared by all threads */

void *worker(void *arg)
{
    (void)arg;
    static int s = 0;   /* Static — shared by all calls to this function */
    int local = 0;       /* Local — private to each thread (on its own stack) */

    ++local;
    ++s;
    ++g;

    printf("PID: %d, TID: %d, Local: %d, Static: %d, Global: %d\n",
           getpid(), gettid(), local, s, g);
    return NULL;
}

int main(void)
{
    printf("Main — PID: %d, TID: %d\n", getpid(), gettid());

    pthread_t threads[5];
    for (int i = 0; i < 5; i++)
        pthread_create(&threads[i], NULL, worker, NULL);

    for (int i = 0; i < 5; i++)
        pthread_join(threads[i], NULL);

    printf("Final — Global: %d (expected 5)\n", g);
    return 0;
}
gcc -Wall -pthread -o thread_scope thread_scope.c
./thread_scope

All threads share the same PID but each has a unique TID. The local variable is always 1 (private stack). The static and global variables accumulate across threads.

Warning

Classic bug: passing &i to threads. A common mistake when creating threads in a loop:

for (int i = 0; i < 5; i++)
    pthread_create(&tid, NULL, worker, (void *)&i);  /* BUG! */

All threads receive a pointer to the same i. By the time the thread reads *arg, the loop may have advanced — so multiple threads see the same value (often 5), and some values are skipped entirely. Fix: pass the value directly with (void *)(intptr_t)i or allocate a per-thread copy.

Note

Under the hood, Linux uses clone() for everything.

  • fork() = clone() with separate mm_struct (separate page tables, separate memory)
  • pthread_create() = clone() with CLONE_VM | CLONE_FS | CLONE_FILES (shared memory, shared FDs)

This is why Linux process and thread creation share the same kernel code path. The flags determine the level of sharing.

Checkpoint 1

Question Your Answer
Counter value without mutex (one run)
Counter value with mutex
How many of 20 runs showed data loss?
What was the min/max spread?
In thread_scope, is local always 1? Why?
Do all threads share the same PID or TID?
Did TSan report a race for worker_unsafe? For worker_safe?

2. What Is Thread Safety?

A function (or data structure) is thread-safe if it produces correct results when called simultaneously from multiple threads. Not all C library functions are thread-safe — knowing how to check is a critical skill.

Four Levels of Safety

Level Meaning Example
Thread-unsafe Uses hidden shared state; breaks under concurrent access strtok(), asctime(), rand()
Conditionally safe Safe if each thread uses its own instance strtok_r() with per-thread saveptr
Thread-safe Safe to call from any thread at any time printf() (internally locked), strlen()
Reentrant No shared state at all; safe even in signal handlers memcpy(), strlen(), pure computations

Hands-on: strtok Is Broken Under Threads

strtok() uses an internal static pointer to track its position. When two threads call it on different strings, they corrupt each other's state.

/* strtok_threads.c — prove strtok is NOT thread-safe
 *
 * Build:  gcc -Wall -pthread -o strtok_threads strtok_threads.c
 * Run:    ./strtok_threads
 */
#include <stdio.h>
#include <string.h>
#include <pthread.h>

void *tokenize(void *arg)
{
    char *input = strdup((char *)arg);   /* each thread gets its own copy */
    char *token = strtok(input, ",");
    while (token) {
        printf("[Thread %s] token: '%s'\n", (char *)arg, token);
        token = strtok(NULL, ",");
    }
    free(input);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, tokenize, "alpha,beta,gamma");
    pthread_create(&t2, NULL, tokenize, "one,two,three");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
gcc -Wall -pthread -o strtok_threads strtok_threads.c
./strtok_threads

Run it several times. You will see garbled output — tokens from one string appear in the other thread's output, or tokens are skipped entirely. The internal static variable inside strtok() is shared across both threads.

Fix: Use the Reentrant Version

Replace strtok() with strtok_r(), which takes an explicit saveptr instead of using hidden static state:

void *tokenize_safe(void *arg)
{
    char *input = strdup((char *)arg);
    char *saveptr;   /* thread-local state */
    char *token = strtok_r(input, ",", &saveptr);
    while (token) {
        printf("[Thread %s] token: '%s'\n", (char *)arg, token);
        token = strtok_r(NULL, ",", &saveptr);
    }
    free(input);
    return NULL;
}
Question

Exercise: Check the Man Page

Run man 3 strtok and look for the ATTRIBUTES section. You will see:

Function MT-Safety
strtok() MT-Unsafe race:strtok
strtok_r() MT-Safe

Now check these functions the same way. Are they MT-Safe or MT-Unsafe?

  • printf() → ___
  • asctime() → ___
  • asctime_r() → ___
  • rand() → ___
  • rand_r() → ___

Quick Checklist: Is This Function Thread-Safe?

Before using a C library function in threaded code, ask:

  1. Does it use static/global variables internally? → Likely unsafe
  2. Does man 3 <function> say MT-Unsafe? → Definitely unsafe
  3. Does a _r (reentrant) variant exist? → Use that instead
  4. Does it only operate on its arguments? → Likely safe

3. Condition Variables — Producer-Consumer

The dashboard (Section 6) uses usleep() to poll for new data. This works but wastes CPU cycles. In real embedded systems, you want threads to sleep until data arrives. That's what condition variables do.

The Problem with Polling

/* Busy-waiting — bad! */
while (1) {
    pthread_mutex_lock(&lock);
    if (data_ready) { process(); data_ready = 0; }
    pthread_mutex_unlock(&lock);
    usleep(1000);    /* wastes 1ms between checks */
}

The Solution: pthread_cond_wait

/* Efficient — thread sleeps until signaled */
pthread_mutex_lock(&lock);
while (!data_ready)
    pthread_cond_wait(&cond, &lock);   /* atomically: unlock + sleep + relock */
process();
data_ready = 0;
pthread_mutex_unlock(&lock);

pthread_cond_wait does three things:

  1. Unlocks the mutex (so the producer can write)
  2. Puts the thread to sleep (no CPU usage) — the unlock and the sleep happen atomically, so a signal cannot slip through between them
  3. When signaled, re-locks the mutex before returning

Step 3: producer_consumer.c

Question

Predict the Buffer

The producer writes at 200 ms intervals (5 Hz). The consumer reads at 500 ms intervals (2 Hz). The buffer holds 8 items.

  • After 4 seconds, approximately how many items has the producer written? ___
  • How many has the consumer read? ___
  • Is the buffer full, partially full, or empty? ___

Write down your prediction, then run the program and check.

A sensor producer generates data into a circular buffer, a consumer processes it. The consumer sleeps when the buffer is empty.

/* producer_consumer.c — circular buffer with condition variables
 *
 * Build:  gcc -Wall -pthread -o producer_consumer producer_consumer.c
 * Run:    ./producer_consumer
 */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
#include <time.h>

#define BUF_SIZE 8

/* ── Circular buffer ──────────────────────────────────── */

typedef struct {
    int    data[BUF_SIZE];
    int    head;           /* next write position */
    int    tail;           /* next read position */
    int    count;          /* items in buffer */
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    pthread_cond_t  not_full;
} ringbuf_t;

ringbuf_t buf = {
    .head = 0, .tail = 0, .count = 0,
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
    .not_full  = PTHREAD_COND_INITIALIZER,
};

volatile sig_atomic_t running = 1;
void handle_int(int sig) { (void)sig; running = 0; }

/* ── Producer: simulated sensor ───────────────────────── */

void *producer(void *arg)
{
    (void)arg;
    int reading = 0;

    while (running) {
        /* Simulate a sensor reading (20-30 C with noise) */
        int value = 20000 + (rand() % 10000);   /* millidegrees */
        reading++;

        pthread_mutex_lock(&buf.lock);

        /* Wait if buffer is full */
        while (buf.count == BUF_SIZE && running)
            pthread_cond_wait(&buf.not_full, &buf.lock);

        if (!running) {
            pthread_mutex_unlock(&buf.lock);
            break;
        }

        /* Write to buffer */
        buf.data[buf.head] = value;
        buf.head = (buf.head + 1) % BUF_SIZE;
        buf.count++;

        printf("[Producer] #%d: wrote %d mC (buffer: %d/%d)\n",
               reading, value, buf.count, BUF_SIZE);

        /* Wake consumer */
        pthread_cond_signal(&buf.not_empty);
        pthread_mutex_unlock(&buf.lock);

        usleep(200000);   /* 200ms — sensor sample rate */
    }
    return NULL;
}

/* ── Consumer: process and log ────────────────────────── */

void *consumer(void *arg)
{
    (void)arg;
    int processed = 0;
    long sum = 0;

    while (running || buf.count > 0) {
        pthread_mutex_lock(&buf.lock);

        /* Wait if buffer is empty */
        while (buf.count == 0 && running)
            pthread_cond_wait(&buf.not_empty, &buf.lock);

        if (buf.count == 0) {
            pthread_mutex_unlock(&buf.lock);
            break;
        }

        /* Read from buffer */
        int value = buf.data[buf.tail];
        buf.tail = (buf.tail + 1) % BUF_SIZE;
        buf.count--;

        /* Wake producer if it was waiting on a full buffer */
        pthread_cond_signal(&buf.not_full);
        pthread_mutex_unlock(&buf.lock);

        /* Process outside the lock (simulate slow processing) */
        processed++;
        sum += value;
        double avg = (double)sum / processed / 1000.0;
        printf("  [Consumer] #%d: read %d mC, running avg: %.1f C\n",
               processed, value, avg);

        usleep(500000);   /* 500ms — consumer is slower than producer */
    }

    printf("  [Consumer] Done: %d readings, final avg: %.1f C\n",
           processed, processed > 0 ? (double)sum / processed / 1000.0 : 0);
    return NULL;
}

int main(void)
{
    srand(time(NULL));
    struct sigaction sa = { .sa_handler = handle_int };
    sigaction(SIGINT, &sa, NULL);

    printf("Producer-Consumer demo (Ctrl+C to stop)\n");
    printf("Producer: 200ms interval, Consumer: 500ms interval\n");
    printf("Buffer size: %d — watch it fill up!\n\n", BUF_SIZE);

    pthread_t t_prod, t_cons;
    pthread_create(&t_prod, NULL, producer, NULL);
    pthread_create(&t_cons, NULL, consumer, NULL);

    /* Wait for Ctrl+C */
    while (running)
        pause();

    /* Wake threads that might be waiting on conditions */
    pthread_cond_broadcast(&buf.not_empty);
    pthread_cond_broadcast(&buf.not_full);

    pthread_join(t_prod, NULL);
    pthread_join(t_cons, NULL);

    return 0;
}

Build and Run

gcc -Wall -pthread -o producer_consumer producer_consumer.c
./producer_consumer

Watch the buffer fill up — the producer writes at 5 Hz but the consumer only reads at 2 Hz. The buffer acts as a shock absorber. When it fills to 8/8, the producer sleeps until the consumer catches up.

[Producer] #1: wrote 25431 mC (buffer: 1/8)
  [Consumer] #1: read 25431 mC, running avg: 25.4 C
[Producer] #2: wrote 21087 mC (buffer: 1/8)
[Producer] #3: wrote 28943 mC (buffer: 2/8)
  [Consumer] #2: read 21087 mC, running avg: 23.3 C
[Producer] #4: wrote 23156 mC (buffer: 2/8)
[Producer] #5: wrote 27891 mC (buffer: 3/8)
...
Note

Why while (!data_ready) and not if (!data_ready)? Condition variables can have spurious wakeups — the OS may wake the thread without a signal. The while loop re-checks the condition after waking. This is a universal rule: always use while with pthread_cond_wait, never if.

Exercise: What Happens When...

Try each of these modifications one at a time and observe the result:

Question

Experiment A — Swap the speeds

Make the producer slower (500 ms) and consumer faster (200 ms). What happens to the buffer fill level? Does the consumer ever wait?

Question

Experiment B — Tiny buffer

Set BUF_SIZE to 1. How does the output change? Is the producer blocked most of the time?

Question

Experiment C — Remove the signal

Comment out the pthread_cond_signal(&buf.not_empty) line in the producer. What happens to the consumer? Why?

Question

Experiment D — Spurious Wakeups

Change the while (buf.count == 0 && running) in the consumer to if (buf.count == 0 && running). Compile with TSan (-fsanitize=thread). Can you observe incorrect behaviour?

Tip

Detect race conditions automatically. Compile with ThreadSanitizer to catch bugs:

gcc -Wall -pthread -fsanitize=thread -g -o app app.c
./app    # prints data race reports at runtime

Or use Valgrind's Helgrind (slower but catches lock-order violations too):

valgrind --tool=helgrind ./app

Checkpoint 3

Question Your Answer
What happens when the buffer fills to 8/8?
Why does the consumer use while (buf.count == 0) not if?
What would happen without pthread_cond_broadcast in main?
When you swapped speeds, did the buffer ever fill up?

4. POSIX Semaphores — Bounded Resources

A counting semaphore is an integer counter with two atomic operations: wait (decrement, block if zero) and post (increment, wake a waiter). Think of it as "N permits available."

Concept: Parking Lot

Imagine a parking lot with 3 spaces. A car can enter (wait → decrement) only if a space is free. When a car leaves (post → increment), another can enter.

sem_init(&spots, 0, 3)     →  spots = 3

Car A arrives: sem_wait     →  spots = 2  (enters)
Car B arrives: sem_wait     →  spots = 1  (enters)
Car C arrives: sem_wait     →  spots = 0  (enters)
Car D arrives: sem_wait     →  spots = 0  (BLOCKS — lot full)

Car B leaves:  sem_post     →  spots = 1  (Car D unblocks, enters)

Semaphore vs Condition Variable

sem_t (semaphore) pthread_cond_t (condvar)
Has state Yes — internal counter No — stateless signal
Remembers signals Yes — sem_post before sem_wait still counts No — signal is lost if nobody is waiting
Needs mutex No — self-contained Yes — always paired with a mutex
Spurious wakeups No Yes — must use while loop
Async-signal-safe sem_post is safe No — cannot use in signal handlers
Best for Limiting concurrency (N permits) Notifying state changes

Step 4: parking_lot.c

/* parking_lot.c — POSIX semaphore demo
 *
 * Build:  gcc -Wall -pthread -o parking_lot parking_lot.c
 * Run:    ./parking_lot
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>      /* intptr_t */
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>
#include <time.h>

#define NUM_CARS  8
#define CAPACITY  3

sem_t spots;

void *car(void *arg)
{
    int id = (int)(intptr_t)arg;

    printf("Car %d: arriving...\n", id);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    sem_wait(&spots);   /* block if lot is full */

    clock_gettime(CLOCK_MONOTONIC, &end);
    double waited = (end.tv_sec - start.tv_sec)
                  + (end.tv_nsec - start.tv_nsec) / 1e9;

    printf("Car %d: PARKED (waited %.2f s)\n", id, waited);

    /* Simulate being parked for 1-3 seconds */
    usleep((1000 + rand() % 2000) * 1000);

    printf("Car %d: leaving\n", id);
    sem_post(&spots);   /* free the spot */

    return NULL;
}

int main(void)
{
    srand(time(NULL));
    sem_init(&spots, 0, CAPACITY);

    printf("Parking lot: %d spots, %d cars\n\n", CAPACITY, NUM_CARS);

    pthread_t threads[NUM_CARS];
    for (int i = 0; i < NUM_CARS; i++) {
        pthread_create(&threads[i], NULL, car, (void *)(intptr_t)(i + 1));
        usleep(200000);   /* stagger arrivals by 200ms */
    }

    for (int i = 0; i < NUM_CARS; i++)
        pthread_join(threads[i], NULL);

    sem_destroy(&spots);
    printf("\nAll cars served.\n");
    return 0;
}
gcc -Wall -pthread -o parking_lot parking_lot.c
./parking_lot

You will see 3 cars park immediately, the rest wait. As each car leaves, the next one enters.

Experiments

Question

Experiment A — Semaphore as mutex

Set CAPACITY=1. How does the output change? A semaphore with count 1 behaves like a mutex — only one car at a time. This is called a binary semaphore.

Question

Experiment B — No contention

Set CAPACITY=8 (same as NUM_CARS). Does any car wait? What is the maximum wait time?

Question

Experiment C — Deadlock

Set CAPACITY=0. What happens? Every car blocks on sem_wait because the count starts at zero. No thread ever calls sem_post, so the program hangs forever. This is a deadlock. Press Ctrl+C to escape.

Question

Experiment D — Measure contention

Try CAPACITY=3 vs CAPACITY=5. Compare the average wait times. How does the number of spots affect throughput?

POSIX Semaphore API Reference

Function Purpose
sem_init(&sem, 0, N) Initialize unnamed semaphore with count N (0 = process-local)
sem_wait(&sem) Decrement; block if count is 0
sem_trywait(&sem) Decrement if count > 0; otherwise fail immediately with errno set to EAGAIN (non-blocking)
sem_post(&sem) Increment; wake one waiting thread (async-signal-safe)
sem_getvalue(&sem, &val) Read current count
sem_destroy(&sem) Clean up
Note

Named vs unnamed semaphores. sem_init() creates an unnamed semaphore (thread-level). For IPC between unrelated processes, use sem_open("/my_sem", O_CREAT, 0644, N) instead — this creates a named semaphore visible in /dev/shm/. We use unnamed semaphores here because all our threads share the same address space.

Checkpoint 4

Question Your Answer
With CAPACITY=3, how many cars parked simultaneously?
With CAPACITY=1, did it behave like a mutex?
With CAPACITY=0, what happened?
Is sem_post async-signal-safe? Is pthread_cond_signal?

5. C11 Atomics — Lock-Free Shared Data

Mutexes work but carry overhead — locking, unlocking, and potentially blocking the thread if someone else holds the lock. For sharing a single value between threads (a sensor reading, a counter, a flag), C11 _Atomic types let the CPU update the value in a single atomic instruction — no lock, no blocking.

Why volatile is not enough

A common misconception: "volatile prevents caching, so it makes shared variables thread-safe." It doesn't. volatile only prevents the compiler from optimizing away reads/writes — it does nothing about CPU reordering or atomicity of read-modify-write operations.

Question

Predict Before You Run

The program below uses volatile int counter and two threads each incrementing it 1,000,000 times. Will the result be correct (2,000,000)? Why or why not?

Remember: counter++ compiles to three CPU operations: load, add, store.

Create this file to see the problem:

/* volatile_broken.c — prove that volatile does NOT fix data races */
#include <stdio.h>
#include <pthread.h>

static volatile int counter = 0;

static void *increment(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;   /* NOT atomic: load, add, store — 3 steps */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d (expected 2000000)\n", counter);
    return 0;
}
gcc -Wall -pthread -o volatile_broken volatile_broken.c
./volatile_broken    # run 5 times — result varies, almost always < 2000000

Compare this to thread_demo.c from Section 1 — same bug, same root cause: counter++ is three operations (load, add, store) that can interleave between threads.

The synchronization spectrum

volatile volatile sig_atomic_t _Atomic (C11) pthread_mutex_t
Prevents compiler reorder Yes Yes Yes Yes
Prevents CPU reorder No No Yes Yes
Atomic read-modify-write No No Yes Yes (via lock)
Safe between threads No No Yes Yes
Safe in signal handlers No Yes Only if lock-free No
Overhead None None ~1 instruction Syscall if contended
Use case HW registers Signal flags Counters, sensor values Multi-field structs
Warning

_Atomic is NOT guaranteed async-signal-safe. C11 promises signal safety only for lock-free atomic types, and the only portable guarantee for a variable shared between a signal handler and normal code is volatile sig_atomic_t. Keep using volatile sig_atomic_t for signal handlers — use _Atomic for thread-to-thread sharing.

Exercise: See the CPU Instructions

Compile volatile_broken.c to assembly and compare the increment:

# Plain volatile increment
gcc -S -O2 -o volatile.s volatile_broken.c
grep -A5 'counter' volatile.s

# Now change 'volatile int' to '_Atomic int' and recompile
gcc -S -O2 -o atomic.s volatile_broken.c
grep -A5 'counter' atomic.s
Question

Compare the output

  • On x86: look for lock prefix instructions (e.g., lock addl)
  • On ARM: look for ldxr/stxr (load-exclusive/store-exclusive) loops
  • How many instructions does the plain version use? The atomic version?

Guided reading: atomic_sensor.c

The course repository contains a complete program that demonstrates all four core atomic operations in a realistic sensor→display architecture. Instead of copying code from this page, you will read the source and answer questions about it.

cd ~/embedded-linux/apps/processes-and-ipc
cat -n atomic_sensor.c    # read the full source

Architecture overview:

 Sensor Thread (10 Hz)           Display Thread (2 Hz)
 ┌───────────────────────┐        ┌──────────────────────┐
 │ read CPU temperature  │        │ atomic_load(&g_temp) │
 │ atomic_store(&g_temp) ├───────►│ print terminal bar   │
 │ atomic_fetch_add()    │_Atomic │ show stats           │
 │ CAS max-temp update   │        └──────────────────────┘
 └───────────────────────┘
          │ atomic_exchange(&g_calibrate, 0)
          │ (consumes flag set by main thread)

Read the source and answer these questions:

  1. Find _Atomic float g_temp. Which thread writes to it? Which reads? What operation does each use?
  2. Find atomic_fetch_add. What value does it return — the old value or the new value?
  3. Find the atomic_compare_exchange_weak loop. What pattern does it implement? Why does it need a loop?
  4. Find g_running. Why is it volatile sig_atomic_t instead of _Atomic int?
  5. Find atomic_exchange(&g_calibrate, 0). What would happen if you used atomic_load + atomic_store instead of atomic_exchange?

Build and run:

make atomic_sensor
./atomic_sensor        # Ctrl-C to stop

You should see a live temperature bar updating in the terminal. Press c + Enter to trigger calibration.

C11 atomic operations reference

C11 Function Purpose ARM Instructions In atomic_sensor.c
atomic_store Write value STR + memory barrier Sensor stores g_temp
atomic_load Read value LDR + memory barrier Display reads g_temp
atomic_fetch_add Increment, return old value LDXR + ADD + STXR loop Counting g_readings
atomic_exchange Swap, return old value LDXR + STXR loop Consuming g_calibrate flag
atomic_compare_exchange_weak CAS: replace if expected LDXR + CMP + STXR loop Tracking g_max_temp

On ARM, the LDXR/STXR (Load-Exclusive / Store-Exclusive) pair implements lock-free atomic updates: the CPU marks a cache line as "exclusive," performs the operation, and STXR only succeeds if no other core touched that line. If it fails, the loop retries. On x86, these map to LOCK-prefixed instructions or CMPXCHG.

Memory ordering

By default, all atomic operations use memory_order_seq_cst (sequentially consistent) — the safest and simplest ordering. This means:

  1. All threads see operations in the same order — like a single-lane bridge where everyone takes turns.
  2. No reordering — neither the compiler nor the CPU can move atomic operations past each other.

This is almost always what you want. But C11 provides weaker orderings for performance-critical code:

Memory Order Guarantee Analogy When to use
memory_order_seq_cst Total order across all threads Single-lane bridge Default — always correct
memory_order_acquire / release Producer/consumer ordering One-way gate Store/load pairs between two threads
memory_order_relaxed Only atomicity, no ordering Free-for-all Standalone counters, statistics
Tip

Rule of thumb: Start with the default (seq_cst). Only weaken ordering when profiling proves it is a bottleneck — and only if you fully understand the implications. Wrong memory ordering creates bugs that appear only under load, only on certain CPUs, and are nearly impossible to debug.

Exercise: Measure the Cost

Benchmark atomic_fetch_add vs pthread_mutex_lock/unlock for 10 million increments. Create bench_sync.c:

/* bench_sync.c — compare atomic vs mutex overhead
 *
 * Build:  gcc -Wall -pthread -O2 -o bench_sync bench_sync.c
 * Run:    ./bench_sync
 */
#include <stdio.h>
#include <stdatomic.h>
#include <pthread.h>
#include <time.h>

#define ITERS 10000000

static _Atomic int atomic_counter = 0;
static int mutex_counter = 0;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *atomic_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++)
        atomic_fetch_add(&atomic_counter, 1);
    return NULL;
}

static void *mutex_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&mtx);
        mutex_counter++;
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

static double run_test(void *(*fn)(void *), int nthreads)
{
    pthread_t t[8];
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, fn, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

int main(void)
{
    int threads[] = {1, 2, 4};
    for (int t = 0; t < 3; t++) {
        int n = threads[t];
        atomic_counter = 0;
        mutex_counter = 0;
        double t_atomic = run_test(atomic_worker, n);
        double t_mutex  = run_test(mutex_worker, n);
        printf("%d thread(s): atomic=%.3fs  mutex=%.3fs  (%.1fx)\n",
               n, t_atomic, t_mutex, t_mutex / t_atomic);
    }
    return 0;
}
gcc -Wall -pthread -O2 -o bench_sync bench_sync.c
./bench_sync
Question

Analyse the Results

  • With 1 thread, is there much difference? Why?
  • With 4 threads, how much faster are atomics?
  • When would you still prefer mutex over atomics?

Exercise: relaxed counting

The g_readings counter is standalone — its value doesn't need to be ordered relative to g_temp or any other variable. This makes it safe to use memory_order_relaxed:

Open atomic_sensor.c and find the atomic_fetch_add(&g_readings, 1) call. Change it to:

atomic_fetch_add_explicit(&g_readings, 1, memory_order_relaxed);

Rebuild and run — the counter still works correctly. This is safe because g_readings is a standalone counter — no other variable depends on its ordering.

Exercise: break it on purpose

Remove _Atomic from g_temp (change it to plain float g_temp) and rebuild with ThreadSanitizer:

gcc -Wall -pthread -fsanitize=thread -g -o atomic_sensor atomic_sensor.c
./atomic_sensor

TSan will report a data race on g_temp — one thread writes while another reads without synchronization. This is exactly what _Atomic prevents. Revert your change after observing the report.

When atomics are NOT enough

Atomics work for single values. When you need to share a multi-field struct (e.g., CPU usage + temperature + memory in sys_dashboard.c), atomics cannot help — a reader could see the new CPU value with the old temperature.

| Pattern | Mechanism | Example |
|---|---|---|
| Signal flag (async) | volatile sig_atomic_t | g_running in signal handler |
| Single value between threads | _Atomic | g_temp, g_readings, g_calibrate |
| Multi-field struct | pthread_mutex_t | sys_dashboard.c shared stats |
| Producer-consumer queue | Mutex + pthread_cond_t | producer_consumer.c |

Modification exercises

These exercises modify atomic_sensor.c. Work on a copy or use git stash to save your changes.

Tip

Exercise A: Track minimum temperature

Add _Atomic float g_min_temp using the same CAS loop pattern as g_max_temp, but reversed — update when temp < cur_min. Initialize it to a high value (e.g., 1000.0f). Display it alongside the max in the display thread.

Hint: Copy the while (temp > cur_max) loop and change the comparison direction.

Tip

Exercise B: Temperature alert flag

Add _Atomic int g_alert. In the sensor thread, set it to 1 when temperature exceeds a threshold (e.g., 50°C). In the display thread, consume the alert with atomic_exchange(&g_alert, 0) and print *** ALERT *** when it fires.

Think: Why atomic_exchange instead of atomic_load + atomic_store? What race condition would the two-step version have?

Tip

Exercise C: Connect forward to level_sdl2.c

Open level_sdl2.c in the same apps/ directory. Find all _Atomic variables and for each one, identify:

  • What type is it?
  • Which thread writes it? Which reads?
  • Which atomic operation pattern does it use (store/load, fetch-add, exchange, CAS)?

List your findings in your lab notebook. You will work with this code in the SDL2 Display Tutorial.

Checkpoint 5

| Question | Your Answer |
|---|---|
| What ARM instructions does atomic_fetch_add compile to? | |
| What is the difference between volatile int and _Atomic int for threads? | |
| When would you use memory_order_relaxed instead of the default? | |
| Why volatile sig_atomic_t for g_running but _Atomic for g_calibrate? | |
| How many _Atomic variables does level_sdl2.c declare? | |
| With 4 threads, how much faster were atomics than mutex? | |

What Just Happened?

You built concurrent C programs using four different synchronization mechanisms:

| Mechanism | What You Did | When to Use | Later in Course |
|---|---|---|---|
| pthread_mutex_t | Protected a shared counter | Multi-field structs, critical sections | Kernel mutex_lock() / spin_lock() |
| pthread_cond_t | Producer-consumer ring buffer | Thread needs to sleep until condition | Kernel wait_event() / wake_up() |
| sem_t | Bounded parking lot | Limiting N concurrent accessors | Kernel struct semaphore |
| _Atomic (C11) | Lock-free sensor sharing | Single values, counters, flags | Kernel WRITE_ONCE() / READ_ONCE() |
| strtok_r() | Replaced thread-unsafe function | Any _r variant in threaded code | Reentrant kernel APIs |

Forward references:

  • Mutex → Kernel uses mutex_lock() and spin_lock() with the same purpose but stricter rules (cannot sleep in spinlock)
  • Condition variables → Kernel wait queues (wait_event() / wake_up()) follow the same pattern
  • Semaphores → Kernel down() / up() on struct semaphore — same counting semantics
  • Atomics → Kernel provides atomic_t with atomic_read() / atomic_set() / atomic_add() — similar to C11 but kernel-specific API

6. Mini-Project: Live System Dashboard [Host/RPi]

Put everything together: build a multi-threaded system monitor that reads real data from /proc and /sys, shares it through a mutex-protected struct, and displays a live-updating terminal dashboard.

This exercise is designed to run on your host laptop — no RPi needed. It uses only standard Linux interfaces (/proc/stat, /proc/meminfo, /sys/class/thermal/).

Architecture

 ┌─────────────┐   ┌──────────────┐   ┌─────────────┐
 │ CPU thread  │   │ MEM thread   │   │ TEMP thread │
 │ reads       │   │ reads        │   │ reads       │
 │ /proc/stat  │   │ /proc/meminfo│   │ thermal_zone│
 └──────┬──────┘   └──────┬───────┘   └──────┬──────┘
        │                 │                  │
        ▼                 ▼                  ▼
   ┌──────────────────────────────────────────────┐
   │           shared struct (mutex)              │
   │  cpu_pct  │  mem_used_mb  │  temp_c  │ ...   │
   └──────────────────┬───────────────────────────┘
              ┌───────────────┐
              │ Display thread│
              │ prints every  │
              │ 500ms         │
              └───────────────┘

Three producer threads read system data. One consumer thread (or main) displays it. A pthread_mutex_t protects the shared data.

Step 6: Starter — sys_dashboard.c

The structure and the display thread are complete. Each sensor-reading function starts with a TODO comment describing what to implement, followed by a reference implementation; try writing your own version before reading it.

/* sys_dashboard.c — multi-threaded system monitor
 *
 * Build:  gcc -Wall -pthread -o sys_dashboard sys_dashboard.c
 * Run:    ./sys_dashboard
 * Stop:   Ctrl+C
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
#include <time.h>

/* ── Shared data (protected by mutex) ─────────────────── */

typedef struct {
    double cpu_pct;         /* CPU usage 0-100% */
    long   mem_total_mb;
    long   mem_used_mb;
    double temp_c;          /* CPU temperature */
    int    readings;        /* Total readings taken */
} sys_data_t;

sys_data_t shared = {0};
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
volatile sig_atomic_t running = 1;

void handle_int(int sig) { (void)sig; running = 0; }

/* ── Helper: draw a bar ──────────────────────────────── */

static void draw_bar(char *buf, int width, double pct)
{
    int filled = (int)(pct / 100.0 * width);
    if (filled > width) filled = width;
    if (filled < 0)     filled = 0;
    for (int i = 0; i < width; i++)
        buf[i] = (i < filled) ? '#' : '-';
    buf[width] = '\0';
}

/* ── CPU thread ──────────────────────────────────────── */
/*
 * /proc/stat first line: cpu <user> <nice> <system> <idle> ...
 * CPU% = 100 * (total - idle) / total   (delta between two reads)
 */

static void *cpu_thread(void *arg)
{
    (void)arg;
    long prev_total = 0, prev_idle = 0;

    while (running) {
        FILE *f = fopen("/proc/stat", "r");
        if (!f) { sleep(1); continue; }

        /* TODO: Read the first line starting with "cpu "
         * Parse the numbers: user, nice, system, idle, iowait, irq, softirq
         * Compute total = user+nice+system+idle+iowait+irq+softirq
         * Compute delta_total and delta_idle from previous values
         * cpu_pct = 100.0 * (delta_total - delta_idle) / delta_total
         *
         * Hint: use fscanf(f, "cpu %ld %ld %ld %ld %ld %ld %ld",
         *                  &user, &nice, &sys, &idle, &iow, &irq, &sirq);
         */
        long user, nice, sys, idle, iow, irq, sirq;
        if (fscanf(f, "cpu %ld %ld %ld %ld %ld %ld %ld",
                   &user, &nice, &sys, &idle, &iow, &irq, &sirq) == 7) {
            long total = user + nice + sys + idle + iow + irq + sirq;
            long dt = total - prev_total;
            long di = idle - prev_idle;

            pthread_mutex_lock(&lock);
            if (dt > 0)
                shared.cpu_pct = 100.0 * (dt - di) / dt;
            shared.readings++;
            pthread_mutex_unlock(&lock);

            prev_total = total;
            prev_idle = idle;
        }
        fclose(f);

        usleep(500000);   /* 500 ms */
    }
    return NULL;
}

/* ── Memory thread ───────────────────────────────────── */

static void *mem_thread(void *arg)
{
    (void)arg;
    while (running) {
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f) { sleep(1); continue; }

        /* TODO: Read MemTotal and MemAvailable from /proc/meminfo
         * Each line is like:  MemTotal:       8000000 kB
         * mem_used = mem_total - mem_available
         *
         * Hint: read line by line with fgets(), use sscanf() to match:
         *   sscanf(line, "MemTotal: %ld kB", &total_kb)
         *   sscanf(line, "MemAvailable: %ld kB", &avail_kb)
         */
        char line[128];
        long total_kb = 0, avail_kb = 0;
        while (fgets(line, sizeof(line), f)) {
            sscanf(line, "MemTotal: %ld kB", &total_kb);
            sscanf(line, "MemAvailable: %ld kB", &avail_kb);
        }
        fclose(f);

        pthread_mutex_lock(&lock);
        shared.mem_total_mb = total_kb / 1024;
        shared.mem_used_mb  = (total_kb - avail_kb) / 1024;
        pthread_mutex_unlock(&lock);

        sleep(1);          /* 1 s — POSIX caps usleep() arguments below 1,000,000 */
    }
    return NULL;
}

/* ── Temperature thread ──────────────────────────────── */

static void *temp_thread(void *arg)
{
    (void)arg;
    while (running) {
        FILE *f = fopen("/sys/class/thermal/thermal_zone0/temp", "r");
        if (!f) {
            /* No thermal zone — set to -1 so display knows */
            pthread_mutex_lock(&lock);
            shared.temp_c = -1;
            pthread_mutex_unlock(&lock);
            sleep(2);
            continue;
        }

        /* TODO: Read the millidegree value, convert to Celsius
         *
         * Hint: the file contains a single integer like 42000 (= 42.0 C)
         *   int mc; fscanf(f, "%d", &mc); temp = mc / 1000.0;
         */
        int mc;
        if (fscanf(f, "%d", &mc) == 1) {
            pthread_mutex_lock(&lock);
            shared.temp_c = mc / 1000.0;
            pthread_mutex_unlock(&lock);
        }
        fclose(f);

        sleep(2);          /* 2 s */
    }
    return NULL;
}

/* ── Display (main loop) ─────────────────────────────── */

int main(void)
{
    struct sigaction sa = { .sa_handler = handle_int };
    sigaction(SIGINT, &sa, NULL);

    pthread_t t_cpu, t_mem, t_temp;
    pthread_create(&t_cpu,  NULL, cpu_thread,  NULL);
    pthread_create(&t_mem,  NULL, mem_thread,  NULL);
    pthread_create(&t_temp, NULL, temp_thread, NULL);

    printf("\n  System Dashboard (Ctrl+C to stop)\n");

    while (running) {
        /* Snapshot shared data under lock */
        pthread_mutex_lock(&lock);
        sys_data_t snap = shared;
        pthread_mutex_unlock(&lock);

        /* Build bars */
        char cpu_bar[31], mem_bar[31];
        draw_bar(cpu_bar, 30, snap.cpu_pct);
        double mem_pct = snap.mem_total_mb > 0
            ? 100.0 * snap.mem_used_mb / snap.mem_total_mb : 0;
        draw_bar(mem_bar, 30, mem_pct);

        /* Print dashboard (ANSI escape to overwrite) */
        printf("\033[2J\033[H");   /* clear screen, cursor home */
        printf("  ╔══════════════════════════════════════════╗\n");
        printf("  ║  SYSTEM DASHBOARD          #%-5d         ║\n", snap.readings);
        printf("  ╠══════════════════════════════════════════╣\n");
        printf("  ║  CPU  [%s] %5.1f%%  ║\n", cpu_bar, snap.cpu_pct);
        printf("  ║  MEM  [%s] %5.1f%%  ║\n", mem_bar, mem_pct);
        printf("  ║  used: %ld / %ld MB %*s║\n",
               snap.mem_used_mb, snap.mem_total_mb,
               (int)(28 - snprintf(NULL, 0, "%ld / %ld MB",
                   snap.mem_used_mb, snap.mem_total_mb)), "");
        if (snap.temp_c >= 0)
            printf("  ║  TEMP  %.1f C %30s║\n", snap.temp_c, "");
        else
            printf("  ║  TEMP  (no sensor) %23s║\n", "");
        printf("  ╠══════════════════════════════════════════╣\n");
        printf("  ║  Ctrl+C = stop   kill -SIGUSR1 = TODO    ║\n");
        printf("  ╚══════════════════════════════════════════╝\n");

        fflush(stdout);
        usleep(500000);   /* refresh at 2 Hz */
    }

    printf("\n  Shutting down...\n");
    pthread_join(t_cpu, NULL);
    pthread_join(t_mem, NULL);
    pthread_join(t_temp, NULL);
    printf("  %d readings taken. Goodbye!\n", shared.readings);

    return 0;
}

Build and Run

gcc -Wall -pthread -o sys_dashboard sys_dashboard.c
./sys_dashboard

You should see a live-updating box with CPU usage, memory usage, and temperature. Open another terminal and generate some CPU load to see the bar move:

# In another terminal — watch the CPU bar jump
stress-ng --cpu 2 --timeout 10s
# or: while true; do :; done &    (kill it after)

Your Tasks

The skeleton above is complete and runs. Now extend it — pick at least two:

Tip

Task A: Add SIGUSR1 for stats snapshot

Add a SIGUSR1 handler that prints a one-line stats summary to stderr (so it doesn't mess up the dashboard). Pattern: set a volatile sig_atomic_t flag, check it in the display loop.

kill -SIGUSR1 $(pgrep sys_dashboard)
Tip

Task B: CSV logging thread

Add a 4th thread that appends a CSV row every second:

timestamp,cpu_pct,mem_used_mb,temp_c
14:30:01,23.5,1847,52.3

Open the file once, lock the mutex, snapshot, unlock, fprintf, fflush. When the program exits (SIGINT), the file should be complete and valid.

Tip

Task C: High/low alerts

Track min/max for CPU and temperature in the shared struct. Display them in the dashboard. Add a flashing ** HIGH TEMP ** warning when temperature exceeds 70 C (use ANSI color: \033[31m red, \033[0m reset).

Tip

Task D: Load average from /proc/loadavg

Add a new field to the shared struct and read /proc/loadavg (format: 0.15 0.20 0.18 1/234 5678). Display the 1-minute load average in the dashboard. This one is easy — good warm-up.

Checkpoint 6

| Question | Your Answer |
|---|---|
| Which tasks did you complete (A/B/C/D)? | |
| What happens if you remove the mutex lock in the display loop? | |
| Does the dashboard still work if /sys/class/thermal/ doesn't exist? | |
| How many threads are running? (hint: ps -eLf \| grep sys_dashboard) | |

Challenges

Tip

Challenge 1: Thread Pool

Implement a thread pool with N worker threads and a shared task queue. Use a mutex and condition variable for the queue. Submit 20 tasks (each sleeps a random time and prints a message). Compare performance with N=1, N=4, and N=8 workers.

Tip

Challenge 2: Readers-Writers Lock

Implement a readers-writers lock using mutexes and condition variables. Multiple readers can access data simultaneously, but writers need exclusive access. Test with 5 reader threads and 2 writer threads. Compare with pthread_rwlock_t.

Tip

Challenge 3: Lock-Free Ring Buffer

Rewrite the producer-consumer ring buffer using only _Atomic variables (no mutex, no condvar). Use atomic_load and atomic_store on the head/tail indices. This is a classic lock-free pattern used in high-performance systems. Test with TSan to verify correctness.


Deliverable

  • [ ] thread_demo.c compiles and runs — demonstrates race condition without mutex, correct result with mutex
  • [ ] thread_scope.c compiles and runs — shows local/static/global scoping and PID vs TID
  • [ ] TSan report observed for worker_unsafe, clean run for worker_safe
  • [ ] strtok_threads.c compiles and runs — garbled output with strtok(), correct with strtok_r()
  • [ ] Man page MT-Safety attributes checked for at least 3 functions
  • [ ] producer_consumer.c compiles and runs — circular buffer fills and drains, consumer sleeps when empty
  • [ ] At least 2 of the condvar "What Happens When" experiments completed
  • [ ] parking_lot.c compiles and runs — semaphore limits concurrent parking
  • [ ] Semaphore experiments completed: CAPACITY=1 (binary), CAPACITY=8 (no wait), CAPACITY=0 (deadlock)
  • [ ] atomic_sensor.c builds and runs — live temperature display with atomic variables
  • [ ] bench_sync.c compiles and runs — atomic vs mutex benchmarked with 1/2/4 threads
  • [ ] Exercise A or B completed — g_min_temp or g_alert added to atomic_sensor.c
  • [ ] Exercise C completed — all _Atomic variables in level_sdl2.c identified and documented
  • [ ] sys_dashboard.c compiles and runs — live dashboard with CPU, memory, temperature
  • [ ] At least two of the dashboard extension tasks (A/B/C/D) completed
  • [ ] (Optional) At least one advanced challenge completed

Next Steps

You now have the C fundamentals for kernel driver development. Next: write your first kernel module in Tutorial: MCP9808 Kernel Driver, where you will see probe(), read(), and interrupt handlers that follow the same patterns you practiced here.

For network programming, continue to Tutorial: Network Sockets.


Back to Course Overview