What is a SEGV
What happens under the hood, with a Linux / x86 focus
# test.c
#include <stdio.h>

int main(void) {
    int *i = NULL;
    printf("%d\n", *i);
    return 0;
}
$ gcc test.c
$ ./a.out
[1] 15654 segmentation fault ./a.out
SEGV stands for Segmentation Violation, also known as a segmentation fault or “segfault”. It means your program is attempting to access a region, or “segment”, of memory it is not allowed to. In practice, this is one of the following:
- Dereferencing a NULL pointer
- Dereferencing an uninitialized pointer
- Writing to read-only memory
- Reading or writing outside the valid boundaries of an array or allocated buffer, such as accessing the 15th element of an 11-element array
- Reading or writing, more generally, to any region of memory (stack, heap, etc.) not mapped into the process's address space
Your program executes as a sequence of machine instructions on the CPU as a "process". The kernel oversees the execution of all processes, including process creation and destruction, CPU scheduling, and memory allocation and management.
As part of its memory management function, the kernel abstracts the CPU’s real, physical memory addresses, i.e. addresses in main memory (RAM), into a “virtual” address space the process interacts with, starting at 0x0000000000000000 and going to 0xFFFFFFFFFFFFFFFF for a 64-bit address space.
When a process accesses virtual address 0x00007ffdf3c8a000, for example, it is translated by the CPU’s memory management unit (MMU) into a physical memory address, e.g. 0x0000000123abf000.
If a virtual address is not “mapped”, i.e. backed by a real physical address, or violates permissions, the CPU raises a hardware exception called a Page Fault. The kernel then decides whether the fault is recoverable. If not, it sends a SIGSEGV (signal 11) to the process and terminates execution. This is a SEGV.
Many page faults are recoverable
Not all page faults are deadly. In fact, during normal operation of a program, most page faults are recoverable. We discuss the most common recoverable page faults below.
First off, a "page" is a fixed-size chunk of contiguous virtual memory, normally 4 KB on most systems. On Linux, you can view your page size via
$ getconf PAGE_SIZE
4096
The "Page Fault" gets its name because modern MMUs work at the granularity of pages, since managing billions of individual bytes would be impractical. A "fault", unlike an "abort" (unrecoverable error) or a "trap" (intentional exception), can be handled by the operating system and is (sometimes) correctable.
Recoverable Page Fault #1: Demand Paging
char *ptr = new char[1'000'000]; // Ask the kernel for 1MB.
Modern kernels use lazy allocation: the kernel doesn't actually create any pages yet, but instead just reserves some virtual address space and hands back an address range like 0x7f8a4b200000-0x7f8a4b2f5000 to the user.
ptr[0] = 0; // PAGE FAULT -> kernel allocates physical page.
When this memory is actually accessed by the process, there is no page backing it. This situation is recovered by these general steps:
- The CPU's MMU attempts to translate the virtual address of ptr[0], i.e. 0x7f8a4b200000, into a physical address
- To do this, the MMU walks the process's page table (more on this below) and discovers there is no entry for this virtual address
- The CPU raises a page fault
- The CPU saves current register and stack state and switches to Kernel Mode.
- The CPU jumps to the page fault handler, do_page_fault(), via the IDT (Interrupt Descriptor Table).
- The kernel then evaluates and resolves the fault by allocating a new page in main memory.
- The CPU jumps back into User Mode and retries the instruction
ptr[0] = 0
This time, the page is backed: the MMU finds an entry in the page table for this virtual address and the write succeeds.
Fun fact: Linux can "overcommit" memory: it may allow allocations even when total requested virtual memory exceeds RAM + swap, betting that not all pages will be touched.
vm.overcommit_memory controls this policy (0 = heuristic default, 1 = always overcommit, 2 = strict limit). Practical consequence: malloc() can succeed now, but under pressure a later page fault can still trigger out-of-memory handling and process termination.
Recoverable Page Fault #2: Copy-on-Write
Another common example: fork().
int *data = malloc(1000);
data[0] = 42;
pid_t pid = fork();
data[1] = 99; // Page fault!
Copy-on-write is an optimization that saves time and space. When a process forks, the kernel does not copy the process's entire address space to the child. Instead, the child shares its parent's physical pages, which are marked read-only in the page table. If either process attempts to write to a page, the CPU raises a page fault; the kernel allocates a new page, copies the data over, and updates the writing process's page table to point to its new private copy.
This is easier to understand with an example. After fork(), before any writes, the two processes' page tables look like this:
Process A Page Table Process B Page Table
Virtual Physical Virtual Physical
---------- -------- ---------- --------
0x7f...000 ->0x12345000 (RO) 0x7f...000 ->0x12345000 (RO) <- same physical page
0x7f...001 ->0x12346000 (RO) 0x7f...001 ->0x12346000 (RO) <- same physical page
0x7f...002 ->0x12347000 (RO) 0x7f...002 ->0x12347000 (RO) <- same physical page
0x7f...003 ->0x12348000 (RO) 0x7f...003 ->0x12348000 (RO) <- same physical page
0x7f...004 ->0x12349000 (RO) 0x7f...004 ->0x12349000 (RO) <- same physical page
After Process A writes to data[1] (which is on page 0):
Process A Page Table Process B Page Table
Virtual Physical Virtual Physical
---------- -------- ---------- --------
0x7f...000 ->0xABCDE000 (RW) 0x7f...000 ->0x12345000 (RO) <- DIFFERENT! A got a private copy
0x7f...001 ->0x12346000 (RO) 0x7f...001 ->0x12346000 (RO) <- still shared
0x7f...002 ->0x12347000 (RO) 0x7f...002 ->0x12347000 (RO) <- still shared
0x7f...003 ->0x12348000 (RO) 0x7f...003 ->0x12348000 (RO) <- still shared
0x7f...004 ->0x12349000 (RO) 0x7f...004 ->0x12349000 (RO) <- still shared
Recoverable Page Fault #3: Requested page is not in main memory, aka a major fault
Your process may request a valid page that is no longer in main memory, but has been swapped to disk by the kernel. First, how is this even possible and why does it happen?
RAM on all systems is limited, so the kernel evicts least-recently-used (LRU) pages from main memory to disk when free memory falls below certain "watermark" values.
$ cat /proc/sys/vm/min_free_kbytes
67584

The default on my machine is 66 MB free system-wide. "vm" is Virtual Memory, the kernel subsystem that manages all memory: RAM, swap, page cache, process address spaces, etc. min_free_kbytes is the minimum amount of free memory (in KB) the kernel tries to keep available across the entire system.

When the kernel evicts a page, it chooses either one from the page cache (pages already backed by files on disk) or an anonymous page - process memory like the stack and heap, which is not backed by disk. On Linux, you can tune the kernel's balance between these via
/proc/sys/vm/swappiness, which defaults to 60 and ranges from 0-200. This value is an expression of the relative cost between dropping from the page cache and swapping anonymous pages to disk when the kernel is making space in main memory:
- swappiness = 0: strongly prefer dropping from the page cache
- swappiness = 100: treat both types roughly equally
- swappiness = 200: show some preference for swapping anonymous pages

By default (swappiness = 60), the kernel prefers dropping from the page cache, but the right value is both program and hardware specific. For workloads that access the file cache frequently and where process memory is less active, swappiness should be higher. Likewise, for processes that keep large working sets in memory, swappiness should be closer to 0. Imagine a PostgreSQL instance with 32GB of shared buffers doing frequent index scans: swapping out the database's heap/stack would cause severe latency spikes. Hardware also matters: evicting a page from the page cache only requires marking the physical page free (assuming it's clean), while swapping process memory requires writing it to disk and reading it back later. HDD (Hard Disk Drive) systems have higher (~10ms) I/O times (prefer lower swappiness) while SSD (Solid State Drive) systems have faster (~0.1ms) I/O times (can tolerate higher swappiness).

Swapping allows processes to utilize more memory than RAM. Of course, disk I/O is very, very slow, so if a process is actively addressing large amounts of memory, no amount of swapping will make it execute in a reasonable amount of time.
Check your memory / swap usage:
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           124Gi        32Gi       3.8Gi       391Mi        90Gi        92Gi
Swap:          2.0Gi       173Mi       1.8Gi

My machine has 124Gi total installed physical RAM. 32Gi is currently allocated to running processes, 3.8Gi is completely untouched, 391Mi (shared) is used for tmpfs, and 90Gi is used for the page cache (recently accessed disk files).

92Gi is "available": this is an estimate of how much RAM is ready for new applications. It includes free plus the portion of buff/cache that can be instantly reclaimed if a process needs it.

Under Swap, we see 2.0Gi of swap space is available on disk while 173Mi of it is currently being used, leaving 1.8Gi left. More on swap management: https://www.kernel.org/doc/gorman/html/understand/understand014.html
Back to our recoverable page fault. Let's say your program does the following:
char *big_array = malloc(1000000);
// time passes
The kernel swaps the page backing big_array[i] to disk and updates the PTE (page table entry):
- Sets "present" bit to 0
- Removes the physical page address
- Adds a "swap entry", which contains the location on disk of the page
// The page backing big_array[i] has been swapped out to disk,
// so this access generates a page fault
big_array[i] = 0;
- CPU raises a page fault
- Kernel recognizes it must read this page in from disk and finds a free page in RAM (evicting another page if need be)
- Kernel reads the page from disk, issuing a disk I/O read from the swap file
  - This is slow (5-20ms), so the process sleeps while waiting for the I/O via io_schedule() (which calls schedule())
- Kernel updates the page table
- Control flow returns from the kernel page fault handler to the running process
Recoverable Page Fault #4: File-backed / memory-mapped files
int fd = open("bigfile.dat", O_RDONLY);
char *mapped = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, fd, 0);
char c = mapped[0]; // Page fault - kernel loads this file chunk into memory
mmap() gives your process a handle to a file (or device) by mapping it into the process's virtual memory so it can interact with the file via pointer operations instead of slow and expensive read() / write() system calls.
This involves lazy loading. Only when the process actually accesses a page does it get loaded from disk, as shown above.
mmap() is how the kernel makes shared libraries (.so files) accessible to a process. When the kernel loads your program (execve syscall), it calls mmap() on executable segments, e.g. libc. We can view these mmap'd files with

$ sudo cat /proc/self/maps
# simplified output
7fa809a28000-7fa809b9d000 r-xp 00028000 fd:01 5938 /usr/lib64/libc.so.6

The libc executable is mapped to address 0x7fa809a28000 in this process's virtual memory, has read / execute / private (no write) perms, and some other metadata.
Unrecoverable
We've seen the recoverable page-fault cases. When fault handling cannot recover, the kernel queues the SIGSEGV signal to the process.
If user space installed a handler (sa_handler != SIG_DFL), get_signal() dispatches that handler. Otherwise it may call vfs_coredump(), which enters do_coredump(), before exiting via do_group_exit().