What is a SEGV
What happens under the hood, with a Linux / x86 focus
# test.c
#include <stdio.h>

int main(void) {
    int *i = NULL;
    printf("%d\n", *i);
    return 0;
}
$ gcc test.c
$ ./a.out
[1] 15654 segmentation fault ./a.out
SEGV stands for Segmentation Violation, also known as a segmentation fault or “segfault”. It means your program is attempting to access a region, or “segment”, of memory it is not allowed to. In practice, this is one of the following:
- Dereferencing a NULL pointer
- Dereferencing an uninitialized pointer
- Writing to read-only memory
- Reading or writing outside the valid boundaries of an array or allocated buffer, such as accessing the 15th element of an 11-element array
- Reading or writing, more generally, to any region of memory (stack, heap, etc.) not mapped into the process's address space
Your program executes as a sequence of machine instructions on the CPU as a "process". The kernel oversees the execution of all processes, including process creation and destruction, CPU scheduling, and memory allocation and management.
As part of its memory management function, the kernel abstracts the CPU’s real, physical memory addresses, i.e. addresses in main memory (RAM), into a “virtual” address space the process interacts with, starting at 0x0000000000000000 and going to 0xFFFFFFFFFFFFFFFF for a 64-bit address space.
When a process accesses virtual address 0x00007ffdf3c8a000, for example, it is translated by the CPU’s memory management unit (MMU) into a physical memory address, e.g. 0x0000000123abf000.
If a virtual address is not “mapped”, i.e. backed by a real physical address, or violates permissions, the CPU raises a hardware exception called a Page Fault. The kernel then decides whether the fault is recoverable. If not, it sends a SIGSEGV (signal 11) to the process and terminates execution. This is a SEGV.
Many page faults are recoverable
Not all page faults are deadly. In fact, during normal operation of a program, most page faults are recoverable. We discuss the most common recoverable page faults below.
First off, a "page" is a fixed-size chunk of contiguous virtual memory, normally 4 KB on most systems. On Linux, you can view your page size via
$ getconf PAGE_SIZE
4096
The "Page Fault" gets its name because modern MMUs work at the granularity of pages, since managing billions of individual bytes would be impractical. A "fault", unlike an "abort" (unrecoverable error) or a "trap" (intentional exception), can be handled by the operating system and is (sometimes) correctable.
Recoverable Page Fault #1: Demand Paging
char *ptr = new char[1'000'000]; // Ask the kernel for 1MB.
Modern kernels use lazy allocation: the kernel doesn't actually create any pages yet, but instead just reserves some virtual address space and hands back an address range like 0x7f8a4b200000-0x7f8a4b2f5000 to the user.
ptr[0] = 0; // PAGE FAULT -> kernel allocates physical page.
When this memory is actually accessed by the process, there is no page backing it. This situation is recovered by these general steps:
- The CPU's MMU attempts to translate the virtual address of ptr[0], i.e. 0x7f8a4b200000, into a physical address
- To do this, the MMU walks the process's page table (more on this below) and discovers there is no entry for this virtual address
- The CPU raises a page fault
- The CPU saves current register and stack state and switches to Kernel Mode.
- The CPU jumps to the page fault handler, do_page_fault(), via the IDT (Interrupt Descriptor Table).
- The kernel then evaluates and resolves the fault by allocating a new page in main memory.
- The CPU jumps back into User Mode and retries the instruction
ptr[0] = 0
This time, the page is backed: the MMU finds an entry in the page table for this virtual address and the write succeeds.
Fun fact: Linux can "overcommit" memory: it may allow allocations even when total requested virtual memory exceeds RAM + swap, betting that not all pages will be touched.
vm.overcommit_memory controls this policy (0 = heuristic default, 1 = always overcommit, 2 = strict limit). Practical consequence: malloc() can succeed now, but under pressure a later page fault can still trigger out-of-memory handling and process termination.
Recoverable Page Fault #2: Copy-on-Write
Another common example: fork().
int *data = malloc(1000);
data[0] = 42;
pid_t pid = fork();
data[1] = 99; // Page fault!
Copy-on-write is an optimization that saves time and space. When a process forks, the kernel does not copy the process's entire address space to the child. Instead, the child shares its parent's physical pages, which are marked read-only in the page table. If either process attempts to write to a page, the CPU raises a page fault; the kernel allocates a new page, copies the data over, and updates the writing process's page table to point to its new private copy.
This is easier to understand with an example. After fork(), before any writes, the two processes' page tables look like this:
Process A Page Table Process B Page Table
Virtual Physical Virtual Physical
---------- -------- ---------- --------
0x7f...000 ->0x12345000 (RO) 0x7f...000 ->0x12345000 (RO) <- same physical page
0x7f...001 ->0x12346000 (RO) 0x7f...001 ->0x12346000 (RO) <- same physical page
0x7f...002 ->0x12347000 (RO) 0x7f...002 ->0x12347000 (RO) <- same physical page
0x7f...003 ->0x12348000 (RO) 0x7f...003 ->0x12348000 (RO) <- same physical page
0x7f...004 ->0x12349000 (RO) 0x7f...004 ->0x12349000 (RO) <- same physical page
After Process A writes to data[1] (which is on page 0):
Process A Page Table Process B Page Table
Virtual Physical Virtual Physical
---------- -------- ---------- --------
0x7f...000 ->0xABCDE000 (RW) 0x7f...000 ->0x12345000 (RO) <- DIFFERENT! A got a private copy
0x7f...001 ->0x12346000 (RO) 0x7f...001 ->0x12346000 (RO) <- still shared
0x7f...002 ->0x12347000 (RO) 0x7f...002 ->0x12347000 (RO) <- still shared
0x7f...003 ->0x12348000 (RO) 0x7f...003 ->0x12348000 (RO) <- still shared
0x7f...004 ->0x12349000 (RO) 0x7f...004 ->0x12349000 (RO) <- still shared
Recoverable Page Fault #3: Requested page is not in main memory, aka a major fault
Your process may request a valid page that is no longer in main memory, but has been swapped to disk by the kernel. First, how is this even possible and why does it happen?
RAM on all systems is limited, so the kernel evicts least-recently-used (LRU) pages from main memory to disk when free memory falls below certain "watermark" values.
$ cat /proc/sys/vm/min_free_kbytes
67584

The default on my machine is 66 MB free system-wide. "vm" is Virtual Memory, the kernel subsystem that manages all memory: RAM, swap, page cache, process address spaces, etc. min_free_kbytes is the minimum amount of free memory (in KB) the kernel tries to keep available across the entire system.

When the kernel evicts a page, it chooses either one from the page cache (pages already backed by files on disk) or an anonymous page - process memory like the stack and heap, which is not backed by disk. On Linux, you can tune the kernel's balance between these via
/proc/sys/vm/swappiness, which defaults to 60 and ranges from 0-200. This value is an expression of the relative cost between dropping from the page cache and swapping anonymous pages to disk when the kernel is making space in main memory:
- swappiness = 0: strongly prefer dropping from the page cache
- swappiness = 100: treat both types roughly equally
- swappiness = 200: show some preference for swapping anonymous pages

By default (swappiness = 60), the kernel prefers dropping from the page cache, but the right value is both program and hardware specific. For workloads that access the file cache frequently and where process memory is less active, swappiness should be higher. Likewise, for processes that keep large working sets in memory, swappiness should be closer to 0. Imagine a PostgreSQL instance with 32GB of shared buffers doing frequent index scans: swapping out the database's heap/stack would cause severe latency spikes. Hardware also matters: evicting a page from the page cache only requires marking the physical page free (assuming it's clean), while swapping process memory requires writing it to disk and reading it back later. HDD (Hard Disk Drive) systems have higher (~10ms) I/O times (prefer lower swappiness) while SSD (Solid State Drive) systems have faster (~0.1ms) I/O times (can tolerate higher swappiness).

Swapping allows processes to utilize more memory than RAM. Of course, disk I/O is very, very slow, so if a process is actively addressing large amounts of memory, no amount of swapping will make it execute in a reasonable amount of time.
Check your memory / swap usage:
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           124Gi        32Gi       3.8Gi       391Mi        90Gi        92Gi
Swap:          2.0Gi       173Mi       1.8Gi

My machine has 124Gi total installed physical RAM. 32Gi is currently allocated to running processes, 3.8Gi is completely untouched, 391Mi (shared) is used for tmpfs, and 90Gi is used for the page cache (recently accessed disk files).

92Gi is "available": this is an estimate of how much RAM is ready for new applications. It includes free plus the portion of buff/cache that can be instantly reclaimed if a process needs it.

Under Swap, we see 2.0Gi of swap space is available on disk while 173Mi of it is currently being used, leaving 1.8Gi left. More on swap management: https://www.kernel.org/doc/gorman/html/understand/understand014.html
Back to our recoverable page fault. Let's say your program does the following:
char *big_array = malloc(1000000);
// time passes
The kernel swaps the page backing big_array[i] to disk and updates the PTE (page table entry):
- Sets "present" bit to 0
- Removes the physical page address
- Adds a "swap entry", which contains the location on disk of the page
// The page backing big_array[i] has been swapped out to disk,
// so this access generates a page fault
big_array[i] = 0;
- CPU raises a page fault
- Kernel recognizes it must read this page in from disk and finds a free page in RAM (evicting another page if need be)
- Kernel reads the page from disk, issuing a disk I/O read from the swap file
  - This is slow (5-20ms), so the process sleeps while waiting for the I/O via io_schedule() (which calls schedule())
- Kernel updates the page table
- Control flow returns from the kernel page fault handler to the running process
Recoverable Page Fault #4: File-backed / memory-mapped files
int fd = open("bigfile.dat", O_RDONLY);
char *mapped = mmap(NULL, filesize, PROT_READ, MAP_PRIVATE, fd, 0);
char c = mapped[0]; // Page fault - kernel loads this file chunk into memory
mmap() gives your process a handle to a file (or device) by mapping it into the process's virtual memory so it can interact with the file via pointer operations instead of slow and expensive read() / write() system calls.
This involves lazy loading. Only when the process actually accesses a page does it get loaded from disk, as shown above.
mmap() is how the kernel makes shared libraries (.so files) accessible to a process. When the kernel loads your program (execve syscall), it calls mmap() on executable segments, e.g. libc. We can view these mmap'd files with

$ sudo cat /proc/self/maps
# simplified output
7fa809a28000-7fa809b9d000 r-xp 00028000 fd:01 5938 /usr/lib64/libc.so.6

The libc executable is mapped to address 0x7fa809a28000 in this process's virtual memory, has read / execute / private (no write) perms, and some other metadata.
Unrecoverable
We've seen the recoverable page-fault cases. When fault handling cannot recover, the kernel queues the SIGSEGV signal to the process.
If user space installed a handler (sa_handler != SIG_DFL), get_signal() dispatches that handler. Otherwise it may call vfs_coredump(), which enters do_coredump(), before exiting via do_group_exit().