2025-5-31 Linux Kernel Internals Process

Prelude

This is my attempt to go through the kernel and explain as much as possible of its inner workings. It is (for now) heavily inspired by Understanding the Linux Kernel, 3rd Edition, but I will keep my content up to date.

Keep in mind that the book was written when the best consumer hardware you could get your hands on was a single-core CPU with brand-new 64-bit support (;

We will be focusing on X86_64 for simplicity.

Introduction

Basic concepts

The kernel has 2 Main jobs:

  • Interact with the hardware in order to make it work.
  • Provide an execution environment to the applications that run on the computer system (the so-called user programs).

From there, we can get more precise:

  • Abstract low-level details about the hardware and make them accessible to the processes.
  • Manage and grant system resources.
  • Be the process manager.
  • Give the user a way to manage their data through files.

How is that enforced? This is where 2 key concepts come into play.

Execution modes

To enforce this, the CPU has a feature called execution modes (protection rings). The kernel runs in Kernel mode (Ring 0) while any other user program will run in User mode (Ring 3). Under this system, a process in User mode cannot access kernel memory while a process in Kernel mode can access everything.

PS: There are also Rings 1 and 2 inside the CPU, which are almost never used and were nearly removed by Intel under its failed X86S initiative. [1]

Some will even say that there is a Ring -1 (Hypervisor), a Ring -2 (X86's System Management Mode) and a Ring -3 (The Firmware itself)

For 99% of cases, only User mode and Kernel mode matter.

Users and Groups

Users and groups are used for file permissions inside the system. Each process is identified by a User ID (UID) and a Group ID (GID), and it can only interact with files in the ways allowed by the permissions granted to its User/Group.

Unix-Like systems also have a Root or SuperUser, a user that is exempt from the permission checks and can therefore access almost any resource on the system.

  • Make a point here about how that is not totally true and that Capabilities can be tweaked, but that is beyond this vid.

Even if they may seem similar, Root and Kernel mode have nothing to do with one another; Root runs in User mode just like any other user.


Short overview of what we just learned (Execution modes and Users/Groups)


Processes

A process can be defined either as “an instance of a program in execution” or as the “execution context” of a running program. At its core, it is nothing more than a handful of register values and a list of instructions to be run from memory.

It is the job of the Kernel to share the CPU between different processes. To do that, the Kernel uses a Scheduler that decides when to preempt a running process (remove it from the CPU) in favor of another. This is mostly done by giving each process a time slice and preempting it when the slice expires.

Unix-Like systems subscribe to the process/kernel model. Each process runs as if it were the only process on the machine, with exclusive access to the operating system's services. The process accesses those services using a system call (a request to the kernel). As a result of a system call, the hardware switches into Kernel mode and runs the related kernel routine. Once the request has fully executed, the hardware returns to User mode and the result is handed back to the process. GOOD TIME FOR A VENDING MACHINE AND CANADA DRY ANALOGY. WINK WINK

Syscalls can be used for a bunch of things, from managing files and processes to requesting memory or getting the time.

The Process/Kernel Model

We know that our CPU can either run in User Mode or Kernel Mode, and that when a process requests a kernel service (with a syscall) it will switch to Kernel Mode temporarily to fulfil the request.

Let's keep in mind that a Process is a dynamic entity that has a limited life span.

It is the Kernel's job to Create, Eliminate, and Synchronize processes using kernel routines.

The kernel is NOT a process but it IS a process manager.

Small tangent:

Besides user processes, Unix systems also include a few privileged processes called Kernel Threads with the following characteristics:

  • They run in Kernel Mode in the kernel address space.
  • They do not interact with users, and thus do not require terminal devices.

Let's talk about processes again:

Syscalls are not the only way to reach kernel mode while running a Process:

  • The CPU executing the process signals an exception, which is an unusual condition such as an invalid instruction. The kernel handles the exception on behalf of the process that caused it.
  • A peripheral device issues an interrupt signal to the CPU to notify it of an event such as a request for attention, a status change, or the completion of an I/O operation. Each interrupt signal is dealt with by a kernel program called an interrupt handler. Because peripheral devices operate asynchronously with respect to the CPU, interrupts occur at unpredictable times.
  • A Kernel Thread is executed; as stated above, it runs in Kernel Mode.

Process Implementation

To be able to manage a process, the kernel will need some information about the current state of the process.

That information is stored in a Process Descriptor (struct task_struct).

When the kernel stops the execution of a process, it saves the contents of several processor registers in the Process Descriptor. Here are some examples:

  • The program counter (PC) and stack pointer (SP) registers
  • The general purpose registers
  • The floating point registers
  • The processor control registers (Processor Status Word) containing information about the CPU state
  • The memory management registers used to keep track of the RAM accessed by the process

When the kernel wants to resume the execution of the process, it loads all those registers back onto the processor.

You can think of the Process Descriptor as an image of the last state of our process.

A process that is not currently being executed is waiting, most likely in a queue, until a specific event occurs.

Process Address Space

Each process runs in its private address space. A process running in User Mode refers to private stack, data, and code areas. When running in Kernel Mode, the process addresses the kernel data and code areas and uses another private stack.

While it appears to each process that it has access to a private address space, there are times when part of the address space is shared among processes. In some cases, this sharing is explicitly requested by processes; in others, it is done automatically by the kernel to reduce memory usage.

If the same program, say an editor, is needed simultaneously by several users, the program is loaded into memory only once, and its instructions can be shared by all of the users who need it. Its data, of course, must not be shared, because each user will have separate data. This kind of shared address space is done automatically by the kernel to save memory.

Processes also can share parts of their address space as a kind of interprocess communication, using the “shared memory” technique introduced in System V and supported by Linux.

Finally, Linux supports the mmap( ) system call, which allows part of a file or the information stored on a block device to be mapped into a part of a process address space. Memory mapping can provide an alternative to normal reads and writes for transferring data. If the same file is shared by several processes, its memory mapping is included in the address space of each of the processes that share it.

Signals and Interprocess Communication

Process think User Mode

Unix signals provide a mechanism for notifying processes of system events. Each event has its own signal number, which is usually referred to by a symbolic constant such as SIGTERM. There are two kinds of system events:

Asynchronous notifications (Triggered from the outside)

For instance, a user can send the interrupt signal SIGINT to a foreground process by pressing the interrupt keycode (usually Ctrl-C) at the terminal.

Synchronous notifications (Triggered from the inside)

For instance, the kernel sends the signal SIGSEGV to a process when it accesses a memory location at an invalid address. (Memory access violation)

Linux defines 31 standard signals, 2 of which (SIGUSR1 and SIGUSR2) are user-definable and may be used as a primitive mechanism for communication and synchronization among processes in User Mode. [4]

In general, a process may react to a signal delivery in two possible ways:

  • Ignore the signal.
  • Asynchronously execute a specified procedure (the signal handler).

If the process does not specify one of these alternatives, the kernel performs a default action that depends on the signal number. The five possible default actions are:

  • Terminate the process.
  • Write the execution context and the contents of the address space in a file (core dump) and terminate the process.
  • Ignore the signal.
  • Suspend the process.
  • Resume the process’s execution, if it was stopped.

Kernel signal handling is rather elaborate, because the POSIX semantics allows processes to temporarily block signals. Moreover, the SIGKILL and SIGSTOP signals cannot be directly handled by the process or ignored.

Ever since AT&T's Unix System V introduced it, Unix kernels have supported System V IPC (InterProcess Communication).

It consists of:

  • Shared memory shmget( )
  • Message queues msgget( )
  • Semaphores semget( )

You must use syscalls to acquire these resources. IPC resources are persistent: they must be explicitly deallocated by their owner (or the superuser) to be removed.

Shared memory provides the fastest way for processes to exchange and share data. A process starts by issuing a shmget( ) system call to create a new shared memory region of a specific size. After obtaining the IPC resource identifier, the process invokes the shmat( ) system call, which returns the starting address of the new region within the process address space. When the process wants to detach the shared memory from its address space, it invokes the shmdt( ) system call.

The implementation of shared memory depends on how the kernel implements process address spaces.

Message queues allow processes to exchange messages by using the msgsnd( ) and msgrcv( ) system calls, which insert a message into a specific message queue and extract a message from it, respectively. (Not covered here: the separate message queue mechanism defined by the POSIX standard (IEEE Std 1003.1-2001).)

Semaphores are synchronization primitives; we will discuss them in a future episode.

Process Management

Let's start by talking about 3 syscalls:

exec()-like

Will load a new program into the process and provide a new address space.

_exit()

Will terminate the process.[5]

fork()

Will create a new child process. The process that called fork() becomes the parent. They can find each other through a data structure that gives the parent access to all of its children and each child access to its parent.

The child will have a "copy" of the data and the code of the parent process. (The kernel relies on the hardware paging unit to implement Copy on Write, deferring page duplication for as long as possible.)

When a child terminates (for example by calling _exit()), the kernel sends the parent process a SIGCHLD signal (ignored by default).

Zombie processes

When a process terminates, it becomes a Zombie until its parent releases it.

A parent process does that with wait3() or wait4(), which let it wait until a child terminates, extract data about the Zombie process, and release its remaining resources.

What if the Parent gets terminated, what happens to the child?

The child process keeps on running, but it is now an orphan, so by default the kernel hands it over to PID 1, the init process (systemd on most distributions). The terminated Parent is itself a Zombie until its own parent releases it with wait4(). The init process issues wait4() regularly to take care of any zombie process directly under it, so the children it adopted are released once they terminate.

This is how a process started by a shell can still run even when the shell gets terminated.

Take note that wait3() and wait4() are considered nonstandard. For new programs it is recommended to use waitpid() or waitid(), which do the same job and take an argument identifying which child (or children) to wait for.

Sessions and Process groups

Processes can also belong to groups. Process groups are used to send a signal to multiple processes at the same time.

To identify them, the Process Descriptor contains a Process Group ID (PGID) that is equal to the Process ID of the first process of the group, called the Process Group Leader. [6]

A shell that supports process groups (job control), like Bash, puts all the processes from this command in the same group:

ls | sort | more

When you access a Unix system, you are prompted to log into your account. The system then executes a Shell (like Bash) for you to interact with it.

That Shell will be a Session Leader and will have a Session ID equal to its Process ID.

Multiple Process Groups can be part of 1 Session, but only one of them can be in the foreground and have access to the terminal.

The other Process Groups are in the background, and if they attempt to read from or write to the terminal they receive a signal to let them know (SIGTTIN for reads; SIGTTOU for writes, when the terminal's TOSTOP flag is set).

A new process inherits the Process Group ID and the Session ID of its parent.

Sources: