This is my attempt to go through the kernel and explain as much as possible of its inner workings. It is (for now) heavily inspired by Understanding the Linux Kernel, 3rd Edition, but I will still keep my content up to date.
Keep in mind that the book was written when the best consumer hardware you could get your hands on was a single-core CPU with brand new 64 bits (;
We will be focusing on X86_64 for simplicity.
Linux is a True Unix Kernel.
Linux joins such well-known commercial Unix operating systems as System V Release 4 (SVR4) (pronounced "System Five"); the CSRG BSD releases from the University of California at Berkeley; Digital UNIX from Digital Equipment Corporation (now Hewlett-Packard); AIX from IBM; HP-UX from Hewlett-Packard; Solaris from Sun Microsystems (now Oracle); and Mac OS from Apple Computer, Inc. Besides Linux, a few other open source Unix-like kernels exist, such as FreeBSD, NetBSD, and OpenBSD.
Also, most commercial Unix kernels were bundled as complete operating systems, unlike Linux.
Commercial Unix kernels often introduce new features to gain a larger slice of the market, but these features are not necessarily useful, stable, or productive. As a matter of fact, modern Unix kernels tend to be quite bloated. By contrast, Linux, together with the other open source operating systems, tends to avoid this kind of bloat.
Back in the day there was a thriving ecosystem of Unix kernels and OSes, but they have mostly been rendered obsolete by the existence of Linux.
Even with their differences, all Unix systems tend to respect IEEE's Portable Operating System Interface standard 1003.1 (POSIX), which makes software written for one Unix easily portable to any other.
Those standards specify only an application programming interface (API) and do not restrict the implementation.
The kernel has 2 jobs:
In contrast to OSes like MS-DOS that let you play directly with the hardware, Unix kernels (including Linux) abstract away the low-level details of the hardware. When a program needs resources from the system, it has to ask the kernel. The kernel evaluates the request and, if it chooses to grant the resource, interacts with the proper hardware components on behalf of the user program.
To enforce this, the CPU has a feature called execution modes (protection rings). The kernel runs in Kernel mode (Ring 0) while any other user program will run in User mode (Ring 3). Under this system, a process in User mode cannot access kernel memory while a process in Kernel mode can access everything.
PS: There are also a Ring 1 and a Ring 2 inside the CPU, which are essentially never used and were almost removed by Intel under their failed X86S initiative [1]
Some will even say that there is a Ring -1 (the hypervisor), a Ring -2 (x86's System Management Mode) and a Ring -3 (the firmware itself)
For 99% of cases, only User mode and Kernel mode matter.
Linux is also a MultiUser system, which means that it needs to handle multiple user programs concurrently without the programs themselves caring about or knowing of the others.
A MultiUser system also needs to have:
Again, User mode and Kernel mode come into play to make this possible.
PS: One example of a multiprocessing OS that is not Multiuser would be Windows 98.
Users and groups are used for file permissions inside the system. Identified by a User ID (UID) and a Group ID (GID), a process can only interact with files whose permissions allow access to its user/group.
Unix-like systems also have a Root or SuperUser, a user that is exempt from the permission checks and can therefore access almost any resource on the system.
A process can be defined either as “an instance of a program in execution” or as the “execution context” of a running program. At its core, it is nothing more than a handful of register values and a list of instructions to be run from memory.
It is the job of the Kernel to split the CPU between the different processes. To do that, the Kernel uses a Scheduler that decides when to preempt (remove from the CPU) a running process in favor of another. This is mostly done by giving a process a time slice and removing it when that time runs out.
Unix-like systems subscribe to the process/kernel model. Each process runs as if it were the only process on the machine and had exclusive access to the kernel's services. The process accesses those services using a system call (a request to the kernel). As a result of a system call, the hardware switches into Kernel mode and runs the related kernel routine. Once the request is fully executed, the hardware switches back into User mode and the result is returned. GOOD TIME FOR A VENDING MACHINE AND CANADA DRY ANALOGY. WINK WINK
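To make this concrete, here is a minimal user-space sketch of a program asking the kernel for services. Both calls below trap into Kernel mode, run a kernel routine, and come back to User mode with a result; the raw syscall() form is only there to show that the libc wrappers are thin.

```c
/* A minimal sketch of a user program asking the kernel for services. */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>      /* write(), getpid(), syscall() */
#include <sys/syscall.h> /* SYS_getpid for the raw form */

int main(void)
{
    /* Libc wrapper: under the hood this traps into the write syscall. */
    write(1, "hello from user mode\n", 21);

    /* The same idea without the wrapper: invoke the syscall number directly. */
    long pid = syscall(SYS_getpid);
    printf("the kernel says my PID is %ld\n", pid);
    return 0;
}
```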
Time for us to talk about the age old fight: Monolithic Kernel VS Microkernel
Let's define both:
Microkernel:
Monolithic Kernel:
Linux, FreeBSD, OpenBSD, and NetBSD are all Monolithic Kernels.
GNU Hurd is a Microkernel.
Mac OS uses a Hybrid Kernel (in between).
There once was a time when it was theorized that Microkernels were superior: they were smaller, easier to maintain, and less prone to bugs.
One fact broke this belief: switching between User and Kernel Mode adds overhead to a simple call. Let's take a read call: under a Monolithic Kernel you only switch twice, once at the start of the syscall and once at the end. In a Microkernel, even if we assume everything is perfectly executed, this is already bumped up to 4 switches, since the request also has to travel to the user-space server handling it and back.
This is a big reason why Mac OS, which started out with a pure Microkernel (Mach), ended up merging some features back into the kernel to reduce the overhead.
The module feature also gives Linux most of the benefits of the Microkernel without the drawbacks. It is possible in Linux to load any part of the kernel as a module when needed (using object files). Because the code is linked at runtime, it doesn't incur the same performance overhead (except for the initial loading, of course).
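As a sketch of what that looks like, here is the classic minimal module (the name hello and the messages are arbitrary). It assumes the usual out-of-tree module build setup (a Makefile with obj-m pointing at the kernel headers); once built, the object is linked into the running kernel with insmod and removed with rmmod.

```c
/* hello.c - a minimal sketch of a loadable kernel module. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

static int __init hello_init(void)
{
    pr_info("hello: linked into the running kernel\n");
    return 0; /* 0 means the module loaded successfully */
}

static void __exit hello_exit(void)
{
    pr_info("hello: unlinked from the running kernel\n");
}

module_init(hello_init);
module_exit(hello_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Hello world module sketch");
```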
The kernel offers syscalls to access files but does not seek to interpret them.
From the user's point of view, files are organized in a tree structure, with each node being a directory and the leaves being the files.
Each Unix process has a Current Working Directory, which tells you the directory the process is currently using. To access a file, a process needs a Pathname, constructed from the nodes of the tree structure separated by slashes.
A Pathname starting with a "/" is deemed absolute, as it ignores the Current Working Directory and starts at the root of our tree.
A Pathname that doesn't start with a "/" is deemed relative, as it starts at the Current Working Directory.
"." and ".." can also be used:
"." means the Current Working Directory
".." the directory before the Current Working Directory
".." can also be chained to go higher and higher in our tree, ".." at the root level will leave you at the root level
There are 7 file types:
Unix makes a distinction between the content of a file and the information about the file.
When you access a file, you will only be accessing its content.
To access the information about the file, you will need to access the inode of the file.
There is 1 inode per file and the inode will contain:
There are three types of access rights, read, write, and execute, for each of three classes (owner, group, other/everyone). In reality, the set of access rights associated with a file consists of nine different binary flags. Three additional flags, called suid (Set User ID), sgid (Set Group ID), and sticky, complete the file mode. These flags have the following meanings when applied to executable files:
suid
A process executing a file normally keeps the User ID (UID) of the process owner. However, if the executable file has the suid flag set, the process gets the UID of the file owner.
sgid
FOR FILES: A process executing a file normally keeps the group ID of the process. However, if the executable file has the sgid flag set, the process gets the group ID of the file.
FOR DIRECTORIES: files created under the directory get the directory's group as their default group.
sticky
FOR UNIX: An executable file with the sticky flag set corresponds to a request to the kernel to keep the program in memory after its execution terminates.
FOR LINUX: this isn't used on files anymore. It can be used on a directory to restrict file deletion so that only a file's owner (not the group) can delete it.
When a file is created by a process, it takes the UID and GID of the process, unless the sgid flag of the directory is set, in which case it takes the directory's GID.
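To see these flags in practice, here is a small sketch that reads a file's mode with the stat() syscall and picks apart the nine rwx bits plus suid/sgid/sticky. The default path /usr/bin/passwd is just a convenient example of a file that is usually suid.

```c
/* Read a file's mode with stat() and inspect its permission bits. */
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat st;
    const char *path = argc > 1 ? argv[1] : "/usr/bin/passwd";

    if (stat(path, &st) != 0) {
        perror("stat");
        return 1;
    }

    /* The low 12 bits: 9 rwx flags + suid/sgid/sticky. */
    printf("%s: mode %04o\n", path, (unsigned)(st.st_mode & 07777));
    printf("owner rwx: %c%c%c\n",
           st.st_mode & S_IRUSR ? 'r' : '-',
           st.st_mode & S_IWUSR ? 'w' : '-',
           st.st_mode & S_IXUSR ? 'x' : '-');
    if (st.st_mode & S_ISUID) printf("suid is set\n");
    if (st.st_mode & S_ISGID) printf("sgid is set\n");
    if (st.st_mode & S_ISVTX) printf("sticky is set\n");
    return 0;
}
```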
When a process accesses the contents of either a regular file or a directory, it is accessing some data stored in a hardware block device. Because a process in User Mode cannot directly interact with the low-level hardware components, each actual file operation must be performed in Kernel Mode. Therefore, the Unix operating system defines several system calls for file handling.
Here are these syscalls:
fd = open(path, flag, mode)
path
Path of the file to be opened.
flag
Specifies how the file must be opened (e.g., read, write, read/write, append). It can also specify whether a nonexistent file should be created.
mode
Specifies the access rights of a newly created file.
Opening a file returns a File Descriptor (fd). For each open file, the kernel keeps an offset field that denotes the current position in the file from which the next operation will take place (the so-called file pointer), along with the flag parameter passed to open(). File descriptors are arranged in a table (the File Descriptor Table), which is unique per process.
PS: By default, a process has 3 File Descriptors:
0 Standard input
1 Standard output
2 Standard error
The file descriptor will point to an open file object (struct file), which itself points to an inode [2]
One open file object can be shared by multiple file descriptors (even in different processes, for instance after a fork()), but each file descriptor belongs to a single process's table.
Regular Unix files can be addressed either sequentially or randomly, while device files and named pipes are usually accessed sequentially.
By default, sequential access is assumed: the read() and write() system calls always refer to the position of the current file pointer. To modify the value, a program must explicitly invoke the lseek() system call.
newoffset = lseek(fd, offset, whence);
fd
Indicates the file descriptor of the opened file
offset
Specifies a signed integer value that will be used for computing the new position of the file pointer
whence
Specifies whether the new position should be computed by adding the offset value to the start of the file (SEEK_SET), to the end of the file (SEEK_END), or to the current file pointer (SEEK_CUR)
nread = read(fd, buf, count);
fd
Indicates the file descriptor of the opened file
buf
Specifies the address of the buffer in the process’s address space to which the data will be transferred
count
Denotes the number of bytes to read
nread
Returns the number of bytes successfully read.
Write is used in the same way.
res = close(fd);
Returns 0 on success and -1 on error
When a process is terminated, all its files are automatically closed.
res = rename(oldpath, newpath);
res = unlink(pathname);
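Putting the calls above together, here is a minimal sketch that creates a file, writes to it, rewinds with lseek(), reads the data back, and removes the file with unlink(). The file name demo.txt is arbitrary.

```c
/* Sketch of the file syscalls above: open, write, lseek, read, close, unlink. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    char buf[64];
    const char *msg = "hello file\n";

    /* O_CREAT asks for the file to be created if it does not exist;
     * 0644 are the access rights of the newly created file. */
    int fd = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, msg, strlen(msg));            /* advances the file pointer        */
    lseek(fd, 0, SEEK_SET);                 /* move the file pointer back to 0  */

    ssize_t nread = read(fd, buf, sizeof(buf) - 1);
    if (nread < 0) { perror("read"); return 1; }
    buf[nread] = '\0';
    printf("read back %zd bytes: %s", nread, buf);

    close(fd);
    unlink("demo.txt");                     /* remove the file again            */
    return 0;
}
```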
We know that our CPU can either run in User Mode or Kernel Mode, and that when a process requests a kernel service (with a syscall) it will switch to Kernel Mode temporarily to fulfil the request.
Let's keep in mind that a Process is a dynamic entity that has a limited life span.
It is the Kernel's job to Create, Eliminate, and Synchronize processes using kernel routines.
The kernel is NOT a process but it IS a process manager.
Besides user processes, Unix systems also include a few privileged processes called Kernel Threads with the following characteristics:
Let's talk about processes again.
Syscalls are not the only way to reach kernel mode while running a Process:
To be able to manage a process, the kernel will need some information about the current state of the process.
That information is stored in a Process Descriptor (struct task_struct).
When the kernel stops the execution of a process, it saves the contents of several processor registers in the Process Descriptor; here are some examples:
When the kernel wants to resume the execution of the process, it loads all those registers back onto the processor.
A process not currently being executed will be waiting, most likely in a queue that will wake it up at a specific event.
Each process runs in its private address space. A process running in User Mode refers to private stack, data, and code areas. When running in Kernel Mode, the process addresses the kernel data and code areas and uses another private stack.
While it appears to each process that it has access to a private address space, there are times when part of the address space is shared among processes. In some cases, this sharing is explicitly requested by processes; in others, it is done automatically by the kernel to reduce memory usage.
If the same program, say an editor, is needed simultaneously by several users, the program is loaded into memory only once, and its instructions can be shared by all of the users who need it. Its data, of course, must not be shared, because each user will have separate data. This kind of shared address space is done automatically by the kernel to save memory.
Processes also can share parts of their address space as a kind of interprocess communication, using the “shared memory” technique introduced in System V and supported by Linux.
Finally, Linux supports the mmap() system call, which allows part of a file or the information stored on a block device to be mapped into a part of a process address space. Memory mapping can provide an alternative to normal reads and writes for transferring data. If the same file is shared by several processes, its memory mapping is included in the address space of each of the processes that share it.
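Here is a minimal sketch of mmap() in action: it maps /etc/hostname (an arbitrary, assumed non-empty example file) into the process address space and prints it without ever calling read().

```c
/* Sketch of mmap(): map a file into the address space and read it like memory. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);                         /* we need the file size to map it */

    /* The kernel inserts the file into our address space; no read() needed. */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    fwrite(data, 1, st.st_size, stdout);    /* file content, accessed as memory */

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```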
All Unix kernels are reentrant. This means that several processes may be executing in Kernel Mode at the same time. Of course, on uniprocessor systems, only one process can progress, but many can be blocked in Kernel Mode when waiting for the CPU or the completion of some I/O operation. For instance, after issuing a read to a disk on behalf of a process, the kernel lets the disk controller handle it and resumes executing other processes. An interrupt notifies the kernel when the device has satisfied the read, so the former process can resume the execution.
One way to provide reentrancy is to write functions so that they modify only local variables and do not alter global data structures. Such functions are called reentrant functions . But a reentrant kernel is not limited only to such reentrant functions. Instead, the kernel can include nonreentrant functions and use locking mechanisms to ensure that only one process can execute a nonreentrant function at a time.
If a hardware interrupt occurs, a reentrant kernel is able to suspend the current running process even if that process is in Kernel Mode. This capability is very important, because it improves the throughput of the device controllers that issue interrupts. Once a device has issued an interrupt, it waits until the CPU acknowledges it. If the kernel is able to answer quickly, the device controller will be able to perform other tasks while the CPU handles the interrupt.
Now let’s look at kernel reentrancy and its impact on the organization of the kernel.
First we need a definition.
A Kernel Control Path denotes the sequence of instructions (some code) executed by the kernel to handle a system call, an exception, or an interrupt.
One example of a Kernel Control Path would be an interrupt handler.
A Kernel Control Path can be interrupted or interleaved with others under some circumstances (a hardware interrupt or a process switch, for instance).
Implementing a reentrant kernel requires the use of synchronization.
For example, say a Kernel Control Path needs to read and then write a global variable in order to increment it. If it is preempted by another path in the middle of that read-write sequence, the global variable can end up corrupted.
This issue is called a Race Condition.
Most of the time, the solution to this is using Atomic Operations (which are single, non-interruptible operations).
If the data structure cannot be accessed in a single operation (for example a linked list), a simple Atomic Operation isn't enough. These sections of code are called Critical Regions.
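Since it is easier to run outside the kernel, here is a user-space sketch of the exact same race using two threads: a plain increment (read, add, write) loses updates when the threads interleave, while the atomic version does not.

```c
/* Two threads each increment a shared counter 1,000,000 times.
 * The plain int loses updates because "read, add, write" can be interleaved;
 * the atomic counter is bumped with a single non-interruptible operation. */
#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

static int plain_counter = 0;           /* racy */
static atomic_int atomic_counter = 0;   /* each increment is one atomic op */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        plain_counter++;                       /* read + add + write: can interleave */
        atomic_fetch_add(&atomic_counter, 1);  /* single atomic operation            */
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    printf("plain:  %d (usually less than 2000000)\n", plain_counter);
    printf("atomic: %d (always 2000000)\n", atomic_load(&atomic_counter));
    return 0;
}
```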
This can also apply to processes sharing data, but we will be focusing on how to synchronize Kernel Control Paths.
One solution would be to remove the possibility of being preempted in Kernel Mode.
A Kernel Control Path could even relinquish control voluntarily, but it would need to leave all data structures in a consistent state (not broken), and it would need to recheck any previously accessed value that could have changed.
If you wanted to implement this with more granularity, you could disable kernel preemption only in critical regions and reenable it when you leave them.
This would work on a single-core system but fails as soon as you add more cores: a Kernel Control Path running on another core could still interfere.
Another way would be to disable hardware interrupts while in a critical region, but this has the same issues as the previous solution on multicore systems.
Semaphores are effective on single core and multicore machines:
A semaphore is essentially a counter paired with a list of waiting processes and two atomic methods, down() and up():
down() decreases the counter by 1; if the counter becomes negative, the caller adds itself to the list of waiting processes and sleeps.
up() increases the counter by 1 and then wakes up a process from the waiting list (if one exists).
Each data structure in a critical region would need to be protected by a Semaphore.
For simplicity, a Mutex is like a Semaphore but:
In the kernel, Semaphores and Mutexes are called Sleeping locks[3]
Semaphores, while useful, can be slow: if your goal is simply to keep another core from using a resource that you will hold for a very short time, the upkeep of the waiting list can be a limitation.
The solution is a Spin Lock. It works like a semaphore but without the waiting list, which forces a process waiting on a locked Spin Lock to.... spin.
Note that Spin Locks are useless on a single-core machine, as spinning there would just be an infinite loop.
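As a rough kernel-code sketch (the names counter_lock, item_mutex, and struct item are purely illustrative, and this is not a complete module), here is how the two kinds of locks are typically used: a spin lock for a very short critical region, and a mutex (a sleeping lock) when the holder may block for longer.

```c
/* Kernel-code sketch: protecting a shared counter with a spinlock and a
 * shared list with a mutex (sleeping lock). */
#include <linux/spinlock.h>
#include <linux/mutex.h>
#include <linux/list.h>

static DEFINE_SPINLOCK(counter_lock);
static unsigned long counter;

static DEFINE_MUTEX(item_mutex);
static LIST_HEAD(item_list);

struct item {
    struct list_head node;
    int value;
};

void counter_bump(void)
{
    spin_lock(&counter_lock);   /* other CPUs trying to enter will spin here */
    counter++;
    spin_unlock(&counter_lock); /* keep this critical region as short as possible */
}

void item_add(struct item *it)
{
    mutex_lock(&item_mutex);    /* may put the caller to sleep until it is free */
    list_add_tail(&it->node, &item_list);
    mutex_unlock(&item_mutex);
}
```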
Deadlocks occur when 2 Kernel Control Paths each lock a resource needed by the other.
For example:
Kernel Control Path 1 has locked resource A and needs resource B,
while
Kernel Control Path 2 has locked resource B and needs resource A.
This blocks both Kernel Control Paths and prevents either from going further.
The solution to this is better design: the Linux kernel requests resources in a consistent order to sidestep the issue.
When we talk about processes here, think User Mode.
Unix signals provide a mechanism for notifying processes of system events. Each event has its own signal number, which is usually referred to by a symbolic constant such as SIGTERM. There are two kinds of system events:
Asynchronous notifications (Triggered from the outside)
For instance, a user can send the interrupt signal SIGINT to a foreground process by pressing the interrupt keycode (usually Ctrl-C) at the terminal.
Synchronous notifications (Triggered from the inside)
For instance, the kernel sends the signal SIGSEGV to a process when it accesses a memory location at an invalid address (a memory access violation).
Linux defines about 31 different signals, 2 of which are user-definable and may be used as a primitive mechanism for communication and synchronization among processes in User Mode. [4]
In general, a process may react to a signal delivery in two possible ways:
If the process does not specify one of these alternatives, the kernel performs a default action that depends on the signal number. The five possible default actions are:
Kernel signal handling is rather elaborate, because the POSIX semantics allow processes to temporarily block signals. Moreover, the SIGKILL and SIGSTOP signals cannot be directly handled by the process or ignored.
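Here is a small sketch of the two reactions described above: the program installs a handler for SIGINT (the Ctrl-C example) with sigaction() and explicitly ignores SIGTERM. Trying the same with SIGKILL would simply have no effect.

```c
/* Sketch: catch SIGINT with a handler, ignore SIGTERM. */
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int signo)
{
    (void)signo;
    got_sigint = 1;              /* only async-signal-safe work inside a handler */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);

    sa.sa_handler = on_sigint;
    sigaction(SIGINT, &sa, NULL);    /* react by catching the signal */

    sa.sa_handler = SIG_IGN;
    sigaction(SIGTERM, &sa, NULL);   /* react by ignoring this one   */

    printf("press Ctrl-C to deliver SIGINT...\n");
    while (!got_sigint)
        pause();                     /* sleep until a signal is delivered */

    printf("caught SIGINT, exiting cleanly\n");
    return 0;
}
```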
Since it was introduced by AT&T's Unix System V, Unix kernels have provided IPC (InterProcess Communication).
It is composed of:
shmget( )
msgget( )
semget( )
You must use syscalls to acquire them, and IPC resources are persistent: they must be deallocated by their owner (or the superuser) to be removed.
Shared memory provides the fastest way for processes to exchange and share data. A process starts by issuing a shmget() system call to create a new shared memory region of a specific size. After obtaining the IPC resource identifier, the process invokes the shmat() system call, which returns the starting address of the new region within the process address space. When the process wants to detach the shared memory from its address space, it invokes the shmdt() system call.
The implementation of shared memory depends on how the kernel implements process address spaces.
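Here is a minimal single-process sketch of those three calls (a real use would have a second process attach the same segment). IPC_PRIVATE is used only to keep the example self-contained, and the final shmctl(IPC_RMID) is there because, as noted above, the resource is persistent and would otherwise outlive the process.

```c
/* Sketch of System V shared memory: create, attach, use, detach, remove. */
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Create a private 4 KB segment. */
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid < 0) { perror("shmget"); return 1; }

    /* Attach: the kernel returns the start address of the new region
     * inside our address space. */
    char *mem = shmat(shmid, NULL, 0);
    if (mem == (char *)-1) { perror("shmat"); return 1; }

    strcpy(mem, "data visible to anyone who attaches this segment");
    printf("%s\n", mem);

    shmdt(mem);                      /* detach from our address space      */
    shmctl(shmid, IPC_RMID, NULL);   /* deallocate the persistent resource */
    return 0;
}
```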
Message queues allow processes to exchange messages by using the msgsnd() and msgrcv() system calls, which insert a message into a specific message queue and extract a message from it, respectively (not to be confused with POSIX message queues, defined by IEEE Std 1003.1-2001).
And Semaphores are the ones you already know, but made for User Mode.
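For completeness, here is a single-process sketch of the message queue calls mentioned above; struct message is just the conventional "type followed by data" layout that an application defines itself.

```c
/* Sketch of a System V message queue: create it, send one message,
 * receive it back, then remove the (persistent) queue. */
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct message {            /* conventional layout: a type followed by data */
    long mtype;
    char mtext[64];
};

int main(void)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0) { perror("msgget"); return 1; }

    struct message out = { .mtype = 1 };
    strcpy(out.mtext, "hello through the kernel");
    msgsnd(qid, &out, sizeof(out.mtext), 0);     /* insert into the queue     */

    struct message in;
    msgrcv(qid, &in, sizeof(in.mtext), 1, 0);    /* extract a type-1 message  */
    printf("received: %s\n", in.mtext);

    msgctl(qid, IPC_RMID, NULL);                 /* deallocate the queue      */
    return 0;
}
```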
Let's start by talking about 3 syscalls:
exec( )
Will run a given program in the process, in a brand new address space (replacing the old one).
_exit( )
Will terminate the process.[5]
fork( )
Will create a new child process. The process that ran fork() will be the parent; they can find each other through a data structure that gives the parent access to all of its children and the child access to its parent.
The child created will have a "copy" of the data and the code from the parent process (this is done with the hardware paging unit, which does Copy on Write and defers page duplication for as long as possible).
If a child calls _exit(), it will send the parent process a SIGCHLD signal (ignored by default).
When a process gets terminated, it becomes a Zombie until its parent releases it.
The way a parent process does that is with wait3() or wait4(), which wait until a child terminates, extract data about the zombie process, and release its memory.
What if the parent gets terminated? What happens to the child?
The child process keeps on running, and the terminated parent (now a zombie) still has to be released with wait4(). That job falls to PID 1, the init process (systemd on most modern systems): init issues frequent wait4() calls to take care of any zombie process directly under it, and the zombie's children are re-parented under init's control.
This is how a process started by a shell can keep running even after the shell gets terminated.
Take note that wait4() is considered nonstandard in Linux; it is recommended to use waitpid() or waitid(), which do the same thing but require you to specify an identification for the process.
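Here is a small sketch tying fork(), _exit(), and waitpid() together: the parent creates a child, the child terminates with an exit code, and the parent reaps it so no zombie is left behind.

```c
/* Sketch of the process-creation calls above. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t child = fork();            /* both parent and child return from here */

    if (child == 0) {
        /* Child: a copy-on-write copy of the parent's data and code. */
        printf("child  %d (parent %d) exiting\n", getpid(), getppid());
        _exit(42);                   /* sends SIGCHLD to the parent */
    }

    int status = 0;
    waitpid(child, &status, 0);      /* reap the zombie and read its exit code */
    if (WIFEXITED(status))
        printf("parent %d reaped child %d, exit code %d\n",
               getpid(), child, WEXITSTATUS(status));
    return 0;
}
```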
Processes can also be part of groups; groups are used to send signals to multiple processes at the same time.
To identify them, the Process Descriptor contains a Process Group ID (PGID), which is equal to the Process ID of the first process of the group, called the Process Group Leader.[6]
A shell that supports process groups, like Bash, will put all the processes of this command under the same group:
ls | sort | more
When opening a Unix system, you will be prompted to log into your account. The system will then execute a Shell (Like Bash) for you to interact with the system.
That Shell will be a Session Leader and will have a Session ID equal to its Process ID.
Multiple groups can be part of one session, but only one of them can be in the foreground and output to the shell.
The other process groups will be in the background, and if they attempt to read from or write to the terminal, they will receive a signal (SIGTTIN or SIGTTOU) to let them know.
A New process will inherit the Process Group ID and the Session ID of its parent.
All recent Unix systems provide a useful abstraction called Virtual Memory . Virtual memory acts as a logical layer between the application memory requests and the hardware Memory Management Unit (MMU). Virtual memory has many purposes and advantages:
The way to achieve this is to have a Virtual Address Space distinct from the Physical Memory Addresses. The CPU is equipped to handle Virtual Memory and will translate each virtual address to its physical equivalent.
Memory on x86_64 CPUs is handled in Pages, which are 4KB long (by default); the kernel keeps a Page Table that specifies where the CPU should look for a given Page.
This makes Memory handling easier as you don't need Physical Memory to be contiguous for Virtual Memory to be.
The kernel has to do a few things with RAM:
Along with these tasks, a kernel should:
To achieve this goal, Linux uses 2 systems:
The buddy system divides memory into chunks and grants the smallest power-of-2 block that can fulfill the request.
Here is how it works:
Let's say that we have 128KB of memory and we need to store 25KB of data.
The kernel will divide the 128KB into 2 "Buddies" of 64KB and repeat this operation until it cannot go smaller while still fulfilling the request.
This will result in a 32KB block being allocated.
When the 32KB block is released by the process, the kernel will merge as many unused buddies as possible back together.
For the slab allocator, the kernel takes 1 or more pages of contiguous memory and creates a "slab"; the inside of the slab is divided into chunks of a single pre-allocated size.
For example:
Let's take a Slab 3 pages long (12KB) storing 2KB chunks.
This would allow us to store 6 chunks.
Slabs are used for small objects.
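As a rough kernel-code sketch (struct foo and the cache name are purely illustrative), this is how the slab allocator is typically used through the kmem_cache API: one cache hands out objects of a single fixed size, carved out of slabs of contiguous pages.

```c
/* Kernel-code sketch of the slab allocator: a cache of fixed-size objects. */
#include <linux/slab.h>
#include <linux/errno.h>

struct foo {
    int id;
    char name[32];
};

static struct kmem_cache *foo_cache;

int foo_cache_setup(void)
{
    /* Every object handed out by this cache is sizeof(struct foo) bytes. */
    foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo), 0, 0, NULL);
    return foo_cache ? 0 : -ENOMEM;
}

struct foo *foo_alloc(void)
{
    return kmem_cache_alloc(foo_cache, GFP_KERNEL); /* grab one slot from a slab */
}

void foo_free(struct foo *f)
{
    kmem_cache_free(foo_cache, f);                  /* return the slot to its slab */
}
```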
The kernel stores a process's virtual address space in a Memory Descriptor (struct mm_struct), which holds a list of memory region descriptors.
This will contain pointers to:
Now, if that seems like a lot, keep in mind that Unix kernels use Demand Paging.
Demand paging is the idea that we are not going to put data into memory if it wasn't requested. In practice, your program will run into non-present pages; that triggers a page fault exception, which gets the kernel to find a free page and initialize it with the required data.
The most limiting factor on your computer is most likely your hard drive. Even with an SSD, it is still at least 100x slower than RAM.
To improve on this limitation, the kernel will delay disk access as much as possible by creating caches.
A read might stay in memory long after the process that requested it has terminated, and a write might be kept in RAM for a long time before reaching the disk.
This allows the kernel to check the faster cache instead of the slower hard drive; when it finds what it needs there, it is called a Cache Hit.
The kernel can be forced to write out all the outstanding writes by issuing a sync() syscall. This will synchronize all the dirty buffers (buffers containing data not yet written) to the disk.
The system will write buffers to storage periodically to avoid data loss.
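Here is a small sketch of forcing writes out of the cache: fsync() flushes the dirty buffers of one file, while sync() asks for every dirty buffer in the system to be written. The file name log.txt is arbitrary.

```c
/* Sketch of flushing dirty buffers to disk. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char *line = "important record\n";
    write(fd, line, strlen(line));   /* may sit in the page cache for a while */

    fsync(fd);                       /* flush this file's dirty buffers now   */
    close(fd);

    sync();                          /* flush every dirty buffer system-wide  */
    return 0;
}
```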
The kernel interacts with I/O devices using Device Drivers. A Device Driver allows a few things:
The user will have access to the device either by using syscalls or by interacting with the Device File (normally in /dev, like /dev/sda for the first drive).