General Functions

"New" mounting API

Mostly adds better logging of mount errors and leaves room for future extensions.[1]

Lockdep

Lockdep (short for Lock Dependency) is a debugging tool built into the Linux kernel. It's designed to detect potential deadlocks that could occur due to inconsistent lock ordering. If Lockdep detects a circular dependency in lock acquisition order, it reports this as a potential deadlock situation.

source for the big brain[2]
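The idea of detecting inconsistent lock ordering can be sketched with a toy tracker. This is not how the kernel implements Lockdep; the class and method names (LockOrderTracker, acquire, release) are made up for illustration. It only records which lock was held while another was taken and warns when a new acquisition would close a cycle.

```python
from collections import defaultdict


class LockOrderTracker:
    """Toy lock-order validator (hypothetical, not the kernel's code)."""

    def __init__(self):
        self.edges = defaultdict(set)  # lock -> locks acquired while holding it
        self.held = []                 # locks currently held, in acquisition order

    def _reachable(self, start, target):
        # Depth-first search: can we get from start to target along recorded edges?
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, lock):
        """Return a warning string if this acquisition closes a cycle, else None."""
        warning = None
        for holder in self.held:
            # Adding holder -> lock creates a cycle if lock already leads to holder.
            if self._reachable(lock, holder):
                warning = f"possible deadlock: {holder} -> {lock} inverts an earlier order"
            self.edges[holder].add(lock)
        self.held.append(lock)
        return warning

    def release(self, lock):
        self.held.remove(lock)


tracker = LockOrderTracker()
# Path 1 takes A then B.
tracker.acquire("A")
tracker.acquire("B")
tracker.release("B")
tracker.release("A")
# Path 2 takes B then A: nothing deadlocked, but the ordering is inconsistent.
tracker.acquire("B")
report = tracker.acquire("A")
print(report)
```

Note that the warning fires even though no deadlock actually occurred, which is the point: the circular dependency is reported as a potential deadlock.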

Torn Write

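A partial write, also called tearing: only part of the data reaches storage, for example because of a crash or power loss in the middle of the write.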

Buffered

The data is held in a buffer (a region at some address in memory) and handed over in chunks instead of being fed char by char.

Atomic write

Atomic writes refer to the ability of a system call to write a contiguous block of data to a file without interruption: the write "cannot be divided", so either all of it lands or none of it does.[3]

Block Device

/dev/sda is a block device, meaning a device that is accessed in blocks. The block subsystem takes care of the communication between the device and the code using it (often FS code).

PS: a block device's block size cannot be bigger than the page size (normally 4K).[4]

Secure VM Service Module (SVSM)

AMD only. It uses features like those provided by SEV. Because the host cannot be trusted, the SVSM acts as a pipe between the firmware and the guest VM that bypasses the host.[5][6]

Fast CPPC

Fast Collaborative Processor Performance Control is a driver that permits optimization of the frequency on a per core basis to get more perf from the same amount of power. [7]

FineIBT

Fine Indirect Branch Tracking is a hardware-enhanced version of CFI.

Works with Intel CPUs; the comparable ARM64 feature is called Branch Target Identification (BTI).[10]

Enabled by default in 6.2[11]

Control Flow Integrity (CFI)

With a suitable compiler (LLVM), the kernel can be built with checks (for example jump tables) that confirm an indirect call or return goes to a valid address and not into code planted by an attacker.[12]

DAMON

Data Access MONitor (DAMON) is a data access monitoring framework for DRAM[26]

Example of a user-space implementation: awslabs/damo, the DAMON user-space tool (GitHub).

Kernel Memory Sanitizer

KMSAN is a dynamic error detector aimed at finding uses of uninitialized values. It is based on compiler instrumentation (Clang only). KMSAN is not intended for production use.[27]

Perf

Also called perf_events, it can instrument CPU performance counters, tracepoints, kprobes, and uprobes (dynamic tracing). It is capable of lightweight profiling.

Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache-misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots. perf provides rich generalized abstractions over hardware specific capabilities. Among others, it provides per task, per CPU and per-workload counters, sampling on top of these and source code event annotation.[28]

Direct Rendering Manager (DRM)

DRM sits between the GPU driver and programs like X. It ensures that no process blocks another from accessing the GPU, and it standardizes communication with the GPU.[29]

ROCm

AMD's software stack that enables their GPUs to do general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), and heterogeneous computing. It offers several programming models: HIP (GPU-kernel-based programming), OpenMP (directive-based programming), and OpenCL.

Works with the #AMDKFD driver

AMDKFD

AMD Kernel Fusion Driver is the driver that makes #ROCm and OpenCL work.

In other words, the kernel driver for computing on GPUs.

Panfrost

GPU #Direct Rendering Manager (DRM) and Mesa driver for some Arm Mali GPUs.[30]

SYSFS

SYSFS is a pseudo FS made to expose information about, and configuration of, Linux subsystems and filesystems. It is mounted at /sys.[31]

IDMAPPED Mounts

Different mounts can expose the same file or directory with different ownership. This helps when you want to use your home dir on multiple computers, or manage a FS that has no permissions (like FAT and exFAT) with multiple users.[32]

vDSO

The "vDSO" (virtual dynamic shared object) is a small shared library that the kernel automatically maps into the address space of all user-space applications. There are some system calls the kernel provides that user-space code ends up using frequently, to the point that such calls can dominate overall performance. TLDR: switching into kernel mode costs more than calling a library that the kernel exposes to every app.[33]
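To see that the vDSO really is mapped into a process, one can look for the [vdso] entry in /proc/self/maps. The sketch below assumes a Linux system with procfs mounted; the helper names (find_vdso, read_own_maps) are made up for this example.

```python
def find_vdso(maps_text):
    """Return (start, end) addresses of the [vdso] mapping, or None if absent."""
    for line in maps_text.splitlines():
        fields = line.split()
        # /proc/<pid>/maps lines look like: "start-end perms offset dev inode [name]"
        if fields and fields[-1] == "[vdso]":
            start, end = (int(x, 16) for x in fields[0].split("-"))
            return start, end
    return None


def read_own_maps(path="/proc/self/maps"):
    """Read this process's memory map; returns '' where procfs is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


print(find_vdso(read_own_maps()))
```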

User Name-space

User namespaces isolate security-related identifiers and attributes, in particular, user IDs and group IDs. User namespaces can be nested; that is, each user namespace except the _initial ("root")_ namespace has a parent user namespace and can have zero or more child user namespaces.[34]
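How IDs are isolated can be illustrated with the mapping format of /proc/<pid>/uid_map, where each line reads "ID-inside-ns ID-outside-ns length". The following is a simplified sketch: the function names are hypothetical, and the example mapping (0 mapped to 100000 for 65536 IDs) is just a commonly seen container setup, not something taken from these notes.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_start, outside_start, count) tuples."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(x) for x in line.split())
            ranges.append((inside, outside, count))
    return ranges


def to_outside(id_inside, ranges):
    """Translate an ID seen inside the namespace to the ID outside of it."""
    for inside, outside, count in ranges:
        if inside <= id_inside < inside + count:
            return outside + (id_inside - inside)
    return None  # the ID is not mapped in this namespace


example_map = parse_id_map("0 100000 65536\n")
print(to_outside(0, example_map))     # root inside the namespace
print(to_outside(1000, example_map))  # an ordinary user inside the namespace
```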

VFS

Relocated VFS

NUMA

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory.

Linux divides the system’s hardware resources into multiple software abstractions called “nodes”. Linux maps the nodes onto the physical cells of the hardware platform, abstracting away some of the details for some architectures. As with physical cells, software nodes may contain 0 or more CPUs, memory and/or IO buses. And, again, memory accesses to memory on “closer” nodes (nodes that map to closer cells) will generally experience faster access times and higher effective bandwidth than accesses to more remote cells.

TLDR: Linux groups resources that are close to each other (and so faster to reach) into nodes to improve performance.[35]

Landlock

The goal of Landlock is to enable restricting ambient rights (e.g. global filesystem or network access) for a set of processes. Added in 5.13.[36]

Scatter-Gather I/O (S/G)

Some applications may need to read or write data to multiple buffers, which are separated in memory. Although this can be done easily enough with multiple calls to read and write, it is inefficient because there is overhead associated with each kernel call. Instead, many platforms provide special high-speed primitives to perform these _scatter-gather_ operations in a single kernel call.[37]
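On Linux these primitives are readv() and writev(), which Python exposes as os.readv and os.writev. A minimal sketch, assuming a Unix-like system; the three buffers and their contents are invented for the example.

```python
import os
import tempfile

header = b"HDR:"
payload = b"hello world"
trailer = b":END"

with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    # Gather: three separate buffers go out in a single writev() call.
    written = os.writev(fd, [header, payload, trailer])

    os.lseek(fd, 0, os.SEEK_SET)
    # Scatter: one readv() call fills three pre-sized buffers in order.
    bufs = [bytearray(len(header)), bytearray(len(payload)), bytearray(len(trailer))]
    read = os.readv(fd, bufs)

print(written, read, [bytes(b) for b in bufs])
```

Doing the same with plain read and write would take three kernel calls in each direction instead of one.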

Page Attribute Table (PAT)

x86 Page Attribute Table (PAT) allows setting the memory attribute at page-level granularity.

WB|Write-back

UC|Uncached

WC|Write-combined -> Collects multiple small writes in a buffer and sends them out later in one burst.

WT|Write-through

UC-|Uncached Minus[38]

IOMMU

Input–output memory management unit (IOMMU) is a memory management unit (MMU) connecting a direct-memory-access–capable (DMA-capable) I/O bus to the main memory. Like a traditional MMU, which translates CPU-visible virtual addresses to physical addresses, the IOMMU maps device-visible virtual addresses (also called device addresses or memory mapped I/O addresses in this context) to physical addresses. [39]

DeviceTree

An operating system uses the Device Tree to discover the topology of the hardware at runtime, and can thereby support a majority of available hardware without hard-coded information. The DT is often provided by the firmware on a board.[40]

KPTI

Kernel page-table isolation fixes these leaks (Meltdown) by separating user-space and kernel-space page tables entirely.[41]

Struct_ops

A hook into kernel code that allows user space to inject BPF programs that run in place of part of that code. The first implementation was for TCP congestion control, but it is now used a little bit everywhere.[42]

StackLeak

StackLeak is a subsystem made to add more security to kernel memory management. It helps protect against stack depth overflow (CWE-674), uninitialized variables (CWE-457) and information exposure (CWE-200). VERY SIMPLIFIED: it works by making sure the kernel stack is always cleared to a special "poison" value at the end of each syscall, so leftover data and uninitialized values only ever contain that poison.

https://youtu.be/5wIniiWSgUc?si=BdbvjGuuuQrXAGl6&t=233

Sched_ext

Generic CPU Vulnerabilities Reporting

The generic CPU vulnerabilities support reports the various vulnerabilities, whether the running system/CPU is affected by them and, if so, the mitigation status. This is conveniently exposed under _/sys/devices/system/cpu/vulnerabilities_ across x86/x86_64, ARM, AArch64, and other architectures. RISC-V and LoongArch support was added in 6.12.[43]
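Each file in that directory is named after a vulnerability and holds a single status line. A small sketch for dumping them all; the function name read_vulnerabilities is made up, and on systems without the directory it simply returns nothing.

```python
import os


def read_vulnerabilities(base="/sys/devices/system/cpu/vulnerabilities"):
    """Return {vulnerability_name: status_line}; empty dict if the directory is missing."""
    report = {}
    try:
        names = sorted(os.listdir(base))
    except OSError:
        return report
    for name in names:
        try:
            with open(os.path.join(base, name)) as f:
                report[name] = f.read().strip()
        except OSError:
            report[name] = "<unreadable>"
    return report


for name, status in read_vulnerabilities().items():
    print(f"{name}: {status}")
```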

Kernel address space layout randomization (KASLR)

Added in 3.14. Enables address space randomization for the Linux kernel image by randomizing where the kernel code is placed at boot time.[44]

Bus lock

A split lock is any atomic operation whose operand crosses two (CPU) cache lines. Since the operand spans two cache lines and the operation must be atomic, the system locks the bus while the CPU accesses the two cache lines.

A bus lock is acquired through either split locked access to writeback (WB) memory or any locked access to non-WB memory. This is typically thousands of cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores and brings the whole system to its knees.[45]
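Whether an operand is "split" is plain address arithmetic. The sketch below assumes a 64-byte cache line, which is typical on x86 but still an assumption; the function name and the example addresses are hypothetical.

```python
CACHE_LINE = 64  # bytes; assumed typical x86 cache-line size


def crosses_cache_line(address, size, line=CACHE_LINE):
    """True if an operand of `size` bytes at `address` spans two cache lines."""
    first_line = address // line
    last_line = (address + size - 1) // line
    return first_line != last_line


print(crosses_cache_line(0x1000, 8))  # aligned 8-byte operand
print(crosses_cache_line(0x103C, 8))  # starts 4 bytes before a line boundary
```

Keeping atomically accessed data naturally aligned is the usual way to make sure the second case never happens.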

Error Detection And Correction (EDAC)

Subsystem to manage errors with PCI devices and ECC memory

System Management Mode (SMM)

System management mode is an execution mode in x86 processors that can only be entered via a #System management interrupt (SMI). It is called Ring -2 or a "black box" because this code cannot be debugged or observed while it executes.[46]

System management interrupt (SMI)

System management interrupts are high priority unmaskable hardware interrupts which cause the CPU to immediately suspend all other activities, including the operating system, and go into a special execution mode called #System Management Mode (SMM). Once the system is in SMM, the interrupt is handled by firmware code. [46]

VirtIO

Subsystem that exposes paravirtualized devices to guests (for example under KVM) and also provides data sockets, see #VirtIO Vsock.

VirtIO Vsock

Vsocks are data sockets meant to replace network sockets with better perf by bypassing iptables, netfilter and all the other network machinery that isn't needed in a host (hypervisor) to guest context.[47]

Integrity Policy Enforcement (IPE)

Integrity Policy Enforcement (IPE) relies on immutable security properties of the system component and is engineered for fixed-function systems like network firewall devices, IoT platforms, etc., that only ever run certain application-targeted code. TLDR: only execute what is immutable, to remove the possibility of foreign code breaking your system.[48]

Replay Protected Memory Block (RPMB)

RPMB is a several-year-old specification for having a portion of memory be more secure and accessed via a hidden security key. The RPMB block in eMMC can be used for storing DRM protection keys, OEM security keys, and other information that, for whatever legal or security reasons, can't be stored via normal storage. RPMB aims to be tamper resistant and requires authentication for reads/writes.[49]

XZ Embedded

Code used to decompress the kernel at boot time [50]

Protected KVM (pKVM)

The Arm confidential computing work adds support for booting an ARM64 kernel as a protected guest under Android's Protected KVM "pKVM" hypervisor.

History: Android was always a fragmented mess, but it turns out the kernel and hypervisor world of Android was even worse. Every model had a different kernel and might ship a hypervisor that followed no standard. Some initiatives were created to fix that: GKI (Generic Kernel Image) standardizes what a kernel for Android should be and how a vendor can add to it. pKVM is the logical extension of that: the hypervisor needs to be standardized too, so one was added. There are security implications; read [52] and [51].

Big Kernel Lock (BKL)

Removed in 2.6.39 (2011) in favor of finer-grained locking.

The BKL was a kernel-wide lock; in other words, only one thread at a time was able to operate in kernel space.[53]

Shadow Stack

The shadow stack itself is a second, separate stack that "shadows" the program call stack. When a function returns, the return address on the main stack is compared with the copy on the shadow stack; a mismatch shows that an attack was attempted. Guarded Control Stack (GCS) is a type of shadow stack.[54]
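The comparison can be pictured with a toy model. Real shadow stacks are enforced by hardware or the compiler; the ToyCpu class and the addresses below are invented purely to show the check.

```python
class ShadowStackError(Exception):
    """Raised when the two stacks disagree about a return address."""


class ToyCpu:
    def __init__(self):
        self.stack = []         # normal call stack, writable by the program
        self.shadow_stack = []  # protected copy holding return addresses only

    def call(self, return_address):
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        address = self.stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ShadowStackError(f"return to {address:#x}, expected {expected:#x}")
        return address


cpu = ToyCpu()
cpu.call(0x401000)
cpu.stack[-1] = 0xDEADBEEF  # simulated overflow overwriting the return address
try:
    cpu.ret()
except ShadowStackError as err:
    print("attack detected:", err)
```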

virtio-mem

The main goal of virtio-mem is to allow for dynamic resizing of virtual machine memory. virtio-mem provides a flexible, cross-architecture memory hot(un)plug solution that avoids many limitations imposed by existing technologies, architectures, and interfaces: paravirtualized memory hot(un)plug.[55]

statmount()

listmount()

Merged back at the start of the Linux 6.8 merge window were the VFS mount API updates that introduce two new system calls: statmount() and listmount() for reading more detailed information about file-system mounts. [56]

Second Level Address Translation (SLAT)

Rapid Virtualization Indexing (RVI) for AMD.

Extended Page Table (EPT) For Intel.

Stage-2 page-tables For ARM.

Basically:

Non VM mem handling: process → Page table/TLB → Physical mem

VM mem handling: process → Page table/TLB(Guest) → Page table/TLB(Host) → Physical mem

The first solution to reduce the cost of that double translation was:

Software based shadow page table

Shadow page tables translate guest virtual addresses directly to host physical addresses. Each VM has a separate shadow page table and the hypervisor is in charge of managing them. While shadow page tables are faster than double translation, they are still expensive compared to not running in a virtual machine: every time a guest updates its page tables, it requires the hypervisor to also manage changes in the shadow tables.

SLAT

Using a hardware extension, the CPU itself translates guest addresses to physical memory, so the hypervisor no longer has to maintain shadow page tables.
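The difference between the two approaches can be shown with a toy model in which page tables are plain dictionaries. All page numbers are invented and real page tables are multi-level, so this is only a sketch of the idea.

```python
PAGE = 4096

guest_page_table = {0x10: 0x200}  # guest virtual page -> guest physical page
host_page_table = {0x200: 0x7A0}  # guest physical page -> host physical page


def translate_two_stage(gva):
    """Walk the guest table, then the host table (the work SLAT does in hardware)."""
    page, offset = divmod(gva, PAGE)
    gpa_page = guest_page_table[page]
    hpa_page = host_page_table[gpa_page]
    return hpa_page * PAGE + offset


def build_shadow_table():
    """Software approach: precompute guest virtual page -> host physical page."""
    return {gv: host_page_table[gp] for gv, gp in guest_page_table.items()}


shadow = build_shadow_table()
gva = 0x10 * PAGE + 0x123
print(hex(translate_two_stage(gva)), hex(shadow[0x10] * PAGE + 0x123))

# A guest page-table update forces the hypervisor to refresh the shadow table,
# which is the overhead that SLAT removes.
guest_page_table[0x11] = 0x200
shadow = build_shadow_table()
```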

Time Stamp Counter (TSC)

The Time Stamp Counter (TSC) is a 64-bit register present on all x86 processors since the Pentium. It counts the number of CPU cycles since its reset. The instruction RDTSC returns the TSC in EDX:EAX.[58]
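"Returns the TSC in EDX:EAX" means the upper 32 bits land in EDX and the lower 32 bits in EAX. Python cannot execute RDTSC directly, so the sketch below only shows how the two halves are combined and how a cycle delta could be turned into time; the function names and the frequency used are assumptions for illustration.

```python
def combine_edx_eax(edx, eax):
    """Build the 64-bit TSC value from the two 32-bit halves RDTSC returns."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


def cycles_to_seconds(cycles, tsc_hz):
    """Convert a cycle delta to seconds, assuming the TSC ticks at a constant rate."""
    return cycles / tsc_hz


start = combine_edx_eax(0x12, 0x34567890)
end = combine_edx_eax(0x12, 0x34567890 + 3_000_000)
print(hex(start), cycles_to_seconds(end - start, 3_000_000_000))
```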

PID

Process ID: a number that is unique to a process, at least inside the same #PID Namespace.

PID Namespace

PID namespaces isolate the process ID number space, meaning that processes in different PID namespaces can have the same PID. PID namespaces allow containers to provide functionality such as suspending/resuming the set of processes in the container and migrating the container to a new host while the processes inside the container maintain the same PIDs.[59]

pid_max

The sysctl value that governs the maximum PID value. It was global before 6.14 but is now specific to each PID namespace.
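The value is exposed as kernel.pid_max, readable at /proc/sys/kernel/pid_max. A minimal sketch for reading it; read_pid_max is a made-up helper name and it returns None where procfs is not available.

```python
def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return pid_max as an int, or None if the file cannot be read or parsed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


print(read_pid_max())
```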

PIDFD

Since Linux 5.1, pidfds have more or less allowed one to refer to a process using a file descriptor, making it possible to eliminate a set of race conditions and ambiguities.[60]

Moved functions

SLAB

Pages

Page writeback

Dirty page

Page fault

Transparent Huge Pages (THP)

Translation Lookaside Buffer (TLB)

Anonymous memory

Folios

kmem_cache

all moved to Mem Management

BPF

Retired sources:

Sources:


  1. : Finishing the conversion to the "new" mount API [LWN.net]

  2. : Runtime locking correctness validator — The Linux Kernel documentation

  3. :

  4. :

  5. : The Linux SVSM project [LWN.net]

  6. :

  7. : AMD Fast CPPC To Be Merged For Linux 6.11 - Phoronix

  8. :

  9. : What's next for the SLUB allocator [LWN.net]

  10. : Indirect branch tracking - Wikipedia

  11. : Linux Moving Ahead With Enabling Kernel IBT By Default - Phoronix

  12. : Control-flow integrity - Wikipedia

  13. : Page Tables — The Linux Kernel documentation

  14. : Linux Page Cache Basics - Thomas-Krenn-Wiki-en

  15. : Understanding and troubleshooting page faults and memory swapping: Site24x7

  16. : Linux_2_6_38 - Linux Kernel Newbies

  17. : Linux_6.8 - Linux Kernel Newbies

  18. : Transparent Hugepage Support — The Linux Kernel documentation

  19. :

  20. :

  21. : Large folios for anonymous memory [LWN.net]

  22. : MatthewWilcox/Folios - Linux Kernel Newbies

  23. : Large folios for anonymous memory [LWN.net]

  24. : Folios merged for 5.16 [LWN.net]

  25. : Folio Improvements For Linux 5.17, Large Folio Patches Posted - Phoronix

  26. : DAMON: Data Access MONitor — The Linux Kernel documentation

  27. :

  28. :

  29. : Direct Rendering Manager - Wikipedia

  30. : Panfrost — The Mesa 3D Graphics Library latest documentation

  31. : sysfs - Wikipedia

  32. : IDMAPPED Mounts Aim For Linux 5.12 - Many New Use-Cases From Containers To Systemd-Homed - Phoronix

  33. :

  34. :

  35. : What is NUMA? — The Linux Kernel documentation

  36. : Landlock: unprivileged access control — The Linux Kernel documentation

  37. :

  38. :

  39. : Input–output memory management unit - Wikipedia

  40. : Linux and the Devicetree — The Linux Kernel documentation

  41. : Kernel page-table isolation - Wikipedia

  42. : Kernel operations structures in BPF [LWN.net]

  43. : RISC-V Enabling Generic CPU Vulnerabilities Reporting - Phoronix

  44. : Address space layout randomization - Wikipedia

  45. : 22. Bus lock detection and handling — The Linux Kernel documentation

  46. : realtime:documentation:howto:debugging:smi-latency:start [Wiki]

  47. : VSOCK: From Convenience to Performant VirtIO Communication

  48. : Linux 6.12 Landing Integrity Policy Enforcement "IPE" Module - Phoronix

  49. : Replay Protected Memory Block "RPMB" Subsystem Submitted For Linux 6.12 - Phoronix

  50. : XZ data compression in Linux — The Linux Kernel documentation

  51. : KVM for Android [LWN.net]

  52. : Linux 6.12 To Support Arm's Permission Overlay Extension - Phoronix

  53. : Giant lock - Wikipedia

  54. : Shadow stack - Wikipedia

  55. : virtio-mem · GitBook

  56. : Linux 6.8 Introduces New Syscalls For More Detailed File-System Mount Information - Phoronix

  57. : Second Level Address Translation - Wikipedia

  58. : Time Stamp Counter - Wikipedia

  59. :

  60. : Pidfd - dankwiki, the wiki of nick black