BcacheFS

BcacheFS was merged into Linux 6.7.

Under the hood

First announced in 2015, BcacheFS is based on bcache, a project Kent Overstreet started in 2010[1] and which ended up getting him a job at Google to work on it. The goal of bcache was to use the speed of an SSD as a cache for a bigger HDD, while keeping in mind the performance characteristics and restrictions of both. Kent Overstreet, bcache's main developer, described bcache as a prototype for bcachefs; in fact, in 2018, 80% of the bcachefs code was shared with bcache.[2]

Like BTRFS, BcacheFS is built on a B+ tree, which can be roughly summarized as a balanced search tree whose nodes each hold many keys and children, instead of the two children of a binary tree. BcacheFS btree nodes are also log structured: a change is appended to the node instead of triggering a complete rewrite of it, which keeps writes fast.[3] The big superpower of the FS is hindsight: Kent was able to sidestep most of the issues that plague other filesystems like ZFS and BTRFS, because he could engineer his FS with those problems in mind from the start.
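To make the "append instead of rewrite" idea concrete, here is a toy model of a log-structured btree leaf in Python. It is a simplified sketch, not BcacheFS code: the class and method names are hypothetical and "writing" a set just means freezing it in memory. Updates accumulate in memory, each write appends a new sorted set after the existing ones, and a lookup searches the newest set first so later updates shadow older ones. The compaction step is the full rewrite that the log structure lets the filesystem postpone.

```python
import bisect


class LogStructuredNode:
    """Toy btree leaf: updates are appended as new sorted sets, never rewritten."""

    def __init__(self):
        self.written = []  # sealed sorted lists of (key, value), oldest first
        self.pending = {}  # updates not yet written out

    def insert(self, key, value):
        self.pending[key] = value

    def write(self):
        # Append the pending updates as a new sorted set after the old ones;
        # sets that were already written are left untouched.
        if self.pending:
            self.written.append(sorted(self.pending.items()))
            self.pending = {}

    def lookup(self, key):
        if key in self.pending:
            return self.pending[key]
        for bset in reversed(self.written):  # newest set wins
            i = bisect.bisect_left(bset, (key,))
            if i < len(bset) and bset[i][0] == key:
                return bset[i][1]
        return None

    def compact(self):
        # Occasional full rewrite: merge every set into one, keeping the newest values
        merged = {}
        for bset in self.written:
            merged.update(bset)
        self.written = [sorted(merged.items())] if merged else []


node = LogStructuredNode()
node.insert(1, "a")
node.insert(2, "b")
node.write()
node.insert(1, "a2")  # update key 1 without touching the first set
node.write()
print(len(node.written), node.lookup(1), node.lookup(2))  # 2 a2 b
node.compact()
print(len(node.written), node.lookup(1))  # 1 a2
```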

Features

Copy on write (COW)

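Existing data is never overwritten in place: changes are written to a new location and the metadata is updated to point at it, which is why this technique is sometimes more precisely called "redirect on write".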

BcacheFS also supports a nocow mode, which turns copy on write off.
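The difference between copy on write and nocow can be sketched with a toy block store in Python. This is a hypothetical model (the `ToyVolume` class is an assumption, not a BcacheFS structure): with CoW, a new write goes to a fresh block and only the map is repointed, so anything still holding the old map, such as a snapshot, keeps seeing the old data; with nocow, the block is overwritten in place and the old version is lost.

```python
class ToyVolume:
    """Toy block store: a map from file offset to physical block address."""

    def __init__(self, nocow=False):
        self.blocks = []      # physical blocks; list index = block address
        self.extent_map = {}  # file offset -> block address
        self.nocow = nocow

    def write(self, offset, data):
        if self.nocow and offset in self.extent_map:
            # nocow: overwrite the existing block in place
            self.blocks[self.extent_map[offset]] = data
        else:
            # copy on write: put the data in a fresh block, then repoint the map;
            # the old block is left intact
            self.blocks.append(data)
            self.extent_map[offset] = len(self.blocks) - 1

    def read(self, offset):
        return self.blocks[self.extent_map[offset]]


cow = ToyVolume()
cow.write(0, b"old")
saved_map = dict(cow.extent_map)  # e.g. what a snapshot would keep
cow.write(0, b"new")
print(cow.read(0), cow.blocks[saved_map[0]])  # b'new' b'old'

nocow = ToyVolume(nocow=True)
nocow.write(0, b"old")
saved_map = dict(nocow.extent_map)
nocow.write(0, b"new")
print(nocow.blocks[saved_map[0]])  # b'new': the old version is gone
```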

Snapshots

Snapshots are implemented differently than in other CoW systems: instead of copying the B-tree, BcacheFS adds a version field, called the snapshot ID, to the key of every filesystem item (inodes, dirents, xattrs, extents).

This makes it possible for two versions of a file (for example, the same file in two snapshots) to share the same underlying data.
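The snapshot ID approach can be sketched in Python. This is an illustration under simplifying assumptions, not the on-disk format: keys are (inode, offset, snapshot_id) tuples in a plain dict, snapshot IDs form a tree through a hypothetical `parent` map, and a lookup walks from the requested snapshot up through its ancestors until it finds a key. Taking a snapshot in this sketch creates two children of the current ID, one for the frozen snapshot and one for the live view, so nothing is copied and later writes only add new keys.

```python
class SnapshotStore:
    def __init__(self):
        self.parent = {1: None}  # snapshot id -> parent id; 1 is the root
        self.keys = {}           # (inode, offset, snapshot_id) -> value
        self.next_id = 2

    def snapshot(self, sid):
        # Freeze `sid`: the live view and the snapshot both become children
        # of it, so all existing keys stay shared and nothing is copied.
        live, snap = self.next_id, self.next_id + 1
        self.next_id += 2
        self.parent[live] = sid
        self.parent[snap] = sid
        return live, snap

    def put(self, inode, offset, sid, value):
        self.keys[(inode, offset, sid)] = value

    def get(self, inode, offset, sid):
        # Walk from the snapshot up its ancestors; the closest version wins
        while sid is not None:
            if (inode, offset, sid) in self.keys:
                return self.keys[(inode, offset, sid)]
            sid = self.parent[sid]
        return None


store = SnapshotStore()
store.put(10, 0, 1, "v1")     # inode 10, offset 0, written in snapshot 1
live, snap = store.snapshot(1)
store.put(10, 0, live, "v2")  # the later write only adds a new key
print(store.get(10, 0, live), store.get(10, 0, snap))  # v2 v1
```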

Multiple devices and replication

You can use BcacheFS with multiple drives at a time, add redundancy, and even assign a specific role to each drive:

bcachefs format \
    --label=ssd.ssd1 /dev/sdA \
    --label=ssd.ssd2 /dev/sdB \
    --label=hdd.hdd1 /dev/sdC \
    --label=hdd.hdd2 /dev/sdD \
    --label=hdd.hdd3 /dev/sdE \
    --label=hdd.hdd4 /dev/sdF \
    --replicas=2 \
    --foreground_target=ssd \
    --promote_target=ssd \
    --background_target=hdd

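The --replicas option gives you something akin to RAID1 or RAID10, depending on how many drives you feed into the FS. The targets assign roles to the labeled groups: new writes go to the foreground_target, data is moved to the background_target in the background, and frequently read data is cached on the promote_target.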
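How --replicas=2 turns 2 drives into something like RAID1 and 4 drives into something like RAID10 can be illustrated with a small Python sketch. The placement rule here (rotating the starting device for each extent) is an assumption made for clarity and the function names are hypothetical; the real allocator chooses devices differently, and the capacity figure is only a rough upper bound that ignores uneven drive sizes and metadata.

```python
def place_replicas(num_extents, devices, replicas):
    """Put each extent's copies on `replicas` distinct devices, rotating the start."""
    if replicas > len(devices):
        raise ValueError("need at least as many devices as replicas")
    layout = []
    for i in range(num_extents):
        start = (i * replicas) % len(devices)
        layout.append([devices[(start + r) % len(devices)] for r in range(replicas)])
    return layout


def usable_capacity(device_sizes, replicas):
    """Rough upper bound: every byte is stored `replicas` times."""
    return sum(device_sizes) / replicas


print(place_replicas(3, ["hdd1", "hdd2"], 2))
# [['hdd1', 'hdd2'], ['hdd1', 'hdd2'], ['hdd1', 'hdd2']]: mirrored, like RAID1
print(place_replicas(4, ["hdd1", "hdd2", "hdd3", "hdd4"], 2))
# [['hdd1', 'hdd2'], ['hdd3', 'hdd4'], ['hdd1', 'hdd2'], ['hdd3', 'hdd4']]: striped mirrors, like RAID10
print(usable_capacity([4, 4, 4, 4], 2))  # 8.0 (TB, if the sizes are in TB)
```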

Full data and metadata checksumming

Ensures that corruption is detected, in data as well as in metadata.
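A minimal Python sketch of what checksumming buys you: a checksum is computed when a block is written, and every read recomputes it and refuses to return data that no longer matches. zlib.crc32 is used only because it ships with Python; it stands in for whichever checksum type the filesystem is configured with, and both helper functions are hypothetical.

```python
import zlib


def write_block(data):
    """Return the data together with the checksum stored for it."""
    return data, zlib.crc32(data)


def read_block(stored, checksum):
    """Verify the checksum before handing the data back."""
    if zlib.crc32(stored) != checksum:
        raise IOError("checksum mismatch: block is corrupted")
    return stored


data, csum = write_block(b"hello bcachefs")
print(read_block(data, csum))  # b'hello bcachefs'
try:
    read_block(b"hellO bcachefs", csum)  # one changed byte
except IOError as err:
    print(err)  # checksum mismatch: block is corrupted
```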

Compression

gzip, lz4, and zstd
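The idea behind transparent compression can be sketched with Python's zlib module, which implements deflate, the algorithm behind gzip; lz4 and zstd would need third-party packages, so they are left out. The helper names are hypothetical, and falling back to storing incompressible data as-is is an assumption of this sketch.

```python
import os
import zlib


def compress_extent(data, level=6):
    """Compress an extent, keeping the raw bytes if compression doesn't help."""
    packed = zlib.compress(data, level)
    if len(packed) >= len(data):
        return "none", data
    return "deflate", packed


def decompress_extent(kind, payload):
    return zlib.decompress(payload) if kind == "deflate" else payload


text = b"bcachefs " * 512  # 4608 bytes of very repetitive data
kind, payload = compress_extent(text)
print(kind, len(text), len(payload) < len(text))  # deflate 4608 True

kind, payload = compress_extent(os.urandom(4096))  # random data doesn't compress
print(kind)  # none
```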

Encryption

AEAD style encryption (ChaCha20/Poly1305)

More standard features that most filesystems have:

Xattr

Extended file attributes are additional name/value metadata attached to a specific file, such as a copyright notice or the date a picture was taken.

ACL

An Access Control List is, as the name implies, a list of permissions added on top of the standard owner/group/other permissions; it allows fine-grained tuning of a file's permissions in a complex environment.
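How these extra entries combine with the standard permissions can be sketched in Python. This follows the general POSIX ACL checking order described in acl(5) (owner, then named users, then groups, then others, with a mask capping the named and group entries), but it is a simplified model: the `Acl` class and `allowed` function are hypothetical, and root privileges and other kernel details are ignored.

```python
from dataclasses import dataclass, field
from typing import Optional

R, W, X = 4, 2, 1  # permission bits, as in rwx


@dataclass
class Acl:
    owner: int         # uid of the file owner
    owning_group: int  # gid of the file's group
    user_obj: int      # owner permissions
    group_obj: int     # owning group permissions
    other: int         # permissions for everyone else
    named_users: dict = field(default_factory=dict)   # extra uid -> perms entries
    named_groups: dict = field(default_factory=dict)  # extra gid -> perms entries
    mask: Optional[int] = None  # caps named and group entries when present


def allowed(acl, uid, gids, want):
    """Simplified POSIX ACL check (no root override)."""

    def ok(perms, masked=True):
        if masked and acl.mask is not None:
            perms &= acl.mask
        return (perms & want) == want

    if uid == acl.owner:
        return ok(acl.user_obj, masked=False)
    if uid in acl.named_users:
        return ok(acl.named_users[uid])
    group_entries = [acl.named_groups[g] for g in gids if g in acl.named_groups]
    if acl.owning_group in gids:
        group_entries.append(acl.group_obj)
    if group_entries:
        return any(ok(p) for p in group_entries)
    return ok(acl.other, masked=False)


# User 2000 is granted rw by a named entry, but the mask caps it to r
acl = Acl(owner=1000, owning_group=100, user_obj=R | W, group_obj=R, other=0,
          named_users={2000: R | W}, mask=R)
print(allowed(acl, 1000, set(), W))  # True (owner)
print(allowed(acl, 2000, set(), W))  # False (masked)
print(allowed(acl, 2000, set(), R))  # True
print(allowed(acl, 3000, {100}, R))  # True (owning group)
print(allowed(acl, 3000, set(), R))  # False (other has no permissions)
```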

Quotas

BcacheFS supports user, group, and project quotas. A project ID can be assigned to any directory and is enforced recursively.
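Recursive enforcement of a project quota can be sketched by letting new files and directories inherit their parent's project ID and charging every write to that ID. The Python below is a hypothetical model of that idea (the classes, method names, and byte-based limit are assumptions), not BcacheFS's implementation.

```python
class QuotaExceeded(Exception):
    pass


class ProjectQuotas:
    """Toy project quotas: ids are inherited on creation, usage is charged per id."""

    def __init__(self):
        self.project_of = {"/": 0}  # path -> project id (0 = no project)
        self.limits = {}            # project id -> byte limit
        self.used = {}              # project id -> bytes charged

    def set_project(self, directory, project_id, limit):
        self.project_of[directory] = project_id
        self.limits[project_id] = limit

    def create(self, path, parent):
        # New files and directories inherit their parent's project id,
        # which is what makes the limit apply to the whole subtree
        self.project_of[path] = self.project_of[parent]

    def write(self, path, nbytes):
        pid = self.project_of[path]
        total = self.used.get(pid, 0) + nbytes
        if pid in self.limits and total > self.limits[pid]:
            raise QuotaExceeded(f"project {pid} would use {total} of {self.limits[pid]} bytes")
        self.used[pid] = total


q = ProjectQuotas()
q.create("/proj", "/")
q.set_project("/proj", 7, limit=100)
q.create("/proj/sub", "/proj")
q.create("/proj/sub/file", "/proj/sub")
q.write("/proj/sub/file", 60)  # charged to project 7, two levels down
try:
    q.write("/proj/sub/file", 50)  # 110 > 100
except QuotaExceeded as err:
    print(err)  # project 7 would use 110 of 100 bytes
```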

Unfinished features

Erasure coding / RAID 5 & 6

The ability to build RAID 5/6 style configurations, meaning parity striped across all of your drives, is still in the works. Remember BTRFS: this is a notoriously hard problem for this type of FS.[4]

Terms

Stripes:

Bcachefs stripes data by default, similar to RAID0. Redundancy is handled via the replicas option. 2 drives with --replicas=2 is equivalent to RAID1, 4 drives with --replicas=2 is equivalent to RAID10, etc.[5]

Dirty:

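Data that hasn't yet been written to permanent storage. This isn't the same as cached data: a cached copy duplicates data that is already stored elsewhere, so it can be dropped at any time, whereas dirty data has no other copy yet and must be kept until it has been written back.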
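The difference between dirty and cached data can be sketched in Python using the ssd/hdd labels from the format example. In this hypothetical model (the `Extent` class and its methods are assumptions, loosely following the foreground and background targets), each copy of an extent is flagged as either cached or not: freshly written data is dirty because it is the only copy, and once it has been written back to the background device, the fast copy becomes a cache that can be evicted freely.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    # device label -> True if that copy is only a cache (safe to drop)
    copies: dict = field(default_factory=dict)

    def write(self, foreground):
        # New data lands on the fast device; it is the only copy, so it is dirty
        self.copies = {foreground: False}

    def write_back(self, background):
        # Once the data also exists on the background device, the foreground
        # copy is demoted to a clean, cached copy
        self.copies = {dev: True for dev in self.copies}
        self.copies[background] = False

    def dirty_devices(self):
        return [dev for dev, cached in self.copies.items() if not cached]

    def evict_cached(self):
        # Cached copies can be dropped at any time without losing data
        self.copies = {dev: c for dev, c in self.copies.items() if not c}


e = Extent()
e.write("ssd")
print(e.dirty_devices())  # ['ssd']: not written back yet, must not be dropped
e.write_back("hdd")
print(e.copies)           # {'ssd': True, 'hdd': False}
e.evict_cached()
print(e.copies)           # {'hdd': False}
```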

Disk accounting:

The bookkeeping of how much space is used by stored data and metadata.

History

Before getting into the kernel:

Arguments between Kent and the Linux kernel community about development norms.[6]

6.7:

Merged into the mainline kernel.

6.11:

Conversion to the "new" mount API.

Better lockdep coverage (lockdep is the kernel's runtime lock dependency validator, used to detect potential deadlocks).

Self healing on read IO/checksum error

Data is now automatically rewritten if a read fails and a subsequent retry succeeds.
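This error, retry, rewrite sequence can be sketched in Python. Assumptions: the copies of one extent are a plain list of bytes, zlib.crc32 stands in for the real checksum, and `read_with_self_heal` is a hypothetical helper rather than a kernel function.

```python
import zlib


def read_with_self_heal(replicas, checksum):
    """Try each copy in turn; on success, rewrite the copies that failed."""
    failed = []
    for i, copy in enumerate(replicas):
        if zlib.crc32(copy) == checksum:
            for j in failed:
                replicas[j] = copy  # self heal: overwrite the bad copy with good data
            return copy
        failed.append(i)  # read/checksum error: retry with the next copy
    raise IOError("no replica passed its checksum")


good = b"important data"
csum = zlib.crc32(good)
replicas = [b"important dat\x00", good]  # the first copy is corrupted
print(read_with_self_heal(replicas, csum))  # b'important data'
print(replicas[0] == good)                  # True: the bad copy was rewritten
```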

Disk accounting rewrite

The old disk accounting code wasn't scalable; the rewrite makes it scalable.

We now have counters for compression type/ratio, per-snapshot-id usage, per-btree-id usage, and pending rebalance work.
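One way to picture counters like these is as keyed totals that every update adjusts by a delta. The Python sketch below is a hypothetical model (the key layout, the `extent_deltas` helper, and the sizes in sectors are assumptions, not the on-disk accounting format): adding or deleting an extent produces a set of positive or negative changes to its snapshot, btree, and compression counters. Because such deltas can be summed in any order, they can be batched and applied later, which is the kind of property that helps accounting scale.

```python
from collections import Counter


def extent_deltas(snapshot_id, btree_id, compression, compressed, uncompressed, sign=1):
    """Counter changes caused by adding (sign=1) or removing (sign=-1) one extent."""
    return {
        ("snapshot", snapshot_id): sign * compressed,
        ("btree", btree_id): sign * compressed,
        ("compression", compression, "compressed"): sign * compressed,
        ("compression", compression, "uncompressed"): sign * uncompressed,
    }


totals = Counter()
# Counter.update() adds the deltas (negative ones included), in any order
totals.update(extent_deltas(1, "extents", "zstd", 8, 32))
totals.update(extent_deltas(2, "extents", "zstd", 4, 16))
totals.update(extent_deltas(1, "extents", "zstd", 8, 32, sign=-1))  # extent deleted

print(totals[("snapshot", 1)], totals[("snapshot", 2)])  # 0 4
ratio = totals[("compression", "zstd", "uncompressed")] / totals[("compression", "zstd", "compressed")]
print(ratio)  # 4.0
```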

Stripe sectors accounting

This is needed for ensuring that erasure coding is working correctly, as well as for completing stripe creation after a crash.[7]

Another disk accounting counter was added for tracking disk usage and the number of extents per inode number. This counter ("bcachefs_metadata_version_disk_accounting_inum") lets BcacheFS track fragmentation, a prerequisite for implementing defragmentation in the future.

Also fixed an issue where a thread could be left running under specific circumstances.[8]

Kent got into a fight with Linus after violating simple rules of kernel development.[9]

In the benchmarks, BcacheFS is now slower than BTRFS in only two tests, and it keeps getting better.[10]

6.12:

Kent wants his FS to be stable by next year and claims it can be 3-4x faster than XFS.[11]

Linus considered removing BcacheFS from the kernel after a patch left the kernel unbuildable on big-endian machines.

A lack of transparency and testing, both by Kent and by the wider community, is to blame.[12]

Bugs reduced by 40%.[13]

6.13:

Kent was banned from this release cycle after a Code of Conduct violation.

6.14:

The bcachefs filesystem has a lot of changes after missing the 6.13 development cycle; these include a major on-disk format change that will require a "big and expensive" format upgrade. The changes also include self-healing improvements and filesystem-checking time "improved by multiple orders of magnitude", and this format change "is planned to be the last big on disk format upgrade before the experimental label comes off". See the merge message for more information.[14]

Sources: