2025-1-2 Htree The secret savior of EXT3

https://youtu.be/K7DXzxCzOeY

added HTREES in Linux 2.5.42 NOT 2.5.40 like Wikipedia suggest

Made by Daniel Phillips

5,000 lines for ext2 btree= x2 that

145-times improvement

Structure (wasn't really used... still unable to use a script and sound good....)

Hello and welcome to this video, Today we are going to talk about HTrees. where do they come from, what do they do and why are they SO OP. coming up

This video will focus on the Htrees which is probably the least well known feature of the EXT family, so prepare to be part of the elite of tech enthusiast with the secret knowledge of Htrees.

So lets get to (The history)

Since ext3 released in 2001 with Linux 2.4.15, it became obvious that the main problem that needed to be fixed with the EXT family was the directory structure.

Lets get a quick refresher on how it works.

So each Directory has a table which contains one entry per file(or directory++...) that are inside of the directory.

If your directory has a 1000 files, you have 1002 entries in your directory table, that's because of the parent entry and the self entry that are added at the beginning of your table.

Lets also keep in mind that everything inside of the filesystem is divided into blocks, which should be the same size as you page size (4K).

If you have too much entries to fit into one block, the Filesystem will allocate a new one for you to fill.

Now why is that bad.

To find a file, your system will read your directory table like a linked list.

If your file is at the start of the table that's fine but if your file is at the end of a 40 000 file long directory table....

The issue of the directory structure was known since EXT2 was created,

in fact, they even made flags to be able to add a Btree to make thing faster.

But Maple, what's a Btree???

A btree is another structure for your data.

Lets take this tree as an example, it contain the same values as our linked list example but can be search way faster.

make the example...

Lets make some quick observations, for a btree to exist, there is at least 2 things that needs to be true:

  1. our values need to be ordered
  2. we need to know how many values we have

with those 2 things, we would be able to find the middle and search through our data as a binary tree.

_why were we even talking about Binary trees?_

Well that's because most Filesystems will use Binary trees to make searching a file way faster.

Most will use a modified version with more than one value per node to make things even faster.

But the EXT devs never implemented it,

Why?

There was 2 rules that the EXT devs gave themselves for new features.

  1. It needs to be simple.
  2. It needs to be forward and backward compatible with older version of the Filesystem.

Meaning it could be transformed from the older version and read with the older version.

Binary trees are fun and all but they break both rule.

The simplest implementation of a Binary tree algorithm was estimated to be twice as big as the EXT2 code.

and making it forward and backward compatible would absolutely make this an implementation nightmare.