VFS

Schematics

Virtual Filesystem

VFS is the kernel's middle layer for filesystems: when a program calls write() or read(), the request goes to the VFS, which dispatches it to the appropriate filesystem (or other code). It provides a standard interface for simple operations.

for the big brain: [1][2]

Directory entry cache / inode cache

the dentry contains: the name, a pointer to the parent dentry, a pointer to the inode, and a flag for network filesystems (DCACHE_OP_REVALIDATE, saying the entry must be revalidated before use)

Inodes have a unique inode number that is derived from, but not identical to, the filesystem's own inode number

The dentry/inode caches are purely volatile beings: they live only in memory and never get saved to disk

inodes can be one of :

  • regular file
  • directory
  • symbolic link
  • block device
  • character device
  • socket
  • FIFO (First in, first out: Named Pipe)

VFS also takes care of registering and mounting filesystems

Registering is done with struct file_system_type (a big linked list)

Mounting is handled through the superblock object

History:

0.96a:

VFS added to the kernel (right before EXT, which arrived in 0.96c)

6.12

VFS now allows a block size bigger than the system page size; XFS is the first filesystem to implement it. This will allow bigger filesystems/file sizes and optimizations with some hardware. [3]

Reduced the size of struct file, which represents an open file. On x86, struct file was 232 bytes; after this series it is 184 bytes. They mostly did this by shoving out things that shouldn't have been there in the first place. [4]

EXPLANATION: [5]

6.13

There is a new sysctl knob, fs.dentry-negative, that controls whether the virtual filesystem (VFS) layer deletes a file's kernel-internal directory entry ("dentry") when the file itself is deleted. It seems that some benchmarks do better when dentries are removed, while others benefit from having a negative dentry left behind, so the kernel developers have put the decision into the system administrator's hands. The default value (zero) means that dentries are not automatically deleted, matching the behavior of previous kernels.[6]

There have been some deep reference-counting changes within the VFS layer that yield a 3-5% performance improvement on highly threaded workloads[6]

6.16

VFS freeze support. Allows userspace to initiate the freezing of a filesystem for suspend/hibernation. This makes sense when you know that suspend/hibernation is mostly driven from userspace (systemd). Freezing the filesystem first leaves less chance of data corruption. [7]

6.17

2 NEW syscalls to get and set filesystem inode attributes: file_getattr() and file_setattr()

They come to replace the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR ioctls, and they:

  1. fix the very bad name, which makes you think these have something to do with extended attributes (they don't)
  2. fix the fact that the original ioctls could not be applied to special files like FIFOs, sockets... This leads to a weird problem: if you set an inode attribute on a directory (for example for XFS's project quotas), it is applied recursively to every inode created under it. But if you then create a special file in that directory, there is no way for you to clear or change that attribute on the special file's inode... The new syscalls solve that by working with special files too. [9]

NEW flag for fallocate() syscall: FALLOC_FL_WRITE_ZEROES

fallocate() is used to preallocate files efficiently. If you want to preallocate a file full of zeros, you would use the FALLOC_FL_ZERO_RANGE flag. On most filesystems this leaves the preallocated blocks in an unwritten state (so your extent carries a note saying: you own these blocks, write zeros on first write). This results in numerous metadata changes and journal I/O later, which is probably not what you want if you are preallocating a file full of zeros. The usual workaround is to dd a big file full of zeros. This is obviously atrocious while you are preallocating, but it leaves you with a perfect chunk of zeros on your filesystem. The new flag instead lets you use modern SSD commands to zero your preallocated chunk: SSDs usually have an "unmap write zeroes" command that makes a range read as zeros until it is written. This gets you your preallocated file full of zeros without the fuss of spamming your disk. Using this on a disk without the feature will result in actually writing the zeros, but keep in mind that virtually all SSD techs have this ability. Support for this flag was added to ext4 as a working example. [10]

Will let multi-device filesystems better cope with losing a disk. The VFS normally makes a "shutdown" call as soon as a block device used by a filesystem is removed. This results in the filesystem shutting down even if it could have managed without that device. It will now use a new super_operations::remove_bdev() callback, which gives the filesystem a chance to manage the crisis itself. It is currently not implemented on the filesystem side, but we should probably see some movement with Btrfs next release. [8]

Sources: