BGB
2024-12-11 11:35:22 UTC
Granted, this group isn't very active, but it is one of the few groups
where this sort of thing is on-topic, so...
So, for my project I have made a new experimental filesystem I am
calling DFS2.
Why?...
The existing filesystems didn't really fit what I wanted.
Namely:
Can support Unix-style features;
File ownership;
Symbolic links;
...
Not excessively complicated;
Needs to be relatively low overhead.
Neither RAM nor CPU time are abundant.
Target is 50MHz with RAM measured in MB.
I also wanted things like file compression.
But, more for IO optimization than saving space.
Currently I am using FAT32:
Does not natively support the desired features;
FAT chain walking and large clusters are not efficient to deal with;
No built-in compression.
A few existing options:
Minix:
Too limited, missing features.
EXT2:
Not 1:1 with wanted features;
Some aspects of the design seem annoying.
NTFS:
Significantly over-complicated.
Most other Linux filesystems:
Tend to be over-complicated, mostly trying to "aim higher" than EXT2;
Or are specialized for specific cases, like "initrd".
General design I went with:
Uses an inode table (kinda like EXT2)
Though, the table is itself represented with an inode (pros/cons);
The superblock gives the address of the inode table,
with inode 0 being the inode-table inode.
Inode structure consists of multiple tagged members.
Sorta like the NTFS MFT entries.
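As a rough sketch of the tagged-member idea (the header shape and tag
handling here are simplified and illustrative, not the exact on-disk
format):

  #include <stdint.h>

  /* Each inode member is a small self-describing record, vaguely like
   * an NTFS MFT attribute: a tag byte saying what it holds, a length
   * byte, then the payload. */
  typedef struct {
      uint8_t tag;   /* what this member holds (attrs, block map, ...) */
      uint8_t len;   /* total member size in bytes, header included    */
  } dfs2_itag_t;

  /* Walk the members packed inside a 256- or 512-byte inode, returning
   * the first one with the requested tag, or NULL if absent. */
  static const dfs2_itag_t *
  dfs2_inode_find(const uint8_t *inode, int isz, uint8_t tag)
  {
      int ofs = 0;
      while (ofs + 2 <= isz) {
          const dfs2_itag_t *m = (const dfs2_itag_t *)(inode + ofs);
          if (m->len < 2)
              break;               /* end marker / corrupt entry */
          if (m->tag == tag)
              return m;
          ofs += m->len;
      }
      return NULL;
  }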
Fixed-size directory entries:
They are 64 bytes, with a 48-byte name field;
Multiple entries are used for longer names (sorta like FAT).
Though, this should be infrequent,
as names longer than 48 bytes are rare.
Directory entries are organized into an AVL tree.
Tradeoff for AVL:
An AVL tree is more complicated than I would have liked, but it is
less complicated than a B-tree would have been. While a case could have
been made for using linear search (like FAT), AVL seemed
worth the complexity (it does mean directory lookups are roughly
log2(N)).
By my current estimates, the break-even point between an AVL tree
and a B-tree (with k=16) would likely not be reached until around 1000
files, which is well beyond the size of most directories. If one could
argue that most directories hold fewer than around 8-16 files, a case
could be made for linear search; but AVL doesn't really add much
cost beyond some additional code complexity.
Each directory entry encodes:
A name field;
The inode number;
Left, Right, and Parent links;
The Z depth of this node;
Directory Entry Type.
Initially, there was no parent-link in the tree, but the lack of a
parent link added significant complexity to tasks like walking the
directory or rebalancing nodes (it was necessary to keep track of this
externally), so I ended up adding a link. Though, at present this
reduces the maximum theoretical directory size to 2M files (unlikely to
be an issue).
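As a sketch, a 64-byte entry could look something like this (the exact
field layout here is a simplified illustration, not the actual on-disk
format; packing the three links as 21 bits each is what caps a
directory at 2^21, about 2M, entries):

  #include <stdint.h>

  typedef struct {
      uint8_t  name[48];  /* UTF-8, zero-padded; long names span entries */
      uint64_t links;     /* left/right/parent node indices, 21b each    */
      uint32_t inode;     /* inode number this entry refers to           */
      uint8_t  zdepth;    /* AVL Z depth, used when rebalancing          */
      uint8_t  type;      /* directory entry type                        */
      uint8_t  resv[2];   /* pad to 64 bytes                             */
  } dfs2_dirent_t;

  #define DENT_LEFT(e)    ((uint32_t)( (e)->links        & 0x1FFFFF))
  #define DENT_RIGHT(e)   ((uint32_t)(((e)->links >> 21) & 0x1FFFFF))
  #define DENT_PARENT(e)  ((uint32_t)(((e)->links >> 42) & 0x1FFFFF))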
Note that the filesystem is case-sensitive and assumes UTF-8 for
filenames (with some wonk for names longer than 48 bytes).
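Lookup is then a plain AVL descent, comparing names as raw UTF-8 bytes
(consistent with being case-sensitive). Roughly, using the entry
sketch above, and assuming slot 0 is left unused so it can double as a
nil link (ignoring multi-entry long names for brevity):

  #include <string.h>

  static const dfs2_dirent_t *
  dfs2_dir_lookup(const dfs2_dirent_t *dir, uint32_t root, const char *name)
  {
      uint8_t key[48] = {0};
      strncpy((char *)key, name, 48);   /* zero-pads short names */

      uint32_t i = root;
      while (i != 0) {                  /* 0 doubles as a nil link */
          const dfs2_dirent_t *e = &dir[i];
          int c = memcmp(key, e->name, 48);
          if (c == 0)
              return e;
          i = (c < 0) ? DENT_LEFT(e) : DENT_RIGHT(e);
      }
      return NULL;                      /* not found */
  }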
Block management within inodes:
Uses a scheme similar to EXT2, namely (with 32-bit block numbers):
16 entries, direct block number (16KB with 1K blocks)
8 entries, indirect block (2MB)
4 entries, double indirect block (256MB)
2 entries, triple indirect block (32GB)
1 entry, quad indirect block (4TB)
1 entry, penta indirect block (1PB)
For larger volumes and compressed files, 64-bit block numbers are used:
8 entries, direct block number (8K / 128K, with 1K or 16K blocks)
4 entries, indirect block (512K / 8MB)
2 entries, double indirect block (32MB / 512MB)
1 entry, triple indirect block (2GB / 32GB)
1 entry, quad indirect block (256GB / 1TB)
Blocks and inodes are allocated via bitmaps, which are themselves
assigned inodes (and store their data the same as normal files).
Reducing the number of entries here avoids making the inode bigger.
Currently, the design can use 256 or 512 byte inodes.
The 64-bit entry layout requires a 512-byte inode.
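To make the tiering concrete: resolving a logical block index to an
indirection depth under the 32-bit layout is a short walk down the
table (1K blocks and 4-byte entries give 256 block numbers per
indirect block; the helper name is just for illustration):

  #include <stdint.h>

  /* Entries per tier: 16 direct, 8 indirect, 4 double-, 2 triple-,
   * 1 quad-, 1 penta-indirect. With 256 block numbers per indirect
   * block, the tiers span 16KB, 2MB, 256MB, 32GB, 4TB, and 1PB. */
  static const unsigned dfs2_tier_ents[6] = { 16, 8, 4, 2, 1, 1 };

  /* Returns the indirection depth (0 = direct) covering logical block
   * index lblk, or -1 if it is beyond the penta-indirect range. */
  int dfs2_blk_tier(uint64_t lblk)
  {
      uint64_t per = 1;    /* blocks mapped per entry at this tier */
      for (int t = 0; t < 6; t++) {
          uint64_t span = dfs2_tier_ents[t] * per;
          if (lblk < span)
              return t;
          lblk -= span;    /* skip past this tier's range */
          per  *= 256;
      }
      return -1;
  }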
I had also considered span tables, but they didn't seem worth it (while
conceptually simpler, spans end up being more complicated to deal with
in terms of code).
For the 64-bit block numbers, with compressed files, the high-order bits
are used for some additional metadata (such as how this logical block of
the file is stored, and how many disk-blocks were used).
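Something along these lines (the exact bit split is illustrative; the
real widths aren't pinned down here):

  #include <stdint.h>

  /* Hypothetical split of a 64-bit entry for a compressed logical
   * block: low 48 bits address the first disk block, high bits say
   * how the block is stored and how many disk blocks it uses. */
  #define CBLK_ADDR(e)   ((e) & 0xFFFFFFFFFFFFull)   /* disk block number */
  #define CBLK_NBLKS(e)  (((e) >> 48) & 0xFF)        /* disk blocks used  */
  #define CBLK_MODE(e)   (((e) >> 56) & 0xFF)        /* storage mode      */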
For file compression:
A larger logical block size is used in this case:
Say, 16K/32K/64K vs 1K/2K/4K.
Currently I am using 16K logical blocks and 1K disk blocks.
This larger block is compressed,
and stored as a variable number of disk blocks.
Blocks are decompressed when loaded into a block-cache;
And, re-compressed when evicted (and dirty).
Underlying blocks are dynamically reallocated as needed.
Currently, using a non-entropy-coded LZ77 variant.
Say, along vaguely similar lines to LZ4.
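The decode side of such a scheme is the usual copy-literals /
copy-match loop. A minimal sketch (the token format here is invented
for illustration, not the actual one):

  #include <stdint.h>
  #include <string.h>

  /* Token byte: high 4 bits = literal count, low 4 bits = match
   * length minus 3. Literals follow the token, then a 16-bit LE
   * match distance; distance 0 marks end of stream. */
  size_t lz_decode(const uint8_t *src, uint8_t *dst, size_t dst_max)
  {
      uint8_t *d = dst, *end = dst + dst_max;
      for (;;) {
          uint8_t tok  = *src++;
          size_t  nlit = tok >> 4;
          size_t  nmat = (size_t)(tok & 15) + 3;
          if (nlit > (size_t)(end - d))
              break;                        /* would overrun output */
          memcpy(d, src, nlit);             /* copy literal run     */
          d += nlit;  src += nlit;
          uint16_t dist = (uint16_t)(src[0] | (src[1] << 8));
          src += 2;
          if (dist == 0)
              break;                        /* end-of-stream marker */
          const uint8_t *m = d - dist;      /* copy from history;   */
          while (nmat-- && d < end)         /* bytewise, so an      */
              *d++ = *m++;                  /* overlap self-extends */
      }
      return (size_t)(d - dst);             /* decompressed size    */
  }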
Currently, with all of this, code size is still moderately smaller than
my existing FAT32 driver... Granted, this isn't saying much.
Any thoughts?...