Designing a recoverable filesystem

This is a somewhat random rant based on my experience writing recovery algorithms for several filesystems. What would I do if I had to design a filesystem to be easily recoverable? I would have all metadata structures adhere to a specific format, so that the metadata can be discovered by scanning the disk.

Metadata structure template

Each sector of each data structure will contain the following (a sketch of the resulting header layout follows the list):

  1. 12-byte combined signature consisting of
    • 4-byte filesystem signature,
    • 4-byte structure-specific signature, and
    • 4-byte structure version number.

    For example, MYFSINOD0002 will be My Filesystem, Inode, version 2. This signature will be at offset 0 in the sector, and it will be human-readable. A maximum of 9999 versions should be quite enough. If it proves not to be, we can fall back to AAAA through ZZZZ.

  2. 12-byte random unique filesystem ID (to differentiate between different filesystems that have occupied the same area). This way, one can scan the entire disk without having to establish partition boundaries first.
  3. The 8-byte offset of this sector from the start of the volume. This will significantly help with RAID reconstruction. On second thought, every written sector, including file data, will contain its own offset in the first eight bytes. This will facilitate RAID recovery and allow recovery from raw flash dumps even without the FTL mapping, at the cost of some performance lost to broken alignment.
  4. 2-byte index of this sector within the structure, and the 2-byte total number of sectors in the structure, stored minus one. I will have no metadata or data structure larger than 65536 512-byte sectors. 32 MB ought to be enough for everyone.
  5. The first sector of the structure will include a 4-byte CRC32 of the entire structure, computed with the CRC field set to zero. On second thought, make it 32 bytes and use whichever cryptographic hash is fastest.
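
Here is a minimal sketch, in C, of what such a per-sector header could look like. The struct layout, the field names, and the MYFS signature are all illustrative assumptions, and the sketch follows the original description (signature at offset 0) rather than the "on second thought" revisions:

    #include <stdint.h>

    /* Hypothetical on-disk header at the start of every metadata sector.
     * All multi-byte fields are little-endian; names are illustrative only. */
    struct myfs_sector_header {
        char     signature[12];   /* e.g. "MYFSINOD0002": fs + structure + version */
        uint8_t  volume_id[12];   /* random filesystem ID chosen at mkfs time */
        uint64_t sector_offset;   /* offset of this sector from the start of the volume */
        uint16_t sector_index;    /* index of this sector within the structure */
        uint16_t sector_count_m1; /* total sectors in the structure, minus one */
        uint32_t checksum;        /* meaningful only in the first sector; zeroed while hashing */
    } __attribute__((packed));    /* 40 bytes, leaving 472 payload bytes in a 512-byte sector */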

All of the above makes it easy to search for filesystem metadata, easy to sort out which metadata came from which volume, and easy to throw away damaged metadata structures. Try implementing such a search for FAT directories, and you will learn to appreciate the difference.
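
To make the point concrete, here is a sketch of such a search under the same assumptions as the header above: a recovery scan reduces to a linear pass over the raw device, keeping every sector whose first bytes match the filesystem signature.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Read the raw device sector by sector and report every sector that
     * begins with the (hypothetical) filesystem signature "MYFS". */
    static long scan_for_metadata(FILE *dev)
    {
        unsigned char sector[512];
        uint64_t offset = 0;
        long found = 0;

        while (fread(sector, sizeof sector, 1, dev) == 1) {
            if (memcmp(sector, "MYFS", 4) == 0) {
                /* Bytes 4..11 name the structure type and version;
                 * bytes 12..23 say which volume the sector belongs to. */
                printf("candidate at offset %llu: %.12s\n",
                       (unsigned long long)offset, (const char *)sector);
                found++;
            }
            offset += sizeof sector;
        }
        return found;
    }

Candidates that fail the checksum from item 5 can then simply be thrown away.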

Superblocks and allocation policy

I will have one primary superblock at a fixed location and one backup superblock at another fixed location, but not at the end of the volume. A spot at precisely 81% of the partition capacity is a good choice, because it is much less likely to be overwritten. Every other allocation, including file data, will go to a random place. It will be hell on performance, but I am designing a recoverable filesystem, not a fast one, and I don't care. Besides, an SSD does not suffer nearly as much of a random-seek penalty as a spinning drive. The rationale is that when the volume is reformatted and partially written over, random placement reduces the chance of overwriting something valuable. This applies equally whether it is overwritten with the same filesystem or with any other traditional filesystem.
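
As a purely illustrative bit of arithmetic (only the 81% figure comes from the text above; the rest is assumed), the backup superblock location could be computed like this:

    #include <stdint.h>

    #define SECTOR_SIZE 512ULL

    /* Hypothetical placement: backup superblock at 81% of the volume,
     * rounded down to a sector boundary. */
    static uint64_t backup_superblock_offset(uint64_t volume_bytes)
    {
        uint64_t off = volume_bytes / 100 * 81;  /* 81% of capacity */
        return off - (off % SECTOR_SIZE);        /* align down to a 512-byte sector */
    }

On a 1 TB volume this puts the backup at roughly the 810 GB mark, far from both the primary superblock and the end of the volume.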

Flexibility

One can build a filesystem of any complexity on top of this template (not that it is a well-thought-out template), from CP/M to ZFS, and it will be a breeze to recover data from it.

Filed under: Rants.

Created Wednesday, November 2, 2022