Filesystem structure and data recovery

Most filesystems are built on the same principles. Naturally, data recovery is based on principles that complement these filesystem design principles.

Typical filesystem structure

A typical filesystem is structured like a pyramid.

  1. At the top, there is a small bootstrap dataset. It typically contains global filesystem parameters, such as the cluster size, the sizes of various tables or blocks, and sometimes the disk layout. It also contains a set of pointers to the main metadata. This is where the driver starts opening the filesystem. There may be one or several copies of this dataset, and each copy is always at a fixed location inside the volume. It has to be at a fixed, well-known location; otherwise there is no way to find it and no way to start unraveling the filesystem.
  2. In the middle, there is the main filesystem metadata, typically in some form of file table. It references all the file data and all the directories. Most often there is only one copy of this metadata, because it is large and relatively slow to update. This metadata contains all the file information: for each file, it tracks the on-disk location, file name, parent-child relationships, file attributes, and so on.
  3. The bottom level of the filesystem holds file data. In some filesystems, directory content is also stored at the bottom level.
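To make the top-down order concrete, here is a minimal Python sketch of a made-up toy filesystem. The layout, the `TOYFS` magic, and the functions `build_image` and `read_files` are all hypothetical, invented for illustration. A small bootstrap record at byte 0 points to a metadata table, and the table points to file data. Reading always starts at the bootstrap record, so the toy driver refuses to open the volume once those first bytes are wiped, even though the metadata and file data are still intact.

```python
import struct

# Hypothetical toy layout, invented for illustration only (not a real filesystem).
BOOT_FMT = "<8sIQI"     # bootstrap: magic, cluster size, metadata offset, entry count
ENTRY_FMT = "<16sII"    # metadata entry: name, first cluster, size in bytes
MAGIC = b"TOYFS\x00\x00\x00"
FIRST_DATA_CLUSTER = 2  # cluster 0 = bootstrap, cluster 1 = metadata table


def build_image(files, cluster_size=512):
    """Lay out a tiny volume: bootstrap -> metadata -> file data."""
    entries, data = [], b""
    for name, payload in files.items():
        cluster = FIRST_DATA_CLUSTER + len(data) // cluster_size
        entries.append(struct.pack(ENTRY_FMT, name.encode(), cluster, len(payload)))
        data += payload + b"\x00" * (-len(payload) % cluster_size)  # pad to whole clusters
    meta = b"".join(entries)
    if len(meta) > cluster_size:
        raise ValueError("toy metadata table must fit in one cluster")
    image = bytearray(FIRST_DATA_CLUSTER * cluster_size)
    boot = struct.pack(BOOT_FMT, MAGIC, cluster_size, cluster_size, len(entries))
    image[:len(boot)] = boot                            # tip of the pyramid
    image[cluster_size:cluster_size + len(meta)] = meta  # middle level
    return bytes(image) + data                          # bottom level


def read_files(image):
    """Open the volume the way a driver does: top of the pyramid first."""
    magic, cluster_size, meta_offset, count = struct.unpack_from(BOOT_FMT, image, 0)
    if magic != MAGIC:
        raise ValueError("bootstrap record not found; cannot open volume")
    files = {}
    for i in range(count):
        raw_name, cluster, size = struct.unpack_from(
            ENTRY_FMT, image, meta_offset + i * struct.calcsize(ENTRY_FMT))
        start = cluster * cluster_size
        files[raw_name.rstrip(b"\x00").decode()] = image[start:start + size]
    return files


if __name__ == "__main__":
    img = build_image({"a.txt": b"hello", "b.txt": b"world" * 200})
    print(read_files(img)["a.txt"])  # b'hello'
    wiped = bytes(24) + img[24:]     # destroy only the 24-byte bootstrap record
    try:
        read_files(wiped)
    except ValueError as err:
        print("driver:", err)
    print(b"world" * 200 in wiped)   # True: the file data is still on disk
```

Destroying 24 bytes at the tip is enough to make the whole toy volume unreadable for the driver, while every byte of file data survives.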

Let's draw.

Generic filesystem
Bootstrap
Metadata
File data

Typical filesystem structure.

Structure of select filesystems

Now, let's see how this translates into actual on-disk structures of some select filesystems.

FAT16/32
Boot sector(s) [1]
Allocation tables
Directories
File data

Structure of FAT16 and FAT32 filesystems.
[1] FAT16 has one copy of the boot sector; FAT32 has two copies, both near the start of the volume.
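As an illustration of what the bootstrap level holds, the Python sketch below reads a few BIOS Parameter Block fields from a FAT boot sector, at the field offsets given in Microsoft's FAT specification, and computes where the data area starts. The function name `parse_fat_boot_sector` is hypothetical, and telling FAT32 from FAT16 by a zero 16-bit FAT size is a simplification: the specification determines the FAT type from the cluster count.

```python
import struct


def parse_fat_boot_sector(sector: bytes) -> dict:
    """Extract the geometry a FAT driver needs from the boot sector (BPB)."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot sector signature")
    # Offsets 11..18: bytes/sector, sectors/cluster, reserved sectors, FAT count, root entries
    bytes_per_sector, sectors_per_cluster, reserved, num_fats, root_entries = \
        struct.unpack_from("<HBHBH", sector, 11)
    (fat_size16,) = struct.unpack_from("<H", sector, 22)
    is_fat32 = fat_size16 == 0  # simplification; the spec uses the cluster count
    fat_size = struct.unpack_from("<I", sector, 36)[0] if is_fat32 else fat_size16
    # FAT16 keeps a fixed-size root directory between the FATs and the data area
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    info = {
        "cluster_size": bytes_per_sector * sectors_per_cluster,
        "first_data_sector": reserved + num_fats * fat_size + root_dir_sectors,
        "fat32": is_fat32,
    }
    if is_fat32:
        # FAT32 records where its backup boot sector lives (usually sector 6)
        info["backup_boot_sector"] = struct.unpack_from("<H", sector, 50)[0]
    return info
```

Every number in the result comes from a single 512-byte sector (or its FAT32 backup), which is exactly the small, fixed-position metadata discussed above.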

NTFS
Boot sectors [2]
$MFT [3]
File data

Structure of NTFS filesystem.
[2] NTFS has two copies of the boot sector, at the start and at the end of the volume.
[3] Note that $MFT does not contain dedicated directory indexes. These are not used in data recovery on NTFS, because each file record in $MFT already references its parent directory.
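The NTFS boot sector plays the same bootstrap role: among other fields, it stores the cluster geometry and the starting cluster of $MFT, which is the pointer from the tip of the pyramid to the middle level. The Python sketch below reads those fields at their commonly documented offsets; all results are byte offsets relative to the start of the volume. The function name `locate_mft` is hypothetical. It assumes ordinary cluster sizes (up to 64 KiB), where the sectors-per-cluster byte is a plain count, and it assumes the backup boot sector sits in the sector just past the range counted by the total-sectors field, which is the usual layout.

```python
import struct


def locate_mft(boot: bytes) -> dict:
    """Follow the NTFS boot sector's pointers (byte offsets from volume start)."""
    if boot[3:11] != b"NTFS    ":  # OEM ID field
        raise ValueError("not an NTFS boot sector")
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", boot, 11)
    # Offsets 40, 48, 56: total sectors, $MFT start cluster, $MFTMirr start cluster
    total_sectors, mft_lcn, mftmirr_lcn = struct.unpack_from("<QQQ", boot, 40)
    cluster_size = bytes_per_sector * sectors_per_cluster  # assumes a plain count
    return {
        "cluster_size": cluster_size,
        "mft_offset": mft_lcn * cluster_size,
        "mftmirr_offset": mftmirr_lcn * cluster_size,
        # assumption: backup boot sector is the sector right after the counted range
        "backup_boot_offset": total_sectors * bytes_per_sector,
    }
```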

ext3, ext4
Superblocks [4]
i-nodes
Directories
File data

Structure of ext3 and ext4 filesystems.
[4] ext filesystems keep multiple superblocks distributed across the entire volume; however, all superblocks are at fixed positions.
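The fixed positions of ext backup superblocks follow a simple rule. With the `sparse_super` feature (the usual default), copies are kept only in block groups 0 and 1 and in groups whose number is a power of 3, 5, or 7; without it, every group has a copy. The Python sketch below computes the byte offsets under that rule. The function names are hypothetical, and the newer `sparse_super2` feature, which keeps at most two backups, is not modeled.

```python
def has_superblock(group: int, sparse_super: bool = True) -> bool:
    """True if block group `group` holds a superblock copy."""
    if not sparse_super or group <= 1:
        return True
    for base in (3, 5, 7):
        n = group
        while n % base == 0:
            n //= base
        if n == 1:  # group is an exact power of base
            return True
    return False


def superblock_offsets(group_count, blocks_per_group, block_size, sparse_super=True):
    """Byte offsets of the primary and backup superblocks on the volume."""
    # With 1 KiB blocks, block 0 is left for boot code and groups start at block 1
    first_data_block = 1 if block_size == 1024 else 0
    offsets = []
    for g in range(group_count):
        if not has_superblock(g, sparse_super):
            continue
        if g == 0:
            offsets.append(1024)  # primary superblock is always 1024 bytes in
        else:
            offsets.append((first_data_block + g * blocks_per_group) * block_size)
    return offsets
```

For a small volume with 1 KiB blocks and 8192 blocks per group, this gives the well-known backup location at block 8193.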

ZFS
Uberblocks
Disk labels
Volume tables
Snapshots
Directories
File data

Structure of ZFS filesystem.
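In ZFS, the small fixed-position metadata is the set of vdev labels. Each device carries four 256 KiB labels, two at the start and two at the end, and each label holds an array of uberblocks starting 128 KiB into the label. The Python sketch below computes those fixed offsets. It assumes the device size is first rounded down to a multiple of the label size, as OpenZFS does, and the function name `zfs_label_offsets` is hypothetical.

```python
LABEL_SIZE = 256 * 1024              # each vdev label is 256 KiB
UBERBLOCK_ARRAY_OFFSET = 128 * 1024  # uberblock array occupies the second half of a label


def zfs_label_offsets(device_size: int) -> list:
    """Return (label_offset, uberblock_array_offset) for labels L0..L3."""
    size = device_size - device_size % LABEL_SIZE  # assumption: align down to label size
    if size < 4 * LABEL_SIZE:
        raise ValueError("device too small to hold four labels")
    labels = [0, LABEL_SIZE, size - 2 * LABEL_SIZE, size - LABEL_SIZE]
    return [(start, start + UBERBLOCK_ARRAY_OFFSET) for start in labels]
```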

Data recovery

A filesystem driver works through the pyramid from top to bottom. This has an obvious weak point: the tip of the pyramid, while required, is small and therefore more vulnerable to data loss.

Data recovery software cannot rely on any kind of small metadata at fixed positions. This includes FAT or NTFS boot sectors, ext or XFS superblocks, ZFS disk labels and uberblocks, and RAID metadata of any kind. While a human doing the recovery can decide, within reason, whether boot sectors are good, or at least worth a try, software cannot reliably do this. So the software avoids small, localized items of metadata as much as it can.

If the filesystem is overwritten with another, blank one, all the new bootstrap metadata is stored at the same fixed locations as the old metadata. Now there is a valid set of bootstrap metadata on disk, but it is empty or points to an empty filesystem. Worse yet, the new parameters may be subtly different from the original ones, and there is no reliable way to detect the difference, at least not automatically.

Fortunately, the small set of filesystem-wide parameters, such as the cluster size, the position of the first cluster, and, for arrays, the disk order, can always be reconstructed from the large set of middle-level metadata. Unfortunately, discovering all the middle-level metadata requires data recovery software to scan the entire disk, or multiple disks. In some cases, file data (the bottom level of the pyramid) can also be used, especially in RAID analysis. This, too, requires the entire disk or set of disks to be scanned.
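Reconstructing the cluster size and the first-cluster position can be illustrated as a linear fit. Suppose a scan has found middle-level records whose own cluster number is known (for example, FAT subdirectory clusters, whose "." entry stores the directory's own starting cluster) and has noted the byte offset where each one was found. With the correct geometry, `offset = data_start + (cluster - first_cluster) * cluster_size`, so every pair of observations proposes one `(cluster_size, data_start)` candidate. The Python sketch below, with the hypothetical function name `infer_geometry`, lets the pairs vote, so a few stale or misidentified records do not spoil the result. The candidate cluster sizes (512 bytes to 64 KiB) and `first_cluster=2` (FAT numbering) are assumptions to adjust per filesystem.

```python
from collections import Counter
from itertools import combinations

# Assumed plausible cluster sizes: 512 bytes .. 64 KiB
CANDIDATE_CLUSTER_SIZES = frozenset(512 << k for k in range(8))


def infer_geometry(observations, first_cluster=2):
    """observations: iterable of (cluster_number, byte_offset) found by scanning.

    Returns the (cluster_size, data_start) pair supported by the most observation pairs.
    """
    votes = Counter()
    for (c1, o1), (c2, o2) in combinations(observations, 2):
        if c1 == c2:
            continue
        size, rem = divmod(o2 - o1, c2 - c1)
        if rem or size not in CANDIDATE_CLUSTER_SIZES:
            continue  # inconsistent pair: at least one record is stale or misread
        data_start = o1 - (c1 - first_cluster) * size
        if data_start >= 0:
            votes[(size, data_start)] += 1
    if not votes:
        raise ValueError("not enough consistent observations")
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Simulated scan: 4 KiB clusters, data area at 1 MiB, plus one bogus hit
    found = [(2, 1048576), (10, 1081344), (100, 1449984), (50, 123456)]
    print(infer_geometry(found))  # (4096, 1048576)
```

Voting over pairs is one simple choice; the more genuine records a scan finds, the more clearly the true geometry outvotes the noise, which is why the whole disk is worth scanning.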

Scanning the entire disk is required to pick up all the clues about the data to be recovered. This is slow, especially as hard drives (and arrays) grow in size, but with heavy damage there is no way around it.
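At its simplest, a full scan reads the disk sequentially in large chunks and tests every sector for known record signatures. The Python sketch below checks two illustrative markers: the `FILE` signature that begins an NTFS MFT record, and the FAT "." directory entry (a dot padded with spaces to the 11-byte name field) that begins a subdirectory cluster. The `scan` function and the marker table are hypothetical; real recovery software validates each hit much more thoroughly, because a signature match alone can be a coincidence or a stale copy.

```python
import io

SECTOR = 512
# Illustrative markers: (label, offset within the sector, expected bytes)
SIGNATURES = (
    ("ntfs_mft_record", 0, b"FILE"),
    ("fat_dot_entry", 0, b"." + b" " * 10),  # "." padded to the 11-byte 8.3 name field
)


def scan(stream, chunk_sectors=2048):
    """Yield (byte_offset, label) for every sector that starts with a known marker."""
    offset = 0
    while True:
        chunk = stream.read(chunk_sectors * SECTOR)  # large sequential reads
        if not chunk:
            break
        for pos in range(0, len(chunk), SECTOR):
            sector = chunk[pos:pos + SECTOR]
            for label, rel, magic in SIGNATURES:
                if sector[rel:rel + len(magic)] == magic:
                    yield offset + pos, label
        offset += len(chunk)


if __name__ == "__main__":
    disk = bytearray(8 * SECTOR)
    disk[1024:1028] = b"FILE"
    disk[3072:3083] = b"." + b" " * 10
    print(list(scan(io.BytesIO(bytes(disk)), chunk_sectors=3)))
    # [(1024, 'ntfs_mft_record'), (3072, 'fat_dot_entry')]
```

For a real disk image, the same function accepts a file opened in binary mode with `open(path, "rb")`; the cost is dominated by reading every byte of the disk, which is why such scans take long on large drives.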

Created Thursday, February 7, 2019
