Filesystem structure and data recovery

Most filesystems are built on the same principles, and data recovery naturally relies on principles that complement those of filesystem design.

Typical filesystem structure

A typical filesystem structure is like a pyramid.

  1. On top, there is a small bootstrap dataset. Typically, it contains global filesystem parameters, like cluster size, sizes for various tables or blocks, and sometimes disk layout. This dataset also contains a set of pointers to the main metadata. This is where the driver starts opening the filesystem. There may be one or several copies of this dataset, always at a fixed, well-known location inside the volume; otherwise, there would be no way to find it and no way to start unraveling the filesystem.
  2. In the middle is the primary filesystem metadata, typically in some form of a file table. It references all the file data and all the directories. Most often, there is only one copy of the filesystem metadata because it is large and relatively slow to update. This set of metadata contains all the file information. For each file, it tracks on-disk location, file name, parent-child relationships, file attributes, and whatnot.
  3. The bottom level of the filesystem holds file data. Some filesystems also place directory content on the bottom level.

Let's draw.

Generic filesystem
Bootstrap
Metadata
File data

Typical filesystem structure.

Structure of select filesystems

Now, let's see how this translates into actual on-disk structures of some select filesystems.

FAT16/32
Boot sector(s) [1]
Allocation tables
Directories
File data

Structure of FAT16 and FAT32 filesystems.
[1] FAT16 has one copy of a boot sector; FAT32 has two copies, both near the start of the volume.
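To make the FAT bootstrap level concrete, here is a minimal sketch that decodes a few global parameters from a FAT boot sector. The field offsets follow the published FAT BIOS Parameter Block layout; the function name and the returned dictionary keys are made up for this illustration, and the sketch does not distinguish FAT12 from FAT16.

```python
import struct

def parse_fat_boot_sector(sector: bytes) -> dict:
    """Decode a few BIOS Parameter Block fields from a 512-byte FAT boot sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot sector signature")
    # Offsets 11..23 hold fields common to all FAT variants (little-endian).
    (bytes_per_sector, sectors_per_cluster, reserved_sectors, fat_count,
     root_entries, _total16, _media, fat_size16) = struct.unpack_from(
        "<HBHBHHBH", sector, 11)
    if bytes_per_sector == 0:
        raise ValueError("bytes per sector is zero")
    # Offset 32: 32-bit total sector count; offset 36: FAT size (FAT32 only).
    _total32, fat_size32 = struct.unpack_from("<II", sector, 32)
    is_fat32 = fat_size16 == 0  # FAT32 stores its FAT size in the 32-bit field
    fat_size = fat_size32 if is_fat32 else fat_size16
    # The fixed-size root directory exists only on FAT12/16 (zero entries on FAT32).
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    info = {
        "is_fat32": is_fat32,
        "cluster_size": bytes_per_sector * sectors_per_cluster,
        "first_data_sector": reserved_sectors + fat_count * fat_size + root_dir_sectors,
    }
    if is_fat32:
        # Offset 50: sector number of the second boot sector copy.
        info["backup_boot_sector"] = struct.unpack_from("<H", sector, 50)[0]
    return info
```

Everything the driver needs to locate the allocation tables and the first data cluster comes out of this one sector, which is why a damaged or replaced boot sector is so disruptive.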

NTFS
Boot sectors [2]
$MFT [3]
File data

Structure of the NTFS filesystem.
[2] NTFS has two copies of a boot sector, one at the start and one at the end of the volume.
[3] Note that $MFT does not contain dedicated directory indexes. These are not used in data recovery on NTFS.
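The pointer from the bootstrap level to the main metadata is easy to see on NTFS. The following is a rough sketch, not a complete parser: it reads the cluster size and the starting cluster of $MFT from a boot sector. The offsets are those of the documented NTFS boot sector layout; the function name and dictionary keys are invented here, and the sketch assumes the plain encoding of sectors per cluster (values up to 128).

```python
import struct

def parse_ntfs_boot_sector(sector: bytes) -> dict:
    """Find where $MFT starts, using only the NTFS boot sector."""
    if len(sector) < 512 or sector[3:11] != b"NTFS    ":
        raise ValueError("not an NTFS boot sector: OEM ID mismatch")
    # Offset 0x0B: bytes per sector (u16); offset 0x0D: sectors per cluster (u8).
    bytes_per_sector, sectors_per_cluster = struct.unpack_from("<HB", sector, 0x0B)
    if sectors_per_cluster > 128:
        raise ValueError("large-cluster encoding is not handled in this sketch")
    # Offset 0x28: total sectors; 0x30: first cluster of $MFT; 0x38: of $MFTMirr.
    total_sectors, mft_lcn, mftmirr_lcn = struct.unpack_from("<QQQ", sector, 0x28)
    cluster_size = bytes_per_sector * sectors_per_cluster
    return {
        "cluster_size": cluster_size,
        "total_sectors": total_sectors,
        "mft_offset": mft_lcn * cluster_size,          # byte offset of $MFT
        "mftmirr_offset": mftmirr_lcn * cluster_size,  # byte offset of $MFTMirr
    }
```

If the two cluster numbers or the cluster size are wrong, the driver looks for $MFT in the wrong place, even though $MFT itself may be perfectly intact.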

ext3, ext4
Superblocks [4]
inodes
Directories
File data

Structure of ext3 and ext4 filesystems.
[4] ext filesystems use multiple superblocks distributed across the entire volume; however, all superblocks are in fixed positions.
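"Fixed positions" on ext means the backup locations follow from a couple of parameters rather than being stored anywhere. The sketch below assumes the common sparse_super layout, where backups live in block group 1 and in groups whose numbers are powers of 3, 5, or 7; the function names are made up for illustration. It also checks for the 0xEF53 magic number that sits 56 bytes into every superblock.

```python
import struct

EXT_MAGIC = 0xEF53  # superblock magic, stored little-endian at offset 56

def looks_like_superblock(data: bytes) -> bool:
    """Check the magic number of a candidate superblock."""
    return len(data) >= 58 and struct.unpack_from("<H", data, 56)[0] == EXT_MAGIC

def is_power_of(n: int, base: int) -> bool:
    """True if n is base**k for some k >= 0."""
    if n < 1:
        return False
    while n % base == 0:
        n //= base
    return n == 1

def backup_superblock_blocks(block_size: int, blocks_per_group: int,
                             group_count: int) -> list:
    """Block numbers of the backup superblocks, assuming sparse_super."""
    # With 1 KiB blocks, block 0 holds the boot area and group 0 starts at block 1.
    first_data_block = 1 if block_size == 1024 else 0
    blocks = []
    for group in range(1, group_count):
        if group == 1 or any(is_power_of(group, b) for b in (3, 5, 7)):
            blocks.append(first_data_block + group * blocks_per_group)
    return blocks

# Example: 4 KiB blocks, 32768 blocks per group, 100 groups.
print(backup_superblock_blocks(4096, 32768, 100))
```

The catch is that block size and blocks per group are themselves stored in the superblock, so when they are unknown, a tool has to try the plausible combinations.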

ZFS
Uberblocks
Disk labels
Volume tables
Snapshots
Directories
File data

Structure of ZFS filesystem.

Data recovery

The filesystem driver works through the pyramid from top to bottom. This has an obvious weak point: the tip of the pyramid is required, yet it is small and therefore more vulnerable to data loss.
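A toy model shows why the tip matters so much. Everything here is hypothetical (the "TOYFS" layout, the dictionary standing in for a disk, the function name); the point is only that a top-down reader fails as soon as the bootstrap is gone, even though the file table and the file data are untouched.

```python
def read_file(volume: dict, name: str) -> bytes:
    """Walk the pyramid top-down, the way a driver would."""
    boot = volume.get(0)  # tip: fixed, well-known location
    if not isinstance(boot, dict) or boot.get("magic") != "TOYFS":
        raise OSError("bootstrap unreadable: cannot locate the file table")
    file_table = volume[boot["file_table_at"]]          # middle: metadata
    return b"".join(volume[n] for n in file_table[name])  # bottom: file data

# A hypothetical volume: sector number -> content.
volume = {
    0: {"magic": "TOYFS", "file_table_at": 10},  # bootstrap
    10: {"hello.txt": [20, 21]},                 # file table
    20: b"hello, ",                              # file data
    21: b"world",
}

print(read_file(volume, "hello.txt"))

# Damage only the bootstrap; the file table and data are still there.
damaged = {k: v for k, v in volume.items() if k != 0}
try:
    read_file(damaged, "hello.txt")
except OSError as err:
    print("driver gives up:", err)
```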

Data recovery software cannot rely on any small metadata in fixed positions. This includes FAT or NTFS boot sectors, ext or XFS superblocks, ZFS disk labels and uberblocks, and RAID metadata of any kind. While a human doing recovery can decide, within reason, if boot sectors are correct or at least worth a try, the software can't reliably do this. So the software avoids small, localized metadata items as best as it can.

If the filesystem is overwritten with another blank one, all the new bootstrap metadata is in the same fixed location as the old metadata. So now there is a correct set of bootstrap metadata on disk, but it is empty or points to an empty filesystem. Worse yet, the new parameters may be subtly different from the original parameters, and there is no way to reliably detect the difference, at least automatically.

Fortunately, the small set of filesystem-wide parameters, such as cluster size, the position of the first cluster, and disk order (for multi-disk volumes), can always be reconstructed from the large set of middle-level metadata. Unfortunately, discovering all the middle-level metadata requires data recovery software to scan the entire disk or multiple disks. In some cases, file data (the bottom level of the pyramid) can be used as well, especially in RAID analysis. This also requires the entire disk or set of disks to be scanned.
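One way to picture the reconstruction is as a voting problem. Suppose the scan has produced pairs of (cluster number named in some metadata record, byte offset where the matching content was actually found). Since offset = first_cluster_offset + cluster × cluster_size, every pair of correct observations votes for the same cluster size. The sketch below is an illustration under those assumptions, not how any particular tool does it; the function name and the input format are invented.

```python
from collections import Counter
from itertools import combinations

def infer_cluster_geometry(observations):
    """Estimate (cluster_size, first_cluster_offset) from noisy observations.

    observations: iterable of (cluster_number, byte_offset) pairs.
    """
    observations = list(observations)
    size_votes = Counter()
    for (c1, o1), (c2, o2) in combinations(observations, 2):
        dc, do = c2 - c1, o2 - o1
        # A consistent pair gives a positive, integer bytes-per-cluster value.
        if dc != 0 and do % dc == 0 and do // dc > 0:
            size_votes[do // dc] += 1
    if not size_votes:
        raise ValueError("not enough consistent observations")
    cluster_size = size_votes.most_common(1)[0][0]
    # With the size fixed, each observation votes for the offset of cluster 0.
    base_votes = Counter(o - c * cluster_size for c, o in observations)
    first_cluster_offset = base_votes.most_common(1)[0][0]
    return cluster_size, first_cluster_offset

# Three consistent observations and one bogus one (hypothetical numbers).
base = 1040384
found = [(10, base + 10 * 4096), (25, base + 25 * 4096),
         (100, base + 100 * 4096), (7, 12345)]
print(infer_cluster_geometry(found))
```

The more metadata the scan finds, the more votes there are, which is why large middle-level metadata is a much more reliable source of parameters than a single boot sector.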

Scanning the entire disk is required if one is to pick up all the clues about the data to be recovered. The full scan is slow, especially as hard drives (and arrays) increase in size, but there is no way around it in cases of heavy damage.
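In its simplest form, a full scan reads the disk sequentially and checks every sector for a known record signature. The sketch below looks for the "FILE" and "INDX" signatures that NTFS metadata records start with; the function name, the chunk size, and the 512-byte step are assumptions made for this illustration, and a real scanner validates each hit far more thoroughly.

```python
import io

SIGNATURES = {
    "ntfs_file_record": b"FILE",   # start of an $MFT record
    "ntfs_index_record": b"INDX",  # start of a directory index record
}

def scan_for_signatures(stream, signatures=SIGNATURES,
                        sector_size=512, chunk_sectors=2048):
    """Yield (byte_offset, name) for each sector starting with a known signature."""
    offset = 0
    while True:
        chunk = stream.read(sector_size * chunk_sectors)
        if not chunk:
            break
        # Test every sector boundary inside the chunk.
        for pos in range(0, len(chunk), sector_size):
            for name, magic in signatures.items():
                if chunk.startswith(magic, pos):
                    yield offset + pos, name
        offset += len(chunk)

# A tiny in-memory "disk" of 8 sectors with two planted records.
image = bytearray(512 * 8)
image[512 * 3:512 * 3 + 4] = b"FILE"
image[512 * 5:512 * 5 + 4] = b"INDX"
for hit in scan_for_signatures(io.BytesIO(bytes(image)), chunk_sectors=2):
    print(hit)
```

Every sector has to be read at least once, so the running time grows linearly with the size of the disk or array.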

Created Thursday, February 7, 2019