BTRFS filesystem recovery

BTRFS is more than a traditional filesystem. A regular storage stack consists of a separate RAID layer (hardware or software), a volume manager, and a filesystem driver. BTRFS combines all three into a single package; however, you can still use it as just the filesystem layer of a traditional stack.

Single or multi-disk BTRFS

There are two distinct use cases for BTRFS:

  1. You have a single disk or a pre-existing RAID (hardware or software) and deploy BTRFS over that disk, a partition on it, or the RAID volume.
  2. You use BTRFS in full: deploy it over multiple disks or partitions and let it handle the RAID processing.

For these two use cases, there are three distinct recovery scenarios:

  1. Single disk install
    1. If BTRFS is on a single disk or partition, point Klennet Recovery to that disk and scan. If the partition table is damaged, point Klennet Recovery to the entire disk, and it will sort out what's where.
    2. If BTRFS is on a RAID, then that RAID must be functioning properly, or you must recover the RAID first. After that, scan the RAID.
  2. Multi-disk install. If BTRFS is on multiple disks, select all the disks and scan them.
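
The three scenarios above amount to a small decision procedure. The Python sketch below encodes it for illustration only; the Layout class, its fields, and scan_targets are hypothetical names, not part of Klennet Recovery, which you operate through its user interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layout:
    """Hypothetical description of how a BTRFS volume was deployed."""
    disks: List[str]            # member devices, e.g. ["/dev/sdb", "/dev/sdc"]
    on_raid: bool = False       # BTRFS sits on a separate hardware/software RAID
    raid_healthy: bool = True   # that RAID still assembles correctly
    raid_device: str = ""       # device exposed by the RAID, if any


def scan_targets(layout: Layout) -> List[str]:
    """Return the devices to scan, following the recovery scenarios above."""
    if layout.on_raid:
        # BTRFS over a separate RAID: the RAID must work (or be recovered) first.
        if not layout.raid_healthy:
            raise RuntimeError("Recover the underlying RAID first, then scan it.")
        return [layout.raid_device]
    # Single disk: scan the whole disk, so a damaged partition table is not a problem.
    # Multi-disk BTRFS: BTRFS handles the RAID itself, so select all member disks.
    return list(layout.disks)
```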

If you are recovering a large BTRFS volume, disable all other filesystems in Settings. The memory requirements are high enough without also searching for several filesystem types at once. More importantly, if the BTRFS volume stores virtual machine disk images, Klennet Recovery will also try to recover the filesystems inside those images, which wastes RAM.

Checksums and file validation

BTRFS uses checksums to verify the integrity of file data and metadata. Klennet Recovery can use the same checksums to give you a list of what file data is correct and what is not.

This feature, however, requires temporary SSD space, about 2% of the size of the disk or disks you are recovering. For this reason, the checksum validation is disabled by default, and you have to enable it in Settings explicitly.

The checksum-based verification is not perfect. As the checksum information is stored separately from other file metadata, the checksums for a particular file may not be available due to filesystem damage. If this happens, Klennet Recovery reverts to the default validation of known file types.
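
A minimal Python sketch of checksum-based validation with a fallback, to show the idea. It assumes CRC-32C over 4 KiB blocks with the last block zero-padded (CRC-32C is the BTRFS default checksum algorithm, and 4 KiB is a common sector size) and ignores compression, inline data, and how checksums are actually stored on disk. The validate_file function and its fallback argument are hypothetical stand-ins, not Klennet Recovery's implementation.

```python
from typing import Callable, List, Optional

BLOCK_SIZE = 4096  # assumed size of the data block covered by one checksum


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli polynomial, reflected form 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def validate_file(data: bytes, checksums: Optional[List[int]],
                  fallback: Callable[[bytes], bool]) -> str:
    """Classify recovered file data using per-block checksums if they survived."""
    if checksums is None:
        # Checksums lost to filesystem damage: revert to file-type validation.
        return "good (by file type)" if fallback(data) else "bad (by file type)"
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    if len(blocks) != len(checksums):
        return "bad"
    for block, expected in zip(blocks, checksums):
        # Pad the final partial block with zeros, as it would be on disk.
        if crc32c(block.ljust(BLOCK_SIZE, b"\0")) != expected:
            return "bad"
    return "good"
```

As a crude, hypothetical fallback, lambda d: d.startswith(b"\xff\xd8\xff") checks only for a JPEG signature; real file-type validation inspects much more of the file.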

Chunks

In a typical filesystem, metadata addresses the disk directly: the metadata entry for a file points to an address on disk. BTRFS uses a more complicated scheme with one more level of indirection. Simplifying, file metadata in BTRFS refers to addresses that are resolved through the "chunk tree", and the chunk tree, in turn, maps them to disk addresses.

[Figure: Typical filesystem (FAT, NTFS, EXT) metadata]

[Figure: BTRFS metadata with chunk layer]

If the chunk tree is damaged, address translation fails, and the data belonging to a file cannot be located. Depending on the extent of the damage, the problem may affect any number of files.
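
To illustrate the extra level of indirection and what a damaged chunk tree means, here is a deliberately simplified Python sketch. The ChunkMap class and its tuple layout are hypothetical; real BTRFS chunk items also describe RAID profiles and several stripes per chunk, which this sketch ignores. An address that no surviving chunk covers cannot be translated, which is the failure described above.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical chunk entry: (logical_start, length, device, physical_start).
Chunk = Tuple[int, int, str, int]


class ChunkMap:
    """Toy logical-to-physical address map standing in for the chunk tree."""

    def __init__(self, chunks: List[Chunk]):
        self.chunks = sorted(chunks)               # sorted by logical start
        self.starts = [c[0] for c in self.chunks]

    def translate(self, logical: int) -> Optional[Tuple[str, int]]:
        """Map a logical address from file metadata to (device, physical offset)."""
        i = bisect_right(self.starts, logical) - 1  # last chunk starting at or before it
        if i < 0:
            return None
        start, length, device, physical = self.chunks[i]
        if logical >= start + length:
            return None  # no chunk covers this address, e.g. the chunk item was lost
        return device, physical + (logical - start)
```

If the chunk covering a file's extents is missing from the map, translate returns None for them: the file's metadata survives, but its data cannot be located.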

Once Klennet Recovery rebuilds the filesystem and completes file validation, files whose chunk tree elements are missing are highlighted in magenta. If there are many such files, Klennet Recovery also shows a warning message at the bottom of the filesystem view.

Klennet Recovery has an experimental feature that tries to reconstruct the chunk tree. The reconstruction

  • is not guaranteed to be effective for all files;
  • is very expensive in terms of CPU use;
  • requires temporary SSD space, about 5% of the size of the disk or disks you are recovering;
  • takes one to five hours per 1 TB of data on reasonably modern hardware.
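
The planning figures in the list above reduce to simple arithmetic. This Python helper is a hypothetical convenience, not part of Klennet Recovery; it only restates the approximate 5% and one-to-five-hours-per-TB figures, so treat its output as a rough range.

```python
from typing import Tuple

TB = 10**12  # decimal terabyte, the unit drive capacities are usually quoted in


def chunk_reconstruction_estimate(disk_bytes: int,
                                  data_bytes: int) -> Tuple[int, Tuple[float, float]]:
    """Return (temporary SSD bytes, (min hours, max hours)) for chunk reconstruction."""
    ssd_bytes = disk_bytes * 5 // 100          # about 5% of the disks being recovered
    tb_of_data = data_bytes / TB
    hours = (1 * tb_of_data, 5 * tb_of_data)   # one to five hours per TB of data
    return ssd_bytes, hours
```

For a 4 TB set of disks holding 3 TB of data, it gives about 200 GB of temporary SSD space and 3 to 15 hours.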

Given all of the above, chunk reconstruction is off by default. If you are recovering BTRFS, I recommend you run the recovery with the default settings first. If the result is unsatisfactory because of missing chunks, enable chunk reconstruction and do another run.