Klennet Storage Software

Using Klennet ZFS Recovery

Klennet ZFS Recovery does not have any configurable options. The recovery process is as follows:

  1. Select source drives, click Start,
  2. wait for the scan to complete,
  3. select what you need from the output,
  4. click Start copying and specify where to.

Selecting source drives

This is straightforward. ZFS pools normally consist of multiple physical drives, so the recovery needs to analyze many drives simultaneously. Select as many of the pool's drives as possible, but do not select drives from multiple pools. All drives must be members of one ZFS pool. If you need to recover multiple pools, recover each pool in turn.

Progress stages and indications

There are four steps in the recovery:

  1. Disk scan. During this step, the entire disk set is scanned. All disks are scanned in parallel, so increasing the number of disks does not result in a proportional increase in scan time.
  2. Disk order analysis. This is when ZFS Recovery figures out RAID levels, which disk is at which position in the pool, and so on.
  3. Object set analysis. In ZFS, an object set is a table describing all files in a filesystem. Because ZFS is a copy-on-write filesystem, many previous versions of each object set are scattered all over the disks. There may be millions of object sets, most of them partially overwritten and damaged. However, all of them must be processed, because there is no way to know in advance which one contains the file or files you need.
  4. Checksum verification. Once all files are located, ZFS Recovery verifies the data to see whether the checksums match the data on the disks. This is roughly equivalent to resilvering the pool, and takes a long time.

Checksum verification can actually be cancelled. This is useful if you only need to recover some small number of files. In this case, there is no point in waiting while checksums are verified for the entire pool. Once the file list is shown, you can select these specific files and verify checksums for these files only.
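
To illustrate what verifying a checksum involves, here is a minimal Python sketch of a Fletcher-4 style check, which is the default data checksum in ZFS (pools can also be configured to use others, such as SHA-256). This is not how Klennet ZFS Recovery is implemented internally; the function names `fletcher4` and `block_is_valid` are hypothetical, and the sketch assumes the block is read as little-endian 32-bit words.

```python
import struct
from typing import Tuple

MASK64 = (1 << 64) - 1  # the four accumulators wrap around at 64 bits


def fletcher4(data: bytes) -> Tuple[int, int, int, int]:
    """Fletcher-4 style checksum over little-endian 32-bit words."""
    if len(data) % 4:
        raise ValueError("block length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def block_is_valid(data: bytes, stored: Tuple[int, int, int, int]) -> bool:
    """True if the block read from disk matches the checksum stored for it."""
    return fletcher4(data) == tuple(stored)


block = struct.pack("<II", 1, 2)
print(fletcher4(block))                     # (3, 4, 5, 6)
print(block_is_valid(block, (3, 4, 5, 6)))  # True
```

Every block of every file has to be read back from disk to be checked this way, which is why verifying an entire pool takes about as long as a resilver, and why verifying only the files you need is much faster.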

Understanding output view

This part is a bit more complicated because of how ZFS works. A ZFS pool contains multiple object sets, each of which describes all files in a particular filesystem. There are often thousands and sometimes tens of thousands of old versions of each object set in any given pool. Many if not most of these versions are partially damaged. Klennet ZFS Recovery reads all these versions and sorts them into groups, looking like this:

Klennet ZFS Recovery results view

  • Object sets (which are effectively filesystem snapshots made at various times) are listed in the top section. You can sort object sets in different ways, and I recommend you use one of two sort orders: either "Timestamp, Valid file count" or "Valid file count, Timestamp".
  • Once you select one of the object sets, its corresponding directory tree is displayed at the bottom left.
  • If you select a directory, the bottom right panel shows the list of files, with the size and checksum status of each file.
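
The two recommended sort orders are two-level sorts that differ only in which column takes priority. A small Python sketch of the idea follows; the `ObjectSet` class, its field names, and the descending direction (newest or most complete first) are assumptions made for illustration, not the program's actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ObjectSet:
    name: str          # hypothetical label for the object set
    timestamp: int     # when this version was written
    valid_files: int   # number of files that were read successfully


def by_timestamp_then_valid(sets: List[ObjectSet]) -> List[ObjectSet]:
    """Newest first; ties broken by the higher valid file count."""
    return sorted(sets, key=lambda s: (s.timestamp, s.valid_files), reverse=True)


def by_valid_then_timestamp(sets: List[ObjectSet]) -> List[ObjectSet]:
    """Most complete first; ties broken by the newer timestamp."""
    return sorted(sets, key=lambda s: (s.valid_files, s.timestamp), reverse=True)


sets = [
    ObjectSet("A", timestamp=100, valid_files=50),
    ObjectSet("B", timestamp=200, valid_files=10),
    ObjectSet("C", timestamp=200, valid_files=40),
]
print([s.name for s in by_timestamp_then_valid(sets)])  # ['C', 'B', 'A']
print([s.name for s in by_valid_then_timestamp(sets)])  # ['A', 'C', 'B']
```

Sorting by timestamp first suits the case where you want the most recent state of the filesystem; sorting by valid file count first suits the case where the newest versions are too damaged to be useful.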

Duplicate files and file versions

Because of copy-on-write, a ZFS pool contains multiple snapshots of the filesystem even if you have not explicitly asked for snapshots. Files on the volume rarely change. This means that if you take two snapshots of the volume ten minutes apart, most files will be identical in both snapshots. To keep the exploding number of files under control, ZFS Recovery only puts a file into the latest snapshot the file is in. So, of all the snapshots for a given filesystem, the latest one will have all the files, and previous ones will only contain previous versions of modified files.

In effect, the latest snapshot (object set) becomes the baseline and holds all the files. Earlier snapshots of the same filesystem only hold files which are different from the baseline.
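
The grouping rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the program's actual code: the function name `assign_to_latest` is hypothetical, and a file version is assumed to be identified by its path plus some content identifier.

```python
from typing import Dict

# timestamp -> {path: content identifier}
Snapshots = Dict[int, Dict[str, str]]


def assign_to_latest(snapshots: Snapshots) -> Snapshots:
    """Keep each (path, content) pair only in the latest snapshot holding it."""
    seen = set()
    result: Snapshots = {}
    for ts in sorted(snapshots, reverse=True):  # walk from newest to oldest
        kept = {}
        for path, content in snapshots[ts].items():
            if (path, content) not in seen:
                seen.add((path, content))
                kept[path] = content
        result[ts] = kept
    return result


snapshots = {
    100: {"a.txt": "v1", "b.txt": "v1"},  # older snapshot
    200: {"a.txt": "v2", "b.txt": "v1"},  # latest snapshot, a.txt was modified
}
print(assign_to_latest(snapshots))
# {200: {'a.txt': 'v2', 'b.txt': 'v1'}, 100: {'a.txt': 'v1'}}
```

In this example the unchanged `b.txt` appears only in the latest snapshot, while the older snapshot keeps just the previous version of `a.txt`.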

Possible actions

  1. Verify checksums. This will verify checksums for whatever files are currently selected. If the checksums are already verified for some of the selected files, these checksums will not be verified again. This is only needed if you skipped checksum verification during analysis.
  2. Export file names. This will export the list of file names, sizes, and checksum status into a CSV file.
  3. Copy selected files. The most useful of them all, this asks you where to put the selected files and then starts copying.
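
The exported CSV can be processed with any scripting language, for example to estimate how much data passed checksum verification before you start copying. The Python sketch below is a rough example under stated assumptions: the column headers ("Size", "Checksum status") and the value "OK" are placeholders, not the actual export format, so check the header row of your own file and adjust the parameters accordingly.

```python
import csv
import io
from typing import Tuple


def summarize(csv_text: str,
              size_col: str = "Size",               # placeholder column name
              status_col: str = "Checksum status",  # placeholder column name
              good_value: str = "OK") -> Tuple[int, int, int]:
    """Return (total files, files with good checksum, bytes in good files)."""
    total = good = good_bytes = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        total += 1
        if row[status_col] == good_value:
            good += 1
            good_bytes += int(row[size_col])
    return total, good, good_bytes


# Made-up sample data in the assumed layout, for illustration only.
sample = (
    "Name,Size,Checksum status\n"
    "a.txt,10,OK\n"
    "b.txt,20,Bad\n"
    "c.txt,5,OK\n"
)
print(summarize(sample))  # (3, 2, 15)
```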
