Undeleting ZFS datasets

So, I came across this Reddit post today (emphasis mine):

    I booted to Windows and tried Klennet ZFS Recovery, which also took about a week to scan the whole drive, and the results were not terribly useful. I could see the changes, including deleting the datasets, and even documents and pictures I had added to tank/personal/old-secret before attempting any of this, but it was very piecemeal since it was a record of the transactions, not the data.

Doing the undelete is a little more complicated than recovering after a full-stop crash. This is because files are deleted in groups ranging from a few files to several hundred files per transaction, the recovery process is performed per transaction, and the recovery results are grouped by transactions.

The recovery results are grouped by transactions because that is the only reasonable option for a damaged filesystem. The transaction ID is always there, while any other file or dataset attribute may be missing because of the damage, including file names, child-to-parent relationships, and timestamps.
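
To make that concrete, here is a minimal sketch of what a recovered record might look like, with the transaction ID as the one mandatory field. The type and field names are hypothetical illustrations, not Klennet's actual data model:

    from collections import defaultdict
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class RecoveredFile:
        txg: int                       # transaction ID: always recoverable
        data: bytes                    # file contents reassembled from disk
        checksum_ok: bool              # True if block checksums verified
        name: Optional[str] = None     # file name, may be lost to damage
        parent: Optional[str] = None   # child-to-parent link, may be lost
        mtime: Optional[float] = None  # timestamp, may be lost

    def group_by_transaction(files):
        """Group recovered files by the one attribute guaranteed to exist."""
        groups = defaultdict(list)
        for f in files:
            groups[f.txg].append(f)
        return groups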

The following checklist covers this use case:

  1. Let recovery run its course, including complete checksum verification.
  2. Have enough space to hold a copy of the entire pool. There will be no separation of datasets: the choice is between picking files out piecemeal, transaction by transaction, or taking the entire lot, and this checklist takes the entire lot.
  3. Use file selection in the object set view to select all files with good checksums.
  4. Now, with all good files across all transactions and all datasets selected, start copying. You will be prompted for the copier options. Configure the copier to
    • put all object sets into a single directory, and
    • skip existing files.

This will produce a copy of the latest version of every recoverable file. This works because the copy is performed according to the sort order, and the default sort order is most recent transactions first. So, if a transaction contains a recoverable file, this file is copied out. If the next (older) transaction contains the same file, the copier skips it because the most recent version was already copied.

The unfortunate side effect is that all datasets are blended into a single directory tree. This is because it is often not possible to determine the name of the object set (be it a dataset or a snapshot) due to filesystem damage. With tens of thousands of object sets, squashing everything into a single directory tree seems like the most practically viable option.
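
The newest-version-wins behavior falls out of combining the sort order with the skip-existing option. Below is a minimal sketch of that logic, reusing the hypothetical RecoveredFile records from the earlier sketch; it illustrates the principle, not Klennet's actual copier:

    import os

    def copy_newest_versions(files, out_dir):
        """Copy recoverable files, visiting the most recent transactions first.

        Because existing paths are skipped, the newest version of each file
        wins, and the same file in older transactions is ignored.
        """
        os.makedirs(out_dir, exist_ok=True)
        for f in sorted(files, key=lambda r: r.txg, reverse=True):
            if not f.checksum_ok or f.name is None:
                # Only files with good checksums; a real tool would also
                # synthesize names for files whose names were lost.
                continue
            path = os.path.join(out_dir, f.name)  # all object sets, one tree
            if os.path.exists(path):
                continue  # skip existing: this is an older version
            with open(path, "wb") as out:
                out.write(f.data)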

Filed under: ZFS.

Created Friday, November 13, 2020