Recovery of destroyed or corrupt ZFS pools

Created Sunday, December 30, 2018

Here, I'm talking about completely damaged ZFS pools, where the pool cannot be mounted at all. I will cover undeleting individual files from functioning pools later.

ZFS disk labels

ZFS uses disk labels to record which disk belongs to which pool and what the parameters of that pool are. Importantly, the labels hold the disk order and RAID levels for the vdevs composing the pool. Four labels are stored on each disk, two at the start and two at the end. The rationale is that any kind of massive overwrite is likely to run from start to end, and if the overwrite is aborted in time, the labels at the end of the disk survive. The labels also hold pointers to the filesystem metadata and are therefore updated every time the filesystem is modified. If all the labels on a given disk are damaged, ZFS can no longer identify the disk as a member of the pool, and the disk drops out. If the number of missing disks exceeds the pool's redundancy, the pool no longer mounts; once the labels are gone, it does not matter that the rest of the data (or most of it, anyway) is intact.
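
To make the label layout concrete, here is a minimal Python sketch that computes the four label positions on a disk of a given size and checks whether the uberblock area of each label still contains anything that looks like an uberblock. The 256 KiB label size, the placement of the uberblock array inside the label, and the magic value are my assumptions about the on-disk format, and the script is only a rough illustration, not a diagnostic tool.

    # Sketch: locate the four ZFS vdev labels on a device or image and check
    # whether their uberblock areas contain valid-looking uberblocks.
    # Assumptions (not taken from the article): each label is 256 KiB,
    # L0/L1 sit at the start of the device and L2/L3 at the end, the
    # uberblock array occupies the second half of each label, and an
    # uberblock starts with the magic value 0x00bab10c.
    import struct
    import sys

    LABEL_SIZE = 256 * 1024             # assumed size of one vdev label
    UBERBLOCK_AREA_OFFSET = 128 * 1024  # assumed offset of the uberblock array inside a label
    UBERBLOCK_MAGIC = 0x00bab10c        # assumed uberblock magic value

    def label_offsets(device_size):
        """Return the byte offsets of labels L0..L3 for a device of the given size."""
        return [
            0,                             # L0
            LABEL_SIZE,                    # L1
            device_size - 2 * LABEL_SIZE,  # L2
            device_size - LABEL_SIZE,      # L3
        ]

    def has_uberblocks(f, label_offset):
        """Heuristic: does the uberblock area of this label contain the magic value?"""
        f.seek(label_offset + UBERBLOCK_AREA_OFFSET)
        area = f.read(LABEL_SIZE - UBERBLOCK_AREA_OFFSET)
        # Check every 1 KiB slot for the magic in either byte order.
        for pos in range(0, len(area) - 8, 1024):
            (value_le,) = struct.unpack_from("<Q", area, pos)
            (value_be,) = struct.unpack_from(">Q", area, pos)
            if value_le == UBERBLOCK_MAGIC or value_be == UBERBLOCK_MAGIC:
                return True
        return False

    if __name__ == "__main__":
        path = sys.argv[1]  # e.g. a raw device or an image file
        with open(path, "rb") as f:
            f.seek(0, 2)
            size = f.tell()
            for i, off in enumerate(label_offsets(size)):
                found = has_uberblocks(f, off)
                print(f"label L{i} at offset {off}: "
                      f"{'uberblock magic found' if found else 'no uberblock magic'}")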

Common failure modes

There are several ways to damage multiple disk labels at once. None of them is really exclusive to ZFS; all filesystems are more or less vulnerable to the same set of problems.

  • The most drastic and the most common, it seems, is to destroy the pool and create another pool over the same disk set. This happens when people confuse disks, pool names, and whatnot. Since labels are always in the same place on disks, the new labels overwrite the old ones precisely. The old filesystem metadata is then impossible to reach by normal means.
  • Sometimes, faulty RAM causes wrong data to be written to the filesystem. Since labels are updated often, errors tend to propagate fairly quickly. If enough bad data is written before the system crashes, the pool may be damaged beyond repair.
  • Additionally, pools may come out damaged after a power failure. In theory, ZFS is supposed to protect against this type of failure, but there is an important caveat - ZFS relies on the hard drive reliably flushing its cache, and that does not always happen. Some drives and some USB configurations lie about the cache having been flushed in order to improve performance, and that performance comes at the cost of reliability.
  • Last but not least, various random glitches, either in hardware or in software, take their fair share of failure cases.

Symptoms

Apart from just not being able to locate the pool to import, there are several error messages associated with catastrophic disk label or other metadata damage.

The pool metadata is corrupted. The pool cannot be imported due to damaged devices or data.
cannot import 'tank': one or more devices is currently unavailable
One or more devices are missing from the system. The pool cannot be imported. Attach the missing devices and try again.
Additional devices are known to be part of this pool, though their exact configuration cannot be determined.
One or more devices could not be used because the label is missing or invalid.

Recovery

Klennet ZFS Recovery reads the filesystem without requiring the disk labels to be readable or correct. Worse still, if a new pool was created over the deleted one, the labels are readable and perfectly correct, but they point to the metadata of the new, empty pool. The solution is to find all the remnants of the original pool's metadata by scanning all the available drives, then assemble whatever metadata can be salvaged and read it. Because ZFS is copy-on-write, multiple copies of old metadata are scattered all around the disks, so recovery works pretty well, even from somewhat overwritten pools.
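
To make the idea of scanning for metadata remnants concrete, here is a minimal Python sketch that brute-force scans a disk image for uberblock candidates by looking for the magic value and printing the transaction group and timestamp fields next to it. This only illustrates the scanning idea and is not how Klennet ZFS Recovery actually works; the magic value, the field layout, and the 1 KiB alignment are my assumptions about the on-disk format.

    # Sketch: brute-force scan of a disk image for ZFS uberblock candidates,
    # illustrating the "find old metadata copies by scanning" idea.
    # Assumed uberblock layout (my assumption, little-endian pool):
    # magic, version, txg, guid_sum, timestamp as consecutive 64-bit fields.
    import struct
    import sys

    UBERBLOCK_MAGIC = 0x00bab10c
    CHUNK = 16 * 1024 * 1024   # read 16 MiB at a time
    STEP = 1024                # assume uberblocks are aligned to at least 1 KiB

    def scan(path):
        """Yield (offset, txg, timestamp) for every uberblock candidate found."""
        with open(path, "rb") as f:
            base = 0
            while True:
                chunk = f.read(CHUNK)
                if not chunk:
                    break
                for pos in range(0, len(chunk) - 40, STEP):
                    magic, version, txg, guid_sum, timestamp = struct.unpack_from("<5Q", chunk, pos)
                    if magic == UBERBLOCK_MAGIC:
                        yield base + pos, txg, timestamp
                base += len(chunk)

    if __name__ == "__main__":
        # Usage: python scan_uberblocks.py /path/to/disk.img
        for offset, txg, timestamp in scan(sys.argv[1]):
            print(f"candidate uberblock at {offset}: txg={txg} timestamp={timestamp}")

A real recovery would then cross-reference the candidates, prefer the highest surviving transaction groups, and follow their block pointers to rebuild the directory tree; the sketch stops at finding the candidates.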
