Reviews, comparison testing, and challenges in data recovery

There is a slight problem with data recovery software reviews, and especially with testing. Reviews are fine when they talk features, user interface, and pricing, but not quite so fine when it comes to recovery capabilities. The slight problem with testing recovery capabilities is that nobody knows exactly how to do it. The seemingly straightforward idea is to put multiple damaged drives through recovery and see how different software fares against the set of tasks. However, nobody seems to know what makes a good test set. There are several approaches to testing, covering the full spectrum from completely artificial test sets to real-life cases.

Artificially created test sets

Artificial test cases are crafted by either generating the disk image from scratch or inserting some well-defined data into the disk image of a blank filesystem.
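As a purely illustrative sketch of the "generate from scratch" variant: plant a known payload at a known, cluster-aligned offset inside otherwise random data, then check later whether the recovery output matches it byte for byte. The file names, sizes, and cluster size below are assumptions made up for the example, not a description of any actual test set.

    # Build a tiny artificial carving test image: a known reference file is
    # planted at a known cluster-aligned offset inside random filler data.
    # All names and sizes here are hypothetical.
    import os

    CLUSTER = 4096                                    # assumed cluster size
    IMAGE_SIZE = 64 * 1024 * 1024                     # 64 MiB test image

    payload = open("known_photo.jpg", "rb").read()    # hypothetical reference file
    offset = 1000 * CLUSTER                           # plant it at cluster 1000

    image = bytearray(os.urandom(IMAGE_SIZE))         # random "unallocated" filler
    image[offset:offset + len(payload)] = payload

    with open("artificial_test.img", "wb") as f:
        f.write(image)

    # Whatever the carver recovers can be compared byte-for-byte against
    # known_photo.jpg to score the result.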

The DFRWS challenges from 2006 and 2007 are pretty good examples of artificially generated test sets used to investigate one specific aspect of data recovery. Except they got the cluster sizes wrong: even back in 2006, nobody used 512 bytes per cluster. Is this important? Yes: carving speed depends on the cluster size, and so, to some extent, does output quality. The two DFRWS challenges, by the way, are yet to be completely solved (as of summer 2018).
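To see why the cluster size matters, consider a bare-bones signature carver. Files can only start at cluster boundaries, so a carver only needs to test offsets that are multiples of the cluster size; assume 512-byte clusters and there are eight times as many candidate offsets to examine and validate as with 4096-byte clusters, which costs speed and, through extra false starts, output quality. The following is a simplistic illustration of that scan (reusing the hypothetical image from the previous sketch), not a real carver.

    # Count candidate JPEG start offsets at two different scan step sizes.
    JPEG_SOI = b"\xff\xd8\xff"     # simplified JPEG start-of-image signature

    def find_candidates(image_path, step):
        with open(image_path, "rb") as f:
            data = f.read()
        return [offset
                for offset in range(0, len(data) - len(JPEG_SOI), step)
                if data[offset:offset + len(JPEG_SOI)] == JPEG_SOI]

    # Stepping every 512-byte sector produces far more candidates to validate
    # than stepping every 4096-byte cluster.
    print(len(find_candidates("artificial_test.img", 512)),
          len(find_candidates("artificial_test.img", 4096)))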

The big problem with crafting test cases is that nobody knows how close they are to real-world cases. Moreover, it is almost never clear which properties need to be reproduced, and to what precision. The only sure way to list all the factors affecting recovery is to study the algorithms involved, but these are generally not available for study.

On the other side of the coin, data recovery software can theoretically be tuned to whatever benchmarks are available. Such tuning will almost invariably cause performance to degrade on real-life recoveries. I’m not aware of this ever being done in practice, though, most likely because there are no widely accepted and established benchmarks in data recovery.

Using artificial test sets is the most scientific approach, in which all known factors are controlled, and as such it is often used in academic research.

Real filesystems created for test purposes

Another approach is to put some test data onto a real, blank filesystem (on a USB flash drive, for example), do some known damage to it (delete some files or format the drive), and then run recovery on it to see how much can be recovered. Whichever software recovers the most is then considered the best.
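A minimal sketch of that procedure, assuming Linux, root privileges, and a loopback image standing in for the USB flash drive (the image size, mount point, and file names are placeholders), looks roughly like this:

    # Build a FAT test volume in a file, copy known files in, delete one,
    # and keep the image as a reusable recovery test case.
    import os, shutil, subprocess

    IMG, MNT = "usbstick.img", "/mnt/testfs"

    subprocess.run(["truncate", "-s", "256M", IMG], check=True)
    subprocess.run(["mkfs.vfat", IMG], check=True)

    os.makedirs(MNT, exist_ok=True)
    subprocess.run(["mount", "-o", "loop", IMG, MNT], check=True)

    shutil.copy("known_photo.jpg", MNT)               # hypothetical reference files
    shutil.copy("known_video.mp4", MNT)

    os.remove(os.path.join(MNT, "known_photo.jpg"))   # the "known damage"
    subprocess.run(["umount", MNT], check=True)

    # usbstick.img now holds the test case; run recovery software against it
    # and compare the output with the original reference files.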

This is a sort of middle-ground approach, and arguably the best one overall, but again, some factors cannot be easily replicated:

  1. Allocation patterns on a newly formatted, blank filesystem differ from those of an old, used filesystem. A blank filesystem has little, if any, fragmentation, and what fragmentation there is follows a very predictable pattern. A used filesystem has bits of partly overwritten files scattered around, and its fragmentation patterns are nothing like those of a blank one. This phenomenon is called filesystem aging, and it is difficult to simulate with any accuracy (a crude aging pass is sketched after this list).
  2. Different filesystem implementations have different aging behavior and different allocation strategies. The standard Windows FAT filesystem driver behaves differently from the drivers used in photo and dash cameras. There is a difference even in the simplest use cases: copying a video file to a memory card may not be the same as a dash camera recording that same file onto the card. Sure, the files are identical, but their locations on the media may differ enough to matter.
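To illustrate what item 1 means in practice, here is a crude aging pass; this is only a sketch under the assumptions of the previous example (the hypothetical /mnt/testfs volume), not a validated aging model. Filler files of random sizes are created and partially deleted before the real test files go in, so that later allocations land in the resulting free-space holes and come out fragmented.

    # Crude "aging": churn filler files so the free space ends up full of holes.
    import os, random

    MNT = "/mnt/testfs"        # the mounted test volume from the sketch above

    for round_no in range(10):
        names = []
        for i in range(50):
            name = os.path.join(MNT, f"filler_{round_no}_{i}.bin")
            with open(name, "wb") as f:
                f.write(os.urandom(random.randint(4096, 256 * 1024)))
            names.append(name)
        for name in random.sample(names, 25):   # free roughly half, leaving holes
            os.remove(name)

    # Test files copied onto the volume after this pass will be far more
    # fragmented than on a freshly formatted filesystem.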

This approach is what reviewers use. In simple recoveries, like unformatting something, they actually do create realistic scenarios, but performance in these simple scenarios is pretty much the same across first-line data recovery software. As scenarios grow more complex, like RAID recoveries, the quality of test cases and reviews inevitably declines. I don’t think anybody cares, though.

Benchmarking on real-life cases

The last and most realistic approach is to take real-life damaged media, make a disk image of it, and run the image through whatever recovery software is being tested.
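The imaging step usually looks something like the following, sketched here with GNU ddrescue under the assumption that it is installed and the failing drive shows up as /dev/sdb (all device and file names are placeholders): read the healthy areas first, keep a map file so the run can be interrupted and resumed, and then point the recovery software at the image rather than at the dying drive.

    # Image a failing drive in two passes with GNU ddrescue.
    import subprocess

    SOURCE, IMAGE, MAPFILE = "/dev/sdb", "case.img", "case.map"

    # First pass: copy the easy areas, skip the scraping of bad spots (-n).
    subprocess.run(["ddrescue", "-n", SOURCE, IMAGE, MAPFILE], check=True)

    # Second pass: go back and retry the bad areas up to three times (-r3).
    subprocess.run(["ddrescue", "-r3", SOURCE, IMAGE, MAPFILE], check=True)

    # case.img is then used for all recovery software testing.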

This option is only available to data recovery technicians. Real-life cases vary immensely: some need mechanical repair and no software intervention at all, some are outright unrecoverable, and only a limited subset is suitable for testing software on. Real-life cases are hard to come by for a layman. Also, technicians, with their specialized knowledge, are in the best position to tweak various software parameters and settings for the best result in each case, so their reviews (which are few and far between, although some can be found on specialized forums) are usually the most comprehensive.

Benchmarking on real-life cases is what data recovery technicians use. They have an abundance of cases to test on, and in the process of testing they determine not only which software works best, but also whether it fits their established processes and procedures. However, as far as I’m aware, technicians invariably end up using all sorts of different software, depending on their understanding of each specific case.

Created Tuesday, June 5, 2018
