Carving for fragmented files

Carving, or file carving, traditionally refers to the method of finding files by scanning the media for file headers (and maybe footers) when no filesystem metadata is available. Once the header is found, the data between the header and the footer is copied to the output file. Traditional carving only works when the target file is not fragmented, i.e, stored contiguously on the media.

There are certain ideas to extend this method for fragmented files, i.e., files that are stored on a disk in several disjoint fragments. Look up references at the bottom of this page to get some idea about how bad the problem is. The problem is bad indeed.

Generally, to carve for a given file type, one needs

  • a method to identify file header as a starting point,
  • a method to determine either precisely, or to a certain probability, if the given block is the next in the file (block chaining), and
  • a validation function, which can reliably determine if a given sequence of bytes forms a valid file.

Problems with this approach are, in no particular order,

  • The process is computationally expensive and scales poorly as the drive size increases. In the worst case, the recovery is an O(n3) process; that is, the time required to complete the recovery is proportional to the third power of the input data size.
  • The methods used are mostly probabilistic; the chaining function produces a result like "there is a 95% probability that the block X belongs to the file Y". As the number of possible combinations increases, the number of false-positive recognitions increases.
  • While human intervention may be useful sometimes, especially with photo files, humans can't help with files that are not human-readable, and many files of forensic/recovery value are not.
  • Files within files (like ZIPs within a ZIP or thumbnails within a JPEG), partially overwritten files, and partially duplicate files (with copy-on-write filesystems like ReFS) all add complexity to the task.
  • Reliable validation functions are difficult or impossible to construct for some file formats.

But, even given the above, the problem is still tractable. The task parallelizes well, so if one has an infinite number of CPUs available, the carving can be done in a reasonable time. Klennet Carver does it for certain file types on good hardware. If you need a file type that is not supported, drop me a note and I will look into it for you.

References

  1. Nasir Memon and Anandabrata Pal, Automated Reassembly of File Fragmented Images Using Greedy Algorithms.
  2. Simson Garfinkel, Carving Contiguous and Fragmented Files with Object Validation.

Filed under: File carving.

Created Friday, July 7, 2017

Updated 22 May 2018