Klennet Storage Software

Carving for fragmented files

Carving, or file carving, traditionally refers to the method of finding files by scanning the media for file headers (and sometimes footers) when no filesystem metadata is available. Once a header is found, the data between the header and the footer is copied to the output file. Traditional carving only works when the file being carved is not fragmented, that is, when the file is stored contiguously on the media.
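In code, the traditional header-to-footer method can be sketched as follows. This is a toy example using the real JPEG start/end markers, but everything else (the names, the 512-byte block size, the bare-bones scanning) is an illustrative assumption; real carvers validate file structure rather than just locating markers.

```python
# Toy header-to-footer carver. JPEG_HEADER/JPEG_FOOTER are the actual JPEG
# SOI/EOI markers; the rest is a deliberately simplified sketch.
JPEG_HEADER = b"\xff\xd8\xff"   # SOI marker plus the start of the next marker
JPEG_FOOTER = b"\xff\xd9"       # EOI marker
BLOCK = 512                     # assumed media block size

def find_headers(image: bytes) -> list[int]:
    """Return block-aligned offsets where a JPEG header starts."""
    return [off for off in range(0, len(image), BLOCK)
            if image[off:off + 3] == JPEG_HEADER]

def carve_contiguous(image: bytes) -> list[bytes]:
    """Copy everything between each header and the next footer."""
    files = []
    for start in find_headers(image):
        end = image.find(JPEG_FOOTER, start)
        if end != -1:
            files.append(image[start:end + len(JPEG_FOOTER)])
    return files
```

This fails as soon as the file is fragmented: the bytes between the header and the footer then include data that does not belong to the file, and the footer found may even belong to a different file entirely.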

There are certain ideas for extending this method to fragmented files, that is, files stored on a disk in several disjoint fragments. You may want to look up the references at the bottom of this page to get some idea of how bad the problem is. The problem is bad indeed.

Generally, to carve for a given file type, one needs

  • a method to identify a file header (starting point),
  • a method to determine, either precisely or to a certain probability, whether a given block is the next block in the file (block chaining), and
  • a validation function, which can reliably determine whether a given sequence of bytes forms a valid file.
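A minimal sketch of how these three pieces fit together in a greedy reassembly loop, broadly in the spirit of the greedy algorithms of reference 1 below. The function names, the threshold, and the chaining score used in the usage example are illustrative stand-ins, not the algorithm from either paper:

```python
def greedy_carve(header_block, free_blocks, chain_score, validate,
                 threshold=0.5):
    """Grow a file from its header by repeatedly appending the
    best-scoring free block until the validator accepts the result.

    chain_score(data, block) -> float: plausibility that `block`
    is the next block of the partially assembled `data`.
    validate(data) -> bool: True if `data` forms a valid file.
    """
    data = bytearray(header_block)
    pool = list(free_blocks)
    while not validate(bytes(data)):
        if not pool:
            return None           # ran out of candidate blocks
        best = max(pool, key=lambda blk: chain_score(bytes(data), blk))
        if chain_score(bytes(data), best) < threshold:
            return None           # no plausible continuation
        data += best
        pool.remove(best)
    return bytes(data)
```

Because the chaining score is probabilistic, a greedy choice can be wrong, which is why practical schemes add backtracking or consider multiple candidate chains rather than committing to the single best block at each step.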

Problems with this approach are, in no particular order,

  • The process is computationally expensive, and it does not scale well as the drive size increases. In the worst case, recovery is an O(n³) process, that is, the time required to complete the recovery is proportional to the third power of the input data size.
  • The methods in use are mostly probabilistic; the chaining function produces a result like “there is a 95% probability that block X belongs to file Y”. As the number of possible next blocks increases, so does the number of false-positive matches.
  • While human intervention may sometimes be useful, especially with photo files, it cannot be applied to files that are not human-readable, and many files of forensic/recovery value are not.
  • Files within files (like ZIPs within a ZIP, or thumbnails within a JPEG), partially overwritten files, and partially duplicate files (with copy-on-write filesystems like ReFS) all add complexity to the task.
  • Reliable validation functions are difficult or even impossible to construct for some file formats.

But, even given the above, the problem is still tractable. The task parallelizes well, so if one had an infinite number of CPUs available, the carving could be done in a reasonable time. With the proliferation of multicore CPUs, it should be possible to implement, at least for the file types most valuable with respect to forensics and/or data recovery.
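As an illustration of that parallelism, even the header scan alone splits naturally into independent regions that can be searched concurrently. This is a hypothetical sketch (a production carver would distribute work across processes or machines rather than Python threads, and would scan far more than headers):

```python
from concurrent.futures import ThreadPoolExecutor

JPEG_HEADER = b"\xff\xd8\xff"   # JPEG SOI marker plus start of next marker
BLOCK = 512                     # assumed media block size

def scan_region(image: bytes, start: int, end: int) -> list[int]:
    """Block-aligned header hits inside [start, end)."""
    return [off for off in range(start, end, BLOCK)
            if image[off:off + 3] == JPEG_HEADER]

def parallel_header_scan(image: bytes, workers: int = 4) -> list[int]:
    # One contiguous region per worker, rounded up to the block size
    # so that every block-aligned offset is scanned exactly once.
    region = (len(image) // BLOCK // workers + 1) * BLOCK
    spans = [(lo, min(lo + region, len(image)))
             for lo in range(0, len(image), region)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(lambda span: scan_region(image, *span), spans)
    return sorted(off for part in hits for off in part)
```

The expensive part in practice is not the scan but the chaining and validation work that follows each hit; that work, too, is largely independent per candidate file and distributes the same way.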


References

  1. Nasir Memon and Anandabrata Pal, Automated Reassembly of File Fragmented Images Using Greedy Algorithms.
  2. Simson Garfinkel, Carving Contiguous and Fragmented Files with Object Validation.