Progress bars do not accurately represent time

Progress bars indicate the amount of work done versus work remaining, not the time elapsed versus time remaining, at least in disk-related operations in general and data recovery in particular. This is because different pieces of work take different amounts of time to complete, and the difference is often significant.

Physical limitations

The first and most obvious factor is hardware limitations, especially those of mechanical hard drives. Mechanical hard drives read sequential data quickly but lose a lot of time seeking between pieces of data stored in different places. This is why defragmentation helps on mechanical hard drives.

Let's say we need to copy multiple files. The progress bar reflects the total number of files or bytes to copy. However, it can take about as long to copy 1000 files of one byte each as it takes to copy one 100 MB file, because every file, however small, costs the drive a seek. If you take the number of bytes copied as the metric, the large file accounts for more than 99% of the bytes but only about half of the time. If you take the number of files as the metric, the imbalance is reversed: the small files account for more than 99% of the files, again in about half of the time. Either way, if files happen to be grouped by size, you can end up with a progress bar that moves to 99% quickly and then sits there unexpectedly long. Worse yet, the same effect happens with two large files when one is fragmented and the other is contiguous on disk.
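
The effect can be sketched with a toy cost model. The figures below (1 ms of fixed overhead per file, 100 MB/s of sequential throughput) and the function name progress_points are assumptions chosen for illustration, not measurements of any real drive.

```python
def progress_points(sizes, per_file_overhead_s=0.001, throughput_bps=100e6):
    """Return (bytes_fraction, files_fraction, time_fraction) after each file.

    Assumed cost model: every file costs a fixed overhead (seek, metadata)
    plus its size divided by the sequential throughput.
    """
    times = [per_file_overhead_s + size / throughput_bps for size in sizes]
    total_bytes, total_files, total_time = sum(sizes), len(sizes), sum(times)
    points = []
    done_bytes = 0
    done_time = 0.0
    for count, (size, seconds) in enumerate(zip(sizes, times), start=1):
        done_bytes += size
        done_time += seconds
        points.append((done_bytes / total_bytes,
                       count / total_files,
                       done_time / total_time))
    return points


# One 100 MB file copied first, followed by 1000 one-byte files.
sizes = [100_000_000] + [1] * 1000
by_bytes, by_files, by_time = progress_points(sizes)[0]
print(f"after the large file: {by_bytes:.1%} of bytes, "
      f"{by_files:.1%} of files, {by_time:.1%} of time")
```

Under these assumptions, a byte-based bar is already past 99.99% after the large file while only about half of the modeled time has passed. A file-based bar shows the opposite picture: it has barely moved at the same moment.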

Caching

Some operations, mostly related to metadata, are cached. Data read from the disk is kept in memory (cached) so that the next time the same data is needed, it can be retrieved quickly from the cache instead of the disk. Work typically starts with a "cold" (empty) cache. As work progresses and the cache fills up, more requests can be satisfied from the cache, making the process faster.
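
A minimal simulation of a cache warming up is sketched below. The costs (10 ms for a read that goes to the disk, 10 microseconds for a read served from memory), the request pattern, and the name simulate_cached_reads are all assumptions made for the example. The cache here never evicts anything, which a real cache would.

```python
import random


def simulate_cached_reads(requests, miss_cost_s=0.010, hit_cost_s=0.00001):
    """Return the assumed cost in seconds of each request, in order."""
    cache = set()  # block ids already held in memory
    costs = []
    for block in requests:
        if block in cache:
            costs.append(hit_cost_s)   # served from memory
        else:
            costs.append(miss_cost_s)  # read from disk, then remembered
            cache.add(block)
    return costs


rng = random.Random(0)  # fixed seed so the run is repeatable
requests = [rng.randrange(100) for _ in range(400)]  # 400 reads over 100 blocks
costs = simulate_cached_reads(requests)

quarter = len(costs) // 4
first_quarter = sum(costs[:quarter])
last_quarter = sum(costs[-quarter:])
print(f"first 25% of requests: {first_quarter:.3f} s")
print(f"last 25% of requests:  {last_quarter:.3f} s")
```

Each quarter represents the same amount of work on the progress bar, yet the first quarter costs far more modeled time than the last one because most of the misses happen early.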

Early termination

Sometimes there is a particular area to search for an object or a specific set of solutions to try. The progress bar then reflects the worst-case scenario of scanning the entire area without finding the object. Once the object is found, there is no point in continuing, so the search stops and the progress bar jumps from wherever it was straight to the end.
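
The jump can be illustrated with a short sketch. The function search_with_progress and the way it records progress are hypothetical; it only assumes that progress is reported as the share of candidates examined so far.

```python
def search_with_progress(candidates, is_match):
    """Search candidates in order; return (match, reported_progress).

    Progress is reported against the worst case (all candidates examined),
    so a successful search ends with a jump to 1.0.
    """
    total = len(candidates)
    reports = []
    for examined, candidate in enumerate(candidates, start=1):
        if is_match(candidate):
            reports.append(1.0)  # found: nothing left to do
            return candidate, reports
        reports.append(examined / total)
    return None, reports  # not found: progress reached 1.0 gradually


found, reports = search_with_progress(list(range(100)), lambda c: c == 29)
print(f"last two reports: {reports[-2]:.0%} then {reports[-1]:.0%}")
```

In this run the match is the 30th of 100 candidates, so the reported progress climbs to 29% one step at a time and then goes directly to 100%.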

Progress bars in disk scans

There is one case where you would expect the progress bar to actually reflect processing time: a full disk scan, which is done sequentially from the very beginning to the very end of the disk (or other medium). While this is the closest you get to a predictable process, several factors can still throw off the progress bar.

  • Bad blocks. Bad blocks are extremely slow to process. Even a single one may cost you a second or so, not to mention a bunch of them.
  • Computation. Some objects, if discovered, require complex computations immediately. If there are many of these, they may slow down parts of the scan. The slowdown is especially noticeable when the objects are clustered in some regions of the disk.
  • Speed change over the media surface. A hard drive reads faster on the outer tracks (logical start of the drive) than on the inner tracks (logical end of the drive). The difference arises because the drive reads one track per spindle revolution, and tracks are longer on the outside than on the inside. Flash-based storage (SSDs and memory cards) does not exhibit this behavior.
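
The last factor can be estimated with a simple model. The sketch below assumes that read speed falls linearly with position, from 200 MB/s at the start of the drive to 100 MB/s at the end. Both figures and the linear shape are assumptions; real drives change speed in steps and their ratios vary. Under this model, the time to reach a position is the integral of 1/speed, which gives a ratio of logarithms.

```python
import math


def time_fraction(position, outer_speed=200.0, inner_speed=100.0):
    """Fraction of total scan time elapsed when the scan reaches `position`.

    `position` is the fraction of the disk already read (0.0 to 1.0).
    Assumes speed falls linearly from outer_speed to inner_speed, with
    outer_speed > inner_speed > 0, in any consistent unit.
    """
    speed_here = outer_speed - (outer_speed - inner_speed) * position
    return math.log(outer_speed / speed_here) / math.log(outer_speed / inner_speed)


for shown in (0.25, 0.5, 0.75):
    print(f"bar at {shown:.0%}: {time_fraction(shown):.1%} of scan time elapsed")
```

With these figures, the bar shows 50% when only about 41.5% of the scan time has passed, so the second half of the bar takes noticeably longer than the first even on a drive with no bad blocks.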

Created Monday, February 18, 2019