Progress bars do not accurately represent time

Progress bars indicate the amount of work done vs. work remaining, not time elapsed vs. time remaining, at least in disk-related operations in general and in data recovery in particular. This is because different pieces of work take different amounts of time to complete, and the difference is often significant.

Physical limitations

The first and most obvious factor is hardware limitations, especially those of mechanical hard drives. Mechanical hard drives read sequential data quickly, but take a long time to locate two disjoint pieces of data. This is why defragmentation matters on mechanical hard drives.

Let’s say we need to copy multiple files. The progress bar as a whole will reflect either the total number of files or the total number of bytes to copy. However, it takes about the same time to copy 1000 files of one byte each as it takes to copy one 100 MB file. If you take the number of bytes copied as the metric, it takes the same time to copy 99% of the data as it takes to copy the remaining 1%. The same happens if you take the number of files as the metric. Either way, if files happen to be grouped by size, you are stuck with a progress bar that moves to 99% and then sits there unexpectedly long. Worse yet, the same effect occurs with two large files when one of them is fragmented and the other is contiguous on disk.
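The effect can be sketched with a toy cost model. The numbers below (1 ms of fixed overhead per file, 100 MB/s sequential throughput) are illustrative assumptions picked to match the example above, not measurements of any real drive, and the function names are made up for this sketch.

```python
# Toy model: copying a file costs a fixed per-file overhead (seek, metadata)
# plus transfer time proportional to its size. Both constants are assumptions.
PER_FILE_OVERHEAD_S = 0.001      # assumed fixed cost per file, in seconds
THROUGHPUT_BPS = 100_000_000     # assumed sequential throughput, bytes/second


def copy_time(size_bytes):
    """Estimated seconds needed to copy one file of the given size."""
    return PER_FILE_OVERHEAD_S + size_bytes / THROUGHPUT_BPS


def progress_timeline(file_sizes):
    """Return (elapsed_seconds, fraction_of_bytes_done) after each file."""
    total = sum(file_sizes)
    elapsed = 0.0
    done = 0
    timeline = []
    for size in file_sizes:
        elapsed += copy_time(size)
        done += size
        timeline.append((elapsed, done / total))
    return timeline


if __name__ == "__main__":
    # One 100 MB file followed by 1000 one-byte files.
    files = [100_000_000] + [1] * 1000
    timeline = progress_timeline(files)
    after_big_file = timeline[0]
    finished = timeline[-1]
    print(f"after the big file: {after_big_file[1]:.3%} of bytes, "
          f"{after_big_file[0]:.2f} s elapsed")
    print(f"finished: {finished[1]:.3%} of bytes, {finished[0]:.2f} s elapsed")
```

Under these assumptions a byte-based bar is almost full once the large file is copied, yet roughly half of the total time is still ahead.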

Caching

Some operations, such as reading various metadata, are cached. That is, whatever data is read from disk is kept in memory (cached), so that the next time the data is needed, it can be quickly retrieved from the cache (memory buffer). Work typically starts with a “cold”, empty cache. As work progresses and the cache fills up, more requests can be satisfied from the cache, making the process faster.
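A minimal simulation of the cold-cache effect, assuming a disk read costs 10 ms and a cache hit costs 10 microseconds. Both costs, the access pattern, and the function name are hypothetical and chosen only to make the difference visible.

```python
# Assumed costs, in microseconds. Real figures depend on the hardware.
DISK_READ_US = 10_000   # block has to be fetched from disk
CACHE_HIT_US = 10       # block is already in memory


def simulate_reads(block_ids):
    """Return the cost of each request, starting with a cold (empty) cache."""
    cache = set()
    costs = []
    for block in block_ids:
        if block in cache:
            costs.append(CACHE_HIT_US)
        else:
            costs.append(DISK_READ_US)
            cache.add(block)   # keep the block in memory for later requests
    return costs


if __name__ == "__main__":
    # 200 requests that keep revisiting the same 50 metadata blocks.
    requests = [i % 50 for i in range(200)]
    costs = simulate_reads(requests)
    half = len(costs) // 2
    print("first half of the work: ", sum(costs[:half]), "us")
    print("second half of the work:", sum(costs[half:]), "us")
```

Both halves contain the same number of requests, so a request-counting progress bar treats them equally, but in this model nearly all of the time is spent in the first half.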

Early termination

Sometimes there is a certain area to search for an object or a certain set of solutions to try. The progress bar then estimates the worst-case scenario of scanning the entire area and not finding the object. Obviously, once the object is found there is no point in continuing the search, so the search stops, and the progress indication appears to jump.
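The jump is easy to reproduce. The sketch below uses hypothetical names (search, report) and reports progress against the worst case of checking every candidate, then goes straight to 100% on the first match.

```python
def search(candidates, is_match, report):
    """Check candidates in order, reporting progress against the worst case.

    `report` receives a fraction between 0 and 1 after each check.
    """
    total = len(candidates)
    for checked, candidate in enumerate(candidates, start=1):
        report(checked / total)
        if is_match(candidate):
            report(1.0)   # nothing left to do, so the bar jumps to the end
            return candidate
    return None


if __name__ == "__main__":
    reports = []
    found = search(list(range(100)), lambda x: x == 29, reports.append)
    print("found:", found)
    print("last two progress values:", reports[-2:])
```

Here the match is the 30th of 100 candidates, so the bar climbs steadily to 30% and then completes in a single step.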

Progress bars in disk scans

There is a case where one would expect the progress bar to actually reflect processing time. It is a full disk scan, which is done sequentially, from the very beginning to the very end of the disk (or other media). While this is the closest you get to a predictable process, there are still some factors that throw off the progress bar.

  • Bad blocks. Bad blocks are extremely slow to process. Even a single one may cost you a second or so, not to mention a bunch of them.
  • Computation. Some objects, if discovered, require complex computations immediately. If there are many of these, they may slow down parts of the scan, especially when the objects are clustered in certain areas of the disk.
  • Speed change over media surface. A hard drive reads faster on the outer tracks (logical start of the drive) than on the inner tracks (logical end of the drive). The difference arises because the drive reads one track per spindle revolution, and tracks are longer on the outside than on the inside. Flash-based storage (SSDs and memory cards) does not exhibit this behavior.
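To get a feel for the last factor, here is a rough model in which read speed falls linearly from 200 MB/s at the start of the drive to 100 MB/s at the end. Both speeds and the linear falloff are simplifying assumptions, and time_fraction_at is a name invented for this sketch. Real drives change speed in zones and at different ratios.

```python
def time_fraction_at(progress, outer_speed=200.0, inner_speed=100.0,
                     chunks=1000):
    """Fraction of total scan time elapsed once `progress` of the disk is read.

    The disk is split into equal-sized chunks. Each chunk is read at a speed
    interpolated linearly between the outer and inner speeds (an assumption).
    """
    def chunk_time(index):
        position = (index + 0.5) / chunks   # midpoint of the chunk, 0..1
        speed = outer_speed + (inner_speed - outer_speed) * position
        return 1.0 / speed                  # time per chunk is 1 / speed

    total = sum(chunk_time(i) for i in range(chunks))
    done = sum(chunk_time(i) for i in range(round(progress * chunks)))
    return done / total


if __name__ == "__main__":
    for progress in (0.25, 0.5, 0.75, 1.0):
        share = time_fraction_at(progress)
        print(f"{progress:.0%} of the disk scanned -> {share:.1%} of the time")
```

In this model the first half of the disk takes noticeably less than half of the total scan time, so a bar that tracks sectors scanned runs ahead of the clock early on and slows down toward the end.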

Created Monday, February 18, 2019
