Disks, Partitions, Pools, Volumes, Datasets, and Filesystems

Today, I saw a question conflating disks, pools, volumes, and datasets in one sentence, so I decided that written-down definitions and explanations might be handy. In everyday talk, there is no great harm in using these terms loosely. In storage planning or data recovery, however, precise understanding is essential. Unfortunately, the terminology is something of a mess, with different conventions used in Windows and Linux and in different storage systems.

Modern storage systems are layered. Layering allows system builders to mix and match various physical hardware with various software architectures, to produce the desired balance between speed, capacity, redundancy, and functional requirements. The concepts of disks, partitions, pools, volumes, datasets, and filesystems are all used to define various layers of storage systems.

While there are different combinations of layers, a disk is always on the lowest level (nearest to hardware), partitions (if present) are the second lowest level, and the filesystem is the top level (nearest to the user).
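As a rough illustration of this ordering rule, the sketch below writes a few typical stacks as Python lists, bottom layer first, and checks them against the rule. The layer labels and the `check_stack` helper are made up for this example and do not correspond to any real tool.

```python
# Each stack is listed from the bottom (nearest to hardware) to the top
# (nearest to the user). Layer names are illustrative labels only.
STACKS = {
    "SD card": ["disk", "filesystem"],
    "simple setup": ["disk", "partition", "filesystem"],
    "Windows software RAID": ["disk", "partition", "volume", "filesystem"],
    "Storage Spaces": ["disk", "pool", "volume", "filesystem"],
    "ZFS": ["disk", "partition", "pool", "filesystem"],
}


def check_stack(layers):
    """Return True if the stack follows the ordering rule described above."""
    if len(layers) < 2:
        return False
    # A disk is always the lowest layer, a filesystem always the top one.
    if layers[0] != "disk" or layers[-1] != "filesystem":
        return False
    # Partitions are optional, but if present they sit directly on the disk.
    if "partition" in layers and layers.index("partition") != 1:
        return False
    return True


for name, layers in STACKS.items():
    print(f"{name}: {' -> '.join(layers)} ok={check_stack(layers)}")
```

Everything between the partition and the filesystem (pools, volumes) is where the conventions diverge, which is why the check says nothing about those layers.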

Disk

Disk, in most cases, refers to a single physical data storage device. Disks are also called drives (as in hard drive). There are a few things to consider:

  • With removable media requiring a specialized reader-writer device (like floppies or DVD-RWs), drive refers to a reader-writer device, and disk refers to the media itself, as in there is no disk in the drive.
  • With fixed disks, the terms disk and drive are interchangeable.
  • Both disk and drive are often used to refer to things other than a traditional platter-and-heads hard drive. SSDs, USB flash drives, and even SD/microSD memory cards are sometimes called disks. However, you should avoid this usage whenever possible in data recovery situations, where the distinction between a hard drive and flash-based media matters.

Partition

A partition is a designated contiguous part of a single disk (or another storage device), defined by its starting position and length. There can be several partitions on one storage device, but a partition cannot span non-adjacent regions or multiple devices. Partitions are optional. Some removable storage, like USB sticks and memory cards, does not use partitions. Two standard formats are used to store partition information: MBR and GPT.
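To make "starting position and length" concrete, here is a minimal sketch that reads the four primary entries of a classic MBR partition table. It assumes 512-byte sectors and works on a synthetic sector built in memory. The helper names `parse_mbr` and `make_demo_sector` are invented for this example, and extended partitions and GPT are not handled.

```python
import struct

SECTOR_SIZE = 512          # assumed sector size for this sketch
TABLE_OFFSET = 446         # the partition table starts here in the MBR sector
ENTRY_SIZE = 16            # four entries, 16 bytes each


def parse_mbr(sector):
    """Return (type, start_lba, sector_count) for each used MBR entry."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not an MBR sector (missing 0x55AA signature)")
    partitions = []
    for i in range(4):
        begin = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[begin:begin + ENTRY_SIZE]
        # Entry layout: status, CHS start (3 bytes), type, CHS end (3 bytes),
        # then little-endian 32-bit starting LBA and sector count.
        ptype = entry[4]
        start_lba, count = struct.unpack_from("<II", entry, 8)
        if ptype != 0 and count != 0:
            partitions.append((ptype, start_lba, count))
    return partitions


def make_demo_sector():
    """Build a synthetic MBR holding one partition (hypothetical values)."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack("<B3sB3sII", 0x00, b"\x00\x00\x00",
                        0x07, b"\x00\x00\x00", 2048, 204800)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = entry
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


for ptype, start, count in parse_mbr(make_demo_sector()):
    print(f"type=0x{ptype:02x} "
          f"start={start * SECTOR_SIZE} bytes "
          f"length={count * SECTOR_SIZE} bytes")
```

Each entry holds exactly one start and one length, which is why a partition cannot describe two separate regions or reach onto another device.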

Filesystem

A classical filesystem is a set of rules defining how files are stored. A filesystem defines:

  • How files and directories are stored and organized on the underlying media (disk, partition, volume, or even in a file).
  • Various numerical limits: how many files can be on the filesystem, what is the maximum file size, how long can a file name be, and so on.
  • Additional features ranging from symlinks to compression and deduplication.

A classical filesystem is isolated from the details of its underlying storage; it will store files the same way on a hard drive, on a RAID, or on an SD card. With RAIDs and flash-based media, where alignment requirements matter, steps to ensure proper alignment are usually taken at the partition level, keeping the filesystem isolated from the details. Examples of classical filesystems are FAT, NTFS, exFAT, XFS, and EXT-series filesystems.
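The partition-level alignment mentioned above comes down to simple arithmetic on the partition's starting offset. The sketch below assumes 512-byte sectors and a 1 MiB alignment target. Both are example values chosen for illustration, and the `is_aligned` helper is hypothetical.

```python
def is_aligned(start_lba, sector_size=512, alignment=1024 * 1024):
    """Return True if the partition's starting byte offset is a multiple of alignment."""
    return (start_lba * sector_size) % alignment == 0


# Sector 2048 with 512-byte sectors is exactly 1 MiB into the disk.
print(is_aligned(2048))
# Sector 63 starts 32,256 bytes in, which is not a multiple of 1 MiB...
print(is_aligned(63))
# ...and not a multiple of 4 KiB either.
print(is_aligned(63, alignment=4096))
```

Once the partition starts on an aligned boundary, the filesystem inside it does not need to know what kind of device sits underneath.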

Modern filesystems still perform all the functions outlined above. Additionally, they are aware of their underlying media and behave differently when placed on top of different configurations of physical drives, pools, or volumes. Most notably:

  • ReFS, when used on top of Windows Storage Spaces, peeks down into the Storage Spaces pool layer for error recovery and to optimize disk space usage.
  • BTRFS and ZFS, when used with multiple disks, control disk layout, distribution of data across disks, fault tolerance, and error recovery.

Pool, Volume, and Dataset

Pool, volume, and dataset are often intertwined concepts, with different meanings in ZFS and Windows with and without Storage Spaces.

On Windows without Storage Spaces, a volume is a collection of one or several partitions. If only one partition is used, calling it a volume is superfluous and is only done for consistency. You can combine multiple partitions into a single volume using Windows software RAID, in which case the volume defines the RAID level and redundancy.

On Windows with Storage Spaces, pools and volumes are controlled by Storage Spaces, and filesystems are mostly independent:

  • A pool is a collection of physical disks, and that's it. The pool does not define redundancy but defines disk roles – which disks are used to store data, which SSDs are used for caching, and which disks are held in reserve as spares.
  • A volume is disk space allocated from the pool; it defines redundancy (RAID level) and caching policies. Multiple volumes with different redundancy levels and caching settings can coexist in the same pool. Filesystems are created on top of volumes.
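The split of responsibilities between pool and volume can be sketched as a small data model. This is only an illustration of the description above, not the Storage Spaces API. All class names, role labels, and sizes are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PoolDisk:
    name: str
    role: str            # "data", "cache", or "spare" (illustrative labels)


@dataclass
class Volume:
    name: str
    redundancy: str      # e.g. "mirror" or "parity"; chosen per volume
    size_gb: int


@dataclass
class Pool:
    disks: list
    volumes: list = field(default_factory=list)

    def create_volume(self, name, redundancy, size_gb):
        # Redundancy is a property of the volume, not of the pool.
        volume = Volume(name, redundancy, size_gb)
        self.volumes.append(volume)
        return volume


pool = Pool(disks=[PoolDisk("disk0", "data"), PoolDisk("disk1", "data"),
                   PoolDisk("ssd0", "cache"), PoolDisk("disk2", "spare")])
pool.create_volume("fast", "mirror", 500)
pool.create_volume("bulk", "parity", 2000)
print(sorted({v.redundancy for v in pool.volumes}))
```

Note that `Pool` carries disk roles but no redundancy field, while each `Volume` carries its own, so two volumes with different redundancy can share one pool.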

In ZFS:

  • A pool is a collection of disks grouped into one or more disk groups (called vdevs in ZFS). Each vdev defines the RAID level used for its disk group; the redundancy (RAID level) is therefore defined at the pool level, vdev by vdev, rather than per dataset.
  • A dataset is a filesystem (set of files and directories) using disk space from the pool. ZFS can also export a block device, called a zvol, for use via iSCSI. However, internally it implements a zvol as a dataset with a single file.
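The ZFS arrangement described in this list can be sketched as a data model in the same spirit. This is an illustration only, not how ZFS is implemented or administered. The class names are invented, and the pool name `tank` and the disk names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vdev:
    raid_level: str      # e.g. "mirror" or "raidz1"; fixed per vdev
    disks: list


@dataclass
class ZfsPool:
    name: str
    vdevs: list
    datasets: list = field(default_factory=list)

    def create_dataset(self, name):
        # A dataset takes no redundancy argument: it uses whatever
        # redundancy the pool's vdevs already provide.
        full_name = f"{self.name}/{name}"
        self.datasets.append(full_name)
        return full_name


tank = ZfsPool("tank", vdevs=[Vdev("mirror", ["disk0", "disk1"]),
                              Vdev("mirror", ["disk2", "disk3"])])
tank.create_dataset("home")
tank.create_dataset("backups")
print(tank.datasets)
```

Here the redundancy setting lives on each `Vdev` inside the pool, and datasets simply draw space from the pool as a whole.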

Unlike Windows Storage Spaces, ZFS controls the entire storage stack from top to bottom.

Diagrams

Let's draw some diagrams, going from simple to complex:

Removable device (e.g., an SD card)

    Filesystem
    Physical device

Simple setup (Windows or Linux)

    Filesystem
    Partition
    Physical drive

Windows software RAID

    Filesystem
    Volume
    Partition      | Partition
    Physical drive | Physical drive

Windows Storage Spaces

    Filesystem     | Filesystem
    Volume         | Volume
    Pool
    Physical drive | Physical drive

ZFS

    Filesystem     | Dataset        | Dataset
    Pool
    Partition      | Partition
    Physical drive | Physical drive

Hopefully, this clarifies some of the confusion surrounding this part of the terminology.

Created Thursday, September 5, 2019