Disks, Partitions, Pools, Volumes, Datasets, and Filesystems

Today, I saw a question conflating disks, pools, volumes, and datasets in one sentence, so I decided that written-down definitions and explanations might be handy. In everyday talk, there is no great harm in mixing and matching these terms to mean whatever is required. In storage planning or in data recovery, however, precise understanding is important. Unfortunately, the terminology is something of a mess, with different conventions used in Windows and Linux, and in different storage systems.

Modern storage systems are layered. Layering allows system builders to mix and match various physical hardware with various software architectures on top of it, thus producing the desired balance between speed, capacity, redundancy, and functional requirements. The concepts of disks, partitions, pools, volumes, datasets, and filesystems are all used to define various layers of a storage system.

While there are different combinations of layers, the disk is always at the lowest level (nearest to the hardware), partitions (if present) are at the second lowest level, and the filesystem is at the top level (nearest to the user).

Disk

Disk in most cases refers to a single physical data storage device. Disks are also called drives (as in hard drive). There are a few things to consider:

  • With removable media requiring a specialized reader-writer device (like floppies or DVD-ROMs), drive refers to the reader-writer device, and disk refers to the media itself, as in "there is no disk in the drive".
  • With fixed disks, the terms disk and drive are interchangeable.
  • Both disk and drive are often used to refer to things other than traditional platter-and-heads hard drives. SSDs, USB flash drives, and even SD/microSD memory cards are sometimes referred to as disks. However, in data recovery situations, you should avoid this usage whenever possible, because the distinction between device types matters there and you do not want to lose it.

Partition

Partition is a designated contiguous part of a single disk (or other storage device), defined by starting position and length. There can be several partitions on one storage device, but a partition cannot span non-adjacent regions or multiple devices. Partitions are optional. Some removable storage devices, like USB sticks and memory cards, do not use partitions at all. There are two common formats used to store partition information: MBR and GPT.
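To make "starting position and length" concrete, here is a minimal sketch that reads the partition table from an MBR sector. It assumes the classic layout (four 16-byte entries starting at byte 446, a 0x55 0xAA signature at byte 510) and 512-byte sectors. The function names and the demo sector are made up for illustration; the sector is built in memory, not read from a real disk.

```python
import struct

SECTOR_SIZE = 512        # assumption: 512-byte sectors
TABLE_OFFSET = 446       # MBR partition table starts here
ENTRY_SIZE = 16          # four entries, 16 bytes each
SIGNATURE = b"\x55\xaa"  # boot signature at bytes 510..511


def parse_mbr(sector: bytes):
    """Return (type, start_lba, sector_count) for each non-empty entry."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not an MBR sector")
    partitions = []
    for i in range(4):
        offset = TABLE_OFFSET + i * ENTRY_SIZE
        part_type = sector[offset + 4]
        # Start LBA and sector count are little-endian 32-bit values
        # at bytes 8..11 and 12..15 of the entry.
        start_lba, count = struct.unpack_from("<II", sector, offset + 8)
        if part_type != 0:  # type 0 marks an unused entry
            partitions.append((part_type, start_lba, count))
    return partitions


def make_demo_sector() -> bytes:
    """Build a hypothetical MBR with one partition (illustration only)."""
    sector = bytearray(SECTOR_SIZE)
    sector[TABLE_OFFSET + 4] = 0x07  # type 0x07 (used by NTFS and exFAT)
    struct.pack_into("<II", sector, TABLE_OFFSET + 8, 2048, 204800)
    sector[510:512] = SIGNATURE
    return bytes(sector)


demo_partitions = parse_mbr(make_demo_sector())
for part_type, start_lba, count in demo_partitions:
    print(f"type=0x{part_type:02x} start={start_lba} length={count} sectors")
```

Each entry is nothing more than a type, a start, and a length, which is why a partition cannot describe two separate regions or reach onto another device.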

Filesystem

Classical filesystem is a set of rules defining how files are stored. Filesystem defines:

  • How files and directories are stored and organized on the underlying media (disk, partition, volume, or even in a file).
  • Various numerical limits: how many files can be on the filesystem, what the maximum file size is, how long a file name can be, and so on.
  • What additional features are available, ranging from symlinks to compression and deduplication.

Classical filesystem is isolated from the details of its underlying storage; it will store files the same way on a hard drive, on a RAID, or on an SD card. With RAIDs and flash-based media, where alignment requirements matter, steps to ensure proper alignment are usually taken at the partition level, keeping the filesystem isolated from the details. Examples of classical filesystems are FAT, NTFS, exFAT, XFS, and EXT-series filesystems.
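As an illustration of alignment being handled at the partition level, the sketch below checks whether a partition start falls on an alignment boundary. The 1 MiB boundary and 512-byte sector size are assumptions (1 MiB is a common default in current partitioning tools, but the right value depends on the device), and the function names are made up for this example.

```python
SECTOR_SIZE = 512        # assumption: 512-byte logical sectors
ALIGNMENT = 1024 * 1024  # assumption: align partition starts to 1 MiB


def is_aligned(start_lba, sector_size=SECTOR_SIZE, alignment=ALIGNMENT):
    """True if the partition's starting byte offset is on the boundary."""
    return (start_lba * sector_size) % alignment == 0


def next_aligned_lba(start_lba, sector_size=SECTOR_SIZE, alignment=ALIGNMENT):
    """Round start_lba up to the nearest aligned sector."""
    sectors_per_boundary = alignment // sector_size
    # Ceiling division written with integer arithmetic.
    return -(-start_lba // sectors_per_boundary) * sectors_per_boundary


for lba in (63, 2048):
    print(lba, is_aligned(lba), next_aligned_lba(lba))
```

Sector 63 is the traditional starting point used by older partitioning tools; sector 2048 is the 1 MiB mark. Once the partition start is chosen this way, the filesystem inside it does not need to know anything about the device underneath.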

Modern filesystems still perform all the functions outlined above, but additionally they are aware of their underlying media, and behave differently when placed on top of different configurations of physical drives, pools, or volumes. Most notably:

  • ReFS, when used on top of Windows Storage Spaces, peeks down into Storage Spaces pool layer for error recovery and to optimize disk space usage.
  • BTRFS and ZFS, when used with multiple disks, control disk layout, distribution of data across disks, fault tolerance, and error recovery.

Pool, Volume, and Dataset

Pool, volume, and dataset are often intertwined concepts, with different meanings in ZFS and Windows with and without Storage Spaces.

On Windows without Storage Spaces, volume is a collection of one or several partitions. If one partition is used, calling it a volume is essentially superfluous and is only done for consistency reasons. If multiple partitions are combined into a single volume, they are used as a software RAID, and the volume defines the RAID level and redundancy.

On Windows with Storage Spaces, pools and volumes are controlled by Storage Spaces, and filesystems are mostly independent.

  • Pool is a collection of physical disks, and that’s it. Pool does not define redundancy, but it does define disk roles: which disks are used to store data, which SSDs are used for caching, and which disks are held in reserve as spares.
  • Volume is disk space allocated from the pool; it defines redundancy (RAID level) and caching policies. Multiple volumes with different redundancy levels and caching settings can coexist in the same pool. Filesystems are created on top of volumes.
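The split of responsibilities between pool and volume can be sketched as a toy data model. This is not the Storage Spaces API; all class, field, and disk names below are hypothetical and only mirror the description above: roles live on the pool's disks, redundancy lives on each volume.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PoolDisk:
    name: str
    role: str  # "data", "cache", or "spare" (roles are set at the pool level)


@dataclass
class Volume:
    name: str
    redundancy: str  # e.g. "simple", "mirror", "parity"
    filesystem: str  # the filesystem created on top of this volume


@dataclass
class Pool:
    disks: List[PoolDisk]
    volumes: List[Volume] = field(default_factory=list)
    # Note: no redundancy field here; the pool only groups disks.

    def create_volume(self, name, redundancy, filesystem):
        volume = Volume(name, redundancy, filesystem)
        self.volumes.append(volume)
        return volume

    def disks_with_role(self, role):
        return [d.name for d in self.disks if d.role == role]


pool = Pool(disks=[
    PoolDisk("hdd0", "data"),
    PoolDisk("hdd1", "data"),
    PoolDisk("ssd0", "cache"),
    PoolDisk("hdd2", "spare"),
])
pool.create_volume("scratch", "simple", "NTFS")
pool.create_volume("documents", "mirror", "ReFS")

for v in pool.volumes:
    print(v.name, v.redundancy, v.filesystem)
```

Two volumes with different redundancy share one pool, which is exactly what a pool-level RAID setting would not allow.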

In ZFS:

  • Pool is a collection of disks, grouped into one or more disk groups (called vdevs in ZFS). Each vdev defines the RAID level used for its disk group; the redundancy (RAID level) is therefore defined at the pool level.
  • Dataset is a filesystem (set of files and directories) using disk space from the pool. ZFS can also export a block device, called zvol, for use via iSCSI. However, internally it implements zvol as a dataset with a single file.

Unlike Windows Storage Spaces, ZFS controls the entire storage stack from top to bottom.
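For contrast, the ZFS arrangement can be sketched the same way. Again, this is a toy model with made-up class and disk names, not ZFS code; the only things taken from ZFS are the vdev concept and the pool/dataset naming style. Redundancy is attached to vdevs inside the pool, and datasets carry no RAID setting of their own.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vdev:
    raid_level: str   # e.g. "mirror" or "raidz1"; defined per disk group
    disks: List[str]


@dataclass
class ZfsPool:
    name: str
    vdevs: List[Vdev]
    datasets: List[str] = field(default_factory=list)

    def create_dataset(self, name):
        # A dataset only takes space from the pool; it has no RAID level.
        full_name = f"{self.name}/{name}"
        self.datasets.append(full_name)
        return full_name

    def redundancy(self):
        # Redundancy is a property of the pool's vdevs.
        return [v.raid_level for v in self.vdevs]


tank = ZfsPool("tank", [
    Vdev("mirror", ["disk0", "disk1"]),
    Vdev("mirror", ["disk2", "disk3"]),
])
tank.create_dataset("home")
tank.create_dataset("backups")

print(tank.redundancy())
print(tank.datasets)
```

Every dataset in the pool gets whatever redundancy the vdevs provide; to store something at a different RAID level, you need a different pool.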

Diagrams

Let's draw some diagrams, going from simple to complex:

Each diagram lists layers from top (nearest to user) to bottom (nearest to hardware); items on the same level are separated by |.

Removable device (e.g. SD card)

  Filesystem
  Physical device

Simple setup (Windows or Linux)

  Filesystem
  Partition
  Physical drive

Windows software RAID

  Filesystem
  Volume
  Partition      | Partition
  Physical drive | Physical drive

Windows Storage Spaces

  Filesystem     | Filesystem
  Volume         | Volume
  Pool
  Physical drive | Physical drive

ZFS

  Filesystem | Dataset | Dataset
  Pool
  Partition      | Partition
  Physical drive | Physical drive
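The layer stacks from the diagrams can also be written down as data and checked against the ordering rule stated earlier: physical device at the bottom, partition (if present) directly above it, filesystem at the top. This is a small sketch; the dictionary and function names are made up, and the ZFS dataset is treated as the filesystem layer, as described in the ZFS section.

```python
# Layers listed from top (nearest to user) to bottom (nearest to hardware).
STACKS = {
    "Removable device": ["Filesystem", "Physical device"],
    "Simple setup": ["Filesystem", "Partition", "Physical drive"],
    "Windows software RAID": ["Filesystem", "Volume", "Partition", "Physical drive"],
    "Windows Storage Spaces": ["Filesystem", "Volume", "Pool", "Physical drive"],
    "ZFS": ["Dataset", "Pool", "Partition", "Physical drive"],
}

PHYSICAL = {"Physical device", "Physical drive"}
TOP = {"Filesystem", "Dataset"}  # a ZFS dataset plays the filesystem role


def follows_layer_rule(stack):
    """Check the top, bottom, and partition placement of one stack."""
    if stack[0] not in TOP or stack[-1] not in PHYSICAL:
        return False
    if "Partition" in stack:
        # Partition, if present, must sit directly above the physical layer.
        return stack.index("Partition") == len(stack) - 2
    return True


results = {name: follows_layer_rule(layers) for name, layers in STACKS.items()}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'violates the rule'}")
```

Everything between the partition and the filesystem (volume, pool) is where the conventions differ from one system to another.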

Hopefully this clarifies some of the confusion surrounding this part of the terminology.

Created Thursday, September 5, 2019
