RAID10 quirks in ZFS

The implementation of RAID10 in ZFS is quirky. Technically, there is no RAID10 in ZFS. ZFS provides some undetermined distribution of data over any number of mirrors (RAID1s). In some cases, this distribution is as close to RAID10 as you need for practical purposes. In other cases, not so much.

  • If you build your system once and for all, never adding new disks, ZFS is no different from traditional RAID10.
  • If you plan to expand capacity by adding new disks at some later point, the differences are significant and worthy of consideration.

Let's see. Traditional RAID10 is built over an even number of drives (4, 6, 8, and so on), organized as a set of mirrored pairs (the RAID1 part). Data blocks are written to each mirror pair in turn (the RAID0 part). The resulting layout looks like this:

  Mirror 1            Mirror 2
  Disk A    Disk B    Disk C    Disk D
    1         1         2         2
    3         3         4         4
    5         5         6         6

Typical RAID10 layout with four disks.

RAID10 works fast because many blocks can be read in parallel. In the above example, blocks 1 to 4 can all be read in parallel. Disks A and B serve blocks 1 and 3, while disks C and D serve blocks 2 and 4. In a RAID10, the maximum number of blocks you can read in parallel is equal to the number of disks. Also, linear read speed increases proportionally to the number of disks.
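
To make the round-robin placement concrete, here is a minimal Python sketch (a toy model, not ZFS or controller code; the function name raid10_layout is made up for illustration) that reproduces the four-disk layout above and the parallel-read argument:

# Minimal model of traditional RAID10 placement: blocks are assigned to
# mirror pairs in round-robin order, and every block exists on both disks
# of its mirror.

def raid10_layout(num_blocks, num_mirrors):
    """Return {mirror index: [block numbers]} for round-robin striping."""
    layout = {m: [] for m in range(num_mirrors)}
    for block in range(1, num_blocks + 1):
        layout[(block - 1) % num_mirrors].append(block)
    return layout

layout = raid10_layout(num_blocks=6, num_mirrors=2)
for mirror, blocks in layout.items():
    print(f"Mirror {mirror + 1}: blocks {blocks}")
# Mirror 1: blocks [1, 3, 5]
# Mirror 2: blocks [2, 4, 6]

# Maximum parallel reads = number of disks, because each disk of a mirror
# can serve a different block: with 4 disks, blocks 1 and 3 come from the
# two disks of mirror 1 while blocks 2 and 4 come from the two disks of
# mirror 2.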

So far, ZFS works the same as traditional RAID10.

Array expansion in ZFS and in traditional RAID10

Now, if we want to expand an array by adding more disks, things become very different between ZFS and traditional hardware (or software) RAID10. Let's say, for example, we add two disks to the sample array above and then write three new blocks, 7, 8, and 9, onto the new array.

Hardware RAID rebalances the data across the full set of disks, and so do the most common software RAID implementations. Once the three new blocks are written, the resulting layout looks like this:

  Mirror 1            Mirror 2            Mirror 3
  Disk A    Disk B    Disk C    Disk D    Disk E    Disk F
    1         1         2         2         3         3
    4         4         5         5         6         6
    7*        7*        8*        8*        9*        9*

Hardware RAID10 after expansion, with three more blocks of data written. Newly written blocks 7, 8, and 9 are marked with an asterisk.

So, hardware RAID10 integrates the new disks fully into the array; a small simulation of this re-striping follows the list below.

  • The process of rebalancing (often called reshaping) takes a long time (hours and sometimes days). During this time, the array is either offline or very slow.
  • After rebalancing, the performance is the same on the entire array. No matter which data you need to read, the maximum number of blocks you can read in parallel equals the new number of disks. Large sequential read requests are served by all disks.
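
As a rough illustration of what the reshape accomplishes, the following Python sketch (again a toy model; reshape_and_write is a made-up name) re-stripes all blocks, old and new, over the enlarged set of mirrors:

# Rough model of a RAID10 reshape: all blocks, existing and new, are
# re-striped round-robin over the enlarged set of mirrors.

def reshape_and_write(existing_blocks, new_blocks, num_mirrors):
    layout = {m: [] for m in range(num_mirrors)}
    for i, block in enumerate(existing_blocks + new_blocks):
        layout[i % num_mirrors].append(block)
    return layout

layout = reshape_and_write(existing_blocks=[1, 2, 3, 4, 5, 6],
                           new_blocks=[7, 8, 9],
                           num_mirrors=3)
for mirror, blocks in layout.items():
    print(f"Mirror {mirror + 1}: blocks {blocks}")
# Mirror 1: blocks [1, 4, 7]
# Mirror 2: blocks [2, 5, 8]
# Mirror 3: blocks [3, 6, 9]
# Any run of consecutive blocks now spans all three mirrors, so both old
# and new data can be read from all six disks.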

ZFS does not rebalance. This is the crucial point. After you add new disks, the array becomes non-uniform, with two areas that have different performance characteristics.

  Mirror 1            Mirror 2            Mirror 3
  Disk A    Disk B    Disk C    Disk D    Disk E    Disk F
    1         1         2         2         7*        7*
    3         3         4         4         8*        8*
    5         5         6         6         9*        9*

ZFS RAID10 after expansion, with three more blocks of data written. Newly written blocks 7, 8, and 9 are marked with an asterisk.

  • The ZFS expansion process is instantaneous, or nearly so, because no rebalancing is involved.
  • After expansion
    • old data works with old performance (up to four reads in parallel),
    • new data is only written onto the newly added mirror (up to two reads in parallel),
    • there is little or no data that can be read at six reads in parallel.

Note that this only applies if the original four-disk array was nearly full before expansion. ZFS spreads new writes over the mirrors roughly in proportion to their free space, so if the old mirrors still had plenty of room, new data would land on all three mirrors rather than only on the new one.
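
The following Python sketch models the nearly-full case: old data stays in place, and each new block goes to whichever mirror currently has the most free space. This is a deliberate simplification of the real ZFS allocator, and zfs_write is a made-up name, not an actual ZFS call.

# Simplified model of ZFS after adding a mirror: existing data is never
# moved, and each new block goes to the mirror with the most free space
# (a crude stand-in for the real ZFS allocation policy).

def zfs_write(layout, capacity, new_blocks):
    for block in new_blocks:
        target = max(layout, key=lambda m: capacity[m] - len(layout[m]))
        layout[target].append(block)
    return layout

# Two old mirrors are full (3 of 3 blocks each); the new mirror is empty.
layout = {0: [1, 3, 5], 1: [2, 4, 6], 2: []}
capacity = {0: 3, 1: 3, 2: 3}
layout = zfs_write(layout, capacity, new_blocks=[7, 8, 9])
for mirror, blocks in layout.items():
    print(f"Mirror {mirror + 1}: blocks {blocks}")
# Mirror 1: blocks [1, 3, 5]
# Mirror 2: blocks [2, 4, 6]
# Mirror 3: blocks [7, 8, 9]
# Old blocks still read from four disks at most; blocks 7-9 read only from
# the two disks of the new mirror.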

Conclusions

  • If you never add new hard drives to your setup, there is nothing special to consider when using ZFS.
  • If you add drives to your ZFS-based RAID10 setup, consider backing up the data, destroying the pool, rebuilding the pool from scratch with all the drives, and restoring from backup. Otherwise, you will most likely give up some performance.

Filed under: ZFS.

Created Sunday, August 4, 2019