ZFS and ECC RAM rant

Is ECC RAM required to run ZFS?

No.

Should I use ECC RAM in my new build?

Probably. Maybe. Whatever.

What?

Setting aside the common factor of not having a valid backup, ZFS data recoveries fall broadly into several groups:

User error. Deleting the wrong dataset, or repartitioning and reusing the wrong pool. User error accounts for about half of all cases.

Power failure. "But... but... I heard ZFS does not fail when power fails!" Yeah, good luck with that. Technically, ZFS should not fail if your hardware meets some fairly exacting specifications. Either those specifications are exceedingly tricky to meet, or people don't bother. Either way, power failure accounts for about half of the remaining cases, for a running total of 75%.

The remaining 25% include, among others:

  • Gross negligence. A RAIDZ2 array failed from one more disk failure after consuming three or four hot spares over the years; no failed disk was ever replaced.
  • Junk, substandard hardware. How would you like a daisy chain of USB hubs?
  • Intentional tampering, like when a cryptovirus encrypts everything on a writeable share.

But what about RAM failures? Nothing. None of the five most frequent causes could have been prevented by ECC RAM.

But!

The primary concern with non-ECC RAM isn't that the system dies. The much bigger concern is that it keeps running and silently corrupts your data.

Whatever method you use to protect your backups from tampering (e.g., a cryptovirus silently corrupting your data) should catch the RAM problem too. If it does not, change the method rather than rely on ECC RAM.
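One common way to make a backup check indifferent to the cause of corruption is to record checksums of a known-good copy and compare later copies against that record. Below is a minimal sketch in Python, assuming the data is a plain directory tree; `build_manifest` and `compare_manifests` are hypothetical names for illustration, not anything the post or ZFS itself provides.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files need not fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(known_good: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """List files that appeared, disappeared, or changed content since the known-good manifest."""
    return {
        "added": sorted(current.keys() - known_good.keys()),
        "removed": sorted(known_good.keys() - current.keys()),
        "changed": sorted(
            name for name in known_good.keys() & current.keys()
            if known_good[name] != current[name]
        ),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "a.txt").write_bytes(b"hello")
        (root / "b.txt").write_bytes(b"world")
        before = build_manifest(root)
        (root / "b.txt").write_bytes(b"w0rld")  # simulate a silent flipped byte
        print(compare_manifests(before, build_manifest(root)))
        # {'added': [], 'removed': [], 'changed': ['b.txt']}
```

The comparison only knows that content differs from the recorded state. It cannot tell a cryptovirus from a flipped bit from a legitimate edit, and that is the point: one check, any cause. Deciding which reported changes are expected is left to you, and the known-good manifest should be stored somewhere the same failure cannot reach.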

Finally

If your backup and recovery strategy relies on ZFS running with ECC RAM, you should rework your backup strategy. Once you no longer need ECC RAM, you can buy it as a matter of convenience, to save time troubleshooting and/or restoring from backup.

Filed under: Rants, ZFS.

Created Friday, November 25, 2022