ZFS for Mortals: Snapshots, Scrubs, and Surviving a Dead Disk

The filesystem that refuses to lose your data

10-04-2026

Most filesystems are optimists. They write your data, assume the disk told the truth, and trust that the bytes you read back tomorrow are the bytes you wrote today. ZFS is a pessimist, and that pessimism is exactly why people who care about their data fall in love with it. It checksums everything, verifies what it reads, repairs what it can, and tells you loudly when it cannot. Born at Sun Microsystems and now thriving as OpenZFS on Linux, FreeBSD, and macOS, it folds the volume manager, the RAID layer, and the filesystem into one coherent whole. This guide walks you through the ideas and the actual commands you need to run ZFS at home without a degree in storage engineering.

What Makes ZFS Different

Traditional filesystems modify data in place. When you overwrite a block, the old contents are gone the instant the new write lands, and if the power dies halfway through, you can be left with a half-written mess that needs fsck or a journal replay to untangle. ZFS never works this way. It is copy-on-write: a changed block is written to a fresh location, and only once that write succeeds does ZFS atomically update the pointers to reference it. The old data stays put until nothing needs it. This means the on-disk state is always consistent: a crash leaves you with either the old version or the new one, never a mixture. There is no fsck for ZFS because there is never an inconsistent filesystem to repair.

The second pillar is end-to-end checksumming. Every block carries a checksum stored in its parent, forming a tree of hashes all the way up to the root. When ZFS reads a block, it verifies the checksum. If the data has silently rotted, been mangled by a flaky cable, or corrupted by a dying drive, ZFS knows immediately. And if you have a mirror or RAIDZ, it does not merely complain: it fetches a good copy from the redundant device and rewrites the bad one. This is “self-healing”, and it is the feature that separates ZFS from nearly everything else.
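As a toy illustration of the idea (not how ZFS is implemented internally), the sketch below keeps a SHA-256 checksum in a separate file from the data it describes, then shows that a single changed byte is caught on the next read. The helper names `store_checksum` and `verify_block` are invented for this example, and it assumes GNU coreutils' `sha256sum` is installed.

```shell
# Toy model of "the checksum lives apart from the data it protects".
# Hypothetical helpers; real ZFS does this per block inside the pool.

# Record the checksum of a data file ($1) in a separate "parent" file ($2).
store_checksum() {
  sha256sum < "$1" | cut -d' ' -f1 > "$2"
}

# Re-read the data and compare it with the stored checksum.
# Prints "ok" and returns 0, or prints "corrupt" and returns 1.
verify_block() {
  local actual
  actual=$(sha256sum < "$1" | cut -d' ' -f1)
  if [ "$actual" = "$(cat "$2")" ]; then
    echo "ok"
  else
    echo "corrupt"
    return 1
  fi
}

workdir=$(mktemp -d)
printf 'important data\n' > "$workdir/block"
store_checksum "$workdir/block" "$workdir/parent"
verify_block "$workdir/block" "$workdir/parent"           # prints: ok
printf 'important dat4\n' > "$workdir/block"              # simulate silent corruption
verify_block "$workdir/block" "$workdir/parent" || true   # prints: corrupt
```

The toy can only detect the damage. ZFS goes further on a mirror or RAIDZ because it has a second copy to compare against and repair from.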

Pools, Vdevs, and Datasets

ZFS has three layers worth understanding before you type a single command. A vdev (virtual device) is a group of physical disks arranged for redundancy: a mirror, a RAIDZ group, or a lone disk. A pool (zpool) is one or more vdevs stitched together into a single chunk of storage. A dataset is a filesystem carved out of that pool, and you can have many, each with its own properties.

The key rule to internalise: redundancy lives at the vdev level. A pool stripes data across its vdevs with no redundancy between them, so if any single vdev dies completely, the whole pool dies with it. Build each vdev to survive the failures you expect, and never add a bare single disk to a pool you care about.

Creating Your First Mirror Pool

A two-disk mirror is the friendliest place to start. It tolerates one disk failing entirely and gives you self-healing reads. First, identify your disks by their stable /dev/disk/by-id/ paths rather than /dev/sdX, which can shuffle on reboot.

ls -l /dev/disk/by-id/

Then create the mirror, here named tank:

sudo zpool create -o ashift=12 tank mirror \
  /dev/disk/by-id/ata-DISK1 \
  /dev/disk/by-id/ata-DISK2

The ashift=12 forces 4096-byte sectors, which is correct for virtually all modern drives and prevents a painful performance penalty. Check your handiwork:

zpool status tank
zpool list
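
The ashift value used when creating the pool is a base-2 exponent: 2 to the power of 12 is 4096. A minimal sketch of that arithmetic follows, in case you ever need to work it out for a different sector size. `ashift_for` is a hypothetical helper written for this article, not part of ZFS.

```shell
# ashift is the base-2 logarithm of the sector size ZFS will assume.
# ashift_for (hypothetical helper) prints the ashift for a sector size in
# bytes, and fails if the size is not a positive power of two.
ashift_for() {
  local size=$1 exp=0
  [ "$size" -gt 0 ] || return 1
  while [ $((size % 2)) -eq 0 ]; do
    size=$((size / 2))
    exp=$((exp + 1))
  done
  [ "$size" -eq 1 ] || return 1
  echo "$exp"
}

ashift_for 4096   # prints: 12
```

You can see what a drive reports with `lsblk -o NAME,PHY-SEC,LOG-SEC`. Some drives report 512-byte sectors for compatibility while using 4096-byte sectors internally, which is one reason to set ashift=12 explicitly rather than trust detection.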

Now create a dataset and set a couple of sensible properties. Compression is essentially free on modern CPUs and often speeds things up by reducing the bytes that hit the disk.

sudo zfs create tank/documents
sudo zfs set compression=lz4 tank/documents
sudo zfs set atime=off tank/documents

Your new filesystem is mounted at /tank/documents and ready to use.

Snapshots and Rollbacks

A snapshot is a frozen, read-only view of a dataset at a moment in time. Because of copy-on-write, taking one is instant and initially costs zero extra space; it only grows as the live data diverges from the snapshot.

sudo zfs snapshot tank/documents@before-cleanup
zfs list -t snapshot

Deleted the wrong thing five minutes later? Roll the entire dataset back:

sudo zfs rollback tank/documents@before-cleanup

If you only need one file back rather than the whole dataset, every snapshot is browsable under a hidden directory:

ls /tank/documents/.zfs/snapshot/before-cleanup/

Copy the file out, no rollback required. Automate snapshots with a tool like zfs-auto-snapshot or sanoid and you get a time machine that costs almost nothing until you actually change data.
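If you want to see the shape of what those tools do before installing one, here is a minimal sketch of the snapshot-taking half. The function names and the "auto-" prefix are conventions invented for this example. It does no pruning, which is the part the real tools handle well.

```shell
# Build a timestamped snapshot name, e.g. tank/documents@auto-2026-04-10_0930.
# snapshot_name and the "auto-" prefix are made up for this sketch.
snapshot_name() {
  local dataset=$1
  printf '%s@auto-%s\n' "$dataset" "$(date +%Y-%m-%d_%H%M)"
}

# Take a snapshot of the given dataset under that name.
# Needs the same privileges as running zfs snapshot by hand.
take_snapshot() {
  zfs snapshot "$(snapshot_name "$1")"
}
```

Calling `take_snapshot tank/documents` from an hourly cron job or systemd timer would give you a rolling history. The name only has minute resolution, so two runs within the same minute would collide and the second would fail.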

Scrubs and Catching Bit Rot

Checksums only help when something reads the data. A scrub reads every block in the pool, verifies its checksum, and repairs any damage from redundant copies. This is how you find silent corruption before it spreads into your backups.

sudo zpool scrub tank
zpool status tank

The status output reports progress and any errors found. A healthy pool shows errors: No known data errors. Schedule scrubs roughly monthly; most distributions ship a cron job or systemd timer that does this for you. If a scrub finds and fixes errors on a particular disk repeatedly, that disk is telling you it is on its way out.
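A scheduled scrub is only useful if somebody reads the result. The sketch below looks for the healthy-pool line quoted above and prints a one-line verdict you could mail or log. Both function names are hypothetical, and the check is deliberately narrow: it looks only at the data-errors line, so treat it as a starting point rather than a complete health monitor.

```shell
# pool_reports_clean (hypothetical helper) reads zpool status output on stdin
# and succeeds only if the "No known data errors" line is present.
pool_reports_clean() {
  grep -q 'errors: No known data errors'
}

# Print a one-line verdict for the named pool; returns 1 if it needs a look.
check_pool() {
  local pool=$1
  if zpool status "$pool" | pool_reports_clean; then
    echo "$pool: no known data errors"
  else
    echo "$pool: needs attention, run 'zpool status -v $pool'"
    return 1
  fi
}
```

Run `check_pool tank` after the scrub has finished. For a broader check, `zpool status -x` limits the output to pools that have problems.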

Replacing a Failed Disk

This is the moment ZFS earns its keep. Suppose zpool status shows a disk as FAULTED or DEGRADED. The pool is still serving data from its surviving mirror member, but you have no redundancy until you fix it. Here is the procedure.

1. Identify the dead device from the status output and note its by-id path.

zpool status tank

2. Physically replace the drive. If your enclosure supports it, offline the disk first so it is safe to pull:

sudo zpool offline tank /dev/disk/by-id/ata-DISK2

3. Tell ZFS to replace the old device with the new one (use the new disk’s by-id path):

sudo zpool replace tank /dev/disk/by-id/ata-DISK2 \
  /dev/disk/by-id/ata-NEWDISK

4. Watch the resilver rebuild redundancy onto the new disk:

zpool status tank

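Once the resilver completes, zpool status returns to ONLINE and your mirror is whole again. No data was lost, because the surviving disk carried a complete, checksum-verified copy throughout. The only exposure was the window without redundancy, which is why it pays to replace a failed disk promptly.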
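If you would rather not poll zpool status by hand, the last part of the procedure can be wrapped up once the new disk is physically installed. This is a sketch under assumptions: `resilver_onto` and its `DRY_RUN` switch are invented for this article, and `zpool wait` needs OpenZFS 2.0 or newer.

```shell
# resilver_onto (hypothetical wrapper): start the replacement, block until
# the resilver finishes, then show the final pool status.
# Arguments: pool name, old device path, new device path.
# Set DRY_RUN=1 to print the commands instead of running them.
resilver_onto() {
  local pool=$1 old=$2 new=$3 run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi

  $run zpool replace "$pool" "$old" "$new" || return 1
  $run zpool wait -t resilver "$pool" || return 1   # needs OpenZFS 2.0+
  $run zpool status "$pool"
}
```

With `DRY_RUN=1` set, `resilver_onto tank /dev/disk/by-id/ata-DISK2 /dev/disk/by-id/ata-NEWDISK` prints the three commands it would run, which is a cheap way to double-check the device paths before committing.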

ARC, RAM, and Performance

ZFS caches data aggressively in RAM using the Adaptive Replacement Cache (ARC), which is cleverer than a simple least-recently-used cache because it balances recently used and frequently used data. The ARC is why a well-fed ZFS box feels fast: reads of hot data are served from RAM without touching the disk. On Linux the ARC has traditionally been allowed to grow to about half your RAM by default (the exact ceiling depends on platform and OpenZFS version), and it releases that memory when applications need it, though tools like free can make the usage look alarming until you understand it.

This brings us to two perennial myths: that ZFS requires ECC RAM, and that it needs “a gigabyte per terabyte”. Neither is true. A modest home NAS runs happily on 8GB. ECC is genuinely nice to have because it protects data while it sits in memory, before it is checksummed and written to disk, but its absence does not make ZFS more dangerous than any other filesystem on the same hardware. More RAM simply means a larger ARC and better cache hit rates.

Backups with Send and Receive

Snapshots protect against mistakes; they do not protect against the whole machine burning down. For real backups, zfs send serialises a snapshot into a stream you can pipe anywhere, and zfs receive reconstitutes it. Send the first full snapshot, then only the incremental differences thereafter.

# Full send to a backup pool on another machine
sudo zfs send tank/documents@snap1 | \
  ssh backup-host "zfs receive backup/documents"

# Later, send only what changed between two snapshots
sudo zfs send -i tank/documents@snap1 tank/documents@snap2 | \
  ssh backup-host "zfs receive backup/documents"

Because the stream is just the changed blocks, incrementals are fast and small. Tools like syncoid wrap this whole dance, including pruning old snapshots on both ends.
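An incremental send only works if the receiving side still holds the snapshot you name as the starting point. A minimal sketch of finding that starting point follows. `latest_common_snapshot` is a hypothetical helper, and it assumes you have already saved each side's snapshot names to a file, one per line, oldest first, and that GNU grep is in use.

```shell
# latest_common_snapshot (hypothetical helper).
# $1: file listing the source's snapshot names (the part after "@"), oldest first
# $2: file listing the destination's snapshot names in the same form
# Prints the newest name present in both lists, or nothing if none match.
latest_common_snapshot() {
  local src_list=$1 dst_list=$2
  grep -Fx -f "$dst_list" "$src_list" | tail -n 1
}
```

One way to produce each list is `zfs list -H -t snapshot -o name -s creation -d 1 tank/documents | cut -d@ -f2`, run locally for the source and over ssh for the backup. The name it prints is the snapshot to pass to `zfs send -i`. If it prints nothing, the two sides share no snapshot and you need a fresh full send.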

It is worth being clear about what zfs send is and is not. It is a superb replication tool: block-for-block, checksum-verified, incremental copies of a dataset to another ZFS pool. What it is not is a general-purpose, deduplicating, encrypted-at-rest backup archive that you can restore file-by-file onto any filesystem. For that role — offsite copies to a cloud bucket or a friend’s NAS, with client-side encryption and easy single-file restore — a dedicated backup tool is the better fit, and I cover the trade-offs in Borg vs Restic for painless encrypted backups. The mature home setup often uses both: zfs send for fast local replication between two pools, and Borg or Restic layered on top for encrypted offsite archives. They solve different halves of the problem.

ZFS on a Hypervisor

Plenty of home labs meet ZFS not on a bare NAS but underneath a virtualisation host, because Proxmox ships with ZFS as a first-class option for both the boot pool and VM storage. This is a genuinely good pairing: your virtual machine disks become ZFS datasets, which means you inherit snapshots, checksums, and send/receive replication for the VMs themselves, not just their files. A snapshot of a running VM’s dataset, shipped to a second Proxmox node with zfs send, is the backbone of a cheap home high-availability setup. If you are building out that kind of host, Proxmox 101 walks through turning an old PC into the base, and ZFS slots underneath it naturally. One caution specific to this arrangement: give the ARC a firm memory limit via zfs_arc_max when ZFS shares a box with hungry virtual machines, or the cache and the guests will fight over RAM and the guests will lose in confusing ways.
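The zfs_arc_max parameter takes a value in bytes, which is easy to get wrong by a factor of a thousand. Here is a small sketch of the arithmetic for Linux. The helper names are invented, and the 8 GiB figure is only an example, not a recommendation for your host.

```shell
# gib_to_bytes converts whole GiB to bytes (1 GiB = 1024^3 bytes).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the kernel module option line that caps the ARC at the given size.
# On Linux this line conventionally goes in /etc/modprobe.d/zfs.conf.
arc_limit_line() {
  echo "options zfs zfs_arc_max=$(gib_to_bytes "$1")"
}

arc_limit_line 8   # prints: options zfs zfs_arc_max=8589934592
```

The module option takes effect the next time the zfs module loads. On systems that load ZFS from the initramfs, you will probably need to regenerate the initramfs as well. To change the limit on a running system, write the same byte count to /sys/module/zfs/parameters/zfs_arc_max as root.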

Pitfalls Worth Knowing

A few sharp edges deserve a mention. RAIDZ expansion arrived only recently in OpenZFS and remains relatively new; for years you could not simply add a single disk to grow a RAIDZ vdev, and even now you cannot freely reshape an existing layout, so plan capacity ahead. A pool’s redundancy is fixed at the vdev level, so never stripe a precious pool across a lone unprotected disk. Avoid filling a pool past roughly 80 to 90 per cent, as copy-on-write needs free space to work efficiently and performance falls off a cliff when it runs out. Finally, deduplication is a tempting feature that is almost always a trap for home users: it demands enormous RAM and rarely pays for itself. Stick with compression, which is the genuinely free win.
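The fill-level advice is easy to automate. The sketch below warns once a pool reaches 80 per cent, the lower end of the range above. Both function names are hypothetical, and the threshold is a choice for you to tune.

```shell
# over_threshold (hypothetical helper) takes a capacity value as printed by
# zpool list -H -o capacity (for example "83%") and a limit in per cent.
# It succeeds when the pool is at or above that limit.
over_threshold() {
  local used=${1%\%} limit=$2
  [ "$used" -ge "$limit" ]
}

# Print a warning for the named pool if it is 80 per cent full or more.
warn_if_full() {
  local pool=$1 cap
  cap=$(zpool list -H -o capacity "$pool") || return 2
  if over_threshold "$cap" 80; then
    echo "$pool is at $cap: free some space or add capacity"
  fi
}
```

Run `warn_if_full tank` from the same timer that checks your scrub results. It stays silent while the pool has headroom.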

Conclusion

ZFS asks you to learn a handful of new concepts and a dozen new commands, and in return it hands you a storage system that detects corruption, heals itself, snapshots instantly, and ships backups across the network with a single pipe. For anyone who has ever stared at a corrupted file and wondered how long it had been quietly broken, that trade is no contest. Start with a two-disk mirror, schedule a monthly scrub, automate your snapshots, and send those snapshots somewhere off-site. Do that, and you have built yourself a filesystem that genuinely refuses to lose your data.

Written by Smarc

Founder and editor of vo.rs. A lifelong tinkerer who self-hosts far more than is sensible, hardens Linux boxes for fun, and prods the latest AI tools to see what they can really do. The how-to guides here are the notes Smarc wishes had existed the first time round.

Tagged: #linux #zfs #storage #backup