Corrupting a ZFS File on Purpose

69 points · 13 comments · 3 days ago
guardiangod

I ran 5 external USB + SMR hard disks in RAIDZ 5 for 10 years. The only thing I had to change was switching to HighPoint's enterprise-level USB controllers; commercial USB controllers from Realtek and Renesas are junk and will drop the drives after a while.

Even then, I had multiple cases where files were corrupted, and once the whole array refused to come online due to corrupted metadata. I had to make ZFS replay the journal log with undocumented commands. Sometimes it takes a few days of hair-raising recovery, but I always manage to get the array back intact.

The files that get corrupted are always extremely large files (>50 GB) with many small reads/writes (e.g., iSCSI image files).

It's pretty impressive how resilient ZFS is, really, given that I had what is likely the worst possible hardware combination.

ralferoo

Hmmm, it's been a long long time since I actually had a failed drive (and also I don't use zfs), but from what I remember of my last failing drive 20 years ago, the drive was able to detect that sectors had been corrupted, and then failed the read rather than just returning silently corrupted data. If my memory is correct, replacing random bytes on disk wouldn't actually reflect the typical way data corruption manifests itself.

I always thought that the reason zfs did its extensive CRC checks was primarily to detect data corruption while it was in RAM or over the network, with a side effect that in the rare cases that data on disk got corrupted without the drive detecting it because the CRC was still valid, it'd also be spotted.

But anyway, it might be worth testing by replacing some of the disk images with actually truncated ones so that there are holes when reading, so that it returns an actual read error rather than junk data.
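
To make the distinction concrete, here is a rough Python sketch of the two failure modes. It is a toy model, not how ZFS is implemented: the block size, the function names, and the use of SHA-256 as the checksum are all assumptions made for illustration. Flipped bytes still read back "successfully" and are only caught by a checksum stored elsewhere, while a truncated image fails the read itself.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this sketch


def checksum(data):
    # SHA-256 stands in for whatever checksum the filesystem really uses.
    return hashlib.sha256(data).hexdigest()


def write_blocks(payload):
    """Build a fake disk image plus one checksum per block, kept separately."""
    if len(payload) % BLOCK_SIZE != 0:
        raise ValueError("payload must be a whole number of blocks")
    image = bytearray(payload)
    sums = [checksum(bytes(image[i:i + BLOCK_SIZE]))
            for i in range(0, len(image), BLOCK_SIZE)]
    return image, sums


def read_block(image, sums, index):
    """Read one block, separating device read errors from silent corruption."""
    start = index * BLOCK_SIZE
    data = bytes(image[start:start + BLOCK_SIZE])
    if len(data) < BLOCK_SIZE:
        # Truncated image: the "device" cannot return the full block.
        raise OSError(f"read error: block {index} is incomplete")
    if checksum(data) != sums[index]:
        # The read succeeded, but the contents do not match the checksum.
        raise ValueError(f"checksum mismatch: block {index}")
    return data


def classify(image, sums, index):
    """Return a short label describing what happened when reading a block."""
    try:
        read_block(image, sums, index)
        return "ok"
    except OSError:
        return "read error"
    except ValueError:
        return "checksum mismatch"
```

Calling `classify` on an untouched block returns "ok"; flipping a byte in the image gives "checksum mismatch"; slicing the image short gives "read error" for the last block. Testing both paths exercises different recovery code than overwriting random bytes alone.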

anonymous_user9

> The DVA was correct, the sector math was correct, the dd command was correct. The right place, the wrong mental model.

God the intensity is tiresome. Whether or not it's AI slop, it's also bad writing. Things can be fun or interesting or worthwhile without being a harrowing battle of discovery!

lanycrost

I miss ZFS. I only had the chance to work with it in production once, and I liked it very much. It has a performance overhead compared to journaling filesystems, but it is very well designed.

igtztorrero

I always run my servers on a mirrored ZFS pool (RAID 1) across 2 NVMe drives, because when an NVMe drive fails, it fails completely. How can a file be corrupted during normal operation?