I have lots of files on my computer (about 20 TB). Much of this is stored on RAID 5 disks using SoftRAID, and all of it is fully backed up on RAID 5 disks (4 sets that rotate off-site).
I just had a disk fail, and while reading up on RAID 5 I found out that I didn’t actually understand its limitations when it comes to bit rot (as opposed to disk failure).
SoftRAID has a “Validation” pass that purports to fix any parity errors. What I just realized is that it works by reading the data blocks (say, disks 1, 2, and 3 of a 4-disk RAID) and then rewriting the parity block (disk 4 in this case) if it isn’t consistent. However, if the flipped bit is in one of the data blocks (a 75% chance when there is a parity mismatch, since 3 of the 4 blocks in each stripe hold data), this just “bakes in” the error, and the file is now compromised.
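To make the “bakes in” problem concrete, here is a toy sketch of XOR parity on a single 4-disk stripe. It assumes validation works exactly as described above (recompute parity from the data blocks and overwrite it on mismatch); the function names are hypothetical, this is not SoftRAID’s code, and it ignores that real RAID 5 rotates parity across disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column (RAID 5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def validate_stripe(data_blocks, parity_block):
    """Hypothetical naive validation: trust the data, recompute parity.
    Returns (parity to write back, whether a mismatch was found)."""
    expected = xor_blocks(data_blocks)
    return expected, expected != parity_block

# A 4-disk stripe: three data blocks plus one parity block.
data = [b"\x10\x20", b"\x0f\x0f", b"\xa5\x5a"]
parity = xor_blocks(data)

# Bit rot: flip the lowest bit of the first byte of data block 0.
rotted = [bytes([data[0][0] ^ 0x01]) + data[0][1:], data[1], data[2]]

new_parity, mismatch = validate_stripe(rotted, parity)
print(mismatch)                          # True: the error is detected...
print(new_parity == xor_blocks(rotted))  # True: ...but parity now matches the bad data
print(rotted[0] == data[0])              # False: the original data is not recovered
```

The parity check can tell that *something* in the stripe is wrong, but with only one parity block it has no way to tell *which* block is wrong, so “fixing” parity is the only move available.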
In fact, I had an 8-disk RAID 5 Time Machine drive (now in long-term storage) that I validated, and SoftRAID reported that it “fixed” 584 parity errors. By the same reasoning (in each stripe, 1 block in 8 is parity and 7 in 8 are data), perhaps 73 (1/8) of these fixes restored correct parity for good files, while 511 (7/8) created valid parity for corrupted files. (Validating a disk is good for identifying bit rot, but can’t, in general, fix it.)
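The estimate above can be written as a small calculation. This sketch assumes each reported mismatch came from a single flipped bit in one uniformly random block of the stripe; the function name is illustrative only.

```python
def expected_split(mismatches, disks):
    """Expected (parity-block, data-block) corruption counts for RAID 5,
    assuming each mismatch is one bad block chosen uniformly from the stripe."""
    data_fraction = (disks - 1) / disks  # share of each stripe that holds data
    data_hits = mismatches * data_fraction
    return mismatches - data_hits, data_hits

print(expected_split(584, 8))  # (73.0, 511.0)
```

With 4 disks the data fraction is 3/4, which is where the 75% figure comes from; the wider the array, the more likely any given mismatch is in a data block.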
ZFS and other file systems use checksums, which supposedly allow you to both identify and repair bit rot.
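Why checksums change the picture: if every block has its own stored checksum, a scrub can tell *which* block in a stripe is bad, and the parity is then enough to rebuild it. The sketch below is a simplified model assuming one SHA-256 checksum per block recorded at write time; it is not how ZFS actually lays out data or checksums, and the function names are hypothetical.

```python
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column (RAID 5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def checksum(block):
    return hashlib.sha256(block).digest()

def scrub_and_repair(blocks, checksums):
    """blocks: data blocks followed by the parity block.
    checksums: the checksum stored for each block when it was written.
    Returns a repaired copy; raises if parity cannot rebuild the damage."""
    bad = [i for i, (b, c) in enumerate(zip(blocks, checksums)) if checksum(b) != c]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError("more bad blocks than single parity can rebuild")
    i = bad[0]
    # XOR of every other block (data and parity) reproduces the bad one.
    others = [b for j, b in enumerate(blocks) if j != i]
    repaired = list(blocks)
    repaired[i] = xor_blocks(others)
    return repaired

data = [b"\x10\x20", b"\x0f\x0f", b"\xa5\x5a"]
stripe = data + [xor_blocks(data)]
sums = [checksum(b) for b in stripe]  # recorded when the stripe was written

rotted = list(stripe)
rotted[0] = b"\x11\x20"               # bit rot in data block 0
fixed = scrub_and_repair(rotted, sums)
print(fixed == stripe)                # True: the correct data is restored
```

The checksum supplies the one thing plain parity lacks, the location of the error; the redundancy (parity, a mirror, or extra copies) still has to be there to do the actual repair.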
Has anyone successfully set up a relatively cheap, low-maintenance system that lets a Mac use a file system with checksums? I am only going to do this if it is easy. (For instance, maintaining a Linux computer and keeping it up to date is not “easy.”) I would envision doing this for my data disks, while continuing to use APFS for my Time Machine backups (going with “easy”).
Any ideas?