Blobbers - To RAID or not to RAID?

RAID (Redundant Array of Independent Disks) comes in various flavours, including straight duplication (mirroring), redundancy through parity, and a combination of both.

The appeal to potential Service Providers of hosting data across multiple drives is that, if a drive fails, it can be replaced and its data reconstructed from the other drives. The ‘expense’ is usually seen as just the drive space reserved for mirror/parity storage, but in reality it’s a little more complicated than that…

On the 0chain network, it is entirely up to the blobber whether to use RAID. Since the data is already redundant thanks to the erasure encoding across multiple blobbers, concerns on the client side are mitigated either way, so for the blobber it is purely an economic choice.

However, what is not common knowledge is that RAID puts much more stress on HDDs, so the risk of drive failure actually increases! Think about it. Instead of writing to a single disk, RAID turns each write into writes on multiple disks: the data itself plus a mirror copy or parity, and parity RAID often has to read the old data and parity first. So, if we take the risk of drive failure to be roughly proportional to the amount of I/O a drive performs, these drives will likely fail earlier than if they were written to independently.
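To put rough numbers on that, here is a minimal Python sketch using the commonly quoted "write penalty" figures for small random writes. The figures are textbook values, assuming a two-way mirror for RAID 1 and a read-modify-write cycle for RAID 5/6; full-stripe writes are cheaper, and real controllers with caches will behave differently. The function name is made up for illustration.

```python
# Physical disk operations caused by ONE small logical write.
# Textbook figures (assumption): two-way mirror for RAID 1,
# read-modify-write for parity RAID.
WRITE_PENALTY = {
    "single disk": {"reads": 0, "writes": 1},
    "RAID 1": {"reads": 0, "writes": 2},  # data + mirror copy
    "RAID 5": {"reads": 2, "writes": 2},  # old data + old parity, then new data + new parity
    "RAID 6": {"reads": 3, "writes": 3},  # same idea with two parity blocks
}


def physical_ios(level, logical_writes):
    """Return total physical reads and writes across the array for a number of logical writes."""
    penalty = WRITE_PENALTY[level]
    return {
        "reads": penalty["reads"] * logical_writes,
        "writes": penalty["writes"] * logical_writes,
    }


if __name__ == "__main__":
    for level in WRITE_PENALTY:
        print(level, physical_ios(level, 1000))
```

Under these assumptions, 1000 small writes cost 1000 physical operations on a single disk but 4000 on RAID 5 (2000 reads plus 2000 writes), spread across the drives in the array.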

Then there is the repair process of RAID. In the event of drive failure and replacement, the new drive's data is reconstructed from the other drives in the array. This is a very intensive process and must run alongside the normal workload of the machine (unless the machine is taken 'offline'). The larger the disks, the longer this process takes and the greater the risk of another drive in the array failing. Remember, these are typically identical drives that have been written to in equal amounts because of RAID, so the chance of them failing around the same time is already higher, and the additional stress of the repair process makes it significantly more likely.
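As a back-of-the-envelope illustration of why disk size matters, the Python sketch below estimates rebuild time from capacity and sustained rebuild speed, then the chance that at least one surviving drive fails inside that window. Everything here is an assumption for illustration: the function names, the example figures, the constant failure rate, and above all the independence of drives. For the reasons given above (same model, same wear, extra rebuild load), the real risk is likely higher than this model suggests.

```python
import math

HOURS_PER_YEAR = 8760


def rebuild_hours(capacity_tb, rebuild_mb_per_s):
    """Hours needed to rewrite a whole drive at a sustained rebuild speed."""
    total_bytes = capacity_tb * 1e12
    seconds = total_bytes / (rebuild_mb_per_s * 1e6)
    return seconds / 3600


def p_failure_during_rebuild(surviving_drives, hours, annual_failure_rate):
    """Chance that at least one surviving drive fails within the window.

    Assumes a constant failure rate and independent drives, which is
    optimistic for identical drives with identical wear.
    """
    rate_per_hour = -math.log(1 - annual_failure_rate) / HOURS_PER_YEAR
    return 1 - math.exp(-rate_per_hour * hours * surviving_drives)


if __name__ == "__main__":
    # Hypothetical array: 10 TB drives, 100 MB/s rebuild, 5 survivors, 5% AFR.
    hours = rebuild_hours(10, 100)
    print(f"rebuild takes about {hours:.1f} hours")
    print(f"risk of another failure: {p_failure_during_rebuild(5, hours, 0.05):.4%}")
```

Doubling the drive size doubles the rebuild window, and the rebuild speed will usually be lower than the headline figure if the array keeps serving normal traffic at the same time.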

When using RAID5 (single parity), a second disk failure during the repair process would be disastrous and would likely result in the loss of all data.

If RAID6 (double parity) is used, a second drive failure during repair could be tolerated, but the additional stress of the repair process would be the same.

There are other variants generally considered superior, including RAID-Z, but the same principles apply, and the repair (resilver) time for RAID-Z can in fact be much longer than for its RAID5/6 equivalent.

Interestingly, although the erasure encoding used by 0chain is very close in nature to RAID, the diverse and distributed nature of blobbers means it should actually deliver reliability close to the theoretical figures for its level of redundancy, which, for the reasons stated above, physical drives in the same array do not.
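The "theoretical reliability" referred to here is the usual k-of-n calculation: data split into data shards plus parity shards survives as long as no more than the parity count is lost. A minimal Python sketch follows, assuming every shard holder fails independently with the same probability. The shard counts and failure probability are hypothetical examples, not 0chain's actual parameters.

```python
from math import comb


def survival_probability(data_shards, parity_shards, p_fail):
    """Probability that at least `data_shards` of the total shards survive.

    Assumes each shard holder fails independently with probability p_fail.
    That assumption is plausible for unrelated blobbers, and is exactly
    what identical drives in one array tend to violate.
    """
    total = data_shards + parity_shards
    return sum(
        comb(total, lost) * p_fail**lost * (1 - p_fail) ** (total - lost)
        for lost in range(parity_shards + 1)
    )


if __name__ == "__main__":
    # Hypothetical 10 data + 6 parity layout, 5% chance of losing any one holder.
    print(f"{survival_probability(10, 6, 0.05):.10f}")
    # Same formula for a 5-drive single-parity array (4 data + 1 parity).
    print(f"{survival_probability(4, 1, 0.05):.10f}")
```

The formula is the same for RAID5/6 and for erasure-coded shards. The difference is how well the independence assumption holds: it is reasonable for separate operators on separate hardware, and weak for matched drives sharing a chassis, a workload, and a rebuild.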

For Operating System drives (and, in the case of 0Chain, Miner and Sharder data), it is recommended to consider using RAID 1 (mirror). This would be an investment in the reliability of the server and is likely worthwhile relative to the cost of the rest of the system.
