Saturday, March 6, 2010

What's the best pool to build with 3 or 4 disks?

I've been asked many times, in many variations, "I just started using EON and I'm new to OpenSolaris. What's the best pool to build with 3 or 4 disks?" I usually answer, "It depends!" Credit that reflex answer to Prof. Gordon, one of the best calculus and differential equations teachers of my time. May the force be with you, wherever you are!

I'll use Richard Elling's research to explain. Let's say I have 500GB drives, each with 70.59 IOPS (average small, random, cache-miss read I/O operations per second) and a maximum media bandwidth of 133 Mbytes/s (read and write). What can we build?
RAID Type   Disks   Sets   Storage Space   Performance (IOPS)   Max BW (Mbytes/s)
RAIDZ       4       1      3x500=1500GB    4x70.59/3=94         3x133=399
RAIDZ       3       1      2x500=1000GB    3x70.59/2=106        2x133=266 (1 spare)
STRIPE      4       1      4x500=2000GB    4x70.59=282          4x133=532
STRIPE      3       1      3x500=1500GB    3x70.59=212          3x133=399
MIRROR      2       2      2x500=1000GB    4x70.59=282          4x133=532
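As a rough sketch, the arithmetic in the table can be reproduced in a few lines of Python. The formulas are simply read off the rows above (they are a simplified model, not a benchmark), and the function names are my own, chosen for illustration.

```python
# Per-disk figures quoted in the post (not measured here).
DISK_GB = 500
DISK_IOPS = 70.59
DISK_MBPS = 133


def raidz(disks, sets=1):
    """Single-parity raidz: one disk's worth of space per set goes to parity."""
    data = disks - 1
    return {
        "space_gb": sets * data * DISK_GB,
        "iops": sets * disks * DISK_IOPS / data,
        "bw_mbps": sets * data * DISK_MBPS,
    }


def stripe(disks):
    """Plain stripe: every disk contributes fully, with no redundancy."""
    return {
        "space_gb": disks * DISK_GB,
        "iops": disks * DISK_IOPS,
        "bw_mbps": disks * DISK_MBPS,
    }


def mirror(disks, sets=1):
    """Mirror sets: one disk of space per set, all disks counted for IOPS and bandwidth."""
    return {
        "space_gb": sets * DISK_GB,
        "iops": sets * disks * DISK_IOPS,
        "bw_mbps": sets * disks * DISK_MBPS,
    }


rows = [
    ("RAIDZ 4x1", raidz(4)),
    ("RAIDZ 3x1", raidz(3)),
    ("STRIPE 4", stripe(4)),
    ("STRIPE 3", stripe(3)),
    ("MIRROR 2x2", mirror(2, sets=2)),
]
for name, r in rows:
    print(f"{name:<12}{r['space_gb']:>6} GB{r['iops']:>8.0f} IOPS{r['bw_mbps']:>6} Mbytes/s")
```

Swapping in the size, IOPS and bandwidth of your own drives gives a quick first comparison before committing disks to a layout.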
With 4 disks in the first raidz set, we get higher bandwidth (399 Mbytes/s) than with the 3 disk raidz (266 Mbytes/s), but the 3 disk raidz pool has the higher I/O operations per second capability. Note that as "sets" are added to the 3 disk raidz (3 disks each time), its IOPS lead over the 4 disk raidz widens. Once you exhaust the usable storage space, growing the pool costs the price of 4 or 3 drives for each new "set", so the 3 drive raidz has the more economical cost per set. This can be repeated to add more "sets", and with them more storage and bandwidth, as needed, which makes raidz a very flexible choice. With 1 additional set, the numbers would look like this:
RAID Type   Disks   Sets   Storage Space   Performance (IOPS)   Max BW (Mbytes/s)
RAIDZ       4       2      3000GB          188                  798
RAIDZ       3       2      2000GB          212                  532
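To see how the IOPS gap grows as sets are added, here is a small sketch using the same per-set model the tables imply (n x IOPS / (n - 1) per raidz set). The helper name `raidz_iops` is hypothetical, and the numbers are only as good as that simplified model.

```python
DISK_IOPS = 70.59  # per-disk small random read IOPS quoted in the post


def raidz_iops(disks_per_set, sets):
    """IOPS model implied by the tables: n * iops / (n - 1) per set."""
    return sets * disks_per_set * DISK_IOPS / (disks_per_set - 1)


for sets in range(1, 5):
    three = raidz_iops(3, sets)
    four = raidz_iops(4, sets)
    print(
        f"sets={sets}  3-disk: {three:6.0f}  4-disk: {four:6.0f}  "
        f"gap: {three - four:5.1f}  drives bought: {3 * sets} vs {4 * sets}"
    )
```

Under this model the gap grows by the same amount with every set added, while the 3 disk layout always needs one drive fewer per set.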
Both the 4 and 3 disk raidz allow only 1 disk to fail, but if all disks have the same probability of failure, the 4 disk raidz pool has a higher probability of a failure than the 3 disk version, simply because it has more disks.
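The failure argument can be made concrete with a simple binomial model. This is only a sketch: it assumes disks fail independently, and the per-disk failure probability `P_FAIL` is a made-up value for illustration, not a figure from any drive vendor or from the research cited above.

```python
from math import comb


def p_at_least(k, n, p):
    """Probability that at least k of n disks fail, each independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


P_FAIL = 0.05  # hypothetical per-disk failure probability, for illustration only

for n in (3, 4):
    any_failure = p_at_least(1, n, P_FAIL)
    pool_loss = p_at_least(2, n, P_FAIL)  # single-parity raidz survives only one failure
    print(f"{n} disk raidz: P(any failure)={any_failure:.4f}  P(pool loss)={pool_loss:.4f}")
```

Whatever value is chosen for `P_FAIL`, both probabilities come out higher for the 4 disk set than for the 3 disk set.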

The stripe has great bandwidth numbers, usable storage and IOPS, but any single disk failure would cause the pool to fail and lose ALL your data. Did I mention that good storage is NOT a substitute for a GOOD backup? This pool offers no data redundancy and is not easily expanded when the usable storage is exhausted.

The mirror has great bandwidth numbers and a higher cost per usable storage, and it allows the failure of up to 2 disks, as long as they are not both in the same mirror set. It has roughly twice the write bandwidth and up to 4 times the read performance of a single disk, as ZFS is capable of reading from all disks in the mirror in parallel. This configuration will most likely provide the best balance of performance and data protection, at the expense of disks or usable storage. Expanding or growing this pool when the usable storage is being exhausted is also simple.
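How many 2 disk failures the 2+2 mirror really tolerates can be checked by enumerating them. The sketch below uses made-up disk names and assumes the layout from the MIRROR row (two sets of two disks each).

```python
from itertools import combinations

# Hypothetical disk names: two 2-way mirror sets, as in the MIRROR row of the table.
MIRRORS = [("a1", "a2"), ("b1", "b2")]
DISKS = [d for pair in MIRRORS for d in pair]


def survives(failed):
    """The pool survives while every mirror set keeps at least one working disk."""
    return all(any(d not in failed for d in pair) for pair in MIRRORS)


two_disk_failures = list(combinations(DISKS, 2))
survivable = [f for f in two_disk_failures if survives(set(f))]
print(f"{len(survivable)} of {len(two_disk_failures)} two-disk failures are survivable")
```

Any single disk failure is survivable, but losing both halves of the same mirror set still takes the whole pool down.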

Hopefully this will help you architect pools that suit your workload, cost dynamics and growth needs.


Constantin said...

Thanks for going through all the permutations, that's quite thorough :).

By now, I sound like a broken record, but let me repeat: Disks are cheap. 1TB costs less than $100. Spending money on disks should be your last concern. Treat them like ink tanks for your printer.

What's not cheap is your data. Few people think of backups when storing data at home, which makes it all the more important to prioritize disk failure resistance over capacity.

At 3-4 disks, there aren't very many options, and given that consumer disks fail often and quickly, it makes sense to use one of the disks as a hot spare.

So if you have only 3 disks, use one as a hot spare and mirror the rest. You'll get good performance and good fault tolerance, and since capacity is cheap, you should spend the extra cash for capacity, knowing that it's more important to keep your data safe. You can even sync the hot spare into your mirror right away, before anything breaks, by creating a 3-way mirror!

With 4 disks, you don't have much choice. Either you go with RAID-Z (but then you need to replace any failed disk very quickly), or you go with the 2+2 mirror. Which I'd recommend.

Expanding mirrors is easy: You either add another pair or you exchange 2 disks for bigger ones (by zpool attaching them, then zpool detaching the old ones). Either way, the granularity of disks to spend money on is 2.

Expanding RAID-Z is more complicated. Either you zpool add a mirror, which creates a mixed pool config (RAID-Z + mirror) which is awkward, or you expand your pool evenly by adding sets of the same size (3 disks or more). Which is a bigger granularity.

I'm currently using 3 disks for my main pool. I started with 2 x 1.5 TB. When one of them reported a few read failures, I added a 2 TB one as a 3rd mirror half. When one of the 1.5 TB disks fails for good, I'll buy the 2nd 2TB disk and rip out both 1.5 TB disks. That will rejuvenate my pool while growing its size organically, as I react to inevitable disk failures.
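The growth path described above can be sketched numerically. The only rule used is that a mirror set is as large as its smallest member. The function name and the step list are illustrative assumptions, and the capacities ignore formatting overhead.

```python
def mirror_capacity_tb(disk_sizes_tb):
    """Usable size of one mirror set: limited by its smallest member."""
    return min(disk_sizes_tb)


# Steps paraphrased from the comment above; sizes in TB.
steps = [
    ("start with 2 x 1.5 TB", [1.5, 1.5]),
    ("attach a 2 TB disk as a 3rd mirror half", [1.5, 1.5, 2.0]),
    ("attach a 2nd 2 TB disk, pull both 1.5 TB disks", [2.0, 2.0]),
]
for label, disks in steps:
    print(f"{label:<50} usable: {mirror_capacity_tb(disks)} TB")
```

The usable size only grows once the last small disk has left the mirror.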

A more thorough discussion of this is in my blog (including a lot of comments): Home Server: RAID-GREED and Why Mirroring is Still Best


Anonymous said...

Thanks for sharing, it is very important.  ̄︿ ̄

dit said...


I have 4 disks:
1x500GB as filesystem
3x500GB as databank raidz1

What if the 1st disk fails [completely]?
Can I get my databank back automatically once I install a new filesystem on the new disk?

Thanks in advance.