RAID10 is one of the possible combinations of RAID1 (mirroring) and RAID0
(striping). There used to be confusion about what
RAID01 or RAID10 meant and different RAID vendors defined them
differently. About five years or so ago I proposed the following standard
language which seems to have taken hold. When N mirrored pairs are
striped together this is called RAID10 because the mirroring (RAID1) is
applied before striping (RAID0). The other option is to create two stripe
sets and mirror them one to the other; this is known as RAID01 (because
the RAID0 is applied first). In either a RAID01 or RAID10 system each and
every disk block is completely duplicated on its drive’s mirror.
In performance terms RAID01 and RAID10 are equivalent. The difference
comes in during recovery, where RAID01 suffers from some of the same
problems I will describe affecting RAID5 while RAID10 does not.
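As a rough illustration of the two layouts, here is a minimal Python
sketch. The drive numbering and the function names are assumptions made
for this example only, and real controllers may order drives differently.
It shows where the two copies of a logical block would land in each
scheme.

```python
def raid10_locations(block, pairs):
    """Copies of a logical block in RAID10, as (drive, offset) tuples.

    Assumed numbering: mirrored pair p consists of drives 2*p and 2*p + 1.
    """
    pair = block % pairs       # stripe across the mirrored pairs
    offset = block // pairs    # position of the block on each drive
    return [(2 * pair, offset), (2 * pair + 1, offset)]


def raid01_locations(block, drives_per_set):
    """Copies of a logical block in RAID01, as (drive, offset) tuples.

    Assumed numbering: drives 0..n-1 form stripe set A and drives
    n..2n-1 form stripe set B, which mirrors set A.
    """
    column = block % drives_per_set    # stripe within each set
    offset = block // drives_per_set
    return [(column, offset), (drives_per_set + column, offset)]


# Logical block 5 on 8 drives arranged either way.
print(raid10_locations(5, pairs=4))           # [(2, 1), (3, 1)]
print(raid01_locations(5, drives_per_set=4))  # [(1, 1), (5, 1)]
```

Either way every block exists on exactly two drives; what differs is
which drives are grouped together.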
Now if a drive in the RAID5 array dies, is removed, or is shut off, data
is returned by reading the blocks from the remaining drives and
calculating the missing data using the parity, assuming the defunct drive
is not the parity block drive for that RAID block. Note that, for a 5
drive array, it takes 4 physical reads to replace the missing disk block
for four out of every five disk blocks, leading to a 64% performance
degradation until the problem is discovered and a new drive can be mapped
in to begin recovery. Performance is degraded further during recovery
because all drives are being actively accessed in order to rebuild the
replacement drive (see below).
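To make the degraded read concrete, here is a minimal Python sketch. It
assumes the parity is a plain byte-wise XOR of the data blocks, which is
the usual RAID5 scheme. The block contents and the helper name xor_blocks
are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a 5 drive array: 4 data blocks plus 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# The drive holding data[1] has failed. Serving that block now costs
# 4 physical reads: the 3 surviving data blocks and the parity block.
survivors = [data[0], data[2], data[3], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

A healthy array would have served the same request with a single read.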
If a drive in the RAID10 array dies, data is returned from its mirror
drive in a single read. There is only a minor performance reduction (6.25%
on average for a 4 pair array as a whole) when two non-contiguous blocks
are needed from the damaged pair, since the two blocks cannot be read in
parallel from both drives, and none otherwise.
RAID5 uses ONLY ONE parity drive per stripe, and many RAID5 arrays are 5
drives: 4 data and 1 parity, though it is not a single drive that holds
all of the parity as in RAID3 & RAID4 (but read on). If your counts are
different, adjust the calculations appropriately. If you have 10 drives
of, say, 20GB each for 200GB, RAID5 will use 20% for parity (assuming you
set it up as two 5 drive arrays), so you will have 160GB of storage. Now
since RAID10, like mirroring (RAID1), uses 1 (or more) mirror drive for
each primary drive, you are using 50% for redundancy, so to get the same
160GB of storage you will need 8 pairs, or 16 20GB drives, which is why
RAID5 is so popular. This intro is just to put things into perspective.
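The arithmetic above can be checked with a short Python sketch. The
function names are hypothetical, and it assumes one parity block per
stripe for RAID5 and exactly one mirror per primary drive for RAID10.

```python
def raid5_usable_gb(drives, drive_gb, drives_per_array=5):
    """Usable capacity when the drives are split into RAID5 arrays."""
    arrays = drives // drives_per_array
    # One drive's worth of space in every array goes to parity.
    return arrays * (drives_per_array - 1) * drive_gb


def raid10_drives_needed(usable_gb, drive_gb):
    """Drives needed for the same capacity with one mirror per drive."""
    pairs = -(-usable_gb // drive_gb)  # ceiling division
    return 2 * pairs


usable = raid5_usable_gb(drives=10, drive_gb=20)
print(usable)                            # 160
print(raid10_drives_needed(usable, 20))  # 16
```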
RAID5 is physically a stripe set like RAID0 but with data recovery
included. RAID5 reserves one disk block out of each stripe block for
parity data. The parity block contains an error correction code which can
correct any error in the RAID5 block; in effect it is used in combination
with the remaining data blocks to recreate any single block that has gone
missing because a drive has failed. The innovation of RAID5 over RAID3 &
RAID4 is that the parity is distributed on a round robin basis so that
there can be independent reading of different blocks from the several
drives. This is why RAID5 became more popular than RAID3 & RAID4, which
must synchronously read the same block from all drives together. So, if
Drive2 fails, blocks 1, 2, 4, 5, 6 & 7 are data blocks on this drive and
blocks 3 and 8 are parity blocks on this drive. That means that the parity
on Drive5 will be used to recreate the data block from Drive2 if block 1
is requested before a new drive replaces Drive2 or during the rebuilding
of the new Drive2 replacement. Likewise the parity on Drive1 will be used
to repair block 2 and the parity on Drive3 will repair block 4, etc. For
block 3 all the data is safely on the remaining drives, but during the
rebuilding of Drive2's replacement a new parity block will be calculated
from the block 3 data and will be written to Drive2.
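The round robin placement in this example can be sketched in a few lines
of Python. The rotation formula is inferred from the drive numbers quoted
above (parity for block 1 on Drive5, block 2 on Drive1, block 4 on Drive3)
and is an assumption for illustration. Real controllers use a variety of
layouts.

```python
DRIVES = 5


def parity_drive(stripe):
    """Drive (1-based) holding the parity for a 1-based stripe block."""
    return (stripe - 2) % DRIVES + 1


def role_of(drive, stripe):
    """Whether the given drive holds data or parity for the stripe block."""
    return "parity" if parity_drive(stripe) == drive else "data"


placement = [parity_drive(s) for s in range(1, 9)]
print(placement)  # [5, 1, 2, 3, 4, 5, 1, 2]

# Stripe blocks for which the failed Drive2 held the parity.
drive2_parity = [s for s in range(1, 9) if role_of(2, s) == "parity"]
print(drive2_parity)  # [3, 8]
```

For stripe blocks 3 and 8 nothing needs to be reconstructed on a read,
because only parity was lost. For the others the surviving parity drive
takes part in recreating the Drive2 data.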
Now when a disk block is read from the array, the RAID software/firmware
calculates which RAID block contains the disk block, which drive the disk
block is on, and which drive contains the parity block for that RAID
block, and reads ONLY the one data drive. It returns the data block. If
you later modify the data block, it recalculates the parity by subtracting
the old block and adding in the new version, then in two separate
operations it writes the data block followed by the new parity block. To
do this it must first read the parity block from whichever drive contains
the parity for that stripe block and reread the unmodified data for the
updated block from the original drive. This read-read-write-write is known
as the RAID5 write penalty. Since these two writes are sequential and
synchronous, the write system call cannot return until the reread and both
writes complete, for safety, so writing to RAID5 is up to 50% slower than
RAID0 for an array of the same capacity. (Some software RAID5s avoid the
re-read by keeping an unmodified copy of the original block in memory.)
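The parity update behind the write penalty can be sketched as follows.
This assumes XOR parity, where "subtracting" the old block and "adding"
the new one are both the same XOR operation. The function names and block
contents are invented for the example.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data, old_parity, new_data):
    """Parity update for a single-block write.

    old_data and old_parity stand for the two reads; the returned pair
    stands for the two writes (data block first, then parity block).
    """
    new_parity = xor(xor(old_parity, old_data), new_data)
    return new_data, new_parity


# One stripe block of a 5 drive array: 4 data blocks and their parity.
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
parity = stripe[0]
for block in stripe[1:]:
    parity = xor(parity, block)

# Update the second data block without touching the other three drives.
stripe[1], parity = small_write(stripe[1], parity, b"\xff\x00")

# Cross-check against a parity recomputed from the whole stripe block.
full = stripe[0]
for block in stripe[1:]:
    full = xor(full, block)
assert parity == full
```

Only the modified data drive and the parity drive are touched, which is
why the cost is two reads and two writes regardless of the array width.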