Redundant array of independent disks
In computing, a redundant array of independent disks (more commonly known as a RAID) is a system of using multiple hard drives for sharing or replicating data among the drives. Depending on the version chosen, the benefit of RAID is one or more of increased data integrity, fault tolerance, performance, or capacity compared to single drives. In its original implementations (in which it was an abbreviation for "Redundant Array of Inexpensive Disks"), its key advantage was the ability to combine multiple low-cost devices using older technology into an array that together offered greater capacity, reliability, and/or speed than was affordably available in singular devices using the newest technology.
At the simplest level, RAID is one of many ways to combine multiple hard drives into one single logical unit. Thus, instead of seeing several different hard drives, the operating system sees only one. RAID is typically used on server computers, and is usually implemented with identically-sized disk drives. With decreases in hard drive prices and wider availability of RAID options built into motherboard chipsets, RAID is also being found and offered as an option in higher-end end user computers, especially computers dedicated to storage-intensive tasks, such as video and audio editing.
The original RAID specification suggested a number of prototype "RAID Levels", or combinations of disks. Each had theoretical advantages and disadvantages. Over the years, different implementations of the RAID concept have appeared. Most differ substantially from the original idealized RAID levels, but the numbered names have remained. This can be confusing, since one implementation of RAID 5, for example, can differ substantially from another. RAID 3 and RAID 4 are often confused and even used interchangeably.
The very definition of RAID has been argued over the years. The use of the term redundant leads many to split hairs over whether RAID 0 is "real" RAID. Similarly, the change from inexpensive to independent confuses many as to the intended purpose of RAID. There are even some single-disk implementations of the RAID concept. For the purpose of this article, we will say that any system which employs the basic RAID concepts to recombine physical disk space for purposes of reliability or performance is a RAID system.
History
RAID was first patented by IBM in 1978. In 1988, RAID levels 1 through 5 were formally defined by David A. Patterson, Garth A. Gibson and Randy H. Katz in the paper "A Case for Redundant Arrays of Inexpensive Disks (RAID)" (http://www-2.cs.cmu.edu/~garth/RAIDpaper/Patterson88.pdf), published in the proceedings of the 1988 SIGMOD Conference, pp. 109–116. The term "RAID" started with this paper.
It was particularly ground-breaking work in that the concepts were both novel and, in retrospect, "obvious" once they had been described. This paper spawned the entire disk array industry.
RAID implementations
Inexpensive vs. independent
While the "I" in RAID now generally means independent, rather than inexpensive, one of the original benefits of RAID was that it did use inexpensive equipment, and this still holds true in many situations, where IDE/ATA disks are used.
More commonly, independent (more expensive) SCSI hard disks are used, although the cost of such disks is now much lower than it once was—and much lower than the systems RAID was originally intended to replace.
Hardware vs. software
RAID can be implemented either in dedicated hardware or custom software running on standard hardware.
With a software implementation, the operating system manages the disks of the array through the normal drive controller (IDE, SCSI, Fibre Channel or any other). With present CPU speeds, software RAID can be faster than hardware RAID, though at the cost of using CPU power which might be best used for other tasks. One major exception is where the hardware incorporates a battery backed up write cache and an application, like a database server, is flushing writes to secure storage to preserve data at a known point if there is a crash. In this case the software solution is limited to no more flushes than the number of rotations or seeks per second of the drives, while the hardware approach is faster and limited instead by RAM speeds, the amount of cache and how fast it can flush the cache to disk. For this reason, battery-backed caching disk controllers are often recommended for high transaction rate database servers.
A hardware implementation of RAID requires (at a minimum) a special-purpose RAID controller. On a desktop system, this may be a PCI expansion card, or might be a capability built in to the motherboard. In larger RAIDs, the controller and disks are usually housed in an external multi-bay enclosure. The disks may be IDE, ATA, SATA, SCSI, or Fibre Channel while the controller links to the host computer(s) with one or more high-speed SCSI, Fibre Channel or iSCSI connections, either directly, or through a fabric, or is accessed as Network Attached Storage. This controller handles the management of the disks, and performs parity calculations (needed for many RAID levels). This option tends to provide better performance, and makes operating system support easier. Hardware implementations also typically support hot swapping, allowing failed drives to be replaced while the system is running.
Both hardware and software versions may support the use of a hot spare, a preinstalled drive which is used to immediately (and almost always automatically) replace a failed drive.
Standard RAID levels
RAID 0
A RAID 0 (also known as a striped set) splits data evenly across two or more disks with no parity information for redundancy. It is important to note that RAID 0 was not one of the original RAID levels, and is not redundant. RAID 0 is normally used to increase performance, although it is also a useful way to create a small number of large virtual disks out of a large number of small physical ones. Although RAID 0 was not specified in the original RAID paper, an idealized implementation of RAID 0 would split I/O operations into equal-sized blocks and spread them evenly across two disks. RAID 0 implementations with more than two disks are also possible; however, the reliability of a given RAID 0 set is equal to the average reliability of each disk divided by the number of disks in the set. That is, reliability (as measured by mean time between failures (MTBF)) is inversely proportional to the number of members, so a set of two disks is half as reliable as a single disk. The reason is that the file system is distributed across all disks: when a drive fails, the file system cannot cope with such a large loss of data and coherency, since the data is "striped" across all drives. Data can be recovered with special tools, but it will be incomplete and most likely corrupt.
Traditional RAID 0
  Disk 1  Disk 2
    A1      A2
    A3      A4
    A5      A6
    A7      A8
Note: A1, A2, et cetera each represent one data byte.
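To make the striping rule concrete, here is a minimal Python sketch (an illustration of the idealized scheme described above, not any particular implementation) that maps a logical block number to a member disk and an offset on that disk:

    def raid0_locate(logical_block: int, n_disks: int) -> tuple[int, int]:
        """Map a logical block to (member disk, block offset on that disk).

        Blocks are dealt out round-robin across the members, which is the
        idealized striping described above.
        """
        disk = logical_block % n_disks      # which member disk holds the block
        offset = logical_block // n_disks   # where on that disk it lands
        return disk, offset

    # The eight blocks A1..A8 of the two-disk diagram above (disks numbered from 0):
    for block in range(8):
        disk, offset = raid0_locate(block, 2)
        print(f"A{block + 1} -> disk {disk}, offset {offset}")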
RAID 0 is useful for setups such as large read-only NFS servers where mounting many disks is time-consuming or impossible and redundancy is irrelevant. Another use is where the number of disks is limited by the operating system. In Microsoft Windows, the number of drive letters for hard disk drives may be limited to 24, so RAID 0 is a popular way to use more than this many disks. However, since there is no redundancy and the data is shared between drives, a drive cannot simply be swapped out on failure: all disks are dependent upon each other.
Concatenation (JBOD)
Although a concatenation of disks (also called JBOD, or "Just a Bunch of Disks") is not one of the numbered RAID levels, it is a popular method for combining multiple physical disk drives into a single virtual one. As the name implies, disks are merely concatenated together, end to beginning, so they appear to be a single large disk.
In this sense, concatenation is akin to the reverse of partitioning. Whereas partitioning takes one physical drive and creates two or more logical drives, JBOD uses two or more physical drives to create one logical drive.
In that it consists of an Array of Inexpensive Disks (with no redundancy), it can be thought of as a distant relation to RAID. JBOD is sometimes used to turn several odd-sized drives into one useful drive. For example, JBOD could combine a 3 GB, 15 GB, 5.5 GB, and 12 GB drive into a single 35.5 GB logical drive, arguably more useful than the individual drives separately.
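As a rough sketch (using the drive sizes from the example above; the code is illustrative, not vendor firmware), concatenation can be modeled as resolving a linear offset by walking the drives in order:

    # Drives are concatenated end to beginning; sizes in GB match the example.
    DRIVE_SIZES_GB = [3, 15, 5.5, 12]

    def jbod_locate(offset_gb: float) -> tuple[int, float]:
        """Return (drive index, offset within that drive) for a logical offset."""
        for drive, size in enumerate(DRIVE_SIZES_GB):
            if offset_gb < size:
                return drive, offset_gb
            offset_gb -= size
        raise ValueError("offset beyond end of the 35.5 GB logical drive")

    print(jbod_locate(2.0))    # (0, 2.0)  - still on the 3 GB drive
    print(jbod_locate(20.0))   # (2, 2.0)  - 3 + 15 GB consumed, 2 GB into drive 2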
RAID 1
A RAID 1 creates an exact copy (or mirror) of all data on two or more disks. This is useful for setups where redundancy is more important than using all the disks' maximum storage capacity. The array can only be as big as the smallest member disk, however. An ideal RAID 1 set contains two disks, which increases reliability by a factor of two over a single disk, but it is possible to have many more than two copies. Since each member can be addressed independently if another fails, reliability increases linearly with the number of members. RAID 1 can also provide enhanced read performance, since many implementations can read from one disk while the other is busy.
One common practice is to create an extra mirror of a volume (also known as a Business Continuance Volume or BCV) which is meant to be split from the source RAID set and used independently. In some implementations, these extra mirrors can be split and then incrementally re-established, instead of requiring a complete RAID set rebuild.
Traditional RAID 1
  Disk 1  Disk 2
    A1      A1
    A2      A2
    A3      A3
    A4      A4
Note: A representation of a typical RAID 1. Data A1, A2, et cetera is duplicated on both disks, increasing reliability and speed.
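The following minimal Python sketch (with dictionaries standing in for member disks; an illustration only) shows why mirroring tolerates a member failure: every write lands on all members, and a read can be served by any surviving one:

    disks = [dict(), dict()]            # two mirrored members (block -> data)

    def mirror_write(block: int, data: bytes) -> None:
        for disk in disks:              # identical copy on every member
            disk[block] = data

    def mirror_read(block: int, preferred: int = 0) -> bytes:
        # Any surviving member can satisfy the read, which is also why
        # implementations can overlap reads across the mirrors.
        for i in range(len(disks)):
            disk = disks[(preferred + i) % len(disks)]
            if block in disk:
                return disk[block]
        raise IOError("block lost on all mirrors")

    mirror_write(0, b"A1")
    disks[0].clear()                    # simulate failure of the first member
    print(mirror_read(0))               # b'A1', served from the surviving mirror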
RAID 2
A RAID 2 stripes data at the bit (rather than block) level, and uses a Hamming code for error correction. The disks are synchronized by the controller to run in perfect tandem. This is the only original level of RAID that is not currently used.
RAID 3
A RAID 3 uses byte-level striping with a dedicated parity disk. RAID 3 is extremely rare in practice. One of the side effects of RAID 3 is that it generally cannot service multiple requests simultaneously. This comes about because any single block of data will by definition be spread across all members of the set and will reside in the same location, so any I/O operation requires activity on every disk.
In the example below, a request for block "A1" would require all three data disks to seek to the beginning and reply with their contents. A simultaneous request for block B1 would have to wait.
Traditional RAID 3
  Disk 1  Disk 2  Disk 3  Disk 4 (parity)
    A1      A2      A3      Ap(1-3)
    A4      A5      A6      Ap(4-6)
    A7      A8      A9      Ap(7-9)
    B1      B2      B3      Bp(1-3)
Note: A1, B1, et cetera each represent one data byte.
RAID 4
A RAID 4 uses block-level striping with a dedicated parity disk. RAID 4 looks similar to RAID 3 except that it stripes at the block, rather than the byte level. This allows each member of the set to act independently when only a single block is requested. If the disk controller allows it, a RAID 4 set can service multiple read requests simultaneously. Network Appliance uses RAID 4 on their Filer line of network storage servers.
In the example below, a request for block "A1" would be serviced by disk 1. A simultaneous request for block B1 would have to wait, but a request for B2 could be serviced concurrently.
Traditional RAID 4
  Disk 1  Disk 2  Disk 3  Disk 4 (parity)
    A1      A2      A3      Ap
    B1      B2      B3      Bp
    C1      C2      C3      Cp
    D1      D2      D3      Dp
Note: A1, B1, et cetera each represent one data block.
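Dedicated parity is conventionally a bytewise XOR of the data blocks in a stripe. The Python sketch below (illustrative only, assuming XOR parity) also shows the read-modify-write shortcut for small writes; because every write must update the parity disk, that disk becomes the bottleneck of RAID 4:

    from functools import reduce

    def xor_blocks(blocks):
        """Bytewise XOR of equal-sized blocks."""
        return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

    stripe = [b"A1", b"A2", b"A3"]          # data blocks on disks 1-3
    parity = xor_blocks(stripe)             # Ap, stored on the parity disk

    # Small write: the parity can be updated without reading the whole stripe,
    # since new_parity = old_parity XOR old_data XOR new_data.
    old, new = stripe[1], b"Z2"
    parity = xor_blocks([parity, old, new])
    stripe[1] = new
    assert parity == xor_blocks(stripe)     # same as recomputing from scratch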
RAID 5
A RAID 5 uses block-level striping with parity data distributed across all member disks. RAID 5 is one of the most popular RAID levels, and is frequently used in both hardware and software implementations. Virtually all storage arrays offer RAID 5.
In the example below, a request for block "A1" would be serviced by disk 1. A simultaneous request for block B1 would have to wait, but a request for B2 could be serviced concurrently.
Traditional RAID 5
  Disk 1  Disk 2  Disk 3  Disk 4
    A1      A2      A3      Ap
    B1      B2      Bp      B3
    C1      Cp      C2      C3
    Dp      D1      D2      D3
Note: A1, B1, et cetera each represent one data block.
Every time a data "block" (sometimes called a "chunk") is written on a disk in an array, a parity block is generated within the same stripe. (A block or chunk is often composed of many consecutive sectors on a disk, sometimes as many as 256 sectors. A series of chunks [a chunk from each of the disks in an array] is collectively called a "stripe".) If another block, or some portion of a block is written on that same stripe, the parity block (or some portion of the parity block) is recalculated and rewritten. The disk used for the parity block is staggered from one stripe to the next, hence the term "distributed parity blocks".
Interestingly, the parity blocks are not read on data reads, since this would be unnecessary overhead and would diminish performance. The parity blocks are read, however, when a read of a data sector results in a cyclic redundancy check (CRC) error. In this case, the sector in the same relative position within each of the remaining data blocks in the stripe and within the parity block in the stripe are used to reconstruct the errant sector. The CRC error is thus hidden from the main computer. Likewise, should a disk fail in the array, the parity blocks from the surviving disks are combined mathematically with the data blocks from the surviving disks to reconstruct the data on the failed drive "on-the-fly".
This is sometimes called Interim Data Recovery Mode. The main computer is unaware that a disk drive has failed. Reading and writing to the drive array continues seamlessly, though with some performance degradation.
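A minimal Python sketch of this reconstruction, assuming bytewise XOR parity (the usual construction): XOR-ing the surviving blocks of a stripe, data and parity alike, reproduces whatever single block was lost.

    from functools import reduce

    def xor_blocks(blocks):
        """Bytewise XOR of equal-sized blocks."""
        return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

    data = [b"A1", b"A2", b"A3"]        # one stripe's data blocks
    parity = xor_blocks(data)           # Ap, placed on a rotating member disk

    failed = 1                          # pretend the disk holding A2 died
    survivors = [b for i, b in enumerate(data) if i != failed] + [parity]
    print(xor_blocks(survivors))        # b'A2', rebuilt on the fly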
In RAID 5 arrays, which have only one parity block per stripe, the failure of a second drive results in total data loss.
The maximum number of drives is theoretically unlimited, but it is common practice to keep the maximum to 14 or fewer for RAID 5 implementations which have only one parity block per stripe. The reason for this restriction is that there is a greater likelihood of two drives in an array failing in rapid succession when there is a greater number of drives. As the number of disks in a RAID 5 increases, the MTBF for the array as a whole can even become lower than that of a single disk. This happens when the likelihood of a second disk failing out of the remaining N-1 disks, within the time it takes to detect, replace and recreate the first failed disk, becomes larger than the likelihood of a single disk failing.
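A back-of-envelope illustration of this effect (the per-disk MTBF and rebuild window below are assumed figures, not measurements, and failure rates are treated as constant over time):

    mtbf_hours = 500_000          # assumed per-disk MTBF
    rebuild_hours = 24            # assumed detect/replace/rebuild window

    def p_second_failure(n_disks: int) -> float:
        """Probability that one of the remaining N-1 disks fails during rebuild."""
        per_disk = rebuild_hours / mtbf_hours      # small-probability approximation
        return 1 - (1 - per_disk) ** (n_disks - 1)

    for n in (4, 8, 14, 28):
        print(f"{n:2d} disks: {p_second_failure(n):.6f}")

The probability of a second failure during the rebuild window grows roughly linearly with the number of members, which is the basis of the rule of thumb above.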
One should be aware that many disks together increase heat, which lowers the real-world MTBF of each disk. Additionally, a group of disks bought at the same time may reach the end of their Bathtub Curve together, noticeably lowering the effective MTBF of the disks during that time.
In implementations with more than 14 drives, or in situations where extreme redundancy is needed, RAID 5 with dual parity (also known as RAID 6) is sometimes used, since it can survive the failure of up to two disks.
RAID 6
A RAID 6 uses block-level striping with two sets of parity data distributed across all member disks. It was not one of the original RAID levels.
In RAID 6, parity is generated and written to two distributed parity stripes, on two separate drives, using a different parity stripe in each "direction" of the two-dimensional layout.
Traditional RAID 5     Typical RAID 6
  A1 A2 A3 Ap            A1 A2 A3 p4 Dp
  B1 B2 Bp B3            B1 B2 p3 Cp B3
  C1 Cp C2 C3            C1 p2 Bp C2 C3
  Dp D1 D2 D3            p1 Ap D1 D2 D3
Note: A1, B1, et cetera each represent one data block.
RAID 6 is more redundant than RAID 5, but is very inefficient when used with a small number of drives. See also Double parity below for another, more redundant implementation.
Nested RAID Levels
Many storage controllers allow RAID levels to be nested. That is, one RAID can use another as its basic element, instead of using physical disks. You can think of the RAID arrays as layered on top of each other, with physical disks at the bottom.
Nested RAID arrays are usually signified by joining the numbers indicating the RAID levels into a single number, sometimes with a '+' in between. For example, RAID 10 (or RAID 1+0) conceptually consists of multiple RAID 1 arrays stored on physical disks with a RAID 0 array on top, striped over the RAID 1 arrays. In the case of RAID 0+1, it is most often called RAID 0+1 as opposed to RAID 01 to avoid confusion with RAID 1. In contrast, when the top array is a RAID 0 (e.g. RAID 10 and RAID 50), most vendors choose to omit the '+', probably because RAID 50 sounds fancier than the more explanatory RAID 5+0.
When nesting RAID levels, a RAID type that provides redundancy is typically combined with RAID 0 to boost performance. With these configurations it is preferable to have RAID 0 on top and the redundant RAID array at the bottom, because fewer disks then need to be regenerated when a disk fails. (Thus, RAID 10 is for example preferable to RAID 0+1.)
RAID 0+1
A RAID 0+1 (also called RAID 01, although it must not be confused with RAID 1) is a RAID used for both replicating and sharing data among disks. The difference between RAID 0+1 and RAID 10 is the location of each RAID level: RAID 0+1 is a mirror of stripes. Consider an example of RAID 0+1: six 120 GB drives need to be set up on a RAID 0+1. Below is an example configuration:
               RAID 1
                  |
        /------------------\
        |                  |
      RAID 0             RAID 0
  /-----------\      /-----------\
  |     |     |      |     |     |
120GB 120GB 120GB  120GB 120GB 120GB
The maximum storage space here is 360 GB, spread across two arrays. The advantage is that when a hard drive fails in one of the RAID 0s, the missing data can be transferred from the other array. However, adding an extra hard drive requires adding two hard drives, one per stripe, to keep storage balanced between the arrays.
It is not as robust as RAID 10 and cannot tolerate two simultaneous disk failures unless the two failed disks belong to the same stripe. That is to say, once a single disk fails, each of the disks in the other stripe becomes an individual single point of failure. Also, once the failed disk is replaced, all the disks in the array must participate in rebuilding its data.
RAID 10
A RAID 10, sometimes called RAID 1+0, is similar to a RAID 0+1 except that the RAID levels used are reversed: RAID 10 is a stripe of mirrors. Below is an example where three collections of 120 GB RAID 1s are striped together to add up to 360 GB of total storage space:
                RAID 0
                  |
  /-------------------------------\
  |               |               |
RAID 1          RAID 1          RAID 1
/------\        /------\        /------\
|      |        |      |        |      |
120GB 120GB    120GB 120GB    120GB 120GB
One drive from each RAID 1 set could fail without damaging the data. However, if the failed drive is not replaced, the single working hard drive in the set then becomes a single point of failure for the entire array. If that single hard drive then fails, all data stored in the entire array is lost.
Extra 120 GB hard drives could be added to any one of the RAID 1s to provide extra redundancy. Unlike RAID 0+1, the "sub-arrays" do not all have to be upgraded at once.
RAID 1.5
RAID 1.5 supports both striping and mirroring. Both RAID 1.5 and RAID 15 combine striping (read access over two drives simultaneously) and mirroring (data is written as in RAID 1).
The controller handles physical striping: data are alternately written to (or read from) one disk and then the other, maximizing the data stream because both drives are being used, similar to RAID 0. Unlike RAID 0, however, the capacity available with RAID 1.5 equals the capacity of a single hard drive. RAID 1.5 thus aims to combine the read performance of striping with the data security of mirroring.
This makes it as quick at sequential reads as RAID 0, while writes are as fast as RAID 1.
RAID 50 (RAID 5+0)
A RAID 50 combines the block-level striping with distributed parity of RAID 5, with the straight block-level striping of RAID 0. This is a RAID 0 array striped across RAID 5 elements.
Below is an example where three RAID 5 sets, each built from three 120 GB drives, are striped together to add up to 720 GB of total storage space:
                           RAID 0
                              |
      /--------------------------------------------\
      |                      |                     |
    RAID 5                 RAID 5                RAID 5
/-------------\      /-------------\      /-------------\
|     |     |        |     |     |        |     |     |
120GB 120GB 120GB  120GB 120GB 120GB  120GB 120GB 120GB
One drive from each of the RAID sets could fail without damaging the data. However, if the failed drive is not replaced, the remaining working drives in that set then become a single point of failure for the entire array. If one of those drives then fails, all data stored in the entire array is lost. The time spent in recovery (detecting and responding to a drive failure, and rebuilding onto the newly inserted drive) represents a period of vulnerability for the RAID set.
In the example below, datasets may be striped across both RAID sets. A dataset with 5 blocks would have 3 blocks written to the 1st RAID set, and the next 2 blocks written to RAID set 2.
RAID Set 1       RAID Set 2
  A1 A2 A3 Ap      A4 A5 A6 Ap
  B1 B2 Bp B3      B4 B5 Bp B6
  C1 Cp C2 C3      C4 Cp C5 C6
  Dp D1 D2 D3      Dp D4 D5 D6
Note: A1, B1, et cetera each represent one data block.
The configuration of the RAID sets will impact overall fault tolerance. A construction of three seven-drive RAID 5 sets has higher capacity and storage efficiency, but can tolerate at most three drive failures (one per set). A construction of seven three-drive RAID 5 sets can handle as many as seven drive failures but has lower capacity and storage efficiency.
RAID 50 improves upon the performance of RAID 5 particularly during writes, and provides better fault tolerance than a single RAID level does. This level is recommended for applications that require high fault tolerance, capacity and random positioning performance.
As the number of drives in a RAID set and the capacity of the drives increase, the time needed to rebuild the set after a failure grows correspondingly, lengthening this period of vulnerability.
Proprietary RAID levels
Although all implementations of RAID differ from the idealized specification to some extent, some companies have developed entirely proprietary RAID implementations that differ substantially from the rest of the crowd.
Double parity
One common addition to the existing RAID levels is double parity, sometimes implemented and known as diagonal parity. As in RAID 6, there are two sets of parity check information, but unlike RAID 6, the second set is not a mere "extra copy" of the first. Rather, most implementations of double parity calculate the extra parity against a different group of blocks. While traditional RAID 5 and RAID 6 calculate parity against one group of blocks (in the diagrams below, all the A-lettered blocks produce one or more parity blocks), double parity also calculates parity against a second, different group: in addition to all the A-lettered blocks, one can calculate parity across all the 1-numbered blocks.
Traditional RAID 5     Typical RAID 6       Double parity RAID 5
  A1 A2 A3 Ap            A1 A2 Ap Ap          A1 A2 A3 Ap
  B1 B2 Bp B3            B1 Bp B2 Bp          B1 B2 Bp B3
  C1 Cp C2 C3            Cp C1 Cp C2          C1 Cp C2 C3
  Dp D1 D2 D3            Dp Dp D1 D2          1p 2p 3p --
Note: A1, B1, et cetera each represent one data block.
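An illustrative Python sketch of the two-direction idea, assuming bytewise XOR parity in both directions (row parity across the letter groups, column parity across the number groups, as in the rightmost diagram above):

    from functools import reduce

    def xor_blocks(blocks):
        """Bytewise XOR of equal-sized blocks."""
        return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

    grid = [[b"A1", b"A2", b"A3"],      # rows: letter groups
            [b"B1", b"B2", b"B3"],
            [b"C1", b"C2", b"C3"]]

    row_parity = [xor_blocks(row) for row in grid]              # Ap, Bp, Cp
    col_parity = [xor_blocks(list(col)) for col in zip(*grid)]  # 1p, 2p, 3p

    # Losing two blocks in one row defeats single (row) parity, but the
    # column parities recover each lost block independently.
    missing = (1, 0), (1, 2)            # B1 and B3 gone
    for r, c in missing:
        column = [grid[i][c] for i in range(3) if i != r] + [col_parity[c]]
        print(xor_blocks(column))       # b'B1', then b'B3'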
RAID 7
RAID 7 is a trademark of Storage Computer Corporation. It adds caching to RAID 3 or RAID 4 to improve performance.
RAID S or Parity RAID
RAID S is EMC Corporation's proprietary striped parity RAID system used in their Symmetrix storage systems. Each volume exists on a single physical disk, and multiple volumes are arbitrarily combined for parity purposes. EMC originally referred to this capability as RAID S, and then renamed it Parity RAID for the Symmetrix DMX platform. EMC now offers standard striped RAID 5 on the Symmetrix DMX as well.
Traditional RAID 5     EMC RAID S
  A1 A2 A3 Ap            A1 B1 C1 1p
  B1 B2 Bp B3            A2 B2 C2 2p
  C1 Cp C2 C3            A3 B3 C3 3p
  Dp D1 D2 D3            A4 B4 C4 4p
Note: A1, B1, et cetera each represent one data block. A, B, et cetera are entire volumes.
Matrix RAID
Matrix RAID is a feature that first appeared in the Intel ICH6R RAID BIOS. It is not a new RAID level. Matrix RAID utilizes two physical disks. Half of each disk is assigned to a RAID 0 array, the other half to a RAID 1 array. Currently, most, if not all, other inexpensive RAID BIOS products allow a disk to participate in only a single RAID array. The product targets home users, providing a safe area (the RAID 1) for documents and other items that one wishes to store redundantly, and a faster area (the RAID 0) for the operating system, applications, etc.
External links
- RAID Disk Space Calculator (http://www.ibeast.com/content/tools/RaidCalc/RaidCalc.asp)
- PCGuide.com's extremely detailed RAID pages (http://www.pcguide.com/ref/hdd/perf/raid/index-c.html)
- Disk Based Backup: All Hype or the Best Protection for your Data? (http://www.windowsecurity.com/articles/Disk-Based-Backup.html)
- Reference Guide — Hard Disk Drives (http://storagereview.com/guide/guide_index.html)
- RAID Levels — Tutorial and Diagrams (http://www.acnc.com/raid.html)
- Why RAID may not be a good idea for a home PC (http://www.bestpricecomputers.co.uk/reviews/Home-PC-RAID/index.htm)
- Slashdot: Which RAID for a Personal Fileserver? (http://slashdot.org/article.pl?sid=04/06/16/1658250) — "Ask Slashdot" article with 800+ comments and suggestions from network admins, technicians, or otherwise 'geeks' of the web.
- Experiences w/ Software RAID 5 Under Linux? (http://ask.slashdot.org/article.pl?sid=04/10/30/184256) — "Ask Slashdot" article on RAID 5.
- Linux Software-RAID HOWTO (http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html)
- Floppy Disk RAID (http://ohlssonvox.8k.com/fdd_raid.htm) — A humorous experiment to create a RAID array using very inexpensive disks: floppy disks!
- iPod shuffle RAID (http://www.wrightthisway.com/Articles/000154.html) — Another humorous project using iPod Shuffles instead of floppy disks.