HS-RAID 2: Optimizing Small Write Performance in HS-RAID

Abstract. HS-RAID (Hybrid Semi-RAID), a power-aware RAID, saves energy by grouping the disks in the array. All of the write operations in HS-RAID are small writes, which degrade the storage system's performance severely. In this paper, we propose a redundancy algorithm, the data incremental parity algorithm (DIP), which is employed in HS-RAID to minimize the write penalty and improve the performance and reliability of storage systems. The experimental results show that HS-RAID 2 (HS-RAID with DIP) is remarkably faster and more reliable than HS-RAID.


Introduction
RAID (Redundant Array of Independent Disks) [1, 2] combines multiple disk drives into a logical unit for the purposes of data redundancy or performance improvement. However, storage systems are growing dramatically with the increase in data. In April 2014, IDC reported that 4.4 ZB of data had been created by 2013 and that the digital universe will grow 40% a year into the next decade [3]. To meet the capacity demand, storage systems have grown to petabyte scale [4], with thousands of disks deployed in a single system. This causes a problem that cannot be ignored: high power consumption.
Many studies have focused on power saving in storage systems in recent years. The reason is that higher power consumption leads not only to higher power costs for both the storage and cooling systems, but also to increased operating temperature, which can degrade the reliability and stability of the whole system. Therefore, researchers have proposed strategies at several levels, such as DPM (Dynamic Power Management) algorithms, the physical device level, and the system level [5].
DPM algorithms turn disk devices into a standby state to reduce power consumption during their idle periods. However, DPM works only on independent disks and does not work in RAID. At the physical device level, manufacturers are developing new energy-efficient drives and hybrid drives. A hybrid drive combines a NAND flash solid-state drive (SSD) with a hard disk drive (HDD), with the intent of adding some of the speed of SSDs to the cost-effective storage capacity of traditional HDDs. The SSD in a hybrid drive acts as a cache for the data stored on the HDD, keeping copies of the most frequently used data on the SSD for improved overall performance and energy saving.
At the system level, a number of integrated storage solutions such as MAID [6] and PERGAMUM [7] have emerged, based on the general principle of transitioning disks automatically to a low-power (standby) state after some predetermined period of inactivity. PARAID [8] exploits the free space on the disks to duplicate data and uses a skewed striping pattern to adapt to the system load by varying the number of powered disks, thus needing no specialized hardware. eRAID [9] focuses on conventional mirrored disk array architectures like RAID 1; its power-saving effect is limited on parity-redundant disk arrays like RAID 5. Hibernator [10] makes use of multispeed disks and abstracts the power-saving problem into an optimization problem, exploiting the optimal solution in data migration between disks. EERAID [11] is another energy-efficient RAID system architecture, which conserves energy by taking advantage of redundant information. S-RAID [12] is an alternative RAID data layout for applications that exhibit a sequential data access pattern. The data layout of S-RAID uses a grouping strategy that makes only part of the whole array active and puts the rest of the array into standby mode. However, even in sequential data access storage systems there are many random data accesses, and these degrade the performance of S-RAID dramatically. In our previous work, we proposed HS-RAID [13], an alternative RAID data layout based on S-RAID, to avoid the effects of random data access and save the power consumption of the storage system. HS-RAID is divided into two parts: RAID 1 and S-RAID 4/5. The first is composed of SSDs for metadata storage, while the latter is composed of HDDs for data storage. HS-RAID is designed for applications that exhibit a sequential data access pattern; it uses a grouping strategy that makes part of the whole array active and puts the rest of the array into standby mode. Hence, HS-RAID can greatly reduce power consumption and improve reliability while still satisfying the I/O requirements of applications.
However, different RAID levels store data using a variety of striping, mirroring, and parity techniques. RAID schemes based on parity improve the reliability of storage systems by maintaining a parity disk for recovery, but parity calculations degrade performance, especially for small writes. The same problem exists in HS-RAID, whose parity schemes are based on RAID 4/5 and in which all writes are small writes.
This paper describes and evaluates a powerful parity algorithm, the data incremental parity algorithm (DIP), for eliminating the small-write penalty in HS-RAID. DIP calculates the parity from the new data and the old parity and does not read the old data of the blocks that will be written.
The remainder of this paper is organized as follows. Section 2 introduces the HS-RAID data layout. Following that, Section 3 gives a detailed discussion of the DIP algorithm and the write operation in HS-RAID 2. Then, Section 4 details data recovery in HS-RAID 2. The experimental results are presented in Section 5. Section 6 closes with a summary.

HS-RAID Data Layout
2.1. S-RAID. S-RAID [12] is an alternative RAID data layout for applications that exhibit a sequential data access pattern. Its data layout uses a grouping strategy that makes only part of the whole array active and puts the rest of the array into standby mode. Therefore, S-RAID can greatly reduce power consumption and improve reliability while still satisfying the I/O requirements of the application. S-RAID trades data transfer rate for energy efficiency and reliability and is suitable for applications like video surveillance, which require a moderate data transfer rate but large storage capacity and high reliability. These applications also exhibit the highly sequential data access pattern that S-RAID is optimized for.
However, even in sequential data access applications there are many random data accesses, which degrade the performance of S-RAID dramatically. In order to avoid the adverse effects of random data access, HS-RAID was proposed in our previous work [13]. HS-RAID includes two parts: RAID 1 and S-RAID. RAID 1 is composed of two SSDs, and S-RAID [12] is composed of a group of hard disks, as shown in Figure 1. Hard disks are grouped in S-RAID and operate in parallel within a group. Random I/O requests are mapped to RAID 1, while sequential I/O requests are mapped to S-RAID. When only one group of hard disks is busy under continuous data access, the others can be shut down or put into standby mode because there are no requests on them. This saves energy for the whole storage system while increasing cost only slightly. HS-RAID is designed for applications whose I/O characteristics are sequential access. The RAID 1 composed of SSDs in HS-RAID is divided into three partitions: two for storing the superblock and one for metadata. The virtualization manager in Figure 1 maps logical addresses to physical addresses in the way shown in Figure 2. The addresses of the two superblock partitions and S-RAID form one logical address space, while the metadata partition is managed individually. Logical addresses 0 to C_SSD/2 − 1 are mapped to the front of RAID 1, which is 4 KB for the superblock of the file system. Addresses C_SSD/2 to C_SSD/2 + C_S-RAID − 1 are mapped to S-RAID for data. Addresses C_SSD/2 + C_S-RAID to C_SSD + C_S-RAID − 1 are mapped to the second 4 KB of RAID 1, also for the superblock, where C_SSD and C_S-RAID denote the capacities of the superblock area of RAID 1 and of S-RAID, respectively. In fact, the data of both superblocks is kept consistent.
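The three-range mapping performed by the virtualization manager can be sketched as follows. This is a minimal illustration, not the paper's implementation; the capacities and the block-granular interface are assumptions (here C_SSD = 2 blocks, i.e., two 4 KB superblock copies, and C_S-RAID = 1024 blocks).

```python
# Sketch of the virtualization manager's logical-to-physical mapping in
# HS-RAID. Capacities are illustrative assumptions, in 4 KB blocks.

C_SSD = 2        # superblock area of RAID 1: two 4 KB copies
C_SRAID = 1024   # capacity of the S-RAID data partition

def map_logical(lba):
    """Map a logical block address to (device, physical block address)."""
    if lba < C_SSD // 2:                    # first superblock copy
        return ("RAID1", lba)
    if lba < C_SSD // 2 + C_SRAID:          # data area
        return ("S-RAID", lba - C_SSD // 2)
    if lba < C_SSD + C_SRAID:               # second superblock copy
        return ("RAID1", lba - C_SRAID)
    raise ValueError("logical address out of range")
```

Both superblock ranges resolve to RAID 1, which is why the two copies stay consistent: every superblock update is applied to the mirrored SSDs.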

2.2. HS-RAID.
In the metadata partition, the data is a duplicate of the metadata stored in S-RAID. A metadata write is complete only when the metadata has been written to both S-RAID and the metadata partition. The metadata, which in most sequential storage systems is written once, never modified, and read frequently, is placed in the metadata partition in order to avoid spinning up disks in standby mode.

2.3. HS-RAID 4.
HS-RAID 4 is composed of a RAID 1 and an S-RAID 4, as shown in Figure 3, which gives an example of the data layout of HS-RAID 4. There are two SSDs (SSD 0 and SSD 1) and five HDDs (D0, D1, D2, D3, and P) in the storage system. One HDD (P) is the parity disk, and the other four HDDs (D0, D1, D2, and D3) are divided into two groups (Group 0 and Group 1) of two HDDs each. The arrows in Figure 3 denote the data block sequence in Logic Block Address (LBA) ascending order. S_{i,j} denotes a data block in RAID 1 and D_{i,j} denotes a data block in S-RAID, where i and j denote the SSD or HDD number and the stripe number, respectively.

2.4. HS-RAID 5.
The fixed parity disk is the bottleneck of HS-RAID 4: it not only degrades performance but also reduces the system's reliability. Replacing S-RAID 4 with S-RAID 5 (shown in Figure 4) in HS-RAID, we obtain HS-RAID 5, which distributes the parity blocks uniformly among the disks. HS-RAID 5 is composed of a RAID 1 and an S-RAID 5. For simplicity, when we discuss HS-RAID 5 in the rest of this paper, we mean the S-RAID 5 partition. Figure 4 gives an example of the S-RAID 5 part of HS-RAID 5. As in HS-RAID 4, we also use a grouping strategy that further divides the stripes into vertical groups in HS-RAID 5. There is no fixed parity disk in HS-RAID 5; instead, the parity blocks are placed on different disks in each vertical stripe.
The arrows in Figure 4 denote the data block sequence in Logic Block Address (LBA) ascending order. D_{i,j} denotes a data block in S-RAID, where i and j denote the HDD number and the stripe number, respectively.
There are five HDDs (D0, D1, D2, D3, and D4) in HS-RAID 5, as shown in Figure 4. The five disks are divided into two groups. Each group may include different disks in different stripes, because the parity blocks are located on different disks. For example, as shown in Figure 4, in Stripe 1 Group 0 includes D0 and D1, and Group 1 includes D2 and D3. But in Stripe 2, the parity block moves to D3; Group 0 includes the same disks as in Stripe 1, but Group 1 includes D2 and D4. For the same reason, in Stripe 6, Group 0 includes D0 and D2, and Group 1 includes D3 and D4.

HS-RAID 2: HS-RAID with a New Redundancy Strategy
Before going into the details of the redundancy strategy, it is necessary to look at the write operation of HS-RAID. Writing to the RAID 1 part of HS-RAID is straightforward and is not the focus of this paper, so it should be noted that when we mention the write operation in HS-RAID in the rest of the paper, it refers to the write operation in the S-RAID part of HS-RAID.

3.1. Write Operation in HS-RAID. RAID 4/5 utilizes parity techniques to enhance storage reliability. A write operation in RAID 4/5 completes only when both the data and the parity have been written. The parity is calculated by XORing the data at the same address on every disk in the RAID:

P = D_0 ⊕ D_1 ⊕ ⋯ ⊕ D_{n−1},

where P and D_i are the parity disk and the data disks, respectively. When calculating the parity, the RAID controller selects a computation based on the write request size. The large-write parity calculation is

P_large-write = D_new ⊕ D_remaining,

where P_large-write, D_new, and D_remaining are the parity, the data on the disk(s) to be written, and the data on the remaining disk(s), respectively. After computing the parity, the data and the new parity are written to the data disks and the parity disk, respectively. The small-write parity calculation is

P_small-write = P_old ⊕ D_old ⊕ D_new,

where P_small-write, P_old, D_new, and D_old are the new parity, the old parity, the data on the disk(s) to be written, and the data to be replaced on the same disk(s), respectively. After computing the parity, the data and the new parity are written to the data disks and the parity disk, respectively.

The main goal of HS-RAID is energy saving, achieved by dividing the disks into groups. Generally, only one group is active at a time. In HS-RAID, all write operations are small writes and use the "read-modify-write" computation, in order to avoid spinning up the disks of groups that are in standby mode.
Small-write parity in HS-RAID can be computed by XORing the old and new data with the old parity, the same as in RAID. The parity calculation is

P_new = P_old ⊕ D_old ⊕ D_new.

When write requests are sent to HS-RAID, the corresponding disks are selected and made active if needed. Then, all selected blocks and the parity blocks in the same stripes are read out. Lastly, HS-RAID recalculates the parity and writes the new data and the new parity back to disk. "Read-modify-write" incurs a severe write penalty and degrades the storage system's performance dramatically.
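The read-modify-write path above can be sketched as follows. This is a minimal model, not the controller's implementation: block contents are modeled as integers and XOR stands in for the bytewise parity operation. It makes the cost visible: every small write forces two extra reads (old data and old parity) before anything can be written.

```python
# Sketch of the "read-modify-write" small-write path of RAID 4/5 and
# HS-RAID: P_new = P_old xor D_old xor D_new.

def xor(*blocks):
    """XOR together any number of blocks (modeled as ints)."""
    out = 0
    for b in blocks:
        out ^= b
    return out

def small_write(disks, parity, i, d_new):
    """Overwrite block i with d_new and update the parity in place."""
    d_old = disks[i]        # extra read 1: the old data
    p_old = parity[0]       # extra read 2: the old parity
    parity[0] = xor(p_old, d_old, d_new)
    disks[i] = d_new        # now the data and new parity can be written
```

The invariant maintained is that the parity always equals the XOR of all data blocks in the stripe, so any single lost block can be rebuilt from the others.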

3.2. HS-RAID 2: HS-RAID with DIP. HS-RAID is designed for applications that exhibit a sequential data access pattern, such as video surveillance, continuous data protection (CDP), and virtual tape library (VTL). These systems have typical workload characteristics: write-once, read-maybe, and new writes unrelated to old writes.
We propose a new parity calculation algorithm suited to these workload characteristics: the data incremental parity algorithm (DIP). In the rest of the paper, HS-RAID 2 denotes HS-RAID with DIP.
As Figure 5 shows, data in HS-RAID is always written to a new block in order. We set a pointer, LBA, to the last block that was written; its initial value is −1. In HS-RAID 2, the parity is not calculated from all blocks in the stripe, but only from the blocks that have been written.
Figure 5: Sequential write in HS-RAID.
To maximize the benefit of DIP, write alignment is used: a buffer large enough to hold a whole stripe of a group collects the data to be written.
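The write-alignment idea can be sketched as follows. The class name, the flush callback, and the sizes are illustrative assumptions, not the paper's interface; the point is only that data accumulates until one full group stripe (N disks times one stripe unit each) can be written in a single operation.

```python
# Sketch of DIP write alignment: buffer incoming data and flush only
# whole group stripes. Names and sizes are assumptions for illustration.

STRIPE_UNIT = 64 * 1024   # bytes per disk per stripe (assumed)
N = 2                     # disks per group (assumed)

class AlignedWriter:
    def __init__(self, flush):
        self.buf = bytearray()
        self.flush = flush            # callback that writes one full stripe

    def write(self, data: bytes):
        self.buf += data
        full = N * STRIPE_UNIT
        while len(self.buf) >= full:  # emit only whole group stripes
            self.flush(bytes(self.buf[:full]))
            del self.buf[:full]
```

Aligning writes to whole stripes means the parity for the stripe can always be computed from data already in memory, which is what lets DIP avoid reading old data back from disk.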
Writing to the first group and to the other groups in HS-RAID 2 proceeds in different ways.

3.2.1. Write to the First Group. When writing to the disks in the first group, the data blocks and parity blocks are empty, so the parity calculation is

P_new = D_0 ⊕ D_1 ⊕ ⋯ ⊕ D_{N−1},    (11)

where P_new and D_i are the parity and the data that will be written to the array, respectively.
In block form, it is easy to transform formula (11) into formula (12):

P_j = D_{0,j} ⊕ D_{1,j} ⊕ ⋯ ⊕ D_{N−1,j},    (12)

where P_j and D_{i,j} are the parity block and the data blocks on the disks of the first group, respectively. The disk number i and the stripe number j locate a block exactly, and N is the number of disks in the group. After calculating the parity, the data and the parity are written to the array. We can safely say that this minimizes the write penalty, since no read operations are needed.
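Formula (12) amounts to XORing the new blocks together, as the following sketch shows (blocks are again modeled as integers; this is an illustration, not the paper's code). No disk read appears anywhere in the path.

```python
# Sketch of the DIP parity calculation for the first group, formula (12):
# the stripe is empty before the write, so the parity is simply the XOR
# of the N new data blocks. No read is required.

def first_group_parity(new_blocks):
    """P_j = D_{0,j} xor D_{1,j} xor ... xor D_{N-1,j}."""
    p = 0
    for d in new_blocks:
        p ^= d
    return p
```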
Parity blocks need to be calculated when writing to the first group and when writing to the other groups, but the calculations differ.

3.2.2. Write to the Other Groups.
When writing to the disks in the other groups, the parity blocks can be computed by XORing the old parity with the new data:

P_new = P_old ⊕ D_new.    (13)

Suppose that the write requests lie in group k. We can easily transform formula (13) into the following block form:

P_{new,j} = P_{old,j} ⊕ D_{kN,j} ⊕ D_{kN+1,j} ⊕ ⋯ ⊕ D_{(k+1)N−1,j}.    (14)

Then, the data and the parity are written to the array. This also minimizes the write penalty, because the old data does not need to be read, whereas it does in RAID and HS-RAID.
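The path for the other groups can be sketched in the same integer-block model used above (an illustration, not the paper's code): only the old parity block is read, because the blocks being written were previously empty and therefore contribute nothing to the old parity.

```python
# Sketch of the DIP write path for groups after the first, formula (14):
# P_new = P_old xor (XOR of the new data blocks). One parity read replaces
# the one-data-read-plus-one-parity-read of read-modify-write.

def dip_write(old_parity, new_blocks):
    """Return the new parity for a stripe in group k > 0."""
    p = old_parity            # the only block read from disk
    for d in new_blocks:
        p ^= d
    return p
```

Because each group's blocks start out empty, incrementally folding new data into the parity yields exactly the XOR of all written blocks, so single-disk recovery still works as in RAID 4/5.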

Data Recovery in HS-RAID 2
Disk failures happen every day, even every hour, in data centers. As with RAID 4/5, only one disk is allowed to fail in HS-RAID 2. When a disk in the array fails, it must be fixed or replaced in time, to avoid another disk failure causing the failure of the whole storage system. HS-RAID 2 not only reduces the workload of write operations and minimizes the write penalty but also reduces the workload during data recovery. In this section, we discuss how HS-RAID 2 works during data recovery after the failed disk is fixed or replaced, and we investigate the benefit in reducing the failure rate.
After the failed disk is fixed or replaced, how HS-RAID 2 recovers data depends on which disk failed. Generally, there are three cases.
(1) The Empty Disk. Although an empty data disk fails rarely, it does happen. In this case, we only need to replace the failed disk with a new one and initialize it, without writing any data to it. Note that in RAID and HS-RAID, by contrast, the data recovered from the array must be written to the new disk even though it is meaningless, because the parity of those arrays is calculated over all disks.
(2) The Disk with Data. If the failed disk is one that has data written to it, the data must be recovered immediately. The stripe numbers and disk numbers of the blocks that need to be recovered can be calculated from the pointer LBA:

j = ⌊(LBA − A_k) / N⌋,  i = k · N + (LBA − A_k) mod N,

where N and k are the number of disks in a group and the active group number, respectively. k can be calculated as

k = ⌊LBA / (N · S)⌋,

where S is the number of stripes in the array. A_k is the starting logical address of group k:

A_k = k · N · S.

Then, how to recover the data depends on whether the failed disk is in group k or not.

Experimental Results

When the write size is larger than 64 KB, the transfer rate of both HS-RAID 2 and HS-RAID improves dramatically. The peak value is 82.5 MB/s at a write size of 1 MB in HS-RAID 2, while it is 52.6 MB/s in HS-RAID. When the write size is larger than 1 MB, the speedup ratio is less than 60%, but the absolute speedup is approximately 30 MB/s.
Figure 7 shows the transfer rate speedup of HS-RAID 2 with 2 disks per group compared to HS-RAID with the same number of disks per group. The workloads are also 100% sequential writes with sizes in the 16 KB to 4096 KB range. The maximum speedup, 506%, occurs at a write size of 32 KB. When the write size is larger than 64 KB, the transfer rate of both HS-RAID 2 and HS-RAID improves dramatically. The peak value is 132.5 MB/s at a write size of 1 MB in HS-RAID 2, while it is 80.2 MB/s in HS-RAID.
Figure 8 shows the transfer rate speedup of HS-RAID 2 with 3 disks per group compared to HS-RAID with the same number of disks per group. The maximum speedup, 397%, occurs at a write size of 32 KB. When the write size is larger than 128 KB, the transfer rate of both HS-RAID 2 and HS-RAID improves dramatically. The peak value is 197.1 MB/s at a write size of 2 MB in HS-RAID 2, while it is 123.2 MB/s in HS-RAID.
The smaller the write request size, the lower the system's transfer rate; the larger the write request size, the greater the performance improvement. When writing to HS-RAID 2 and HS-RAID with 3 disks per group at a size of 16 KB, the transfer rate of HS-RAID 2 is 565% faster than that of HS-RAID; this is the maximum speedup factor. When writing with 2 disks per group at a size of 1 MB, the transfer rate of HS-RAID 2 is 52.3% faster than that of HS-RAID; this is the minimum speedup factor.
Overall, the performance of HS-RAID 2 is greatly improved compared to HS-RAID.

Conclusion
HS-RAID saves storage system power by dividing the disks in the array into groups. All of the write operations in HS-RAID are small writes, in order to avoid spinning up disks in standby mode, and small writes degrade the storage system's performance.
HS-RAID 2 has the same architecture as HS-RAID but employs a different parity calculation algorithm: DIP. DIP minimizes the write penalty in HS-RAID and improves the performance and reliability of storage systems. The experimental results show that HS-RAID 2 is faster and more reliable than the traditional method.
(a) The Failed Disk Is in Group k. Data blocks from stripe 0 to stripe j should be recovered if the failed disk is in group k. Data is recovered by XORing the blocks from group 0 to