SW RAID 10 on 2 HDD's - Theoretical Discussion

A 5 star hangout for overworked and underpaid system admins.
siterack_net
Posts: 4
Joined: 2011/12/05 04:35:23

SW RAID 10 on 2 HDD's - Theoretical Discussion

Post by siterack_net » 2011/12/05 06:47:54

SW RAID 10 on 2 HDD's - a theoretical discussion.

The flooding in Thailand has been an unfortunate event, especially for the people of Thailand, but also for the tech industry across the globe. All of us in the hosting industry, and in other tech industries, have felt the crunch of the increased price of disk-based storage. These sudden price hikes have come at a time when my small hosting company was in the middle of a fairly large expansion project. We now find ourselves unable to afford the initial expansion plan and are forced to reconsider our storage and server layouts.

So how does one maintain disk I/O performance and data integrity while keeping costs low?
Typically, I would use four Seagate Constellation .2 drives in a HW RAID 10 configuration. However, the cost is now prohibitive. I could use the lower-cost WD RE4s instead, but my server chassis are Supermicro CSE512s, which only have room for two such drives.

Since the four Seagates are too expensive, and the server only has room for two 3.5” drives, how does one go about maintaining I/O and data integrity?
I could use HW RAID 0 – good I/O, but risky.
I could use HW RAID 1 – good redundancy, but poor I/O.
I could use SW RAID 1 – good redundancy and OK I/O, since the read processes will “load balance” across the mirrors. The write processes, however, will perform like a single drive. I am also unclear on how dmraid handles load balancing on “fakeraid” controllers. (A minimal mdadm example is sketched just after this list.)
I could even pack in two 3.5” drives and two SLC SSD 2.5” drives: run the hard drives independently and spread the partitions across the two (again, no redundancy, but decent I/O), and RAID 0 the SSDs for extreme IOPS on MySQL workloads, building a big enough volume out of lower-cost, smaller SSDs.
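
For reference, here is a minimal sketch of the plain SW RAID 1 option above, assuming the two drives show up as /dev/sda and /dev/sdb with a single data partition each (the device names are placeholders, not a tested recipe):

# create the mirror: reads can be balanced across both legs, every write hits both
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# watch the initial sync and check the array state
cat /proc/mdstat
mdadm --detail /dev/md0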

As you can see, there are options, but none are as good as RAID 10.

A couple of nights ago, on a long, sleepless night, I had a novel thought.
I dreamt up the idea of using software RAID (mdadm) to create a stripe and a mirror on two HDDs.
The thought process went something like this: take two physical drives and split each into two partitions (more could be added later with LVM?), creating a total of four “main” partitions. Two of these partitions would stripe, while the other two would mirror each stripe.
It would look something like this:

D1 D2
---- ----
|S1| |S2|
|M2| |M1|
---- ----

The idea behind this is that S1 and S2 are the striped volumes, and M1 and M2 are the mirrored volumes. M1 mirrors S1 and M2 mirrors S2. This way, each physical drive contains the full data in case of a drive failure, and the mdadm software can fail over to the single surviving drive.
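
In mdadm terms, one way to build this layout is to nest two RAID 1 arrays (each pairing a stripe partition on one disk with its mirror partition on the other disk) under a RAID 0. The following is only a sketch of that idea, with assumed device names and partition numbers, not a tested recipe:

# D1 = /dev/sda (S1 on sda1, M2 on sda2); D2 = /dev/sdb (S2 on sdb1, M1 on sdb2)
# mirror pair A: S1 on D1 paired with M1 on D2
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb2
# mirror pair B: S2 on D2 paired with M2 on D1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb1 /dev/sda2
# stripe across the two mirror pairs
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2

Either disk can then fail and md0 stays up on the surviving halves. The obvious cost is that a write spanning both stripe halves touches both partition regions on each physical disk, so the heads have to seek between them.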

As it turns out, my novel idea wasn't so novel after all. Highpoint and DFI have already done something similar, dubbed RAID 1.5; a kind of pseudo RAID 15, which is where they came up with the term 1.5 (http://www.tomshardware.com/reviews/raid-1,646.html). To stay true to existing naming conventions, I am going to call my “solution” RAID 1.0.


The difference between HPT/DFI's RAID 1.5 and my RAID 1.0 is that the HPT method uses a parity bit. It is also a proprietary system and, quite frankly, no longer available. My method would use no parity bits and would remain a software-based, open-source solution. I have never been fond of parity-based RAIDs, and prefer raw, replicated data.

In benchmarking, RAID 1.5 showed great promise in server-based applications, but not much for an average user. Its performance benefits became more apparent as IOPS loads increased. I feel my RAID 1.0 idea could offer similar benefits if properly prioritized and laid out. However, RAID 1.5 had the benefit of a (sort of) hardware solution to assist with timings and processes between the RAID volumes. This RAID 1.0 would rely solely on mdadm to keep everything in sync. I am not aware of any function within mdadm that would allow for such intricate prioritization and timing. That being said, how would mdadm allow the striped volumes to work efficiently, and allow the mirrored volumes to keep the data synced, without disrupting the I/O so severely that the whole system runs like a single MFM drive from the '80s?
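
For what it's worth, md does not treat the mirror as a separate background copy during normal operation: a write to a RAID 1 member is issued to both legs as part of the same request. The only background activity is resync/recovery after a failure or rebuild, and that can be throttled so it does not starve foreground I/O. A quick sketch of the knobs involved (the numbers are only example values):

# current resync/recovery throttle, in KB/s per device
sysctl dev.raid.speed_limit_min
sysctl dev.raid.speed_limit_max

# example: keep background resync modest so foreground I/O keeps priority
sysctl -w dev.raid.speed_limit_min=1000
sysctl -w dev.raid.speed_limit_max=20000

# progress of any running resync/recovery
cat /proc/mdstat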

It would seem there would need to be two mdadm processes running, somehow bridged together to keep everything synced. You wouldn't want the mirrors copying the stripes while the stripes are trying to read/write from other processes. Then again, you wouldn't want the mirrors to act too slowly, so as to allow for data corruption in the event of a drive failure. You would want the different RAID volumes not to interfere with each other, but still be reliable enough to keep the system running in the event of a disk crash. Furthermore, to improve I/O efficiency even further, one might want to make the mirrors read/writeable and let the stripes mirror the mirrors. All of this, in theory, seems possible. The question is: how would it work, how would the timing work, how would the syncs work? What are the unknowns? What have I missed?

Here are some scenarios I've come up with:
System works beyond anyone's wildest dreams ;)
System shows a beneficial increase in read/write I/O and works redundantly.
System shows a beneficial increase in read I/O only and works redundantly (worthless, since Linux SW RAID 1 already does this).
System shows no increase or decrease in I/O.
System reduces I/O performance in all respects.
System overloads the bus, locking up the whole machine and resulting in a crash.

So I invite the Linux community to join in a discussion of whether or not SW RAID 10 is possible using only two HDDs.



-Chris Walker
Siterack.net

jlehtone
Posts: 4523
Joined: 2007/12/11 08:17:33
Location: Finland

Re: SW RAID 10 on 2 HDD's - Theoretical Discussion

Post by jlehtone » 2011/12/05 09:54:01

You do know the RAID modes 10 and 01, don't you? One mirrors striped volumes while the other stripes mirrored volumes. In a way, what you propose is analogous.

You do propose two disks with two partitions in each.

Model 1:
A2 mirrors A1 and B2 mirrors B1. I/O is striped over A and B.
However, each write goes to two locations on each physical disk.
That's bad for I/O, and loss of either disk destroys half of the data.

Model 2:
B1 mirrors A1 and B2 mirrors A2. I/O is striped over partitions 1 and 2
of each physical disk. While the data is safe, it is surely better to do I/O
to contiguous blocks than to two separate regions of the same physical disk.

Do you still feel lucky?


PS. This is not CentOS-specific. We might end up in Social.

siterack_net
Posts: 4
Joined: 2011/12/05 04:35:23

Re: SW RAID 10 on 2 HDD's - Theoretical Discussion

Post by siterack_net » 2011/12/06 05:18:58

Model 2 is basically what I was laying out.

D1, disk 1, contains S1 (stripe 1) and M2 (mirror 2).
D2, disk 2, contains S2 (stripe 2) and M1 (mirror 1).

Graphically, much like this:

D1 D2
---- ----
S1 S2
M2 M1
---- ----

S1 and S2 do all the RAID 0 work, while M2 keeps a copy of S2 on D1, and M1 keeps a copy of S1 on D2.
This keeps the data redundant.
The whole issue is how it would be prioritised to offer a decent level of redundancy in case of a disk crash, while also maintaining the improved striped I/O.
And would the I/O gain over the current SW RAID 1 be significant enough to make this whole thing worthwhile?

gulikoza
Posts: 188
Joined: 2007/05/06 20:15:23

SW RAID 10 on 2 HDD's - Theoretical Discussion

Post by gulikoza » 2011/12/06 14:34:56

This can already be achieved with md RAID 10 and the far layout. It probably has less overhead than stacking RAIDs or setting up LVM...
I've done it a couple of times. Read performance is great (probably equal to a stripe), but write performance is somewhat less exciting, since each disk has to write two chunks and seek halfway across the platters. It is certainly worth considering where only two disks are available, since it will be faster than RAID 1 in sequential reads (remember that RAID 1 has chunks mirrored in exactly the same spot, so if disk 2 is doing reads in parallel with disk 1, disk 1 has to seek and skip ahead of the chunks disk 2 has already read).
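
To make that concrete, here is a minimal sketch of a two-disk md RAID 10 with the far layout; the device names and chunk size are just example values:

# RAID 10, "far" layout with 2 copies, on two whole-disk partitions
mdadm --create /dev/md0 --level=10 --layout=f2 --chunk=512 --raid-devices=2 /dev/sda1 /dev/sdb1

# verify the layout and watch the initial sync
mdadm --detail /dev/md0
cat /proc/mdstat

With the far layout, md places the second copy of each chunk in the far half of the other disk, which is essentially the stripe/mirror split from the original post, but managed inside a single array rather than two stacked ones.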
