[Beowulf] Software RAID?

Joe Landman landman at scalableinformatics.com
Thu Nov 22 06:16:52 PST 2007


Vincent Diepeveen wrote:
> Wait a minute do i read that correctly,
> 
> "you have to rescan the scsi bus".
> 
> In short, first you spend a lot of money on SCSI
> drives, only to then be confronted with all the software raid issues.

No.  SATA drives present themselves as SCSI devices in Linux (via libata), so the SCSI bus rescan applies to them too.

[...]

> IMHO the interesting issue with raid is how to get a raid system where
> you can hotswap and which supports both raid10 and different
> types of raid5 (with one extra spare).

RAID10 is not that hard in software; the hotswap is harder.  Cold-swap
*may* be possible (last I looked it should work, haven't tried it
recently on SATA).  The issue is whether or not the driver drags the
kernel kicking and screaming into a kernel panic when a device is
removed ...
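To make the "RAID10 is not that hard in software" point concrete, here is a minimal sketch of the chunk placement arithmetic for a mirrored-stripe layout in the style of Linux md's "near" layout with two copies. The function name and parameters are hypothetical, and real md also supports far and offset layouts that this ignores.

```python
from typing import List, Tuple


def raid10_near_map(chunk: int, n_disks: int, copies: int = 2) -> List[Tuple[int, int]]:
    """Return (disk, stripe) for every copy of a logical chunk in a "near" layout.

    Copies of chunk i occupy the consecutive slots i*copies .. i*copies + copies - 1,
    and slots fill the disks row by row, so the row index is the stripe number.
    """
    if copies < 1 or n_disks < copies:
        raise ValueError("need at least one copy and at least as many disks as copies")
    placements = []
    for c in range(copies):
        slot = chunk * copies + c
        placements.append((slot % n_disks, slot // n_disks))
    return placements


# Hypothetical 4-disk array: show where the first four chunks land.
for i in range(4):
    print(i, raid10_near_map(i, n_disks=4))
```

Because the copies of a chunk occupy consecutive slots, they always land on different disks as long as there are at least `copies` disks, which is what lets the array survive a single disk failure.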

FWIW, I plugged a laptop SATA drive into one of our Pegasus boxen
(many-core workstation, think baby cluster), and without a reboot, or
any interaction on my part, it found the drive and mounted the file
systems.  It was scary, but I did want to see what would happen.  It
worked w/o a kernel panic.

The interrupt load is painful, but the more cores you have, the more of
that pain you can stand.  The context switches (CSW) are harder.  They
shouldn't be, but they *seem* to be a point of serialization in the
kernel.  This is annoying ... high CSW rates turn a very fast machine
into a whimpering mass quite quickly.

> Speaking of that, how do you save power with raid when it's not currently
> streaming? Are there hardware cards that let the drives idle when the raid
> array hardly gets used for i/o?

MAID (massive array of idle disks) or idle spin-down.  Our units do this.
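On Linux, one common way to get idle spin-down on plain drives (not necessarily how the units mentioned above do it) is `hdparm -S`, whose argument uses a compact encoding: 1-240 are multiples of 5 seconds and 241-251 are multiples of 30 minutes. A minimal sketch that converts a desired timeout into that value, rounding up to the next encodable timeout; the helper name is hypothetical and the special values 252-255 are ignored.

```python
import math


def hdparm_standby_value(timeout_s: int) -> int:
    """Map a spin-down timeout in seconds to an hdparm -S value.

    0 disables the timeout. 1..240 encode multiples of 5 s (5 s .. 20 min);
    241..251 encode 1..11 units of 30 min (30 min .. 5.5 h).
    Timeouts are rounded up to the next representable value.
    """
    if timeout_s < 0:
        raise ValueError("timeout must be non-negative")
    if timeout_s == 0:
        return 0
    if timeout_s <= 20 * 60:
        return math.ceil(timeout_s / 5)
    units = math.ceil(timeout_s / 1800)  # 30-minute units
    if units > 11:
        raise ValueError("longest encodable timeout is 5.5 hours")
    return 240 + units
```

For example, `hdparm_standby_value(600)` returns 120, which would correspond to `hdparm -S 120 /dev/sdX` (device name assumed) for a 10-minute timeout.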

> I'm about to investigate how to cheaply build a huge raid array (with
> hotswap) for private purposes (chess EGTB generation; I guess I'll
> require a TB or 4+ for that, and raid10, as the write load is also very
> high during generation).

1TB is not huge: 2 x 1TB disks in a RAID1 will do.  4TB is not huge
either.  If the data is important, use RAID6 with hot spares.  More
expensive and a bit slower on RW, but faster on rebuilds, is RAID10.
Either way you can do this in 7-9 drives easily.  With the right
motherboard (a Supermicro variant comes to mind), you can have 6 SATA and
8 SAS ports (remember SAS controllers can also talk to SATA drives), so
you can have up to 14 devices attached to the MB.  Couple this with a
deskside or rack-mount case that can hold this many drives, and you
should be fine.
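The drive counts above can be checked with simple capacity arithmetic. A minimal sketch, assuming equal-size drives, two copies per chunk for RAID10, and hot spares counted on top of the active drives; the function names are hypothetical.

```python
import math


def usable_tb(level: str, n_drives: int, drive_tb: float) -> float:
    """Usable capacity of n_drives equal-size active drives (hot spares excluded)."""
    if level == "raid1":
        return drive_tb                      # every drive holds a full mirror
    if level == "raid10":
        return n_drives * drive_tb / 2       # assumes two copies of every chunk
    if level == "raid6":
        if n_drives < 4:
            raise ValueError("RAID6 needs at least 4 drives")
        return (n_drives - 2) * drive_tb     # two drives' worth of P+Q parity
    raise ValueError(f"unsupported level: {level}")


def drives_needed(level: str, target_tb: float, drive_tb: float, spares: int = 1) -> int:
    """Smallest drive count (including hot spares) that reaches target_tb usable."""
    data_drives = math.ceil(target_tb / drive_tb)
    if level == "raid10":
        active = 2 * data_drives
    elif level == "raid6":
        active = max(data_drives + 2, 4)
    else:
        raise ValueError(f"unsupported level: {level}")
    return active + spares


print(drives_needed("raid6", 4, 1))   # prints 7: 4 data + 2 parity + 1 spare
print(drives_needed("raid10", 4, 1))  # prints 9: 8 mirrored + 1 spare
```

With 1TB drives and one hot spare, this gives 7 drives for RAID6 and 9 for RAID10 at 4TB usable, consistent with the 7-9 drive estimate above.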

> 
> What solutions are there?
> 
> Vincent
> 
> On Wed, 21 Nov 2007, Joe Landman wrote:
> 
>> Ekechi Nwokah wrote:
>>> Hi,
>>>
>>> Does anyone know of any software RAID solutions that come close to the
>>> performance of a commodity RAID card such as LSI/3ware/Areca for
>>> direct-attached drives?
>> For small numbers of drives, yes, the MD driver is superb with two
>> (well, really three) caveats.
>>
>> First:  No hot swap.  You can do a kind-of-cold swap (you have to take
>> the mount offline, execute a few MD commands to disassemble the array,
>> turn the device off, swap it, force Linux to rescan the SCSI bus, mark
>> the new drive as a hot spare, and force reassembly ... then remount).
>> This may or may not work, depending upon the Linux driver for the SATA
>> port.  Some get very unhappy if a drive goes away after they have found it.
>>
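The cold-swap sequence described above can be written out as a dry-run checklist. This is a sketch of one possible variant, not necessarily the exact procedure meant here: it fails and removes the member from a running array rather than fully disassembling it, and it assumes an array `/dev/md0` mounted at `/data`, a member `/dev/sdb1` on SCSI `host0`, and a replacement drive already partitioned like the old one (all hypothetical). It only builds the command list; nothing is executed, and device names may change after the rescan, so check them before running anything by hand.

```python
from typing import List


def cold_swap_plan(md: str = "/dev/md0", mountpoint: str = "/data",
                   member: str = "/dev/sdb1", disk: str = "sdb",
                   scsi_host: str = "host0") -> List[str]:
    """Return shell commands for an md cold swap (dry run only, nothing is executed)."""
    return [
        f"umount {mountpoint}",                                    # take the mount offline
        f"mdadm --manage {md} --fail {member} --remove {member}",  # drop the bad member
        f"echo 1 > /sys/block/{disk}/device/delete",               # detach disk from the SCSI layer
        "# physically swap the drive now",
        f'echo "- - -" > /sys/class/scsi_host/{scsi_host}/scan',   # rescan the SCSI bus
        f"mdadm --manage {md} --add {member}",                     # new member joins, rebuild starts
        f"mount {md} {mountpoint}",                                # remount
    ]


for step in cold_swap_plan():
    print(step)
```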
>> Second (and third):  Context switches (and interrupts) tend to quickly
>> swamp even fast systems with lots of processors.  This is because the
>> SATA drivers on Linux, while good for basic SATA operations, may need
>> multiple CSW for each transfer.  You can make a fast system slow with a
>> simple RAID0 across two drives.  Run bonnie++ on it (not IOzone, unless
>> you want to measure memory cache).  Now imagine that system serving NFS
>> requests.  The interrupts generated by these hard IO operations also
>> often drive the system performance into the ground.  We see 15-20k CSW
>> and 20+k interrupts under heavy load for a simple two-drive RAID0
>> serving NFS over gigabit.
>>
>> That is, it is not a bad idea, and it is possible to do it.  But be
>> aware that you are going to need a fairly beefy machine (lots of RAM,
>> lots of cores) to handle the buffering and the interrupts.  There isn't
>> much you can do about the CSWs; you will just have to pay that price.
>>
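The CSW and interrupt counts quoted above are the kind of numbers `vmstat` derives from the cumulative `ctxt` and `intr` counters in `/proc/stat`. A minimal sketch that turns two snapshots of those counters into per-second rates; the function names are hypothetical, and `sample()` only works on Linux.

```python
import time
from typing import Dict


def read_counters(text: str) -> Dict[str, int]:
    """Extract cumulative context-switch and interrupt totals from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ctxt":
            counters["ctxt"] = int(fields[1])
        elif fields[0] == "intr":
            counters["intr"] = int(fields[1])  # first number is the total across all IRQs
    return counters


def rates(before: Dict[str, int], after: Dict[str, int], seconds: float) -> Dict[str, float]:
    """Per-second rate for each counter between two snapshots."""
    return {k: (after[k] - before[k]) / seconds for k in before}


def sample(interval: float = 1.0) -> Dict[str, float]:
    """Take two /proc/stat snapshots `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        first = read_counters(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = read_counters(f.read())
    return rates(first, second, interval)
```

Running `sample()` while bonnie++ or an NFS load is active gives a quick view of whether the CSW and interrupt rates are in the range described above.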
>>> With the availability of multi-core chips and SSE instruction sets, it
>>> would seem to me that this is doable. Would be nice to not have to pay
>>> for those RAID cards if I don't have to. Just wondering if anything
>>> already exists.
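For RAID5, the parity work that spare cores and SSE units would take on is a byte-wise XOR across the data blocks of each stripe (RAID6 adds a second syndrome computed over GF(2^8), not shown). A minimal pure-Python sketch of computing parity and rebuilding one lost block; md uses optimized SIMD routines for this, which the sketch does not attempt to model, and the function names are hypothetical.

```python
from typing import List, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    out = bytearray(blocks[0])
    for block in blocks[1:]:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def parity(data_blocks: Sequence[bytes]) -> bytes:
    """RAID5-style parity block for one stripe."""
    return xor_blocks(data_blocks)


def rebuild(surviving: Sequence[bytes], parity_block: bytes) -> bytes:
    """Recover the single missing data block: XOR of the parity and all survivors."""
    blocks: List[bytes] = list(surviving) + [parity_block]
    return xor_blocks(blocks)
```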
>> The extra you pay for those RAID cards buys you hot swap and, if you
>> choose carefully, reasonable RAID engines.  They aren't perfect: their
>> small random IO performance on large files leaves something to be
>> desired (as does that of all RAID controllers, from what I can see,
>> unless you want to buy Bluearc or other units).
>>
>> If you do choose to go the MD route, check which SATA drivers are
>> well-performing (low CSW/interrupts), and focus upon them.  There are a
>> few out there.
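One way to see which driver is generating the interrupt load is to sum the per-CPU counts in `/proc/interrupts` by device name. A minimal sketch, assuming the usual layout (a header row of CPU columns, then one row per IRQ ending in the device name); it keys on the last token of each numbered row, skips summary rows such as NMI and LOC, and may need adjusting for kernels whose rows are formatted differently. Sampling it twice, an interval apart, gives per-driver rates.

```python
from collections import Counter


def interrupts_by_device(text: str) -> Counter:
    """Sum per-CPU interrupt counts from /proc/interrupts text, keyed by device name."""
    lines = text.splitlines()
    if not lines:
        return Counter()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals: Counter = Counter()
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].rstrip(":").isdigit():
            continue  # skip NMI, LOC, ERR and other non-device rows
        counts = fields[1:1 + ncpu]
        rest = fields[1 + ncpu:]
        if not rest:
            continue
        totals[rest[-1]] += sum(int(c) for c in counts)  # last token taken as device name
    return totals
```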
>>
>> Joe
>>
>>> Thanks,
>>> Ekechi
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org
>>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>>
>> --
>> Joseph Landman, Ph.D
>> Founder and CEO
>> Scalable Informatics LLC,
>> email: landman at scalableinformatics.com
>> web  : http://www.scalableinformatics.com
>>         http://jackrabbit.scalableinformatics.com
>> phone: +1 734 786 8423
>> fax  : +1 866 888 3112
>> cell : +1 734 612 4615
>>




