[Beowulf] Software RAID?
Joe Landman
landman at scalableinformatics.com
Thu Nov 22 06:16:52 PST 2007
Vincent Diepeveen wrote:
> Wait a minute do i read that correctly,
>
> "you have to rescan the scsi bus".
>
> In short, first you spend really a lot of money to get SCSI
> drives in order to then get confronted with all the software raid issues.
No. SATA presents itself as SCSI in Linux.
[...]
> IMHO the interesting issue with raid is how to get a raid system where
> you can hotswap and which supports both raid10 as well as different
> types of raid5 (with one extra spare).
RAID10 is not that hard in software, the hotswap is harder. Cold-swap
*may* be possible (last I looked it should work, haven't tried it
recently on SATA). The issue is whether or not the driver drags the
kernel screaming and kicking into a kernel panic when a device is
removed ...
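A rough sketch of the kind-of-cold swap sequence (quoted further down) follows. It only builds the command strings and runs nothing. The array, disk, SCSI host, and mount point names are placeholders (assumptions), and the array member is assumed to be a whole disk. The mdadm --fail/--remove/--add options and the sysfs delete/scan files are standard Linux, but whether a given SATA driver survives the removal and rescan is exactly the open question above.

```python
def cold_swap_commands(array="/dev/md0", disk="sdb", host="host1", mountpoint="/data"):
    """Return the shell commands for a kind-of-cold swap, in order.

    All default names are placeholders (assumptions); the array member is
    assumed to be the whole disk. Nothing is executed here.
    """
    member = f"/dev/{disk}"
    return [
        f"umount {mountpoint}",                              # take the mount offline
        f"mdadm --manage {array} --fail {member}",           # mark the member as failed
        f"mdadm --manage {array} --remove {member}",         # drop it from the array
        f"echo 1 > /sys/block/{disk}/device/delete",         # detach the device from the SCSI layer
        "# ... physically swap the drive here ...",
        f"echo '- - -' > /sys/class/scsi_host/{host}/scan",  # force a rescan of that SCSI host
        f"mdadm --manage {array} --add {member}",            # add the new drive; the rebuild starts
        f"mount {array} {mountpoint}",                       # remount
    ]
```

Printing the list first (for cmd in cold_swap_commands(): print(cmd)) and running the steps by hand is the safer way to try this on a box you care about.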
FWIW, I plugged a laptop SATA drive into one of our Pegasus boxen
(many core workstation, think baby cluster), and without a reboot, or
any interaction on my part, it found the drive, and mounted the file
systems. It was scary, but I did want to see what would happen. It worked
w/o kernel panic.
The interrupt issue is painful, but the more cores you have, the more
pain you can stand there. The context switches (CSW) are harder. They
shouldn't be, but they *seem* to be a point of serialization in the
kernel. This is annoying ... a high CSW rate turns a very fast machine
into a whimpering mass quite quickly.
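One way to watch these two numbers on a Linux box is to sample the running totals in /proc/stat (the "intr" and "ctxt" lines) and take the difference; these are the same rates vmstat shows in its "in" and "cs" columns. A minimal sketch, with function names of my own choosing:

```python
import time

def parse_counters(stat_text):
    """Extract the total interrupt and context switch counts from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("intr", "ctxt"):
            counters[fields[0]] = int(fields[1])  # first number is the running total
    return counters

def rates(before, after, seconds):
    """Per-second rates between two counter samples."""
    return {key: (after[key] - before[key]) / seconds for key in before}

def sample(interval=1.0, path="/proc/stat"):
    """Read the counters twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_counters(f.read())
    return rates(first, second, interval)
```

Calling sample() while the IO load is running gives a dict with the "intr" and "ctxt" rates per second, which can be compared across SATA drivers.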
> Speaking of that, how to save power with raid when it's not currently
> streaming? Are there hardware cards that let the drives idle when the raid
> array hardly gets used for i/o?
MAID or idle spin down. Our units do this.
> I'm about to investigate how to cheaply build a huge raid array (with
> hotswap) for private purposes (chess EGTB generation and i guess i'll
> require a TB or 4+ for that and raid10 as the write load is also very
> high during generation).
1TB is not huge: 2 x 1TB disks in a RAID1. 4TB is not huge either. If
the data is important, use RAID6 with hot spares. A RAID10 is more
expensive and a bit slower on RW, but faster on rebuilds. Either way you
can do this in 7-9 drives easily. With the right motherboard (Supermicro
variant comes to mind), you can have 6 SATA and 8 SAS ports (remember SAS
does talk to / connect to SATA drives), so you can have up to 14 devices
attached to the MB. Couple this with a deskside/rack mount type case that
can handle this many, and you should be fine.
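The arithmetic behind the 7-9 drive figure can be sketched as below. The 1TB drive size and the single hot spare are assumptions made for illustration (neither is stated above), and the function name is made up.

```python
import math

def drives_needed(usable_tb, drive_tb=1.0, level="raid6", hot_spares=1):
    """Count the drives for a given usable capacity, including hot spares.

    Assumes equal-size drives. RAID6 spends two drives on parity, RAID10
    mirrors every data drive, RAID1 is a plain two-way mirror.
    """
    data = math.ceil(usable_tb / drive_tb)  # drives' worth of usable space
    if level == "raid6":
        members = data + 2
    elif level == "raid10":
        members = 2 * data
    elif level == "raid1":
        if data != 1:
            raise ValueError("a two-way RAID1 holds only one drive's worth of data")
        members = 2
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return members + hot_spares

print(drives_needed(4, level="raid6"))   # 4 data + 2 parity + 1 spare
print(drives_needed(4, level="raid10"))  # 4 mirrored pairs + 1 spare
```

Under those assumptions a 4TB RAID6 comes to 7 drives and a 4TB RAID10 to 9, both within the 14 ports mentioned above.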
>
> What solutions are there?
>
> Vincent
>
> On Wed, 21 Nov 2007, Joe Landman wrote:
>
>> Ekechi Nwokah wrote:
>>> Hi,
>>>
>>> Does anyone know of any software RAID solutions that come close to the
>>> performance of a commodity RAID card such as LSI/3ware/Areca for
>>> direct-attached drives?
>> For small numbers of drives, yes, the MD driver is superb with two
>> (well, really three) caveats.
>>
>> First: No hot swap. You can do a kind-of-cold swap (you have to take
>> the mount offline, execute a few MD raw-disassemble commands, then turn
>> the device off, swap, force Linux to rescan the scsi bus, mark the drive
>> as a hot spare, and force reassembly ... then remount). This may or may
>> not work, depending upon the linux driver for the SATA port. Some get
>> very unhappy if the drive goes away after it found it.
>>
>> Second (and third): Context switches (and interrupts) tend to quickly
>> swamp even fast systems with lots of processors. This is because the
>> SATA drivers on Linux, while good for basic SATA operations, may have a
>> few issues with multiple CSW needed for each transfer. You can drive a
>> fast system to become slow with a simple RAID0 across two drives. Run
>> bonnie++ on it (not IOzone, unless you want to measure memory cache).
>> Now imagine that system serving NFS requests. Additionally, the
>> interrupts driven by these hard IO operations also often drive the
>> system performance into the ground. We see 15-20k CSW and 20+k
>> interrupts under heavy load for a simple two drive RAID0 serving NFS
>> over gigabit.
>>
>> That is, it is not a bad idea, and it is possible to do it. But be
>> aware that you are going to need a fairly beefy machine (lots of RAM,
>> lots of cores) to handle the buffering and the interrupts. Can't help
>> much on the CSW's, you will just have to pay that price.
>>
>>> With the availability of multi-core chips and SSE instruction sets, it
>>> would seem to me that this is doable. Would be nice to not have to pay
>>> for those RAID cards if I don't have to. Just wondering if anything
>>> already exists.
>> The extra you pay for those RAID cards buys you hot swap, and if you
>> choose carefully, reasonable RAID engines. They aren't perfect; their
>> small random IO performance on large files leaves something to be
>> desired (as do all RAID controllers from what I can see, unless you want
>> to buy Bluearc or other units).
>>
>> If you do choose to go the MD route, check out which SATA drivers are
>> well performing (low CSW/interrupts), and focus upon them. There are a
>> few out there.
>>
>> Joe
>>
>>> Thanks,
>>> Ekechi
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org
>>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>>
>> --
>> Joseph Landman, Ph.D
>> Founder and CEO
>> Scalable Informatics LLC,
>> email: landman at scalableinformatics.com
>> web : http://www.scalableinformatics.com
>> http://jackrabbit.scalableinformatics.com
>> phone: +1 734 786 8423
>> fax : +1 866 888 3112
>> cell : +1 734 612 4615