[Beowulf] Re: building a RAID system: A long-delayed follow-on
canon at nersc.gov
Fri Aug 20 10:23:58 PDT 2004
I would strongly recommend 3ware. We have around 70 TB (maybe more) of
3ware-based storage. It's been pretty reliable. The systems can do
dynamic sector repair and scan for bad sectors. In my view this is a
critical feature, since drives often develop isolated bad sectors.
This is something that doesn't appear to be supported in sw raid yet
(although I could be wrong). With the latest drivers, 3ware even works
great with smartmontools.
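For anyone setting this up, a smartd.conf sketch for disks behind a 3ware card might look like the following (device names and card generation are assumptions; older 3w-xxxx cards appear as /dev/twe0, 9000-series cards as /dev/twa0):

```
# /etc/smartd.conf sketch -- assumes an older 3ware card on /dev/twe0
# (9000-series cards use /dev/twa0).  "-d 3ware,N" addresses the Nth
# physical disk behind the controller.
/dev/twe0 -d 3ware,0 -a -m root   # monitor disk 0, mail root on trouble
/dev/twe0 -d 3ware,1 -a -m root   # monitor disk 1, and so on per disk
```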
Also, I think you may find that while SW RAID performs better with a
low number of threads/clients, hardware RAID will do better with a
large number of clients. This has been our experience in the past,
although we haven't tested it with recent kernels.
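A crude way to test the thread-count effect on your own array is to time the same sequential read with an increasing number of concurrent readers. A sketch (the test file path is an assumption, and the file should be larger than RAM for the numbers to mean much):

```python
# Sketch: aggregate read throughput with N concurrent readers.
# Point `path` at a large file on the array under test.
import os
import threading
import time

def read_throughput(path, nthreads, chunk=1 << 20):
    """Return aggregate MB/s for nthreads readers each scanning path."""
    size = os.path.getsize(path)

    def worker():
        with open(path, 'rb') as f:
            while f.read(chunk):   # sequential scan to EOF
                pass

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    return (size * nthreads / elapsed) / 1e6

# for n in (1, 4, 16, 64):
#     print(n, "readers:", read_throughput('/data/bigfile', n), "MB/s")
```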
Gerry Creager n5jxs wrote:
> I just found this note from last year's discussion. I've some
> follow-up. If you're not interested, I'll understand. Just hit
> <delete> and go on...
> We implemented a 1.6 TB RAID-5 system using HighPoint Technology
> controllers and Maxtor 200 GB parallel IDE drives. The performance
> wasn't what we expected, and careful examination revealed that,
> just as chronicled below, the additional overhead, especially a
> complete second round of buffering, was really slowing performance.
> OK, the next manufacturer up the proverbial food chain was Promise.
> Got the hardware; it was better, but still far from stellar performance.
> Oh, and drivers were several kernel releases behind, and in some cases
> I considered the kernel updates mandatory for security.
> We started looking at 3Ware, but work got in the way of the fun stuff.
> Also, a collaborator (co-conspirator is more accurate) at another
> institution had been doing similar work and suggested we look at
> software RAID. OK. It's quick to configure, we need the box back up,
> and it can't run any worse than the HighPoint stuff.
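For reference, the raidtools-era setup really is quick: a RAID-5 set is a short /etc/raidtab entry plus mkraid and raidstart. A sketch for a four-disk set (device names are placeholders):

```
# /etc/raidtab sketch -- four-disk RAID-5; device names are placeholders
raiddev /dev/md0
    raid-level              5
    nr-raid-disks           4
    persistent-superblock   1
    chunk-size              64
    device                  /dev/hde1
    raid-disk               0
    device                  /dev/hdg1
    raid-disk               1
    device                  /dev/hdi1
    raid-disk               2
    device                  /dev/hdk1
    raid-disk               3
```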
> Well, I'm still thinking I'd like to go with the 3Ware hardware, but
> that'll have to wait 'til we build the next 2 TB system... soon, real
> soon. And if it's slower than s/w RAID, I'll go back to that.
> Since we went to the s/w RAID-5 config we've seen one failure, caused
> by stupid sysadmin tricks and an inadequate UPS when the campus went
> down. To confess completely, when RAID didn't come back up cleanly I
> attributed it to a missing entry in /etc/rc.d/rc.local... and
> technically, I was right. I did a raidstart and mounted the drive
> without even a cursory fsck. My bad. We got a "clean" mount, and went
> merrily ahead. To add to the confusion, I was doing all this from my
> laptop, at 70+mph (my wife was driving for most of this) using a
> Sprint 1xRTT connection, once we got into Minnesota. Iowa doesn't
> have Sprint coverage we could find, save for a 2-block stretch of Ames.
> About 3 days after "recovering" we started seeing a bunch of disk
> errors. By now, I was in _rural_ Wisconsin. We didn't have cellphone
> coverage of any sort at the in-laws, and on a good day, we got 26k
> dialup... throttling down to 9600 sometimes. I opted to drive into
> town and suck down coffee where I could get a 1xRTT connection...
> marginally acceptable. I took the array offline and started an 'fsck
> -a', which would run for hours with almost no output to indicate the
> system was even still responding... and then roll over with "too many
> errors" and a message to run without the '-a' option. 'fsck -y' was
> little better. We fought this for the rest of the vacation, whenever I
> had connectivity, and I never got the disk happy.
> Came home, immediately flew to DC and wrote a perl script on the plane
> to tell fsck in manual mode "yes, dammit" to all the 'do ya wanna fix
> this?' questions. Got into DC at 8pm, started the script, went to
> dinner. Came back; the script was still running and the screen was
> full of the Q&A. Went to bed. Got up, same thing. Went to the first day of
> meetings, and returned at 9pm. Still running. Another day of
> meetings, and back to the room. Still running but it completed while
> I was changing clothes before going to dinner.
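A rough equivalent of that script in Python (the original was Perl, and the fsck arguments below are assumptions) just watches the child's output and answers "y" whenever it pauses at a question:

```python
# Sketch of an "answer yes to everything" driver for an interactive
# command.  The fsck command line is an assumption -- point it at your
# own device, and be sure you really want every fix applied.
import select
import subprocess
import sys

def answer_yes(cmd):
    """Run cmd, echo its output, and send 'y' at each prompt."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    pending = b''
    while True:
        select.select([proc.stdout], [], [])   # wait for output or EOF
        chunk = proc.stdout.read1(65536)       # take whatever is there
        if not chunk:                          # EOF: child finished
            break
        sys.stdout.write(chunk.decode(errors='replace'))
        pending += chunk
        if pending.rstrip().endswith(b'?'):    # paused at a question
            proc.stdin.write(b'y\n')
            proc.stdin.flush()
            pending = b''
    return proc.wait()

# answer_yes(['fsck', '-f', '/dev/md0'])   # device name is an assumption
```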
> Overall, fsck on a 1.7 TB filesystem appears to take roughly 96 hours
> to run when you've really abused it.
> I restarted the box, restarted RAID, remounted, manually started the
> LDM data collection system, and got on an airplane. By the time I was
> back in Texas, all the missing data from the 2-day odyssey was replaced
> and the system was back up to speed.
> We're using this system to cache 30 days of all the Level II radar
> data. I'll be doing some radar processing on a little 16-node dual
> opteron cluster (ob:cluster) to see about running some of the newer
> processing codes to better render the data. We'll also be extracting
> some of the data to initialize the MM5 and WRF models, once I figure
> out how to handle that.
> We'll still try 3Ware. I've got indications from another guy that
> it's pretty good. However, kudos to the kernel and RAID developers in
> Linux-land. They done good.
> pesch at attglobal.net wrote:
>> You write:
>> "The problem with offloading is that, while it made great sense in the
>> days of 1 MHz CPUs, it really doesn't make a noticeable difference in
>> the load on your typical N GHz processor."
>> Did you have a maximum data storage size in mind? - or to put it
>> differently: at what data size do you see the
>> practical limit of SW RAID?
>> Jakob Oestergaard wrote:
>>> On Thu, Oct 09, 2003 at 08:50:17PM +0200, Daniel Fernandez wrote:
>>>> Hi again,
>>> Others have already answered your other questions, I'll try to take one
>>> that went unanswered (as far as I can see).
>>>> But it must be noted that HW RAID offers better response time.
>>> In a HW RAID setup you *add* an extra layer: the dedicated CPU on the
>>> RAID card. Remember, this CPU also runs software - calling it
>>> 'hardware RAID' in itself is misleading, it could just as well be
>>> 'offloaded SW RAID'.
>>> The problem with offloading is that, while it made great sense in the
>>> days of 1 MHz CPUs, it really doesn't make a noticeable difference in
>>> the load on your typical N GHz processor.
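As a back-of-envelope check of that claim: RAID-5 parity is plain XOR over the stripe, which a modern host CPU pushes through at close to memory speed. A toy measurement of the parity arithmetic alone (not the kernel's md path, and not real I/O):

```python
# Toy check: XOR several data blocks into one parity value and report
# how many MB/s of data the host CPU chews through.  This measures the
# parity arithmetic only (including the bytes-to-int conversion).
import os
import time

def xor_parity_mbps(block_mb=8, stripes=3):
    """XOR `stripes` blocks of block_mb MB each; return MB/s processed."""
    blocks = [os.urandom(block_mb << 20) for _ in range(stripes)]
    start = time.time()
    parity = 0
    for b in blocks:
        parity ^= int.from_bytes(b, 'little')   # whole block as one big int
    elapsed = time.time() - start
    return (block_mb * stripes) / elapsed
```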
>>> However, you added a layer with your offloaded RAID: one extra
>>> CPU in the 'chain of command' - and an inferior CPU at that. That layer
>>> means latency even in the most expensive cards you can imagine (and
>>> a bottleneck in cheap cards). No matter how you look at it, as long as
>>> the RAID code in the kernel is fairly simple and efficient (which it
>>> was, last I looked), the extra layers needed to run the PCI
>>> commands through the CPU and then to the actual IDE/SCSI controller
>>> *will* incur latency. And unless you pick a good controller, it may
>>> even be your bottleneck.
>>> Honestly I don't know how much latency is added - it's been years since
>>> I toyed with offload-RAID last ;)
>>> I don't mean to be handwaving and spreading FUD - I'm just trying to
>>> show that the people who advocate SW RAID here are not necessarily
>>> smoking crack - there are very good reasons why SW RAID will
>>> outperform HW RAID in many scenarios.
>>>> HW raid offers hotswap capability and offload our work instead of
>>>> maintaining a SW raid solution ...we'll see ;)
>>> That is probably the best reason I know of for choosing hardware RAID.
>>> And depending on who you will have administering your system, it can be
>>> a very important difference.
>>> There are certainly scenarios where you will be willing to trade a lot
>>> of performance for a blinking LED marking the failed disk - I am not
>>> : jakob at unthought.net : And I see the elder races, :
>>> :.........................: putrid forms of man :
>>> : Jakob Østergaard : See him rise and claim the earth, :
>>> : OZ9ABN : his downfall is at hand. :
>>> Beowulf mailing list, Beowulf at beowulf.org
>>> To change your subscription (digest mode or unsubscribe) visit