[Beowulf] Big storage

Jerome, Ron Ron.Jerome at nrc-cnrc.gc.ca
Thu Apr 17 05:49:58 PDT 2008


For what it's worth, I have 4 of those Supermicro 16-drive chassis,
each with a single Areca 1160 card.  They have been running without
issue for about a year now (touch wood).

I also just built a 48-drive box using an AIC chassis and 3 Areca 1261
cards, but that has not been put into service yet.

_________________________________________
Ron Jerome
National Research Council Canada
M-2, 1200 Montreal Road, Ottawa, Ontario K1A 0R6
Government of Canada
_________________________________________

> -----Original Message-----
> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]
> On Behalf Of Bruce Allen
> Sent: April 17, 2008 2:39 AM
> To: Gerry Creager
> Cc: beowulf at beowulf.org; Bruce Allen
> Subject: Re: [Beowulf] Big storage
> 
> Hi Gerry,
> 
> > Areca replacement; RAID rebuild (usually successful); backup; Areca
> > replacement with 3Ware controller or CoRAID (or JetStor) shelf; create
> > new RAID instance; restore from backup.
> >
> > Let's just say we lost confidence.
> 
> I understand.  Was this with 'current generation' controllers and firmware
> or was this two or three years ago?  It's my impression that (when used
> with compatible drives and drive backplanes) the latest generation of
> Areca hardware is quite stable.
> 
> Cheers,
>       Bruce
> 
> 
> > Bruce Allen wrote:
> >> What was needed to fix the systems?  Reboot?  Hardware replacement?
> >>
> >> On Wed, 16 Apr 2008, Gerry Creager wrote:
> >>
> >>> We've had two fail rather randomly.  The failures did cause disk
> >>> corruption but it wasn't an undetected/undetectable sort.  They started
> >>> throwing errors to syslog, then fell over and stopped accessing disks.
> >>>
> >>> gerry
> >>>
> >>> Bruce Allen wrote:
> >>>> Hi Gerry,
> >>>>
> >>>> So far the only problem we have had is with one Areca card that had a bad
> >>>> 2GB memory module.  This generated lots of (correctable) single bit
> >>>> errors but eventually caused real problems.  Could you say something
> >>>> about the reliability issues you have seen?
> >>>>
> >>>> Cheers,
> >>>>     Bruce
> >>>>
> >>>>
> >>>> On Wed, 16 Apr 2008, Gerry Creager wrote:
> >>>>
> >>>>> We've used AoE (CoRAID hardware) with pretty good success (modulo one
> >>>>> RAID shelf fire that was caused by a manufacturing defect and dealt with
> >>>>> promptly by CoRAID).  We've had some reliability issues with Areca cards
> >>>>> but no data corruption on the systems we've built that way.
> >>>>>
> >>>>> gerry
> >>>>>
> >>>>> Bruce Allen wrote:
> >>>>>> Hi Xavier,
> >>>>>>
> >>>>>>>>>> PPS: We've also been doing some experiments with putting
> >>>>>>>>>> OpenSolaris+ZFS on some of our generic (Supermicro + Areca) 16-disk
> >>>>>>>>>> RAID systems, which were originally intended to run Linux.
> >>>>>>
> >>>>>>>>>  I think that DESY proved some data corruption with such
> >>>>>>>>> configuration, so they switched to OpenSolaris+ZFS.
> >>>>>>
> >>>>>>>> I'm confused.  I am also talking about OpenSolaris+ZFS.  What did
> >>>>>>>> DESY try, and what did they switch to?
> >>>>>>
> >>>>>>> Sorry, I am indeed not clear. As far as I know, DESY found data
> >>>>>>> corruption using Linux and Areca cards. They moved from linux to
> >>>>>>> OpenSolaris and ZFS, avoiding other corruption. This has been
> >>>>>>> discussed in HEPiX storage workgroup. However, I can not speak on
> >>>>>>> their behalf at all. I'll try to get you in touch with someone more
> >>>>>>> aware of this issue, as my statements lack of figures.
> >>>>>>
> >>>>>> I think that would be very interesting to the entire Beowulf mailing
> >>>>>> list, so please suggest that they respond to the entire group, not just
> >>>>>> to me personally.  Here is an LKML thread about silent data corruption:
> >>>>>> http://kerneltrap.org/mailarchive/linux-kernel/2007/9/10/191697
> >>>>>>
> >>>>>> So far we have not seen any signs of data corruption on Linux+Areca
> >>>>>> systems (and our data files carry both internal and external checksums,
> >>>>>> so we would be sensitive to this).
> >>>>>>
> >>>>>> Cheers,
> >>>>>>     Bruce
> >>>>>> _______________________________________________
> >>>>>> Beowulf mailing list, Beowulf at beowulf.org
> >>>>>> To change your subscription (digest mode or unsubscribe) visit
> >>>>>> http://www.beowulf.org/mailman/listinfo/beowulf



