[Beowulf] IBRIX Experiences

Wally Edmondson Wally-Edmondson at utc.edu
Fri Jun 1 08:24:40 PDT 2007

On Thu, 10 May 2007, Ian Reynolds wrote:

 > Hey all -- we're considering IBRIX for a parallel storage cluster
 > solution with an EMC Clarion CX3-20 at the center, as well as a handful
 > of storage servers -- total of roughly 40 client servers, mix of 32 and
 > 64 bit OSs.
 > Can anyone offer their experiences with IBRIX, good or bad? We have
 > worked with gpfs extensively, so any comparisons would also be helpful.

It looks like you aren't getting many answers to your question, Ian.  I'll quickly share 
my IBRIX experiences.  I have been running IBRIX since late 2004 on around 540 
diskless clients and 50 regular servers and workstations with 8 segment servers and a 
Fusion Manager connected to a DDN S2A 3000 couplet with 20TB of usable storage.  The 
storage is 1Gb FibreChannel to the Segment Servers and it's non-bonded GigE for 
everything else.

I'll start with the bad, I guess.  We had our share of problems with the 1.x version 
of the software in the early days.  I suppose all parallel filesystems with 600 
clients are going to hit bumps.  That's what CFS said back then, anyways.  Stability 
wasn't a problem, but occasionally a file wouldn't be readable and to fix it you had 
to copy the file, stuff like that.   This was no longer an issue beginning with 
version 2.0.  You have to get a new build of the software if you want to change 
kernels.  There are two RPMs, one generic for the major kernel number and the other 
specific to your kernel, containing some modules.  They only support RHEL/CentOS and 
SLES as far as I know, and SLES was only recently added.  I asked about Ubuntu and 
they don't yet support it, which sucks because I would like to use it on some 
workstations.  Oh, and make sure that the segment servers can always see each other. 
Use at least two links through different switches.  We had some bad switch ports 
that caused the segment servers to miss heartbeats.  This caused automatic failovers 
to segment servers that also couldn't be seen.  This is a disaster.  I thought it was 
IBRIX's fault the whole time.  Turned out to be intermittent switch port problems. 
It was avoidable with a little bit more planning and a better understanding of how 
the whole thing worked.  Redundancy is set up with buddies rather than globally, so 
you tell it that one server should watch some other server's back.  It works, but it 
could be a problem if a failing server's buddy is down or a server goes down while it 
owns a failed segment.  In either case, some percentage of your files won't be 
accessible until one of the servers is fixed.  It hasn't happened to me, but it is a 
possibility.  I can bring down four of my eight servers without a problem, for 
instance, but it needs to be the right four.  Servers have failed and it has never 
been a problem for me.  The running jobs never know the difference.
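
To make the "right four" point concrete, here is a rough Python sketch of buddy 
failover.  It is not IBRIX code or configuration.  The server names (ss1 to ss8), the 
symmetric pairing, and the rule that a segment can only move to its owner's buddy are 
all assumptions made up for illustration; a real setup may pair servers differently.

```python
from itertools import combinations

# Hypothetical layout: eight segment servers paired off as mutual buddies.
# These names and pairings are illustrative, not taken from a real config.
BUDDY = {
    "ss1": "ss2", "ss2": "ss1",
    "ss3": "ss4", "ss4": "ss3",
    "ss5": "ss6", "ss6": "ss5",
    "ss7": "ss8", "ss8": "ss7",
}


def unavailable_segments(failed):
    """Return the servers whose segments nobody can serve.

    Assumption: a failed server's segments move to its buddy and nowhere
    else, so they are lost only if the buddy is down as well.
    """
    failed = set(failed)
    return {server for server in failed if BUDDY[server] in failed}


def survivable(failed):
    """True if every segment is still served after these servers fail."""
    return not unavailable_segments(failed)


# Count how many four-server outages leave every file reachable.
all_outages = list(combinations(sorted(BUDDY), 4))
safe = [outage for outage in all_outages if survivable(outage)]
print(len(safe), "of", len(all_outages), "four-server outages are survivable")
```

Under these assumptions, losing one server from each pair is harmless, while losing 
both members of any pair takes that pair's files offline until one of them is fixed.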

Support has been top-notch.  Last year, we had a catastrophic storage controller 
failure following a scheduled power outage, major corruption, the works.  A guy at 
IBRIX stayed with me all weekend on the phone and AIM.  He logged in and remotely 
restored all the files he could (tens of thousands).  Apparently he could have 
restored more if I had already been running 2.0 or higher.  They know their product 
very well.  I'm not sure if I am the right person to compare it to GPFS or Lustre 
since I looked into those products back in 2004 and haven't really researched them 
since.  My setup is simple, too, so I only use the basics.  The performance is fine, 
using nearly all of my GigE pipes.  With more segment servers and faster storage you 
could get some pretty amazing speeds.  I don't use the quotas or multiple interfaces. 
Their GUI looks nice at first but you really don't need it because their 
command-line tools make sense and have excellent help output if you forget something. 
Adding new clients is a breeze.  There is a Windows client now but I haven't used 
it.  I use CIFS exports and it works just fine.  I also use NFS exports for my few 
remaining Solaris clients.  Everything is very customizable and the documentation 
seems pretty thorough.  You can put any storage you like behind it, which is nice.  I 
think I could use USB keys if I felt like it.  I have been very pleased with IBRIX 
overall, especially since we upgraded out of 1.x land.  It's usually the last thing 
on my mind, so I guess that's a good thing.  That's all I have time for right now. 
Let me know if you have any specific questions.

