[Beowulf] IBRIX Experiences
Wally Edmondson
Wally-Edmondson at utc.edu
Fri Jun 1 08:24:40 PDT 2007
On Thu, 10 May 2007, Ian Reynolds wrote:
> Hey all -- we're considering IBRIX for a parallel storage cluster
> solution with an EMC CLARiiON CX3-20 at the center, as well as a handful
> of storage servers -- total of roughly 40 client servers, mix of 32 and
> 64 bit OSs.
>
> Can anyone offer their experiences with IBRIX, good or bad? We have
> worked with gpfs extensively, so any comparisons would also be helpful.
It looks like you aren't getting many answers to your question, Ian. I'll quickly share
my IBRIX experiences. I have been running IBRIX since late 2004 on around 540
diskless clients and 50 regular servers and workstations with 8 segment servers and a
Fusion Manager connected to a DDN S2A 3000 couplet with 20TB of usable storage. The
storage is 1Gb FibreChannel to the Segment Servers and it's non-bonded GigE for
everything else.
I'll start with the bad, I guess. We had our share of problems with the 1.x version
of the software in the early days. I suppose all parallel filesystems with 600
clients are going to hit bumps. That's what CFS said back then, anyways. Stability
wasn't a problem, but occasionally a file wouldn't be readable and to fix it you had
to copy the file, stuff like that. This was no longer an issue beginning with
version 2.0. You have to get a new build of the software if you want to change
kernels. There are two RPMs, one generic for the major kernel number and the other
specific to your kernel containing some modules. They only support RHEL/CentOS and
SLES as far as I know, and SLES was only recently added. I asked about Ubuntu and
they don't yet support it, which sucks because I would like to use it on some
workstations. Oh, and make sure that the segment servers can always see each other.
Use at least two links through different switches. We had some bad switch ports
that caused the segment servers to miss heartbeats. This caused automatic failovers
to segment servers that also couldn't be seen. This is a disaster. I thought it was
IBRIX's fault the whole time. Turned out to be intermittent switch port problems.
It was avoidable with a little bit more planning and a better understanding of how
the whole thing worked. Redundancy is set up with buddies rather than globally, so
you tell it that one server should watch some other server's back. It works, but it
could be a problem if a failing server's buddy is down or a server goes down while it
owns a failed segment. In either case, some percentage of your files won't be
accessible until one of the servers is fixed. It hasn't happened to me, but it is a
possibility. I can bring down four of my eight servers without a problem, for
instance, but it needs to be the right four. Servers have failed and it has never
been a problem for me. The running jobs never know the difference.
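
To make the "right four" point concrete, here is a minimal sketch of buddy-style redundancy. It is a toy model, not IBRIX's actual failover logic: the server names, the symmetric pairing in BUDDY, and both helper functions are assumptions made up for illustration. It only encodes the rule described above, that a server's segments stay reachable as long as either the server or its buddy is up.

```python
from itertools import combinations

# Hypothetical layout: 8 segment servers arranged as 4 symmetric buddy pairs.
# Real buddy assignments are configured per site and need not be symmetric.
BUDDY = {
    "s1": "s2", "s2": "s1",
    "s3": "s4", "s4": "s3",
    "s5": "s6", "s6": "s5",
    "s7": "s8", "s8": "s7",
}


def unreachable_owners(down):
    """Return the failed servers whose buddy is also down.

    Segments owned by these servers have nobody left to serve them.
    """
    down = set(down)
    return sorted(s for s in down if BUDDY[s] in down)


def survivable(down):
    """True if every segment still has a live owner or a live buddy."""
    return not unreachable_owners(down)


if __name__ == "__main__":
    outages = list(combinations(sorted(BUDDY), 4))
    ok = [c for c in outages if survivable(c)]
    # Prints: 16 of 70 four-server outages keep every segment online
    print(len(ok), "of", len(outages),
          "four-server outages keep every segment online")
```

Under these assumptions only 16 of the 70 possible four-server outages are harmless, namely the ones that take exactly one server from each pair. Losing both members of a single pair is enough to make that pair's files inaccessible.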
Support has been top-notch. Last year, we had a catastrophic storage controller
failure following a scheduled power outage, major corruption, the works. A guy at
IBRIX stayed with me all weekend on the phone and AIM. He logged in and remotely
restored all the files he could (tens of thousands). Apparently he could have
restored more if I had already been running 2.0 or higher. They know their product
very well. I'm not sure if I am the right person to compare it to GPFS or Lustre
since I looked into those products back in 2004 and haven't really researched them
since. My setup is simple, too, so I only use the basics. The performance is fine,
using nearly all of my GigE pipes. With more segment servers and faster storage you
could get some pretty amazing speeds. I don't use the quotas or multiple interfaces.
Their GUI looks nice at first but you really don't need it because their
command-line tools make sense and have excellent help output if you forget something.
Adding new clients is a breeze. There is a Windows client now but I haven't used
it. I use CIFS exports and it works just fine. I also use NFS exports for my few
remaining Solaris clients. Everything is very customizable and the documentation
seems pretty thorough. You can put any storage you like behind it, which is nice. I
think I could use USB keys if I felt like it. I have been very pleased with IBRIX
overall, especially since we upgraded out of 1.x land. It's usually the last thing
on my mind, so I guess that's a good thing. That's all I have time for right now.
Let me know if you have any specific questions.
Wally