[Beowulf] Big storage

Loic Tortay tortay at cc.in2p3.fr
Tue Sep 11 08:41:22 PDT 2007


[Sorry for my late reply, I was away and then busy]

My previous mails were obviously too long, so I'll try to be more
concise.

According to Joe Landman:
[...]
>
> > But, with only two RAID controllers there is no way your machine will
> > survive a controller failure when using RAID-5 or RAID-6 unless you're
> > willing to halve the available space (or add the optional extra
> > controllers).
>
> I don't understand this as you wrote in another message:
>
> > We did not lose any data due to the controller failure.
> >
> > The problem occurred a few days before a scheduled downtime, the
> > mainboard was replaced during the downtime and the machine rebooted
> > just fine.
>
> So it seems what you ascribe to be a problem for JackRabbit is also a
> problem for x4500, although the pain of replacing a motherboard is
> somewhat higher than a PCIe card ...
>
We specifically use a ZFS configuration that is resistant to a single
controller failure.

With one controller failed (8 disks unavailable), no data is lost.
Of course, the machine is then very fragile, since a single additional
disk failure on another controller will lead to data loss.
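
For reference, here is a minimal sketch of the kind of layout that gives
this property on an X4500 (6 SATA controllers with 8 disks each): every
raidz vdev takes exactly one disk from each controller.  The device names
and the use of single-parity raidz are illustrative assumptions, not our
exact production configuration:

#!/usr/bin/env python
# Minimal sketch (not our actual setup script; device names are assumed):
# build a "zpool create" command for an X4500-like box with 6 SATA
# controllers of 8 disks each, laying out raidz vdevs so that each vdev
# holds exactly one disk per controller.  Losing a whole controller then
# degrades every vdev by a single disk, which single-parity raidz
# tolerates -- but one more disk failure anywhere means data loss.

CONTROLLERS = 6          # X4500: 6 SATA controllers
DISKS_PER_CTRL = 8       # 8 disks behind each controller

cmd = ["zpool", "create", "tank"]
for slot in range(DISKS_PER_CTRL):
    # one vdev per disk "slot", spanning all controllers
    vdev = ["c%dt%dd0" % (ctrl, slot) for ctrl in range(CONTROLLERS)]
    cmd += ["raidz"] + vdev

print(" ".join(cmd))

Spreading each vdev across all six controllers is what turns a controller
failure into several independent single-disk failures instead of one
fatal eight-disk failure.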

I think Bruce's initial implied question was, "did you experience
another hardware failure on that machine, before the repair, that
ultimately led to data loss?"  The answer to that question is no.

My point regarding the two controllers in your machine was that, with
only two controllers, you can't have a configuration resistant to a
single controller failure unless you mirror the data (or add the
optional controllers).

Replacing the mainboard in an X4500 is actually easier than replacing a
PCI-e card.
You can change the "control module" without taking the machine out of
its rack, and there are no (internal) cables to unplug.

But in this case I happen to be plain wrong.  As I've been told by one
of my coworkers in charge of X4500 operations, the SATA controllers
of the X4500 are not on the mainboard but on the backplane.  Changing
the backplane requires more work than changing a PCI-e card.


> >
> > The density of the X4500 is also slightly better (48 disks in 4U
> > instead of 5U).
>
Sorry, you're right.

I was referring to density in terms of disk slots per rack unit, but
forgot to mention it.

[...]
>
> > As of today we have 112 X4500, 112U are almost 3 racks which is quite
> > a lot due to our floor space constraints.
>
> Ok, I am not trying to convert you.  You like your Sun boxen, and that
> is great.
>
> I will do a little math.  BTW: that's a fairly impressive size floor you
> have there.  112U of x4500 or 112 x4500?
>
We have 112 X4500 in 14 racks.  That's almost 2.7 PBytes raw, 1.9
PBytes usable space.
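
The raw figure is simple arithmetic, assuming the 500 GByte disk model
(the disk size is an inference from the totals, so treat it as an
assumption):

# Back-of-the-envelope check (500 GByte disks, decimal units):
machines, disks_per_machine, disk_gb = 112, 48, 500
raw_pb = machines * disks_per_machine * disk_gb / 1e6
print("%.2f PB raw" % raw_pb)      # ~2.69 PB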

According to Sun, we are the largest X4500 user in the world.
We already were last year, since we had one machine more than the Tokyo
Institute of Technology (featured as an "X4500 success story" on Sun's
website).


[my benchmark is larger than yours :-)]
>
> What I like are real application tests.  We don't see many (enough) of
> them.  I think I have seen one customer benchmark over the last 6 years
> that was both real (as in real operating code) that actually stressed an
>  IO system to any significant degree.
>
We stopped using IOzone for our tenders a few years ago and moved to a
"model-based I/O benchmark" simulating application I/O workloads.
It's similar to "filebench" from Sun (but simpler) and is used to
test more useful I/O workloads (for instance, threads with different
concurrent workloads), plus a few things that "filebench" does not do,
like accessing raw devices -- useful for disk procurements for our HSM
or Oracle cluster.


My pointless result was of course mostly due to cache, with 4 threads
each writing 1 GByte to 4 existing 2 GByte files (one file per
thread).  The block size used was 128 kBytes, all (random) accesses
were block aligned, and the value is the average aggregate throughput
of all threads over a 20-minute run.
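
For the curious, a rough sketch of that specific workload looks like the
following.  This is only an illustration of the access pattern, not the
benchmark program itself; file names are placeholders, and the real run
lasted 20 minutes rather than stopping after 1 GByte per thread:

#!/usr/bin/env python
# Rough sketch of the workload described above: 4 threads, each doing
# 128 kByte block-aligned random writes into its own pre-existing
# 2 GByte file, until 1 GByte has been written per thread.
import os
import random
import threading
import time

BLOCK = 128 * 1024            # 128 kBytes per I/O
FILE_SIZE = 2 * 1024 ** 3     # each target file is 2 GBytes (preallocated)
TO_WRITE = 1 * 1024 ** 3      # each thread writes 1 GByte in total
NBLOCKS = FILE_SIZE // BLOCK

def worker(path):
    buf = os.urandom(BLOCK)
    written = 0
    with open(path, "r+b") as f:          # file must already exist
        while written < TO_WRITE:
            f.seek(random.randrange(NBLOCKS) * BLOCK)  # block-aligned offset
            f.write(buf)
            written += BLOCK

start = time.time()
threads = [threading.Thread(target=worker, args=("testfile.%d" % i,))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print("aggregate throughput: %.1f MB/s" % (4 * TO_WRITE / 1e6 / elapsed))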


[...]
>
> Regardless of that, I do appreciate your comments with regards to the
> tests.  Maybe worth talking about this offline at some point (or if you
> will be at SC07).  My major concern with most tests are that they
> generate numbers that users (and vendors) simply report without a
> detailed and in-depth discussion and analysis.  This is I believe your
> criticism, and if you look through the benchmark report, you will see
> that a substantial fraction is explaining what you see and why you see
> what you see.
>
Indeed, that was my point.


Loïc.
-- 
| Loïc Tortay <tortay at cc.in2p3.fr> -     IN2P3 Computing Centre     |


