[Beowulf] While the knives are out... Wulf Keepers

SIM DOG steve_heaton at iinet.net.au
Wed Aug 9 17:47:00 PDT 2006


Mark started it so while we're asking loaded questions... =)

I recently visited a large educational institution (that shall remain
nameless) that hosts an excellent, world class, science research team.
They also have a reasonably large Beowulf environment (over 100 dual nodes).

Now maybe it was just the people I was talking to (management) but I
got the distinct impression that they treat their 'Wulf as an
'appliance'. It came as a great disappointment :/

Reliability is their prime (only?) concern. Researchers are expected to
address any performance issues with their code. Well, yeah, OK... with
>their code< but what about the underlying infrastructure? Who keeps the
Wulf 'tool' nice and sharp? For a given code, does anyone *understand* how
to check the Wulf's behavior and see whether it's helping or hindering?

I know from personal experience that one piece of code they run would be
expected to be CPU bound... but the interconnect plays a bigger part
than *I* expected (and I've spent a fair bit of time under the bonnet...
with still more things to explore). What about using a second NIC to
offload the admin traffic? How deeply have you looked into your compiler
switches? Looked into a FNN topology? These and other questions...
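
To put rough numbers on the interconnect point, here is a back-of-envelope
sketch (Amdahl-style, nothing more). The function name and the figures are
purely illustrative assumptions, not measurements from the site above: it
supposes a 4-day run that spends half its wall time in communication, and
asks what happens if only that share gets faster.

```python
def estimated_runtime(total_days, comm_fraction, comm_speedup):
    """Amdahl-style estimate: only the communication share speeds up.

    total_days    -- current wall-clock time of the run (assumed)
    comm_fraction -- share of that time spent in communication (assumed)
    comm_speedup  -- factor by which communication gets faster (assumed)
    """
    compute = total_days * (1.0 - comm_fraction)        # unchanged part
    comm = total_days * comm_fraction / comm_speedup    # improved part
    return compute + comm


# Hypothetical inputs: 4-day run, 50% of wall time in communication.
for speedup in (1.0, 1.5, 2.0, 4.0):
    days = estimated_runtime(4.0, 0.5, speedup)
    print(f"{speedup:.1f}x faster comms -> {days:.2f} days")
```

Under those assumed numbers, halving the communication time takes the run
from four days to three. The real communication share has to be measured
(profiling the job) before a model like this says anything useful.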

Obviously reality bites. Staff cost money. So does extra hardware. You
don't want to have a sick, flaky Wulf that can't keep the customers
happy. However, knowing the size of their research staff and what
they're likely to get paid, I'd have thought at least one person with the
skills to keep the Wulf *near* the edge would be a good thing?

In fairness, maybe I got the wrong impression. Maybe I was just talking
to the wrong people. It was just the overall feel I got. The Wulf was
big and seemed fast and they were happy. I was just disappointed that
they didn't seem prepared to put a tad of TLC into their Wulf. Wouldn't
dropping a run from four days to three be worth investigating? [Sigh]

Is this typical of educational institutions? Am I missing some
consideration that would explain the apparent apathy? I'm prepared to
accept I'm just being naive.

However... if you're in Australia and run a Wulf that you like to see run
nice and sharp, drop me a line... I'm looking for a job and you could be
my kinda place! :)

Stevo


