[Beowulf] What services do you run on your cluster nodes?
Joe Landman
landman at scalableinformatics.com
Tue Sep 23 05:27:44 PDT 2008
John Hearns wrote:
> That's a reason why I'm no great lover of Ganglia too - it just sprays
> multicast packets all over your network.
Yeah ... try to use a network in the middle of a multicast storm. As I
remember, every machine that sees a multicast packet has to at least
inspect it to see whether the destination group is one it subscribes
to. If so, it has to deliver the contents to the multicast consumer.
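A rough sketch of that host-side check, as described above, might look like the following. It is a toy model, not how any particular driver or kernel implements it. The function names are hypothetical, and the group address in the usage note is only an example value. `join_group` uses the standard `IP_ADD_MEMBERSHIP` socket option and needs a multicast-capable interface to actually run.

```python
import ipaddress
import socket
import struct


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq payload that IP_ADD_MEMBERSHIP expects."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    # ip_mreq is two packed IPv4 addresses: the group, then the local interface.
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def join_group(group: str, port: int) -> socket.socket:
    """Open a UDP socket subscribed to `group` (the "multicast consumer")."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def filter_packets(subscribed, packets):
    """Toy model: every packet costs an inspection, but only packets
    addressed to a subscribed group are delivered."""
    inspected = 0
    delivered = []
    for dest, payload in packets:
        inspected += 1              # the host pays this for every packet seen
        if dest in subscribed:      # the subscription check
            delivered.append(payload)
    return inspected, delivered
```

For example, `filter_packets({"239.2.11.71"}, packets)` inspects every entry in `packets` but delivers only those sent to that one group. In a storm the inspection count grows with the total traffic, even on hosts that subscribe to nothing.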
> Which really should be OK - but if you have switches which don't perform
> well with multicast you get problems.
... or a crappy driver->TCP stack implementation on your local machine
or cluster (cough cough ... vendor's name elided to protect those who
really ought to be exposed).
I think the major mistake made by people racking and stacking boxes in
an effort to build a cluster is that they just don't grasp how things
scale, or haven't yet run into scaling issues for lack of experience.
Large scale-out machines/codes/runs often have surprising (and sometimes
banal) failure modes. We have customers who have run into some rather
surprising (for them) scale-up problems, in large part because the
software they are using does not take *big* data sets into account. In
some cases (with source code) we could help fix the app. In others (no
source code) we could fix the underlying hardware cause. Usually you
don't hear about these things until you get the phone call about "things
not working", and you have to walk the cat back to where the problem
originated.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615