[Beowulf] Re: Purdue Supercomputer
Mark Hahn
hahn at mcmaster.ca
Sat May 10 17:28:03 PDT 2008
> clusters. What if you have 1 of the systems in the cluster down or any
> network failures? Can we make our cluster (2-5 systems only) work properly?
normally, the cluster's management software will monitor and deal with
node failure. at a minimum, that means noticing a failure, ensuring that the
node isn't used (until fixed), and dealing with any jobs that involved the
node. it's also fairly common for server nodes (not just slave/compute
nodes) to have some failover/high-availability features. (HA can also be
done for compute jobs, but IMHO it's not worth considering in normal cases,
i.e., when node failures are infrequent.)
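
to make the "notice it, stop using it, deal with its jobs" steps concrete, here is a minimal sketch in Python. it is not how any particular resource manager does it; the Node record, the heartbeat timestamps and the 60-second timeout are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical bookkeeping record; real schedulers keep far more state.
    name: str
    last_heartbeat: float               # time (seconds) the node last reported in
    jobs: list = field(default_factory=list)
    online: bool = True


def sweep(nodes, now, timeout=60.0):
    """Mark nodes with stale heartbeats offline; return jobs to requeue."""
    requeue = []
    for node in nodes:
        if node.online and now - node.last_heartbeat > timeout:
            node.online = False          # node isn't used until fixed
            requeue.extend(node.jobs)    # jobs that involved the failed node
            node.jobs = []
    return requeue


def usable(nodes):
    """Names of the nodes the scheduler may still place work on."""
    return [n.name for n in nodes if n.online]
```

a periodic call such as sweep(nodes, now=time.time()) would be the monitoring loop. whether the returned jobs are requeued, restarted from a checkpoint or just reported to the user is a policy choice this sketch leaves open.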
> Also what about geographically distant cluster systems? Say 1 in USA
sure, there's nothing about clusters that really assumes locality,
though obviously geographic distribution has effects on achievable
performance for wide-area MPI or distant file access. wide-area
clustering seems more of a political stunt to me (yes, including grids.)
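
a back-of-envelope sketch of the performance effect, in Python. the only physics here is that a signal in optical fibre travels at roughly two thirds of c; the 13,000 km path length for a USA-India link is an assumed round figure, not a measured route.

```python
# Approximate signal speed in optical fibre (about 2/3 of c).
FIBRE_KM_PER_S = 200_000.0


def min_rtt_ms(path_km, km_per_s=FIBRE_KM_PER_S):
    """Lower bound on round-trip time in milliseconds for a given path length.

    Ignores routing detours, switching and protocol overhead, so real
    round trips will be longer than this.
    """
    return 2.0 * path_km * 1000.0 / km_per_s


if __name__ == "__main__":
    # 13,000 km is an assumed USA-India fibre path length.
    print(min_rtt_ms(13_000))
```

under those assumptions the floor is on the order of 130 ms per round trip, before any real-world overhead, which is several orders of magnitude above the latency of a local cluster interconnect.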
> and the other in India. How do we manage our cluster in mishaps or
> difficult conditions?
I find that with IPMI and console redirection, it's very rarely necessary to
care about where your nodes are, at least from a sysadmin perspective.
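
for example, a small Python wrapper around ipmitool might look like the sketch below. it assumes ipmitool is installed, that the node's BMC speaks IPMI v2.0 over the LAN, and that the password is in the IPMI_PASSWORD environment variable (which is what -E reads); the host and user names are placeholders.

```python
import subprocess


def ipmi_argv(bmc_host, user, *action):
    """Build an ipmitool command line for a remote BMC.

    -I lanplus selects the IPMI v2.0 LAN interface; -E takes the password
    from the IPMI_PASSWORD environment variable instead of the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", *action]


def power_status(bmc_host, user):
    """Return the output of 'chassis power status' for the given BMC."""
    result = subprocess.run(
        ipmi_argv(bmc_host, user, "chassis", "power", "status"),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

the same helper covers the other usual remote chores: ipmi_argv(host, user, "chassis", "power", "cycle") to power-cycle a hung node, or ipmi_argv(host, user, "sol", "activate") for the serial-over-LAN console (run that one interactively rather than with captured output).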
you need to ask what the benefit is, though, in a wide-area cluster
(versus separate, local ones.) I wouldn't assume that management would
be easier, and obviously only gratuitously parallel apps (sometimes called
embarrassingly parallel) could use it.
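
a crude model of why, in Python. it assumes each message exchange costs one full round trip and is not overlapped with computation, which is pessimistic but fine for a sanity check; the numbers in the usage note are made-up illustrations, not measurements.

```python
def parallel_efficiency(compute_s, exchanges, rtt_s):
    """Fraction of wall time spent computing.

    compute_s: seconds of pure computation per step
    exchanges: number of blocking message exchanges per step
    rtt_s:     round-trip time per exchange, in seconds
    Assumes exchanges are serialized and never overlap with compute.
    """
    return compute_s / (compute_s + exchanges * rtt_s)
```

with 1 second of compute and 1000 exchanges per step, a 50-microsecond local round trip gives an efficiency of about 0.95, while a 130 ms wide-area round trip gives under 0.01. with zero exchanges the efficiency is 1.0 regardless of distance, which is the embarrassingly parallel case.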
> lastly, how about having beowulf cluster systems in space? putting 1 pc
> on each planet or celestial body that we want to track and the server
> in india.
just because it could be done doesn't mean it makes sense...
> is linux the best choice in such cases...
your choice of OS depends primarily on your preference and experience.