[Beowulf] cluster flexibility vs. design specificity, cluster vs. supercomputing

Tue Aug 8 06:15:46 PDT 2006

Jerry,

First, let's make you a cluster expert. The answer to
all your questions is "It all depends on the application."
That, in a real sense, is all you really need to know.
It implies testing, benchmarking and understanding
your application, which by the way
has always been important in supercomputing.

There are plenty of applications that run "good enough"
using all commodity hardware (i.e. available from multiple sources).
"Good enough" is defined by your needs and "price-to-performance".
Almost every person who buys a cluster has a budget. The issue
is to get the best performance within the budget.

For instance, suppose a $30,000 GigE cluster runs my application in 1 hour.
If I use a high performance interconnect, it will
finish in 45 minutes. Suppose the additional cost of the interconnect
is $20,000. Is it worth it? Sure, if you have the money.
If you don't, then what about buying fewer nodes and
a better interconnect to stay within the $30K budget?
What if I run these at night, and the extra 15 minutes
per run is no big deal? What if my time to solution
is critical? Then I need to find more money.
It all depends on the application.
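To put rough numbers on that trade-off, here is a minimal Python sketch using the illustrative figures from the example. It assumes the $20,000 interconnect is added to the same $30,000 set of nodes ($50,000 total) and measures price-to-performance as runs per hour per $1,000 of hardware. Both are my assumptions for illustration, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterOption:
    """One hypothetical cluster configuration (illustrative figures only)."""
    name: str
    cost_usd: float     # total hardware cost (assumed)
    runtime_min: float  # wall-clock minutes for one run of the application

    def runs_per_hour(self) -> float:
        # Throughput if the cluster runs this job back to back.
        return 60.0 / self.runtime_min

    def runs_per_hour_per_kusd(self) -> float:
        # Assumed price-to-performance metric: throughput per $1,000 spent.
        return self.runs_per_hour() / (self.cost_usd / 1000.0)


# Figures from the example; the $50,000 total assumes the interconnect
# is added on top of the same $30,000 worth of nodes.
gige = ClusterOption("GigE", cost_usd=30_000, runtime_min=60)
fast = ClusterOption("Same nodes + HPC interconnect", cost_usd=50_000, runtime_min=45)

for opt in (gige, fast):
    print(f"{opt.name}: {opt.runs_per_hour():.3f} runs/h, "
          f"{opt.runs_per_hour_per_kusd():.4f} runs/h per $1K")

extra_cost = fast.cost_usd - gige.cost_usd
minutes_saved = gige.runtime_min - fast.runtime_min
print(f"Extra ${extra_cost:,.0f} buys {minutes_saved:.0f} min per run "
      f"({gige.runtime_min / fast.runtime_min:.2f}x speedup)")
```

With these numbers the faster interconnect finishes each run sooner, but delivers less throughput per dollar (about 0.027 vs. 0.033 runs/hour per $1K). That is why the answer hinges on whether time to solution or the budget matters more for your application.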

Now about clusters and "vendor entanglement" (an interesting
way to describe it, not to be confused with quantum entanglement).

Vendor entanglement can be a little fuzzy (so some quantum effect may
indeed be present). There are several ways clusters get "entangled"
(there may be more; anyone?):

1) "closed motherboard" with commodity CPUs, memory and disk,
   but the motherboards or other key hardware are not from
   the commodity market. Users must buy additional/replacement
   hardware/maintenance from the sole vendor.

2) "closed interconnect" where the networking hardware
   can only be purchased from a single vendor, but the nodes
   can be from a variety of vendors. Users must deal
   with a single interconnect vendor in terms of drivers
   and support.

3) "closed plumbing" - the underlying software cannot be
   altered or examined by the user.

In Case 1, the vendor has quite a bit of control. This may
be important to customers, since they have "one throat
to choke" when things go wrong. These systems are generally expensive
because the small HPC cluster market must pay for the engineering.

In Case 2, users have much more freedom, and much of the software that
drives the network hardware is open. These solutions are often
more expensive, but will become less expensive because the HPC
interconnect space is becoming "commoditized" around 10 GigE and Infiniband
(e.g. consider that Myricom is migrating toward high performance
10GigE and Pathscale was purchased by Qlogic; both of these moves
allow commodity switches and standards to be used).

Case 3 really has no relevance to Linux clusters; the plumbing
is open (http://www.clustermonkey.net//content/view/24/33/).

In my view, there will be two basic approaches:
closed cluster hardware and open cluster hardware.
The need for *standard* high performance commodity interconnects
will increase due to multi-core servers, and this will
shrink Case 2 above. Costs will come down as well.

In summary, there is no requirement that a fast cluster must
be "vendor entangled". My advice is to get entangled with
an integrator/consultant who understands cluster
hardware and software so they can help you get the best
price-to-performance for your application(s).

 --
 Doug

>
> Hi, All:
>
> I know lots experts here in this forum...so I am just keep posting and hope
> somebody can give a response :-)
>
> Here is a non-technical question but question regarding a strategy to manage
> growing cluster(s), it is about cluster flexibility vs. design specificity.
>
> Myself, prefer every cluster is specifically designed with "rooted" firmware
> and even hardware design, and benchmarked with specific application. That is
> my understanding of the cluster for supercomputing in scientific community.
>
> But, in real life, "cluster" diverges from "supercomputing", in order to
> expand the cluster without being tangled by the product provider, and, in
> order to managing a evolving cluster, we have to allow certain/great
> flexibility for the cluster. in my mind, that is not really a supercomputing
> or "high performance computing" for computing efficiency (e.g. possible
> conflicts of mixtures of hardwares within the cluster?) Or, I am just plain
> wrong.
>
> Any expert can clarify the confusion? How can we independently (not being
> tangled by specific vendor) expand and keep evolving the cluster but without
> sacrificing the computing efficiency? Or, I just worry too much? Normally, if
> we mix different hardwares together (like traditional beowulf cluster), how
> bad it could be to affect the specifically designed cluster?
>
> Thanks!!
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>