[Beowulf] Heterogeneity in a tiny (two-system cluster)?

John Hearns hearnsj at googlemail.com
Fri Feb 16 00:39:06 PST 2018


Tad,
I would go for the more modern system. You say yourself the first system is
two years old. In one or two years it will be out of warranty, and if a
component breaks you will have to decide whether to buy that component or
just junk the system.


Actually, having said that, you should look at the FVCOM model and see how
well it scales on a multi-core system.
Intel are increasing core counts, but not clock speeds. Paradoxically, in
the past you used to be able to get dual-core parts at over 3 GHz, which
don't have many cores competing for bandwidth to RAM.
The counterexample to this is Skylake, which has more channels to RAM
(six per socket, up from four), making for a more balanced system.
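
As a rough illustration of the bandwidth-per-core point, here is a
back-of-the-envelope sketch in Python. The channel counts, DDR4 speeds and
core counts are assumptions for illustration (a 14-core part with 4 channels
of DDR4-2400 versus a 16-core Skylake with 6 channels of DDR4-2666), not
figures from this thread, and the results are theoretical peaks rather than
measured STREAM numbers.

```python
def peak_bw_gbs(channels, mts, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth per socket, in GB/s.

    channels: populated memory channels per socket
    mts: DIMM speed in mega-transfers per second (e.g. 2400 for DDR4-2400)
    bytes_per_transfer: 8 bytes for a 64-bit DDR channel
    """
    return channels * mts * bytes_per_transfer / 1000.0


def bw_per_core(channels, mts, cores):
    """Peak bandwidth available per core if all cores pull on RAM at once."""
    return peak_bw_gbs(channels, mts) / cores


# Assumed configurations (hypothetical, chosen for illustration only).
systems = {
    "older 14-core, 4 x DDR4-2400": (4, 2400, 14),
    "Skylake 16-core, 6 x DDR4-2666": (6, 2666, 16),
}

for name, (channels, mts, cores) in systems.items():
    print(f"{name}: {peak_bw_gbs(channels, mts):.1f} GB/s per socket, "
          f"{bw_per_core(channels, mts, cores):.2f} GB/s per core")
```

Under these assumptions the newer part has more bandwidth per core despite
the higher core count, which is the sense in which it is "more balanced".
Whether FVCOM actually benefits still needs a scaling run on real hardware.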

I would go for a Skylake system, populate all the DIMM channels, and quite
honestly forget about running between two systems unless the size of your
models needs this.
Our latest Skylakes have 192 GB of RAM for that reason. In the last
generation this would have sounded like an unusual amount of RAM, but it
makes sense in the Skylake generation.
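
One way to see where a figure like 192 GB comes from is the small sketch
below. It assumes a dual-socket node with one 16 GB DIMM in every channel;
the DIMM size and socket count are assumptions for illustration, not details
given above.

```python
def total_ram_gb(sockets, channels_per_socket, dimm_gb, dimms_per_channel=1):
    """Total RAM when every memory channel is populated identically."""
    return sockets * channels_per_socket * dimms_per_channel * dimm_gb


# Assumed: dual socket, 16 GB DIMMs, one DIMM per channel.
six_channel = total_ram_gb(sockets=2, channels_per_socket=6, dimm_gb=16)
four_channel = total_ram_gb(sockets=2, channels_per_socket=4, dimm_gb=16)

print(f"six channels per socket:  {six_channel} GB")
print(f"four channels per socket: {four_channel} GB")
```

With six channels per socket, filling every channel lands naturally on
multiples of 12 DIMMs, so 96, 192 or 384 GB replace the 64, 128 or 256 GB
sizes that were usual with four channels.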

On 15 February 2018 at 14:20, Tad Slawecki <tslawecki at limno.com> wrote:

>
> Hello, list -
>
> We are at a point where we'd like to explore a tiny cluster of two systems
> to speed up execution of the FVCOM circulation model. We already have a
> two-year-old system with two 14-core CPUs (Xeon E-2680), and I have budget
> to purchase another system at this point, which we plan to directly connect
> via Infiniband. Should I buy an exact match, or go with the most my budget
> can handle (for example 2xXeon Gold 1630, 16-cores) under the assumption
> that the two-system cluster will operate at about the same speed *and* I
> can reap the benefits of the added performance when running smaller
> simulations independently?
>
> Our list owner already provided some thoughts:
>
> > I've always preferred homogeneous clusters, but what you say is,
> > I think, quite plausible.  The issue you will have though is
> > ensuring that the application is built for the earliest of the
> > architectures so you don't end up using instructions for a newer
> > CPU on the older one (which would result in illegal instruction
> > crashes).
> >
> > But there may be other gotchas that others think of!
>
> Thank you ...
>
> Tad
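
On the quoted point about building for the earliest architecture: a minimal
sketch of how one might compare the instruction-set flags of the two nodes
before choosing compiler options. It parses /proc/cpuinfo-style text; the
node names and the shortened flag lists are hypothetical, and in practice
the text would come from reading /proc/cpuinfo on each Linux node.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Line looks like "flags<tabs>: fpu vme ... avx2"
            return set(line.split(":", 1)[1].split())
    return set()


def common_flags(per_node_text):
    """Flags present on every node, i.e. safe to target in a shared build."""
    flag_sets = [parse_flags(text) for text in per_node_text.values()]
    return set.intersection(*flag_sets) if flag_sets else set()


# Hypothetical, heavily shortened cpuinfo excerpts for the two nodes.
nodes = {
    "old-node": "model name\t: older Xeon\nflags\t\t: fpu sse2 avx avx2 fma\n",
    "new-node": "model name\t: newer Xeon\n"
                "flags\t\t: fpu sse2 avx avx2 fma avx512f avx512dq\n",
}

safe = common_flags(nodes)
only_new = parse_flags(nodes["new-node"]) - safe

print("safe on both nodes:", sorted(safe))
print("new node only:", sorted(only_new))
```

Anything in the "new node only" set is what would trigger an illegal
instruction on the older machine, so a shared binary should target the older
CPU (for example via the compiler's architecture option, such as GCC's
-march), while a separate build for the new node can be used for independent
runs.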