[Beowulf] [EXTERNAL] Re: Deskside clusters

Jonathan Engwall engwalljonathanthereal at gmail.com
Tue Aug 24 22:42:47 UTC 2021


EMC offers dual-socket systems with 28 physical cores per processor.
That's a lot of computer.

On Tue, Aug 24, 2021, 1:33 PM Lux, Jim (US 7140) via Beowulf <
beowulf at beowulf.org> wrote:

> Yes, indeed. I didn't call out Limulus because it was mentioned earlier
> in the thread.
>
> And another reason why you might want your own.
> Every so often, a notice from JPL's HPC group goes out to the users -
> "Halo/Gattaca/clustername will not be available because it is reserved for
> Mars {Year}".  While Mars landings at JPL are a *big deal*, not everyone is
> working on them (in fact, by that time, most of the Martians are now
> working on something else), and you want to get your work done.  I suspect
> other institutional clusters have similar "the 800 pound (363 kg) gorilla
> has requested" scenarios.
>
>
> On 8/24/21, 11:34 AM, "Douglas Eadline" <deadline at eadline.org> wrote:
>
>
>     Jim,
>
>     You are describing a lot of the design pathway for Limulus
>     clusters. The local (non-data center) power, heat, and noise are
>     all minimized while performance is maximized.
>
>     A well decked out system is often less than $10K and
>     is on par with a fat multi-core workstation
>     (and there are reasons a clustered approach performs better).
>
>     Another use case is where there is no available research data
>     center hardware because there are no specialized sysadmins, no
>     space, or no budget (many smaller colleges and universities fall
>     into this group). Plus, oftentimes, dropping something into a
>     data center means an additional cost to the researcher's budget.
>
>     --
>     Doug
>
>
>     > I've been looking at "small scale" clusters for a long time
>     > (2000?) and talked a lot with the folks from Orion, as well as
>     > on this list.
>     > They fit in a "hard to market to" niche.
>     >
>     > My own workflow tends to have use cases that are a bit
>     > "off-nominal" - one is the rapid iteration of a computational
>     > model while experimenting.  That is, I have a Python code that
>     > generates input to the Numerical Electromagnetics Code (NEC);
>     > I run the model over a range of parameters, then look at the
>     > output to see if I'm getting what I want.  If not, I change
>     > the code (which essentially changes the antenna design), rerun
>     > the models, and see if it worked.  I'd love an iteration time
>     > of "a minute or two" for the computation, maybe a minute or
>     > two to plot the outputs (fiddling with the plot ranges, etc.).
>     > For reference, for a radio astronomy array on the far side of
>     > the Moon, I was running 144 cases, each at 380 frequencies:
>     > one case takes 30 seconds to run, so farming it out to 12
>     > processors gave me a 6 minute run time, which is in the right
>     > range.  Another model, of the interaction of antennas on a
>     > spacecraft, runs about 15 seconds/case; a third is about 120
>     > seconds/case.
>     >
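>     > (As a concrete illustration - not my actual code - a minimal
>     > Python sketch of that kind of farmed-out sweep; the "nec2"
>     > binary name and the deck contents here are assumptions:)
>     >
>     >     import subprocess
>     >     from multiprocessing import Pool
>     >
>     >     def write_deck(case_id):
>     >         # Stand-in for the real generator: writes a trivial
>     >         # NEC-style input deck for one parameter set.
>     >         name = f"case_{case_id}.nec"
>     >         with open(name, "w") as f:
>     >             f.write(f"CM case {case_id}\nCE\nEN\n")
>     >         return name
>     >
>     >     def run_case(case_id):
>     >         deck = write_deck(case_id)
>     >         out = f"case_{case_id}.out"
>     >         # Assumes a command-line NEC build invoked as
>     >         # "nec2 <input> <output>"; adjust to the local binary.
>     >         subprocess.run(["nec2", deck, out], check=True)
>     >         return out
>     >
>     >     if __name__ == "__main__":
>     >         # 144 independent cases across 12 workers is ~6
>     >         # minutes at ~30 seconds/case, per the numbers above.
>     >         with Pool(12) as pool:
>     >             results = pool.map(run_case, range(144))
>     >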
>     > To get "interactive development", then, I want the "cycle
>     > time" to be 10 minutes - 30 minutes of thinking about how to
>     > change the design and altering the code to generate the new
>     > design, make a couple of test runs to find the equivalent of
>     > "syntax errors", and then turn it loose - get a cup of coffee,
>     > answer a few emails, come back and see the results.  I could
>     > iterate maybe a half dozen shots a day, which is pretty
>     > productive.  (Compared to straight-up sequential - 144 runs at
>     > 30 seconds is more than an hour - which triggers a different
>     > working cadence that devolves to roughly one shot a day.)  The
>     > "10 minute" turnaround is also compatible with my job, which,
>     > unfortunately, has things other than computing - meetings,
>     > budgets, schedules.  At 10 minute runs, I can carve out a few
>     > hours and get into that "flow state" on the technical problem
>     > before being disrupted by "a person from Porlock."
>     >
>     > So this is, I think, a classic example of "I want local
>     > control" - sure, you might have access to a 1000 or more node
>     > cluster, but you're going to have to figure out how to use its
>     > batch management system (SLURM and PBS are two I've used) -
>     > and that's a bit different from "self managed 100% access".
>     > Or AWS kinds of solutions for EP problems.  There's something
>     > very satisfying about getting an idea and not having to "ok,
>     > now I have to log in to the remote cluster with TFA, set up
>     > the tunnel, move my data, get the job spun up, get the data
>     > back" - especially for iterative development.  I did do that
>     > using JPL's and TACC's clusters, and "moving data" proved to
>     > be a barrier - the other thing was the "iterative code
>     > development" in between runs - most institutional clusters
>     > discourage interactive development on the cluster (even if
>     > you're only sucking up one core).  If the tools were a bit
>     > more "transparent" and there were "shared disk" capabilities,
>     > this might be more attractive; while everyone is exceedingly
>     > helpful, there are still barriers to making it "run it on my
>     > desktop".
>     >
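>     > (To be fair, the batch system itself isn't much code - a
>     > sketch of a SLURM job-array script for the 144-case sweep
>     > above, with the "nec2" invocation again being an assumption:)
>     >
>     >     #!/bin/bash
>     >     #SBATCH --job-name=nec-sweep
>     >     #SBATCH --array=0-143
>     >     #SBATCH --ntasks=1
>     >     #SBATCH --time=00:05:00
>     >     # one NEC case per array task
>     >     nec2 case_${SLURM_ARRAY_TASK_ID}.nec case_${SLURM_ARRAY_TASK_ID}.out
>     >
>     > It's not the script that's the barrier, of course - it's the
>     > TFA/tunnel/data movement around it.
>     >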
>     > Another use case that I wind up designing for is the "HPC in
>     > places without good communications and limited
>     > infrastructure" - the notional use case might be an
>     > archaeological expedition wanting to use HPC to process ground
>     > penetrating radar data, or something like that (or, given that
>     > I work at JPL, a need for HPC on the surface of Mars) - so
>     > sending your data to a remote cluster isn't really an option.
>     > And here, the "speedup" you need might well be a factor of
>     > 10-20 over a single computer, something doable in a "portable"
>     > configuration (check it as luggage, for instance).  Just as
>     > for my antenna modeling problems, turning an "overnight"
>     > computation into a "10-20 minute" computation would change the
>     > workflow dramatically.
>     >
>     >
>     > Another market is "learn how to cluster" - for which the RPi
>     > clusters (or "packs" of Beagleboards) work - they're fun, and
>     > in a classroom environment I think they are an excellent,
>     > cost-effective solution for learning all the facets of
>     > "bringing up a cluster from scratch", but I'm not convinced
>     > they provide a good "MIPS/Watt" or "MIPS/liter" metric - in
>     > terms of convenience.  That is, rather than a cluster of 10
>     > RPis, you might be better off just buying a faster desktop
>     > machine.
>     >
>     > Let's talk design desirements/constraints
>     >
>     > I've had a chance to use some "clusters in a box" over the
>     > last decades, and I'd suggest that while power is one
>     > constraint, another is noise.  Just the other day, I was in a
>     > lab and someone commented that "those computers are amazingly
>     > fast, but you really need to put them in another room".  Yes,
>     > all those 1U and 2U rack mounted boxes with tiny screaming
>     > fans are just not "office compatible".  And that brings up
>     > another interesting constraint for "deskside" computing -
>     > heat.  Sure, you can plug in 1500W of computers (or even 3000W
>     > if you have two circuits), but can you live in your office
>     > with a 1500W space heater?  Interestingly, for *my* workflow,
>     > that's probably ok - *my* computation has a 10-30% duty
>     > cycle - think for 30 minutes, compute for 5-10.  But still,
>     > your office mate will appreciate it if you keep the sound
>     > level down to 50 dBA.
>     >
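>     > (Worked out: 1500W at a 10-30% duty cycle averages only
>     > 150-450W dumped into the room - a space heater on "low"
>     > rather than flat out.)
>     >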
>     > GPUs - some codes can use them, some can't.  They tend,
>     > though, to be noisy (all that air flow for cooling), and I
>     > don't know that GPU manufacturers spend a lot of time on this.
>     > Sure, I've seen charts and specs that claim <50 dBA, but I
>     > think they're gaming the measurement, counting on the user
>     > being a gamer wearing headphones or with a big sound system.
>     > I will say, for instance, that the PS4 positively roars when
>     > spun up unless you've got external forced ventilation to keep
>     > the inlet air temp low.
>     >
>     > Looking at GSA guidelines for office space - if it's
>     > "deskside", it's got to fit in a 50-80 square foot cubicle or
>     > your shared part of a 120 square foot office.
>     >
>     > Then one needs to figure out the "refresh cycle time" for
>     > buying hardware - this has been a topic on this list forever -
>     > you have 2 years of computation to do: do you buy N nodes
>     > today at speed X, or do you wait a year, buy N/2 nodes at
>     > speed 4X, and finish your computation at the same time?
>     >
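>     > (The arithmetic: N nodes at speed X for two years deliver
>     > 2NX node-speed-years; waiting a year and running N/2 nodes at
>     > 4X for the remaining year delivers (N/2)(4X)(1) = 2NX - the
>     > same total, with half the node count.)
>     >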
>     > Fancy desktop PCs with monitors, etc., come in at under $5k,
>     > including burdens and installation, but not including monthly
>     > service charges (in an institutional environment).  If you
>     > look at "purchase limits", there are some thresholds (usually
>     > around $10k, then increasing in steps of 10x or 100x) for
>     > approvals.  So a $100k deskside box is going to be a tough
>     > sell.
>     >
>     >
>     >
>     > On 8/24/21, 6:07 AM, "Beowulf on behalf of Douglas Eadline"
>     > <beowulf-bounces at beowulf.org on behalf of deadline at eadline.org> wrote:
>     >
>     >     Jonathan
>     >
>     >     It is a real cluster, available in 4 and 8 node versions.
>     >     The design is for non-data center use - that is, the local
>     >     office, lab, or home, where power, cooling, and noise
>     >     are important. More info here:
>     >
>     >     https://www.limulus-computing.com
>     >     https://www.limulus-computing.com/Limulus-Manual
>     >
>     >     --
>     >     Doug
>     >
>     >
>     >
>     >     > Hi Doug,
>     >     >
>     >     > Not to derail the discussion, but a quick question: you
>     >     > say "desk-side cluster" - is it a single machine that
>     >     > will run a VM cluster?
>     >     >
>     >     > Regards,
>     >     > Jonathan
>     >     >
>     >     > -----Original Message-----
>     >     > From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of
> Douglas
>     > Eadline
>     >     > Sent: 23 August 2021 23:12
>     >     > To: John Hearns <hearnsj at gmail.com>
>     >     > Cc: Beowulf Mailing List <beowulf at beowulf.org>
>     >     > Subject: Re: [Beowulf] List archives
>     >     >
>     >     > John,
>     >     >
>     >     > I think that was on twitter.
>     >     >
>     >     > In any case, I'm working with these processors right now.
>     >     >
>     >     > On the new Ryzens, the power usage is actually quite tunable.
>     >     > There are three settings.
>     >     >
>     >     > 1) Package Power Tracking: the PPT threshold is the
>     >     > allowed socket power consumption permitted across the
>     >     > voltage rails supplying the socket.
>     >     >
>     >     > 2) Thermal Design Current: the maximum current (TDC, in
>     >     > amps) that can be delivered by a specific motherboard's
>     >     > voltage regulator configuration in thermally-constrained
>     >     > scenarios.
>     >     >
>     >     > 3) Electrical Design Current: the maximum current (EDC,
>     >     > in amps) that can be delivered by a specific
>     >     > motherboard's voltage regulator configuration in a peak
>     >     > ("spike") condition for a short period of time.
>     >     >
>     >     > My goal is to tweak the 105W-TDP R7-5800X so it draws
>     >     > power like the 65W-TDP R5-5600X.
>     >     >
>     >     > This is desk-side cluster low-power stuff.
>     >     > I am using an extension cable/plug for the Limulus
>     >     > blades with an in-line current meter (normally used for
>     >     > solar panels). Now I can load them up and watch exactly
>     >     > how much current is being pulled across the 12V rails.
>     >     >
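>     >     > (A toy Python sketch of the bookkeeping involved - the
>     >     > limit values are the commonly reported stock PPT/TDC/EDC
>     >     > numbers for a 65W-TDP Ryzen, used here as assumptions,
>     >     > and treating the 12V-rail reading as the whole socket
>     >     > draw, which is only approximately true:)
>     >     >
>     >     >     PPT_WATTS = 76   # stock socket power limit, 65W TDP
>     >     >     TDC_AMPS = 60    # sustained current limit
>     >     >     EDC_AMPS = 90    # short "spike" current limit
>     >     >
>     >     >     def check(volts, amps, peak=False):
>     >     >         # Convert an in-line current-meter reading on
>     >     >         # the 12V rails into watts, then compare against
>     >     >         # the PPT/TDC/EDC limits defined above.
>     >     >         watts = volts * amps
>     >     >         limit = EDC_AMPS if peak else TDC_AMPS
>     >     >         return {"watts": watts,
>     >     >                 "over_ppt": watts > PPT_WATTS,
>     >     >                 "over_current": amps > limit}
>     >     >
>     >     >     print(check(12.1, 5.4))  # ~65 W: inside all limits
>     >     >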
>     >     > If you need more info, let me know
>     >     >
>     >     > --
>     >     > Doug
>     >     >
>     >     >> The Beowulf list archives seem to end in July 2021.
>     >     >> I was looking for Doug Eadline's post on limiting AMD
>     >     >> power and the results on performance.
>     >     >>
>     >     >> John H
>     >     >
>     >     >
>     >     > --
>     >     > Doug
>     >     >
>     >
>     >
>     >     --
>     >     Doug
>     >
>     >
>
>
>     --
>     Doug
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>

