[Beowulf] [EXTERNAL] Re: Deskside clusters

Prentice Bisbal pbisbal at pppl.gov
Wed Aug 25 18:51:44 UTC 2021


Not anymore, at least not in the HPC realm.  We recently purchased 
quad-socket systems with a total of 96 Intel cores/node, and dual-socket 
systems with 128 AMD cores/node.

With Intel now marketing their "highly scalable" (or something like that) 
line of processors, and AMD, which has always pushed higher core counts, 
back in the game, I think numbers like that will be common in HPC 
clusters purchased in the next year or so.

But, yeah, I guess 28 physical cores is more than the average desktop 
has these days.


Prentice

On 8/24/21 6:42 PM, Jonathan Engwall wrote:
> EMC offers dual-socket systems with 28-physical-core processors. That's a lot of 
> computer.
>
> On Tue, Aug 24, 2021, 1:33 PM Lux, Jim (US 7140) via Beowulf 
> <beowulf at beowulf.org <mailto:beowulf at beowulf.org>> wrote:
>
>     Yes, indeed. I didn't call out Limulus, because it was mentioned
>     earlier in the thread.
>
>     And another reason why you might want your own.
>     Every so often, the notice from JPL's HPC goes out to the users -
>     "Halo/Gattaca/clustername will not be available because it is
>     reserved for Mars {Year}"  While Mars landings at JPL are a *big
>     deal*, not everyone is working on them (in fact, by that time,
>     most of the Martians are now working on something else), and you
>     want to get your work done.  I suspect other institutional
>     clusters have similar "the 800 pound (363 kg) gorilla has
>     requested" scenarios.
>
>
>     On 8/24/21, 11:34 AM, "Douglas Eadline" <deadline at eadline.org
>     <mailto:deadline at eadline.org>> wrote:
>
>
>         Jim,
>
>         You are describing a lot of the design pathway for Limulus
>         clusters. The local (non-data center) power, heat, noise are all
>         minimized while performance is maximized.
>
>         A well decked out system is often less than $10K and
>         is on par with a fat multi-core workstation
>         (and there are reasons a clustered approach performs better).
>
>         Another use case is where there is no available research data center
>         hardware because there are no specialized sysadmins, space, or budget.
>         (Many smaller colleges and universities fall into this
>         group.) Plus, oftentimes, dropping something into a data center
>         means an additional cost to the researcher's budget.
>
>         --
>         Doug
>
>
>         > I've been looking at "small scale" clusters for a long time
>     (2000?)  and
>         > talked a lot with the folks from Orion, as well as on this list.
>         > They fit in a "hard to market to" niche.
>         >
>         > My own workflow tends to have use cases that are a bit
>     "off-nominal" - one
>         > is the rapid iteration of a computational model while
>     experimenting - That
>         > is, I have a python code that generates input to Numerical
>         > Electromagnetics Code (NEC), I run the model over a range of
>     parameters,
>         > then look at the output to see if I'm getting what I
>     want. If not, I
>         > change the code (which essentially changes the antenna
>     design), rerun the
>         > models, and see if it worked.  I'd love an iteration time of
>     "a minute or
>         > two" for the computation, maybe a minute or two to plot the
>     outputs
>         > (fiddling with the plot ranges, etc.).  For reference, for a
>     radio
>         > astronomy array on the far side of the Moon, I was running
>     144 cases, each
>         > at 380 frequencies: to run 1 case takes 30 seconds, so
>     farming it out to
>         > 12 processors gave me a 6 minute run time, which is in the
>     right range.
>         > Another model of interaction of antennas on a spacecraft
>     runs about 15
>         > seconds/case; and a third is about 120 seconds/case.
>         >
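>         > (For concreteness, here is a minimal Python sketch of that
>         > "farm it out to 12 processors" step using concurrent.futures.
>         > The nec2c binary name, its -i/-o flags, and the file naming are
>         > assumptions for illustration, not the actual code:)
>         >
>         >     import subprocess
>         >     from concurrent.futures import ProcessPoolExecutor
>         >
>         >     def run_case(case_id):
>         >         # input deck written earlier by the Python generator
>         >         deck = f"case_{case_id:03d}.nec"
>         >         out = f"case_{case_id:03d}.out"
>         >         # run one NEC case; ~30 seconds each for the lunar array model
>         >         subprocess.run(["nec2c", "-i", deck, "-o", out], check=True)
>         >         return out
>         >
>         >     # 144 cases / 12 workers at ~30 s/case -> roughly 6 minutes wall clock
>         >     with ProcessPoolExecutor(max_workers=12) as pool:
>         >         results = list(pool.map(run_case, range(144)))
>         >
>         > (pool.map returns results in input order, which keeps the
>         > downstream plotting step simple.)
>         >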
>         > To get "interactive development", then, I want the "cycle
>     time" to be 10
>         > minutes - 30 minutes of thinking about how to change the
>     design and
>         > altering the code to generate the new design, make a couple
>     test runs to
>         > find the equivalent of "syntax errors", and then turn it
>     loose - get a cup
>         > of coffee, answer a few emails, come back and see the
>     results.  I could
>         > iterate maybe a half dozen shots a day, which is pretty
>     productive.
>         > (Compared to straight up sequential - 144 runs at 30 seconds
>     is more than
>         > an hour - and that triggers a different working cadence that
>     devolves to
>         > sort of one shot a day) - The "10 minute" turnaround is also
>     compatible
>         > with my job, which, unfortunately, has things other than
>     computing -
>         > meetings, budgets, schedules.  At 10 minute runs, I can
>     carve out a few
>         > hours and get into that "flow state" on the technical
>     problem, before
>         > being disrupted by "a person from Porlock."
>         >
>         > So this is, I think, a classic example of  "I want local
>     control" - sure,
>         > you might have access to a 1000 or more node cluster, but
>     you're going to
>         > have to figure out how to use its batch management system
>     (SLURM and PBS
>         > are two I've used) - and that's a bit different from "self-managed
>         > 100% access". The same goes for AWS-type solutions for EP
>         > (embarrassingly parallel) problems.
>      There's something
>         > very satisfying about getting an idea and not having to "ok,
>     now I have to
>         > log in to the remote cluster with TFA, set up the tunnel,
>     move my data,
>         > get the job spun up, get the data back" - especially for
>     iterative
>         > development.  I did do that using JPL's and TACC's clusters,
>     and "moving
>         > data" proved to be a barrier - the other thing was the
>     "iterative code
>         > development" in between runs - Most institutional clusters
>     discourage
>         > interactive development on the cluster (even if you're only
>     sucking up one
>         > core).   If the tools were a bit more "transparent" and
>     there were "shared
>         > disk" capabilities, this might be more attractive, and while
>     everyone is
>         > exceedingly helpful, there are still barriers to making it
>     "run it on my
>         > desktop"
>         >
>         > Another use case that I wind up designing for is the "HPC in
>     places
>         > with poor communications and limited infrastructure" -
>     The notional
>         > use case might be an archaeological expedition wanting to
>     use HPC to
>         > process ground penetrating radar data or something like
>     that.   (or, given
>         > that I work at JPL, you have a need for HPC on the surface
>     of Mars) - So
>         > sending your data to a remote cluster isn't really an
>     option.  And here,
>         > the "speedup" you need might well be a factor of 10-20 over
>     a single
>         > computer, something doable in a "portable" configuration
>     (check it as
>         > luggage, for instance). Just as for my antenna modeling
>     problems, turning
>         > an "overnight" computation into a "10-20 minute" computation
>     would change
>         > the workflow dramatically.
>         >
>         >
>         > Another market is "learn how to cluster" - for which the RPi
>     clusters work
>         > (or "packs" of Beagleboards) - they're fun, and in a classroom
>         > environment, I think they are an excellent cost effective
>     solution to
>         > learning all the facets of "bringing up a cluster from
>     scratch", but I'm
>         > not convinced they provide a good "MIPS/Watt" or
>     "MIPS/liter" metric - in
>         > terms of convenience.  That is, rather than a cluster of 10
>     RPis, you
>         > might be better off just buying a faster desktop machine.
>         >
>         > Let's talk design desirements/constraints
>         >
>         > I've had a chance to use some "clusters in a box" over the
>     last decades,
>         > and I'd suggest that while power is one constraint, another
>     is noise.
>         > Just the other day, I was in a lab and someone commented
>     that "those
>         > computers are amazingly fast, but you really need to put
>     them in another
>         > room". Yes, all those 1U and 2U rack mounted boxes with tiny
>     fans
>         > screaming is just not "office compatible"   And that kind of
>     brings up
>         > another interesting constraint for "deskside" computing -
>     heat.  Sure you
>         > can plug in 1500W of computers (or even 3000W if you have
>     two circuits),
>         > but can you live in your office with a 1500W space heater?
>         > Interestingly, for *my* workflow, that's probably ok - *my*
>     computation
>         > has a 10-30% duty cycle - think for 30 minutes, compute for
>     5-10.  But
>         > still, your office mate will appreciate it if you keep the sound
>         > level down to 50 dBA.
>         >
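>         > (Putting rough numbers on that heat question - an average
>         > dissipation estimate, with a made-up peak draw and my duty cycle:)
>         >
>         >     # 1500 W peak draw at a 10-30% compute duty cycle
>         >     peak_watts = 1500.0
>         >     for duty in (0.10, 0.30):
>         >         # average heat dumped into the office over a working day
>         >         print(f"{duty:.0%}: {peak_watts * duty:.0f} W average")
>         >     # -> 150 W to 450 W average, well under the 1500 W peak
>         >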
>         > GPUs - some codes can use them, some can't.  They tend,
>     though, to be
>         > noisy (all that air flow for cooling). I don't know that GPU
>     manufacturers
>         > spend a lot of time on this.  Sure, I've seen charts and
>     specs that claim
>         > <50 dBA. But I think they're gaming the measurement,
>     counting on the user
>         > to be a gamer wearing headphones or with a big sound
>     system.  I will say,
>         > for instance, that the PS4 positively roars when spun up
>         > unless you've
>         > got external forced ventilation to keep the inlet air temp low.
>         >
>         > Looking at GSA guidelines for office space - if it's
>     "deskside" it's got
>         > to fit in the 50-80 square foot cubicle or your shared part
>     of a 120
>         > square foot office.
>         >
>         > Then one needs to figure out the "refresh cycle time" for
>     buying hardware
>         > - This has been a topic on this list forever - you have 2
>     years of
>         > computation to do: do you buy N nodes today at speed X, or
>     do you wait a
>         > year, buy N/2 nodes at speed 4X, and finish your computation
>     at the same
>         > time?
>         >
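>         > (A back-of-the-envelope sketch of that buy-now-vs-wait trade,
>         > with made-up node counts and speeds, just to show the arithmetic:)
>         >
>         >     # total work in node-speed-months for the two strategies
>         >     nodes_now, speed_now = 16, 1.0            # buy N nodes today at speed X
>         >     work_now = nodes_now * speed_now * 24     # run for the full 2 years
>         >
>         >     nodes_later, speed_later = 8, 4.0         # wait a year: N/2 nodes at 4X
>         >     work_later = nodes_later * speed_later * 12   # only 1 year left to run
>         >
>         >     print(work_now, work_later)               # 384.0 384.0 -> a wash
>         >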
>         > Fancy desktop PCs with monitors, etc. come in at under $5k,
>     including
>         > burdens and installation, but not including monthly service
>     charges (in an
>         > institutional environment).  If you look at "purchase
>     limits" there's some
>         > thresholds (usually around $10k, then increasing in factors
>     of 10 or 100
>         > steps) for approvals.  So a $100k deskside box is going to
>     be a tough
>         > sell.
>         >
>         >
>         >
>         > On 8/24/21, 6:07 AM, "Beowulf on behalf of Douglas Eadline"
>         > <beowulf-bounces at beowulf.org
>     <mailto:beowulf-bounces at beowulf.org> on behalf of
>     deadline at eadline.org <mailto:deadline at eadline.org>> wrote:
>         >
>         >     Jonathan
>         >
>         >     It is a real cluster, available in 4- and 8-node versions.
>         >     The design is for non-data-center use: that is, a local
>         >     office, lab, or home, where power, cooling, and noise
>         >     are important. More info here:
>         >
>         >
>         >     https://www.limulus-computing.com
>         >     https://www.limulus-computing.com/Limulus-Manual
>         >
>         >     --
>         >     Doug
>         >
>         >
>         >
>         >     > Hi Doug,
>         >     >
>         >     > Not to derail the discussion, but a quick question: you say
>         >     > desk-side cluster - is it a single machine that will run a
>         >     > VM cluster?
>         >     >
>         >     > Regards,
>         >     > Jonathan
>         >     >
>         >     > -----Original Message-----
>         >     > From: Beowulf <beowulf-bounces at beowulf.org
>     <mailto:beowulf-bounces at beowulf.org>> On Behalf Of Douglas
>         > Eadline
>         >     > Sent: 23 August 2021 23:12
>         >     > To: John Hearns <hearnsj at gmail.com
>     <mailto:hearnsj at gmail.com>>
>         >     > Cc: Beowulf Mailing List <beowulf at beowulf.org
>     <mailto:beowulf at beowulf.org>>
>         >     > Subject: Re: [Beowulf] List archives
>         >     >
>         >     > John,
>         >     >
>         >     > I think that was on twitter.
>         >     >
>         >     > In any case, I'm working with these processors right now.
>         >     >
>         >     > On the new Ryzens, the power usage is actually quite
>     tunable.
>         >     > There are three settings.
>         >     >
>         >     > 1) Package Power Tracking: The PPT threshold is the
>     allowed socket
>         > power
>         >     > consumption permitted across the voltage rails
>     supplying the
>         > socket.
>         >     >
>         >     > 2) Thermal Design Current: The maximum current (TDC)
>     (amps) that can
>         > be
>         >     > delivered by a specific motherboard's voltage regulator
>         > configuration in
>         >     > thermally-constrained scenarios.
>         >     >
>         >     > 3) Electrical Design Current: The maximum current
>     (EDC) (amps) that
>         > can be
>         >     > delivered by a specific motherboard's voltage regulator
>         > configuration in a
>         >     > peak ("spike") condition for a short period of time.
>         >     >
>         >     > My goal is to tweak the 105 W TDP R7-5800X so it draws
>         >     > power like the 65 W TDP R5-5600X.
>         >     >
>         >     > This is desk-side cluster low power stuff.
>         >     > I am using an extension cable/plug for the Limulus blades
>         >     > that has an in-line current meter (normally used for solar
>         >     > panels). Now I can load them up and watch exactly how much
>         >     > current is being pulled across the 12 V rails.
>         >     >
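>         >     > (A trivial Python sketch of the conversion I do on those
>         >     > readings; the 90% VRM efficiency and the example currents
>         >     > are assumptions, not measured values:)
>         >     >
>         >     >     def socket_power(rail_volts, amps, vrm_efficiency=0.9):
>         >     >         # power reaching the CPU after regulator losses
>         >     >         return rail_volts * amps * vrm_efficiency
>         >     >
>         >     >     print(socket_power(12.0, 6.0))   # ~65 W, the R5-5600X-like target
>         >     >     print(socket_power(12.0, 10.0))  # ~108 W, near the stock 105 W part
>         >     >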
>         >     > If you need more info, let me know
>         >     >
>         >     > --
>         >     > Doug
>         >     >
>         >     >> The Beowulf list archives seem to end in July 2021.
>         >     >> I was looking for Doug Eadline's post on limiting AMD
>     power and
>         > the
>         >     >> results on performance.
>         >     >>
>         >     >> John H
>         >     >> _______________________________________________
>         >     >> Beowulf mailing list, Beowulf at beowulf.org
>     <mailto:Beowulf at beowulf.org> sponsored by Penguin
>         >     >> Computing To change your subscription (digest mode or
>     unsubscribe)
>         >     >> visit
>         >     >>
>         >     >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>         >     >>
>         >     >
>         >     >
>         >     > --
>         >     > Doug
>         >     >
>         >     > _______________________________________________
>         >     > Beowulf mailing list, Beowulf at beowulf.org
>     <mailto:Beowulf at beowulf.org> sponsored by Penguin
>         > Computing
>         >     > To change your subscription (digest mode or
>     unsubscribe) visit
>         >     >
>         >     > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>         >     >
>         >
>         >
>         >     --
>         >     Doug
>         >
>         >     _______________________________________________
>         >     Beowulf mailing list, Beowulf at beowulf.org
>     <mailto:Beowulf at beowulf.org> sponsored by Penguin
>         > Computing
>         >     To change your subscription (digest mode or unsubscribe)
>     visit
>         >
>         >     https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>         >
>         >
>
>
>         --
>         Doug
>
>
>     _______________________________________________
>     Beowulf mailing list, Beowulf at beowulf.org
>     <mailto:Beowulf at beowulf.org> sponsored by Penguin Computing
>     To change your subscription (digest mode or unsubscribe) visit
>     https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>     <https://beowulf.org/cgi-bin/mailman/listinfo/beowulf>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

