[Beowulf] [EXTERNAL] Re: Deskside clusters
Prentice Bisbal
pbisbal at pppl.gov
Wed Aug 25 18:51:44 UTC 2021
Not anymore, at least not in the HPC realm. We recently purchased
quad-socket systems with a total of 96 Intel cores/node, and dual socket
systems with 128 AMD cores/node.
With Intel now marketing their "Xeon Scalable" (or something like that)
line of processors, and AMD, who was always pushing higher core-counts,
back in the game, I think numbers like that will be common in HPC
clusters purchased in the next year or so.
But, yeah, I guess 28 physical cores is more than the average desktop
has these days.
Prentice
On 8/24/21 6:42 PM, Jonathan Engwall wrote:
> EMC offers dual socket 28 physical core processors. That's a lot of
> computer.
>
> On Tue, Aug 24, 2021, 1:33 PM Lux, Jim (US 7140) via Beowulf
> <beowulf at beowulf.org> wrote:
>
> Yes, indeed.. I didn't call out Limulus, because it was mentioned
> earlier in the thread.
>
> And another reason why you might want your own.
> Every so often, the notice from JPL's HPC goes out to the users -
> "Halo/Gattaca/clustername will not be available because it is
> reserved for Mars {Year}" While Mars landings at JPL are a *big
> deal*, not everyone is working on them (in fact, by that time,
> most of the Martians are now working on something else), and you
> want to get your work done. I suspect other institutional
> clusters have similar "the 800 pound (363 kg) gorilla has
> requested" scenarios.
>
>
>     On 8/24/21, 11:34 AM, "Douglas Eadline" <deadline at eadline.org> wrote:
>
>
> Jim,
>
> You are describing a lot of the design pathway for Limulus
> clusters. The local (non-data center) power, heat, noise are all
> minimized while performance is maximized.
>
>     A well decked out system is often less than $10K and
>     is on par with a fat multi-core workstation.
> (and there are reasons a clustered approach performs better)
>
>     Another use case is where there is no available research data
>     center hardware because there is no specialized
>     sysadmin/space/budget. (Many smaller colleges and universities
>     fall into this group.) Plus, oftentimes, dropping something into a
>     data center means an additional cost to the researcher's budget.
>
> --
> Doug
>
>
>     > I've been looking at "small scale" clusters for a long time (2000?)
>     > and talked a lot with the folks from Orion, as well as on this list.
>     > They fit in a "hard to market to" niche.
>     >
>     > My own workflow tends to have use cases that are a bit "off-nominal"
>     > - one is the rapid iteration of a computational model while
>     > experimenting - That is, I have a python code that generates input
>     > to Numerical Electromagnetics Code (NEC), I run the model over a
>     > range of parameters, then look at the output to see if I'm getting
>     > what I want. If not, I change the code (which essentially changes
>     > the antenna design), rerun the models, and see if it worked. I'd
>     > love an iteration time of "a minute or two" for the computation,
>     > maybe a minute or two to plot the outputs (fiddling with the plot
>     > ranges, etc.). For reference, for a radio astronomy array on the
>     > far side of the Moon, I was running 144 cases, each at 380
>     > frequencies: to run 1 case takes 30 seconds, so farming it out to
>     > 12 processors gave me a 6 minute run time, which is in the right
>     > range. Another model of interaction of antennas on a spacecraft
>     > runs about 15 seconds/case; and a third is about 120 seconds/case.
> >
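>     > A rough sketch of what that kind of sweep driver looks like (the
>     > solver command line, the NEC deck contents, and the parameter
>     > ranges here are just placeholders, not my actual code):
>     >
>     >   import subprocess
>     >   from concurrent.futures import ProcessPoolExecutor
>     >   from itertools import product
>     >
>     >   def write_nec_deck(path, length_m, height_m):
>     >       # Placeholder geometry: one wire above ground; the real
>     >       # generator builds the whole antenna from the parameters.
>     >       with open(path, "w") as f:
>     >           f.write(f"CM wire L={length_m} H={height_m}\n")
>     >           f.write("CE\n")
>     >           f.write(f"GW 1 21 0 0 {height_m} {length_m} 0 {height_m} 0.001\n")
>     >           f.write("GE 0\nFR 0 1 0 0 30.0 0\nEN\n")
>     >
>     >   def run_case(params):
>     >       # Write one input deck and run the solver on it
>     >       # (placeholder binary name and arguments).
>     >       length_m, height_m = params
>     >       deck = f"case_L{length_m:.2f}_H{height_m:.2f}.nec"
>     >       write_nec_deck(deck, length_m, height_m)
>     >       subprocess.run(["nec2_solver", deck, deck + ".out"], check=True)
>     >       return deck + ".out"
>     >
>     >   if __name__ == "__main__":
>     >       lengths = [1.0 + 0.25 * i for i in range(12)]   # 12 x 12 = 144 cases
>     >       heights = [0.1 + 0.05 * j for j in range(12)]
>     >       with ProcessPoolExecutor(max_workers=12) as pool:
>     >           for result in pool.map(run_case, product(lengths, heights)):
>     >               print("finished", result)
>     >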
>     > To get "interactive development", then, I want the "cycle time" to
>     > be 10 minutes - 30 minutes of thinking about how to change the
>     > design and altering the code to generate the new design, make a
>     > couple test runs to find the equivalent of "syntax errors", and
>     > then turn it loose - get a cup of coffee, answer a few emails, come
>     > back and see the results. I could iterate maybe a half dozen shots
>     > a day, which is pretty productive. (Compared to straight up
>     > sequential - 144 runs at 30 seconds is more than an hour - and that
>     > triggers a different working cadence that devolves to sort of one
>     > shot a day) - The "10 minute" turnaround is also compatible with my
>     > job, which, unfortunately, has things other than computing -
>     > meetings, budgets, schedules. At 10 minute runs, I can carve out a
>     > few hours and get into that "flow state" on the technical problem,
>     > before being disrupted by "a person from Porlock."
>     >
>     > So this is, I think, a classic example of "I want local control" -
>     > sure, you might have access to a 1000 or more node cluster, but
>     > you're going to have to figure out how to use its batch management
>     > system (SLURM and PBS are two I've used) - and that's a bit
>     > different than "self managed 100% access". Or, AWS kinds of
>     > solutions for EP problems. There's something very satisfying about
>     > getting an idea and not having to "ok, now I have to log in to the
>     > remote cluster with TFA, set up the tunnel, move my data, get the
>     > job spun up, get the data back" - especially for iterative
>     > development. I did do that using JPL's and TACC's clusters, and
>     > "moving data" proved to be a barrier - the other thing was the
>     > "iterative code development" in between runs - Most institutional
>     > clusters discourage interactive development on the cluster (even if
>     > you're only sucking up one core). If the tools were a bit more
>     > "transparent" and there were "shared disk" capabilities, this might
>     > be more attractive, and while everyone is exceedingly helpful,
>     > there are still barriers to making it "run it on my desktop".
> >
>     > Another use case that I wind up designing for is the "HPC in places
>     > without good communications and limited infrastructure" - The
>     > notional use case might be an archaeological expedition wanting to
>     > use HPC to process ground penetrating radar data or something like
>     > that. (or, given that I work at JPL, you have a need for HPC on the
>     > surface of Mars) - So sending your data to a remote cluster isn't
>     > really an option. And here, the "speedup" you need might well be a
>     > factor of 10-20 over a single computer, something doable in a
>     > "portable" configuration (check it as luggage, for instance). Just
>     > as for my antenna modeling problems, turning an "overnight"
>     > computation into a "10-20 minute" computation would change the
>     > workflow dramatically.
>     >
>     > Another market is "learn how to cluster" - for which the RPi
>     > clusters work (or "packs" of Beagleboards) - they're fun, and in a
>     > classroom environment, I think they are an excellent cost effective
>     > solution to learning all the facets of "bringing up a cluster from
>     > scratch", but I'm not convinced they provide a good "MIPS/Watt" or
>     > "MIPS/liter" metric - in terms of convenience. That is, rather than
>     > a cluster of 10 RPis, you might be better off just buying a faster
>     > desktop machine.
> > Let's talk design desirements/constraints
> >
>     > I've had a chance to use some "clusters in a box" over the last
>     > decades, and I'd suggest that while power is one constraint,
>     > another is noise. Just the other day, I was in a lab and someone
>     > commented that "those computers are amazingly fast, but you really
>     > need to put them in another room". Yes, all those 1U and 2U rack
>     > mounted boxes with tiny fans screaming is just not "office
>     > compatible". And that kind of brings up another interesting
>     > constraint for "deskside" computing - heat. Sure you can plug in
>     > 1500W of computers (or even 3000W if you have two circuits), but
>     > can you live in your office with a 1500W space heater?
>     > Interestingly, for *my* workflow, that's probably ok - *my*
>     > computation has a 10-30% duty cycle - think for 30 minutes,
>     > compute for 5-10. But still, your office mate will appreciate it
>     > if you keep the sound level down to 50dBA.
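>     > (For calibration, a 1500W box at a 10-30% duty cycle averages only
>     > about 150-450W of heat into the room over the day, which is a lot
>     > more livable than the nameplate number suggests.)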
> >
>     > GPUs - some codes can use them, some can't. They tend, though, to
>     > be noisy (all that air flow for cooling). I don't know that GPU
>     > manufacturers spend a lot of time on this. Sure, I've seen charts
>     > and specs that claim <50 dBA. But I think they're gaming the
>     > measurement, counting on the user to be a gamer wearing headphones
>     > or with a big sound system. I will say, for instance, that the PS4
>     > positively roars when spun up unless you've got external forced
>     > ventilation to keep the inlet air temp low.
> >
>     > Looking at GSA guidelines for office space - if it's "deskside"
>     > it's got to fit in the 50-80 square foot cubicle or your shared
>     > part of a 120 square foot office.
> >
>     > Then one needs to figure out the "refresh cycle time" for buying
>     > hardware - This has been a topic on this list forever - you have 2
>     > years of computation to do: do you buy N nodes today at speed X,
>     > or do you wait a year, buy N/2 nodes at speed 4X, and finish your
>     > computation at the same time.
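>     > (The arithmetic: N nodes at speed X for 2 years is 2NX worth of
>     > work. Wait a year, then N/2 nodes at 4X gives 2NX per year of
>     > throughput, so the same work takes 1 more year - you hit the same
>     > 2-year finish having bought half as many boxes.)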
> >
>     > Fancy desktop PCs with monitors, etc. come in at under $5k,
>     > including burdens and installation, but not including monthly
>     > service charges (in an institutional environment). If you look at
>     > "purchase limits" there are some thresholds (usually around $10k,
>     > then increasing in factors of 10 or 100 steps) for approvals. So a
>     > $100k deskside box is going to be a tough sell.
> >
> >
> >
>     > On 8/24/21, 6:07 AM, "Beowulf on behalf of Douglas Eadline"
>     > <beowulf-bounces at beowulf.org on behalf of deadline at eadline.org>
>     > wrote:
> >
> > Jonathan
> >
> > It is a real cluster, available in 4 and 8 node versions.
>     >     The design is for non-data center use. That is, local
> > office, lab, home where power, cooling, and noise
> > are important. More info here:
>     >
>     >     https://www.limulus-computing.com
>     >     https://www.limulus-computing.com/Limulus-Manual
> >
> > --
> > Doug
> >
> >
> >
> > > Hi Doug,
> > >
>     >     > Not to derail the discussion, but a quick question: you say
>     >     > desk-side cluster; is it a single machine that will run a VM
>     >     > cluster?
> > >
> > > Regards,
> > > Jonathan
> > >
> > > -----Original Message-----
>     >     > From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of
>     >     > Douglas Eadline
>     >     > Sent: 23 August 2021 23:12
>     >     > To: John Hearns <hearnsj at gmail.com>
>     >     > Cc: Beowulf Mailing List <beowulf at beowulf.org>
> > > Subject: Re: [Beowulf] List archives
> > >
> > > John,
> > >
> > > I think that was on twitter.
> > >
> > > In any case, I'm working with these processors right now.
> > >
>     >     > On the new Ryzens, the power usage is actually quite tunable.
>     >     > There are three settings.
>     >     >
>     >     > 1) Package Power Tracking: The PPT threshold is the allowed
>     >     > socket power consumption permitted across the voltage rails
>     >     > supplying the socket.
>     >     >
>     >     > 2) Thermal Design Current: The maximum current (TDC) (amps)
>     >     > that can be delivered by a specific motherboard's voltage
>     >     > regulator configuration in thermally-constrained scenarios.
>     >     >
>     >     > 3) Electrical Design Current: The maximum current (EDC) (amps)
>     >     > that can be delivered by a specific motherboard's voltage
>     >     > regulator configuration in a peak ("spike") condition for a
>     >     > short period of time.
>     >     >
>     >     > My goal is to tweak the 105W TDP R7-5800X so it draws power
>     >     > like the 65W-TDP R5-5600X.
> > >
>     >     > This is desk-side cluster low power stuff.
>     >     > I am using extension cable-plugs for Limulus blades that have
>     >     > an in-line current meter (normally used for solar panels).
>     >     > Now I can load them up and watch exactly how much current is
>     >     > being pulled across the 12V rails.
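>     >     >
>     >     > Something like this little Python snippet is enough to turn a
>     >     > log of those current readings into watts and watt-hours (the
>     >     > CSV layout here is made up - just a timestamp and an amps
>     >     > column, one line per sample):
>     >     >
>     >     >   import csv
>     >     >
>     >     >   RAIL_VOLTS = 12.0   # nominal 12V rail
>     >     >
>     >     >   def summarize(log_path, sample_secs=1.0):
>     >     >       # Average power and total energy from periodic samples.
>     >     >       watts = []
>     >     >       with open(log_path) as f:
>     >     >           for row in csv.reader(f):
>     >     >               amps = float(row[1])   # col 0 = timestamp, 1 = amps
>     >     >               watts.append(RAIL_VOLTS * amps)
>     >     >       avg_w = sum(watts) / len(watts)
>     >     >       wh = sum(watts) * sample_secs / 3600.0   # watt-hours
>     >     >       return avg_w, wh
>     >     >
>     >     >   if __name__ == "__main__":
>     >     >       avg_w, wh = summarize("blade1_current.csv")
>     >     >       print(f"average {avg_w:.1f} W, total {wh:.1f} Wh")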
> > >
> > > If you need more info, let me know
> > >
> > > --
> > > Doug
> > >
> > >> The Beowulf list archives seem to end in July 2021.
>     >     >> I was looking for Doug Eadline's post on limiting AMD power
>     >     >> and the results on performance.
> > >>
> > >> John H
> > >> _______________________________________________
>     >     >> Beowulf mailing list, Beowulf at beowulf.org sponsored by
>     >     >> Penguin Computing
>     >     >> To change your subscription (digest mode or unsubscribe) visit
>     >     >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> > >>
> > >
> > >
> > > --
> > > Doug
> > >
> > > _______________________________________________
>     >     > Beowulf mailing list, Beowulf at beowulf.org sponsored by
>     >     > Penguin Computing
>     >     > To change your subscription (digest mode or unsubscribe) visit
>     >     > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> > >
> >
> >
> > --
> > Doug
> >
> > _______________________________________________
>     > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>     > Computing
>     > To change your subscription (digest mode or unsubscribe) visit
>     > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> >
> >
>
>
> --
> Doug
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf