[Beowulf] [EXTERNAL] Re: Deskside clusters

Prentice Bisbal pbisbal at pppl.gov
Fri Sep 17 17:51:09 UTC 2021


We ran a number of apps on evaluation systems before determining that the 
96-core Intel systems provided the best results. I was responsible for 
running the HPL and HPCG benchmarks. We had researchers run their various 
simulation codes.

I agree with just about everything you said, especially (1) - I think 
turbo frequencies are irrelevant for HPC, since all the cores will 
typically be pinned during an HPC job. When I calculated the theoretical 
FLOPS for the evaluation systems, I had to look at the AVX-512 
frequencies for the Intel processors, since that is yet another 
operating frequency. To Intel's credit, the different CPU frequency 
information, and when those frequencies would be invoked, was available 
online for all but their newest processors (probably just hadn't been 
published yet), whereas I couldn't find any frequency-stepping 
information for the AMDs.
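
For what it's worth, the back-of-the-envelope arithmetic looks something
like this (a small Python sketch; the socket count, core count, and
AVX-512 frequency are made-up placeholders, not our actual evaluation
hardware, and it assumes two AVX-512 FMA units per core):

    # Theoretical peak double-precision FLOPS at the AVX-512 all-core
    # frequency (not the base clock, not the single-core turbo).
    def peak_gflops(sockets, cores_per_socket, avx512_ghz,
                    flops_per_cycle=32):  # 2 FMA units x 8 doubles x 2 flops
        return sockets * cores_per_socket * avx512_ghz * flops_per_cycle

    # Hypothetical quad-socket, 24-core/socket part at 2.2 GHz AVX-512:
    print(peak_gflops(4, 24, 2.2))  # ~6758 GFLOPS

The point is that the AVX-512 all-core frequency, not the base or turbo
number on the spec sheet, is what belongs in that formula.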

I often point out to people that clock frequencies have been decreasing 
as core counts go up. I remember that in the early 2000s, CPUs with a 
clock of ~3.5 GHz were pretty common. Now it seems most processors have 
a base clock below 3 GHz and only go above that in "turbo mode".

Where I disagree with you is (3). Whether or not cache size is important 
depends on the size of the job. If you're iterating through data-parallel 
loops over a large dataset that exceeds the cache size, the opportunity to 
reread cached data is probably limited or nonexistent. As we often say 
here, "it depends". I'm sure someone with better low-level hardware 
knowledge will chime in and tell me why I'm wrong (Cunningham's Law).
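
To put rough numbers on your points (2) and (3), here is the same kind of
trivial sketch (the channel count, transfer rate, core count, and L3 size
are invented for illustration, not any specific SKU):

    # Point (2): worst-case memory bandwidth per core.
    def bw_per_core_gbs(channels, gt_per_s, cores, bytes_per_transfer=8):
        return channels * gt_per_s * bytes_per_transfer / cores

    print(bw_per_core_gbs(channels=8, gt_per_s=3.2, cores=64))  # ~3.2 GB/s/core

    # Point (3): a 10 GB array streamed once per loop iteration is ~40x a
    # 256 MB L3, so cached lines are evicted long before they can be reused.
    print(10 * 1024 / 256)  # working set / L3 ratio

Once the working set is tens of times larger than the last-level cache,
you are effectively streaming from DRAM, which is exactly where that
worst-case BW/core number starts to matter.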

Prentice

On 9/14/21 1:34 PM, Douglas Eadline wrote:
> Here are the questions I am curious about. High core count is great,
> if it works for your performance goals.
>
> 1. Clock speed: All the turbo stuff is great for a low
> number of processes, but if you load all the cores, then you are
> now running at the base clock, which, due to the large number
> of cores and the thermal envelope, is often not that fast.
>
> 2. Memory BW: Take the number of memory channels, multiply by
> the per-channel memory bandwidth, and divide by the number of cores. That
> is of course a worst-case BW/core; however, memory-hungry apps
> may not be able to use all the cores.
>
> 3. Cache size: same idea as memory BW - why do you think
> things like AMD 3D V-Cache will be landing very soon?
>
> IMO fat-core processors are designed and work best
> for cloud applications; shared use, bursty applications,
> containers. HPC tends to light everything up at once for
> long periods of time.
>
> --
> Doug
>
>
>
>> Not anymore, at least not in the HPC realm.  We recently purchased
>> quad-socket systems with a total of 96 Intel cores/node, and dual socket
>> systems with 128 AMD cores/node.
>>
>> With Intel now marketing their "highly scalable" (or something like that)
>> line of processors, and AMD, who was always pushing higher core counts,
>> back in the game, I think numbers like that will be common in HPC
>> clusters purchased in the next year or so.
>>
>> But, yeah, I guess 28 physical cores is more than the average desktop
>> has these days.
>>
>>
>> Prentice
>>
>> On 8/24/21 6:42 PM, Jonathan Engwall wrote:
>>> EMC offers dual socket 28 physical core processors. That's a lot of
>>> computer.
>>>
>>> On Tue, Aug 24, 2021, 1:33 PM Lux, Jim (US 7140) via Beowulf
>>> <beowulf at beowulf.org <mailto:beowulf at beowulf.org>> wrote:
>>>
>>>      Yes, indeed.. I didn't call out Limulus, because it was mentioned
>>>      earlier in the thread.
>>>
>>>      And another reason why you might want your own.
>>>      Every so often, the notice from JPL's HPC goes out to the users -
>>>      "Halo/Gattaca/clustername will not be available because it is
>>>      reserved for Mars {Year}"  While Mars landings at JPL are a *big
>>>      deal*, not everyone is working on them (in fact, by that time,
>>>      most of the Martians are now working on something else), and you
>>>      want to get your work done.  I suspect other institutional
>>>      clusters have similar "the 800 pound (363 kg) gorilla has
>>>      requested" scenarios.
>>>
>>>
>>>      On 8/24/21, 11:34 AM, "Douglas Eadline" <deadline at eadline.org
>>>      <mailto:deadline at eadline.org>> wrote:
>>>
>>>
>>>          Jim,
>>>
>>>          You are describing a lot of the design pathway for Limulus
>>>          clusters. The local (non-data center) power, heat, noise are
>>> all
>>>          minimized while performance is maximized.
>>>
>>>          A well decked out system is often less than $10K and
>>>          is on par with a fat multi-core workstation.
>>>          (and there are reasons a clustered approach performs better)
>>>
>>>          Another use case is where there is no available research data
>>>      center
>>>          hardware because there is no specialized
>>> sysadmins/space/budget.
>>>          (Many smaller colleges and universities fall into this
>>>          group). Plus, often times, dropping something into a data
>>> center
>>>          means an additional cost to the researchers budget.
>>>
>>>          --
>>>          Doug
>>>
>>>
>>>          > I've been looking at "small scale" clusters for a long time
>>>      (2000?)  and
>>>          > talked a lot with the folks from Orion, as well as on this
>>> list.
>>>          > They fit in a "hard to market to" niche.
>>>          >
>>>          > My own workflow tends to have use cases that are a bit
>>>      "off-nominal" - one
>>>          > is the rapid iteration of a computational model while
>>>      experimenting - That
>>>          > is, I have a python code that generates input to Numerical
>>>          > Electromagnetics Code (NEC), I run the model over a range of
>>>      parameters,
>>>          > then look at the output to see if I'm getting what I
>>>      want. If not, I
>>>          > change the code (which essentially changes the antenna
>>>      design), rerun the
>>>          > models, and see if it worked.  I'd love an iteration time
>>> of
>>>      "a minute or
>>>          > two" for the computation, maybe a minute or two to plot the
>>>      outputs
>>>          > (fiddling with the plot ranges, etc.).  For reference, for
>>> a
>>>      radio
>>>          > astronomy array on the far side of the Moon, I was running
>>>      144 cases, each
>>>          > at 380 frequencies: to run 1 case takes 30 seconds, so
>>>      farming it out to
>>>          > 12 processors gave me a 6 minute run time, which is in the
>>>      right range.
>>>          > Another model of interaction of antennas on a spacecraft
>>>      runs about 15
>>>          > seconds/case; and a third is about 120 seconds/case.
>>>          >
>>>          > To get "interactive development", then, I want the "cycle
>>>      time" to be 10
>>>          > minutes - 30 minutes of thinking about how to change the
>>>      design and
>>>          > altering the code to generate the new design, make a couple
>>>      test runs to
>>>          > find the equivalent of "syntax errors", and then turn it
>>>      loose - get a cup
>>>          > of coffee, answer a few emails, come back and see the
>>>      results.  I could
>>>          > iterate maybe a half dozen shots a day, which is pretty
>>>      productive.
>>>          > (Compared to straight up sequential - 144 runs at 30 seconds
>>>      is more than
>>>          > an hour - and that triggers a different working cadence that
>>>      devolves to
>>>          > sort of one shot a day) - The "10 minute" turnaround is also
>>>      compatible
>>>          > with my job, which, unfortunately, has things other than
>>>      computing -
>>>          > meetings, budgets, schedules.  At 10 minute runs, I can
>>>      carve out a few
>>>          > hours and get into that "flow state" on the technical
>>>      problem, before
>>>          > being disrupted by "a person from Porlock."
>>>          >
>>>          > So this is, I think, a classic example of  "I want local
>>>      control" - sure,
>>>          > you might have access to a 1000 or more node cluster, but
>>>      you're going to
>>>          > have to figure out how to use its batch management system
>>>      (SLURM and PBS
>>>          > are two I've used) - and that's a bit different than "self
>>>      managed 100%
>>>          > access". Or, AWS kinds of solutions for EP problems.
>>>       There's something
>>>          > very satisfying about getting an idea and not having to "ok,
>>>      now I have to
>>>          > log in to the remote cluster with TFA, set up the tunnel,
>>>      move my data,
>>>          > get the job spun up, get the data back" - especially for
>>>      iterative
>>>          > development.  I did do that using JPLs and TACCs clusters,
>>>      and "moving
>>>          > data" proved to be a barrier - the other thing was the
>>>      "iterative code
>>>          > development" in between runs - Most institutional clusters
>>>      discourage
>>>          > interactive development on the cluster (even if you're only
>>>      sucking up one
>>>          > core).   If the tools were a bit more "transparent" and
>>>      there were "shared
>>>          > disk" capabilities, this might be more attractive, and while
>>>      everyone is
>>>          > exceedingly helpful, there are still barriers to making it
>>>      "run it on my
>>>          > desktop"
>>>          >
>>>          > Another use case that I wind up designing for is the "HPC in
>>>      places
>>>          > without good communications and limited infrastructure" -
>>>      The notional
>>>          > use case might be an archaeological expedition wanting to
>>>      use HPC to
>>>          > process ground penetrating radar data or something like
>>>      that.   (or, given
>>>          > that I work at JPL, you have a need for HPC on the surface
>>>      of Mars) - So
>>>          > sending your data to a remote cluster isn't really an
>>>      option.  And here,
>>>          > the "speedup" you need might well be a factor of 10-20 over
>>>      a single
>>>          > computer, something doable in a "portable" configuration
>>>      (check it as
>>>          > luggage, for instance). Just as for my antenna modeling
>>>      problems, turning
>>>          > an "overnight" computation into a "10-20 minute" computation
>>>      would change
>>>          > the workflow dramatically.
>>>          >
>>>          >
>>>          > Another market is "learn how to cluster" - for which the RPi
>>>      clusters work
>>>          > (or "packs" of Beagleboards) - they're fun, and in a
>>> classroom
>>>          > environment, I think they are an excellent cost effective
>>>      solution to
>>>          > learning all the facets of "bringing up a cluster from
>>>      scratch", but I'm
>>>          > not convinced they provide a good "MIPS/Watt" or
>>>      "MIPS/liter" metric - in
>>>          > terms of convenience.  That is, rather than a cluster of 10
>>>      RPis, you
>>>          > might be better off just buying a faster desktop machine.
>>>          >
>>>          > Let's talk design desirements/constraints
>>>          >
>>>          > I've had a chance to use some "clusters in a box" over the
>>>      last decades,
>>>          > and I'd suggest that while power is one constraint, another
>>>      is noise.
>>>          > Just the other day, I was in a lab and someone commented
>>>      that "those
>>>          > computers are amazingly fast, but you really need to put
>>>      them in another
>>>          > room". Yes, all those 1U and 2U rack mounted boxes with tiny
>>>      fans
>>>          > screaming is just not "office compatible"   And that kind
>>> of
>>>      brings up
>>>          > another interesting constraint for "deskside" computing -
>>>      heat.  Sure you
>>>          > can plug in 1500W of computers (or even 3000W if you have
>>>      two circuits),
>>>          > but can you live in your office with a 1500W space heater?
>>>          > Interestingly, for *my* workflow, that's probably ok - *my*
>>>      computation
>>>          > has a 10-30% duty cycle - think for 30 minutes, compute for
>>>      5-10.  But
>>>          > still, your office mate will appreciate if you keep the
>>>      sound level down
>>>          > to 50dBA.
>>>          >
>>>          > GPUs - some codes can use them, some can't.  They tend,
>>>      though, to be
>>>          > noisy (all that air flow for cooling). I don't know that GPU
>>>      manufacturers
>>>          > spend a lot of time on this.  Sure, I've seen charts and
>>>      specs that claim
>>>          > <50 dBA. But I think they're gaming the measurement,
>>>      counting on the user
>>>          > to be a gamer wearing headphones or with a big sound
>>>      system.  I will say,
>>>          > for instance, that the PS/4 positively roars when spun up
>>>      unless you’ve
>>>          > got external forced ventilation to keep the inlet air temp
>>> low.
>>>          >
>>>          > Looking at GSA guidelines for office space - if it's
>>>      "deskside" it's got
>>>          > to fit in the 50-80 square foot cubicle or your shared part
>>>      of a 120
>>>          > square foot office.
>>>          >
>>>          > Then one needs to figure out the "refresh cycle time" for
>>>      buying hardware
>>>          > - This has been a topic on this list forever - you have 2
>>>      years of
>>>          > computation to do: do you buy N nodes today at speed X, or
>>>      do you wait a
>>>          > year, buy N/2 nodes at speed 4X, and finish your computation
>>>      at the same
>>>          > time.
>>>          >
>>>          > Fancy desktop PCs with monitors, etc. come in at under $5k,
>>>      including
>>>          > burdens and installation, but not including monthly service
>>>      charges (in an
>>>          > institutional environment).  If you look at "purchase
>>>      limits" there's some
>>>          > thresholds (usually around $10k, then increasing in factors
>>>      of 10 or 100
>>>          > steps) for approvals.  So a $100k deskside box is going to
>>>      be a tough
>>>          > sell.
>>>          >
>>>          >
>>>          >
>>>          > On 8/24/21, 6:07 AM, "Beowulf on behalf of Douglas
>>> Eadline"
>>>          > <beowulf-bounces at beowulf.org
>>>      <mailto:beowulf-bounces at beowulf.org> on behalf of
>>>      deadline at eadline.org <mailto:deadline at eadline.org>> wrote:
>>>          >
>>>          >     Jonathan
>>>          >
>>>          >     It is a real cluster, available in 4 and 8 node
>>> versions.
>>>          >     The design is for non-data-center use. That is, local
>>>          >     office, lab, home where power, cooling, and noise
>>>          >     are important. More info here:
>>>          >
>>>          >
>>>      https://www.limulus-computing.com
>>>          >
>>>      https://www.limulus-computing.com/Limulus-Manual
>>>          >
>>>          >     --
>>>          >     Doug
>>>          >
>>>          >
>>>          >
>>>          >     > Hi Doug,
>>>          >     >
>>>          >     > Not to derail the discussion, but a quick question
>>> you
>>>      say desk
>>>          > side
>>>          >     > cluster is it a single machine that will run a vm
>>> cluster?
>>>          >     >
>>>          >     > Regards,
>>>          >     > Jonathan
>>>          >     >
>>>          >     > -----Original Message-----
>>>          >     > From: Beowulf <beowulf-bounces at beowulf.org
>>>      <mailto:beowulf-bounces at beowulf.org>> On Behalf Of Douglas
>>>          > Eadline
>>>          >     > Sent: 23 August 2021 23:12
>>>          >     > To: John Hearns <hearnsj at gmail.com
>>>      <mailto:hearnsj at gmail.com>>
>>>          >     > Cc: Beowulf Mailing List <beowulf at beowulf.org
>>>      <mailto:beowulf at beowulf.org>>
>>>          >     > Subject: Re: [Beowulf] List archives
>>>          >     >
>>>          >     > John,
>>>          >     >
>>>          >     > I think that was on twitter.
>>>          >     >
>>>          >     > In any case, I'm working with these processors
>>> right now.
>>>          >     >
>>>          >     > On the new Ryzens, the power usage is actually
>>> quite
>>>      tunable.
>>>          >     > There are three settings.
>>>          >     >
>>>          >     > 1) Package Power Tracking: The PPT threshold is the
>>>      allowed socket
>>>          > power
>>>          >     > consumption permitted across the voltage rails
>>>      supplying the
>>>          > socket.
>>>          >     >
>>>          >     > 2) Thermal Design Current: The maximum current
>>> (TDC)
>>>      (amps) that can
>>>          > be
>>>          >     > delivered by a specific motherboard's voltage
>>> regulator
>>>          > configuration in
>>>          >     > thermally-constrained scenarios.
>>>          >     >
>>>          >     > 3) Electrical Design Current: The maximum current
>>>      (EDC) (amps) that
>>>          > can be
>>>          >     > delivered by a specific motherboard's voltage
>>> regulator
>>>          > configuration in a
>>>          >     > peak ("spike") condition for a short period of
>>> time.
>>>          >     >
>>>          >     > My goal is to tweak the 105W TDP R7-5800X so it
>>> draws
>>>      power like
>>>          > the
>>>          >     > 65W-TDP R5-5600X
>>>          >     >
>>>          >     > This is desk-side cluster low power stuff.
>>>          >     > I am using extension cable-plug for Limulus blades
>>>      that have an
>>>          > in-line
>>>          >     > current meter (normally used for solar panels).
>>>          >     > Now I can load them up and watch exactly how much
>>>      current is being
>>>          > pulled
>>>          >     > across the 12V rails.
>>>          >     >
>>>          >     > If you need more info, let me know
>>>          >     >
>>>          >     > --
>>>          >     > Doug
>>>          >     >
>>>          >     >> The Beowulf list archives seem to end in July
>>> 2021.
>>>          >     >> I was looking for Doug Eadline's post on limiting
>>> AMD
>>>      power and
>>>          > the
>>>          >     >> results on performance.
>>>          >     >>
>>>          >     >> John H
>>>          >     >> _______________________________________________
>>>          >     >> Beowulf mailing list, Beowulf at beowulf.org
>>>      <mailto:Beowulf at beowulf.org> sponsored by Penguin
>>>          >     >> Computing To change your subscription (digest mode
>>> or
>>>      unsubscribe)
>>>          >     >> visit
>>>          >     >>
>>>          >     >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>          >     >>
>>>          >     >
>>>          >     >
>>>          >     > --
>>>          >     > Doug
>>>          >     >
>>>          >     > _______________________________________________
>>>          >     > Beowulf mailing list, Beowulf at beowulf.org
>>>      <mailto:Beowulf at beowulf.org> sponsored by Penguin
>>>          > Computing
>>>          >     > To change your subscription (digest mode or
>>>      unsubscribe) visit
>>>          >     >
>>>      https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>          >     >
>>>          >
>>>          >
>>>          >     --
>>>          >     Doug
>>>          >
>>>          >     _______________________________________________
>>>          >     Beowulf mailing list, Beowulf at beowulf.org
>>>      <mailto:Beowulf at beowulf.org> sponsored by Penguin
>>>          > Computing
>>>          >     To change your subscription (digest mode or
>>> unsubscribe)
>>>      visit
>>>          >
>>>      https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>          >
>>>          >
>>>
>>>
>>>          --
>>>          Doug
>>>
>>>
>>>      _______________________________________________
>>>      Beowulf mailing list, Beowulf at beowulf.org
>>>      <mailto:Beowulf at beowulf.org> sponsored by Penguin Computing
>>>      To change your subscription (digest mode or unsubscribe) visit
>>>      https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>      <https://beowulf.org/cgi-bin/mailman/listinfo/beowulf>
>>>
>>>
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>
>
> --
> Doug
>

