[Beowulf] Rant on why HPC isn't as easy as I'd like it to be.

Mon Sep 20 21:42:57 UTC 2021

My dream is to use some sort of optimization software (I would try Genetic
Programming, say) on a heterogeneous cluster (a mix of fat and light nodes,
even with different network topologies in sub-clusters) so that the cluster
determines for itself the optimal configuration and running parameters for
an application domain. I have published (to a limited audience) on Genetic
Algorithms optimizing themselves recursively (there is convergent behaviour).
Peter

On Mon, Sep 20, 2021 at 2:28 PM Lux, Jim (US 7140) via Beowulf <beowulf at beowulf.org> wrote:

> The recent comments on compilers, caches, etc., are why HPC isn’t a bigger
> deal.  The infrastructure today is reminiscent of what I used in the 1970s
> on a big CDC or Burroughs or IBM machine, perhaps with an FPS box attached.
>
> I prepare a job, with some sort of job control structure, submit it to a
> batch queue, and get my results some time later.  Sure, I’m not dropping
> off a deck or tapes, and I’m not getting green-bar paper or a tape back,
> but really, it’s not much different – I drop a file and get files back
> either way.
>
> And just like back then, it’s up to me to figure out how best to arrange
> my code to run fastest (for me, that means wall-clock time, but for others
> it might be CPU time or cost or something else).
>
> It would be nice if the compiler (or run-time or infrastructure) figured
> out the whole “what’s the arrangement of cores/nodes/scratch storage for
> this application on this particular cluster” question.
>
> I also acknowledge that this is a “hard” problem and one that doesn’t have
> the commercial value of, say, serving the optimum ads to me when I read the
> newspaper online.
>
> Yeah, it’s not that hard to call library routines for matrix operations,
> and to put my trust in the library writers – I trust them more than I trust
> myself to find the fastest linear equation solver, FFT, etc. – but so far,
> the next level of abstraction up, “how many cores/nodes”, is still left to
> me, and that means doing instrumentation, figuring out what the results
> mean, etc.
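To make the “how many cores/nodes” question concrete, here is a minimal C sketch. It assumes a made-up cost model, not measured data: a serial part that never speeds up, a parallel part that divides evenly across nodes, and a communication overhead that grows linearly with the node count. The names `cost_model`, `predicted_wall_time` and `best_node_count` are hypothetical. In practice the three coefficients would have to come from exactly the kind of instrumentation described above.

```c
/* Hypothetical cost model for one job: all three coefficients are
 * assumptions, to be fitted from real timing runs. */
typedef struct {
    double serial_s;   /* seconds of work that never parallelizes */
    double parallel_s; /* seconds of work that divides evenly over nodes */
    double comm_s;     /* extra seconds per additional node (communication) */
} cost_model;

/* Predicted wall-clock time on `nodes` nodes (nodes >= 1). */
static double predicted_wall_time(cost_model m, int nodes) {
    return m.serial_s + m.parallel_s / nodes + m.comm_s * (nodes - 1);
}

/* Try every node count from 1 to max_nodes and return the fastest one. */
static int best_node_count(cost_model m, int max_nodes) {
    int best = 1;
    double best_t = predicted_wall_time(m, 1);
    for (int n = 2; n <= max_nodes; n++) {
        double t = predicted_wall_time(m, n);
        if (t < best_t) {
            best_t = t;
            best = n;
        }
    }
    return best;
}
```

With `serial_s = 10`, `parallel_s = 1000` and `comm_s = 10`, the predicted minimum lands at 10 nodes (near the square root of `parallel_s / comm_s`); under this model, adding nodes beyond that makes the job slower.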
>
> *From: *Beowulf <beowulf-bounces at beowulf.org> on behalf of "beowulf at beowulf.org" <beowulf at beowulf.org>
> *Reply-To: *Jim Lux <james.p.lux at jpl.nasa.gov>
> *Date: *Monday, September 20, 2021 at 10:42 AM
> *To: *Lawrence Stewart <stewart at serissa.com>, Jim Cownie <jcownie at gmail.com>
> *Cc: *Douglas Eadline <deadline at eadline.org>, "beowulf at beowulf.org" <beowulf at beowulf.org>
> *Subject: *Re: [Beowulf] [EXTERNAL] Re: Deskside clusters
>
> *From: *Beowulf <beowulf-bounces at beowulf.org> on behalf of Lawrence Stewart <stewart at serissa.com>
> *Date: *Monday, September 20, 2021 at 9:17 AM
> *To: *Jim Cownie <jcownie at gmail.com>
> *Cc: *Lawrence Stewart <stewart at serissa.com>, Douglas Eadline <deadline at eadline.org>, "beowulf at beowulf.org" <beowulf at beowulf.org>
> *Subject: *Re: [Beowulf] [EXTERNAL] Re: Deskside clusters
>
> Well said.  Expanding on this, caches work because of both temporal
> locality and spatial locality.  Spatial locality is addressed by having
> cache lines be substantially larger than a byte or word.  These days,
> 64 bytes is pretty common.  Some prefetch schemes, like the L1D version
> that fetches the VA ^ 64, clearly exploit spatial locality.  Streaming
> prefetch has an expanded notion of “spatial”, I suppose!
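A small C sketch of the address arithmetic behind that paragraph, assuming 64-byte lines (the `LINE_BYTES` constant and the helper names are illustrative, not from any library or hardware manual). `buddy_line` applies `VA ^ 64` to the line base, which selects the other 64-byte line of the same 128-byte-aligned pair; `lines_touched` counts how many distinct lines a strided walk touches, which is the cost that spatial locality saves.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u /* assumed cache-line size, common on current x86-64 */

/* Start address of the cache line containing addr. */
static uintptr_t line_base(uintptr_t addr) {
    return addr & ~(uintptr_t)(LINE_BYTES - 1);
}

/* The neighbouring line an adjacent-line prefetcher would pull in:
 * flipping bit 6 (addr ^ 64) picks the other half of the 128-byte pair. */
static uintptr_t buddy_line(uintptr_t addr) {
    return line_base(addr) ^ LINE_BYTES;
}

/* Distinct lines touched by n accesses of `elem`-byte elements taken every
 * `stride` elements, assuming the walk starts on a line boundary. */
static size_t lines_touched(size_t n, size_t elem, size_t stride) {
    size_t count = 0;
    size_t last = (size_t)-1; /* no line seen yet */
    for (size_t i = 0; i < n; i++) {
        size_t line = (i * stride * elem) / LINE_BYTES;
        if (line != last) {
            count++;
            last = line;
        }
    }
    return count;
}
```

For 64 contiguous `double`s the walk touches 8 lines, while a stride of 8 elements touches 64, one per access, so every load pays for a whole line of which it uses only 8 bytes.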
>
> What puzzles me is why compilers seem not to have evolved much notion of
> cache management.  It seems like something a smart compiler could do.
> Instead, it is left to Prof. Goto and the folks at ATLAS and BLIS to figure
> out how to rewrite algorithms for efficient cache behavior.  To my limited
> knowledge, compilers don’t make much use of PREFETCH or any non-temporal
> loads and stores either.  It seems to me that once the programmer helps
> with RESTRICT and so forth, compilers could perfectly well dynamically
> move parts of arrays around to maximize cache use.
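The kind of rewrite that Goto, ATLAS and BLIS do by hand starts with loop tiling (cache blocking). The C99 sketch below compares a naive triple loop with a tiled one. `BLOCK = 64` is an arbitrary assumption; the real libraries choose block sizes per cache level and add data packing and hand-tuned micro-kernels, all of which this omits. The `restrict` qualifiers are the C99 spelling of the RESTRICT hint mentioned above: they promise the compiler that the three arrays do not overlap.

```c
#include <stddef.h>

#define BLOCK 64 /* hypothetical tile size; real libraries tune this per cache */

/* Reference version: c += a * b for n x n row-major matrices. */
static void matmul_naive(size_t n, const double *restrict a,
                         const double *restrict b, double *restrict c) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++)
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

/* Tiled version: works on BLOCK x BLOCK sub-blocks so the pieces of a, b
 * and c being combined can stay in cache while they are reused. */
static void matmul_tiled(size_t n, const double *restrict a,
                         const double *restrict b, double *restrict c) {
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t kk = 0; kk < n; kk += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t i_end = ii + BLOCK < n ? ii + BLOCK : n;
                size_t k_end = kk + BLOCK < n ? kk + BLOCK : n;
                size_t j_end = jj + BLOCK < n ? jj + BLOCK : n;
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k]; /* reused across the j loop */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Both functions accumulate each element of `c` over `k` in the same order; only the order in which memory is visited changes, which is exactly the transformation the quoted message wishes compilers did on their own.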
>
> -L
>
> I suspect that there’s enough variability among cache implementations and
> the wide variety of algorithms that might use them that writing a
> smart-enough compiler is “hard” and “expensive”.
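One dimension of that variability, the cache-line size, can at least be queried at run time. A minimal sketch, assuming a glibc-based Linux system: `_SC_LEVEL1_DCACHE_LINESIZE` is a glibc extension to `sysconf`, not POSIX, so the code checks that it exists and falls back to an assumed 64 bytes (the hypothetical `ASSUMED_LINE_BYTES`) when it is missing or reports no usable value.

```c
#include <unistd.h>

/* Fallback when the size cannot be determined; an assumption, not a fact
 * about any particular CPU. */
#define ASSUMED_LINE_BYTES 64L

/* L1 data-cache line size in bytes, queried at run time where possible. */
static long l1d_line_size(void) {
    long sz = -1;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    sz = sysconf(_SC_LEVEL1_DCACHE_LINESIZE); /* glibc extension; may be 0 */
#endif
    return sz > 0 ? sz : ASSUMED_LINE_BYTES;
}
```

This covers only one parameter; cache sizes, associativity and prefetcher behaviour vary too, which is part of why tuning is usually left to libraries such as ATLAS that measure the machine at install time.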
>
> Leaving it to the library authors is probably the best “bang for the
> buck”.