[Beowulf] [External] Re: Rant on why HPC isn't as easy as I'd like it to be. [EXT]
Prentice Bisbal
pbisbal at pppl.gov
Mon Sep 27 15:11:35 UTC 2021
I'd be interested
Prentice
On 9/23/21 10:37 AM, Pizarro, Angel via Beowulf wrote:
>
> DISCLOSURE: I work for AWS HPC Developer Relations in the services
> team. We develop AWS Batch, AWS ParallelCluster, NICE DCV, etc.
>
> Lambda’s limits today are 128MB to 10,240MB (~10GB) of memory, configured
> in 1MB increments and billed per ms. 15 minute max runtime for the
> function invocation.
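
A rough way to check a single task against the limits quoted above can be sketched in Python. The function name and the example job sizes are hypothetical; only the 128MB to 10,240MB memory range and the 15 minute runtime cap come from the message, and other constraints (storage, package size, concurrency quotas) are not modelled.

```python
from typing import List, Tuple

# Limits as quoted in the message above (AWS Lambda, September 2021).
LAMBDA_MAX_MEMORY_MB = 10_240
LAMBDA_MAX_RUNTIME_S = 15 * 60


def fits_in_lambda(peak_memory_mb: float, runtime_s: float) -> Tuple[bool, List[str]]:
    """Return (fits, reasons) for one task against the quoted limits.

    Hypothetical helper: it only looks at peak memory and wall-clock time.
    """
    reasons = []
    if peak_memory_mb > LAMBDA_MAX_MEMORY_MB:
        reasons.append(
            f"needs {peak_memory_mb:.0f} MB, limit is {LAMBDA_MAX_MEMORY_MB} MB"
        )
    if runtime_s > LAMBDA_MAX_RUNTIME_S:
        reasons.append(
            f"runs for {runtime_s:.0f} s, limit is {LAMBDA_MAX_RUNTIME_S} s"
        )
    return (not reasons, reasons)


if __name__ == "__main__":
    # Made-up job sizes, purely for illustration.
    print(fits_in_lambda(peak_memory_mb=2_048, runtime_s=300))
    print(fits_in_lambda(peak_memory_mb=16_000, runtime_s=3_600))
```

A task that fails either check would need to be split into smaller pieces or run on a different service.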
>
> Would you all be interested in a hands-on self-paced workshop on
> creating (or porting) an application to serverless environment? E.g.
> Monte-Carlo simulation, a genome alignment or variant call, or some
> other problem? We have some basic data processing documentation but
> nothing that speaks to real-world HPC use cases, and that is a gap
> I want to fill if folks are interested in it.
>
> Dr. Denis Bauer at CSIRO is also doing interesting things with
> serverless.
>
> -angel
>
> --
>
> Angel Pizarro | Principal Developer Advocate, HPC @ AWS
>
> *From: *Beowulf <beowulf-bounces at beowulf.org> on behalf of Guy Coates
> <guy.coates at gmail.com>
> *Date: *Thursday, September 23, 2021 at 8:46 AM
> *To: *Tim Cutts <tjrc at sanger.ac.uk>
> *Cc: *Beowulf <beowulf at beowulf.org>
> *Subject: *RE: [EXTERNAL] [Beowulf] Rant on why HPC isn't as easy as
> I'd like it to be. [EXT]
>
> Out of interest, how large are the compute jobs (memory, runtime
> etc)? How easy to get them to fit into a serverless environment?
>
> Thanks,
>
>
> Guy
>
> On Tue, 21 Sept 2021 at 13:02, Tim Cutts <tjrc at sanger.ac.uk
> <mailto:tjrc at sanger.ac.uk>> wrote:
>
> I think that’s exactly the situation we’ve been in for a long
> time, especially in life sciences, and it’s becoming more
> entrenched. My experience is that the average user of our
> scientific computing systems has been becoming less technically
> savvy for many years now.
>
> The presence of the cloud makes that more acute, in particular
> because it makes it easy for the user to effectively throw more
> hardware at the problem, which reduces the incentive to make their
> code particularly fast or efficient. Cost is the only brake on
> it, and in many cases I’m finding the PI doesn’t actually care
> about that. They care that a result is being obtained (and it’s
> time to first result they care about, not time to complete all the
> analysis), and so they typically don’t have much time for those of
> us who are telling them they need to invest in time up front
> developing and optimising efficient code.
>
> And cost is not necessarily the brake I thought it was going to be
> anyway. One recent project we’ve done on AWS has impressed me a
> great deal. It’s not terribly CPU efficient, and would doubtless,
> with sufficient effort, run much more efficiently on premise. But
> it’s extremely elastic in its nature, and so a good fit for the
> cloud. Once a week, the project has to completely re-analyse the
> 600,000+ COVID genomes we’ve sequenced so far, looking for new
> branches in the phylogenetic tree, and to complete that analysis
> inside 8 hours. Initial attempts to naively convert the HPC
> implementation to run on AWS looked as though they were going to
> be very expensive (~$50k per weekly run). But a fundamental
> reworking of the entire workflow to make it as cloud native as
> possible, by which I mean almost exclusively serverless, has
> succeeded beyond what I expected. The total cost is <$5,000 a
> month, and because there is essentially no statically configured
> infrastructure at all, the security is fairly easy to be
> comfortable about. And all of that was done with no detailed
> thinking about whether the actual algorithms running in the
> containers are at all optimised in a traditional HPC sense. It’s
> just not needed for this particular piece of work. Did it need
> software developers with hardcore knowledge of performance
> optimisation? No. Was it rapid to develop and deploy? Yes. Is
> the performance fast enough for UK national COVID variant
> surveillance? Yes. Is it cost effective? Yes. Sold! The one
> thing it did need was knowledgeable cloud architects, but the
> cloud providers can and do help with that.
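
The figures in the paragraph above can be put side by side with a little arithmetic. This is only a sketch: the inputs are the numbers quoted in the message, and the number of weekly runs per month (52/12) is an assumption made here to compare a per-run cost with a per-month cost.

```python
# Figures quoted in the message above.
genomes = 600_000                  # "600,000+", so a lower bound
deadline_hours = 8                 # analysis must finish "inside 8 hours"
naive_cost_per_run = 50_000        # "~$50k per weekly run"
serverless_cost_per_month = 5_000  # "<$5,000 a month", so an upper bound

# Assumption: one run per week, averaged over a year.
runs_per_month = 52 / 12

required_rate = genomes / (deadline_hours * 3600)  # genomes per second
naive_cost_per_month = naive_cost_per_run * runs_per_month
cost_ratio = naive_cost_per_month / serverless_cost_per_month

print(f"required throughput: at least {required_rate:.1f} genomes/s")
print(f"naive port: about ${naive_cost_per_month:,.0f} per month")
print(f"cost ratio: at least {cost_ratio:.0f}x")
```

Under these assumptions the weekly run has to sustain roughly 21 genomes per second, and the serverless rework comes out around 40 times cheaper than the naive port.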
>
> Tim
>
> --
>
> Tim Cutts
> Head of Scientific Computing
> Wellcome Sanger Institute
>
>
>
> On 21 Sep 2021, at 12:24, John Hearns <hearnsj at gmail.com
> <mailto:hearnsj at gmail.com>> wrote:
>
> Some points well made here. I have seen in the past job
> scripts passed on from graduate student to graduate student -
> the case I am thinking of was an Abaqus script for 8 core
> systems, being run on a new 32 core system. Why WOULD a
> graduate student question a script given to them - which
> works? They should be getting on with their science. I guess
> this is where Research Software Engineers come in.
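
One way to avoid the "8 cores hard-coded in an inherited script" problem described above is to ask the environment how many cores the job actually has. The sketch below is an illustration, not something from the thread: the function name is hypothetical, and the use of Slurm's `SLURM_CPUS_PER_TASK` variable is an assumption about the scheduler (Slurm sets it only when `--cpus-per-task` is requested).

```python
import os


def available_cores(env_var: str = "SLURM_CPUS_PER_TASK") -> int:
    """Number of cores this job should use.

    Order of preference:
      1. a scheduler-provided value (env_var is an assumption, see above),
      2. the CPU affinity mask of this process (Linux only),
      3. the machine-wide CPU count.
    """
    value = os.environ.get(env_var, "")
    if value.isdigit() and int(value) > 0:
        return int(value)
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"using {available_cores()} cores")
```

A job script can then pass this value to the application instead of a fixed number, so the same script uses 8 cores on the old system and 32 on the new one.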
>
> Another point I would make is about modern processor
> architectures, for instance AMD Rome/Milan. You can have
> different Numa Per Socket options, which affect performance.
> We set the preferred IO path - which I have seen myself to
> have an effect on latency of MPI messages. If you are not
> concerned about your hardware layout you would just go ahead
> and run, missing a lot of performance.
>
> I am now going to be controversial and comment that over in
> Julia land the pattern seems to be these days people develop
> on their own laptops, or maybe local GPU systems. There is a
> lot of microbenchmarking going on. But there seems to be not a
> lot of thought given to CPU pinning or what happens with
> hyperthreading. I guess topics like that are part of HPC
> 'Black Magic' - though I would imagine the low latency crowd
> are hot on them.
>
> I often introduce people to the excellent lstopo/hwloc
> utilities which show the layout of a system. Most people are
> pleasantly surprised to find this.
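
For readers who have not met CPU pinning before, here is a minimal sketch of what it looks like from inside a process. It is Linux only (`os.sched_getaffinity` and `os.sched_setaffinity` are not available on macOS or Windows), the function name is hypothetical, and picking the lowest-numbered CPU is an arbitrary choice for illustration. Which logical CPUs share a physical core or a NUMA node is exactly the information lstopo displays, so a real choice would be guided by that layout.

```python
import os


def pin_to_one_cpu() -> set:
    """Pin the calling process to the lowest-numbered CPU it may run on.

    Returns the affinity mask after pinning.
    """
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently use
    target = {min(allowed)}            # arbitrary choice, see note above
    os.sched_setaffinity(0, target)    # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("before:", sorted(os.sched_getaffinity(0)))
    print("after: ", sorted(pin_to_one_cpu()))
```

Once pinned, the kernel will no longer migrate the process between cores, which is why microbenchmarks run with and without pinning can give different numbers.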
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> <mailto:Beowulf at beowulf.org> sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
> <https://beowulf.org/cgi-bin/mailman/listinfo/beowulf>
>
>
> --
>
> Dr. Guy Coates
> +44(0)7801 710224
>
>