[Beowulf] [External] Re: Rant on why HPC isn't as easy as I'd like it to be. [EXT]

Mon Sep 27 15:11:35 UTC 2021

I'd be interested

Prentice

On 9/23/21 10:37 AM, Pizarro, Angel via Beowulf wrote:
>
> DISCLOSURE: I work for AWS HPC Developer Relations in the services 
> team. We develop AWS Batch, AWS ParallelCluster, NICE DCV, etc.
>
> Lambda’s limits today are 128 MB to 10,240 MB (~10 GB) of memory, billed 
> in 1 MB per ms increments, with a 15-minute maximum runtime per function 
> invocation.
>
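A rough way to check a job against the limits quoted above is sketched below in Python. The function names (`fits_in_lambda`, `billed_mb_ms`) are hypothetical; only the 128 MB, 10,240 MB and 15-minute figures come from the message, and no price is applied because none is quoted in this thread.

```python
import math

# Limits as quoted in the message above (September 2021).
MIN_MEMORY_MB = 128
MAX_MEMORY_MB = 10_240
MAX_RUNTIME_MS = 15 * 60 * 1000  # 15 minutes


def fits_in_lambda(peak_memory_mb: float, runtime_ms: int) -> bool:
    """True if one invocation could hold the job's memory and runtime."""
    return peak_memory_mb <= MAX_MEMORY_MB and runtime_ms <= MAX_RUNTIME_MS


def billed_mb_ms(peak_memory_mb: float, runtime_ms: int) -> int:
    """Billable MB-milliseconds, assuming memory is configured in whole MB
    (never below the 128 MB minimum) and duration is metered per ms."""
    configured_mb = max(MIN_MEMORY_MB, math.ceil(peak_memory_mb))
    return configured_mb * runtime_ms


if __name__ == "__main__":
    # A 2 GB task that runs for one minute fits; a 16 GB task does not.
    print(fits_in_lambda(2048, 60_000), billed_mb_ms(2048, 60_000))
    print(fits_in_lambda(16_000, 60_000))
```

A job that fails the check would have to be split into smaller pieces, or run on something other than a single function invocation.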
> Would you all be interested in a hands-on self-paced workshop on 
> creating (or porting) an application to a serverless environment? E.g. 
> Monte-Carlo simulation, a genome alignment or variant call, or some 
> other problem? We have some basic data processing documentation but 
> nothing that speaks to real-world HPC use cases, and that is a gap 
> I want to fill if folks are interested in it.
>
> Dr. Denis Bauer at CSIRO is also doing interesting things with 
> serverless.
>
> -angel
>
> -- 
>
> Angel Pizarro | Principal Developer Advocate, HPC @ AWS
>
> *From: *Beowulf <beowulf-bounces at beowulf.org> on behalf of Guy Coates 
> <guy.coates at gmail.com>
> *Date: *Thursday, September 23, 2021 at 8:46 AM
> *To: *Tim Cutts <tjrc at sanger.ac.uk>
> *Cc: *Beowulf <beowulf at beowulf.org>
> *Subject: *RE: [EXTERNAL] [Beowulf] Rant on why HPC isn't as easy as 
> I'd like it to be. [EXT]
>
> Out of interest, how large are the compute jobs (memory, runtime 
> etc)?  How easy to get them to fit into a serverless environment?
>
> Thanks,
>
>
> Guy
>
> On Tue, 21 Sept 2021 at 13:02, Tim Cutts <tjrc at sanger.ac.uk> wrote:
>
>     I think that’s exactly the situation we’ve been in for a long
>     time, especially in life sciences, and it’s becoming more
>     entrenched.  My experience is that the average user of our
>     scientific computing systems has been becoming less technically
>     savvy for many years now.
>
>     The presence of the cloud makes that more acute, in particular
>     because it makes it easy for the user to effectively throw more
>     hardware at the problem, which reduces the incentive to make their
>     code particularly fast or efficient.  Cost is the only brake on
>     it, and in many cases I’m finding the PI doesn’t actually care
>     about that.  They care that a result is being obtained (and it’s
>     time to first result they care about, not time to complete all the
>     analysis), and so they typically don’t have much time for those of
>     us who are telling them they need to invest time up front
>     developing and optimising efficient code.
>
>     And cost is not necessarily the brake I thought it was going to be
>     anyway.  One recent project we’ve done on AWS has impressed me a
>     great deal.  It’s not terribly CPU efficient, and would doubtless,
>     with sufficient effort, run much more efficiently on premise.  But
>     it’s extremely elastic in its nature, and so a good fit for the
>     cloud.   Once a week, the project has to completely re-analyse the
>     600,000+ COVID genomes we’ve sequenced so far, looking for new
>     branches in the phylogenetic tree, and to complete that analysis
>     inside 8 hours.   Initial attempts to naively convert the HPC
>     implementation to run on AWS looked as though they were going to
>     be very expensive (~$50k per weekly run).  But a fundamental
>     reworking of the entire workflow to make it as cloud native as
>     possible, by which I mean almost exclusively serverless, has
>     succeeded beyond what I expected.  The total cost is <$5,000 a
>     month, and because there is essentially no statically configured
>     infrastructure at all, the security is fairly easy to be
>     comfortable about. And all of that was done with no detailed
>     thinking about whether the actual algorithms running in the
>     containers are at all optimised in a traditional HPC sense.  It’s
>     just not needed for this particular piece of work.  Did it need
>     software developers with hardcore knowledge of performance
>     optimisation? No.  Was it rapid to develop and deploy?  Yes.  Is
>     the performance fast enough for UK national COVID variant
>     surveillance?  Yes.  Is it cost effective? Yes.  Sold!  The one
>     thing it did need was knowledgeable cloud architects, but the
>     cloud providers can and do help with that.
>
>     Tim
>
>     -- 
>
>     Tim Cutts
>     Head of Scientific Computing
>     Wellcome Sanger Institute
>
>
>
>         On 21 Sep 2021, at 12:24, John Hearns <hearnsj at gmail.com>
>         wrote:
>
>         Some points well made here. I have seen in the past job
>         scripts passed on from graduate student to graduate student -
>         the case I am thinking of was an Abaqus script for 8 core
>         systems, being run on a new 32 core system. Why WOULD a
>         graduate student question a script given to them, when it
>         works? They should be getting on with their science. I guess
>         this is where Research Software Engineers come in.
>
>         Another point I would make is about modern processor
>         architectures, for instance AMD Rome/Milan. You can have
>         different NUMA Per Socket options, which affect performance.
>         We set the preferred IO path - which I have seen myself to
>         have an effect on latency of MPI messages. If you are not
>         concerned about your hardware layout you would just go ahead
>         and run, missing a lot of performance.
>
>         I am now going to be controversial and comment that over in
>         Julia land the pattern these days seems to be that people
>         develop on their own laptops, or maybe local GPU systems.
>         There is a lot of microbenchmarking going on. But there seems
>         to be not a lot of thought given to CPU pinning or what
>         happens with hyperthreading. I guess topics like that are part
>         of HPC 'Black Magic' - though I would imagine the low latency
>         crowd are hot on them.
>
>         I often introduce people to the excellent lstopo/hwloc
>         utilities which show the layout of a system. Most people are
>         pleasantly surprised to find this.
>
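For anyone who has not looked at pinning before, a minimal Python sketch of the idea follows. It assumes Linux, where `os.sched_getaffinity` and `os.sched_setaffinity` are available; the helper name `pin_to_one_cpu` is made up for illustration, and picking the lowest-numbered CPU is arbitrary. A real job would choose cores based on the topology that `lstopo` reports.

```python
import os


def pin_to_one_cpu() -> int:
    """Restrict the current process to a single CPU and return its id."""
    allowed = sorted(os.sched_getaffinity(0))  # CPUs this process may run on
    target = allowed[0]                        # arbitrary choice for this sketch
    os.sched_setaffinity(0, {target})          # pid 0 means the calling process
    return target


if __name__ == "__main__":
    before = os.sched_getaffinity(0)
    cpu = pin_to_one_cpu()
    print(f"allowed before: {sorted(before)}; now pinned to CPU {cpu}")
    os.sched_setaffinity(0, before)            # restore the original mask
```

Running a microbenchmark with and without the pin is a quick way to see whether placement matters for a given code.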
>     -- The Wellcome Sanger Institute is operated by Genome Research
>     Limited, a charity registered in England with number 1021457 and a
>     company registered in England with number 2742969, whose
>     registered office is 215 Euston Road, London, NW1 2BE.
>
>     _______________________________________________
>     Beowulf mailing list, Beowulf at beowulf.org
>     <mailto:Beowulf at beowulf.org> sponsored by Penguin Computing
>     To change your subscription (digest mode or unsubscribe) visit
>     https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>     <https://beowulf.org/cgi-bin/mailman/listinfo/beowulf>
>
>
> -- 
>
> Dr. Guy Coates
> +44(0)7801 710224
>