[Beowulf] Clustering vs Hadoop/spark [EXT]

John Hearns hearnsj at gmail.com
Wed Nov 25 09:45:44 UTC 2020


Tim, that is really smart. Over on the Julia Discourse forum I have
blue-skyed about using Lambdas to run Julia functions (it is an inherently
functional language). (*)
Blue-skying further, for exascale compute needs can we think of 'Science as
a Service'?
As in your example, the scientist thinks about the analysis and how it is
performed, then sends it off to be executed. Large chunks are run using
Lambda functions.
Crucially, if a Lambda (or whatever) fails, the algorithm should be able to
continue. People building web-scale applications think like this today
anyway.
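That "continue past failures" idea can be sketched in plain Python. This is
a toy simulation, not real AWS code: `invoke_chunk` is a hypothetical
stand-in for one Lambda invocation, and the failure rate and retry count
are made-up parameters.

```python
import random

def invoke_chunk(chunk):
    """Hypothetical stand-in for one Lambda invocation.

    Real invocations fail for many reasons (timeouts, throttling);
    here we simulate that with a 20% random failure rate.
    """
    if random.random() < 0.2:
        raise RuntimeError("simulated Lambda failure")
    return sum(chunk)

def run_with_retries(chunks, max_retries=3):
    """Fan the work out chunk by chunk; retry each a few times.

    If every retry for a chunk fails, we simply move on - the overall
    analysis continues with whatever results we did get, which is the
    web-scale mindset described above.
    """
    results = {}
    for i, chunk in enumerate(chunks):
        for _attempt in range(max_retries):
            try:
                results[i] = invoke_chunk(chunk)
                break
            except RuntimeError:
                continue  # retry this chunk
    return results
```

A real version would call `boto3`'s `lambda.invoke` instead of the local
stub, but the control flow is the same.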
Do you REALLY think you are connected to a single Amazon web server when
you make a purchase? But it looks that way.
Also, if you are about to purchase something and your WiFi goes down, as a
customer you would be very angry if you were billed for that item.

(*) It is possible to insert your own 'payload' into a Lambda. There are
standard runtimes, like Python, obviously.
However, at the time I looked there was a small size limit on the payload.

Re-reading my own response
https://discourse.julialang.org/t/lambda-or-cloud-functions-eventually-possible/39128/5
you CAN have a larger payload, but it has to be in an S3 bucket:
https://docs.aws.amazon.com/lambda/latest/dg/nodejs-package.html
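The packaging decision can be sketched like this. The 50 MB figure is the
documented limit for direct zip upload at the time of writing (check the
current AWS docs); everything here runs locally, and the `source_files`
dict is just an illustrative stand-in for your handler code.

```python
import io
import zipfile

# Documented AWS limit for uploading a zipped package directly to
# Lambda; larger packages must be staged in an S3 bucket first.
DIRECT_UPLOAD_LIMIT = 50 * 1024 * 1024  # 50 MB

def package(source_files):
    """Zip handler sources in memory and return the zip as bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, body in source_files.items():
            zf.writestr(name, body)
    return buf.getvalue()

def upload_route(zip_bytes):
    """Decide whether the package can go straight to Lambda or via S3."""
    return "direct" if len(zip_bytes) <= DIRECT_UPLOAD_LIMIT else "via-s3"
```

For the "via-s3" route you would upload the zip to a bucket and point
`create_function`'s `Code={'S3Bucket': ..., 'S3Key': ...}` argument at it,
per the AWS docs linked above.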

BTW, I am sure everyone knows this, but if you have a home assistant such as
Alexa, every time you ask Alexa something it is a Lambda which is spun up.


On Wed, 25 Nov 2020 at 09:27, Tim Cutts <tjrc at sanger.ac.uk> wrote:

>
>
> On 24 Nov 2020, at 18:31, Alex Chekholko via Beowulf <beowulf at beowulf.org>
> wrote:
>
> If you can run your task on just one computer, you should always do that
> rather than having to build a cluster of some kind and all the associated
> headaches.
>
>
> If you take on the cloud message, that of course isn’t necessarily the
> case.  If you use very high level cloud services like lambda, you don’t
> have to build that infrastructure.  It’s very unlikely to be anywhere near
> as efficient, of course, but throughput efficiency is not what your average
> scientist cares about.  What they care about is getting their answer
> quickly (and, to a lesser extent, cheaply).
>
> I saw a recent example where someone took a fairly simple sequencing read
> alignment process, which normally runs on a single 16-core node in about 6
> hours, and split the input files small enough that the alignment code
> execution time and memory use would fit with AWS Lambda’s envelope.  The
> result executed in a couple of minutes, elapsed, but used about four times
> as many core-hours as the optimised single node version.  Of course, this
> is an embarrassingly parallel problem, so this is a relatively easy
> analysis to move to this sort of design.
>
> From the scientist’s point of view, which is better?  Getting their answer
> in 5 minutes or 6 hours?  Especially if they’ve also reduced their
> development time as well because they don’t have to worry so much about
> infrastructure and optimisation.
>
> The total value is hard to work out, many of these considerations are hard
> to put a dollar value on.  When I saw that article, I did ask the author
> how much the analysis actually cost, and she didn’t have a number.  But I
> don’t think we can dogmatically say that we should always run a task on a
> single machine if we can.
>
> Tim
> -- The Wellcome Sanger Institute is operated by Genome Research Limited, a
> charity registered in England with number 1021457 and a company registered
> in England with number 2742969, whose registered office is 215 Euston Road,
> London, NW1 2BE.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>