<div dir="ltr">2) Terabyte-scale data movement into or out of the cloud is not scary in 2019. You can move data into and out of the cloud at essentially the line rate of your internet connection, as long as you take a little care in selecting and tuning your firewalls and inline security devices. Pushing 1 TB/day into the cloud is no big deal these days, and that level of volume is now normal for a ton of different markets and industries. <br><div><br></div><div>Amazon will of course also send you a semi-trailer full of hard drives to import your data (AWS Snowmobile)... the web page just says "Contact Sales for pricing".</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 26 Jul 2019 at 12:26, Chris Dagdigian <<a href="mailto:dag@sonsorol.org">dag@sonsorol.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF"><br>
Coming back late to this thread as yesterday was a travel/transit day
... some additional thoughts<br>
<br>
1) I also avoid the term "cloud bursting" these days because it's been
tarred by marketing smog and doesn't mean much anymore. The blunt truth is that,
from a technical perspective, building a hybrid on-premises/cloud HPC is very
simple. The hard part is the data -- either moving volumes back and forth or
trying to maintain a consistent shared file system across WAN-scale network
distances. <br>
<br>
The only consistently successful life-science hybrid HPC environments I've
seen are the chemistry- or modeling-focused ones, because the chemistry
folks generally have very small volumes of data to move but very large CPU
requirements and occasional GPU needs. Since the data movement requirements
are small for chemistry, it's pretty easy to make them happy on-prem, in
the cloud, or on a hybrid design.<br>
<br>
That's not to say full-on cloud-bursting HPC systems don't exist, of
course, but they are rare. I was talking with a pharma yesterday that
uses HTCondor to span an on-premises HPC cluster with on-demand AWS nodes. I just
don't see that as often as I see distinct HPC environments. <br>
<br>
My observed experience in this realm is that in life science we don't
build a lot of WAN-spanning grids, because we get killed by the
gravitational pull of our data. We build HPC where the data resides,
keep it relatively simple in scope, and try to limit WAN-scale data
movement. For most of us this means having both onsite HPC and
cloud HPC, and simply directing each workload to whichever HPC resource is
closest to the data.<br>
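That "send the job to the data" rule is trivial to encode. A minimal sketch -- the site names and the dataset-to-site map below are hypothetical, purely for illustration:

```python
# Hypothetical sketch: route each job to the HPC resource that already
# holds its data, instead of moving data across the WAN.
# Dataset names and site labels are made up for illustration.

DATASET_SITE = {
    "reference_genomes": "cloud",   # read-mostly copy replicated to the cloud
    "instrument_raw": "onprem",     # fresh instrument output lands on local GPFS
}

def route_job(dataset: str, default: str = "onprem") -> str:
    """Return the site whose HPC cluster is closest to the data."""
    return DATASET_SITE.get(dataset, default)
```

The point is only that the scheduler-side logic is simple; the work is in keeping the map honest about where each data set really lives.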
<br>
So for Jörg -- based on what you have said, I'd take a look at your
user base, your application mix, and how your filesystem is organized. You
may be able to set things up so that you can "burst" to the cloud for
just a special subset of your apps, user groups, or data sets. That could
be your chemists, or maybe you have a group of people who regularly
compute heavily against a data set or set of references that rarely
changes -- in that case you may be able to replicate that part of your
GPFS to the cloud and send just that workload remotely, freeing
up capacity on your local HPC for other work.<br>
<br>
2) Terabyte-scale data movement into or out of the cloud is not scary in
2019. You can move data into and out of the cloud at essentially the line
rate of your internet connection, as long as you take a little care in
selecting and tuning your firewalls and inline security devices.
Pushing 1 TB/day into the cloud is no big deal these days, and that
level of volume is now normal for a ton of different markets and
industries. It's basically a cost and budget exercise these days,
not a particularly hard IT or technology problem. <br>
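The arithmetic behind "1 TB/day is no big deal" is worth doing once. A quick sketch of the sustained line rate it actually requires:

```python
# Back-of-envelope: what sustained line rate does "N TB/day" need?
TB = 1e12  # bytes (decimal terabyte)

def required_mbit_per_s(tb_per_day: float) -> float:
    """Megabits per second needed to move tb_per_day terabytes in 24 hours."""
    bytes_per_s = tb_per_day * TB / 86_400  # seconds per day
    return bytes_per_s * 8 / 1e6

# 1 TB/day works out to roughly 93 Mbit/s sustained -- comfortably under
# a single 1 Gbit/s link, which is why this stopped being scary.
```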
<br>
There are two killer problems with cloud storage, even though it keeps
getting cheaper:<br>
<br>
2a) Cloud egress fees. You get charged real money for data traffic
leaving your cloud. In many environments these fees are so tiny as to be
unnoticeable noise in the monthly bill. But if you are regularly moving
terabyte- or petabyte-scale data into and out of a cloud provider, then
you will notice the egress fees on your bill, and they will be large
enough that you have to plan for them and optimize for cost.<br>
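A back-of-envelope egress estimate makes that planning concrete. The $0.09/GB figure below is an illustrative list-price tier, not a quote -- check your provider's current pricing:

```python
# Hypothetical egress-fee estimate. The default rate is an illustrative
# list-price tier only; real pricing is tiered and changes over time.

def egress_cost_usd(tb_out: float, usd_per_gb: float = 0.09) -> float:
    """Rough cost of moving tb_out terabytes out of the cloud."""
    return tb_out * 1000 * usd_per_gb  # 1 TB ~= 1000 GB, decimal units

# At 10 GB out it's pocket change; at a petabyte out it's a real budget
# line on the order of tens of thousands of dollars.
```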
<br>
2b) The monthly recurring cost for cloud storage can be hard to bear at
petascale unless you have clearly communicated all of the benefits and
capabilities, and can compare them honestly against a fully transparent list of
real-world costs to do the same thing onsite. The monthly S3 storage
bill once you have a few petabytes in AWS is high enough that you start
to catch yourself doing math every once in a while along the lines of "<span style="font-style:italic">I could build a Lustre filesystem with 2x the
capacity for just two months' worth of our cloud storage opex budget!</span>" <br>
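That "catch yourself doing math" moment, sketched out. The $/GB-month rate below is an illustrative standard-tier figure, not current pricing:

```python
# Hypothetical monthly object-storage estimate. The default $/GB-month
# rate is illustrative only; real tiers, regions, and discounts vary.

def monthly_object_storage_usd(pb: float, usd_per_gb_month: float = 0.021) -> float:
    """Rough monthly bill for pb petabytes of standard-tier object storage."""
    return pb * 1_000_000 * usd_per_gb_month  # 1 PB ~= 1,000,000 GB

# A few PB adds up fast: 3 PB is on the order of $63,000/month, so two
# months of opex is roughly $126,000 -- exactly the kind of number that
# invites comparisons against buying a parallel filesystem outright.
```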
<br>
<br>
<blockquote type="cite" style="border:0px none">
<div class="gmail-m_-8467732868132661910__pbConvHr" style="margin:30px 25px 10px"><div style="width:100%;border-top:2px solid rgb(237,241,244);padding-top:10px"> <div style="display:inline-block;white-space:nowrap;vertical-align:middle;width:49%">
<a style="padding-right:6px;font-weight:500;color:rgb(72,86,100);text-decoration:none" href="mailto:beowulf@beowulf.org" target="_blank">INKozin
via Beowulf</a></div> <div style="display:inline-block;white-space:nowrap;vertical-align:middle;width:48%;text-align:right"> <font color="#909AA4"><span style="padding-left:6px">July
26, 2019 at 4:23 AM</span></font></div> </div></div>
<div class="gmail-m_-8467732868132661910__pbConvBody" style="color:rgb(144,154,164);margin-left:24px;margin-right:24px">
<div dir="auto">I'm very much in favour of personal or team clusters, as Chris
has also mentioned. Then the contract between the user and the cloud is
explicit. The data can be uploaded/pre-staged to S3 in advance (at no
cost other than time) or copied directly as part of the cluster-creation
process. It makes no sense to replicate your in-house infrastructure in
the cloud. However, having a solid storage base in-house is
good. What you should look into is the cost of transferring data back if you
really have to do it. The cost could be prohibitively high, e.g. if BAM
files need to be returned. I'm sure Tim has an opinion.</div><br>
<br><fieldset class="gmail-m_-8467732868132661910mimeAttachmentHeader"></fieldset><br><div>_______________________________________________<br>Beowulf
mailing list, <a class="gmail-m_-8467732868132661910moz-txt-link-abbreviated" href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>To
change your subscription (digest mode or unsubscribe) visit
<a class="gmail-m_-8467732868132661910moz-txt-link-freetext" href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" target="_blank">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br></div>
</div>
<div class="gmail-m_-8467732868132661910__pbConvHr" style="margin:30px 25px 10px"><div style="width:100%;border-top:2px solid rgb(237,241,244);padding-top:10px"> <div style="display:inline-block;white-space:nowrap;vertical-align:middle;width:49%">
<a style="padding-right:6px;font-weight:500;color:rgb(72,86,100);text-decoration:none" href="mailto:joe.landman@gmail.com" target="_blank">Joe
Landman</a></div> <div style="display:inline-block;white-space:nowrap;vertical-align:middle;width:48%;text-align:right"> <font color="#909AA4"><span style="padding-left:6px">July
26, 2019 at 12:00 AM</span></font></div> </div></div>
<div class="gmail-m_-8467732868132661910__pbConvBody" style="color:rgb(144,154,164);margin-left:24px;margin-right:24px">
<br><br>
<br>The issue is bursting with large data sets. You might be able to
pre-stage some portion of the data set in a public cloud, and then burst
jobs from there. Data motion between sites is going to be the hard
problem in the mix. Not technically hard, but hard from a cost/time
perspective.
<br>
<br>
<br>
</div>
<div class="gmail-m_-8467732868132661910__pbConvHr" style="margin:30px 25px 10px"><div style="width:100%;border-top:2px solid rgb(237,241,244);padding-top:10px"> <div style="display:inline-block;white-space:nowrap;vertical-align:middle;width:49%">
<a style="padding-right:6px;font-weight:500;color:rgb(72,86,100);text-decoration:none" href="mailto:sassy-work@sassy.formativ.net" target="_blank">Jörg Saßmannshausen</a></div> <div style="display:inline-block;white-space:nowrap;vertical-align:middle;width:48%;text-align:right"> <font color="#909AA4"><span style="padding-left:6px">July
25, 2019 at 8:26 PM</span></font></div> </div></div>
<div class="gmail-m_-8467732868132661910__pbConvBody" style="color:rgb(144,154,164);margin-left:24px;margin-right:24px"><div>Dear all,
dear Chris,<br><br>thanks for the detailed explanation. We are
currently looking into cloud bursting, so your email was very timely
for me, as I am supposed to look into it. <br><br>One of the issues I can
see with our workload is simply getting data into the cloud and back
out again. We are not talking about a few gigs here; we are talking
up to 1 TB or more. For reference: we have 9 PB of storage (GPFS), of
which we are currently using 7 PB, and there are around 1000+ users connected
to the system. So cloud bursting would only be possible in some cases.
<br>Do you happen to have a feeling for how to handle the issue with the
file sizes sensibly? <br><br>Sorry for hijacking the thread here a
bit.<br><br>All the best from a hot London<br><br>Jörg<br></div>
</div>
<div class="gmail-m_-8467732868132661910__pbConvHr" style="margin:30px 25px 10px"><div style="width:100%;border-top:2px solid rgb(237,241,244);padding-top:10px"> <div style="display:inline-block;white-space:nowrap;vertical-align:middle;width:49%">
<a style="padding-right:6px;font-weight:500;color:rgb(72,86,100);text-decoration:none" href="mailto:dag@sonsorol.org" target="_blank">Chris
Dagdigian</a></div> <div style="display:inline-block;white-space:nowrap;vertical-align:middle;width:48%;text-align:right"> <font color="#909AA4"><span style="padding-left:6px">July
22, 2019 at 2:14 PM</span></font></div> </div></div>
<div class="gmail-m_-8467732868132661910__pbConvBody" style="color:rgb(144,154,164);margin-left:24px;margin-right:24px">
<br>
A lot of production HPC runs on cloud systems. <br>
<br>
AWS is big for this via their AWS ParallelCluster stack, which includes
Lustre support via the FSx for Lustre service, although they are
careful to caveat it as staging/scratch space not suitable for
persistent storage. AWS has some cool node types now with 25-, 50-,
and 100-gigabit network support. <br>
<br>
Microsoft Azure is doing amazing things now that they have the
Cycle Computing folks on board, integrated, and able to call shots within
the product space. They actually offer bare-metal HPC and InfiniBand
SKUs now and have some interesting parallel filesystem offerings as
well. <br>
<br>
I can't comment on Google as I've not touched or used it professionally,
but AWS and Azure for sure are real players now to consider if you have
an HPC requirement. <br>
<br>
<br>
That said, however, a sober cost accounting still shows that on-prem or
"owned" HPC is best from a financial perspective if your workload is
constant, 24x7x365. Cloud-based HPC is best for capability, bursty
workloads, temporary workloads, auto-scaling, computing against
cloud-resident data sets, or the neat new model where, instead of on-prem
multi-user shared HPC, you decide to deliver individual
bespoke HPC clusters to each user or team on the cloud. <br>
<br>
The big paradigm shift for cloud HPC is that it does not make a lot of
sense to build a monolithic stack shared by multiple competing users and
groups. The automated provisioning and elasticity of the cloud make it
more sensible to build many clusters, each tuned specifically for its
user or workload, and then blow each one away when the
work is done. <br>
<br>
My $.02 of course! <br>
<br>
Chris<br>
<br>
<span>
</span><br>
<br>
</div>
</blockquote>
<br>
</div>
</blockquote></div>