<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

</head><body text="#000000" bgcolor="#FFFFFF"><br>

Coming back late to this thread as yesterday was a travel/transit day 

... some additional thoughts<br>

<br>

1) I also avoid the term "cloud bursting" these days because it's been tarred by marketing smog and no longer means much. The blunt truth is that, from a technical perspective, building a hybrid on-premise/cloud HPC is very simple. The hard part is data: either moving large volumes back and forth or trying to maintain a consistent shared file system across WAN-scale distances.<br>

<br>

The only life science hybrid HPC environments I've repeatedly seen succeed are the chemistry- or modeling-focused ones, because the chemistry folks generally have very small volumes of data to move but very large CPU requirements and occasional GPU needs. Since the data movement requirements are small for chemistry, it's pretty easy to make them happy on-prem, on the cloud or on a hybrid design.<br>

<br>

That's not to say full-on cloud-bursting HPC systems don't exist at all, of course, but they are rare. I was talking with a pharma yesterday that uses HTCondor to span on-premise HPC with on-demand AWS nodes. I just don't see that as often as I see distinct HPCs.<br>

<br>

My observed experience in this realm is that for life science we don't do a lot of WAN-spanning grids because we get killed by the gravitational pull of our data. We build HPC where the data resides, keep each system relatively simple in scope, and try to limit WAN-scale data movement. For most, this means having both onsite HPC and cloud HPC and simply directing each workload to whichever HPC resource is closest to the data.<br>

<br>
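The "send the work to wherever the data lives" approach can be sketched in a few lines of Python. This is only an illustration: the site names, dataset names and the <code>pick_site</code>/<code>route</code> helpers are hypothetical, and a real setup would express the same idea in scheduler queues or policies.<br>
<pre>
```python
# Hypothetical sketch: send each job to the site that already holds its data.
# All site, dataset and job names below are made up for illustration.

DATASET_LOCATION = {
    "reference_genomes": "cloud",   # rarely-changing data replicated to the cloud
    "instrument_raw": "onprem",     # large, fast-changing data stays on site
}


def pick_site(dataset, default="onprem"):
    """Return the HPC site that holds the dataset; unknown data stays local."""
    return DATASET_LOCATION.get(dataset, default)


def route(jobs):
    """Group job names by the site where their input data lives."""
    plan = {}
    for name, dataset in jobs:
        plan.setdefault(pick_site(dataset), []).append(name)
    return plan


jobs = [
    ("align_batch_1", "reference_genomes"),
    ("basecall_run_7", "instrument_raw"),
    ("docking_sweep", "small_ligand_set"),
]
print(route(jobs))
```
</pre>
Anything without a known cloud-resident copy falls back to the local cluster, which matches the goal of limiting WAN-scale data movement.<br>
<br>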

So for Jörg -- based on what you have said I'd take a look at your 

userbase, your application mix and how your filesystem is organized. You

 may be able to set things up so that you can "burst" to the cloud for 

just a special subset of your apps, user groups or data sets. That could

 be your chemists or maybe you have a group of people who regularly 

compute heavily against a data set or set of references that rarely 

change -- in that case you may be able to replicate that part of your 

GPFS over to a cloud and send just that workload remotely, thus freeing 

up capacity on your local HPC for other work.<br>

<br>

<br>

<br>

<br>

2) Terabyte-scale data movement into or out of the cloud is not scary in 2019. You can move data into and out of the cloud at basically the line rate of your internet connection as long as you take a little care in selecting and tuning your firewalls and inline security devices. Pushing 1 TB/day or so into the cloud these days is no big deal, and that level of volume is now normal for a ton of different markets and industries. It's basically a cost and budget exercise these days and not a particularly hard IT or technology problem.<br>

<br>
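For a rough feel of what "line rate" means for a terabyte, here is a back-of-the-envelope sketch in Python. The link speeds and the 80% efficiency factor are assumptions chosen for illustration, not measurements; plug in your own numbers.<br>
<pre>
```python
def transfer_hours(terabytes, link_gbps, efficiency=0.8):
    """Hours needed to move `terabytes` (decimal TB) over a `link_gbps` link.

    `efficiency` is an assumed fraction of line rate actually achieved
    after protocol overhead, firewalls and inline security devices.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Example link speeds are assumptions, not a statement about any real site.
for gbps in (1, 10):
    print(f"1 TB over {gbps} Gbit/s: {transfer_hours(1, gbps):.2f} h")
```
</pre>
Under these assumptions a terabyte fits comfortably inside a day even on a 1 Gbit/s connection, which is why the cost side tends to matter more than the technology.<br>
<br>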

There are two killer problems with cloud storage, even though it gets cheaper all the time:<br>

<br>

2a) Cloud egress fees.  You get charged real money for data traffic 

leaving your cloud. In many environments these fees are so tiny as to be

 unnoticeable noise in the monthly bill. But if you are regularly moving

 terabyte or petabyte scale data into and out of a cloud provider then 

you will notice the egress fees on your bill and they will be large enough that you have to plan for them and optimize for cost.<br>

<br>

2b) The monthly recurring cost for cloud storage can be hard to bear at petascale unless you have solidly communicated all of the benefits / capabilities and can compare them honestly to a fully transparent list of real-world costs to do the same thing onsite. The monthly S3 storage bill once you have a few petabytes in AWS is high enough that you start to catch yourself doing math every once in a while along the lines of "<span style="font-style: italic;">I could build a Lustre filesystem w/ 2x capacity for just 2 months' worth of our cloud storage opex budget!</span>"<br>

<br>
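The kind of math behind 2a and 2b can be sketched as below. The per-GB rates and the volumes are placeholders I made up so the example runs; they are not real price quotes, so look up your provider's current price list before drawing conclusions.<br>
<pre>
```python
def monthly_cloud_cost(stored_tb, egress_tb, storage_per_gb, egress_per_gb):
    """Return (storage_dollars, egress_dollars) for one month.

    Both per-GB rates are inputs on purpose: take them from your
    provider's current price list rather than from this sketch.
    """
    gb_per_tb = 1000
    storage = stored_tb * gb_per_tb * storage_per_gb
    egress = egress_tb * gb_per_tb * egress_per_gb
    return storage, egress


# Placeholder rates, NOT real quotes: 0.02 $/GB-month stored, 0.05 $/GB out.
storage, egress = monthly_cloud_cost(
    stored_tb=3000, egress_tb=100, storage_per_gb=0.02, egress_per_gb=0.05
)
print(f"storage: ${storage:,.0f}/month, egress: ${egress:,.0f}/month")
```
</pre>
Keeping storage and egress as separate outputs makes it easy to see which of the two dominates the bill for a given usage pattern.<br>
<br>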

<br>

<br>

<br>

<br>

<br>

<blockquote type="cite" 

cite="mid:CAPOouzCaDqhsqgcTp=HYfSeB+ZMOUv4--d=S+tDAPd23ebLY_w@mail.gmail.com"

 style="border: 0px none ! important;">

  <div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvHr" 

style="margin:30px 25px 10px 25px;"><div 

style="width:100%;border-top:2px solid #EDF1F4;padding-top:10px;">   <div

style="display:inline-block;white-space:nowrap;vertical-align:middle;width:49%;">

        <a style="color:#485664 

!important;padding-right:6px;font-weight:500;text-decoration:none 

!important;" href="mailto:beowulf@beowulf.org" moz-do-not-send="true">INKozin

 via Beowulf</a></div>   <div 

style="display:inline-block;white-space:nowrap;vertical-align:middle;width:48%;text-align:

 right;">     <font color="#909AA4"><span style="padding-left:6px">July 

26, 2019 at 4:23 AM</span></font></div>    </div></div>

  <div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvBody" 

__pbrmquotes="true" 

style="color:#909AA4;margin-left:24px;margin-right:24px;">

<meta http-equiv="content-type" content="text/html; charset=utf-8"><div 

dir="auto">I'm very much in favour of personal or team clusters as Chris

 has also mentioned. Then the contract between the user and the cloud is

 explicit. The data can be uploaded/pre-staged to S3 in advance (at no cost other than time) or copied directly as part of the cluster creation process. It makes no sense to replicate your in-house infrastructure in the cloud. However, having a solid storage base in-house is good. What you should look into is the cost of transfer back if you really have to do it. The cost could be prohibitively high, e.g. if BAM 

files need to be returned. I'm sure Tim has an opinion.</div><br>

<br><fieldset class="mimeAttachmentHeader"></fieldset><br><div>_______________________________________________<br>Beowulf

 mailing list, <a class="moz-txt-link-abbreviated" href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>To 

change your subscription (digest mode or unsubscribe) visit 

<a class="moz-txt-link-freetext" href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br></div>

  </div>

  <div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvHr" 

style="margin:30px 25px 10px 25px;"><div 

style="width:100%;border-top:2px solid #EDF1F4;padding-top:10px;">   <div

style="display:inline-block;white-space:nowrap;vertical-align:middle;width:49%;">

        <a style="color:#485664 

!important;padding-right:6px;font-weight:500;text-decoration:none 

!important;" href="mailto:joe.landman@gmail.com" moz-do-not-send="true">Joe

 Landman</a></div>   <div 

style="display:inline-block;white-space:nowrap;vertical-align:middle;width:48%;text-align:

 right;">     <font color="#909AA4"><span style="padding-left:6px">July 

26, 2019 at 12:00 AM</span></font></div>    </div></div>

  <div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvBody" 

__pbrmquotes="true" 

style="color:#909AA4;margin-left:24px;margin-right:24px;">

<br><br>

<br>The issue is bursting with large data sets.  You might be able to 

pre-stage some portion of the data set in a public cloud, and then burst

jobs from there.  Data motion between sites is going to be the hard 

problem in the mix.  Not technically hard, but hard from a cost/time 

perspective.

<br>

<br>

<br>

  </div>

  <div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvHr" 

style="margin:30px 25px 10px 25px;"><div 

style="width:100%;border-top:2px solid #EDF1F4;padding-top:10px;">   <div

style="display:inline-block;white-space:nowrap;vertical-align:middle;width:49%;">

        <a style="color:#485664 

!important;padding-right:6px;font-weight:500;text-decoration:none 

!important;" href="mailto:sassy-work@sassy.formativ.net" 

moz-do-not-send="true">Jörg Saßmannshausen</a></div>   <div 

style="display:inline-block;white-space:nowrap;vertical-align:middle;width:48%;text-align:

 right;">     <font color="#909AA4"><span style="padding-left:6px">July 

25, 2019 at 8:26 PM</span></font></div>    </div></div>

  <div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvBody" 

__pbrmquotes="true" 

style="color:#909AA4;margin-left:24px;margin-right:24px;"><div>Dear all,

 dear Chris,<br><br>thanks for the detailed explanation. We are currently looking into cloud-bursting, so your email was very timely for me as I am supposed to look into it.<br><br>One of the issues I can see with our workload is simply getting data into the cloud and back out again. We are not talking about a few gigs here; we are talking up to, say, 1 or more TB. For reference: we have 9 PB of storage (GPFS), of which we are currently using 7 PB, and there are around 1000+ users connected to the system. So cloud bursting would only be possible in some cases.<br>Do you happen to have a feeling for how to handle the issue with the file sizes sensibly?<br><br>Sorry for hijacking the thread here a 

bit.<br><br>All the best from a hot London<br><br>Jörg<br></div>

  </div>

  <div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvHr" 

style="margin:30px 25px 10px 25px;"><div 

style="width:100%;border-top:2px solid #EDF1F4;padding-top:10px;">   <div

style="display:inline-block;white-space:nowrap;vertical-align:middle;width:49%;">

        <a style="color:#485664 

!important;padding-right:6px;font-weight:500;text-decoration:none 

!important;" href="mailto:dag@sonsorol.org" moz-do-not-send="true">Chris

 Dagdigian</a></div>   <div 

style="display:inline-block;white-space:nowrap;vertical-align:middle;width:48%;text-align:

 right;">     <font color="#909AA4"><span style="padding-left:6px">July 

22, 2019 at 2:14 PM</span></font></div>    </div></div>

  <div xmlns="http://www.w3.org/1999/xhtml" class="__pbConvBody" 

__pbrmquotes="true" 

style="color:#909AA4;margin-left:24px;margin-right:24px;">

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<br>

A lot of production HPC runs on cloud systems.  <br>

<br>

AWS is big for this via their AWS ParallelCluster stack, which includes Lustre support via the FSx for Lustre service, although they are careful to caveat it as staging/scratch space not suitable for persistent storage. AWS has some cool node types now with 25-, 50- and 100-gigabit network support.<br>

<br>

Microsoft Azure is doing amazing things now that they have the Cycle Computing folks on board, integrated and able to call shots within the product space. They actually offer bare metal HPC and InfiniBand SKUs now and have some interesting parallel filesystem offerings as well.<br>

<br>

Can't comment on Google as I've not touched or used it professionally, but AWS and Azure for sure are real players now to consider if you have an HPC requirement.<br>

<br>

<br>

That said, however, a sober cost accounting still shows that on-prem or "owned" HPC is best from a financial perspective if your workload is a constant 24x7x365. Cloud-based HPC is best for capability, bursty workloads, temporary workloads, auto-scaling, computing against cloud-resident data sets, or the neat new model where, instead of on-prem multi-user shared HPC, you deliver individual bespoke HPC clusters to each user or team on the cloud.<br>

<br>

The big paradigm shift for cloud HPC is that it does not make a lot of 

sense to make a monolithic stack shared by multiple competing users and 

groups. The automated provisioning and elasticity of the cloud make it 

more sensible to build many clusters so that you can tune each cluster 

specifically for the user or workload and then blow it up when the 

work is done. <br>

<br>

My $.02 of course! <br>

<br>

Chris<br>

<br>


<br>

  </div>

</blockquote>

<br>

</body></html>