<div dir="auto">I'm very much in favour of personal or team clusters, as Chris has also mentioned. The contract between the user and the cloud is then explicit. The data can be uploaded or pre-staged to S3 in advance (at no cost other than time) or copied directly as part of the cluster creation process. It makes no sense to replicate your in-house infrastructure in the cloud. However, having a solid storage base in-house is good. What you should look into is the cost of transferring data back out of the cloud if you really have to do it. That cost could be prohibitively high, e.g. if BAM files need to be returned. I'm sure Tim has an opinion.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 26 Jul 2019, 05:01 Joe Landman, <<a href="mailto:joe.landman@gmail.com">joe.landman@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

On 7/25/19 8:26 PM, Jörg Saßmannshausen wrote:<br>

> Dear all, dear Chris,<br>

><br>

> thanks for the detailed explanation. We are currently looking into cloud-<br>

> bursting so your email was very timely for me as I am supposed to look into it.<br>

><br>

> One of the issues I can see with our workload is simply getting data into the<br>

> cloud and back out again. We are not talking about a few GB here; we are<br>

> talking about up to, say, 1 TB or more. For reference: we have 9 PB of storage (GPFS)<br>

> of which we are currently using 7 PB and there are around 1000+ users<br>

> connected to the system. So cloud bursting would only be possible in some<br>

> cases.<br>

> Do you happen to have a feeling for how to handle the issue with the file sizes<br>

> sensibly?<br>

<br>

The issue is bursting with large data sets.  You might be able to <br>

pre-stage some portion of the data set in a public cloud, and then burst <br>

jobs from there.  Data motion between sites is going to be the hard <br>

problem in the mix.  Not technically hard, but hard from a cost/time <br>

perspective.<br>

<br>

<br>

-- <br>

Joe Landman<br>

e: <a href="mailto:joe.landman@gmail.com" target="_blank" rel="noreferrer">joe.landman@gmail.com</a><br>

t: @hpcjoe<br>

w: <a href="https://scalability.org" rel="noreferrer noreferrer" target="_blank">https://scalability.org</a><br>

g: <a href="https://github.com/joelandman" rel="noreferrer noreferrer" target="_blank">https://github.com/joelandman</a><br>

l: <a href="https://www.linkedin.com/in/joelandman" rel="noreferrer noreferrer" target="_blank">https://www.linkedin.com/in/joelandman</a><br>

<br>

_______________________________________________<br>

Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank" rel="noreferrer">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>

To change your subscription (digest mode or unsubscribe) visit <a href="https://beowulf.org/cgi-bin/mailman/listinfo/beowulf" rel="noreferrer noreferrer" target="_blank">https://beowulf.org/cgi-bin/mailman/listinfo/beowulf</a><br>

</blockquote></div>
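<div dir="auto">As a rough way to size the transfer-back question above, here is a back-of-envelope sketch in Python. It is only an estimate under stated assumptions: the per-GB egress price, the link speeds and the 70% link efficiency are placeholders I made up for illustration, not quoted rates or measurements, so substitute your provider's current pricing and your own measured throughput.</div>

```python
"""Back-of-envelope estimate of the time and cost of moving data out of a cloud.

All prices and link parameters here are hypothetical placeholders.
"""


def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move size_tb decimal terabytes over a link_gbps link.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8                      # decimal TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


def egress_cost(size_tb: float, price_per_gb: float) -> float:
    """Egress charge for size_tb decimal terabytes at a flat per-GB price."""
    if price_per_gb < 0:
        raise ValueError("price_per_gb must be non-negative")
    return size_tb * 1000 * price_per_gb           # decimal TB -> GB


if __name__ == "__main__":
    ASSUMED_PRICE_PER_GB = 0.09   # hypothetical flat rate, check real pricing
    ASSUMED_LINK_GBPS = 1.0       # hypothetical site uplink

    for size_tb in (1, 10, 100):
        hours = transfer_hours(size_tb, ASSUMED_LINK_GBPS)
        cost = egress_cost(size_tb, ASSUMED_PRICE_PER_GB)
        print(f"{size_tb:>4} TB: ~{hours:.1f} h on the link, ~{cost:,.0f} in egress charges")
```

<div dir="auto">Both figures scale linearly with data volume, so returning only reduced results rather than the full BAM files changes the picture far more than tuning the link does. Real providers usually price egress in tiers, which this flat-rate sketch ignores.</div>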