From novosirj at rutgers.edu Tue Jul 9 13:12:53 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 9 Jul 2019 20:12:53 +0000 Subject: [Beowulf] log_mtts_per_seg in mlx4 driver Message-ID: <697C3882-70F9-4F89-B104-FE041FA3CAA4@rutgers.edu> Hi all, There seems to be a whole lot of misinformation out there about the appropriate setting for the log_mtts_per_seg parameter in the mlx4 driver. Some folks suggest that it’s different between the RHEL/kernel.org provided driver and Mellanox OFED, some suggest it’s been fixed such that the default can assign 2x the total memory on a node since at least RHEL6.6 (and so one would assume even longer ago in OFED), and some places seem to still be carrying it forward because “maybe it matters.” Is anyone here sure of the current state? I’ll probably read the source code if not, but I’d like to spare myself the hassle. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' From novosirj at rutgers.edu Tue Jul 9 13:44:42 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Tue, 9 Jul 2019 20:44:42 +0000 Subject: [Beowulf] log_mtts_per_seg in mlx4 driver In-Reply-To: <697C3882-70F9-4F89-B104-FE041FA3CAA4@rutgers.edu> References: <697C3882-70F9-4F89-B104-FE041FA3CAA4@rutgers.edu> Message-ID: Read the source code/answered my own question. Likely bogus since 2012, confirmed at least in 3.10.0-957.21.3 — this was the relevant fix:

    /*
     * We want to scale the number of MTTs with the size of the
     * system memory, since it makes sense to register a lot of
     * memory on a system with a lot of memory. As a heuristic,
     * make sure we have enough MTTs to cover twice the system
     * memory (with PAGE_SIZE entries).
     *
     * This number has to be a power of two and fit into 32 bits
     * due to device limitations, so cap this at 2^31 as well.
     * That limits us to 8TB of memory registration per HCA with
     * 4KB pages, which is probably OK for the next few months.
     */
    si_meminfo(&si);
    request->num_mtt =
        roundup_pow_of_two(max_t(unsigned, request->num_mtt,
                                 min(1UL << (31 - log_mtts_per_seg),
                                     si.totalram >> (log_mtts_per_seg - 1))));

> On Jul 9, 2019, at 4:12 PM, Ryan Novosielski wrote: > > Hi all, > > There seems to be a whole lot of misinformation out there about the appropriate setting for the log_mtts_per_seg parameter in the mlx4 driver. Some folks suggest that it’s different between the RHEL/kernel.org provided driver and Mellanox OFED, some suggest it’s been fixed such that the default can assign 2x the total memory on a node since at least RHEL6.6 (and so one would assume even longer ago in OFED), and some places seem to still be carrying it forward because “maybe it matters.” > > Is anyone here sure of the current state? I’ll probably read the source code if not, but I’d like to spare myself the hassle. > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr.
Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf From novosirj at rutgers.edu Tue Jul 9 20:39:43 2019 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Wed, 10 Jul 2019 03:39:43 +0000 Subject: [Beowulf] log_mtts_per_seg in mlx4 driver In-Reply-To: References: <697C3882-70F9-4F89-B104-FE041FA3CAA4@rutgers.edu>, Message-ID: Yeah, I saw that too, which is part of why I was confused. It appears to me as if, at the very least, the kernel driver and OFED have two different recommendations. The CentOS kernel driver (from CentOS 7.6 at least, but likely longer) seems to set log_mtts_per_seg to 3 on all of my nodes. You can see from my earlier reply that this driver already takes total memory into account. OFED on the other hand (I was looking at a VM with OFED 4.5 on it — have to confirm the host version, etc.) seems to set that value to 0 by default. So I’m not a ton less confused, but reasonably confident the right move on the CentOS kernel driver is not to set that value as the defaults are sane. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' On Jul 9, 2019, at 21:25, Jonathan Engwall > wrote: Hello, They make a recommendation: https://community.mellanox.com/s/article/howto-increase-memory-size-used-by-mellanox-adapters interesting stuff. Jonathan Engwall On Tue, Jul 9, 2019 at 1:13 PM Ryan Novosielski > wrote: Hi all, There seems to be a whole lot of misinformation out there about the appropriate setting for the log_mtts_per_seg parameter in the mlx4 driver. Some folks suggest that it’s different between the RHEL/kernel.org provided driver and Mellanox OFED, some suggest it’s been fixed such that the default can assign 2x the total memory on a node since at least RHEL6.6 (and so one would assume even longer ago in OFED), and some places seem to still be carrying it forward because “maybe it matters.” Is anyone here sure of the current state? I’ll probably read the source code if not, but I’d like to spare myself the hassle. -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL:
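For anyone checking their own nodes, log_mtts_per_seg is an ordinary mlx4_core module parameter, so it can be inspected and, if really necessary, pinned at module load time. A minimal sketch, assuming the in-tree mlx4_core driver and a standard modprobe.d layout (the value 7 below is purely illustrative, not a recommendation; as discussed above, the in-tree defaults are already sized from system memory):

    # what the running driver is using, and the compiled-in default
    cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
    modinfo mlx4_core | grep -i mtt

    # only if you have a concrete reason to override the default
    echo "options mlx4_core log_mtts_per_seg=7" > /etc/modprobe.d/mlx4-mtt.conf
    # takes effect after mlx4_core is reloaded (or after an initramfs rebuild
    # and reboot, if the module is loaded from the initramfs)

Under Mellanox OFED, the Mellanox article linked above sets this together with log_num_mtt in the same options line; that second knob is specific to the OFED mlx4_core and may not exist on the in-tree driver.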
From hearnsj at googlemail.com Thu Jul 18 20:45:16 2019 From: hearnsj at googlemail.com (John Hearns) Date: Fri, 19 Jul 2019 04:45:16 +0100 Subject: [Beowulf] Differentiable Programming with Julia Message-ID: Forgiveness is sought for my ongoing Julia fandom. We have seen a lot of articles recently on industry websites about how machine learning workloads are being brought onto traditional HPC platforms. This paper on how Julia is bringing them together is, I think, significant: https://arxiv.org/pdf/1907.07587.pdf (apology - I cannot cut and paste the abstract) ps. Doug Eadline - if you would like a blog post about this paper I could try. But my head will hurt trying to understand it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From engwalljonathanthereal at gmail.com Sun Jul 21 19:30:07 2019 From: engwalljonathanthereal at gmail.com (Jonathan Engwall) Date: Sun, 21 Jul 2019 19:30:07 -0700 Subject: [Beowulf] flatpack Message-ID: Hello Beowulf, Some distros will be glad to know Flatpack will load your software center with working downloads. First visit the website: https://flatpak.org/setup/ , choose your distro, then enable it for your installation. After that, your software center (this is for GNOME) will then quadruple, at least, in size if yours has been lacking. Ubuntu, for instance, has always been loaded. This works for CentOS, Fedora, and therefore also RedHat, which I have never used. Raspbian and several others are on the page. Attached: In the screenshot attached below you can see I now have Godot Engine. You can see Visual Scripting of a simple 2d ui_left, right type game. It looks tedious, but it is so easy to redesign. Of course I had a problem, no CallNode. Probably because I enabled 3d features. So I tried one around then another. I was dragging nodes, piping physics, I ran the normalized vector through the update. I stuffed it all into Return, many various things. I nearly got through it too. At one point a clear error: Size was (1). But LoL I didn't watch the demos so I didn't know how I did that either. Jonathan Engwall Screenshot from 2019-07-20 18-32-30.png -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at csamuel.org Mon Jul 22 10:10:39 2019 From: chris at csamuel.org (Christopher Samuel) Date: Mon, 22 Jul 2019 10:10:39 -0700 Subject: [Beowulf] flatpack In-Reply-To: References: Message-ID: <9db7f7c8-8c61-ca48-68df-0ee8ef0b225e@csamuel.org> On 7/21/19 7:30 PM, Jonathan Engwall wrote: > Some distros will be glad to know Flatpack will load your software > center with working downloads. Are you thinking of this as an alternative to container systems & tools like easybuild as a software delivery system for HPC systems? How widely supported is it? All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA From engwalljonathanthereal at gmail.com Mon Jul 22 10:46:41 2019 From: engwalljonathanthereal at gmail.com (Jonathan Engwall) Date: Mon, 22 Jul 2019 10:46:41 -0700 Subject: Re: [Beowulf] flatpack In-Reply-To: References: <9db7f7c8-8c61-ca48-68df-0ee8ef0b225e@csamuel.org> Message-ID: It is funny, I looked at their credits: Codethink seems quite serious, Barron's thinks highly of Fastly, but Mythic Beast is a dynamic DNS and Scaleway is a public/private cloud that gives a 500 pound starting credit. So, if you act now all these great titles can be yours. -------------- next part -------------- An HTML attachment was scrubbed... URL:
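For the command-line side of the setup described above, the usual sequence is roughly the following; a sketch assuming the distro's flatpak package is already installed and that you want the Flathub repository the setup page walks you towards (the Audacity ID is just an example application):

    # add the Flathub remote once
    flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo

    # install and run an application from it
    flatpak install flathub org.audacityteam.Audacity
    flatpak run org.audacityteam.Audacity

    # see what is installed (apps plus the runtimes they pulled in) and update it
    flatpak list
    flatpak update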
From jaquilina at eagleeyet.net Mon Jul 22 10:48:16 2019 From: jaquilina at eagleeyet.net (Jonathan Aquilina) Date: Mon, 22 Jul 2019 17:48:16 +0000 Subject: [Beowulf] Lustre on google cloud Message-ID: Hi Guys, I am looking at https://cloud.google.com/blog/products/storage-data-transfer/introducing-lustre-file-system-cloud-deployment-manager-scripts This basically allows you to deploy a lustre cluster on google cloud. In your HPC setups have you considered moving towards cloud based clusters? Regards, Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From engwalljonathanthereal at gmail.com Mon Jul 22 10:59:06 2019 From: engwalljonathanthereal at gmail.com (Jonathan Engwall) Date: Mon, 22 Jul 2019 10:59:06 -0700 Subject: Re: [Beowulf] flatpack In-Reply-To: References: <9db7f7c8-8c61-ca48-68df-0ee8ef0b225e@csamuel.org> Message-ID: I should add that a "flatpack" is any software you install and use which does not affect your os or env. Think turbotax. You insert the disk, do the math, print the form, eject. Audacity, for example, is a software I use. On Mon, Jul 22, 2019, 10:46 AM Jonathan Engwall < engwalljonathanthereal at gmail.com> wrote: > It is funny, I looked at their credits Codethink seems quite serious, > Barron's thinks highly of Fastly, but Mythic Beast is a dynamic DNS and > Scaleway is a public/private cloud that give a 500 pound starting credit. > So, if you act now all these great titles can be yours. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dag at sonsorol.org Mon Jul 22 11:14:13 2019 From: dag at sonsorol.org (Chris Dagdigian) Date: Mon, 22 Jul 2019 14:14:13 -0400 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: Message-ID: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> A lot of production HPC runs on cloud systems. AWS is big for this via their AWS Parallelcluster stack which does include lustre support via the FSx for Lustre service although they are careful to caveat it as staging/scratch space not suitable for persistent storage. AWS has some cool node types now with 25gig, 50gig and 100-gigabit network support. Microsoft Azure is doing amazing things now that they have the cyclecomputing folks on board, integrated and able to call shots within the product space. They actually offer bare metal HPC and infiniband SKUs now and have some interesting parallel filesystem offerings as well. Can't comment on google as I've not touched or used it professionally but AWS and Azure for sure are real players now to consider if you have an HPC requirement. That said, however, a sober cost accounting still shows on-prem or "owned" HPC is best from a financial perspective if your workload is 24x7x365 constant. The cloud based HPC is best for capability, bursty workloads, temporary workloads, auto-scaling, computing against cloud-resident data sets or the neat new model where instead of on-prem multi-user shared HPC you go out and decide to deliver individual bespoke HPC clusters to each user or team on the cloud. The big paradigm shift for cloud HPC is that it does not make a lot of sense to make a monolithic stack shared by multiple competing users and groups. The automated provisioning and elasticity of the cloud make it more sensible to build many clusters so that you can tune each cluster specifically for the cluster or workload and then blow it up when the work is done. My $.02 of course!
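To make the AWS option above concrete, the Parallelcluster-plus-FSx-for-Lustre pairing is driven from a small INI-style config; a rough sketch in ParallelCluster 2.x syntax, with the key name, bucket and capacity as placeholders and everything except the storage-related settings left out:

    [cluster default]
    key_name = my-keypair
    scheduler = slurm
    fsx_settings = scratch

    [fsx scratch]
    shared_dir = /fsx
    storage_capacity = 3600
    # optionally hydrate the filesystem from an S3 bucket at creation time
    import_path = s3://my-staging-bucket/dataset

    # then build it with: pcluster create mycluster

As per the caveat above, treat /fsx as the scratch/staging tier it is; anything you want to keep still has to be copied back out when the cluster is torn down.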
Chris > Jonathan Aquilina > July 22, 2019 at 1:48 PM > > Hi Guys, > > I am looking at > https://cloud.google.com/blog/products/storage-data-transfer/introducing-lustre-file-system-cloud-deployment-manager-scripts > > This basically allows you to deploy a lustre cluster on google cloud. > In your HPC setups have you considered moving towards cloud based > clusters? > > Regards, > > Jonathan > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at csamuel.org Mon Jul 22 17:26:33 2019 From: chris at csamuel.org (Christopher Samuel) Date: Mon, 22 Jul 2019 17:26:33 -0700 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: Message-ID: <6b899094-f6d4-7ce3-3c4d-c7b91b1efbfc@csamuel.org> On 7/22/19 10:48 AM, Jonathan Aquilina wrote: > I am looking at > https://cloud.google.com/blog/products/storage-data-transfer/introducing-lustre-file-system-cloud-deployment-manager-scripts Amazon's done similar: https://aws.amazon.com/blogs/storage/building-an-hpc-cluster-with-aws-parallelcluster-and-amazon-fsx-for-lustre/ All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA From jaquilina at eagleeyet.net Mon Jul 22 22:12:37 2019 From: jaquilina at eagleeyet.net (Jonathan Aquilina) Date: Tue, 23 Jul 2019 05:12:37 +0000 Subject: [Beowulf] Lustre on google cloud In-Reply-To: <6b899094-f6d4-7ce3-3c4d-c7b91b1efbfc@csamuel.org> References: <6b899094-f6d4-7ce3-3c4d-c7b91b1efbfc@csamuel.org> Message-ID: Hi Chris, I am aware of that as I follow their youtube channel. I think my main query is compared to managing a cluster in house is this the way forward be it AWS or google cloud? Regards, Jonathan -----Original Message----- From: Beowulf On Behalf Of Christopher Samuel Sent: Tuesday, 23 July 2019 02:27 To: beowulf at beowulf.org Subject: Re: [Beowulf] Lustre on google cloud On 7/22/19 10:48 AM, Jonathan Aquilina wrote: > I am looking at > https://cloud.google.com/blog/products/storage-data-transfer/introduci > ng-lustre-file-system-cloud-deployment-manager-scripts Amazon's done similar: https://aws.amazon.com/blogs/storage/building-an-hpc-cluster-with-aws-parallelcluster-and-amazon-fsx-for-lustre/ All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf From jaquilina at eagleeyet.net Mon Jul 22 22:26:53 2019 From: jaquilina at eagleeyet.net (Jonathan Aquilina) Date: Tue, 23 Jul 2019 05:26:53 +0000 Subject: [Beowulf] flatpack In-Reply-To: References: <9db7f7c8-8c61-ca48-68df-0ee8ef0b225e@csamuel.org> Message-ID: Hi Guys, I think I might be a bit tardy to the party here, but the way you describe flatpack is equivalent to the portable apps on windows is my understanding correct? Regards, Jonathan From: Beowulf On Behalf Of Jonathan Engwall Sent: Monday, 22 July 2019 19:59 To: Christopher Samuel Cc: Beowulf Mailing List Subject: Re: [Beowulf] flatpack I shoul add that a "flatpack" is any software you install and use which does not affect your os or env. Think turbotax. You insert the disk, do the math, print the form, eject. 
Audacity, for example, is a software I use. On Mon, Jul 22, 2019, 10:46 AM Jonathan Engwall > wrote: It is funny, I looked at their credits Codethink seems quite serious, Barron's thinks highly of Fastly, but Mythic Beast is a dynamic DNS and Scaleway is a public/private cloud that give a 500 pound starting credit. So, if you act now all these great titles can be yours. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at csamuel.org Mon Jul 22 22:30:10 2019 From: chris at csamuel.org (Chris Samuel) Date: Mon, 22 Jul 2019 22:30:10 -0700 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: <6b899094-f6d4-7ce3-3c4d-c7b91b1efbfc@csamuel.org> Message-ID: <5fb9c197-4582-10e7-baf0-effd437c8897@csamuel.org> On 22/7/19 10:12 pm, Jonathan Aquilina wrote: > I am aware of that as I follow their youtube channel. Fair enough, others may not. :-) > I think my main query is compared to managing a cluster in house is this the way forward be it AWS or google cloud? I think the answer there is likely "it depends". The reasons may not all be technical either, you may be an organisation from outside the US that cannot allow your data to reside offshore, or be held by a US company subject to US law even if data is not held in the US. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA From jaquilina at eagleeyet.net Mon Jul 22 22:31:30 2019 From: jaquilina at eagleeyet.net (Jonathan Aquilina) Date: Tue, 23 Jul 2019 05:31:30 +0000 Subject: [Beowulf] Lustre on google cloud In-Reply-To: <5fb9c197-4582-10e7-baf0-effd437c8897@csamuel.org> References: <6b899094-f6d4-7ce3-3c4d-c7b91b1efbfc@csamuel.org> <5fb9c197-4582-10e7-baf0-effd437c8897@csamuel.org> Message-ID: I am based in Europe so that answers my question partially. I am sure though that with the GUI side of things through the console I am sure it makes things a lot easier to setup and manage no? Regards, Jonathan -----Original Message----- From: Beowulf On Behalf Of Chris Samuel Sent: Tuesday, 23 July 2019 07:30 To: beowulf at beowulf.org Subject: Re: [Beowulf] Lustre on google cloud On 22/7/19 10:12 pm, Jonathan Aquilina wrote: > I am aware of that as I follow their youtube channel. Fair enough, others may not. :-) > I think my main query is compared to managing a cluster in house is this the way forward be it AWS or google cloud? I think the answer there is likely "it depends". The reasons may not all be technical either, you may be an organisation from outside the US that cannot allow your data to reside offshore, or be held by a US company subject to US law even if data is not held in the US. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf From chris at csamuel.org Mon Jul 22 22:38:17 2019 From: chris at csamuel.org (Chris Samuel) Date: Mon, 22 Jul 2019 22:38:17 -0700 Subject: [Beowulf] flatpack In-Reply-To: References: <9db7f7c8-8c61-ca48-68df-0ee8ef0b225e@csamuel.org> Message-ID: <3e4ae15f-2c23-6413-63ec-1136638e57ad@csamuel.org> On 22/7/19 10:26 pm, Jonathan Aquilina wrote: > Hi Guys, I think I might be a bit tardy to the party here, but the way > you describe flatpack is equivalent to the portable apps on windows is > my understanding correct? 
It seems that way, with an element of sandboxing to try and protect the user who is using these packages. The Debian/Ubuntu package describes it thus: Flatpak installs, manages and runs sandboxed desktop application bundles. Application bundles run partially isolated from the wider system, using containerization techniques such as namespaces to prevent direct access to system resources. Resources from outside the sandbox can be accessed via "portal" services, which are responsible for access control; for example, the Documents portal displays an "Open" dialog outside the sandbox, then allows the application to access only the selected file. . Each application uses a specified "runtime", or set of libraries, which is available as /usr inside its sandbox. This can be used to run application bundles with multiple, potentially incompatible sets of dependencies within the same desktop environment. . This package contains the services and executables needed to install and launch sandboxed applications, and the portal services needed to provide limited access to resources outside the sandbox. There's also more about it here: http://docs.flatpak.org/en/latest/basic-concepts.html The downside (from the HPC point of view) is that these binaries will need to be compiled for a relatively low common denominator of architecture (or with a compiler that can do optimisations selected at runtime depending on the architecture). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA From jaquilina at eagleeyet.net Mon Jul 22 22:40:04 2019 From: jaquilina at eagleeyet.net (Jonathan Aquilina) Date: Tue, 23 Jul 2019 05:40:04 +0000 Subject: [Beowulf] flatpack In-Reply-To: <3e4ae15f-2c23-6413-63ec-1136638e57ad@csamuel.org> References: <9db7f7c8-8c61-ca48-68df-0ee8ef0b225e@csamuel.org> <3e4ae15f-2c23-6413-63ec-1136638e57ad@csamuel.org> Message-ID: So in a nut shell this is taking dockerization/ containerization and making it more for the every day Linux user instead of the HPC user? It would be interesting to have a distro built around such a setup. Regards, Jonathan -----Original Message----- From: Beowulf On Behalf Of Chris Samuel Sent: Tuesday, 23 July 2019 07:38 To: beowulf at beowulf.org Subject: Re: [Beowulf] flatpack On 22/7/19 10:26 pm, Jonathan Aquilina wrote: > Hi Guys, I think I might be a bit tardy to the party here, but the way > you describe flatpack is equivalent to the portable apps on windows is > my understanding correct? It seems that way, with an element of sandboxing to try and protect the user who is using these packages. The Debian/Ubuntu package describes it thus: Flatpak installs, manages and runs sandboxed desktop application bundles. Application bundles run partially isolated from the wider system, using containerization techniques such as namespaces to prevent direct access to system resources. Resources from outside the sandbox can be accessed via "portal" services, which are responsible for access control; for example, the Documents portal displays an "Open" dialog outside the sandbox, then allows the application to access only the selected file. . Each application uses a specified "runtime", or set of libraries, which is available as /usr inside its sandbox. This can be used to run application bundles with multiple, potentially incompatible sets of dependencies within the same desktop environment. . 
This package contains the services and executables needed to install and launch sandboxed applications, and the portal services needed to provide limited access to resources outside the sandbox. There's also more about it here: http://docs.flatpak.org/en/latest/basic-concepts.html The downside (from the HPC point of view) is that these binaries will need to be compiled for a relatively low common denominator of architecture (or with a compiler that can do optimisations selected at runtime depending on the architecture). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf From chris at csamuel.org Mon Jul 22 22:41:32 2019 From: chris at csamuel.org (Chris Samuel) Date: Mon, 22 Jul 2019 22:41:32 -0700 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: <6b899094-f6d4-7ce3-3c4d-c7b91b1efbfc@csamuel.org> <5fb9c197-4582-10e7-baf0-effd437c8897@csamuel.org> Message-ID: <5b940c70-0643-2faa-27bd-3b494b0d1e0f@csamuel.org> On 22/7/19 10:31 pm, Jonathan Aquilina wrote: > I am sure though that with the GUI side of things through the console I am sure it makes things a lot easier to setup and manage no? You would hope so! Although I've got to say with my limited experience of Lustre when you're running it you pretty quickly end up poking through the entrails of the Linux kernel trying to figure out what's going on when it's not behaving right. :-) All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA From jaquilina at eagleeyet.net Mon Jul 22 22:44:16 2019 From: jaquilina at eagleeyet.net (Jonathan Aquilina) Date: Tue, 23 Jul 2019 05:44:16 +0000 Subject: [Beowulf] Lustre on google cloud In-Reply-To: <5b940c70-0643-2faa-27bd-3b494b0d1e0f@csamuel.org> References: <6b899094-f6d4-7ce3-3c4d-c7b91b1efbfc@csamuel.org> <5fb9c197-4582-10e7-baf0-effd437c8897@csamuel.org> <5b940c70-0643-2faa-27bd-3b494b0d1e0f@csamuel.org> Message-ID: Guess at some point I will need to fire up a test cluster and try things out. But usually from my experiences with the three major cloud providers you have access obviously with limited means to make modifications so I am curious to know how easy it would be to tweak settings. I have a bit of experience with Amazon EMR (Hadoop) and modifying the configuration of the Hadoop cluster wasn’t easy to be fair. -----Original Message----- From: Beowulf On Behalf Of Chris Samuel Sent: Tuesday, 23 July 2019 07:42 To: beowulf at beowulf.org Subject: Re: [Beowulf] Lustre on google cloud On 22/7/19 10:31 pm, Jonathan Aquilina wrote: > I am sure though that with the GUI side of things through the console I am sure it makes things a lot easier to setup and manage no? You would hope so! Although I've got to say with my limited experience of Lustre when you're running it you pretty quickly end up poking through the entrails of the Linux kernel trying to figure out what's going on when it's not behaving right. 
:-) All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf From chris at csamuel.org Mon Jul 22 22:47:30 2019 From: chris at csamuel.org (Chris Samuel) Date: Mon, 22 Jul 2019 22:47:30 -0700 Subject: [Beowulf] flatpack In-Reply-To: References: <9db7f7c8-8c61-ca48-68df-0ee8ef0b225e@csamuel.org> <3e4ae15f-2c23-6413-63ec-1136638e57ad@csamuel.org> Message-ID: <4f1d4a65-7f09-c814-3426-34c43284cc8d@csamuel.org> On 22/7/19 10:40 pm, Jonathan Aquilina wrote: > So in a nut shell this is taking dockerization/ containerization and > making it more for the every day Linux user instead of the HPC user? I don't think this goes as far as containers with isolation, as I think that's not what they're trying to do. But it does seem they're thinking along those lines. > It would be interesting to have a distro built around such a setup. I think this is targeting cross-distro applications. With all the duplication of libraries, etc, a distro using it would be quite bulky. Also may you have a similar security as containers have, whereby when a vulnerability is found and patched in an application or library you end up with lots of people out there still running the vulnerable version. This is why distros tend to discourage "vendoring" of libraries as that tends to fossilise vulnerabilities into an application whereas if people use the version provided in the distro the maintainers only need to fix it in that one package and everyone who links against it benefits. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA From hearnsj at googlemail.com Mon Jul 22 23:06:56 2019 From: hearnsj at googlemail.com (John Hearns) Date: Tue, 23 Jul 2019 07:06:56 +0100 Subject: [Beowulf] flatpack In-Reply-To: <4f1d4a65-7f09-c814-3426-34c43284cc8d@csamuel.org> References: <9db7f7c8-8c61-ca48-68df-0ee8ef0b225e@csamuel.org> <3e4ae15f-2c23-6413-63ec-1136638e57ad@csamuel.org> <4f1d4a65-7f09-c814-3426-34c43284cc8d@csamuel.org> Message-ID: Having used Snaps on Ubuntu - which seems to be their preferred method of distributing some applications, I have a slightly different take on the containerisation angle and would de-emphaise that. My take is that snaps/flatpak attack the "my distro ships with gcc version 4.1 but I need gcc version 8.0" By that I mean that you replace the distro shipped gcc version at your peril - as far as I am concerned tiknering with the tested/approved gcc and glibc will end you in a world of hurt. (old war story - changing bash to an upgraded version left a big SuSE system unbootable for me). So with snaps/flatpak you should be able to give your users and developers up to date applications without fooling with the core system utilities. And this is a Good Thing (TM) On Tue, 23 Jul 2019 at 06:47, Chris Samuel wrote: > On 22/7/19 10:40 pm, Jonathan Aquilina wrote: > > > So in a nut shell this is taking dockerization/ containerization and > > making it more for the every day Linux user instead of the HPC user? > > I don't think this goes as far as containers with isolation, as I think > that's not what they're trying to do. But it does seem they're thinking > along those lines. > > > It would be interesting to have a distro built around such a setup. > > I think this is targeting cross-distro applications. 
With all the > duplication of libraries, etc, a distro using it would be quite bulky. > > Also may you have a similar security as containers have, whereby when a > vulnerability is found and patched in an application or library you end > up with lots of people out there still running the vulnerable version. > > This is why distros tend to discourage "vendoring" of libraries as that > tends to fossilise vulnerabilities into an application whereas if people > use the version provided in the distro the maintainers only need to fix > it in that one package and everyone who links against it benefits. > > All the best, > Chris > -- > Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hearnsj at googlemail.com Tue Jul 23 00:35:45 2019 From: hearnsj at googlemail.com (John Hearns) Date: Tue, 23 Jul 2019 08:35:45 +0100 Subject: [Beowulf] flatpack In-Reply-To: References: <9db7f7c8-8c61-ca48-68df-0ee8ef0b225e@csamuel.org> <3e4ae15f-2c23-6413-63ec-1136638e57ad@csamuel.org> <4f1d4a65-7f09-c814-3426-34c43284cc8d@csamuel.org> Message-ID: Having just spouted on about snaps/flatpak I saw on the roadmap for AWS Firecracker that snap support is to be included. Sorry that I am conflating snap and flatpak. On Tue, 23 Jul 2019 at 07:06, John Hearns wrote: > Having used Snaps on Ubuntu - which seems to be their preferred method of > distributing some applications, > I have a slightly different take on the containerisation angle and would > de-emphaise that. > > My take is that snaps/flatpak attack the "my distro ships with gcc version > 4.1 but I need gcc version 8.0" > By that I mean that you replace the distro shipped gcc version at your > peril - as far as I am concerned tiknering > with the tested/approved gcc and glibc will end you in a world of hurt. > (old war story - changing bash to an upgraded version left a big SuSE > system unbootable for me). > > So with snaps/flatpak you should be able to give your users and developers > up to date applications without fooling with > the core system utilities. And this is a Good Thing (TM) > > > > > > > > On Tue, 23 Jul 2019 at 06:47, Chris Samuel wrote: > >> On 22/7/19 10:40 pm, Jonathan Aquilina wrote: >> >> > So in a nut shell this is taking dockerization/ containerization and >> > making it more for the every day Linux user instead of the HPC user? >> >> I don't think this goes as far as containers with isolation, as I think >> that's not what they're trying to do. But it does seem they're thinking >> along those lines. >> >> > It would be interesting to have a distro built around such a setup. >> >> I think this is targeting cross-distro applications. With all the >> duplication of libraries, etc, a distro using it would be quite bulky. >> >> Also may you have a similar security as containers have, whereby when a >> vulnerability is found and patched in an application or library you end >> up with lots of people out there still running the vulnerable version. 
>> >> This is why distros tend to discourage "vendoring" of libraries as that >> tends to fossilise vulnerabilities into an application whereas if people >> use the version provided in the distro the maintainers only need to fix >> it in that one package and everyone who links against it benefits. >> >> All the best, >> Chris >> -- >> Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ghenriks at gmail.com Tue Jul 23 06:39:09 2019 From: ghenriks at gmail.com (Gerald Henriksen) Date: Tue, 23 Jul 2019 09:39:09 -0400 Subject: [Beowulf] flatpack In-Reply-To: <4f1d4a65-7f09-c814-3426-34c43284cc8d@csamuel.org> References: <3e4ae15f-2c23-6413-63ec-1136638e57ad@csamuel.org> <4f1d4a65-7f09-c814-3426-34c43284cc8d@csamuel.org> Message-ID: On Mon, 22 Jul 2019 22:47:30 -0700, you wrote: >On 22/7/19 10:40 pm, Jonathan Aquilina wrote: > >> So in a nut shell this is taking dockerization/ containerization and >> making it more for the every day Linux user instead of the HPC user? > >I don't think this goes as far as containers with isolation, as I think >that's not what they're trying to do. But it does seem they're thinking >along those lines. Flatpack is aimed at the desktop, and as it requires assorted desktop technologies isn't meant to work on servers (which is in some ways unfortunate). As it is meant for desktop apps some of the isolation goals of something like Docker don't work so well, but the intent is to try and make things "safer" than just installing a binary and running it. >> It would be interesting to have a distro built around such a setup. > >I think this is targeting cross-distro applications. With all the >duplication of libraries, etc, a distro using it would be quite bulky. While it is cross-distro, there is a project from Fedora to build a desktop distribution around it called Silverblue https://silverblue.fedoraproject.org/ From i.n.kozin at googlemail.com Tue Jul 23 01:08:33 2019 From: i.n.kozin at googlemail.com (INKozin) Date: Tue, 23 Jul 2019 09:08:33 +0100 Subject: [Beowulf] flatpack In-Reply-To: References: Message-ID: Hi Jonathan, Thanks, good to know. I have tried running Krita on Centos recently but run into glibc issue using Appimage. Might as well try flatpack which seems to be available. Igor On Mon, 22 Jul 2019, 03:31 Jonathan Engwall, < engwalljonathanthereal at gmail.com> wrote: > Hello Beowulf, > Some distros will be glad to know Flatpack will load your software center > with working downloads. First visit the website: > https://flatpak.org/setup/ , choose your distro, then enable it for your > installation. > After that, your software center, this is for GNOME, will then quadruple, > at least, in size if your has been lacking. Ubuntu, for instance, has > always been loaded. > This works for CentOS, Fedora, also therefore RedHat which I have never > used. Rasbian and several others are on the page. > Attached: > In the screenshot attached below you can see I now have Godot Engine. You > can see Visual Scripting of a simple 2d ui_left, right type game. It looks > tedious, but it is so easy to redesign. > Of course I had a problem, no CallNode. Probably because I enabled 3d > features. So I tried one around then another. 
I was dragging nodes, piping > physics, I ran the normalized vector through the update. I stuffed it all > into Return, many various things. > I nearly got through it too. At one point a clear error: Size was (1). But > LoL I didn't watch the demos so I didn't know how I did that either. > Jonathan Engwall > > Screenshot from 2019-07-20 18-32-30.png > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From John.Blaas at Colorado.EDU Tue Jul 23 08:33:11 2019 From: John.Blaas at Colorado.EDU (John) Date: Tue, 23 Jul 2019 09:33:11 -0600 Subject: [Beowulf] Reminder: Call for Papers: HPCSYSPROS Workshop @ SC19 Friday, November 22nd Message-ID: REMINDER that we are a month away from the deadline for submissions. We look forward to seeing a lot of great papers, presentations, and lightning talks. HPC Systems Professionals Workshop (HPCSYSPROS19) Call For Papers, Artifacts, and Lightning Talks --------------- HPCSYSPROS19 is held in conjunction with SC19: The International Conference on High Performance Computing, Networking, Storage and Analysis. http://sighpc-syspros.org/workshops/2019/ The workshop this year will be November 22nd (FRIDAY) from 9am to 12:30am. Please keep that in mind when making travel arrangements. Submission Deadline - August 23 (NO EXTENSIONS) Supercomputing systems present complex challenges to personnel who design, deploy and maintain them. Standing up these systems and keeping them running require novel solutions that are unique to high performance computing. The success of any supercomputing center depends on stable and reliable systems, and HPC Systems Professionals are crucial to that success. The Fourth Annual HPC Systems Professionals Workshop will bring together systems administrators, systems architects, and systems analysts in order to share best practices, discuss cutting-edge technologies, and advance the state-of-the-practice for HPC systems. This CFP requests that participants submit either papers, slide presentations, or 5-minute Lightning Talk proposals along with reproducible artifacts (code segments, test suites, configuration management templates) which can be made available to the community for use. Submissions website: https://submissions.supercomputing.org/?page=Submit&id=SC19WorkshopHPCSYSPROS19FirstSubmission&site=sc19 Topics of Interest --------------- Topics of interest include, but are not limited to: * Cluster, configuration, or software management * Performance tuning/Benchmarking * Resource manager and job scheduler configuration * Monitoring/Mean-time-to-failure/ROI/Resource utilization * Virtualization/Clouds * Designing and troubleshooting HPC interconnects * Designing and maintaining HPC storage solutions * Cybersecurity and data protection * Cluster storage All topics are expected to have an emphasis on HPC. Submission Information --------------- Authors are invited to submit original, high-quality papers, presentations, and artifacts with an emphasis on solutions that can be implemented by other members of the HPC systems community. All submissions should be in PDF format. Papers should be between 6 and 8 pages including tables, figures and appendices, but excluding references. Slide decks should consist of less than 30 slides. 
Lightning Talks should be submitted as a 1-2 paragraph abstract for a talk of approximately 5 minutes in length. Artifact descriptions should be described in 1-2 pages in length. All submissions should be formatted according to the SC Proceedings template. Per SC policy, margins and font sizes should not be modified. Papers submitted with an artifact are required to have an appended artifact descriptor as a part of the requirements for reproducibility. Some examples of artifacts are listed below. * Architecture Descriptions should include an interesting network, storage or system architecture, or a hybrid thereof at the data-center level. It should be documented by multiple architecture diagrams and a four page description of the architecture in the provided template. * Small Middleware or Systems Software should include an artifact of code, such as a BASH or Python Script. Additionally, there should be strong documentation that makes the artifact usable by the community. This documentation should be written in Markdown, and a two page abstract in the SC proceedings format is required. * System Configuration and Configuration Management should include configuration or configuration management, and/or the interactions between multiple configured applications. Examples of this might be a puppet module, a config file used in a unique way, or more likely, a group of config files and configuration management bundled together. Strong documentation for reproducing the artifact, as well as a two page abstract in the SC proceedings format, is required. This year we will once again have two review stages before we select the program material. During the first stage, all papers, artifacts, and talks will get reviewed and scored but there will be no decisions for inclusion in the program. The authors will then have three weeks to update their submission based on provided feedback, and resubmit. The second round will be reviewed and scored, and the highest-scoring papers, artifacts, and talks will be invited to be included within the workshop program. Proposals for different types of artifacts other than those listed above will also be accepted. Additionally, hybrids of these types of artifacts are acceptable. If you have a relevant, high-quality artifact, which has an emphasis on reproducibility and implementation, and is not included in the types above, please propose it to the committee (contact info below) . If the committee agrees, the CFP will be amended to reflect the new artifact type and its requirements. The up-to-date CFP will always be available at: http://sighpc-syspros.org/workshops/2019/HPCSYSPROS-CfP-2019.pdf All papers, abstracts and descriptions should be formatted according to the IEEE Conference Proceedings template and are required to include a modified reproducibility appendix from SC19. All submissions should be submitted electronically through the SC19 linklings instance (link included at the bottom). Please submit the main document (paper or abstract) in PDF form, as well as an accompanying zip (or gzip) file. All reviews and comments will be available on http://submissions.supercomputing.org. 
All accepted papers and artifacts will be published on GitHub and archived Important Dates -------------------- Submissions Open: NOW Submissions Closed: August 23rd (NO EXTENSIONS) First round reviews Sent and Resubmission Open: September 6th Resubmission Closed: September 20th Notifications of Papers and Accepted Artifacts: October 4th Final Abstracts for Program: October 10th Workshop Date: November 22nd Workshop Information -------------------- Committee Contact: contact at hpcsyspros.org Website: http://hpcsyspros.org Updated CFP: http://sighpc-syspros.org/workshops/2019/HPCSYSPROS -CfP-2019.pdf Submission Site: https://submissions.supercomputing.org/?page=Submit&id=SC19WorkshopHPCSYSPROS19FirstSubmission&site=sc19 -- John Blaas HPCSYSPROS19 Program Chair -------------- next part -------------- An HTML attachment was scrubbed... URL: From engwalljonathanthereal at gmail.com Tue Jul 23 09:17:26 2019 From: engwalljonathanthereal at gmail.com (Jonathan Engwall) Date: Tue, 23 Jul 2019 09:17:26 -0700 Subject: [Beowulf] flatpack Message-ID: /var/lib/flatpack->somewhere safer More about flatpacks: https://flatkill.org On July 23, 2019, at 1:08 AM, INKozin wrote: Hi Jonathan, Thanks, good to know. I have tried running Krita on Centos recently but run into glibc issue using Appimage. Might as well try flatpack which seems to be available. Igor On Mon, 22 Jul 2019, 03:31 Jonathan Engwall, wrote: Hello Beowulf, Some distros will be glad to know Flatpack will load your software center with working downloads. First visit the website: https://flatpak.org/setup/ , choose your distro, then enable it for your installation. After that, your software center, this is for GNOME, will then quadruple, at least, in size if your has been lacking. Ubuntu, for instance, has always been loaded. This works for CentOS, Fedora, also therefore RedHat which I have never used. Rasbian and several others are on the page. Attached: In the screenshot attached below you can see I now have Godot Engine. You can see Visual Scripting of a simple 2d ui_left, right type game. It looks tedious, but it is so easy to redesign. Of course I had a problem, no CallNode. Probably because I enabled 3d features. So I tried one around then another. I was dragging nodes, piping physics, I ran the normalized vector through the update. I stuffed it all into Return, many various things. I nearly got through it too. At one point a clear error: Size was (1). But LoL I didn't watch the demos so I didn't know how I did that either. Jonathan Engwall  Screenshot from 2019-07-20 18-32-30.png _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: From sassy-work at sassy.formativ.net Thu Jul 25 17:26:47 2019 From: sassy-work at sassy.formativ.net (=?ISO-8859-1?Q?J=F6rg_Sa=DFmannshausen?=) Date: Fri, 26 Jul 2019 01:26:47 +0100 Subject: [Beowulf] Lustre on google cloud In-Reply-To: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> References: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> Message-ID: <6519110.rpbpuk141d@deepblue> Dear all, dear Chris, thanks for the detailed explanation. We are currently looking into cloud- bursting so your email was very timely for me as I am suppose to look into it. 
One of the issues I can see with our workload is simply getting data into the cloud and back out again. We are not talking about a few Gigs here, we are talking up to say 1 or more TB. For reference: we got 9 PB of storage (GPFS) of which we are currently using 7 PB and there are around 1000+ users connected to the system. So cloud bursting would only be possible in some cases. Do you happen to have a feeling of how to handle the issue with the file sizes sensibly? Sorry for hijacking the thread here a bit. All the best from a hot London Jörg Am Montag, 22. Juli 2019, 14:14:13 BST schrieb Chris Dagdigian: > A lot of production HPC runs on cloud systems. > > AWS is big for this via their AWS Parallelcluster stack which does > include lustre support via vfXT for lustre service although they are > careful to caveat it as staging/scratch space not suitable for > persistant storage. AWS has some cool node types now with 25gig, 50gig > and 100-gigabit network support. > > Microsoft Azure is doing amazing things now that they have the > cyclecomputing folks on board, integrated and able to call shots within > the product space. They actually offer bare metal HPC and infiniband > SKUs now and have some interesting parallel filesystem offerings as well. > > Can't comment on google as I've not touched or used it professionally > but AWS and Azure for sure are real players now to consider if you have > an HPC requirement. > > > That said, however, a sober cost accounting still shows on-prem or > "owned' HPC is best from a financial perspective if your workload is > 24x7x365 constant. The cloud based HPC is best for capability, bursty > workloads, temporary workloads, auto-scaling, computing against > cloud-resident data sets or the neat new model where instead of on-prem > multi-user shared HPC you go out and decide to deliver individual > bespoke HPC clusters to each user or team on the cloud. > > The big paradigm shift for cloud HPC is that it does not make a lot of > sense to make a monolithic stack shared by multiple competing users and > groups. The automated provisioning and elasticity of the cloud make it > more sensible to build many clusters so that you can tune each cluster > specifically for the cluster or workload and then blow it up when the > work is done. > > My $.02 of course! > > Chris > > > Jonathan Aquilina > > July 22, 2019 at 1:48 PM > > > > Hi Guys, > > > > I am looking at > > https://cloud.google.com/blog/products/storage-data-transfer/introducing-l > > ustre-file-system-cloud-deployment-manager-scripts > > > > This basically allows you to deploy a lustre cluster on google cloud. > > In your HPC setups have you considered moving towards cloud based > > clusters? > > > > Regards, > > > > Jonathan > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > > To change your subscription (digest mode or unsubscribe) visit > > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf From sassy-work at sassy.formativ.net Thu Jul 25 17:26:47 2019 From: sassy-work at sassy.formativ.net (=?ISO-8859-1?Q?J=F6rg_Sa=DFmannshausen?=) Date: Fri, 26 Jul 2019 01:26:47 +0100 Subject: [Beowulf] Lustre on google cloud In-Reply-To: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> References: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> Message-ID: <6519110.rpbpuk141d@deepblue> Dear all, dear Chris, thanks for the detailed explanation. 
We are currently looking into cloud- bursting so your email was very timely for me as I am suppose to look into it. One of the issues I can see with our workload is simply getting data into the cloud and back out again. We are not talking about a few Gigs here, we are talking up to say 1 or more TB. For reference: we got 9 PB of storage (GPFS) of which we are currently using 7 PB and there are around 1000+ users connected to the system. So cloud bursting would only be possible in some cases. Do you happen to have a feeling of how to handle the issue with the file sizes sensibly? Sorry for hijacking the thread here a bit. All the best from a hot London Jörg Am Montag, 22. Juli 2019, 14:14:13 BST schrieb Chris Dagdigian: > A lot of production HPC runs on cloud systems. > > AWS is big for this via their AWS Parallelcluster stack which does > include lustre support via vfXT for lustre service although they are > careful to caveat it as staging/scratch space not suitable for > persistant storage. AWS has some cool node types now with 25gig, 50gig > and 100-gigabit network support. > > Microsoft Azure is doing amazing things now that they have the > cyclecomputing folks on board, integrated and able to call shots within > the product space. They actually offer bare metal HPC and infiniband > SKUs now and have some interesting parallel filesystem offerings as well. > > Can't comment on google as I've not touched or used it professionally > but AWS and Azure for sure are real players now to consider if you have > an HPC requirement. > > > That said, however, a sober cost accounting still shows on-prem or > "owned' HPC is best from a financial perspective if your workload is > 24x7x365 constant. The cloud based HPC is best for capability, bursty > workloads, temporary workloads, auto-scaling, computing against > cloud-resident data sets or the neat new model where instead of on-prem > multi-user shared HPC you go out and decide to deliver individual > bespoke HPC clusters to each user or team on the cloud. > > The big paradigm shift for cloud HPC is that it does not make a lot of > sense to make a monolithic stack shared by multiple competing users and > groups. The automated provisioning and elasticity of the cloud make it > more sensible to build many clusters so that you can tune each cluster > specifically for the cluster or workload and then blow it up when the > work is done. > > My $.02 of course! > > Chris > > > Jonathan Aquilina > > July 22, 2019 at 1:48 PM > > > > Hi Guys, > > > > I am looking at > > https://cloud.google.com/blog/products/storage-data-transfer/introducing-l > > ustre-file-system-cloud-deployment-manager-scripts > > > > This basically allows you to deploy a lustre cluster on google cloud. > > In your HPC setups have you considered moving towards cloud based > > clusters? 
> > > > Regards, > > > > Jonathan > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > > To change your subscription (digest mode or unsubscribe) visit > > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf From jaquilina at eagleeyet.net Thu Jul 25 20:54:54 2019 From: jaquilina at eagleeyet.net (Jonathan Aquilina) Date: Fri, 26 Jul 2019 03:54:54 +0000 Subject: [Beowulf] Lustre on google cloud In-Reply-To: <6519110.rpbpuk141d@deepblue> References: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> <6519110.rpbpuk141d@deepblue> Message-ID: Hi Jorg, What kind of data are you dealing with Structured data or unstructured. Regards, Jonathan -----Original Message----- From: Jörg Saßmannshausen Sent: Friday, 26 July 2019 02:27 To: beowulf at beowulf.org; Jonathan Aquilina Subject: Re: [Beowulf] Lustre on google cloud Dear all, dear Chris, thanks for the detailed explanation. We are currently looking into cloud- bursting so your email was very timely for me as I am suppose to look into it. One of the issues I can see with our workload is simply getting data into the cloud and back out again. We are not talking about a few Gigs here, we are talking up to say 1 or more TB. For reference: we got 9 PB of storage (GPFS) of which we are currently using 7 PB and there are around 1000+ users connected to the system. So cloud bursting would only be possible in some cases. Do you happen to have a feeling of how to handle the issue with the file sizes sensibly? Sorry for hijacking the thread here a bit. All the best from a hot London Jörg Am Montag, 22. Juli 2019, 14:14:13 BST schrieb Chris Dagdigian: > A lot of production HPC runs on cloud systems. > > AWS is big for this via their AWS Parallelcluster stack which does > include lustre support via vfXT for lustre service although they are > careful to caveat it as staging/scratch space not suitable for > persistant storage. AWS has some cool node types now with 25gig, > 50gig and 100-gigabit network support. > > Microsoft Azure is doing amazing things now that they have the > cyclecomputing folks on board, integrated and able to call shots > within the product space. They actually offer bare metal HPC and > infiniband SKUs now and have some interesting parallel filesystem offerings as well. > > Can't comment on google as I've not touched or used it professionally > but AWS and Azure for sure are real players now to consider if you > have an HPC requirement. > > > That said, however, a sober cost accounting still shows on-prem or > "owned' HPC is best from a financial perspective if your workload is > 24x7x365 constant. The cloud based HPC is best for capability, > bursty workloads, temporary workloads, auto-scaling, computing against > cloud-resident data sets or the neat new model where instead of > on-prem multi-user shared HPC you go out and decide to deliver > individual bespoke HPC clusters to each user or team on the cloud. > > The big paradigm shift for cloud HPC is that it does not make a lot of > sense to make a monolithic stack shared by multiple competing users > and groups. The automated provisioning and elasticity of the cloud > make it more sensible to build many clusters so that you can tune each > cluster specifically for the cluster or workload and then blow it up > when the work is done. > > My $.02 of course! 
> > Chris > > > Jonathan Aquilina July 22, 2019 at > > 1:48 PM > > > > Hi Guys, > > > > I am looking at > > https://cloud.google.com/blog/products/storage-data-transfer/introdu > > cing-l ustre-file-system-cloud-deployment-manager-scripts > > > > This basically allows you to deploy a lustre cluster on google cloud. > > In your HPC setups have you considered moving towards cloud based > > clusters? > > > > Regards, > > > > Jonathan > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > > Computing To change your subscription (digest mode or unsubscribe) > > visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf From joe.landman at gmail.com Thu Jul 25 21:00:45 2019 From: joe.landman at gmail.com (Joe Landman) Date: Fri, 26 Jul 2019 00:00:45 -0400 Subject: [Beowulf] Lustre on google cloud In-Reply-To: <6519110.rpbpuk141d@deepblue> References: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> <6519110.rpbpuk141d@deepblue> Message-ID: <59ea1477-3236-28be-6628-9bf4c7dca901@gmail.com> On 7/25/19 8:26 PM, Jörg Saßmannshausen wrote: > Dear all, dear Chris, > > thanks for the detailed explanation. We are currently looking into cloud- > bursting so your email was very timely for me as I am suppose to look into it. > > One of the issues I can see with our workload is simply getting data into the > cloud and back out again. We are not talking about a few Gigs here, we are > talking up to say 1 or more TB. For reference: we got 9 PB of storage (GPFS) > of which we are currently using 7 PB and there are around 1000+ users > connected to the system. So cloud bursting would only be possible in some > cases. > Do you happen to have a feeling of how to handle the issue with the file sizes > sensibly? The issue is bursting with large data sets.  You might be able to pre-stage some portion of the data set in a public cloud, and then burst jobs from there.  Data motion between sites is going to be the hard problem in the mix.  Not technically hard, but hard from a cost/time perspective. -- Joe Landman e: joe.landman at gmail.com t: @hpcjoe w: https://scalability.org g: https://github.com/joelandman l: https://www.linkedin.com/in/joelandman From tjrc at sanger.ac.uk Fri Jul 26 00:36:27 2019 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Fri, 26 Jul 2019 07:36:27 +0000 Subject: [Beowulf] Lustre on google cloud [EXT] In-Reply-To: <59ea1477-3236-28be-6628-9bf4c7dca901@gmail.com> References: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> <6519110.rpbpuk141d@deepblue> <59ea1477-3236-28be-6628-9bf4c7dca901@gmail.com> Message-ID: I try to avoid the phrase “cloud bursting” now, for precisely this reason. Many of my users have heard the phrase, and think it means they’ll be able to instantly start work in the cloud, just because the local cluster is busy. On the compute side, yes, it’s pretty quick but as you say, getting the data out there is time consuming, and if you keep it out there all the time, expensive. Tim On 26 Jul 2019, at 05:00, Joe Landman > wrote: The issue is bursting with large data sets. You might be able to pre-stage some portion of the data set in a public cloud, and then burst jobs from there. Data motion between sites is going to be the hard problem in the mix. Not technically hard, but hard from a cost/time perspective. 
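As a concrete sketch of the pre-staging idea above: for transfers in the terabyte range, the usual trick is parallel multipart uploads to object storage rather than one serial copy. Something along these lines with boto3 would do it; the GPFS path and bucket name here are made up purely for illustration, and the tuning numbers are only a starting point.

import pathlib
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Split large files into 128 MiB parts and push 16 parts in parallel.
cfg = TransferConfig(multipart_threshold=128 * 1024 * 1024,
                     multipart_chunksize=128 * 1024 * 1024,
                     max_concurrency=16)

# Hypothetical staging area and bucket; substitute real names.
src = pathlib.Path("/gpfs/projects/dataset01")
bucket = "example-hpc-staging"

for path in src.rglob("*"):
    if path.is_file():
        key = "dataset01/" + str(path.relative_to(src))
        s3.upload_file(str(path), bucket, key, Config=cfg)

The aws s3 sync command from the AWS CLI does much the same job from the shell, and the same TransferConfig settings apply to download_file when pulling results back, which is where the egress charges discussed later in the thread come in.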
-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -------------- next part -------------- An HTML attachment was scrubbed... URL: From i.n.kozin at googlemail.com Fri Jul 26 01:23:43 2019 From: i.n.kozin at googlemail.com (INKozin) Date: Fri, 26 Jul 2019 09:23:43 +0100 Subject: [Beowulf] Lustre on google cloud In-Reply-To: <59ea1477-3236-28be-6628-9bf4c7dca901@gmail.com> References: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> <6519110.rpbpuk141d@deepblue> <59ea1477-3236-28be-6628-9bf4c7dca901@gmail.com> Message-ID: I'm very much in favour of personal or team clusters as Chris has also mentioned. Then the contract between the user and the cloud is explicit. The data can be uploaded/ pre staged to S3 in advance (at no cost other than time) or copied directly as part of the cluster creation process. It makes no sense to replicate in the cloud your in-house infrastructure. However having a solid storage base in-house is good. What you should look into is the cost of transfer back if you really have to do it. The cost could be prohibitively high, eg if Bam files need to be returned. I'm sure Tim has an opinion. On Fri, 26 Jul 2019, 05:01 Joe Landman, wrote: > > On 7/25/19 8:26 PM, Jörg Saßmannshausen wrote: > > Dear all, dear Chris, > > > > thanks for the detailed explanation. We are currently looking into cloud- > > bursting so your email was very timely for me as I am suppose to look > into it. > > > > One of the issues I can see with our workload is simply getting data > into the > > cloud and back out again. We are not talking about a few Gigs here, we > are > > talking up to say 1 or more TB. For reference: we got 9 PB of storage > (GPFS) > > of which we are currently using 7 PB and there are around 1000+ users > > connected to the system. So cloud bursting would only be possible in some > > cases. > > Do you happen to have a feeling of how to handle the issue with the file > sizes > > sensibly? > > The issue is bursting with large data sets. You might be able to > pre-stage some portion of the data set in a public cloud, and then burst > jobs from there. Data motion between sites is going to be the hard > problem in the mix. Not technically hard, but hard from a cost/time > perspective. > > > -- > Joe Landman > e: joe.landman at gmail.com > t: @hpcjoe > w: https://scalability.org > g: https://github.com/joelandman > l: https://www.linkedin.com/in/joelandman > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dag at sonsorol.org Fri Jul 26 04:26:12 2019 From: dag at sonsorol.org (Chris Dagdigian) Date: Fri, 26 Jul 2019 07:26:12 -0400 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> <6519110.rpbpuk141d@deepblue> <59ea1477-3236-28be-6628-9bf4c7dca901@gmail.com> Message-ID: Coming back late to this thread as yesterday was a travel/transit day ... some additional thoughts 1) I also avoid the word "cloud bursting" these days because it's been tarred by marketing smog and does not mean much. 
The blunt truth is that from a technical perspective having a hybrid premise/cloud HPC is very simple. The hard part is data -- either moving volumes back and forth or trying to maintain a consistent shared file system at WAN-scale networking distances. The only successful life science hybrid HPC environments I've really seen repeatedly are the ones that are chemistry or modeling focused because generally the chemistry folks have very small volumes of data to move but very large CPU requirements and occasional GPU needs. Since the data movement requirements are small for chemistry it's pretty easy to make them happy on-prem, on the cloud or on a hybrid design Not to say full on cloud bursting HPC systems don't exist at all of course but they are rare. I was talking with a pharma yesterday that uses HTcondor to span on-premise HPC with on demand AWS nodes. I just don't see that as often as I see distinct HPCs. My observed experience in this realm is that for life science we don't do a lot of WAN-spanning grids because we get killed by the gravitational pull of our data. We build HPC where the data resides and we keep them relatively simple in scope and we attempt to limit WAN scale data movement. For most this means that having onsite HPC and cloud HPC and we simply direct the workload to whichever HPC resource is closest to the data. So for Jörg -- based on what you have said I'd take a look at your userbase, your application mix and how your filesystem is organized. You may be able to set things up so that you can "burst" to the cloud for just a special subset of your apps, user groups or data sets. That could be your chemists or maybe you have a group of people who regularly compute heavily against a data set or set of references that rarely change -- in that case you may be able to replicate that part of your GPFS over to a cloud and send just that workload remotely, thus freeing up capacity on your local HPC for other work. 2) Terabyte scale data movement into or out of the cloud is not scary in 2019. You can move data into and out of the cloud at basically the line rate of your internet connection as long as you take a little care in selecting and tuning your firewalls and inline security devices. Pushing  1TB/day etc.  into the cloud these days is no big deal and that level of volume is now normal for a ton of different markets and industries.   It's basically a cost and budget exercises these days and not a particularly hard IT or technology problem. There are two killer problems with cloud storage even though it gets cheaper all the time 2a) Cloud egress fees.  You get charged real money for data traffic leaving your cloud. In many environments these fees are so tiny as to be unnoticeable noise in the monthly bill. But if you are regularly moving terabyte or petabyte scale data into and out of a cloud provider then you will notice the egress fees on your bill and they will be large enough that you have to plan for them and optimize for cost 2b) The monthly recurring cost for cloud storage can be hard to bear at petascale unless you have solidly communicated all of the benefits / capabilities and can compare them honestly to a full transparent list of real world costs to do the same thing onsite.  The monthly s3 storage bill once you have a few petabytes in AWS is high enough that you start to catch yourself doing math every once in a while along the lines of "I could build a Lustre filesystem w/ 2x capacity for just 2-months worth of our cloud storage opex budget!" 
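To put rough numbers on the egress and storage costs described above, a back-of-the-envelope sketch helps; the per-GB rates below are placeholder assumptions rather than anyone's current price list, so substitute your provider's published pricing before drawing conclusions.

def monthly_cloud_storage_cost(stored_tb, egress_tb,
                               storage_usd_per_gb=0.021,  # assumed object storage rate
                               egress_usd_per_gb=0.09):   # assumed egress rate
    """Very rough monthly estimate; both rates are placeholder assumptions."""
    gb_per_tb = 1000  # close enough for an estimate
    storage = stored_tb * gb_per_tb * storage_usd_per_gb
    egress = egress_tb * gb_per_tb * egress_usd_per_gb
    return storage, egress

# Example: 2 PB parked in object storage, 50 TB pulled back out per month.
storage_usd, egress_usd = monthly_cloud_storage_cost(2000, 50)
print(f"storage ~${storage_usd:,.0f}/month, egress ~${egress_usd:,.0f}/month")

At petabyte scale the storage term dominates, which is exactly the "a couple of months of opex buys a Lustre filesystem" comparison made above.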
> INKozin via Beowulf > July 26, 2019 at 4:23 AM > I'm very much in favour of personal or team clusters as Chris has also > mentioned. Then the contract between the user and the cloud is > explicit. The data can be uploaded/ pre staged to S3 in advance (at no > cost other than time) or copied directly as part of the cluster > creation process. It makes no sense to replicate in the cloud your > in-house infrastructure. However having a solid storage base in-house > is good. What you should look into is the cost of transfer back if you > really have to do it. The cost could be prohibitively high, eg if Bam > files need to be returned. I'm sure Tim has an opinion. > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > Joe Landman > July 26, 2019 at 12:00 AM > > > > The issue is bursting with large data sets.  You might be able to > pre-stage some portion of the data set in a public cloud, and then > burst jobs from there.  Data motion between sites is going to be the > hard problem in the mix.  Not technically hard, but hard from a > cost/time perspective. > > > Jörg Saßmannshausen > July 25, 2019 at 8:26 PM > Dear all, dear Chris, > > thanks for the detailed explanation. We are currently looking into cloud- > bursting so your email was very timely for me as I am suppose to look > into it. > > One of the issues I can see with our workload is simply getting data > into the > cloud and back out again. We are not talking about a few Gigs here, we > are > talking up to say 1 or more TB. For reference: we got 9 PB of storage > (GPFS) > of which we are currently using 7 PB and there are around 1000+ users > connected to the system. So cloud bursting would only be possible in some > cases. > Do you happen to have a feeling of how to handle the issue with the > file sizes > sensibly? > > Sorry for hijacking the thread here a bit. > > All the best from a hot London > > Jörg > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > Chris Dagdigian > July 22, 2019 at 2:14 PM > > A lot of production HPC runs on cloud systems. > > AWS is big for this via their AWS Parallelcluster stack which does > include lustre support via vfXT for lustre service although they are > careful to caveat it as staging/scratch space not suitable for > persistant storage.  AWS has some cool node types now with 25gig, > 50gig and 100-gigabit network support. > > Microsoft Azure is doing amazing things now that they have the > cyclecomputing folks on board, integrated and able to call shots > within the product space. They actually offer bare metal HPC and > infiniband SKUs now and have some interesting parallel filesystem > offerings as well. > > Can't comment on google as I've not touched or used it professionally > but AWS and Azure for sure are real players now to consider if you > have an HPC requirement. > > > That said, however, a sober cost accounting still shows on-prem or > "owned' HPC is best from a financial perspective if your workload is > 24x7x365 constant.  
The cloud based HPC is best for capability,  > bursty workloads, temporary workloads, auto-scaling, computing against > cloud-resident data sets or the neat new model where instead of > on-prem multi-user shared HPC you go out and decide to deliver > individual bespoke HPC clusters to each user or team on the cloud. > > The big paradigm shift for cloud HPC is that it does not make a lot of > sense to make a monolithic stack shared by multiple competing users > and groups. The automated provisioning and elasticity of the cloud > make it more sensible to build many clusters so that you can tune each > cluster specifically for the cluster or workload and then blow it up > when the work is done. > > My $.02 of course! > > Chris > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hearnsj at googlemail.com Fri Jul 26 04:46:56 2019 From: hearnsj at googlemail.com (John Hearns) Date: Fri, 26 Jul 2019 12:46:56 +0100 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: <99a9e702-87d2-65a1-04f2-c36cac6580b0@sonsorol.org> <6519110.rpbpuk141d@deepblue> <59ea1477-3236-28be-6628-9bf4c7dca901@gmail.com> Message-ID: ) Terabyte scale data movement into or out of the cloud is not scary in 2019. You can move data into and out of the cloud at basically the line rate of your internet connection as long as you take a little care in selecting and tuning your firewalls and inline security devices. Pushing 1TB/day etc. into the cloud these days is no big deal and that level of volume is now normal for a ton of different markets and industries. Amazon will of course also send you a semi trailer full of hard drives to import your data... The web page says "Contact Sales for pricing" On Fri, 26 Jul 2019 at 12:26, Chris Dagdigian wrote: > > Coming back late to this thread as yesterday was a travel/transit day ... > some additional thoughts > > 1) I also avoid the word "cloud bursting" these days because it's been > tarred by marketing smog and does not mean much. The blunt truth is that > from a technical perspective having a hybrid premise/cloud HPC is very > simple. The hard part is data -- either moving volumes back and forth or > trying to maintain a consistent shared file system at WAN-scale networking > distances. > > The only successful life science hybrid HPC environments I've really seen > repeatedly are the ones that are chemistry or modeling focused because > generally the chemistry folks have very small volumes of data to move but > very large CPU requirements and occasional GPU needs. Since the data > movement requirements are small for chemistry it's pretty easy to make them > happy on-prem, on the cloud or on a hybrid design > > Not to say full on cloud bursting HPC systems don't exist at all of course > but they are rare. I was talking with a pharma yesterday that uses HTcondor > to span on-premise HPC with on demand AWS nodes. I just don't see that as > often as I see distinct HPCs. > > My observed experience in this realm is that for life science we don't do > a lot of WAN-spanning grids because we get killed by the gravitational pull > of our data. We build HPC where the data resides and we keep them > relatively simple in scope and we attempt to limit WAN scale data movement. > For most this means that having onsite HPC and cloud HPC and we simply > direct the workload to whichever HPC resource is closest to the data. 
> > So for Jörg -- based on what you have said I'd take a look at your > userbase, your application mix and how your filesystem is organized. You > may be able to set things up so that you can "burst" to the cloud for just > a special subset of your apps, user groups or data sets. That could be your > chemists or maybe you have a group of people who regularly compute heavily > against a data set or set of references that rarely change -- in that case > you may be able to replicate that part of your GPFS over to a cloud and > send just that workload remotely, thus freeing up capacity on your local > HPC for other work. > > > > > 2) Terabyte scale data movement into or out of the cloud is not scary in > 2019. You can move data into and out of the cloud at basically the line > rate of your internet connection as long as you take a little care in > selecting and tuning your firewalls and inline security devices. Pushing > 1TB/day etc. into the cloud these days is no big deal and that level of > volume is now normal for a ton of different markets and industries. It's > basically a cost and budget exercises these days and not a particularly > hard IT or technology problem. > > There are two killer problems with cloud storage even though it gets > cheaper all the time > > 2a) Cloud egress fees. You get charged real money for data traffic > leaving your cloud. In many environments these fees are so tiny as to be > unnoticeable noise in the monthly bill. But if you are regularly moving > terabyte or petabyte scale data into and out of a cloud provider then you > will notice the egress fees on your bill and they will be large enough that > you have to plan for them and optimize for cost > > 2b) The monthly recurring cost for cloud storage can be hard to bear at > petascale unless you have solidly communicated all of the benefits / > capabilities and can compare them honestly to a full transparent list of > real world costs to do the same thing onsite. The monthly s3 storage bill > once you have a few petabytes in AWS is high enough that you start to catch > yourself doing math every once in a while along the lines of "I could > build a Lustre filesystem w/ 2x capacity for just 2-months worth of our > cloud storage opex budget!" > > > > > > > INKozin via Beowulf > July 26, 2019 at 4:23 AM > I'm very much in favour of personal or team clusters as Chris has also > mentioned. Then the contract between the user and the cloud is explicit. > The data can be uploaded/ pre staged to S3 in advance (at no cost other > than time) or copied directly as part of the cluster creation process. It > makes no sense to replicate in the cloud your in-house infrastructure. > However having a solid storage base in-house is good. What you should look > into is the cost of transfer back if you really have to do it. The cost > could be prohibitively high, eg if Bam files need to be returned. I'm sure > Tim has an opinion. > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > Joe Landman > July 26, 2019 at 12:00 AM > > > > The issue is bursting with large data sets. You might be able to > pre-stage some portion of the data set in a public cloud, and then burst > jobs from there. Data motion between sites is going to be the hard problem > in the mix. Not technically hard, but hard from a cost/time perspective. 
> > > Jörg Saßmannshausen > July 25, 2019 at 8:26 PM > Dear all, dear Chris, > > thanks for the detailed explanation. We are currently looking into cloud- > bursting so your email was very timely for me as I am suppose to look into > it. > > One of the issues I can see with our workload is simply getting data into > the > cloud and back out again. We are not talking about a few Gigs here, we are > talking up to say 1 or more TB. For reference: we got 9 PB of storage > (GPFS) > of which we are currently using 7 PB and there are around 1000+ users > connected to the system. So cloud bursting would only be possible in some > cases. > Do you happen to have a feeling of how to handle the issue with the file > sizes > sensibly? > > Sorry for hijacking the thread here a bit. > > All the best from a hot London > > Jörg > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > Chris Dagdigian > July 22, 2019 at 2:14 PM > > A lot of production HPC runs on cloud systems. > > AWS is big for this via their AWS Parallelcluster stack which does include > lustre support via vfXT for lustre service although they are careful to > caveat it as staging/scratch space not suitable for persistant storage. > AWS has some cool node types now with 25gig, 50gig and 100-gigabit network > support. > > Microsoft Azure is doing amazing things now that they have the > cyclecomputing folks on board, integrated and able to call shots within the > product space. They actually offer bare metal HPC and infiniband SKUs now > and have some interesting parallel filesystem offerings as well. > > Can't comment on google as I've not touched or used it professionally but > AWS and Azure for sure are real players now to consider if you have an HPC > requirement. > > > That said, however, a sober cost accounting still shows on-prem or "owned' > HPC is best from a financial perspective if your workload is 24x7x365 > constant. The cloud based HPC is best for capability, bursty workloads, > temporary workloads, auto-scaling, computing against cloud-resident data > sets or the neat new model where instead of on-prem multi-user shared HPC > you go out and decide to deliver individual bespoke HPC clusters to each > user or team on the cloud. > > The big paradigm shift for cloud HPC is that it does not make a lot of > sense to make a monolithic stack shared by multiple competing users and > groups. The automated provisioning and elasticity of the cloud make it more > sensible to build many clusters so that you can tune each cluster > specifically for the cluster or workload and then blow it up when the work > is done. > > My $.02 of course! > > Chris > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joe.landman at gmail.com Fri Jul 26 09:01:49 2019 From: joe.landman at gmail.com (Joe Landman) Date: Fri, 26 Jul 2019 12:01:49 -0400 Subject: [Beowulf] Brief OT: Open positions Message-ID: Hi folks   A brief note, one of my colleagues has 2 open positions in the US; one in Houston TX, the other in Vicksburg MS.  
These are hardware/software maintenance on a number of mid sized supercomputers, clusters, and storage.   I have some cloud HPC needs (compute, storage, networking) in my group as well.  More standard "cloudy" things there (yes, $dayjob does cloud!).   Please ping me on my email in .sig or at $dayjob.  Email there is my first initial + last name at cray dot com.  Thanks, and back to your regularly scheduled cluster/super ... :D -- Joe Landman e: joe.landman at gmail.com t: @hpcjoe w: https://scalability.org g: https://github.com/joelandman l: https://www.linkedin.com/in/joelandman From chris at csamuel.org Sat Jul 27 18:35:30 2019 From: chris at csamuel.org (Chris Samuel) Date: Sat, 27 Jul 2019 18:35:30 -0700 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: Message-ID: <2291540.I7FaeEySmv@quad> On Friday, 26 July 2019 4:46:56 AM PDT John Hearns via Beowulf wrote: > Terabyte scale data movement into or out of the cloud is not scary in 2019. > You can move data into and out of the cloud at basically the line rate of > your internet connection as long as you take a little care in selecting and > tuning your firewalls and inline security devices. Pushing 1TB/day etc. > into the cloud these days is no big deal and that level of volume is now > normal for a ton of different markets and industries. Whilst this is true as Chris points out this does not mean that there won't be data transport costs imposed by the cloud provider (usually for egress). All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA From jaquilina at eagleeyet.net Sat Jul 27 22:07:14 2019 From: jaquilina at eagleeyet.net (Jonathan Aquilina) Date: Sun, 28 Jul 2019 05:07:14 +0000 Subject: [Beowulf] Lustre on google cloud In-Reply-To: <2291540.I7FaeEySmv@quad> References: <2291540.I7FaeEySmv@quad> Message-ID: What would be the reason for getting such large data sets back on premise? Why not leave them in the cloud for example in an S3 bucket on amazon or google data store. Regards, Jonathan -----Original Message----- From: Beowulf On Behalf Of Chris Samuel Sent: Sunday, 28 July 2019 03:36 To: beowulf at beowulf.org Subject: Re: [Beowulf] Lustre on google cloud On Friday, 26 July 2019 4:46:56 AM PDT John Hearns via Beowulf wrote: > Terabyte scale data movement into or out of the cloud is not scary in 2019. > You can move data into and out of the cloud at basically the line rate > of your internet connection as long as you take a little care in > selecting and tuning your firewalls and inline security devices. Pushing 1TB/day etc. > into the cloud these days is no big deal and that level of volume is > now normal for a ton of different markets and industries. Whilst this is true as Chris points out this does not mean that there won't be data transport costs imposed by the cloud provider (usually for egress). 
All the best,
Chris
--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

From chris at csamuel.org  Sat Jul 27 22:53:39 2019
From: chris at csamuel.org (Chris Samuel)
Date: Sat, 27 Jul 2019 22:53:39 -0700
Subject: [Beowulf] Lustre on google cloud
In-Reply-To:
References: <2291540.I7FaeEySmv@quad>
Message-ID: <8572521.7sa3Cjtc1F@quad>

On Saturday, 27 July 2019 10:07:14 PM PDT Jonathan Aquilina wrote:

> What would be the reason for getting such large data sets back on premise?
> Why not leave them in the cloud for example in an S3 bucket on amazon or
> google data store.

Provider-independent backup?

--
 Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA

From jaquilina at eagleeyet.net  Sat Jul 27 22:56:08 2019
From: jaquilina at eagleeyet.net (Jonathan Aquilina)
Date: Sun, 28 Jul 2019 05:56:08 +0000
Subject: [Beowulf] Lustre on google cloud
In-Reply-To: <8572521.7sa3Cjtc1F@quad>
References: <2291540.I7FaeEySmv@quad> <8572521.7sa3Cjtc1F@quad>
Message-ID:

Hi Chris,

You kind of lost me. Aren't you leaving the data in, let's say, an S3 bucket (or HDFS if you need Hadoop) without removing it, or are you saying you would move it back home to keep cloud costs down?

Regards,
Jonathan

-----Original Message-----
From: Beowulf On Behalf Of Chris Samuel
Sent: Sunday, 28 July 2019 07:54
To: beowulf at beowulf.org
Subject: Re: [Beowulf] Lustre on google cloud

On Saturday, 27 July 2019 10:07:14 PM PDT Jonathan Aquilina wrote:

> What would be the reason for getting such large data sets back on premise?
> Why not leave them in the cloud for example in an S3 bucket on amazon
> or google data store.

Provider-independent backup?

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

From deadline at eadline.org  Mon Jul 29 11:04:02 2019
From: deadline at eadline.org (Douglas Eadline)
Date: Mon, 29 Jul 2019 14:04:02 -0400
Subject: [Beowulf] Lustre on google cloud
In-Reply-To:
References: <2291540.I7FaeEySmv@quad>
Message-ID: <0ead2042098543daf5e29b6c2308c704.squirrel@mail.eadline.org>

> What would be the reason for getting such large data sets back on premise?
> Why not leave them in the cloud for example in an S3 bucket on amazon or
> google data store.

I think this touches on the ownership issue I have seen some
people mention (I think Addison Snell or i360). That is, you own
the data but not the infrastructure.

To use the "data lake" analogy, you start
out creating a swimming pool in the cloud. You own
the water, but it is in someone else's pool. Manageable.
At some point your little pool becomes a big lake. Moving the lake,
for any number of reasons, becomes a really big issue and possibly
unmanageable.

"For any number of reasons" can be cost, performance, access,
etc. and the issues you never imagined (a black swan, as it were).

Just like everything else, it all depends ... (and how risk averse
you are).
-- Doug > > Regards, > Jonathan > > -----Original Message----- > From: Beowulf On Behalf Of Chris Samuel > Sent: Sunday, 28 July 2019 03:36 > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Lustre on google cloud > > On Friday, 26 July 2019 4:46:56 AM PDT John Hearns via Beowulf wrote: > >> Terabyte scale data movement into or out of the cloud is not scary in >> 2019. >> You can move data into and out of the cloud at basically the line rate >> of your internet connection as long as you take a little care in >> selecting and tuning your firewalls and inline security devices. >> Pushing 1TB/day etc. >> into the cloud these days is no big deal and that level of volume is >> now normal for a ton of different markets and industries. > > Whilst this is true as Chris points out this does not mean that there > won't be data transport costs imposed by the cloud provider (usually for > egress). > > All the best, > Chris > -- > Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > -- Doug From engwalljonathanthereal at gmail.com Tue Jul 30 16:03:11 2019 From: engwalljonathanthereal at gmail.com (Jonathan Engwall) Date: Tue, 30 Jul 2019 16:03:11 -0700 Subject: [Beowulf] Lustre on google cloud In-Reply-To: <0ead2042098543daf5e29b6c2308c704.squirrel@mail.eadline.org> References: <2291540.I7FaeEySmv@quad> <0ead2042098543daf5e29b6c2308c704.squirrel@mail.eadline.org> Message-ID: AWS has a host of free tier sercives you should blend together. Elastic Beanstalk and Lambda (AWS proprietary lambda) can move lots of data below a cost level. Your volume will automatically cause billing obviously. I have a friend at AWS. Maybe something new is going on, I can check up with him. On Mon, Jul 29, 2019, 11:24 AM Douglas Eadline wrote: > > > What would be the reason for getting such large data sets back on > premise? > > Why not leave them in the cloud for example in an S3 bucket on amazon or > > google data store. > > I think this touches on the ownership issue I have seen some > people mention (I think Addison Snell or i360). That is, you own > the data but not the infrastructure. > > To use the "data lake" analogy, you start > out creating a swimming pool in the cloud. You own > the water, but it is in someone else's pool. Manageable. > At some point your little pool becomes a big lake. Moving the lake, > for any number of reasons, become a really big issue and possibly > unmanageable. > > "For any number of reasons" can be cost, performance, access, > etc. and the issues you never imagined (a black swan as it were) > > Just like everything else, it all depends ... (and how risk adverse > you are). > > -- > Doug > > > > > > > Regards, > > Jonathan > > > > -----Original Message----- > > From: Beowulf On Behalf Of Chris Samuel > > Sent: Sunday, 28 July 2019 03:36 > > To: beowulf at beowulf.org > > Subject: Re: [Beowulf] Lustre on google cloud > > > > On Friday, 26 July 2019 4:46:56 AM PDT John Hearns via Beowulf wrote: > > > >> Terabyte scale data movement into or out of the cloud is not scary in > >> 2019. 
> >> You can move data into and out of the cloud at basically the line rate > >> of your internet connection as long as you take a little care in > >> selecting and tuning your firewalls and inline security devices. > >> Pushing 1TB/day etc. > >> into the cloud these days is no big deal and that level of volume is > >> now normal for a ton of different markets and industries. > > > > Whilst this is true as Chris points out this does not mean that there > > won't be data transport costs imposed by the cloud provider (usually for > > egress). > > > > All the best, > > Chris > > -- > > Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > > To change your subscription (digest mode or unsubscribe) visit > > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > > To change your subscription (digest mode or unsubscribe) visit > > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > > > > > -- > Doug > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaquilina at eagleeyet.net Tue Jul 30 21:10:12 2019 From: jaquilina at eagleeyet.net (Jonathan Aquilina) Date: Wed, 31 Jul 2019 04:10:12 +0000 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: <2291540.I7FaeEySmv@quad> <0ead2042098543daf5e29b6c2308c704.squirrel@mail.eadline.org> Message-ID: Hi Jon, They now have Lustre through FSx or what ever AWS have called it. I am not sure you guys have heard about the capital one data breach but at times im still rather weary of the cloud. Regards, Jonathan From: Jonathan Engwall Sent: Wednesday, 31 July 2019 01:03 To: Douglas Eadline Cc: Jonathan Aquilina ; Beowulf Mailing List ; Chris Samuel Subject: Re: [Beowulf] Lustre on google cloud AWS has a host of free tier sercives you should blend together. Elastic Beanstalk and Lambda (AWS proprietary lambda) can move lots of data below a cost level. Your volume will automatically cause billing obviously. I have a friend at AWS. Maybe something new is going on, I can check up with him. On Mon, Jul 29, 2019, 11:24 AM Douglas Eadline > wrote: > What would be the reason for getting such large data sets back on premise? > Why not leave them in the cloud for example in an S3 bucket on amazon or > google data store. I think this touches on the ownership issue I have seen some people mention (I think Addison Snell or i360). That is, you own the data but not the infrastructure. To use the "data lake" analogy, you start out creating a swimming pool in the cloud. You own the water, but it is in someone else's pool. Manageable. At some point your little pool becomes a big lake. Moving the lake, for any number of reasons, become a really big issue and possibly unmanageable. "For any number of reasons" can be cost, performance, access, etc. and the issues you never imagined (a black swan as it were) Just like everything else, it all depends ... (and how risk adverse you are). 
-- Doug > > Regards, > Jonathan > > -----Original Message----- > From: Beowulf > On Behalf Of Chris Samuel > Sent: Sunday, 28 July 2019 03:36 > To: beowulf at beowulf.org > Subject: Re: [Beowulf] Lustre on google cloud > > On Friday, 26 July 2019 4:46:56 AM PDT John Hearns via Beowulf wrote: > >> Terabyte scale data movement into or out of the cloud is not scary in >> 2019. >> You can move data into and out of the cloud at basically the line rate >> of your internet connection as long as you take a little care in >> selecting and tuning your firewalls and inline security devices. >> Pushing 1TB/day etc. >> into the cloud these days is no big deal and that level of volume is >> now normal for a ton of different markets and industries. > > Whilst this is true as Chris points out this does not mean that there > won't be data transport costs imposed by the cloud provider (usually for > egress). > > All the best, > Chris > -- > Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > -- Doug _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ghenriks at gmail.com Wed Jul 31 17:45:37 2019 From: ghenriks at gmail.com (Gerald Henriksen) Date: Wed, 31 Jul 2019 20:45:37 -0400 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: <2291540.I7FaeEySmv@quad> <0ead2042098543daf5e29b6c2308c704.squirrel@mail.eadline.org> Message-ID: On Wed, 31 Jul 2019 04:10:12 +0000, you wrote: >They now have Lustre through FSx or what ever AWS have called it. I am not sure you guys have heard about the capital one data breach but at times im still rather weary of the cloud. Not sure what the Capital One data breach has to do with the cloud, it was (yet again?) misconfigured software that allowed the theft. From jaquilina at eagleeyet.net Wed Jul 31 22:05:48 2019 From: jaquilina at eagleeyet.net (Jonathan Aquilina) Date: Thu, 1 Aug 2019 05:05:48 +0000 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: <2291540.I7FaeEySmv@quad> <0ead2042098543daf5e29b6c2308c704.squirrel@mail.eadline.org> Message-ID: Hi Gerald, I think the question is how do these cloud providers let such misconfigurations get through to production systems. Arent audits carried out to ensure that this doesn’t happen? Regards, Jonathan -----Original Message----- From: Beowulf On Behalf Of Gerald Henriksen Sent: Thursday, 1 August 2019 02:46 To: Beowulf at beowulf.org Subject: Re: [Beowulf] Lustre on google cloud On Wed, 31 Jul 2019 04:10:12 +0000, you wrote: >They now have Lustre through FSx or what ever AWS have called it. I am not sure you guys have heard about the capital one data breach but at times im still rather weary of the cloud. Not sure what the Capital One data breach has to do with the cloud, it was (yet again?) 
misconfigured software that allowed the theft. _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf From hearnsj at googlemail.com Wed Jul 31 22:05:45 2019 From: hearnsj at googlemail.com (John Hearns) Date: Thu, 1 Aug 2019 06:05:45 +0100 Subject: [Beowulf] Lustre on google cloud In-Reply-To: References: <2291540.I7FaeEySmv@quad> <0ead2042098543daf5e29b6c2308c704.squirrel@mail.eadline.org> Message-ID: The RadioFreeHPC crew are listening to this thread I think! A very relevant podcast https://insidehpc.com/2019/07/podcast-is-cloud-too-expensive-for-hpc/ Re Capital One, here is an article from the Register. I think this is going off topic. https://www.theregister.co.uk/2019/07/30/capital_one_hacked/ On Thu, 1 Aug 2019 at 01:45, Gerald Henriksen wrote: > On Wed, 31 Jul 2019 04:10:12 +0000, you wrote: > > >They now have Lustre through FSx or what ever AWS have called it. I am > not sure you guys have heard about the capital one data breach but at times > im still rather weary of the cloud. > > Not sure what the Capital One data breach has to do with the cloud, it > was (yet again?) misconfigured software that allowed the theft. > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > https://beowulf.org/cgi-bin/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: