[Beowulf] HPC workflows

Bogdan Costescu bcostescu at gmail.com
Wed Nov 28 03:32:33 PST 2018


On Mon, Nov 26, 2018 at 4:27 PM John Hearns via Beowulf <beowulf at beowulf.org>
wrote:

> I have come across this question in a few locations. Being specific, I am
> a fan of the Julia language. On the Julia forum a respected developer
> recently asked what the options were for keeping code developed on a laptop
> in sync with code being deployed on an HPC system.
>

In keeping with the rest of the buzzwords, where does CI/CD fit between
"code developed" and "code being deployed"? Once you have a mechanism for
that, can't it also be used for the final deployment? Or CD could even take
care of that final deployment automatically?
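
For what it's worth, the final deployment step in such a pipeline can be
tiny. A sketch (assuming the CI runner has ssh access to the cluster login
node; the hostname and paths below are made up):

    # hypothetical CI deploy step: push the tested tree to the login
    # node, then run a smoke test there
    rsync -az --delete ./ deploy@hpc-login.example.org:apps/myproject/
    ssh deploy@hpc-login.example.org 'cd apps/myproject && ./run_tests.sh'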


> There was some discussion of having Git style repositories which can be
> synced to/from.
>

Yes, that would work fine. Why would git not be compatible with an HPC
setup? And why restrict yourself to git rather than talk about distributed
version control systems in general?
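
For instance, a bare repository on the login node works like any other git
remote (a minimal sketch; hostname and paths are made up):

    # one-time setup: create a bare repo on the cluster login node
    ssh user@hpc-login.example.org 'git init --bare ~/repos/mycode.git'

    # on the laptop: add it as a remote and push
    git remote add cluster user@hpc-login.example.org:repos/mycode.git
    git push cluster master

    # on the login node: keep a working copy for jobs to use
    git clone ~/repos/mycode.git ~/work/mycode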


> My suggestion was an ssh mount of the home directory on the HPC system,
> which I have configured effectively in the past when using remote HPC
> systems.
>

I don't quite parse the first part of the phrase - care to
reformulate/elaborate?
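
If you mean mounting the cluster $HOME on the laptop over ssh, that would
be the sshfs route - something like (hostname and paths made up):

    # mount the remote home directory locally via FUSE
    mkdir -p ~/cluster_home
    sshfs user@hpc-login.example.org:/home/user ~/cluster_home

    # ... edit files in ~/cluster_home as if they were local ...

    # unmount when done (Linux; use umount on macOS)
    fusermount -u ~/cluster_home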


> Again their workflow is to develop on the laptop and upload code to
> GitHub-type repositories. Then when running on a cloud service the
> software is downloaded from the Repo.
>

The way I read it, this is very much restricted to code that can be run
immediately after download, i.e. using a scripting language. That might fit
your HPC universe, but the parallel one I live in still mostly runs code
built and maybe even optimized on the HPC system it runs on. This includes
software delivered in binary form by ISVs, open source code (e.g.
GROMACS), and code developed in-house - what they all have in common is
using an inter-node (e.g. MPI) or intra-node (OpenMP, CUDA) communication
and/or control library directly, not through a deep software stack.
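
i.e. the typical cycle is closer to this sketch (assuming an
environment-modules setup and a Slurm batch system; module names vary per
site):

    # on the login node: load the site toolchain, build against the
    # system MPI, then submit
    module load gcc openmpi
    mpicc -O2 -o mycode mycode.c
    sbatch job.sh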


> There are of course HPC services on the cloud, with gateways to access
> them.
>
> This leads me to ask - should we be presenting HPC services as a 'cloud'
> service, no matter that it is a non-virtualised on-premise setup?
>

What's in a name? It's called cloud computing today, but it was called grid
computing 10-15 years ago...

For many years, before the cloud craze began, scientists might have had
access to HPC resources in their own institution, in other institutions in
the same city, country, or continent, or even across continents. How is
this different from having access to an on-premise install of e.g.
OpenStack, or to a cloud computing offering somewhere else also using
OpenStack? The only advantage in some cases is that the on-premise setup
might be better integrated with the "home" environment (i.e. common file
systems, common user management, or - why not? - better documentation :)),
which improves the user experience, but the functionality is very similar
or the same.

To come back to your initial topic - a git repo can just as well be synced
to the login node of a cluster (wherever that is located) as to a VM in the
AWS cloud (wherever that is located).
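
The commands are identical in both cases; only the endpoint changes
(placeholder hostname and address):

    git remote add cluster user@hpc-login.example.org:repos/mycode.git
    git remote add aws ec2-user@198.51.100.7:repos/mycode.git
    git push cluster master   # on-premise login node
    git push aws master       # VM in the AWS cloud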


> I think out loud that many HPC codes depend crucially on a $HOME directory
> being present on the compute nodes as the codes look for dot files etc. in
> $HOME. I guess this can be dealt with by fake $HOMEs which again sync back
> to the Repo.
>

I don't follow you here... $HOME, dot files, repo, syncing back? And why
"Repo" with a capital letter - is it supposed to be a name or something
special?
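
If the concern is just dot files, a job script can point $HOME at a
prepared directory - a sketch (assuming Slurm; the paths and the .myapprc
dot file are made up):

    #!/bin/bash
    #SBATCH --job-name=fake-home-demo
    # build a synthetic $HOME containing only the needed dot files;
    # whether the application honours $HOME this way is app-dependent
    REALHOME=$HOME
    export HOME=/scratch/$USER/fakehome
    mkdir -p "$HOME"
    cp "$REALHOME/.myapprc" "$HOME/"
    exec ./mycode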

In my HPC universe, people actually need not only code, but also data -
usually LOTS of data. Replicating the code (for scripting languages) or the
binaries (for compiled stuff) would be trivial; replicating the data would
not. Also, pulling the data in or pushing it out (e.g. to/from AWS) on the
fly whenever the instance is brought up would be slow and costly. And by
the way, this is in no way a new idea - queueing systems have long had the
concept of "pre" and "post" job stages, which can be used to pull code
and/or data onto the node(s) on which the job will run and to clean up
afterwards.
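
In Slurm terms, for example, the "pre" and "post" stages can live right in
the job script (a sketch; the data paths are placeholders):

    #!/bin/bash
    #SBATCH --nodes=2
    # "pre" stage: pull the input data to a job directory on scratch
    WORK=/scratch/$USER/job_$SLURM_JOB_ID
    mkdir -p "$WORK" && cd "$WORK"
    cp /project/shared/input.dat .

    # the actual run
    srun ./mycode input.dat

    # "post" stage: push the results back and clean up
    cp results.dat /project/shared/
    cd / && rm -rf "$WORK"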

Cheers,
Bogdan