[Beowulf] Containers in HPC

Dernat Rémy remy.dernat at umontpellier.fr
Sun May 26 08:32:37 PDT 2019


Hi,

First of all, thanks for sharing all of this information.

I also participated in a similar study in 2017:
https://arxiv.org/pdf/1709.10140.pdf

Eduardo is now working for Sylabs.

I have also given some talks in France about containers in HPC:

   e.g.: http://devlog.cnrs.fr/_media/jdev2017/dev2017_p8_rdernat_short.pdf


Those technologies have changed a lot since then, but the conclusions 
are almost the same, even though this more recent paper 
(https://arxiv.org/abs/1905.08415) focuses on MPI.

I have been using Singularity since version 1 and have run it on our 
HTC systems in production since March 2017, starting with version 2.2. 
It works well. I have hit some issues from time to time and opened 
issues on GitHub, for bugs or for new features. Sometimes I even 
contributed some code or documentation to them or to the container 
ecosystem. The Sylabs team is pretty active and has fixed all of those 
bugs except one (CRIU / DMTCP [*]). I am happy to have run it all this 
time, since I work mainly with a bioinformatics community of users. The 
bioinformatics software landscape is full of different languages with 
many dependencies. Before Singularity, I installed all software and 
packages statically. In the end there were more than 150 applications, 
and managing the mixture of these applications was the most complicated 
part. Limitations of Singularity include MPI jobs, where you still need 
to be very careful about the MPI versions you use (container vs. host).
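
As an illustration, the usual "hybrid" MPI workflow looks like this (a 
minimal sketch; the image and binary names are made up, and it assumes 
the Open MPI release inside the image matches the one on the host):

    # the host MPI launches the ranks; the container only provides the app
    # the Open MPI version inside mympi.sif must match the host installation
    mpirun -np 64 singularity exec mympi.sif /opt/app/bin/mpi_app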

I have also been using nvidia-docker with user namespaces for GPUs, for 
about two years now. This is a standalone service, not connected to our 
clusters, mostly used for deep learning. However, I tried Singularity 
on it a while ago (using the gpu4singularity script from NIH, before 
the "--nv" option appeared in Singularity) and it also works fine.
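
For reference, enabling the GPU is a one-liner in both stacks nowadays 
(a sketch; the image tag and script names are illustrative):

    # nvidia-docker2: the "nvidia" runtime injects the host driver
    docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi

    # Singularity: --nv binds the host NVIDIA libraries and devices
    singularity exec --nv deeplearning.sif python train.py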

IMHO, the main downside of Singularity is that Sylabs is putting out 
new releases too quickly, and most of the time new releases target 
security bugs... So, as an admin, you need to upgrade quite often, 
provided your OS is still compatible with it... Alternatives are 
Charliecloud, udocker or Shifter, but we chose Singularity because it 
was the most active project, with many contributors. Some people might 
also argue that Singularity is moving too fast toward a "cloud model", 
allowing the use of K8s, but I disagree: the HPC community has been 
there for them from the beginning, and I think (am almost sure) that 
they won't turn their back on it.

Note that some big new players could become interesting in the future, 
because large companies are developing those products (Kata Containers, 
which mixes the advantages of VMs and containers using a lightweight 
QEMU and is backed by Intel, and Podman, from Red Hat's Atomic 
project). However, for now HPC is not on their priority list, and Red 
Hat, as well as NVIDIA, has already collaborated with Singularity (in 
many ways).

IMO all HPC systems should now allow users to launch "containerized" 
jobs (while not using Docker itself, for obvious security reasons). 
Many scientific workflows are now designed to run either on 
HPC/HTC/grid or on cloud platforms. If, as the sysadmin of a big HPC 
platform, you don't allow that kind of job, the risk is that many 
people leave your platform to compute elsewhere (e.g. in the cloud, or 
on another platform that allows containerized tasks), even if you could 
argue that they won't get the best hardware performance (if users 
already have a pipeline or a single application using containers, and 
many apps are now "containerized", they won't be very excited about 
converting everything to get it working on your HPC system [**]). So 
you have to be proactive and collaborate with your users to provide the 
best, most secure platform to run those jobs.
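
To give an idea, supporting that kind of job can be as simple as 
letting the runtime be called from a batch script (a minimal sketch 
assuming Slurm and Singularity; the registry path and file names are 
made up):

    #!/bin/bash
    #SBATCH --job-name=contained-step
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # users bring their published image and run it unmodified on the cluster
    singularity pull pipeline.sif docker://example/pipeline:1.0
    singularity exec pipeline.sif run_step.sh input.dat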

We are currently developing a WebUI that generates Docker and 
Singularity recipes [***]; all contributions are welcome (note that 
some issue descriptions are still in French). This is still a WIP, and 
the generated recipes may not be secure for now.
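
To show what "recipe" means here, this is roughly the kind of 
Singularity definition file such a tool emits (a hand-written sketch, 
not actual WICOPA output; samtools is just an example package):

    Bootstrap: docker
    From: ubuntu:18.04

    %post
        # install the tool, then trim the package lists to keep the image small
        apt-get update
        apt-get install -y --no-install-recommends samtools
        rm -rf /var/lib/apt/lists/*

    %runscript
        # make the container behave like the tool itself
        exec samtools "$@"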


Best regards,

Rémy.

[*] Since version 3.2, Singularity provides a way to stop/resume jobs 
through the OCI subcommands, using the cgroup freezer.
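
A sketch of that workflow, as I understand it (the bundle path and 
container name are made up; the oci commands typically require root):

    # create an OCI bundle from a SIF image, start it, then freeze/thaw it
    singularity oci mount ./app.sif /var/tmp/app_bundle
    singularity oci run -b /var/tmp/app_bundle myjob &
    singularity oci pause myjob     # cgroup freezer stops all processes
    singularity oci resume myjob    # processes continue where they stopped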

[**] OK, many developers worked hard in the past to get their software 
working on HPC systems (using Open MPI, CUDA, OpenACC, or whatever), 
and they will surely continue to do so, but most academic/scientific 
users used not to have easy access to a cloud (and now many users have 
that kind of access...?). What would their users gain by running an 
optimized app on an HPC system rather than distributing their 
(sequential?) jobs across many clouds? That is a real question, and I 
think there is no single good answer, as it depends on the size and 
type of the problem and on the software used, but I may be wrong about 
it.

[***] https://gitlab.mbb.univ-montp2.fr/jlopez/wicopa/

    Contributors need an account on our GitLab; you can email me or
    create an issue here
    (https://kimura.univ-montp2.fr/calcul/helpdesk_NewTicket.html) to
    get one.

On 26/05/2019 at 14:17, Benjamin Redling wrote:
> Good news.
>
> I'll try it out, again.
>
> On 26 May 2019 at 13:57:05 CEST, INKozin <i.n.kozin at googlemail.com> wrote:
>> for what it's worth, Singularity worked well for me last time I tried
>> it.
>> I think it was shortly after NVIDIA had announced support for it.
>>
>> On Sun, 26 May 2019 at 11:11, Benjamin Redling
>> <benjamin.rampe at uni-jena.de>
>> wrote:
>>
>>> On 23/05/2019 16.13, Loncaric, Josip via Beowulf wrote:
>>>> "Charliecloud" is a more secure approach to containers in HPC:
>>> I tried Singularity shortly before and during 2.3 with GPUs -- didn't
>>> work, documented issue, maybe solved. Stopped caring.
>>>
>>> Shortly afterwards I read about Charliecloud and tried it -- didn't
>>> work, too many issues. Stopped caring.
>>>
>>> So, "more secure" on paper (fewer lines of code) doesn't get any work
>>> done.
>>> My advice to anyone with a working setup: try it out if time permits,
>>> but don't bother too much and definitely don't advertise it to third
>>> parties beforehand.
>>>
>>> Regards,
>>> Benjamin
>>> --
>>> FSU Jena | https://JULIELab.de/Staff/Redling/
>>> ☎  +49 3641 9 44323
>>> _______________________________________________
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>>> Computing
>>> To change your subscription (digest mode or unsubscribe) visit
>>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>>>
-- 
Dernat Rémy
Plateforme MBB - ISEM Montpellier
