[Beowulf] Docker vs KVM paper by IBM
Gavin W. Burris
bug at wharton.upenn.edu
Thu Jan 29 06:15:37 PST 2015
On 05:09PM Wed 01/28/15 -0800, Egan Ford wrote:
> On Wed, Jan 28, 2015 at 11:00 AM, Gavin W. Burris <bug at wharton.upenn.edu> wrote:
> > I guess I would have to ask a few questions of the developer considering
> > docker... WHY do you need to be outside of a self-contained directory?
>
> Given that this is mostly an HPC crowd, this answer may not be 100%
> relevant, but I'll try to answer anyway and throw in a few opinions
> along the way.
>
> I'm finding that more and more developers and open source projects
> have a heavy dependency on other services, libraries, and code. Ask
> any top Python programmer how to start a new project and it will start
> with "virtualenv"--basically a chroot type of solution for Python,
> including different Pythons and their libs/packages.
Yes, good stuff there. We have largely "standardized" on Python to
minimize the ground we have to cover, and virtualenv was the stuff.
Developers can install any-and-every module at any version per git repo
/ project. This is still a headache, though, when an application makes
the transition from dev to prod. There is no hard answer about how to
best keep things patched and updated in a production environment if devs
can go crazy with a dozen previously unseen modules. It is all in a
contained directory, but it still has the problem of patching and
updating without breaking anything. Instead of one central module to
maintain, there are virtualenvs all over the place. I would say the
goal is to centralize
those env dependencies on production. BUT, if it is forever
research/dev code, go crazy, in your own contained world. This seems to
be the promise of Docker.
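
To make the "virtualenvs all over the place" point concrete, here is a
minimal sketch. It uses the standard-library venv module as a stand-in
for virtualenv, and the repo names and the .venv directory name are
made up for illustration. It creates one environment per project and
then lists every environment found under a root directory, which is
roughly the inventory an admin would need before patching anything.

```python
from __future__ import annotations

import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create an isolated environment under <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps this sketch offline and quick; a real project would want pip.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir


def find_envs(root: Path) -> list[Path]:
    """List every environment under root by looking for the pyvenv.cfg marker file."""
    return sorted(cfg.parent for cfg in root.rglob("pyvenv.cfg"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical project names, one environment each.
        for name in ("repo_a", "repo_b"):
            (root / name).mkdir()
            create_project_env(root / name)
        for env in find_envs(root):
            print(env.relative_to(root))
```

Each environment carries its own copy of whatever the dev installed, so
every path this prints is a separate thing to keep patched.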
>
> I'm sure Ruby and node.js have something similar.
>
> If the app uses Flask or Unicorn and needs to be frontended with a
> web server, then you have the complexity of supporting many other
> components and trying to get them to play together nicely. Then
> there's the databases and trying to maintain all the different table
> spaces and security, etc...
>
> It's not an impossible problem; sys admins have been dealing with this
> for a very long time. The challenge is that the number of
> environments to deploy applications has exploded. So your admins have
> to know everything or limit what the users can develop. In my DevOps
> env. I have to deal with Ruby, Python, node.js, and Java. Each may
> require different versions.
The researcher/dev vs admin/ops dynamic is definitely at play here. My
stance is still that devs should try to target known modules, and admins
should be flexible to support additional ones with a reasonable and
generous time commitment. This should be true of all apps, not just
Python modules.
>
> VMs solve a lot of that problem, however at a greater cost. VMs
> usually have a static memory footprint (esp. in the cloud). It's
> possible to have 90% of your memory assigned to VMs, but not used by
> the applications in the VMs. Containers are just processes that use
> what they need (and can be limited). In my own experimentation I've
> been able to reduce 20 1GB VMs running 20 services into 20 containers
> on a single 8GB VM. 4GB of my RAM is still unused. Sure I could also
> spend 100s of hours getting all 20 services to play nice on a single
> OS, but one problem with one service can take down the others. I also
> have different admins assigned to different services. With containers
> I have them fenced off.
Thin provisioning goes a long way here for CPU, memory AND storage.
We've been pretty happy thin provisioning our VMs and our NFS shares.
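
For reference, the arithmetic behind the consolidation figures quoted
above can be sketched as follows. The function name and dictionary keys
are my own; the inputs are the numbers from Egan's message (20 VMs at
1GB each, one 8GB VM, 4GB still free). It only compares memory reserved
before against memory reserved and used after, and says nothing about
CPU or storage.

```python
def consolidation_summary(vm_count, vm_gb, host_gb, host_free_gb):
    """Compare memory reserved by many small VMs with one VM running containers."""
    reserved_before = vm_count * vm_gb       # memory statically assigned to the VMs
    used_after = host_gb - host_free_gb      # memory the containers actually consume
    saving_pct = 100 * (reserved_before - host_gb) / reserved_before
    return {
        "reserved_before_gb": reserved_before,
        "reserved_after_gb": host_gb,
        "used_by_containers_gb": used_after,
        "reserved_saving_pct": saving_pct,
    }


if __name__ == "__main__":
    # Figures taken from the quoted message.
    for key, value in consolidation_summary(20, 1, 8, 4).items():
        print(key, value)
```

With those inputs the reserved memory drops from 20GB to 8GB, a 60%
reduction, and the containers themselves use about 4GB.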
>
> Because I have to pay for my VMs in the cloud, using containers has
> measurably reduced my cost.
>
> Another benefit has been time. I do not have to figure out how to get
> Flask and Unicorn to play nice with Nginx. When I need a new gitlab
> instance I just create a new set of three containers linked together
> and it does not impact my other instances.
>
> Containers are not perfect, but for me they reduce my costs and
> complexity while saving me a lot of time and hassle. Every container
> or cluster of containers is a single app. Makes life really easy.
> Yes, you can do this with VMs, but for me it's too costly.
>
> Lastly for developer productivity I use Docker on my Mac. It's really
> a VirtualBox VM with Linux with Docker installed. I've got about 3-5
> containers running various services and tools that I will use later in
> upgrading my production environment. I've tried that in VMs before.
> It was slow, painful, and not as easy to automate. Docker is stupid
> simple to use with its Python APIs. You can learn it in 5 min.
>
> Anyway, to directly answer your question: containers are how I put
> complexity into a self-contained directory with no limitations.
Docker has been our go-to solution for reproducibility of dev
environments, with virtualenvs inside. Will have to give containers a
hard look in this area, too. Thanks.
>
> Oh, let me close with, developers like to bring their own stack. It's
> not uncommon. In 2003-2004 I worked on the TeraGrid. Every week all
> four of the original sites got on the phone and debated the SW stacks.
> Only if they were the same could applications run across the grid.
> That inspired me to explore stateless provisioning. In 2005 I worked
> with Adaptive Computing and we got Moab talking to xCAT so that we
> could provision any stateless OS/stack on demand on bare-metal.
> Bring-your-own-stack. We call that cloud now. There was demand for
> it then as there is now. Containers make this really easy for both
> the admin and the developer. The admin can provide some constraints
> (it's not the free-for-all with VMs and BM where you the developer
> have to provide an entire OS image), and the developers get a bit of
> structure, but the freedom to be as lazy and dumb as they want to so
> that they can get results faster. And the admin does not have to be
> bothered with setting up libs, chroot, modules, etc... And if the
> admin has to provide a base, well, Docker supports that too and you
> just put it in a registry.
I think this is where I start getting anxious, opening the doors to
support any OS with any stack. I would much rather push it the other
way. The language environment should be cross-platform and well
supported, so that production can support one OS well. My inclination
is to Keep It Simple Stupid, and not add additional layers of
complexity.
>
> If you are a goal oriented admin/developer, then containers are your
> friends. :-)
Noted. Stop making SENSE, Egan.
>
> Cheers,
>
> Egan
Cheers.
--
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania