[Beowulf] Docker vs KVM paper by IBM

Wed Jan 28 12:05:40 PST 2015

On 01/28/2015 02:16 PM, Gavin W. Burris wrote:
> Didn't mean to upset you there, Ellis.  I'm talking about every other
> discipline that isn't CSE.  I encourage researchers to NOT be their own
> IT department, so that their time is freed up to do research.  Obviously
> if your research IS the system, that is the exception.

I've only been on one side of the fence, but here is my perspective on 
the computational sciences and sysadmin relationship:

There's effectively a bell-curve of users.  With perfectly average 
sysadmins, the bell-curve looks pretty normal.  On the far left tail 
you've got your users whose programs and research operate PERFECTLY 
under the current regime...err...toolchain provided by said sysadmin.

The next quartile up, on the left side of perfect average, you have a 
good bulk of researchers who truly don't want to be sysadmins and who 
are willing to change their programs to fit into the available 
toolchain.  The cost in this case is put on the researcher to change her 
programs and spend hours all over the interwebs figuring out why such 
and such compilation failed.

Just over the line in the third quartile we have another bulk of the 
researchers who are just savvy enough to work around the toolchains of 
the sysadmins, either via homedir path and lib manipulation, chroots, or 
downright bribing/stealing root somehow and installing into public 
paths.  The cost generally manifests itself on the IT budget paying for 
sysadmins to fix this "just savvy enough to be dangerous" user's crap 
up, and on other researchers whose code now doesn't compile or run 
because the toolchain has been mucked with.

In the last, far right-most tiny quartile, we have those researchers who 
actually enjoy some amount of being sysadmins and are about as capable 
as the departmentally paid ones.  It's faster for them to just handle 
things themselves.  They WILL get around you, no matter what you do, 
they'll enjoy doing so, and they'll have the wherewithal to pull it off; 
if nobody notices, all the better.  If you resist, they'll just make 
things painful for everyone, and no amount of stick-wielding will 
dissuade them.

On the two far tails the aggregate costs are generally low.  In the 
middle costs tend to be high.  Offering multiple toolchains on a single 
machine is non-trivial, and dealing with those who force multiple 
toolchains/drivers/kernels/whatever into such a setup is expensive to 
correct.

So, the obvious answer here is to provide your "standard operating 
environments" in the form of containerized/VM/whatever images that 
quartiles 1 and 2 can use, and allow quartiles 3 and 4 to spin up their 
own.
Multiple environments mean quartile 2 can probably just try their 
program A on environments X, Y, and Z, and find one that "just works." 
This reduces their time futzing with compilers or fixing other 
researchers' crappy code that breaks on GCC > 4.x.  Quartile 3 can spin 
up their own absolutely crap environment and think they're l33t without 
screwing over their fellow researchers.  Quartiles 1 and 4 are basically 
untouched, since they were fine before and still are.

Everybody wins, probably most of all the IT department.

Best,

ellis