[Beowulf] Definition of HPC

Joe Landman landman at scalableinformatics.com
Thu Apr 18 11:55:53 PDT 2013

On 04/18/2013 02:45 PM, Adam DeConinck wrote:
> Tying in another recent discussion on the list, "root access" is
> actually one of the places I've seen some success using Cloud for HPC.
> It costs more, it's virtualized, and you usually can't get
> HPC-specialized hardware, so it's obviously not a silver bullet for
> all kinds of systems research... but on the other hand, you're free of
> sysadmin tyranny and can experiment as much as you like. And none of the
> other users will scream at you when they learn you were responsible for
> killing their jobs, *again*.
> As a tyrannical sysadmin myself ;-) , I've helped some of my users

[I am not BOFH ... I am not BOFH ... I am not BOFH ...]

> build ephemeral cloud clusters and shared configuration management
> scripts so their cloud systems "feel" a lot like the in-house system.
> A few times this has even resulted in changes we've pulled back into
> the shared cluster, as sudo-able commands or job options.
> (I once had the "pleasure" of helping to clean up a small cluster where
> over a third of the users had unlimited sudo rights. Only time I've ever
> seen users made *happy* by the introduction of a ticketing system and
> change management...)

Heh ... Nothing draws people to praying to, then cursing, their
favorite deity quite like a system going down at an inopportune moment.
Usually right before something is due/needed/...

We get called when this happens, and I swear, we should get continuing 
education in counseling credits for all the listening we've done on why 
it is so very important that this not die, that it couldn't possibly be 
their fault (usually is), and if the damn hardware could just like not 
crash when 10000 NFS mounts suddenly decide to play whack-a-mole ... or
it's the OS's fault it couldn't survive an accidental rm -rf ~/work /dev
(when they meant rm -rf ~/work/dev)...
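
For anyone who hasn't been bitten by this one: the stray space turns one
path into two separate arguments, so rm is handed both ~/work and /dev
as targets. A minimal sketch of the difference, which deletes nothing;
count_args is a hypothetical helper made up for illustration, not a
standard command:

```shell
# Hypothetical helper: prints how many arguments it received.
# It stands in for rm so that nothing is actually removed.
count_args() { echo "$#"; }

# With the stray space: two separate targets, ~/work and /dev.
count_args "$HOME/work" /dev

# What was meant: a single target, ~/work/dev.
count_args "$HOME/work/dev"
```

The first call reports 2 arguments, the second reports 1. Swapping echo
in front of a destructive command before running it for real is a cheap
way to see what the shell will actually pass along.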

Running with scissors rarely ends well.  People using silly analogies
to question why it's bad is also symptomatic of the computing culture
issues we collectively need to address.  Practice safe computing
please!  Safe practices make for happy users.

And don't get me started on "I have RAID therefore I don't need a backup".

