[Beowulf] Definition of HPC

Lux, Jim (337C)
Wed Apr 17 17:44:20 PDT 2013

I think this gets back to scalability and partitionability.

If you could run your experiment on a small cheap cluster *that isn't shared*, nobody would care whether you have root, any more than whether you have root on your desktop computer.

The difficulty arises when the problem is
a) large enough to require a big expensive cluster -and-
b) infrequent enough that you can't justify buying that cluster just for you

And now you're into "how do you share, when you want bare metal access?"

One approach is to have an air-gapped isolated cluster.  You come in, you do whatever, and when you're done it's wiped clean.  That has substantial setup and teardown costs.

On the other hand, if you want simultaneous sharing (user A gets 500 nodes, user B gets 500 nodes), I think that's fundamentally incompatible with bare metal/root access. 

IT shops are used to the question of "how do we simultaneously share a big expensive box among many users". The processes and politics have been worked out over the last 50 years, along with the gory details of chargebacks, dynamic pricing, etc. And it tends to be pretty regimented: many users, especially in a "computing" environment, implies many diverse needs, so that big piece of iron will have lots of interfaces, lots of configurable whatsits, etc. That makes it complex to administer: reconfiguring after user A leaves, but before user C starts up, and without interrupting user B, is a challenging problem.

The default answer is always going to be "no".  Saying "No" makes one person unhappy, but keeps the other N-1 people happy.

I think that, ultimately, "doing dangerous things requires dedicated facilities, and so it's expensive". Rocket engine or energetic compound development is expensive partly because you need a place to test them, not because the engineering is any more difficult than other engineering. (Look up C2N14, "An Energetic and Highly Sensitive Binary Azidotetrazole". A blog quote: "Never forget, the biggest accomplishment in such work is not blowing out the lab windows.")

Jim Lux

From: Max R. Dechantsreiter
Sent: Wednesday, April 17, 2013 2:32 PM
To: landman at scalableinformatics.com
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] Definition of HPC


> So, I am sorry ... if you *require* root to perform your work on a 
> regular basis, chances are, you are one misstep from misfortune, and 
> it's quite likely to be self-inflicted.

What about dropping page cache?  What about setting up to run in "turbo mode" (with Intel processors)?  There are a number of relatively minor functions accessible only via root (or sudo) that could be important for performance testing.
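
For readers unfamiliar with the first of these, here is a minimal sketch of what "dropping page cache" amounts to on Linux. It was not part of the original message. The procfs path and the meaning of the values 1, 2 and 3 are standard kernel behavior; the function name and its `path` parameter are illustrative assumptions, the latter added only so the sketch can be tried against an ordinary file without root.

```python
import os

# Standard Linux procfs control file; writing to it requires root.
DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_page_cache(path=DROP_CACHES, mode=1):
    """Ask the kernel to drop clean caches.

    mode 1 = page cache, 2 = dentries and inodes, 3 = both.
    `path` is a hypothetical override, useful only for dry runs.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2, or 3")
    os.sync()  # flush dirty pages first; only clean pages can be dropped
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Run unprivileged against the real path, the `open` call fails with a permission error, which is exactly the limitation being described.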

Administrators don't have to give root access to everybody, just because they give it to a few - it's not a democracy.
But I should be able to lobby for my needs, and expect my actions to be carefully scrutinized, knowing the consequences of abuse would be serious and far-ranging.  You've heard of system logs, I presume.

> But back to the running with scissors down broken staircases, in the 
> dark, with low coefficients of friction on the stable steps, and many 
> missing or unstable steps ... that is running as root.  Make sure you 
> have good, recent backups, and you test that your backups are recent, 
> and correct, before you go break something important. And if you rely 
> upon external support, make darned sure they have a clue.

Running as root always makes me very nervous, which is why I avoid it at all costs.

In fact almost all of what I need could be wrapped either in sudo commands, or in special batch queues having the desired properties.  THE PROBLEM with shared resources is that their administrators are too hidebound to negotiate such needs, in far too many cases.
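
As a rough illustration of the "wrapped in sudo commands" idea (not from the original message), a site could expose one small script through sudo instead of a root shell. Everything named here is hypothetical: the script name `perfctl`, the action names, and the choice of commands. The sketch only shows the shape of a fixed allowlist that accepts no user-supplied arguments.

```python
import subprocess
import sys

# Hypothetical allowlist: action name -> fixed argv. No entry takes
# user-supplied arguments, so callers cannot smuggle in other commands.
ALLOWED_ACTIONS = {
    "flush": ["sync"],
    "drop-caches": ["sysctl", "-w", "vm.drop_caches=3"],
}


def resolve(action):
    """Return the fixed command for an allowed action, or raise ValueError."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted: {action!r}")
    return list(ALLOWED_ACTIONS[action])


def main(argv):
    """Entry point: expects exactly one action name after the script name."""
    if len(argv) != 2:
        print("usage: perfctl <action>", file=sys.stderr)
        return 2
    try:
        cmd = resolve(argv[1])
    except ValueError as exc:
        print(exc, file=sys.stderr)
        return 1
    # Privileged actions only succeed when the script itself runs via sudo.
    return subprocess.run(cmd).returncode
```

The script's entry point would call `sys.exit(main(sys.argv))`, and the administrator would grant sudo rights for that one script only, so every privileged action is both constrained and logged.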

> Running as root?  Yeah, it's that bad.  Just say no.

Are you setting yourself up as arbiter of who should and who should not run as root?  Please - respect those of us who have the capabilities, experience, and juice to do so (when circumstances demand it).

