[Beowulf] Anybody using Redhat HPC Solution in their Beowulf

Purvis, Cameron ckpurvis at ua.edu
Thu Oct 28 08:51:25 PDT 2010

> You're asking the CS department (full of researchers wanting 
> to do novel research for their dissertation or to move them 
> towards tenure) to be sysadmins.  Being an SA is fun, once.
An IT guy here:  A challenge at my institution is that these systems are usually run by faculty or by undergrad / grad students.  Students eventually leave, and here that (often) ends with IT closing the gap on management and support.  There's a lot of good to that, from my IT perspective.  I'd like to keep system administration 'out of the way' of faculty so they can focus on research.  Centrally managed HPC helps with that;  here we just aren't able to deliver the same level of support to standalone clusters, since there's a lot of variation in software, hardware, schedulers, etc.  We end up a mile wide and an inch deep with our departmental skills.

> Yes, but that would mean more like "sharing a cluster" as 
> opposed to CS providing support and SA services.  And "sharing 
> a cluster" means that the cluster architecture has to be 
> appropriate for both people, which is challenging as has been 
> addressed here recently.  Then there's the "if you're getting 
> benefit, can you kick in part of the cash needed" which 
> gets into a whole other area of complexity.

For our shared system we're going to use a good scheduler for fair-share based on user contributions to the system, but let non-contributors use the system, just at a lower priority than the funding partners.  We're also managing different node types/builds into different node groups, though we have to limit the number of node types to keep things manageable.
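
To make the policy above concrete, here is a rough sketch of one way such a ranking could work.  It is not our scheduler's configuration or any particular product's algorithm; the User fields, the function names, and the exponential 2^(-usage/share) factor are all assumptions chosen for illustration.  Funding partners are ranked by how far their recent usage is from their funded share, and non-contributors always sort below every partner.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    contribution: float  # hypothetical: funding (or nodes) put into the system
    recent_usage: float  # hypothetical: core-hours used in the accounting window


def priority(user, total_contribution, total_usage):
    """Return a priority score; higher scores are scheduled first."""
    # Fraction of the system this user paid for (0.0 for non-contributors).
    share = user.contribution / total_contribution if total_contribution > 0 else 0.0
    # Fraction of recent usage this user is responsible for.
    used = user.recent_usage / total_usage if total_usage > 0 else 0.0
    if share == 0.0:
        # Non-contributors may still run, but always below any funding partner:
        # their scores fall in [-2, -1], while partners score in (0, 1].
        return -1.0 - used
    # Assumed fair-share factor: 1.0 when idle, decaying as usage exceeds share.
    return 2.0 ** (-used / share)


def order_queue(users):
    """Return user names in the order their jobs would be considered."""
    total_contribution = sum(u.contribution for u in users)
    total_usage = sum(u.recent_usage for u in users)
    ranked = sorted(
        users,
        key=lambda u: priority(u, total_contribution, total_usage),
        reverse=True,
    )
    return [u.name for u in ranked]
```

With made-up numbers, order_queue([User("alice", 60, 100), User("bob", 40, 10), User("carol", 0, 0)]) puts bob first (a partner well under his share), then alice (a partner over her share), then carol (a non-contributor).  A real scheduler would also weigh job age, job size, and the node group being requested.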

> The institution steps in and says, cease this wasteful squabbling, 
> henceforth all computing resources will be managed by the 
> institution: "to each according to their needs", and we'll come 
> up with a fair way to do the "from each according to their ability". 
> Just submit your computing resource request to the steering 
> committee and we'll decide what's best for the institution overall.
> Yes.. Local control of a "personal supercomputer" is a very, very nice thing.

We're doing exactly this.  Researchers have been rolling their own systems, and running them in separate labs, for years.  We are hitting power and cooling limitations in those spaces as cluster needs grow.  The centralized system (usually) lets us get higher utilization but it inherently demands resource sharing - that's more a political problem than a technical one.  But if we can't build policies to make the shared platform useful, we haven't really created much value, have we?
Waiting on a resource is annoying when you just want your job to run NOW, especially if you put money into the system.  Physically hosting clusters in the data center requires some controls (defending against physical sprawl, mainly).  It lets the owner retain control of the system and get the benefit of the data center facilities, but it limits physical access to the hardware.

Deskside clusters and single-user or departmental viz systems aren't really candidates for relocation because the user HAS to touch them.  We're hoping to focus on smaller jobs and test runs on those, with the bigger jobs in the data center.  

Of course, this isn't appealing to everyone.  Even with central IT support of HPC resources, I don't think it's realistic to centralize ALL cluster services.  I anticipate a shared system for our faculty who don't have a lot of money for HPC or who don't have the time or technical resources to administer it.  The heavy hitters who do a LOT of HPC will probably still have needs for their own clusters, especially if they have unique requirements that don't fit in a central system.

Cameron Purvis
University of Alabama Office of Information Technology 
Research Support
