[Beowulf] first cluster

Tim Cutts tjrc at sanger.ac.uk
Mon Jul 19 01:54:28 PDT 2010


On 16 Jul 2010, at 6:11 pm, Douglas Guptill wrote:

> On Fri, Jul 16, 2010 at 12:51:49PM -0400, Steve Crusan wrote:
>> We use a PAM module (pam_torque) to stop this behavior. Basically, if
>> your job isn't currently running on a node, you cannot SSH into that node.
>> 
>> 
>> http://www.rpmfind.net/linux/rpm2html/search.php?query=torque-pam
>> 
>> That way one is required to use the queuing system for jobs, so the cluster
>> isn't like the wild wild west...
> 
> Ah ha! The key.

It's a very neat idea, but unless I'm misunderstanding, it has a disadvantage: if the job fails and leaves droppings in, say, /tmp on the cluster node, the user can't log in to diagnose things or clean up after themselves.
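
To make the concern concrete, here is a minimal sketch of the access rule as I understand it from the description above. It is not pam_torque's actual code: the function name may_ssh and the running_jobs mapping are made up for illustration, and I'm assuming the batch system drops a user's entry for a node as soon as their job there ends.

```python
# Hypothetical sketch of the policy; names are illustrative only and do not
# correspond to any real pam_torque or TORQUE interface.

def may_ssh(user, node, running_jobs):
    """Return True only if `user` has a job currently running on `node`.

    running_jobs: dict mapping node name -> set of user names that have
    a running job on that node (assumed to come from the batch system).
    """
    return user in running_jobs.get(node, set())


# While the job runs, the user can log in to the node.
running_jobs = {"node01": {"alice"}}
assert may_ssh("alice", "node01", running_jobs)

# Once the job fails or exits, the entry disappears and the user is
# locked out, even if their files are still sitting in /tmp on node01.
running_jobs["node01"].discard("alice")
assert not may_ssh("alice", "node01", running_jobs)
```

If the real module behaves like the second case, cleanup after a failed job would have to go through the queue or an administrator.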

Tim
