Linux cpusets and HPC (was Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?)

Paul Jackson pj at sgi.com
Thu Aug 14 01:56:02 PDT 2008


Chris wrote:
> The main purpose we're using them for is a quick and
> easy way to catch users who don't know better doing
> things like running an OpenMP code as a single CPU job
> and overloading a node (and causing chaos for other
> users) when it discovers 8 cores.

Let me see if I understand this.  Is the following right:

  Without the cpuset constraint, such a 'bad' job could tell the
  cluster management software (PBS or Torque or ...) that it needed
  just one CPU, and so could end up on a cluster node with, say,
  eight CPUs, alongside other jobs that expect to use the other
  seven CPUs.

  But then OpenMP code in that 'bad' job could notice it had eight
  CPUs, think to itself 'wow - cool', and proceed to hog all eight
  CPUs, messing up those other jobs.

  With the cpuset constraint, that 'bad' job -will- only be able to
  use that one CPU, and if OpenMP or other code in that job can't deal
  reasonably with that circumstance, well, tough, the owner of that
  job should fix something.  But at least the other jobs that were
  hoping to use the other seven CPUs won't be bothered much by this.

Did I say that right?
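
For what it's worth, here is a minimal sketch (mine, not anything from
Torque or an OpenMP runtime) of how a job could check for itself which
of the two situations it is in.  It assumes Linux and Python 3; the
names cpu_view and safe_thread_count are made up for illustration.
The point is only that the number of CPUs on the node and the number
of CPUs this process is allowed to run on are two different questions,
and code that sizes its thread pool from the first answer is the kind
that causes the trouble described above.

```python
import os


def cpu_view():
    """Return (cpus_on_node, cpus_this_process_may_use)."""
    on_node = os.cpu_count() or 1  # counts every online CPU on the node
    if hasattr(os, "sched_getaffinity"):
        # Linux: the affinity mask is limited by the enclosing cpuset.
        allowed = len(os.sched_getaffinity(0))
    else:
        # Other platforms: no affinity query here, assume the whole node.
        allowed = on_node
    return on_node, allowed


def safe_thread_count(requested=None):
    """Clamp a requested thread count to the CPUs we are allowed to use."""
    _, allowed = cpu_view()
    if requested is None:
        return allowed
    return max(1, min(requested, allowed))


if __name__ == "__main__":
    on_node, allowed = cpu_view()
    print(f"CPUs on node: {on_node}, CPUs allowed to this job: {allowed}")
    print(f"threads to start: {safe_thread_count()}")
```

Inside a one-CPU cpuset on an eight-CPU node, I would expect the first
number to stay at eight and the second to drop to one, so a job that
sizes itself from the second number stays out of its neighbors' way.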


> http://www.supercluster.org/pipermail/torquedev/2007-November/000748.html
> http://www.supercluster.org/pipermail/torquedev/2008-January/000842.html
> http://www.clusterresources.com/wiki/doku.php?id=torque:3.5_linux_cpuset_support

Thanks for the links!

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj at sgi.com> 1.940.382.4214


