Linux cpusets and HPC (was Re: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem?)
Paul Jackson
pj at sgi.com
Thu Aug 14 01:56:02 PDT 2008
Chris wrote:
> The main purpose we're using them for is a quick and
> easy way to catch users who don't know better doing
> things like running an OpenMP code as a single CPU job
> and overloading a node (and causing chaos for other
> users) when it discovers 8 cores.
Let me see if I understand this. Is the following right:
Without the cpuset constraint, such a 'bad' job could tell the
cluster management software (PBS or Torque or ...) it needed just
one CPU, which could end up putting it on a cluster node with,
say, eight CPUs, along with some other jobs that expect to use the
other seven CPUs.
But then OpenMP code in that 'bad' job could notice it had eight
CPUs, think to itself 'wow - cool', and proceed to hog all eight
CPUs, messing up those other jobs.
With the cpuset constraint, that 'bad' job -will- only be able to
use that one CPU, and if OpenMP or other code in that job can't deal
reasonably with that circumstance, well, tough, the owner of that
job should fix something. But at least the other jobs that were
hoping to use the other seven CPUs won't be bothered much by this.
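To make the distinction concrete, here is a minimal sketch (mine, not anything from Torque or an OpenMP runtime) of the two numbers a job can look at on Linux: how many CPUs the node has online, and how many the calling process is actually allowed to run on. The helper names cpu_view and safe_thread_count are hypothetical, and I'm assuming the cpuset restriction shows up in the affinity mask that os.sched_getaffinity() reports.

```python
import os


def cpu_view():
    """Return (cpus_online, cpus_allowed) for the calling process."""
    # What a naive job sees: every CPU the node has online.
    online = os.cpu_count() or 1
    # What the process may actually run on. sched_getaffinity is
    # Linux-specific, so fall back to the online count elsewhere.
    if hasattr(os, "sched_getaffinity"):
        allowed = len(os.sched_getaffinity(0))
    else:
        allowed = online
    return online, allowed


def safe_thread_count(requested=None):
    """Clamp a requested thread count to the CPUs we are allowed to use."""
    _, allowed = cpu_view()
    if requested is None:
        return allowed
    return max(1, min(requested, allowed))


if __name__ == "__main__":
    online, allowed = cpu_view()
    print("CPUs online on this node:  ", online)
    print("CPUs this job may run on:  ", allowed)
    print("threads a polite job starts:", safe_thread_count())
```

A job that sizes its thread pool from the first number is the 'bad' job above; one that uses the second stays within whatever it was given.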
Did I say that right?
> http://www.supercluster.org/pipermail/torquedev/2007-November/000748.html
> http://www.supercluster.org/pipermail/torquedev/2008-January/000842.html
> http://www.clusterresources.com/wiki/doku.php?id=torque:3.5_linux_cpuset_support
Thanks for the links!
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj at sgi.com> 1.940.382.4214