[Beowulf] Again about NUMA (numactl and taskset)

Mark Hahn hahn at mcmaster.ca
Mon Jun 23 08:25:28 PDT 2008


> The questions are
> 1) Is there some way to distribute analogously the local memory of threads (I 
> assume that it have the same size for each thread) using "reasonable" NUMA 
> allocation ?

that is, not surprisingly, the default.  generally, on all NUMA machines,
the starting rule is that memory is allocated for a thread upon "first
touch".  that is, a page is placed on the node of the first thread to 
touch it: the touch causes a page fault, which triggers the actual 
allocation.  (if you allocate memory but never touch it, it remains 
purely virtual, ignoring any book-keeping by your memory allocation 
library, if any.)
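
The "purely virtual until touched" part can be observed from user space. The following is a minimal sketch, assuming Linux with /proc mounted; the helper names (resident_pages, touch_demo) are made up for illustration. It shows only that physical pages appear at first touch, not which NUMA node they land on.

```python
import mmap
import os

PAGE = os.sysconf("SC_PAGE_SIZE")

def resident_pages():
    """Resident set size of this process, in pages (Linux /proc only)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])

def touch_demo(n_pages=8192):
    """Map n_pages anonymously, then touch each page; report RSS at each step."""
    before = resident_pages()
    buf = mmap.mmap(-1, n_pages * PAGE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    mapped = resident_pages()          # still virtual: nothing faulted in yet
    for off in range(0, n_pages * PAGE, PAGE):
        buf[off:off + 1] = b"\x01"     # first touch: page fault, page allocated
    touched = resident_pages()
    buf.close()
    return before, mapped, touched

if __name__ == "__main__":
    before, mapped, touched = touch_demo()
    print("after mmap :", mapped - before, "new resident pages")
    print("after touch:", touched - mapped, "new resident pages")
```

In a threaded program the same rule implies that each thread should initialize its own data, so that the pages end up on the node where that thread runs.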

> 2) Is it right that using of numactl for applications may gives improvements 
> of performance for the following case:
> the number of application processes is equal to the number of cores of one 
> CPU *AND* the necessary (for application) RAM amount may be placed on one 
> node DIMMs (I assume that RAM is allocated "continously").

you certainly don't want to _deliberately_ create imbalances.
"numactl --hardware" is a useful way to see the state of memory allocation
on each node.  of course, it reports only size and free (where free means 
memory the kernel is not using at all, i.e. "wasted" from its point of 
view, which is not the same as "freeable".)
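
The same per-node size and free figures are also exposed in sysfs. Here is a rough sketch that reads them, assuming the usual Linux layout /sys/devices/system/node/node*/meminfo with lines of the form "Node 0 MemTotal: ... kB"; the function names are hypothetical.

```python
import glob
import re

# Matches lines such as "Node 0 MemTotal:       16384000 kB"
LINE = re.compile(r"Node\s+(\d+)\s+(\w+):\s+(\d+)\s*kB")

def parse_node_meminfo(text):
    """Return {node: {field: kB}} for the MemTotal and MemFree lines in text."""
    nodes = {}
    for m in LINE.finditer(text):
        node, field, kb = int(m.group(1)), m.group(2), int(m.group(3))
        if field in ("MemTotal", "MemFree"):
            nodes.setdefault(node, {})[field] = kb
    return nodes

def read_all_nodes(pattern="/sys/devices/system/node/node*/meminfo"):
    """Collect size/free for every node visible in sysfs (empty if none)."""
    merged = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            merged.update(parse_node_meminfo(f.read()))
    return merged

if __name__ == "__main__":
    for node, info in sorted(read_all_nodes().items()):
        print(f"node {node}: size {info.get('MemTotal')} kB, "
              f"free {info.get('MemFree')} kB")
```

A node with very little free memory is where non-local allocations are likely to start, whatever the thread placement.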



> What will be w/performance (at numactl using) for the case if RAM size 
> required is higher than RAM available per one node, and therefore the program 
> will not use the possibility of (load balanced) simultaneous using of memory 
> controllers on both CPUs ?

non-local memory is modestly slower than local - not dramatically.

> (I also assume also that RAM is allocated 
> continously).

I'm not sure what that means - continuously in time?  or contiguously?
the latter is definitely not true - the allocated memory map for a task
will normally be pretty chopped up, and the virtual addresses will have 
little relation to physical addresses.

> 3) Is there some reason to use things like
> mpirun -np N /usr/bin/numactl <numactl_parameters>  my_application   ?

not that I know.

> 4) If I use malloc()  and don't use numactl, how to understand - from which 
> node Linux will begin the real memory allocation ? (I remember that I assume

if there is free memory on the node where the thread is running, 
that's where the physical page will be allocated.

> that all the RAM is free) And how to understand  where are placed the DIMMs 
> which will corresponds to higher RAM addresses or lower RAM addresses ?

I don't see why userspace would need to know that.  the main question is 
whether non-local allocations are allowed or not, and you set that policy
with numactl: --localalloc prefers the node the thread is running on, 
--membind restricts allocation to the listed nodes, and --preferred names 
a node to try first.

> 5) In which cases is it reasonable to switch on "Node memory interleaving" 
> (in BIOS) for the application which uses more memory than is presented on the 
> node ?

I leave it off, since numactl --interleave=all lets you get the same 
effect from user-space, per application.  I'm not sure I've ever seen 
it be a win.

> And BTW: if I use taskset -c CPU1,CPU2, ... <program_file>
> and the program_file creates some new processes, will all this processes run 
> only on the same CPUs defined in taskset command ?

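yes.  the CPU affinity mask set by taskset is inherited by children 
created with fork or clone, and it is preserved across exec, so every 
process the program starts stays on the same CPUs unless one of them 
changes its own affinity.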
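
This is easy to check on a Linux box. The sketch below pins the current process to one CPU, starts a child interpreter (fork plus exec), and reads back the child's affinity mask. It assumes Linux, where os.sched_setaffinity is available; the function name is made up for illustration.

```python
import os
import subprocess
import sys

def child_affinity_after_pinning():
    """Pin this process to one allowed CPU, spawn a child, return both masks."""
    original = os.sched_getaffinity(0)
    pinned = {min(original)}               # pick one CPU we are allowed to use
    os.sched_setaffinity(0, pinned)
    try:
        out = subprocess.run(
            [sys.executable, "-c",
             "import os; print(*sorted(os.sched_getaffinity(0)))"],
            capture_output=True, text=True, check=True,
        ).stdout
    finally:
        os.sched_setaffinity(0, original)  # restore the parent's original mask
    return pinned, {int(x) for x in out.split()}

if __name__ == "__main__":
    pinned, child = child_affinity_after_pinning()
    print("parent pinned to:", sorted(pinned))
    print("child sees      :", sorted(child))
```

The child reports the same single-CPU mask as the parent, which is the behaviour taskset relies on.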


