[Beowulf] DMA Memory Mapping Question

Chris Samuel csamuel at vpac.org
Wed Feb 21 16:45:53 PST 2007


Hi folks,

We've got an IBM Power5 cluster running SLES9 and using the GM drivers.

We occasionally get users who manage to use up all the DMA memory that is 
addressable by the Myrinet card through the Power5 hypervisor.

Through various firmware and driver tweaks (thanks to both IBM and Myrinet) 
we've gotten that limit up to almost 1GB and then we use an undocumented 
environment variable (GMPI_MAX_LOCKED_MBYTE) to say only use 248MB of that 
per process (as we've got 4 cores in each box), which we enforce through 
Torque.
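
As a quick check on the arithmetic behind that cap, here is a minimal sketch. It assumes one MPI process per core; the exact hypervisor limit is not stated above, only that it is almost 1GB, so the sketch just totals the per-process caps.

```python
PER_PROCESS_MB = 248  # value given to GMPI_MAX_LOCKED_MBYTE
CORES_PER_NODE = 4    # assumption: one process per core on each box

# Total DMA memory the node's processes may lock if all hit their cap.
total_mb = PER_PROCESS_MB * CORES_PER_NODE
print(total_mb)
```

With these numbers a full node stays at 992MB, which fits under a limit of almost 1GB only if that limit is at least that large.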

The problems went away.  Or at least they did until just now. :-(

The characteristic error we get is:

[13]: alloc_failed, not enough memory (Fatal Error)
        Context: <(gmpi_init) gmpi_dma_alloc: dma_recv buffers>

Now Myrinet can handle running out of DMA memory once a process is running, 
but when it starts it must be able to allocate a (fairly trivial) amount of 
DMA memory otherwise you get that fatal error.

Looking at the node I can confirm that there are only 3 user processes 
running, so what I am after is a way of determining how much of that DMA 
memory a process has allocated.

I looked at /proc/${PID}/maps and saw this:

40028000-40029000 r--s 00002000 00:0c 8483 /dev/gm0

which looks to me like a memory mapping, but one that covers just 0x1000 
bytes (4KB)..
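
For reference, the size of such a mapping is the end address minus the start address, both in hex. Here is a minimal sketch that totals the /dev/gm0 mappings in maps-style text. The helper name `mapping_sizes` is hypothetical, and it assumes the standard Linux maps field layout. It only measures what is mapped from the device node, which may not be the same as the DMA memory the GM driver has registered for the process.

```python
def mapping_sizes(maps_text, device="/dev/gm0"):
    """Return the size in bytes of each mapping backed by `device`."""
    sizes = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, [path]
        fields = line.split()
        if len(fields) < 6 or fields[5] != device:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        sizes.append(end - start)
    return sizes


# The line seen on the node, joined back onto a single line.
sample = "40028000-40029000 r--s 00002000 00:0c 8483 /dev/gm0"
sizes = mapping_sizes(sample)
print(sizes, sum(sizes))
```

On a live node the same function could be fed the contents of /proc/${PID}/maps for each of the user processes.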

Does anyone have any ideas at all?

Oh - switching to the Myrinet MX drivers (which don't have this problem) is 
not an option. We have an awful lot of users, mostly (non-computer) 
scientists, who have their own codes, and persuading them to recompile 
would be very hard. Recompiling would be necessary because we've not been 
able to convince MPICH-GM to build shared libraries on Linux on Power with 
the IBM compilers. :-(

cheers,
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
