[Beowulf] /lib/tls/libc.so.6 or libc-2.3.5.so vmadump errors
scunacc at yahoo.com
Mon Feb 20 08:31:34 PST 2006
I am trying to set up Clustermatic 5 (CM5) on a custom-built 2.6.9
kernel backported to a stock Mandriva 2006 build (with all of the latest
patches applied as of Saturday).
The headnode kernel and the CM5 host utils boot without problems.
However, the slaves *intermittently* fail to copy the libs over
properly, and I get:
vmadump: mmap failed: /lib/tls/libc.so.6
vmadump: mmap failed: /lib/tls/libc-2.3.5.so
Now, the first is a symlink to the other.
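To double-check the symlink relationship, something like the following
minimal Python sketch could be used. The helper name `same_target` is
hypothetical; the two paths are the ones from the error messages above.

```python
import os


def same_target(path_a, path_b):
    """Return True if both paths resolve to the same real file."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


if __name__ == "__main__":
    # Paths taken from the vmadump error messages.
    link = "/lib/tls/libc.so.6"
    target = "/lib/tls/libc-2.3.5.so"
    print(link, "->", os.path.realpath(link))
    print("same file:", same_target(link, target))
```

If the two paths do resolve to the same file, the two errors are most
likely one failure reported twice rather than two separate problems.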
Also, strace on a simple binary (e.g. mkdir) shows that it is indeed
trying to load *that* version of the C lib first.
I've messed around with taking that out of the path and linking
libc.so.6 to various other libc*so*'s in /usr/lib or /lib, with the same
results. It will sometimes boot, sometimes not.
This looks like a random library ordering issue, or perhaps a timing
issue where something that is being called in the C lib is causing
vmadump to burp.
When it does happen, though, it happens in the node_up stage.
*Sometimes* the nodes will boot OK.
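Since the failure is intermittent, one way to tell whether it is always
the same library that fails would be to tally the "vmadump: mmap
failed:" lines over several boot attempts. A minimal Python sketch,
assuming the messages have been saved as text in exactly the format
shown above; `count_mmap_failures` is a hypothetical helper, not part of
CM5:

```python
import re
from collections import Counter

# Matches lines of the form "vmadump: mmap failed: <path>".
FAIL_RE = re.compile(r"^vmadump: mmap failed: (\S+)$")


def count_mmap_failures(lines):
    """Count vmadump mmap failures per library path."""
    counts = Counter()
    for line in lines:
        match = FAIL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "vmadump: mmap failed: /lib/tls/libc.so.6",
        "vmadump: mmap failed: /lib/tls/libc-2.3.5.so",
    ]
    for path, n in count_mmap_failures(sample).items():
        print(n, path)
```

If other libraries show up in the tally too, that would point more
towards timing than towards this particular C lib.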
Note: I have a happily running CM5 setup on several other machines with
FC4 as the core OS and basically the same custom CM5 kernel on top -
it's something funky with the M2006 C libraries AFAICS. Threading
perhaps? Not sure.
I have other reasons for going with M2006.
I didn't fancy backporting the basic bproc code to a 2.6.12* or 2.6.15
kernel, so I simply used the 2.6.9 kernel from CM5, custom rebuilt the
same way as on the FC4 clusters.
Do let me know if you have any ideas.