[Beowulf] Python libraries slow to load across Scyld cluster

Skylar Thompson skylar.thompson at gmail.com
Sat Jan 17 10:59:05 PST 2015


On 01/16/2015 04:38 PM, Don Kirkby wrote:
> Thanks for the suggestions, everyone. I've used them to find more information, but I haven't found a solution yet. 
> 
> It looks like the time is spent opening the Python libraries, but my attempts to change the Beowulf configuration files have not made it run any faster. 
> 
> Skylar asked: 
> 
>> Do any of your search paths (PATH, PYTHONPATH, LD_LIBRARY_PATH, etc.) 
>> include a remote filesystem (i.e. NFS)? This sounds a lot like you're 
>> blocked on metadata lookups on NFS. Using "strace -c" will give you a 
>> histogram of system calls by count and latency, which can be helpful in 
>> tracking down the problem. 
> 
> Yes, the compute nodes mount from a network file system to a local RAM disk. When I look at mounted file systems, I can see that the Python libraries are on a network mount. The Python libraries are at /usr/local/lib/python2.7. 
> 
> $ bpsh 5 df 
> Filesystem 1K-blocks Used Available Use% Mounted on 
> [...others deleted...] 
> 192.168.1.1:/usr/local/lib
>                      926067424 797367296  80899808  91% /usr/local/lib
> 
> 
> I used strace as suggested and found that most of the time is spent in open(). 
> 
> $ bpsh 5 strace -c python2.7 cached_imports_decimal.py 
> started at 2015-01-16 14:29:45.543066 
> imported decimal at 0:00:21.719083 
> % time seconds usecs/call calls errors syscall 
> ------ ----------- ----------- --------- --------- ---------------- 
> 97.95 0.040600 44 932 822 open 
> [...others deleted...] 
> 
> 
> I also looked at the timing of the individual system calls to see which files were slow to open: 
> 
> bpsh 5 strace -r -o strace.txt python2.7 cached_imports_decimal.py 
> more strace.txt 
> [...] 
> 0.000063 open("/usr/local/lib/python2.7/lib-dynload/usercustomize.so", O_RDONLY) = -1 ENOENT (No such file or directory) 
> 0.000701 open("/usr/local/lib/python2.7/lib-dynload/usercustomizemodule.so", O_RDONLY) = -1 ENOENT (No such file or directory) 
> 0.127012 open("/usr/local/lib/python2.7/lib-dynload/usercustomize.py", O_RDONLY) = -1 ENOENT (No such file or directory) 
> 0.126985 open("/usr/local/lib/python2.7/lib-dynload/usercustomize.pyc", O_RDONLY) = -1 ENOENT (No such file or directory) 
> 0.127037 stat("/usr/local/lib/python2.7/site-packages/usercustomize", 0x7fff28a973f0) = -1 ENOENT (No such file or directory) 
> 0.000086 open("/usr/local/lib/python2.7/site-packages/usercustomize.so", O_RDONLY) = -1 ENOENT (No such file or directory) 
> 0.126963 open("/usr/local/lib/python2.7/site-packages/usercustomizemodule.so", O_RDONLY) = -1 ENOENT (No such file or directory) 
> [...] 
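
One caveat when reading that trace: per the strace man page, -r stamps
each call with the time since the previous call began, so a 0.127 s
stamp points at the call on the line above it. A minimal sketch for
totalling the slow calls per directory, assuming single-process output
like the above with the "> " quoting removed. The function name and the
0.05 s threshold are hypothetical choices, not anything from strace or
Scyld:

```python
import os
import re
from collections import defaultdict

# Matches a line such as:
#   0.127012 open("/some/path", O_RDONLY) = -1 ENOENT (...)
# The quoted first argument is optional so non-path calls still parse.
LINE_RE = re.compile(r'^\s*(\d+\.\d+)\s+(\w+)\((?:"([^"]*)")?')

def slow_directories(lines, threshold=0.05):
    """Total the time of slow path-based calls, grouped by directory."""
    parsed = []
    for line in lines:
        m = LINE_RE.match(line)
        # Keep a None placeholder for unparsable lines (such as "[...]")
        # so that calls on either side of a gap are never paired up.
        parsed.append((float(m.group(1)), m.group(3)) if m else None)

    totals = defaultdict(float)
    for current, following in zip(parsed, parsed[1:]):
        if current is None or following is None:
            continue
        _, path = current
        delta, _ = following  # gap until the next call began
        if path is not None and delta >= threshold:
            totals[os.path.dirname(path)] += delta
    return dict(totals)
```

Calling slow_directories(open("strace.txt")) returns a dict mapping each
directory to its accumulated seconds. The figure includes any time the
process spent in user space between the two calls, so treat it as an
upper bound on the time spent in the call itself.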

Do you have attribute caching (ac) set up for the NFS mount? Assuming
this is a mostly read-only NFS mount point, you might also consider
disabling close-to-open cache coherence (nocto), which can
significantly improve NFS performance at the expense of breaking POSIX
compliance.

There's a good discussion of the implications in the nfs(5) man page in
the "DATA AND METADATA COHERENCE" section.
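
To see which options the mount currently has, something like the
following could be run on a compute node. It is only a sketch: it
assumes the node exposes /proc/mounts in the usual
"device mountpoint fstype options ..." layout, and the function names
are hypothetical. Options that are in effect by default (such as ac)
may not be listed at all, so the summary only reports whether noac or
nocto appear explicitly.

```python
def mount_options(mounts_text, mount_point):
    """Return the options of mount_point as a dict, or None if not mounted."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            opts = {}
            for item in fields[3].split(","):
                key, _, value = item.partition("=")
                # Flags such as "ro" carry no value; store True for them.
                opts[key] = value or True
            found = opts  # a later entry shadows an earlier one
    return found

def cache_summary(opts):
    """Report whether the two caching behaviours are explicitly disabled."""
    return {
        "attribute_caching": "noac" not in opts,
        "close_to_open": "nocto" not in opts,
    }
```

For example, mount_options(open("/proc/mounts").read(), "/usr/local/lib")
gives the options for the mount shown in the df output, and passing the
result to cache_summary() shows whether noac or nocto is set.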

Skylar

