[Beowulf] Python libraries slow to load across Scyld cluster

Skylar Thompson skylar.thompson at gmail.com
Thu Jan 15 17:34:10 PST 2015


Do any of your search paths (PATH, PYTHONPATH, LD_LIBRARY_PATH, etc.)
include a remote filesystem (e.g. NFS)? This sounds a lot like you're
blocked on metadata lookups over NFS. Running the script under "strace -c"
will give you a summary of system calls by count and total time, which can
be helpful in tracking down the problem.
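
For example, something like "strace -c -f -o syscalls.txt python
your_script.py" (the script name is just a placeholder) writes a
per-syscall summary of counts, total time, and error counts to
syscalls.txt. Python's import machinery typically stat()s and open()s
many candidate paths for each import, so a long list of those calls
hitting a remote mount is a good sign the time is going to path lookups
rather than data transfer.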

Skylar

On 01/15/2015 03:22 PM, Don Kirkby wrote:
> I'm new to the list, so please let me know if I'm asking in the wrong place.
> 
> We're running a Scyld Beowulf cluster on CentOS 5.9, and I'm trying to
> run some Django admin commands on a compute node. The problem is that it
> can take three to five minutes to launch ten MPI processes across the
> four compute nodes and the head node. (We're using OpenMPI.)
> 
> I traced the delay to smaller and smaller parts of the code until I
> created two example scripts that just import a Python library and print
> out timing information. Here's the first script that imports the
> collections module:
> 
> from datetime import datetime
> t0 = datetime.now()
> print 'started at {}'.format(t0)
> 
> import collections
> print 'imported at {}'.format(datetime.now() - t0)
> 
> 
> 
> When I run that with "mpirun -host n0 python cached_imports_collections.py",
> the import initially takes about 10 seconds. Repeated runs take less and
> less time, though, until the import takes less than 0.01 seconds.
> 
> Running an equivalent script that imports the decimal module (shown
> below) takes about 30 seconds, and it never speeds up like that. Maybe
> the module is too large to get cached completely. I ran beostatus while
> the script was running, and I
> didn't see the network traffic go over 100 kBps. For comparison, running
> wc on a 100MB file on a compute node causes the network traffic to go
> over 3000 kBps.
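> 
> For reference, the decimal script is the same timing script with only
> the import swapped, roughly:
> 
> from datetime import datetime
> t0 = datetime.now()
> print 'started at {}'.format(t0)
> 
> import decimal  # the only change from the collections version
> print 'imported at {}'.format(datetime.now() - t0)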
> 
> I looked in the Scyld admin guide (PDF) and the reference guide (PDF),
> and found the bplib command, which manages which libraries are cached
> rather than transmitted with the job processes. However, its list of
> library directories already includes the Python installation.
> 
> Is there some way to increase the size of the bplib cache, or am I doing
> something inefficient in the way I launch my Python processes?
> 
> Thanks,
> Don Kirkby
> British Columbia Centre for Excellence in HIV/AIDS
> cfenet.ubc.ca
> 
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 


