data transfer and condor

Thomas R Boehme mail at thomas-boehme.de
Fri Aug 3 08:51:42 PDT 2001


Hi,

To suggest a reasonable solution, we would need a little more detail.

For example:

How many nodes do you have?
How large are the files read by each job?
How long does each job take? 
Are the jobs mainly I/O-bound, or are they also computationally
expensive?
What network interconnects are you using on the nodes / NFS server?
Do you have a budget for hardware improvements?

In general, I don't think scheduling the data transfers really helps, as
that would basically mean that each job waits for the other jobs to finish
their I/O. It doesn't give you any more throughput.
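
To make that argument concrete, here is a rough sketch of a toy model (not
a measurement). It assumes the file server link is the only bottleneck and
is shared equally among concurrent readers; the node count and the
12.5 MB/s server rate are illustrative assumptions, and the 100 MB per node
comes from the original question.

```python
# Toy model: N nodes each fetch size_mb from one server whose link
# delivers server_mb_per_s in total. All numbers below are assumptions.

def makespan_concurrent(nodes, size_mb, server_mb_per_s):
    """All nodes read at once; the server link is shared equally."""
    per_node_rate = server_mb_per_s / nodes
    return size_mb / per_node_rate

def makespan_serialized(nodes, size_mb, server_mb_per_s):
    """Nodes read one after another, each at the full server rate."""
    return nodes * (size_mb / server_mb_per_s)

if __name__ == "__main__":
    nodes, size_mb, rate = 16, 100.0, 12.5  # hypothetical cluster
    print("concurrent:", makespan_concurrent(nodes, size_mb, rate), "s")
    print("serialized:", makespan_serialized(nodes, size_mb, rate), "s")
```

In this model the last node finishes at the same time either way, so the
aggregate throughput is unchanged; only the order in which nodes get their
data differs.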

The solution is probably to provide enough bandwidth to cope with the
traffic in a reasonable fashion. I would suggest looking into PVFS and
distributing the data across the nodes. 
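
As a rough illustration of why spreading the data over several servers can
help, the sketch below uses an idealized model: every I/O server and every
client has one network link of the same speed, data is striped evenly, and
nothing else limits throughput. This is not how PVFS is configured or
benchmarked; the function names and all numbers are assumptions for
illustration only.

```python
# Idealized striping model. Real systems add protocol overhead, disk
# limits, and contention when a node is both a client and an I/O server.

def per_client_rate(clients, io_servers, link_mb_per_s):
    """Rate each client sees when the aggregate server bandwidth is
    shared equally, capped by the client's own link."""
    aggregate = io_servers * link_mb_per_s
    return min(link_mb_per_s, aggregate / clients)

def makespan(clients, io_servers, size_mb, link_mb_per_s):
    """Time until every client has read its size_mb of data."""
    return size_mb / per_client_rate(clients, io_servers, link_mb_per_s)

if __name__ == "__main__":
    for servers in (1, 4, 16):  # hypothetical server counts
        t = makespan(clients=16, io_servers=servers,
                     size_mb=100.0, link_mb_per_s=12.5)
        print(servers, "I/O server(s):", t, "s")
```

Under these assumptions, adding I/O servers shortens the transfer until
each client's own link becomes the limit.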

The other solution would be to upgrade the file server to provide as much
throughput as possible (I don't know what you have now, so I can't really
suggest anything specific).

Do you use USE_NFS = True for Condor? I would test both True and False to
see which gives you the better throughput. I think Condor's internal file
transfer might be faster, but I can't tell, as we only have 100 Mbit
networks and both NFS and Condor achieve almost the maximum possible
throughput.
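
When comparing the two settings, it may help to know the ceiling the link
itself imposes. The sketch below converts a link speed to MB/s and
estimates the best-case time to move one 100 MB data segment (the size
mentioned in the original question) over a 100 Mbit link. The `efficiency`
parameter is a hypothetical fudge factor for protocol overhead, not a
measured value, and decimal units (1 MB = 8 Mbit) are assumed.

```python
# Best-case transfer time over a single link, decimal units assumed.

def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8.0

def transfer_seconds(size_mb, link_mbit_per_s, efficiency=1.0):
    """Time to move size_mb over the link; efficiency (0..1] is an
    assumed fraction of the raw link rate that is actually usable."""
    usable = mbit_to_mbyte_per_s(link_mbit_per_s) * efficiency
    return size_mb / usable

if __name__ == "__main__":
    print("raw link rate:", mbit_to_mbyte_per_s(100), "MB/s")
    print("100 MB, ideal:", transfer_seconds(100.0, 100), "s")
```

A measured transfer time close to this figure suggests the network, not
the transfer mechanism, is the limit.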


Bye, Thomas


> -----Original Message-----
> From: Steven Berukoff [mailto:steveb at aei-potsdam.mpg.de]
> Sent: Friday, August 03, 2001 8:36 AM
> To: beowulf at beowulf.org
> Subject: data transfer and condor
> 
> 
> Hi all,
> 
> We're looking at using Condor on our cluster for its checkpointing and
> job-handling abilities, as the routines we're running don't require much
> in the way of internode communication.  We have an NFS file server which
> contains our entire fileset (something on the order of 100s of GB), a
> master node for the cluster, and several nodes.  Outside of Condor, our
> algorithm requires that each of the nodes get some subset of the data (on
> the order of perhaps 100MB) and runs the analysis code on this data
> segment.  Obviously, each node must gather its share of the data from the
> NFS file server; this of course requires a large amount of network
> traffic.
> 
> Does anyone have a clever idea about scheduling the data
> transfers so that it is accomplished in a reasonable fashion?  We were
> hoping Condor provides this functionality to some degree, but it doesn't
> seem to.
> 
> Thanks
> Steve
> 
> 
> 
> =====
> Steve Berukoff					tel: 49-331-5677233
> Albert-Einstein-Institute			fax: 49-331-5677298
> Am Muehlenberg 1, D14477 Golm, Germany
> email: steveb at aei.mpg.de