[Beowulf] Torrents for HPC

Peter pc7 at sanger.ac.uk
Wed Jun 13 07:59:58 PDT 2012


On 12/06/12 18:56, Ellis H. Wilson III wrote:
> On 06/08/12 20:06, Bill Broadley wrote:
>> A new user on one of my GigE clusters submits batches of 500 jobs that
>> need to randomly read a 30-60GB dataset.  They aren't the only user of
>> said cluster so each job will be waiting in the queue with a mix of others.
> With a 160TB cluster and only a 30-60GB dataset, is there any reason why
> the user isn't simply storing their dataset in HDFS?  Does the data
> change frequently via a non-MapReduce framework such that it needs to be
> pulled from NFS before every job?  If the dataset is in a few dozen
> files and in HDFS in the cluster, there is no reason why MapReduce
> shouldn't spawn its tasks directly "on" the data, without need (most of
> the time) for moving all of the data to every node as you mention.

From experience this can have varied results and still requires careful
management/thought. With HDFS, if the replication factor is 3 (often the
default) and the 30-node cluster has 500 jobs, then one option is an
initial step that replicates the data to all the other cluster nodes
before the analysis runs. This imposes the same network / disk I/O
impact and job start-up latency that is already in place.

Alternatively, keep the replication factor at 3 (or some other defined
number) and limit the number of jobs to the resources available on the
nodes where the data replicas already exist. The challenge is finding
the sweet spot for the work in progress and, as always, nothing is ever
free.
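
As a rough back-of-envelope sketch of the trade-off between the two
options above (not a model of real HDFS block placement): assume blocks
are spread evenly over the nodes, so with replication factor r on n
nodes each node holds about r/n of the dataset locally, and a full copy
on every node needs (n - r) more copies of the dataset. The function
name is made up for illustration; the 30-60GB dataset size comes from
the quoted message, the 30 nodes and factor of 3 from the text above.

```python
def replication_tradeoff(dataset_gb, nodes, replication):
    """Return (local_fraction, extra_gb_for_full_copy).

    Assumes blocks are placed uniformly across nodes, which is a
    simplification of what HDFS actually does.
    """
    if not 1 <= replication <= nodes:
        raise ValueError("replication must be between 1 and nodes")
    # Average share of the dataset that any one node can read locally.
    local_fraction = replication / nodes
    # Extra data to copy so that every node holds the whole dataset.
    extra_gb = dataset_gb * (nodes - replication)
    return local_fraction, extra_gb


if __name__ == "__main__":
    for size_gb in (30, 60):
        frac, extra = replication_tradeoff(size_gb, nodes=30, replication=3)
        print(f"{size_gb} GB dataset: {frac:.0%} local per node, "
              f"{extra} GB extra to replicate everywhere")
```

Under those assumptions a job that reads the dataset at random finds
only about a tenth of it on its own node at replication 3, which is why
either the copy step or a limit on job placement is still needed.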

So HDFS does not remove the replication process, although it helps to
hide the processes involved.

The other joy encountered with HDFS is that we found it can be less than
stable in a multi-user environment. This has been confirmed by various
others, so as always care is required during testing.

There are alternatives to HDFS which can be used in conjunction with
Hadoop, but I'm afraid I'm not able to recommend any in particular as
it's been a while since I last kicked the tyres. Do others have more
recent experience with these, and an alternative they can recommend?

Pete


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


