[Beowulf] Performance characterising a HPC application

Tue Mar 20 05:32:12 PDT 2007

Hi,

Thanks for your reply, and apologies for the delay in responding - St 
Patrick's Day celebrations temporarily got in the way :)

Michael Will wrote:
> This is a very interesting topic.
> 
> First off it's interesting how different head and compute node are, and 
> that cpu utilisation is relatively low.
> 
> What is the runtime of one run ?

The model is a forecasting model, so it runs indefinitely. It 
generates a "history file" of the current state every half hour or so 
(the interval varies with the specific modelling scenario being 
forecast).

> Have you tried running it only on compute nodes? (mpirun -nolocal)

I tried running it with -np 0 on the head node (which I'm guessing 
should have the same effect as -nolocal, which I was unaware of). It 
didn't seem to make any significant difference.

> Have you experimented with the impact of running two threads per node 
> versus four and half the amount of nodes to understand if a quadcore 
> system could give you an advantage (more mpi io within the node) or 
> disadvantage (more mpi io squeezing through interconnect bottleneck) ?

I did try doubling up the number of processes running on each node to 
see what the effect would be. Doing this is complicated by the fact that 
the model itself has input parameters (the size of a tile that is 
analysed) which need to be proportional to the number of processes 
running the model. Doubling the number of processes from 4 to 8 on 
each node (i.e. 2 per core) slowed the model down by something on the 
order of 10%.
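
To illustrate the kind of coupling between tile size and process count 
I mean, here is a made-up sketch in Python. It is not the model's real 
configuration: the grid dimensions, the one-tile-per-process layout and 
the tile_shape helper are all assumptions for illustration only.

```python
# Hypothetical sketch: one rectangular tile per MPI process on a 2D grid.
# None of these names or numbers come from the actual model.

def tile_shape(grid_x, grid_y, tiles_x, tiles_y):
    """Return the (x, y) size of each tile for a tiles_x by tiles_y layout."""
    if grid_x % tiles_x or grid_y % tiles_y:
        raise ValueError("grid does not divide evenly into tiles")
    return grid_x // tiles_x, grid_y // tiles_y


# Assumed 512 x 256 grid; doubling the process count halves one tile dimension.
for tiles_x, tiles_y in [(4, 4), (8, 4)]:
    nprocs = tiles_x * tiles_y
    print(nprocs, "processes ->", tile_shape(512, 256, tiles_x, tiles_y))
```

The point is only that changing the process count means re-deriving the 
tile parameters as well, so the two runs are not configured identically.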

> Infiniband will be more valuable on the quadcore I presume.
> 
> Does the app use any scratchspace at runtime over NFS?

Nope, the only node doing significant I/O is the head node (I'll go and 
verify this with some tests, but I'm pretty sure as it stands).

> 
> What size are input and output files and how much time is spent reading 
> / writing them ?

The history files are generated every 30 minutes or so and tend to be 
about 700-800 MB in size (again, this depends on the model). I don't 
believe it does a lot of I/O outside of that, but I'll need to verify 
that also.
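
For a rough sense of scale, the arithmetic on those figures can be 
sketched in Python as below. The function name is just made up for 
illustration, and the inputs are only the approximate numbers above. It 
gives an average rate and says nothing about how bursty the writes are.

```python
# Back-of-the-envelope estimate of the average history-file write rate.
# Inputs are the approximate figures quoted above; real runs will vary.

def average_write_rate_mb_s(file_size_mb, interval_minutes):
    """Return the sustained rate in MB/s needed to write one file per interval."""
    return file_size_mb / (interval_minutes * 60.0)


low = average_write_rate_mb_s(700, 30)
high = average_write_rate_mb_s(800, 30)
print(f"average write rate: {low:.2f}-{high:.2f} MB/s")
```

Averaged over the half hour that is well under 1 MB/s, though if each 
file is written in one go the instantaneous rate will be much higher.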

Thanks,

-stephen

-- 
Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center,
    GMIT, Dublin Rd, Galway, Ireland.      http://www.aplpi.com