[Beowulf] Can one Infiniband net support MPI and a parallel filesystem?
Craig Tierney
Craig.Tierney at noaa.gov
Wed Aug 13 07:55:19 PDT 2008
Joe Landman wrote:
> Craig Tierney wrote:
>> Chris Samuel wrote:
>>> ----- "I Kozin (Igor)" <i.kozin at dl.ac.uk> wrote:
>>>
>>>>> Generally speaking, MPI programs will not be fetching/writing data
>>>>> from/to storage at the same time they are doing MPI calls, so there
>>>>> tends not to be very much contention to worry about at the node
>>>>> level.
>>>> I tend to agree with this.
>>>
>>> But that assumes you're not sharing a node with other
>>> jobs that may well be doing I/O.
>>>
>>> cheers,
>>> Chris
>>
>> I am wondering, who shares nodes in cluster systems with
>> MPI codes? We have never shared nodes for codes that need
>
> The vast majority of our customers/users do. With limited resources,
> they have to balance performance against cost and opportunity cost.
>
> Sadly, not every user has an infinite budget to invest in contention-free
> hardware (nodes, fabrics, or disks). So they have to maximize the
> utilization of what they have, while (hopefully) not trashing the
> efficiency too badly.
>
>> multiple cores since we built our first SMP cluster
>> in 2001. The contention for shared resources (like memory
>> bandwidth and disk IO) would lead to unpredictable code performance.
>
> Yes it does. As does OS jitter and other issues.
>
>> Also, a poorly behaved program can cause the other codes on
>> that node to crash (which we don't want).
>
> Yes this happens as well, but some users simply have no choice.
>
>>
>> Even at TACC (62000+ cores) with 16 cores per node, nodes
>> are dedicated to jobs.
>
> I think every user would love to run on a TACC-like system. I think
> most users have a budget for something less than 1/100th the size. It's
> easy to forget how much resource (un)availability constrains actions
> when you have very large resources to work with.
>
TACC probably wasn't a good example for the "rest of us". It hasn't been
difficult to dedicate nodes to jobs when the number of cores per node was
2 or 4. We now have some 8-core nodes, and we are wondering whether the
policy of not sharing nodes is going to continue, or at least be modified
to minimize waste.
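
To make the earlier point about I/O concrete: most of our codes follow a
pattern roughly like the sketch below. This is purely illustrative, not
anyone's production code; the filename, step counts, and checkpoint
interval are made up, and I'm assuming an ordinary mpicc build. MPI
traffic happens inside the timestep loop and the filesystem is only
touched during an occasional checkpoint phase, so a single job rarely
loads the fabric with communication and storage traffic at the same time.

/* Sketch of the usual compute / communicate / checkpoint pattern.
 * Build with something like:  mpicc -std=c99 -o phases phases.c
 * The constants and file name below are placeholders for illustration.
 */
#include <mpi.h>
#include <stdio.h>

#define NSTEPS           100
#define CHECKPOINT_EVERY  25   /* storage I/O only every N steps */

int main(int argc, char **argv)
{
    int rank, nprocs;
    double local = 0.0, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (int step = 0; step < NSTEPS; step++) {
        /* 1. Compute phase: purely local work, no network traffic. */
        local = (double)rank + step;

        /* 2. Communication phase: MPI traffic on the fabric,
         *    but no storage traffic from this job. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        /* 3. I/O phase: only every CHECKPOINT_EVERY steps, and only
         *    after the communication above has completed, so the job
         *    is not hitting MPI and the filesystem at once. */
        if (step % CHECKPOINT_EVERY == 0 && rank == 0) {
            FILE *fp = fopen("checkpoint.dat", "w");
            if (fp) {
                fprintf(fp, "step %d ranks %d sum %f\n",
                        step, nprocs, global);
                fclose(fp);
            }
        }
    }

    MPI_Finalize();
    return 0;
}

The catch, as Chris noted, is that this separation only holds within one
job. Put two jobs on the same node and one job's checkpoint phase can
land right on top of the other's communication phase, which is exactly
the kind of contention that led us to dedicate nodes in the first place.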
Craig
> Joe
>
>
--
Craig Tierney (craig.tierney at noaa.gov)