[Beowulf] Can one Infiniband net support MPI and a parallel filesystem?

Gerry Creager gerry.creager at tamu.edu
Mon Sep 1 21:38:52 PDT 2008


Craig Tierney wrote:
> Joe Landman wrote:
>> Craig Tierney wrote:
>>> Chris Samuel wrote:
>>>> ----- "I Kozin (Igor)" <i.kozin at dl.ac.uk> wrote:
>>>>
>>>>>> Generally speaking, MPI programs will not be fetching/writing data
>>>>>> from/to storage at the same time they are doing MPI calls so there
>>>>>> tends to not be very much contention to worry about at the node
>>>>>> level.
>>>>> I tend to agree with this. 
>>>>
>>>> But that assumes you're not sharing a node with other
>>>> jobs that may well be doing I/O.
>>>>
>>>> cheers,
>>>> Chris
>>>
>>> I am wondering: who shares nodes in cluster systems with
>>> MPI codes?  We have never shared nodes for codes that need
>>
>> The vast majority of our customers/users do.  With limited resources,
>> they have to balance performance against cost and opportunity cost.
>>
>> Sadly, not every user has an infinite budget to invest in
>> contention-free hardware (nodes, fabrics, or disks).  So they have to maximize 
>> the utilization of what they have, while (hopefully) not trashing the 
>> efficiency too badly.
>>
>>> multiple cores since we built our first SMP cluster
>>> in 2001.  The contention for shared resources (like memory
>>> bandwidth and disk IO) would lead to unpredictable code performance.
>>
>> Yes it does.  As does OS jitter and other issues.
>>
>>> Also, a poorly behaved program can cause the other codes on
>>> that node to crash (which we don't want).
>>
>> Yes this happens as well, but some users simply have no choice.
>>
>>>
>>> Even at TACC (62000+ cores) with 16 cores per node, nodes
>>> are dedicated to jobs.
>>
>> I think every user would love to run on a TACC-like system.  I think 
>> most users have a budget for something less than 1/100th the size.
>> It's easy to forget how much resource (un)availability constrains 
>> actions when you have very large resources to work with.
>>
> 
> TACC probably wasn't a good example for the "rest of us".  It hasn't been
> difficult to dedicate nodes to jobs when the number of cores was 2 or 4.
> We now have some 8-core nodes, and we are wondering whether the policy of
> not sharing nodes is going to continue, or at least be modified to minimize
> waste.

Last time I asked (recently), TACC intends to continue scheduling 
per-node, even with 16 cores/node.
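
Back on the subject-line question: the pattern Igor and Joe describe 
upthread, where the MPI traffic and the parallel-filesystem traffic 
mostly land in different phases of a run, looks roughly like the sketch 
below.  This is only an illustration of mine, not code from any of the 
systems mentioned here; the file name, buffer size and offsets are made 
up.

/* Minimal sketch (for illustration only): communication and
 * checkpoint I/O happen in separate phases, so MPI and the parallel
 * filesystem rarely fight over the fabric at the same instant. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                 /* elements per rank (arbitrary) */
    double *buf = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++)
        buf[i] = rank + i * 1.0e-6;        /* stand-in for real work */

    /* Phase 1: computation plus MPI traffic on the fabric. */
    double local = buf[0], global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    /* Phase 2: checkpoint to the parallel filesystem via MPI-IO,
     * well after the communication burst above. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset off = (MPI_Offset)rank * n * sizeof(double);
    MPI_File_write_at_all(fh, off, buf, n, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(buf);
    MPI_Finalize();
    return 0;
}

Of course, once nodes are shared, or somebody else's I/O phase lines up 
with your communication bursts, that separation helps a lot less, which 
is part of why the dedicated-node policy keeps coming up.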

Sorry to be late with this but the hurricane season is getting 
interesting and e-mail's taken a bit of a hit.

-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


