[Beowulf] Can one Infiniband net support MPI and a parallel filesystem?
Gerry Creager
gerry.creager at tamu.edu
Mon Sep 1 21:38:52 PDT 2008
Craig Tierney wrote:
> Joe Landman wrote:
>> Craig Tierney wrote:
>>> Chris Samuel wrote:
>>>> ----- "I Kozin (Igor)" <i.kozin at dl.ac.uk> wrote:
>>>>
>>>>>> Generally speaking, MPI programs will not be fetching/writing data
>>>>>> from/to storage at the same time they are doing MPI calls so there
>>>>>> tends to not be very much contention to worry about at the node
>>>>>> level.
>>>>> I tend to agree with this.
>>>>
>>>> But that assumes you're not sharing a node with other
>>>> jobs that may well be doing I/O.
>>>>
>>>> cheers,
>>>> Chris
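For what it's worth, the access pattern being described above looks roughly
like the sketch below (mpi4py and numpy, with made-up sizes and intervals,
not anything from a real code): communication and checkpoint I/O live in
different phases of the time-step loop, so the two kinds of traffic rarely
hit the fabric at the same moment.

# Sketch of the compute / communicate / checkpoint pattern discussed above.
# Assumes mpi4py and numpy; array size, step count and checkpoint interval
# are illustrative only.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.random.rand(1_000_000)   # per-rank working data
CHECKPOINT_EVERY = 100              # hypothetical checkpoint interval

for step in range(1000):
    local *= 1.0001                         # compute phase: no network traffic
    total = comm.allreduce(local.sum())     # MPI phase: fabric carries MPI only

    # I/O phase: only every Nth step, and only after the MPI calls have
    # returned, so storage traffic is time-separated from message traffic.
    if step % CHECKPOINT_EVERY == 0 and rank == 0:
        np.save("checkpoint_%d.npy" % step, local)

Of course, as Chris points out above, that separation only holds within a
single job; another job on the same node can be in its I/O phase while yours
is in an MPI phase.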
>>>
>>> I am wondering: who shares nodes between jobs on cluster systems running
>>> MPI codes? We have never shared nodes for codes that need
>>
>> The vast majority of our customers/users do. With limited resources, they
>> have to balance performance against cost and opportunity cost.
>>
>> Sadly not every user has an infinite budget to invest in contention
>> free hardware (nodes, fabrics, or disks). So they have to maximize
>> the utilization of what they have, while (hopefully) not trashing the
>> efficiency too badly.
>>
>>> multiple cores since we built our first SMP cluster
>>> in 2001. The contention for shared resources (like memory
>>> bandwidth and disk I/O) would lead to unpredictable code performance.
>>
>> Yes, it does, as do OS jitter and other issues.
>>
>>> Also, a poorly behaved program can cause the other codes on
>>> that node to crash (which we don't want).
>>
>> Yes this happens as well, but some users simply have no choice.
>>
>>>
>>> Even at TACC (62000+ cores) with 16 cores per node, nodes
>>> are dedicated to jobs.
>>
>> I think every user would love to run on a TACC-like system. I think
>> most users have a budget for something less than 1/100th the size.
>> It's easy to forget how much resource (un)availability constrains
>> actions when you have very large resources to work with.
>>
>
> TACC probably wasn't a good example for the "rest of us". It hasn't been
> difficult to dedicate nodes to jobs when the number of cores was 2 or 4.
> We now have some 8-core nodes, and we are wondering if the policy of
> not sharing nodes is going to continue, or at least be modified to minimize
> waste.
Last time I asked (recently...) TACC intends to continue scheduling
per-node, even with 16 cores/node.
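The waste question is easy to put rough numbers on. Here's a quick
back-of-the-envelope sketch (plain Python, and the job mix is made up purely
for illustration) of how the idle-core fraction grows with cores per node
when every job gets whole nodes to itself:

# Idle cores under node-exclusive scheduling, for a hypothetical job mix.
import math

jobs = [1, 2, 4, 6, 8, 12, 16, 24, 32, 48, 64]   # ranks per job (made up)

for cores_per_node in (2, 4, 8, 16):
    allocated = sum(math.ceil(r / cores_per_node) * cores_per_node
                    for r in jobs)
    used = sum(jobs)
    idle_pct = 100.0 * (allocated - used) / allocated
    print("%2d cores/node: %4.1f%% of allocated cores idle"
          % (cores_per_node, idle_pct))

The real answer obviously depends on the actual job mix at a site, but the
trend is what's driving the policy discussion.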
Sorry to be late with this but the hurricane season is getting
interesting and e-mail's taken a bit of a hit.
--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843