[Beowulf] Infiniband: MPI and I/O?
Mark Hahn
hahn at mcmaster.ca
Thu May 26 13:13:07 PDT 2011
>>> Wondering if anyone out there is doing both I/O to storage as well as
>>> MPI over the same IB fabric.
>>>
>>
>> I would say that is the norm. we certainly connect local storage (Lustre)
>> to nodes via the same fabric as MPI. gigabit is completely
>> inadequate for modern nodes, so the only alternatives would be 10G
>> or a secondary IB fabric, both quite expensive propositions, no?
>>
>> I suppose if your cluster does nothing but I/O-light serial/EP jobs,
>> you might think differently.
>>
> Really? I'm surprised by that statement. Perhaps I'm just way behind the
> curve, though. It is typical here to have local node storage, local
> Lustre/PVFS storage, local NFS storage, and global GPFS storage running over
> the GigE network.
sure, we use Gb as well, but only as a crutch, since it's so slow.
or does each node have, say, a 4x-bonded Gb link for this traffic?
or are we disagreeing on whether Gb is "slow"? 80-ish MB/s seems pretty
slow to me, considering that's less than any single disk on the market...
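
back-of-the-envelope, in case we're disagreeing on the arithmetic rather
than the adjective (the framing constants below are standard Ethernet/TCP;
the 80-ish figure is just an observed ballpark, not derived from them):

  # why gigabit looks slow next to even one disk
  LINE_RATE = 1_000_000_000 / 8        # 1 Gb/s in bytes/s = 125 MB/s
  MTU_PAYLOAD = 1500                   # standard (non-jumbo) Ethernet MTU
  WIRE_OVERHEAD = 7 + 1 + 14 + 4 + 12  # preamble, SFD, header, FCS, inter-frame gap
  IP_TCP_HEADERS = 20 + 20             # IPv4 + TCP, no options

  frame_on_wire = MTU_PAYLOAD + WIRE_OVERHEAD      # 1538 bytes on the wire
  goodput = LINE_RATE * (MTU_PAYLOAD - IP_TCP_HEADERS) / frame_on_wire
  print(f"theoretical TCP goodput: {goodput / 1e6:.1f} MB/s")   # ~118.7
  # so even a perfect Gb link tops out under 120 MB/s of payload;
  # ~80 MB/s in practice is plausible, and slower than a single
  # current SATA disk streaming sequentially.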
>> how much inter-chassis MPI do you do? how much I/O do you do?
>> IB has a small MTU, so I don't really see why mixed traffic would be a big
>> problem. of course, IB also doesn't do all that wonderfully
>> with hotspots. but isn't this mostly an empirical question you can
>> answer by direct measurement?
>>
> How would I go about that direct measurement?
I meant collecting the byte counters from NICs and/or switches
while real workloads are running. that tells you the actual data rates,
and should show how close you are to creating hotspots.
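
for instance, something like this against the Linux sysfs counters
(a rough sketch -- "mlx4_0"/port 1 are placeholders for your actual HCA,
and these PMA counters are 32-bit and wrap; perfquery -x from
infiniband-diags reads the 64-bit extended versions):

  import time

  HCA, PORT = "mlx4_0", 1   # placeholder; see /sys/class/infiniband/
  BASE = f"/sys/class/infiniband/{HCA}/ports/{PORT}/counters"

  def read_words(name):
      # port_{xmit,rcv}_data count 4-byte words, not bytes
      with open(f"{BASE}/{name}") as f:
          return int(f.read())

  INTERVAL = 10   # seconds; sample while a real workload is running
  before = (read_words("port_xmit_data"), read_words("port_rcv_data"))
  time.sleep(INTERVAL)
  after = (read_words("port_xmit_data"), read_words("port_rcv_data"))

  for label, b, a in zip(("xmit", "rcv"), before, after):
      print(f"{label}: {(a - b) * 4 / INTERVAL / 1e6:.1f} MB/s")

pointing perfquery at your switch LIDs gives the same view for
inter-switch links, which is where hotspots actually form.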
> My question really was twofold: 1) is anyone doing this successfully and 2)
> does anyone have an idea of how loudly my users will scream when their MPI
> jobs suddenly degrade. You've answered #1 and seem to believe that for #2,
> no one will notice.
we've always done it, though our main experience is with clusters that have
full-bisection fabrics. our two more recent clusters have half-bisection
fabrics, but I suspect that most users are not looking closely enough at
performance to notice and/or complain.
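
for a sense of scale, a toy model of the worst case (my numbers, assuming
36-port QDR leaf switches; not measurements from our fabrics):

  # 2:1 oversubscription arithmetic for a half-bisection fat-tree
  LINK_MB_S = 4000          # 4x QDR data rate: 32 Gb/s after 8b/10b
  NODES_PER_LEAF = 18       # 18 nodes + 18 uplinks would be full bisection
  UPLINKS_PER_LEAF = 9      # 9 uplinks -> half bisection

  oversub = NODES_PER_LEAF / UPLINKS_PER_LEAF
  print(f"{oversub:.0f}:1 oversubscribed -> ~{LINK_MB_S / oversub:.0f} MB/s "
        "per node if every node sends across the core at once")
  # jobs whose traffic stays within a leaf, or that aren't bandwidth-bound,
  # never see this -- consistent with nobody complaining so far.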