[Beowulf] how large can we go with 1GB Ethernet? / Re: how large of an installation have people used NFS, with?

Geoff Jacobs gdjacobs at gmail.com
Thu Mar 11 10:16:29 PST 2010


psc wrote:
> Thank you all for the answers. Would you please share some good brands
> of those 200+ port Gigabit Ethernet switches? I think I'll leave our
> current clusters alone, but I will design the new cluster for about
> 500 to 1000 nodes. I don't think we will go much above that, since our
> scientists use outside resources for big jobs. We do all our
> calculations and analysis on the nodes and send only the final product
> to the frontend. We also don't run jobs across nodes, so I don't need
> to get too creative with the network, beyond making sure that I can
> expand the cluster without the switches being a limitation (our
> current situation).
> 
> thank you again!
> 
> 
> Henning Fehrmann wrote:
>> Hi
>>
>> On Wed, Sep 09, 2009 at 03:23:30PM -0400, psc wrote:
>>   
>>> I wonder what the biggest sensible cluster would be based on a
>>> gigabit Ethernet network.
>>>     
>> Hmmm, may I cheat and use a 10Gb core switch?
>>
>> If you set up a cluster with a few thousand nodes, you have to ask
>> yourself whether the network should be non-blocking or not.
>>
>> For a non-blocking network you need the right core-switch technology.
>> Unfortunately, there are not many vendors out there that provide
>> non-blocking Ethernet-based core switches, but I am aware of at least
>> two. One provides, or will provide, 144 10Gb Ethernet ports; another
>> sells switches with more than 1000 1Gb ports.
>> You could buy edge switches with 4 10Gb uplinks and 48 1Gb ports. If
>> you use only 40 of the 48 ports on each edge switch, the downlink
>> bandwidth matches the 4 x 10Gb of uplinks, and 36 such edge switches
>> hanging off a 144-port core give you 1440 non-blocking 1Gb ports.
>>
>> It might also be possible to cross-connect two of these core switches
>> with the help of some smaller switches, so that one ends up with 288
>> 10Gb ports and, in principle, could connect 2880 nodes in a
>> non-blocking way, but we have not had the opportunity to test this
>> successfully yet. One of the problems is that the switch's internal
>> hash table cannot store that many MAC addresses. One probably needs
>> to change the MAC addresses of the nodes to avoid overflowing the
>> hash tables; an overflow might cause ARP storms.
>>
>> Once this works, one runs into some smaller problems. One of them is
>> the ARP cache of the nodes: it should be adjusted to hold at least as
>> many MAC addresses as there are nodes in the cluster.
>>
>>
>>   
>>> And especially, how would you connect those gigabit switches
>>> together? Right now we have (on one of our four clusters) two
>>> 48-port gigabit switches connected together with 6 patch cables, and
>>> I just ran out of ports for expansion. I wonder where to go from
>>> here, since we already have four clusters and it would be great to
>>> stop adding clusters and start expanding them beyond the number of
>>> ports on the switches. NFS and gigabit Ethernet work great for us
>>> and we want to stick with them, but we would love to find a way to
>>> overcome the current "switch limitation".
>>>     
>> With NFS you can test the setup nicely: use one NFS server, let all
>> the nodes write different files to it, and see what happens.
>>
>> Cheers,
>> Henning
>>   
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
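
On Henning's point about changing the nodes' MAC addresses to keep the
switch forwarding tables happy: one way to do it is to derive locally
administered addresses from the node number, so they stay compact and
predictable. A rough Python sketch, with the prefix, interface name and
numbering all being assumptions rather than anything Henning specified:

import subprocess

def node_mac(node_id):
    """Derive a locally administered MAC address from a node number.
    The 0x02 bit in the first octet marks the address as locally
    administered, so it cannot clash with vendor-assigned addresses."""
    return "02:00:00:%02x:%02x:%02x" % (
        (node_id >> 16) & 0xff, (node_id >> 8) & 0xff, node_id & 0xff)

def assign_mac(interface, node_id):
    """Set the derived MAC on an interface (needs root; most drivers
    want the link down, or do this from the early boot environment)."""
    subprocess.check_call(
        ["ip", "link", "set", "dev", interface, "address", node_mac(node_id)])

# e.g. node 137 would get 02:00:00:00:00:89 on eth0:
# assign_mac("eth0", 137)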
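
On the ARP cache point, the relevant limits on Linux are the
neighbour-table gc_thresh sysctls (net.ipv4.neigh.default.gc_thresh1/2/3),
which default to far fewer entries than a thousand-node cluster needs.
Here is a minimal sketch of what would have to be raised on every node;
the values are only illustrative, and the same settings could just as
well go into sysctl.conf:

# Raise the kernel neighbour (ARP) table limits so each node can keep
# an entry for every other node.  The numbers are illustrative, not tuned.
LIMITS = {
    "gc_thresh1": 4096,   # below this many entries the GC leaves the table alone
    "gc_thresh2": 8192,   # soft maximum
    "gc_thresh3": 16384,  # hard maximum number of neighbour entries
}

for name, value in LIMITS.items():
    path = "/proc/sys/net/ipv4/neigh/default/" + name
    with open(path, "w") as handle:   # needs root
        handle.write("%d\n" % value)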

It looks like Allied Telesis makes chassis switches now too.
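
And for the NFS test Henning suggests, the crude version is to have
every node write its own file into the same export at the same time and
watch what the server does. Something like the script below, launched
on all nodes in parallel; the mount point and the file size are
assumptions, adjust to taste:

import os
import socket
import time

MOUNT = "/mnt/nfstest"        # assumed mount point of the shared export
SIZE_MB = 1024                # how much each node writes
CHUNK = 1024 * 1024           # 1 MB per write() call

def write_test():
    """Write a node-specific file to the export and report throughput."""
    path = os.path.join(MOUNT, "stress-%s.dat" % socket.gethostname())
    start = time.time()
    with open(path, "wb") as handle:
        for _ in range(SIZE_MB):
            handle.write(b"\0" * CHUNK)
        handle.flush()
        os.fsync(handle.fileno())   # make sure the data really reaches the server
    elapsed = time.time() - start
    print("%s: %.1f MB/s" % (socket.gethostname(), SIZE_MB / elapsed))

if __name__ == "__main__":
    write_test()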

-- 
Geoffrey D. Jacobs


