[Beowulf] choosing a high-speed interconnect
Robert G. Brown rgb at phy.duke.edu
Tue Oct 12 15:39:19 PDT 2004
On Tue, 12 Oct 2004, Chris Sideroff wrote:

> On Tue, 2004-10-12 at 16:48, Joe Landman wrote:
> > First questions first:
> >
> > Why do you think you need a faster network, and what aspect of fast do
> > you think you need? Low latency? High bandwidth?
>
> To tell you the truth I can't answer that with more than, "I have a
> gut feeling". I am in the process of profiling the performance of our
> current cluster with our programs. Any suggestions ???

Analyze the applications, preferably at the code level. If they exchange a few big messages, then they are likely bandwidth limited. If they exchange many small messages, then they are likely latency limited. If you don't have access to the code, then run a tool such as xmlsysd/wulfstat that lets you watch the (ether)net on a whole cluster at once as it runs your applications, and take note of, e.g., packet counts per second per node and net data throughput per second per node.

Joe's question is dead on the money. Until you do this, you cannot be sure that your application is choking due to a network that is "slow" in any dimension. Even if it IS slow due to the network, it may not be slow in a sense that can be substantively fixed by changing networks, if you're already using gigE. gigE's latency isn't great, but its bandwidth should be at least comparable (within a factor of 1-3) to that of the faster networks.

Sometimes, also, the problem is the network but not at the physical layer; rather, it lies in the way the code itself is organized and uses the network. If the code is YOUR code, then a trip through, e.g., Ian Foster's book on parallel programming and algorithms (there are several others with good reputations) is indicated before investing a LOT of money in a new network. If the code is somebody else's code, then the list is a great place to get actual feedback on what the essential bottlenecks are and to learn of actual clusters that are successful designs.
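To make the "few big messages vs. many small messages" distinction concrete, here is a minimal Python sketch of the simple latency-plus-bandwidth (alpha-beta, often attributed to Hockney) cost model, in which sending n bytes costs roughly latency + n/bandwidth. The `Network` class and helper functions are hypothetical, and the latency and bandwidth figures are illustrative assumptions only, not measurements of gigE or of any vendor's product; substitute numbers measured on your own cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    """Latency-plus-bandwidth (alpha-beta) cost model for point-to-point messages."""
    name: str
    latency_s: float      # per-message startup cost in seconds (assumed, not measured)
    bandwidth_Bps: float  # sustained bandwidth in bytes per second (assumed, not measured)

    def message_time(self, nbytes: int) -> float:
        """Modelled time to send one message of nbytes bytes."""
        return self.latency_s + nbytes / self.bandwidth_Bps

    def crossover_bytes(self) -> float:
        """Message size at which the latency and bandwidth terms are equal."""
        return self.latency_s * self.bandwidth_Bps


def regime(net: Network, nbytes: int) -> str:
    """Say which term dominates the modelled cost of an nbytes message on net."""
    return "latency" if nbytes < net.crossover_bytes() else "bandwidth"


def exchange_time(net: Network, count: int, nbytes: int) -> float:
    """Modelled time for count messages of nbytes each, sent one after another."""
    return count * net.message_time(nbytes)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    gige = Network("gigE (assumed)", latency_s=50e-6, bandwidth_Bps=100e6)
    fast = Network("low-latency net (assumed)", latency_s=5e-6, bandwidth_Bps=250e6)

    # The same 100 MB moved two ways: a few big messages vs. many small ones.
    patterns = {"10 x 10 MB": (10, 10_000_000), "1,000,000 x 100 B": (1_000_000, 100)}
    for label, (count, nbytes) in patterns.items():
        t_gige = exchange_time(gige, count, nbytes)
        t_fast = exchange_time(fast, count, nbytes)
        print(f"{label}: {regime(gige, nbytes)}-dominated on gigE, "
              f"gigE {t_gige:.2f} s, fast {t_fast:.2f} s, ratio {t_gige / t_fast:.1f}")
```

With these made-up numbers the faster network wins by about 2.5x on the big messages (bounded by the bandwidth ratio) but by about 9x on the small ones, which is why knowing your message pattern matters before buying. The model ignores contention, overlap of communication with computation, and protocol overheads, so treat it as a first-order guide only.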
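xmlsysd/wulfstat is the tool suggested for watching a whole cluster at once. For a quick look at a single node, the per-node packet and byte rates mentioned above can also be derived from the kernel's cumulative interface counters. The Python sketch below assumes Linux nodes, where /proc/net/dev lists receive and transmit byte and packet counts per interface; it is not part of xmlsysd/wulfstat, and the function names are hypothetical.

```python
import time
from typing import Dict, Tuple

# Per-interface counters: (rx_bytes, rx_packets, tx_bytes, tx_packets)
Counters = Dict[str, Tuple[int, int, int, int]]


def parse_proc_net_dev(text: str) -> Counters:
    """Parse the text of Linux /proc/net/dev into per-interface counters."""
    counters: Counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Receive columns come first (bytes, packets, ...); transmit columns start at index 8.
        counters[iface.strip()] = (int(fields[0]), int(fields[1]),
                                   int(fields[8]), int(fields[9]))
    return counters


def rates(before: Counters, after: Counters, interval_s: float) -> Dict[str, Dict[str, float]]:
    """Convert two counter snapshots into per-second rates (ignores counter wraparound)."""
    result: Dict[str, Dict[str, float]] = {}
    for iface, (rb1, rp1, tb1, tp1) in after.items():
        if iface not in before:
            continue
        rb0, rp0, tb0, tp0 = before[iface]
        result[iface] = {
            "rx_bytes_per_s": (rb1 - rb0) / interval_s,
            "rx_packets_per_s": (rp1 - rp0) / interval_s,
            "tx_bytes_per_s": (tb1 - tb0) / interval_s,
            "tx_packets_per_s": (tp1 - tp0) / interval_s,
        }
    return result


def sample(interval_s: float = 1.0, path: str = "/proc/net/dev") -> Dict[str, Dict[str, float]]:
    """Read the counters twice, about interval_s seconds apart, and return the rates."""
    t0 = time.monotonic()
    with open(path) as f:
        before = parse_proc_net_dev(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_proc_net_dev(f.read())
    elapsed = time.monotonic() - t0  # use the measured interval, not the requested one
    return rates(before, after, elapsed)
```

Calling, e.g., `sample(5.0)` on each node while a job runs gives a rough picture: a high packet rate with modest byte throughput hints at many small messages (latency-sensitive), while byte throughput near the link's capacity hints at a bandwidth limit. Interface names (eth0 and so on) vary from node to node.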
It sounds (below) like you have a bit of both -- good luck finding Fluent users or a Fluent-savvy consultant on the list (both seem pretty likely).

Before departing, I'd suggest working with vendors to arrange a loaner network and prototyping it with your programs before finally buying it. These networks are a substantial investment, as the companies that sell them well know. The companies are quite competitive and want your business. They are usually pretty willing to let their hardware "speak for itself", so you aren't investing $1-2K/node only to learn afterwards that it doesn't speed your code up at all. That is an outcome that benefits nobody, really, not even the network vendor (as you'll doubtless later poison their reputation in this very competitive and reputation-sensitive marketplace).

   rgb

> > Then...
> >
> > What codes are you running? Across how many CPUS? Have you done a
> > performance analysis on your system to observe "slow" runs in progress,
> > and are you convinced that the network is the issue?
>
> We run exclusively computational fluid dynamics on it. One program is
> Fluent; the other is an in-house turbo-machinery code. My experiences so
> far have led me to believe Fluent is much more sensitive to the
> network's performance than the in-house program. Thus my inquiry into a
> higher performance network.
>
> > We have done lots of tuning bits for customers where the issues wound up
> > being something else than what they had thought. It is worth at least
> > looking into for your code/problems, and identifying the bottleneck (if
> > you haven't already done so).
>
> Do you have more information on this 'tuning for customers'? I am
> interested in your results. Again, any suggestions on how to go about
> this are welcome.
>
> Thanks, Chris

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
