[Beowulf] Three notes from ISC 2006
Kevin Ball kball at pathscale.com
Wed Jun 28 11:41:59 PDT 2006
Patrick,

Thank you for the rapid and thoughtful response.

On Wed, 2006-06-28 at 11:23, Patrick Geoffray wrote:
> Hi Kevin,
>
> Kevin Ball wrote:
> > Patrick,
> >
> >>
> >> From your flawed white papers, you compared your own results against
> >> numbers picked from the web, using older interconnect with unknown
> >> software versions.
> >
> > I have spent many hours searching to try to find application results
> > with newer Myrinet and Mellanox interconnects. I would be thrilled (and
> > I suspect others might as well, but I'm only speaking for myself) if you
> > would take these white papers as a challenge and publish application
> > results with the latest and greatest hardware and software.
>
> Believe it or not, but I really want to do that. I don't think it's
> appropriate to compare results from other vendors though: in Europe,
> it's forbidden to do comparative advertisement (i.e. the soap X washes
> more white than the brand Y) and I completely agree with the rationale.

This is interesting. I did not know this; I do see the rationale,
though it complicates things as well: knowing what performance is good
and what is not is very difficult without something to compare to.

>
> However, there is nothing wrong with publishing application numbers
> versus plain Ethernet for example, and let people put curves side by
> side if they want. Or submit to application specific websites like
> Fluent. I will do that as soon as I get a decent sized cluster for me (I
> have a lot of small ones of various cpus/chipsets but my 64 nodes
> cluster is getting old). Time is also a big problem right now, but I
> will have more manpower in a couple of months. Time is really the
> expensive resource.

I agree on the time point, and appreciate the difficulty in finding
both time and a large enough cluster to be interesting.

>
> Most integrators have their own testbed and they do comparisons, but you
> will never get these results, and even if you could have them, you could
> not publish them.
>
> Recently, I have been thinking about something that you may like. With
> motherboards with 4 good PCIE slots coming on the market (driven by
> SLI and such), it could be doable to have a reasonably sized machine,
> let's say 64 nodes, with 4 different interconnects in it. If Intel or
> AMD (or any good will) would donate the nodes, and the interconnect
> vendors would donate NICs + switch + cables, and an academic or
> governmental entity would volunteer to host it, you could have a testbed
> accessible by people to do benchmarking. The deal would be: you can use
> the test bed but you have to allow your benchmark code to be available
> to everyone and the code will be run on all interconnects and the
> results public.

I like this idea a lot. In fact, I've been pushing for us to get such a
cluster internally, but again the time and money questions come into
play, and an internal cluster has similar problems to the integrator
testbeds you mention.

I have two large concerns. One is that a software stack that works with
the latest interconnect products may or may not correlate well with
what end users are interested in. For some protocols (particularly MPI)
this doesn't seem to make a huge difference, though we have seen some
effect. For others (particularly TCP/IP), there is a humongous
difference between different Linux kernels and distros. Depending on
what software was decided upon, this might bias either for or against a
solution that does TCP offload, as compared to one that uses the
standard Linux stack.

The second concern is keeping up with N different release cycles in
terms of having things at the latest stable software version, and
firmware version (for products with firmware), and hardware...
and how this would interact with the above question of having a single
supported software stack.

So in short... yes, I like the idea a lot, and I think it could
potentially get us into a better place than we are now in terms of
vendors and customers knowing how things compare. However, there are
potential difficulties, and without doing more research I don't know
how much of a limitation they would put on the final result.

> What do you think of that ?

I'd support such an effort... I do wonder what would happen in terms of
marketing and/or vendor support if a situation like the last 3 years of
AMD/Intel were to arise for interconnects. If some vendors became
clearly technically inferior, would they withdraw support of the
project?

-Kevin

>
> Patrick
