Kidger's comments on Quadric's design and performance
Joachim Worringen joachim at lfbs.RWTH-Aachen.DE
Tue Apr 23 01:01:18 PDT 2002
Richard Fryer wrote:
>
> On Fri, 19 Apr 2002 14:06:00 +0100
> Daniel Kidger <Daniel.Kidger at quadrics.com> wrote:
>
> > after all as well as having the fastest line-speed, the Quadrics
> > interconnect sends all data as virtual addresses (the NIC has its
> > own MMU and TLB). That way any process can read and write
> > the memory of any other node without any CPU overhead.
>
> I appreciate getting a bit of technical detail on Quadrics interfaces. Is
> there a web location that might provide more information - comparative
> benchmarks or protocol information or ???

Of course www.quadrics.com, and Fabrizio Petrini is doing a lot of evaluation work (http://www.c3.lanl.gov/~fabrizio, esp. http://www.c3.lanl.gov/~fabrizio/quadrics.html).

> This message also reminded me to ask if a long-held opinion is valid - and
> that opinion is "that a cache coherent interconnect would offer performance
> enhancement when applications are at the 'more tightly coupled' end of the
> spectrum." I know that present PCI based interfaces can't do that without
> invoking software overhead and latencies. Anyone have data - or an argument
> for invalidating this opinion?

You would need a programming model other than MPI for that (see below), maybe OpenMP, since you basically have the characteristics of an SMP system with a cc-NUMA architecture.

> I did recently read that the AMD 'HyperTransport' interfaces ARE capable of
> cache coherent transactions. This would appear to allow protocols (such as
> SCI) that support cache coherence to operate in that mode. But I wonder if
> it matters to the MPI world. Seems to me that it would be a factor in
> improving scalability (providing that other interconnect issues such as
> bandwidth bottlenecks don't prevent it). My recollection is that the SCI
> simulations I saw required very little added traffic to maintain coherency.

This is true (for an introduction, see http://www.SCIzzL.com/HowSCIcohWorks.html).
However, for MPI, cache coherence would not really add a performance benefit. MPI is designed to be efficient with "write-only" protocols. One-sided communication may benefit from it, but other techniques like Cray SHMEM do the same without cache coherence. And I do not expect anybody except AMD or chipset designers to design network adapters / bus bridges for something proprietary like HyperTransport...

> Also a brief note about the Dolphin product line, since the issue of link
> saturation has come up: - they DO also sell switches - or at least offer
> them. And if you check the SCI specification, you'll see that there are
> some elaborate discussions of fabric architectures that the protocol
> supports and switches enable. What I DO NOT know is if the SCALI software
> supports switch-based operation, and also don't know what the impact is on
> the system cost per node. My 'inexperienced' assessment of the appeal in
> the Dolphin family is that you can start without the switch and later add it
> if the performance benefit warrants. That's what I'd say if I were selling
> them anyway - and didn't know otherwise. :-)

The "external" switches are not designed for large-scale HPC applications (although they scale quite well within the range of their supported number of nodes), but for high-performance, high-availability small-scale cluster or embedded applications, such as those Sun sells. With external switches, you don't have to do anything to keep the network up if a node fails (and also nothing when it comes back, as SCI is not source-routed). In torus topologies, re-routing needs to be applied to bypass bad nodes (Scali does this on-the-fly).

Scali does not support external switches AFAIK (at least it doesn't sell such systems any longer), which is less a technical issue than a design issue, as the topology is fully transparent to the nodes accessing the network (they did use switches in the past, see http://www.scali.com/whitepaper/ehpc97/slide_9.html).
For large-scale applications, distributed switches as in torus topologies scale better and are more cost-efficient (see http://www.scali.com/whitepaper/scieurope98/scale_paper.pdf and other resources). With switches, you need *a lot* of cables and switches (which doesn't stop Quadrics from doing so, resulting in an impressive 14 miles of cables for a recent system (IIRC), with single cables being up to 25m in length). It would need to be verified whether such a system built with a Quadrics-like fat-tree topology using Dolphin's 8-port switches would scale better than the equivalent torus topology for different communication patterns. I doubt it. At the very least, the interconnect would cost a lot more (at least twice as much, or even more depending on the dimension of the tree).

SCI-MPICH can be used with arbitrary SCI topologies (because it uses the SISCI interface and thus runs with Scali or Dolphin SCI drivers). It is not as closely coupled to the SCI drivers as ScaMPI is.

 Joachim

--
| _ RWTH| Joachim Worringen
|_|_`_ | Lehrstuhl fuer Betriebssysteme, RWTH Aachen
| |_)(_`| http://www.lfbs.rwth-aachen.de/~joachim
|_)._)| fon: ++49-241-80.27609 fax: ++49-241-80.22339
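
To make the cable-and-switch argument above a bit more concrete, here is a rough counting sketch in Python. It is not taken from any Quadrics, Dolphin or Scali document. It assumes the fat tree is a fully populated k-ary n-tree built from 8-port switches (4 ports down, 4 up, so k = 4), and that a torus is built from rings of at least 3 nodes with one cable per ring hop and no external switches. The function names `torus_cables` and `fat_tree` are made up for this illustration, and the sketch only counts parts; it says nothing about cable length, price, or performance under a given communication pattern.

```python
from math import prod


def torus_cables(dims):
    """Cable count for a torus with the given ring sizes, e.g. (4, 4, 4).

    Assumption: every ring has at least 3 nodes, so a ring of k nodes
    needs exactly k cables. No external switches are involved.
    """
    if any(k < 3 for k in dims):
        raise ValueError("this sketch only handles rings of 3 or more nodes")
    nodes = prod(dims)
    # Each dimension has nodes // k rings of k cables each, i.e. `nodes`
    # cables per dimension.
    return sum((nodes // k) * k for k in dims)


def fat_tree(nodes, radix=8):
    """Return (levels, switches, cables) for the smallest fully populated
    k-ary n-tree (k = radix // 2) that can hold `nodes` end nodes.

    Assumption: counts are for the full tree, even if fewer end nodes
    than its capacity are actually attached.
    """
    k = radix // 2
    levels = 1
    while k ** levels < nodes:
        levels += 1
    switches = levels * k ** (levels - 1)  # k^(n-1) switches per level
    cables = levels * k ** levels          # k^n links below each level
    return levels, switches, cables
```

Under these assumptions, 64 nodes need 192 cables as a 4x4x4 torus, while `fat_tree(64)` gives 3 levels, 48 switches and 192 cables. For 256 nodes, a 16x16 torus needs 512 cables, while `fat_tree(256)` gives 4 levels, 256 switches and 1024 cables. The torus count grows linearly with the node count for a fixed dimension, whereas the tree adds a level (and its switches and cables) each time the node count grows by a factor of 4.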
More information about the Beowulf mailing list
