[Beowulf] RDMA NICs and future beowulfs
Vincent Diepeveen diep at xs4all.nl
Mon Apr 25 16:26:46 PDT 2005
At 06:02 PM 4/25/2005 -0400, Mark Hahn wrote:
>> Would anyone on this list have pointers to
>> which network cards on market support
>> RDMA (Remote Direct Memory Access)?
>
>ammasso seems to have real products. afaikt, you link with their
>RDMA-enabled MPI library and get O(15) microsecond latencies.
>to me, it's hard to see why this would be worth writing home...
>
>> Would anyone have hands on experience
>> with performance, usability, and cost aspects
>> of this new RDMA technology?
>
>they work, but it's very unclear where their natural niche is.
>if you want high bandwidth, you don't want gigabit.
>if you want low latency, you don't want gigabit, even RDMA-gigabit.
>
>the truth is that reducing networking overhead is always a bit of a
>hard sell. consider the appeal of avoiding context switches -
>it sounds great, right? how much of that appeal is based on the
>(mistaken) impression that context switches are expensive?
>
>lower CPU overhead also sounds great, but it made a lot more sense
>5-10 years ago when systems had .3 GB/s memory bandwidth (<5% of today).

Not really. I prefer shipping 100 MB/s to the other side of the copper wire and receiving 100 MB/s. In practice I ship 40 MB or so a second and receive the same, so that's 80 MB/s in total. On gigabit that costs a full CPU, simply because it eats all its bandwidth.

With newer machines, new memory controllers and so on, I prefer shipping more than that. That needs newer NICs, and 10 Gb NICs that use the CPU for the transfer will also be too slow then.

The problem is simply always there, on every system. So the need for a high-end network is *always* there.

>afaikt, gigabit-RDMA folks are really pinning their hopes on 10 Gb.

CPUs and memory bandwidth will also be a lot bigger by then, causing the CPU to eat more data, which again gives the same problem...

In reality the bandwidth/latency hunger gets even bigger in the future, when the Cell type processors arrive.
Correct me if I'm wrong: it really needs a branch prediction table for my branch-intensive integer code, but even then such a processor is kicking butt. I mean 8 helper processing units (SPEs) on 1 CPU plus a main PowerPC processor. For floating point that's something like 250 Gflop or so at their disposal in practice. That *really* will make the network the weakest link.

Now the chess program I have (www.diep3d.com) is just integer code loaded with 100k+ branches or so, growing each year in artificial intelligence logic. What mankind, well at least the top programmers, has learned is that the only way to approach human nature is by creating a vast amount of logic rules and putting them in a single program; the whole will then behave 'semi intelligent' if you combine it with a huge search that goes through billions of possibilities with that complex logic.

So obviously the Cell processor is kind of a step back for such software, but even then we can see a single 4.0 GHz Cell performing probably like an 8-processor 2.8 GHz Xeon MP machine. That means immense speedups even for software not intended to run on such processors. Intel and AMD obviously need an answer to the awesome processing power the Cell approach promises.

Note that the ideal would be a processor with, say, 8 cores or so and 512 KB of cache (256 KB is a bit tiny) for each processing element. Of course a tiny branch prediction table of, say, 2048 entries will do miracles. An 18-cycle misprediction penalty is no problem, as long as there is a branch prediction table of some size.

I hope those Cell monsters get cheap and available for everyone; that will simply FORCE the other manufacturers to produce their own 8-core CPUs :)

In either case, the data that 1 processor generates and wants to communicate to the world will increase incredibly thanks to the additional processing power, and networks only just barely keep up with it. Whatever good physical reasons there are for that being the case, it only makes networks a weaker link.

Just look at processors. In 2003 I ran on 500 MHz MIPS processors, each delivering 1 Gflop (Origin 3800, 512 processors, www.sara.nl). Soon I jump from that 1 Gflop to 0.3 Tflop: a factor 300 increase in processing power within a few years.

In theory, take a node with 2 'high-end' Cell processors, each delivering 0.5 Tflop and eating a bit more power than the 50-80 watt estimated for the 4 GHz version. Delivering 1 Tflop per node in total, it will generate quite some data in a small box.

Correct me if I'm wrong: it will generate 4 terabytes of data a second when talking about matrix multiplications. How are beowulfs in the future going to stream that away over the network?
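
As a rough back-of-the-envelope check of the numbers above, here is a small Python sketch. It assumes 4 bytes of output per floating point operation (that is what 1 Tflop giving 4 terabytes a second implies; the post does not state it) and ideal link rates with no protocol overhead. The function names and the list of links are made up for illustration.

```python
# Back-of-the-envelope sketch. Only the 1 Tflop/s per node figure comes
# from the post; everything else below is an assumption.

NODE_FLOPS = 1.0e12        # 1 Tflop/s per node (2 Cell CPUs at 0.5 Tflop/s each)
BYTES_PER_FLOP = 4         # assumption: one 32-bit result per operation

# Assumed ideal link rates in bits per second, ignoring protocol overhead.
LINKS_BITS_PER_SEC = {
    "gigabit ethernet": 1.0e9,
    "10 Gb ethernet": 10.0e9,
}


def data_rate_bytes(flops, bytes_per_flop):
    """Bytes per second produced if every operation emits bytes_per_flop bytes."""
    return flops * bytes_per_flop


def shortfall_factor(data_bytes_per_sec, link_bits_per_sec):
    """How many times more data is produced than one link can carry."""
    link_bytes_per_sec = link_bits_per_sec / 8.0
    return data_bytes_per_sec / link_bytes_per_sec


if __name__ == "__main__":
    rate = data_rate_bytes(NODE_FLOPS, BYTES_PER_FLOP)
    print(f"node produces {rate / 1e12:.1f} TB/s")
    for name, bits in LINKS_BITS_PER_SEC.items():
        print(f"{name}: {shortfall_factor(rate, bits):,.0f}x too slow")
```

Under these assumptions a single node would out-produce one gigabit link by a factor of about 32,000 and a 10 Gb link by about 3,200. This is a worst case: a real matrix multiplication keeps most intermediate results local and ships far less than every result over the wire.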
