[Beowulf] ethernet bonding performance comparison "802.3ad" vs Adaptive Load Balancing
Rahul Nabar rpnabar at gmail.com
Wed Sep 17 14:18:39 PDT 2008
On Wed, Sep 17, 2008 at 4:05 PM, Eric Thibodeau <kyron at neuralbs.com> wrote:
> Well, apart from the fact that ssh is compressed, as Digo pointed out, and that 47 MB/sec is probably your HDD's transfer capacity, as Shannon pointed out, also keep in mind your bus's capacity ( http://en.wikipedia.org/wiki/List_of_device_bandwidths is a nice list). So, unless you've got both NICs on PCI-E (or independent PCI channels, which I've only heard of in high-end Compaq servers with hotswap PCI interfaces) you're saturating your bus.

Thanks for all those responses, guys!

Eric, I'll check my bus speed; my servers are not very high end. These are Dell PowerEdge 1435s. But after I first posted I ran a few more diagnostics.

As Shannon suggested earlier, I gave netperf a shot. The funny result is this:

- If I run one netperf from machine A to B, I get only 1 Gbps.
- If I start two netperfs on A and both talk to B, each gets 0.5 Gbps. The aggregate is still 1 Gbps.
- BUT if I start two netperfs on A and one talks to B and the other to C, each gets 1 Gbps. The aggregate out of A is 2 Gbps [desired result].
- In that last situation, if I disable one link, I fall back to 0.5 Gbps each.

So this is my (almost) perfect situation. It forces me to conclude that I am _not_ disk-, bus-, or I/O-limited. What do you think?

The sad thing, though, is this: I could never get a peer-to-peer mode (A talks to B alone) that would give me a 2 Gbps aggregate. This is frustrating. These are 8-CPU nodes, and frequently even a 16-CPU job will span only 2 compute nodes. If I cannot use both eth cards in that case, it seems an awful waste of capacity.

Just think about this: if two processes talk from A to B, I get a 1 Gbps aggregate. But if I have two processes and route one of them through a passive forwarding machine C (thus A-to-B and A-to-C-to-B), I end up with an aggregate of 2 Gbps. This seems a very strange, non-intuitive, and undesirable outcome of the current bonding setup, I feel.
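The four netperf observations above are consistent with a simple picture in which all traffic to one destination is pinned to one physical link. Here is a toy Python model of that picture. It is only a sketch: the function name, the 1 Gbps per-link capacity, the round-robin assignment of destinations to links, and the equal sharing among flows on a link are all assumptions made for illustration, not a description of what the bonding driver actually does internally.

```python
from collections import Counter

LINK_GBPS = 1.0  # assumed capacity of each bonded link


def per_flow_gbps(flow_dests, active_links=2):
    """Toy model: every flow to the same destination shares one link.

    Destinations are assigned to links round-robin in order of first
    appearance (an assumption), and the flows that end up on the same
    link split its capacity equally.
    """
    link_of = {}
    for dest in flow_dests:
        if dest not in link_of:
            link_of[dest] = len(link_of) % active_links
    # Count how many flows landed on each link.
    load = Counter(link_of[dest] for dest in flow_dests)
    return [LINK_GBPS / load[link_of[dest]] for dest in flow_dests]


if __name__ == "__main__":
    print("A->B:", per_flow_gbps(["B"]))
    print("A->B twice:", per_flow_gbps(["B", "B"]))
    print("A->B and A->C:", per_flow_gbps(["B", "C"]))
    print("A->B and A->C, one link down:", per_flow_gbps(["B", "C"], active_links=1))
```

Under these assumptions the model gives the same per-flow figures as the measurements: two flows to the same peer share a single link, while flows to two different peers can use both.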
I might have to actually _force_ jobs to span more than two servers just to be able to use both my eth cards! Feels very strange to me.

I tried both modes 4 (802.3ad) and 6 (balance-alb). Rick Jones, the netperf maintainer, gave me a very promising suggestion: I might be able to modify my bonding hash algorithm so that it balances traffic coming from two different processes on the same node across both links. Currently I cannot. Has anybody else given this a shot?

I'm eager to hear any other comments people might have.

--
Rahul
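To make the hash suggestion concrete, here is a small Python sketch of two transmit hash formulas, simplified from how older Linux bonding documentation describes the "layer2" and "layer3+4" settings of the xmit_hash_policy option. Treat it as an illustration under assumptions: the MAC addresses, IP addresses, and port numbers are made up, the layer2 version looks only at the last MAC byte, and the real driver's computation may differ between kernel versions. As far as I know this option applies to the balance-xor and 802.3ad modes, not to balance-alb.

```python
import ipaddress


def layer2_hash(src_mac: bytes, dst_mac: bytes, n_slaves: int) -> int:
    """Simplified layer2 policy: (source MAC XOR destination MAC) mod slave count.

    Only the last byte of each MAC is used here (a simplification).
    """
    return (src_mac[-1] ^ dst_mac[-1]) % n_slaves


def layer34_hash(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 n_slaves: int) -> int:
    """Simplified layer3+4 policy: ports and low IP bits feed the hash."""
    ip_bits = (int(ipaddress.ip_address(src_ip))
               ^ int(ipaddress.ip_address(dst_ip))) & 0xFFFF
    return ((src_port ^ dst_port) ^ ip_bits) % n_slaves


# Hypothetical addresses for machines A and B.
MAC_A = bytes.fromhex("001122334401")
MAC_B = bytes.fromhex("001122334402")

# Two flows from A to B that differ only in the source port (made-up ports).
flows = [
    ("10.0.0.1", "10.0.0.2", 40000, 5001),
    ("10.0.0.1", "10.0.0.2", 40001, 5001),
]

# layer2 ignores ports, so every A->B flow maps to the same slave.
l2_slaves = [layer2_hash(MAC_A, MAC_B, 2) for _ in flows]
# layer3+4 includes ports, so the two flows can map to different slaves.
l34_slaves = [layer34_hash(*flow, 2) for flow in flows]

if __name__ == "__main__":
    print("layer2 slave per flow:  ", l2_slaves)
    print("layer3+4 slave per flow:", l34_slaves)
```

With a MAC-only hash, both flows between the same pair of hosts always pick the same slave, which matches the 1 Gbps ceiling seen for A-to-B traffic. A hash that includes ports can separate them, but only when the port numbers happen to hash differently, and any single flow still uses just one link. This only covers the sending side; which link the switch uses for traffic toward a host is decided by the switch's own policy.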
