[Beowulf] Help with inconsistent network performance
Joe Landman landman at scalableinformatics.com
Tue Dec 18 09:32:36 PST 2007
Hi Brendan:

Brendan Moloney wrote:
> I have a cluster of 8 Linux machines connected with gigabit
> ethernet (full duplex) to a HP Procurve 2848 switch. I am using the
> machines to do interactive distributed rendering. I have noticed that the
> final gather stage (where the intermediate images from the render nodes are
> sent back to the viewing node) has "hiccups" in the performance. These

How are they sent? NFS? Sockets? ...

> hiccups occur with as few as two render nodes, and become more common as I
> add more render nodes. With a 512x512 image the final gather usually takes
> a few milliseconds for each frame, but when the hiccups occur it is more
> like 200+ milliseconds.

Is this "real time" rendering, so that frame rate is the most important aspect?

> Since it is a full duplex switched network, there should not be any
> collisions happening. Since the image is less than 1 MB total, I don't

There could be blocking ... if one unit grabs the single network pipe of the display node while another node tries to send data, then the late node will back off (well, with TCP it will) in a pre-determined manner.

> think I am saturating the switch. I have checked the contents of
> /sbin/ifconfig and there are zero erroneous packets being reported. At this

You wouldn't see it there. It would be on the switch, and even then the switch wouldn't term it a collision. It is a switch behaving normally.

> point I am really at a loss as to what is causing this. Any input on things
> to check would be greatly appreciated.

I assume you have a single gigabit link from the display node to the switch. As you scale up the number of render nodes, you notice more of these "hiccups", scaling about linearly with the number of nodes. This suggests resource contention.

Each 512x512 image would be fragmented into about 175 1500-byte packets (262144 bytes / 1500). This assumes 8 bit images. If you are using 8 bits per color, 3 colors and an alpha channel, then this is ~700 packets.
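The packet counts above can be checked with a few lines of Python. This is only a sketch: the function name `packets_per_image` is made up for illustration, and it assumes, as the estimate in the message does, that every packet carries a full 1500 bytes of image data with protocol headers ignored.

```python
import math

def packets_per_image(width, height, bytes_per_pixel, packet_bytes):
    """Number of packets needed to carry one uncompressed image.

    Assumes each packet is filled entirely with image data
    (headers are ignored, as in the estimate in the message).
    """
    image_bytes = width * height * bytes_per_pixel
    return math.ceil(image_bytes / packet_bytes)

# 512x512 image, 8 bits per pixel, 1500-byte packets
print(packets_per_image(512, 512, 1, 1500))

# 512x512 image, 3 x 8 bit color channels + 1 alpha channel
print(packets_per_image(512, 512, 4, 1500))
```

The first call gives 175 and the second gives 700, matching the figures quoted in the message.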
Each 1500 byte packet takes about 11us to transmit, and has a non-trivial latency associated with it. I will estimate the latency at 30us (this is switch latency of ~5us + network stack latency on each side of about 12.5us). So for each packet, you have about 41us to transfer it. If you have 8 bit images, then this corresponds to 7.2 ms (175 packets at 41us each). There may be some other caching effects that I am missing, or I may have mis-computed. For 32 bits (3x 8 bit color channels + 1 alpha channel), this is looking like 28.8 ms for each image. Best case you could do with this is about 34.7 frames per second.

If, on the other hand, you used jumbo frames with 9000 byte packets, you would need 30 packets to transfer each 8 bit image. Each packet would require 67.1us to move, and still 30us of latency, for 97.1us per packet. For 30 packets, this is 2.9ms. For the 32 bit version as indicated previously (3x 8 bit color channels and one alpha channel) this would be about 11.6ms, or 85.9 frames per second.

Based on this, I would suggest seeing if changing the MTU to 9000 helps:

	ifconfig eth0 mtu 9000

on all your nodes (every one). The argument for this is that you have less latency to pay for, even though it takes longer to transfer each payload.

Another possibility is channel bonding on your display node.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
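The per-packet arithmetic in the message above can be reproduced with a short Python sketch. The function names are hypothetical, and the inputs (11us and 67.1us of transmit time, 30us of latency, 175 and 30 packets) are taken directly from the message rather than measured. Like the message, the model charges the full latency to every packet and scales the 8 bit result by 4 for the 32 bit case.

```python
def image_transfer_ms(packets, transmit_us, latency_us=30.0):
    """Time to move one image, charging transmit time plus latency
    to every packet (the serial model used in the message)."""
    return packets * (transmit_us + latency_us) / 1000.0

def frames_per_second(image_ms):
    """Best-case frame rate if each frame needs image_ms to transfer."""
    return 1000.0 / image_ms

# 8 bit 512x512 image: figures quoted in the message
std_8 = image_transfer_ms(packets=175, transmit_us=11.0)    # 1500-byte packets
jumbo_8 = image_transfer_ms(packets=30, transmit_us=67.1)   # 9000-byte packets

# 32 bit image (3 color channels + alpha): four times the data
std_32 = 4 * std_8
jumbo_32 = 4 * jumbo_8

for label, ms in [("1500 MTU,  8 bit", std_8), ("1500 MTU, 32 bit", std_32),
                  ("9000 MTU,  8 bit", jumbo_8), ("9000 MTU, 32 bit", jumbo_32)]:
    print(f"{label}: {ms:6.2f} ms  ->  {frames_per_second(ms):6.1f} frames/s")
```

The results land within rounding of the figures in the message (roughly 7.2 ms and 2.9 ms for 8 bit images, and about 35 versus 86 frames per second for 32 bit images); small differences come from where intermediate values are rounded.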
