Help with inconsistent network performance

Brendan Moloney moloney.brendan at gmail.com
Mon Dec 17 19:03:35 PST 2007

I have a cluster of 8 Linux machines connected with Gigabit
Ethernet (full duplex) to an HP ProCurve 2848 switch.  I am using the
machines to do interactive distributed rendering.  I have noticed that the
final gather stage (where the intermediate images from the render nodes are
sent back to the viewing node) has "hiccups" in the performance.  These
hiccups occur with as few as two render nodes, and become more common as I
add more render nodes.  With a 512x512 image the final gather usually takes
a few milliseconds for each frame, but when the hiccups occur it is more
like 200+ milliseconds.
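
To make the hiccups easier to pin down, the per-frame gather times can be
logged and the outliers picked out automatically.  The following is only a
minimal sketch: the function name find_hiccups, the factor of 10, and the
sample timings are all hypothetical placeholders, not measurements from my
cluster.

```python
from statistics import median

def find_hiccups(gather_ms, factor=10.0):
    """Return (frame_index, duration_ms) for frames far slower than typical.

    gather_ms: per-frame final gather durations in milliseconds.
    factor:    how many times the median counts as a hiccup (assumed value).
    """
    typical = median(gather_ms)
    return [(i, t) for i, t in enumerate(gather_ms) if t > factor * typical]

# Placeholder timings for illustration only (not real measurements).
samples = [3.1, 2.8, 3.4, 212.0, 3.0, 2.9, 205.5, 3.2]
hiccups = find_hiccups(samples)
print(hiccups)
```

Using the median rather than the mean keeps a handful of 200 ms frames from
dragging the baseline upward, so the threshold stays close to the normal
few-millisecond case.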

Since it is a full duplex switched network, there should not be any
collisions happening.  Since the image is less than 1 MB total, I don't
think I am saturating the switch.  I have checked the output of
/sbin/ifconfig and it reports zero errored packets.  At this
point I am really at a loss as to what is causing this.  Any input on things
to check would be greatly appreciated.
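
As a rough sanity check on the numbers above, here is a back-of-the-envelope
estimate of how long one frame should take on the wire.  It assumes 3 bytes
per pixel (uncompressed RGB) and an ideal 1 Gbit/s link with no protocol
overhead; both are assumptions for illustration, and the function name
transfer_ms is made up.

```python
def transfer_ms(width, height, bytes_per_pixel, link_bits_per_s=1e9):
    """Return (size_in_bytes, ideal_transfer_time_in_ms) for one image.

    Ignores Ethernet/IP/TCP header overhead and any switch latency, so the
    real time will be somewhat higher.
    """
    size_bytes = width * height * bytes_per_pixel
    millis = size_bytes * 8 / link_bits_per_s * 1e3
    return size_bytes, millis

# 512x512 image, assuming 3 bytes per pixel (RGB).
size, ms = transfer_ms(512, 512, 3)
print(f"{size} bytes, about {ms:.2f} ms at 1 Gbit/s")
```

Under these assumptions the image is 786432 bytes and needs roughly 6.3 ms,
which is in line with the few milliseconds I normally see and far below the
200+ milliseconds of a hiccup.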

