[Beowulf] Help with inconsistent network performance

Tue Dec 18 15:27:22 PST 2007

As has been pointed out to me offline, my numbers may be a bit more 
pessimistic than needed, in part due to pipelining and other effects.  If my 
numbers were the result of a correct analysis, the most you would be 
able to see from a gigabit link would be about 37 MB/s for 1500 byte 
packets.  This is obviously not the case, so assume this to be a "worst 
case" analysis  (and I am going to go back and review what I seem to 
have dropped from the TCP bits).
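
For reference, the ~37 MB/s ceiling follows directly from the serial model in the quoted message below: if every 1500 byte packet pays the full 41us (11us to transmit plus 30us of latency) and nothing overlaps, throughput is simply packet size divided by per-packet time. A minimal Python sketch of that arithmetic, using only the estimates quoted below (the function name is made up for illustration):

```python
def serial_throughput_mb_s(packet_bytes, wire_us, latency_us):
    """MB/s when every packet pays wire time plus full latency, with no overlap."""
    per_packet_s = (wire_us + latency_us) * 1e-6
    return packet_bytes / per_packet_s / 1e6

# Estimates taken from the quoted message: 11us transmit, 30us latency.
worst_case = serial_throughput_mb_s(1500, wire_us=11.0, latency_us=30.0)

# Raw gigabit line rate in MB/s, ignoring framing and header overhead.
line_rate = 1e9 / 8 / 1e6

print(f"serial model: {worst_case:.1f} MB/s, raw line rate: {line_rate:.0f} MB/s")
```

The gap between the two figures is what pipelining recovers: when packets are in flight back to back, the latency is paid roughly once per burst rather than once per packet.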

Joe

Joe Landman wrote:
> Hi Brendan:
> 
> Brendan Moloney wrote:
>> I have a cluster of 8 Linux machines connected with gigabit
>> ethernet (full duplex) to a HP Procurve 2848 switch.   I am using the
>> machines to do interactive distributed rendering.  I have noticed that 
>> the
>> final gather stage (where the intermediate images from the render 
>> nodes are
>> sent back to the viewing node) has "hiccups" in the performance.  These
> 
> How are they sent?  NFS? Sockets? ...
> 
>> hiccups occur with as few as two render nodes, and become more common 
>> as I
>> add more render nodes.  With a 512x512 image the final gather usually 
>> takes
>> a few milliseconds for each frame, but when the hiccups occur it is more
>> like 200+ milliseconds.
> 
> Is this "real time" rendering so that frame rate is the most important 
> aspect?
> 
>> Since it is a full duplex switched network, there should not be any
>> collisions happening.  Since the image is less than 1 MB total, I don't
> 
> There could be blocking ...  if one unit grabs the single network pipe 
> of the display node while another node tries to send data, then the 
> late node will back off (well, with TCP it will) in a pre-determined manner.
> 
>> think I am saturating the switch.  I have checked the contents of
>> /sbin/ifconfig and there are zero erroneous packets being reported.  
>> At this
> 
> You wouldn't see it there.  It would be on the switch, and even then it 
> wouldn't term it a collision.  It is a switch behaving normally.
> 
>> point I am really at a loss as to what is causing this.  Any input on 
>> things
>> to check would be greatly appreciated.
> 
> I assume you have a single gigabit link from the display node to the 
> switch.  As you scale up the number of render nodes, you notice more of 
> these "hiccups", scaling about linearly with the number of nodes.
> 
> This suggests resource contention.  At 8 bits per pixel, each 512x512 
> image is 262144 bytes, or 175 1500-byte packets.  If you are using 8 
> bits per color, 3 colors and an alpha channel, then this is ~700 
> packets.  Each 1500 byte packet takes about 11us to transmit, and has a 
> non-trivial latency associated with it.  I will estimate the latency at 
> 30us (this is switch latency of ~ 5us + network stack latency on each 
> side of about 12.5us).  So for each packet, you have about 41us to 
> transfer it.   If you have 8 bit images, then this corresponds to 7.2 
> ms.  There may be some other caching effects that I am missing, or 
> mis-computed.  For 32 bits (3x 8bit color channels + 1 alpha channel), 
> this is looking like 28.8 ms for each image.  Best case you could do 
> with this is about 34.7 frames per second.
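
The 1500 byte estimate above can be reproduced with a few lines of Python. This is only a sketch of the same worst-case arithmetic: it takes the 11us and 30us figures as given, ignores header overhead, and assumes each packet pays the full latency with no overlap. The function and constant names are invented for illustration.

```python
import math

WIDTH = HEIGHT = 512   # image size from the original question
PACKET_BYTES = 1500    # bytes per packet, header overhead ignored
WIRE_US = 11.0         # quoted transmit time per 1500 byte packet
LATENCY_US = 30.0      # quoted switch + network stack latency per packet

def image_transfer(bytes_per_pixel):
    """Return (packet count, transfer time in ms) for one image."""
    packets = math.ceil(WIDTH * HEIGHT * bytes_per_pixel / PACKET_BYTES)
    return packets, packets * (WIRE_US + LATENCY_US) / 1000.0

for bpp in (1, 4):     # 8 bit image, then 3 color channels + alpha
    packets, ms = image_transfer(bpp)
    print(f"{bpp} byte(s)/pixel: {packets} packets, {ms:.1f} ms, "
          f"{1000.0 / ms:.1f} frames/s")
```

The 4 byte case comes out marginally below the 28.8 ms quoted above, since that figure was obtained by multiplying the already rounded 7.2 ms by four.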
> 
> If, on the other hand, you used jumbo frames with 9000 byte packets, you 
> would need 30 packets to transfer each image, each requiring 67.1us to 
> move plus the same 30us of latency, for 97.1us per packet.  For 30 
> packets, this is 2.9ms.  For the 32 bit version as indicated previously 
> (3x 8 bit color channels, and one alpha channel) this would be about 
> 11.6ms.  Or 85.9 frames per second.
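
To put the two cases side by side, here is a small Python sketch comparing the standard and jumbo frame estimates under the same worst-case assumptions. The per-packet wire times (11us and 67.1us) and the 30us latency are the figures quoted above, not measurements, and the 32 bit case is obtained by scaling the single-channel result by four, as the estimates above do. All names are hypothetical.

```python
import math

IMAGE_BYTES = 512 * 512   # one 8 bit channel of a 512x512 image
LATENCY_US = 30.0         # quoted per-packet latency

# packet size in bytes -> quoted wire time per packet in microseconds
QUOTED_WIRE_US = {1500: 11.0, 9000: 67.1}

def per_image_ms(packet_bytes, channels=1):
    """Worst-case transfer time in ms, every packet paying full latency."""
    packets = math.ceil(IMAGE_BYTES / packet_bytes)
    per_packet_us = QUOTED_WIRE_US[packet_bytes] + LATENCY_US
    return channels * packets * per_packet_us / 1000.0

standard = per_image_ms(1500, channels=4)
jumbo = per_image_ms(9000, channels=4)
print(f"mtu 1500: {standard:.1f} ms   mtu 9000: {jumbo:.1f} ms   "
      f"ratio: {standard / jumbo:.2f}x")
```

In this model the gain comes almost entirely from paying the 30us latency about 30 times per channel instead of 175 times; the total time spent on the wire is nearly unchanged.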
> 
> Based on this, I would suggest seeing if changing mtu to 9000 helps.
> 
>     ifconfig eth0 mtu 9000
> 
> on all your nodes (every one).
> 
> The argument for this is that you pay the per-packet latency fewer 
> times, even though each packet takes longer to transfer.
> 
> Another possibility is channel bonding on your display node.
> 
> 

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615