[Beowulf] tcp error: Need ideas!

Paulo Afonso Lopes pal at di.fct.unl.pt
Sat Jan 24 13:57:38 PST 2009


> I wonder if the switch could be implicated.  We have seen some (cheap)
> GbE switches that do not support jumbo frames in practice, regardless
> of what their literature claims.

I got the SMC 8624T because it advertised both jumbo frames and link
aggregation. Is this one of the "cheap" switches you have seen that does
not work with jumbo frames?
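
One way to test the switch independently of its literature would be a
don't-fragment ping between two nodes with a payload sized to the MTU
under test. The sketch below only builds the command; it assumes plain
IPv4 with no IP options (20-byte IP header plus 8-byte ICMP header) and
the Linux iputils `ping`, and the host name is hypothetical.

```python
# Assumption: IPv4 without options, ICMP echo (iputils ping on Linux).
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def ping_payload_for_mtu(mtu):
    """Largest ICMP payload that fits in one unfragmented frame at this MTU."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError("MTU too small to carry an ICMP echo")
    return mtu - overhead


def jumbo_ping_command(host, mtu, count=3):
    """Build a ping command that forbids fragmentation (-M do)."""
    return ["ping", "-M", "do", "-s", str(ping_payload_for_mtu(mtu)),
            "-c", str(count), host]


if __name__ == "__main__":
    # "node02" is a hypothetical host name.
    print(" ".join(jumbo_ping_command("node02", 4500)))
```

If such a ping succeeds at MTU=1500 but fails at 3000 or 4500 through
the switch, that would point at the switch (or a NIC in the path) rather
than at the application.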

paulo


> Nifty Tom Mitchell wrote:
>> On Sat, Jan 24, 2009 at 09:36:09AM -0600, Gerry Creager wrote:
>>> Couple of follow-up notes.
>>>
>>> MTU=4500:  Had one node fall over with the same overflow errors.
>>> MTU=3000:  A WRF model is running, but single timesteps are executing
>>> 2.5x slower than MTU=1500
>
> Segmentation offload?  Is TSO on or off?
>
> 	ethtool -k eth0
>
> will tell you.  You might also have one very reluctant machine, in the
> sense of being unwilling to switch its MTU.  Could you do an
>
> 	ifconfig eth0 | grep MTU
>
> on each machine and verify that everyone is using the right MTU?
>
>
>>>
>>> I'll go snag the new driver and compile it.  After all: What can it
>>> hurt!
>>>
>>> Thanks, Guy!
>>>
>>> Regards, Gerry
>>>
>>> Guy Coates wrote:
>>>> Hi,
>>>>
>>>> We have also seen problems with the bnx2 drivers.
>>>>
>>>> I got a more recent set of bnx2 drivers from Broadcom:
>>>>
>> ......
>>
>> Have the packets been snooped to see whether everything
>> is as expected?
>>
>> If you are seeing a natural MTU running faster than a jumbo MTU
>> then something is fragmenting or causing fragmentation of the data.
>>
>> If MTU=4500 causes overflow errors, it might be related to
>> fragmentation.
>> Both the sender and receiver have to keep all the bits on a reliable
>> transfer until the data has been acknowledged.   At one time,
>> fragmentation could only be done once, to a minimum MTU, in the life
>> of a packet.
>>
>> In addition to snooping packets, try "tracepath" to and from all
>> the involved boxes to discover what is going on.
>>
>>
>
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
>         http://jackrabbit.scalableinformatics.com
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>
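
The per-machine MTU check suggested in the quoted message could be
scripted roughly as follows. This is a minimal sketch: it reads the MTU
from the standard Linux sysfs file instead of parsing `ifconfig` output,
and the node names and collected values are hypothetical placeholders
for whatever is gathered from each machine.

```python
from pathlib import Path


def read_local_mtu(iface="eth0", sysfs_root="/sys/class/net"):
    """Return the MTU of a local interface as reported by Linux sysfs."""
    return int(Path(sysfs_root, iface, "mtu").read_text().strip())


def find_mtu_mismatches(mtu_by_host, expected):
    """Return {host: mtu} for every host whose MTU differs from expected."""
    return {host: mtu for host, mtu in mtu_by_host.items() if mtu != expected}


if __name__ == "__main__":
    # Hypothetical values, as if collected from each node in the cluster.
    collected = {"node01": 4500, "node02": 4500, "node03": 1500}
    for host, mtu in sorted(find_mtu_mismatches(collected, 4500).items()):
        print(f"{host}: MTU {mtu}, expected 4500")
```

Any host reported here would be the "reluctant machine" and a candidate
source of fragmentation.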


-- 
Paulo Afonso Lopes                        | Tel: +351- 21 294 8536
Departamento de Informática               | 294 8300 ext.10763
Faculdade de Ciências e Tecnologia        | Fax: +351- 21 294 8541
Universidade Nova de Lisboa               | e-mail: pal at di.fct.unl.pt
2829-516 Caparica, PORTUGAL





