Tweaking the yellowfin

Jason Holmes jholmes@psu.edu
Fri Apr 23 12:02:58 1999


Greetings.

I'm currently borrowing four Yellowfin cards and a Packet Engines FDR, and
I'm experiencing a few problems with them.

First off, let me explain our current setup.  We have an 8-node Beowulf
cluster (really 8 compute nodes plus a login/controlling node) up and
running as a prototype for testing things before we invest in a large
cluster.  Before we added the gnics, the controlling node had two
100 Mbit Ethernet cards (Intel EtherExpress Pros: one going out to the
'real' world and one going to the private 10.* network that the nodes
are on).  Each node had one 100 Mbit Ethernet card (a 3Com Vortex or
Boomerang; I'm not sure which).  The compute nodes run Linux 2.2.3 and
the controlling node runs Linux 2.2.6-ac1.

Since we had access to only four gnics, we put one in the controlling
node and one in each of three compute nodes, without removing any of the
NICs that were already installed.  We then wanted to run some MPI
benchmarks (PingPong and PingPing, to begin with) over the gnics only.
To make that possible, I subnetted the 10.* network so that the 100 Mbit
cards and the gnics were on different subnets: anything going to
10.0.0.x went through the 100 Mbit cards, and anything going to 10.0.1.x
went through the gnics.  Each compute node with a gnic in it therefore
had two IP addresses, one on 10.0.0.x and one on 10.0.1.x.
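The subnet split above amounts to a simple destination-based routing
decision.  Here is a minimal Python sketch of that decision.  It assumes
/24 netmasks (the post gives only "10.0.0.x" and "10.0.1.x"), and the link
labels are made-up placeholders, not real interface names.  It only
models the choice; it does not configure anything:

```python
import ipaddress

# Assumed /24 netmasks; the post only says "10.0.0.x" and "10.0.1.x".
# The labels are hypothetical placeholders, not actual device names.
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/24"), "fast-ethernet (100 Mbit)"),
    (ipaddress.ip_network("10.0.1.0/24"), "gnic (gigabit)"),
]


def pick_interface(dest: str) -> str:
    """Return the label of the link that traffic to `dest` would use."""
    addr = ipaddress.ip_address(dest)
    for net, label in ROUTES:
        if addr in net:
            return label
    raise ValueError(f"{dest} is not on either private subnet")


for host in ("10.0.0.3", "10.0.1.3"):
    print(host, "->", pick_interface(host))
```

In the real cluster the kernel routing table makes this choice.  Because
the two /24 ranges are disjoint, the first-match lookup here gives the
same answer a longest-prefix match would.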

So far, so good; everything seemed to be working.  I then ran PingPong
and found, to my dismay, that it topped out at 11 MB/s (the 100 Mbit
cards maxed out at 9 MB/s).  PingPing ran fine until the packet size
reached 128 bytes; then it died and never recovered.
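To put those numbers in perspective, here is the arithmetic as a small
Python helper.  It assumes "MB/s" means 10^6 bytes per second and compares
against nominal line rates of 100 and 1000 Mbit/s, ignoring Ethernet
framing overhead:

```python
# Assumption: "MB/s" means 10**6 bytes/s; nominal line rates are used and
# framing overhead is ignored.
def fraction_of_line_rate(mbytes_per_s: float, line_mbit_per_s: float) -> float:
    """Fraction of the nominal link rate achieved by a measured throughput."""
    return (mbytes_per_s * 8) / line_mbit_per_s


print(f"gnic:     {fraction_of_line_rate(11, 1000):.1%}")  # -> gnic:     8.8%
print(f"100 Mbit: {fraction_of_line_rate(9, 100):.1%}")    # -> 100 Mbit: 72.0%
```

In other words, the gnics delivered under a tenth of gigabit line rate,
barely more than the 100 Mbit cards, which reached about 72% of theirs.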

Furthermore, the gnics on all of the nodes intermittently give the
following error:
eth2: Oversized Ethernet frame spanned multiple buffers, status 1400!

And one more: the second 100 Mbit card on the controlling node is saying
this:
eth1: Ethernet frame overran the Rx buffer, status e0008220!

I've looked through /proc/interrupts and /proc/pci, and there don't seem
to be any I/O or interrupt conflicts.  I read on the Packet Engines site
that, in their benchmarking on NT, they increased the TCP window size to
64K and saw a notable improvement, but I can't find a simple way to
change this in Linux (through the proc filesystem, I mean; maybe there's
a variable in the kernel to change, but I haven't had the chance to look
yet).
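One per-application route is to request larger socket buffers, since the
TCP window a socket can advertise is bounded by its receive buffer.  Here
is a minimal Python sketch using the standard SO_SNDBUF/SO_RCVBUF socket
options.  It assumes a program that opens its own TCP sockets; an MPI
library would have to do this internally on its sockets, and this sketch
assumes nothing about whether a given MPI exposes such a setting.  The
64 KB figure comes from the Packet Engines NT note.  On current Linux
kernels the per-socket ceiling is /proc/sys/net/core/rmem_max (and
wmem_max):

```python
import socket

# Assumption: 64 KB buffers, mirroring the 64K window in the NT tuning note.
WINDOW_BYTES = 64 * 1024


def make_tuned_socket(window: int = WINDOW_BYTES) -> socket.socket:
    """Create a TCP socket and request larger send/receive buffers.

    Set these before connect()/listen(): the window scale option is
    negotiated during the handshake.  An unprivileged request is capped by
    the kernel (on Linux, by net.core.rmem_max / wmem_max).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, window)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window)
    return s


sock = make_tuned_socket()
# Linux reports about double the requested size (it reserves bookkeeping space).
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```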

So, to wrap up this wordy email (sorry...): does anyone have any
recommendations for making these things work better?  Is there an archive
of the yellowfin mailing list somewhere that I could read?  Are there any
other resources you would recommend?

Oh, and I'm not on the list, so if you could be sure to email any
responses to me as well, I'd be grateful.

Thanks,
Jason Holmes