Processor contention(?) and network bandwidth on AMD

Mon Apr 29 09:05:47 PDT 2002

This is probably in the category of "Yup, that's the way it is, deal with 
it", but, just in case anyone has any ideas, I'm throwing it out there.

In the course of testing the gigabit connection in a new server, I noticed 
that overloaded dual AMD systems take a big hit in network bandwidth.  I 
tested with ttcp, and all connections went through the same switch (HP 
ProCurve 2324).

As an example, the results for a Tiger MPX (S2466) based node with dual 
1900+s and using the integrated 3Com are:

unloaded:                                             11486.6 KB/real sec
2 matlab simulations:                                 10637.8 KB/real sec
2 matlab simulations and 2 SETI at homes (nice -19):   6645.4 KB/real sec

Ouch.  This is on RedHat 7.2 with kernel 2.4.9-31.
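
For scale, here is a quick sketch of the arithmetic (Python purely for 
illustration; the labels are my own shorthand, not ttcp output).  It puts 
the drop at roughly 7% with the two matlabs and roughly 42% with 
matlab+SETI:

```python
# Throughput figures copied from the ttcp runs above (KB/real sec).
# The dictionary keys are shorthand labels, not anything ttcp prints.
results = {
    "unloaded": 11486.6,
    "2 matlab": 10637.8,
    "2 matlab + 2 SETI (nice)": 6645.4,
}


def percent_drop(baseline, value):
    """Return how far value falls below baseline, as a percentage."""
    return 100.0 * (baseline - value) / baseline


baseline = results["unloaded"]
drops = {name: percent_drop(baseline, kb) for name, kb in results.items()}

for name, kb in results.items():
    print(f"{name:26s} {kb:9.1f} KB/s  {drops[name]:5.1f}% below unloaded")
```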

I eliminated every variable I could think of: I tried an S2462 (Thunder 
MP) based system, swapped in a PCI Intel eepro100 card for the built-in 
3Com, and upgraded to an almost vanilla (French Vanilla?) 2.4.18 kernel 
(the one from SGI's 1.1 XFS release).  All showed the same results (well, 
2.4.18 didn't show much of a drop with just the two matlabs, but 
bandwidth still collapsed with matlab+SETI).  The one Intel system I 
tested (dual PIII 933 on an i860) showed very little bandwidth drop with 
load, and no extra drop for an overload.

Any ideas?  Is there any way to fix this?  Or is the answer just to not 
run background nice jobs on cluster nodes?

Thanks.

-- 
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University