[vortex] RX overrun with 3COM 3c982

Altmeyer, Klaus Klaus.Altmeyer@fujitsu-siemens.com
Fri Nov 15 07:03:01 2002


Hello,

I found your discussion about RX overruns in the vortex mailing list
archive and would like to share my experience with this problem.
Last week I heard from two of our customers that the application LS-DYNA
performs worse on their new systems with Tyan K7X boards than on the old
K7 boards. I ran my own tests on a comparable machine in our lab and could
reproduce the problem with LS-DYNA. Like Claude Pignol, I saw many RX
overruns and poor performance with the Pallas parallel benchmark, PMB-MPI1.
We then installed additional Ethernet cards, Intel Ethernet Pro 100, in
these nodes. The additional cards are not configured; they are only visible
in lspci. But their presence alone makes the onboard Ethernet controllers
(3Com 3c920, recognized as 3c982 by the 3c59x driver) work well.
You can see this, for instance, with the Pallas parallel benchmark, PMB-MPI1:

bad performance without EPRO100:

# Benchmarking Exchange
# ( #processes = 8 )
# ( 8 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000       149.60       150.00       149.79         0.00
            1         1000       149.28       149.43       149.33         0.03
            2         1000       148.36       148.74       148.59         0.05
            4         1000       150.08       150.35       150.22         0.10
            8         1000       150.04       150.25       150.14         0.20
           16         1000       193.92       194.18       194.08         0.31
           32         1000       157.35       157.70       157.56         0.77
           64         1000       159.61       159.77       159.68         1.53
          128         1000       172.81       172.94       172.86         2.82
          256         1000       200.61       200.87       200.73         4.86
          512         1000       253.52       253.81       253.71         7.70
         1024         1000       367.81       368.32       368.06        10.61
         2048         1000     34844.21     35062.39     35007.50         0.22
         4096         1000      6631.40      6841.29      6762.29         2.28
         8192         1000      4875.81      4878.23      4877.10         6.41
        16384         1000      4586.20      4593.15      4589.59        13.61
        32768         1000     12002.10     12011.81     12007.88        10.41
        65536          640     24314.41     24369.84     24350.05        10.26
       131072          320     48162.97     48244.25     48211.81        10.36
       262144          160     97332.64     97555.27     97420.75        10.25
       524288           80    190721.71    191918.25    191334.69        10.42
      1048576           40    387557.02    392875.10    389964.55        10.18
      2097152           20    777556.00    803620.95    791587.59         9.95
      4194304           10   1583550.40   1670719.20   1623545.42         9.58

much better performance with additional EPRO100:

# Benchmarking Exchange
# ( #processes = 8 )
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000       144.80       145.06       144.93         0.00
            1         1000       145.90       146.33       146.11         0.03
            2         1000       145.84       146.04       145.92         0.05
            4         1000       146.38       146.69       146.53         0.10
            8         1000       146.57       146.74       146.68         0.21
           16         1000       148.03       148.28       148.14         0.41
           32         1000       153.61       153.93       153.77         0.79
           64         1000       156.81       157.14       156.96         1.55
          128         1000       170.30       170.49       170.41         2.86
          256         1000       199.65       199.85       199.74         4.89
          512         1000       250.55       250.88       250.70         7.79
         1024         1000       366.00       366.26       366.13        10.67
         2048         1000       533.38       533.99       533.68        14.63
         4096         1000       759.47       759.82       759.65        20.56
         8192         1000      1484.48      1485.36      1484.90        21.04
        16384         1000      3127.70      3130.31      3129.32        19.97
        32768         1000      8971.88      8980.04      8976.02        13.92
        65536          640     25064.83     25110.18     25088.63         9.96
       131072          320     47286.53     47398.26     47351.32        10.55
       262144          160     96968.72     97259.74     97141.55        10.28
       524288           80    193817.61    195332.65    194691.25        10.24
      1048576           40    395990.98    402471.30    399729.34         9.94
      2097152           20    760975.35    783215.70    771369.33        10.21
      4194304           10   1489239.60   1578893.70   1539617.64        10.13
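
As a cross-check, the Mbytes/sec column in both tables is consistent with
4 * #bytes / t_max, with 1 Mbyte taken as 2^20 bytes. That formula is
inferred from the numbers above rather than taken from the PMB
documentation, so treat it as an assumption. A minimal Python sketch that
recomputes the column for the rows around the 2048-byte drop and shows how
far apart the two runs are (the function name and the choice of rows are
illustrative only):

```python
# Sketch: recompute the Mbytes/sec column and compare the two runs.
# Assumption (inferred from the tables, not from PMB documentation):
# throughput = 4 * #bytes / t_max, with 1 Mbyte = 2**20 bytes.

MBYTE = 2 ** 20


def exchange_mbytes_per_sec(nbytes, t_max_usec):
    """Throughput in Mbytes/sec for one row of the Exchange table."""
    if nbytes == 0:
        return 0.0
    bytes_per_sec = 4 * nbytes / (t_max_usec * 1e-6)
    return bytes_per_sec / MBYTE


# (#bytes, t_max without EPRO100, t_max with EPRO100), copied from the tables
rows = [
    (1024, 368.32, 366.26),
    (2048, 35062.39, 533.99),
    (4096, 6841.29, 759.82),
    (8192, 4878.23, 1485.36),
    (16384, 4593.15, 3130.31),
]

for nbytes, t_without, t_with in rows:
    without = exchange_mbytes_per_sec(nbytes, t_without)
    with_card = exchange_mbytes_per_sec(nbytes, t_with)
    print(f"{nbytes:>6} bytes: {without:6.2f} -> {with_card:6.2f} Mbytes/sec "
          f"(t_max ratio {t_without / t_with:.1f}x)")
```

Up to 1024 bytes the two runs are nearly identical; the gap opens at 2048
bytes and has closed again by 65536 bytes.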

This is OK, and the application LS-DYNA now also runs as expected. The only
difference is the additional Ethernet card; I made no changes to the
operating system or other software. In my opinion, the additional
Ethernet card changes the behaviour of the PCI bus so that the onboard
Ethernet controllers are handled much better. I hope this helps to find out
more.

Best regards

Klaus Altmeyer
High Performance Computing
Fujitsu Siemens Computers GmbH
Siegenstraße 17
51427 Bergisch Gladbach
    
Telefon:		+ 49 (0) 2204 961710
Fax: 		+ 49 (0) 2204 961720
Mobile:		+ 49 (0) 170 9158 047
Email:	  	mailto:klaus.altmeyer@fujitsu-siemens.com
Internet:      	www.fujitsu-siemens.de/hpc