[Beowulf] EM64T Clusters

Wed Jul 28 16:07:18 PDT 2004

> We've just brought up a test stand with both PCI-X and PCI-E Infiniband
> host channel adapters.  Some very preliminary (and sketchy, sorry) test
> results which will be updated occasionally are available at:
> 
>    http://lqcd.fnal.gov/benchmarks/newib/

Interesting. The listed MPI latencies:
    * PCI Express: 4.5 microsec
    * PCI-X, "HPC Gold": 7.4 microsec
    * PCI-X, Topspin v2.0.0_531: 7.3 microsec

These seem kind of slow to me; I suspect it's mostly the nodes (not PCI-X).
I'm using dual Opterons, PCI-X, and "HPC Gold" and getting 4.7 us per hop
(131072 hops in 0.62 seconds):

compute-0-0.local compute-0-1.local 
size=    1, 131072 hops, 2 nodes in  0.62 sec (  4.7 us/hop)    826 KB/sec

My benchmark just does an MPI_Send<->MPI_Recv of a single integer,
increments it, and passes it along a circularly linked list of nodes.
What exact command-line arguments did you use with NetPIPE? I'd like to
compare results.

> The PCI Express nodes are based on Abit AA8 motherboards, which have x16
> slots.  We used the OpenIB drivers, as supplied by Mellanox in their
> "HPC Gold" package, with Mellanox InfiniHost III Ex HCAs.
> 
> The PCI-X nodes are a bit dated, but still capable.  They are based on
> SuperMicro P4DPE motherboards, which use the E7500 chipset.  We used
> Topspin HCAs on these systems, with either the supplied drivers or the
> OpenIB drivers.
> 
> I've posted NetPipe graphs (MPI, rdma, and IPoIB) and Pallas MPI
> benchmark results.  MPI latencies for the PCI Express systems were about

Are the raw results for your NetPIPE runs available?

> 4.5 microseconds; for the PCI-X systems, the figure was 7.3
> microseconds.  With Pallas, sendrecv() bandwidths peaked at
> approximately 1120 MB/sec on the PCI Express nodes, and about 620 MB/sec

My PCI-X nodes land roughly midway between those numbers:
# Benchmarking Sendrecv
#bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
 524288           80      1249.87      1374.87      1312.37       727.34
1048576           40      2499.78      2499.78      2499.78       800.07
2097152           20      4999.55      5499.45      5249.50       727.35

> I don't have benchmarks for our application posted yet but will do so
> once we add another pair of PCI-E nodes.

I have 10 PCI-X dual Opterons and should have 16 very soon, if you want
to compare InfiniBand + PCI-X on nodes that are closer to your PCI Express
nodes.

-- 
Bill Broadley
Computational Science and Engineering
UC Davis