very high bandwidth, low latency manner?

Patrick Geoffray patrick at myri.com
Fri Apr 12 12:48:00 PDT 2002


Steffen Persvold wrote:

>>I figured the list cost on a 256 node system at about $2000 before, for
>>basic hardware.  I was wrong.  I reworked it and it is $1500 for
>>256 (and would be the same for 512 and 1024).
>>

> So what is wrong with my calculations:
>
> 256 node L9/2MB/133MHz config :
> Node cost                                 =   $2,195
> and for a L9/2MB/200MHz config :
> Node cost                                 =   $2,495

Nothing, it's right for 256 nodes. However:

128 nodes L9/133 MHz config:
Node cost                                   =   $1,595
128 nodes L9/200 MHz config:
Node cost                                   =   $1,895

For more than 128 ports, the number of switches increases to keep a
guaranteed full bisection, which adds about $500 per node. However, up
to 128 nodes, you need only one switch, and the numbers I gave are
correct.
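
As a rough sketch of the arithmetic above: the snippet below uses only the
four per-node list prices quoted in this thread. The function names are
hypothetical, and sizes other than "up to 128" or "exactly 256" are
deliberately left unpriced because no figure was quoted for them. With
these figures the multi-switch premium works out to $600 per node, in
line with the rough "about $500" above.

```python
# Per-node list prices quoted in this thread (USD), keyed by L9 clock in MHz.
SINGLE_SWITCH_PORTS = 128
PRICE_UP_TO_128 = {133: 1595, 200: 1895}   # one switch is enough
PRICE_AT_256 = {133: 2195, 200: 2495}      # extra switches for full bisection


def node_cost(nodes, clock_mhz):
    """Per-node cost for the cluster sizes that were actually quoted."""
    if nodes <= SINGLE_SWITCH_PORTS:
        return PRICE_UP_TO_128[clock_mhz]
    if nodes == 256:
        return PRICE_AT_256[clock_mhz]
    # Assumption: no price was given for other sizes, so refuse to guess.
    raise ValueError("no quoted price for this cluster size")


def total_cost(nodes, clock_mhz):
    """Total interconnect cost: node count times per-node cost."""
    return nodes * node_cost(nodes, clock_mhz)


def multi_switch_premium(clock_mhz):
    """Per-node difference between the 256-node and the <=128-node price."""
    return PRICE_AT_256[clock_mhz] - PRICE_UP_TO_128[clock_mhz]


if __name__ == "__main__":
    for clock in (133, 200):
        print(clock, total_cost(128, clock), total_cost(256, clock),
              multi_switch_premium(clock))
```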

The switchless cost model makes sense for configs larger than the
biggest switch size for switched technologies, i.e. 128 ports for
Quadrics and Myrinet. Surprisingly, the largest SCI cluster is, AFAIK,
132 nodes ;-)

> Now we have price comparisons for the interconnects (SCI,Myrinet and
> Quadrics). What about performance ? Does anyone have NAS/PMB numbers for
> ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132
> node Athlon 760MP based SCI cluster, and I guess also an 81 node PIII ServerWorks
> HE-SL based cluster).

Ok, I will say again what I think about these comparisons: it's already
hard to compare dollars (what about discounts, what about support, what
about software, etc.) even though they are the same dollars, so it's a
waste of time to do that for micro-benchmarks. It's something you do
when you want to publish something in a conference next to a beach.
When a customer asks me about performance, I don't give him my NAS or
PMB numbers; he doesn't care. He wants access to an XXX-node machine to
play with and run his set of applications, or he gives a list of codes
to the vendors for the bid and the vendors guarantee the results because
they are used officially in the bid process. If someone buys a machine
because the NAS numbers look pretty and his CFD code sucks, this guy
will take his stuff and look for a new job.

Do you spend time tuning NAS? I don't. People have already told me that
the NAS LU test sucks on MPICH-GM. Well, the LU algorithm in HPL is much
better. How many applications behave like the NAS LU, how many like
HPL? If a customer comes to me because his code behaves like NAS LU, I
will tell him what to tune in his code to be more efficient.

The pitfall with benchmarks is that you want to tune your MPI
implementation to look good on them. In the real world, you cannot
expect to run a code efficiently on a machine without tuning it,
especially with MPI.

My 2 pennies

Patrick

----------------------------------------------------------
|   Patrick Geoffray, Ph.D.      patrick at myri.com
|   Myricom, Inc.                http://www.myri.com
|   Cell:  865-389-8852          685 Emory Valley Rd (B)
|   Phone: 865-425-0978          Oak Ridge, TN 37830
----------------------------------------------------------



