very high bandwidth, low latency manner?

Wed Apr 17 10:33:43 PDT 2002

On Tue, 16 Apr 2002, Håkon Bugge wrote:

>1) Performance.
>
>Performance transparency is always a goal. Nevertheless, sometimes an 
>implementation will have a performance bug. The two organizations owning 
>the mentioned systems, have both support agreements with Scali. I have 
>checked the support requests, but cannot find any request where your 
>incidents were reported. We find this fact strange if you truly were aiming 
>at achieving good performance. We are happy to look into your application 
>and report findings back to this news group.

I don't think we have a performance bug. We have developed
a real world application using frequent communication and
have tested/run it on multiple systems.

We do not intend to modify our algorithms to try
to get better performance on a particular system.

If people need outside help to get performance on a
particular system, then that platform is no longer a target
for me unless I can do the tuning myself, which we
did.
Not all codes are public domain (PD), which makes the previous
point even more important.

>2) Startup time.
>
>You attribute the bad scalability to high startup time and mapping of 
>memory. This is an interesting hypothesis; and can easily be verified by 

No, I said that with larger numbers of nodes (I would like to talk
about >100, but here I mean more than 16) the scalability is limited:
the time spent in communication increases significantly, and speedup
values decrease after a certain number of nodes. And yes,
the startup time also increases, which I thought was caused
by the SCI mechanisms of exporting/mapping memory.
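
A minimal sketch of how the scalability claim above could be put into numbers, assuming wall-clock times have been collected for several node counts. The timings, `speedup_table` and `peak_nodes` are hypothetical names and values chosen for illustration; they are not measurements from any of the clusters discussed here.

```python
# Hypothetical wall-clock times in seconds per node count; these numbers are
# invented for illustration and are NOT measurements from this thread.
timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0, 16: 14.0, 32: 20.0}


def speedup_table(timings):
    """Return (nodes, speedup, efficiency) rows relative to the smallest run."""
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    rows = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        # Efficiency: achieved speedup divided by the ideal (linear) speedup.
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, speedup, efficiency))
    return rows


def peak_nodes(timings):
    """Node count with the highest speedup; beyond it, adding nodes hurts."""
    return max(speedup_table(timings), key=lambda row: row[1])[0]


if __name__ == "__main__":
    for nodes, speedup, efficiency in speedup_table(timings):
        print(f"{nodes:4d} nodes  speedup {speedup:6.2f}  efficiency {efficiency:5.2f}")
    print("speedup peaks at", peak_nodes(timings), "nodes")
```

With real timings in place of the invented ones, a peak well below the largest node count is what "speedup values decrease after a certain number of nodes" looks like in a table.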

>using a switch when you start the program, and measure the difference 
>between the elapsed time of the application and the time it uses after 
>MPI_Init() has been called. However, the startup time measured on 64-nodes, 
>two processors per node, where all processes have set up mapping to all 
>other processes, is nn second. If this contributes to bad scalability, your 
>application has a very short runtime.

I certainly think that scalability has nothing to do with startup time.
And I just checked my earlier posting on this.
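
The measurement suggested in the quoted text (total elapsed time versus time used after MPI_Init() has returned) can be turned into a startup share with a few lines. This is only a sketch: `startup_share` is a hypothetical helper, and the two input times are assumed to come from whatever timing the launcher and the application already provide.

```python
def startup_share(total_elapsed_s, after_init_s):
    """Split a run into startup and post-MPI_Init() time.

    total_elapsed_s: wall-clock time of the whole job (assumed measured
                     outside the application, e.g. by the launcher).
    after_init_s:    time from the return of MPI_Init() to the end of the run.
    Returns (startup_seconds, startup_fraction_of_total).
    """
    if total_elapsed_s <= 0 or not 0 <= after_init_s <= total_elapsed_s:
        raise ValueError("expected 0 <= after_init_s <= total_elapsed_s, total > 0")
    startup_s = total_elapsed_s - after_init_s
    return startup_s, startup_s / total_elapsed_s


if __name__ == "__main__":
    # Invented example values, not measurements from this thread.
    startup_s, fraction = startup_share(600.0, 480.0)
    print(f"startup {startup_s:.0f} s, {fraction:.0%} of the run")
```

If speedup is then computed from the post-MPI_Init() times only, the startup cost drops out of the scalability numbers entirely, which keeps the two questions separate.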
>
>3) SCI ring structure
>
>You state that on a multi user, multi-process environment, it is hard to 
>get deterministic performance numbers. Indeed, that is true. True sharing 
>of resources implies that. Whether the resource is a file-server, a memory 
>controller, or a network component, you will probably always be subject to 
>performance differences. Also, lack of page coloring will contribute to 

I think that when running on a dedicated partition of a cluster,
my application should not be significantly affected when other
applications increase their communication, nor would I like to
slow down my advisor's application.

>different execution times, even for a sequential program. You further 
>indicate that performance numbers reported f. ex. by Pallas PMB benchmark 
>only can be used for applying for more VC. I disagree for two reasons; 
>first, you imply that venture capitalists are naive (and to some extent 
>stupid). That is not my impression, merely the opposite. Secondly, such 
>numbers are a good example to verify/deny your hypothesis that the SCI ring 
>structure is volatile to traffic generated by other applications. PMB's 
>*multi* option is architected to investigate exactly the problem you 
>mention; Run f. ex. MPI_Alltoall() on N/2 of the machine. Then measure how 
>performance is affected when the other N/2 of the machine is also running 
>Alltoall(). This is the reason we are interested in comparative performance 
>numbers to SCI based systems. It is to me strange, that no Pallas PMB 
>benchmark results have ever been published for a reasonably sized system 
>based on alternative interconnect technologies. To quote Lord Kelvin: "If 
>you haven't measured it, you don't know what you're talking about".
>
>As a bottom line, initiatives to compare cluster 
>interconnect performance should be appreciated, rather than scrutinized 
>and phrased as "only usable to apply for more VC".
>

What, then, is the goal of having marketing statements in a
.signature if they cannot be applied in general?

There is also the public-domain SCI-MPICH, to which, judging from
the papers I have read, the same statement applies.

Markus
>
>H
>At 11:40 AM 4/15/02 +0200, Markus Fischer wrote:
>>Steffen Persvold wrote:
>> >
>> > Now we have price comparisons for the interconnects (SCI,Myrinet and
>> > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for
>> > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132
>> > node Athlon 760MP based SCI cluster, and I guess also an 81 node PIII ServerWorks
>> > HE-SL based cluster).
>>
>>yes, please.
>>
>>I would like to get/see some numbers.
>>I have run tests with SCI for a non linear diffusion algorithm on a 96 node
>>cluster with 32/33 interface. I thought that the poor
>>scalability was due to the older interface, so I switched to
>>a SCI system with 32 nodes and 64/66 interface.
>>
>>Still, the speedup values were behaving like a dog with more than 8 nodes.
>>
>>Especially, the startup time will reach minutes which is probably due to
>>the exporting and mapping of memory.
>>
>>Yes, the MPI library used was Scampi. Thus, I think the
>>(marketing) numbers you provide
>>below are not relevant except for applying for more VC.
>>
>>Even worse, we noticed, that the SCI ring structure has an impact on the
>>communication pattern/performance of other applications.
>>This means we only got the same execution time if other nodes were
>>idle or did not have communication intensive applications.
>>How will you determine the performance of the algorithm you just invented
>>in such a case ?
>>
>>We then used a 512 node cluster with Myrinet2000. The algorithm scaled
>>very fine up to 512 nodes.
>>
>>Markus
>>
>> >
>> > Regards,
>> > --
>> >   Steffen Persvold   | Scalable Linux Systems |   Try out the world's best
>> >  mailto:sp at scali.com |  http://www.scali.com  | performing MPI implementation:
>> > Tel: (+47) 2262 8950 |   Olaf Helsets vei 6   |      - ScaMPI 1.13.8 -
>> > Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY   | >320MBytes/s and <4uS latency
>> >
>> > _______________________________________________
>> > Beowulf mailing list, Beowulf at beowulf.org
>> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
>--
>Håkon Bugge; VP Product Development; Scali AS;
>mailto:hob at scali.no; http://www.scali.com; fax: +47 22 62 89 51;
>Voice: +47 22 62 89 50; Cellular (Europe+US): +47 924 84 514;
>Visiting Addr: Olaf Helsets vei 6, Bogerud, N-0621 Oslo, Norway;
>Mail Addr:  Scali AS, Postboks 150, Oppsal, N-0619  Oslo, Norway;
>
>