very high bandwidth, low latency manner?
Markus Fischer
mfischer at mufasa.informatik.uni-mannheim.de
Wed Apr 17 10:33:43 PDT 2002
On Tue, 16 Apr 2002, Håkon Bugge wrote:
>1) Performance.
>
>Performance transparency is always a goal. Nevertheless, sometimes an
>implementation will have a performance bug. The two organizations owning
>the mentioned systems, have both support agreements with Scali. I have
>checked the support requests, but cannot find any request where your
>incidents were reported. We find this fact strange if you truly were aiming
>at achieving good performance. We are happy to look into your application
>and report findings back to this news group.
I don't think we have a performance bug. We have developed
a real-world application with frequent communication and
have tested and run it on multiple systems.
We do not intend to modify our algorithms just to
get better performance on one particular system.
If people need vendor help to gain performance on a
particular system, then that platform is not a target
for us; the tuning has to be something I can do by
myself, which is what we did.
Also, not all codes are in the public domain, which makes
the previous point even more important.
>2) Startup time.
>
>You attribute the bad scalability to high startup time and mapping of
>memory. This is an interesting hypothesis; and can easily be verified by
No, I said that with larger numbers of nodes (I would like to talk
about >100, but here I mean more than 16) the scalability is limited:
the fraction of time spent in communication increases significantly,
and the speedup values decrease beyond a certain number of nodes. And
yes, the startup time also increases, which I assumed to be caused
by the SCI mechanisms for exporting and mapping memory.
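The behaviour I am describing (communication share growing, speedup flattening and
then dropping) can be illustrated with a toy strong-scaling model. To be clear,
this is a made-up sketch, not a measurement from our application; the constants
(100 s serial time, 0.05 s of communication overhead per node) are invented:

```python
# Toy strong-scaling model (illustrative only; not our actual application).
# Assumption: compute time divides by N, while per-process communication
# overhead grows linearly with N (more peers to exchange data with).

def model_time(n_nodes, t_serial=100.0, t_comm_per_node=0.05):
    """Predicted runtime on n_nodes under the toy model."""
    return t_serial / n_nodes + t_comm_per_node * n_nodes

def speedup(n_nodes):
    return model_time(1) / model_time(n_nodes)

def comm_fraction(n_nodes, t_comm_per_node=0.05):
    return (t_comm_per_node * n_nodes) / model_time(n_nodes)

for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"N={n:3d}  speedup={speedup(n):6.2f}  comm={comm_fraction(n):5.1%}")
```

Under this model the speedup peaks somewhere in the mid-40s of nodes and then
actually decreases, while the communication fraction keeps growing, which is
the qualitative pattern we saw.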
>using a switch when you start the program, and measure the difference
>between the elapsed time of the application and the time it uses after
>MPI_Init() has been called. However, the startup time measured on 64-nodes,
>two processors per node, where all processes have set up mapping to all
>other processes, is nn second. If this contributes to bad scalability, your
>application has a very short runtime.
I certainly think that scalability has nothing to do with startup time,
and I have just re-checked my earlier posting on this.
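That said, the measurement itself is easy to do. A minimal stand-in sketch in
plain Python (a sleep takes the place of MPI_Init(); in the real C code you
would timestamp on entry to main() and again right after MPI_Init() returns):

```python
import time

def measure_phases(init_fn, work_fn):
    """Split total wall-clock time into a 'startup' (init) phase and a
    'work' phase. init_fn stands in for MPI_Init() here; work_fn stands
    in for the application's compute/communicate loop."""
    t0 = time.monotonic()
    init_fn()          # startup: process launch, memory export/mapping
    t1 = time.monotonic()
    work_fn()          # the actual computation and communication
    t2 = time.monotonic()
    return {"startup": t1 - t0, "work": t2 - t1, "total": t2 - t0}

# Stand-in phases: 50 ms of "init", 100 ms of "work".
phases = measure_phases(lambda: time.sleep(0.05), lambda: time.sleep(0.10))
print(phases)
```

If the startup share dominates the total, the run was too short to say anything
about scalability; if it does not, startup time is not the explanation.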
>
>3) SCI ring structure
>
>You state that on a multi user, multi-process environment, it is hard to
>get deterministic performance numbers. Indeed, that is true. True sharing
>of resources implies that. Whether the resource is a file-server, a memory
>controller, or a network component, you will probably always be subject to
>performance differences. Also, lack of page coloring will contribute to
My point is that when running on a dedicated partition of a cluster,
I would not like to see a significant impact from other applications
when their communication increases, nor would I like to disturb,
say, my advisor's application in turn.
>different execution times, even for a sequential program. You further
>indicate that performance numbers reported f. ex. by Pallas PMB benchmark
>only can be used for applying for more VC. I disagree for two reasons;
>first, you imply that venture capitalists are naive (and to some extent
>stupid). That is not my impression, merely the opposite. Secondly, such
>numbers are a good example to verify/deny your hypothesis that the SCI ring
>structure is vulnerable to traffic generated by other applications. PMB's
>*multi* option is architected to investigate exactly the problem you
>mention; Run f. ex. MPI_Alltoall() on N/2 of the machine. Then measure how
>performance is affected when the other N/2 of the machine is also running
>Alltoall(). This is the reason we are interested in comparative performance
>numbers to SCI based systems. It is to me strange, that no Pallas PMB
>benchmark results have ever been published for a reasonably sized system
>based on alternative interconnect technologies. To quote Lord Kelvin: "If
>you haven't measured it, you don't know what you're talking about".
>
>As a bottom line, I would appreciate that initiatives to compare cluster
>interconnect performance should be appreciated, rather than be scrutinized
>and be phrased as "only usable to apply for more VC".
>
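For reference, the *multi* experiment described above can be reduced to a toy
fair-sharing model: a shared link whose capacity is split evenly among competing
flows. The link bandwidth and message size below are invented round numbers,
not SCI or PMB measurements:

```python
def alltoall_time(bytes_per_proc, procs, link_bw, background_flows=0):
    """Toy model of the PMB 'multi' experiment: `procs` processes do an
    all-to-all over a shared link of capacity link_bw (bytes/s), while
    `background_flows` flows from the other half of the machine compete
    for the same link."""
    flows = procs + background_flows
    per_flow_bw = link_bw / flows      # assume fair sharing of the link
    return bytes_per_proc / per_flow_bw

# Half the machine alone vs. both halves running at once.
alone = alltoall_time(1_000_000, procs=32, link_bw=300e6)
contended = alltoall_time(1_000_000, procs=32, link_bw=300e6,
                          background_flows=32)
print(alone, contended)
```

Under perfectly fair sharing the contended run simply takes twice as long; any
deviation from that factor in a real measurement is exactly the ring-structure
effect being argued about.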
What, then, is the goal of having marketing statements in a .signature
which do not hold in general?
There is also the public-domain SCI-MPICH, to which, judging from the
published papers, the same statement applies.
Markus
>
>H
>At 11:40 AM 4/15/02 +0200, Markus Fischer wrote:
>>Steffen Persvold wrote:
>> >
>> > Now we have price comparisons for the interconnects (SCI,Myrinet and
>> > Quadrics). What about performance ? Does anyone have NAS/PMB numbers for
>> > ~144 node Myrinet/Quadrics clusters (I can provide some numbers from a 132
>> > node Athlon 760MP based SCI cluster, and I guess also a 81 node PIII
>> ServerWorks
>> > HE-SL based cluster).
>>
>>yes, please.
>>
>>I would like to get/see some numbers.
>>I have run tests with SCI for a non linear diffusion algorithm on a 96 node
>>cluster with 32/33 interface. I thought that the poor
>>scalability was due to the older interface, so I switched to
>>a SCI system with 32 nodes and 64/66 interface.
>>
>>Still, the speedup values were behaving like a dog with more than 8 nodes.
>>
>>Especially, the startup time will reach minutes which is probably due to
>>the exporting and mapping of memory.
>>
>>Yes, the MPI library used was Scampi. Thus, I think the
>>(marketing) numbers you provide
>>below are not relevant except for applying for more VC.
>>
>>Even worse, we noticed, that the SCI ring structure has an impact on the
>>communication pattern/performance of other applications.
>>This means we only got the same execution time if other nodes were
>>idle or did not run communication-intensive applications.
>>How will you determine the performance of the algorithm you just invented
>>in such a case ?
>>
>>We then used a 512 node cluster with Myrinet2000. The algorithm scaled
>>very fine up to 512 nodes.
>>
>>Markus
>>
>> >
>> > Regards,
>> > --
>> > Steffen Persvold | Scalable Linux Systems | Try out the world's best
>> > mailto:sp at scali.com | http://www.scali.com | performing MPI
>> implementation:
>> > Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 -
>> > Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS
>> latency
>> >
>> > _______________________________________________
>> > Beowulf mailing list, Beowulf at beowulf.org
>> > To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
>--
>Håkon Bugge; VP Product Development; Scali AS;
>mailto:hob at scali.no; http://www.scali.com; fax: +47 22 62 89 51;
>Voice: +47 22 62 89 50; Cellular (Europe+US): +47 924 84 514;
>Visiting Addr: Olaf Helsets vei 6, Bogerud, N-0621 Oslo, Norway;
>Mail Addr: Scali AS, Postboks 150, Oppsal, N-0619 Oslo, Norway;
>
>