[Beowulf] Register article on Cray Cascade
diep at xs4all.nl
Fri Nov 9 12:40:52 PST 2012
That's not how fast you can get the data at each core.
The benchmark I wrote is actually a reflection of how a hashtable
works for game tree search in general.
The speedup of it is exponential, so by doing it a different way we
can PROVE (as in mathematical proof)
that you will have trouble getting the same exponent.
So practical testing of what you can achieve from core to core is what counts.
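In the spirit of that benchmark, here is a minimal sketch (mine, not Vincent's actual code; all names and sizes are made up) of measuring memory latency with every core busy. It uses dependent reads through a shuffled table, so each read determines the next index and the loads cannot be overlapped or prefetched:

```python
import multiprocessing as mp
import random
import time

# Kept tiny for the sketch; a real run needs a table far larger than all caches.
TABLE_SIZE = 1 << 16

def worker(table, n_reads, out, idx):
    # Pointer-chasing: each read decides the next index, so reads serialize.
    pos = idx
    t0 = time.perf_counter()
    for _ in range(n_reads):
        pos = table[pos]
    out[idx] = (time.perf_counter() - t0) / n_reads  # avg seconds per read

if __name__ == "__main__":
    # A random permutation makes the table one big cycle of dependent loads.
    perm = list(range(TABLE_SIZE))
    random.shuffle(perm)
    table = mp.Array("i", perm, lock=False)
    n_procs = mp.cpu_count()  # the whole point: measure with ALL cores busy
    out = mp.Array("d", n_procs, lock=False)
    procs = [mp.Process(target=worker, args=(table, 10_000, out, i))
             for i in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("avg ns/read per process:", [round(v * 1e9, 1) for v in out])
```

Note that Python interpreter overhead dominates the absolute numbers here; the sketch only illustrates the structure (dependent reads, all cores loaded at once), which is what separates this kind of measurement from a single-core lab figure.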
The first disappointment then comes with the new Opteron cores:
AMD has designed
a memory controller which just doesn't scale if you use all cores.
Joel Hruska ran some tests there (not sure where he posted them).
We see that the Bulldozer-type architecture still scales OK if
you run benchmarks on a single core.
Sure, no really good latency, but still...
Yet if you move from measuring with 4 processes to measuring with 8
processes on a chip, we already land at nearly 200 ns, which is really slow.
The same effect happens when you run a big supercomputer at full
throttle with all cores.
Manufacturers can claim whatever they want, but it is always paper math.
If they ever release something it's some sort of single-core number,
whereas that box didn't get ordered to run single core in the first place.
You don't want the performance of a single core in a lab at
temperatures near 0 Kelvin;
you want to see that the box you got performs like that with all
cores running :)
And on the numbers posted you already start losing at Cray, starting
with the actual CPUs, which suck when you use all cores.
On Nov 9, 2012, at 8:38 PM, atchley tds.net wrote:
> Vincent, it is easy to measure.
> 1. Connect two NICs back-to-back.
> 2. Measure latency
> 3. Connect machines to switch
> 4. Measure latency
> 5. Subtract (2) from (4)
> That is how we did it at Myricom and that is how we do it at ORNL.
> Try it sometime.
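That procedure boils down to one subtraction; a trivial sketch with made-up placeholder numbers (purely illustrative, not real measurements from Myricom or ORNL):

```python
def switch_latency(back_to_back_us, through_switch_us):
    """Per-switch latency: (machine-switch-machine) minus (NIC-to-NIC)."""
    return through_switch_us - back_to_back_us

# Hypothetical example values:
nic_to_nic = 1.20   # step 2: two NICs cabled back-to-back (us)
via_switch = 1.35   # step 4: the same pair through one switch (us)
print(switch_latency(nic_to_nic, via_switch))  # the switch's contribution, in us
```

The appeal of the method is that everything except the switch (NIC firmware, PCIe traversal, software stack) appears in both measurements and cancels out in the subtraction.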
> On Fri, Nov 9, 2012 at 2:36 PM, Vincent Diepeveen <diep at xs4all.nl> wrote:
> On Nov 9, 2012, at 7:31 PM, atchley tds.net wrote:
> Modern switches need 100-150 ns per hop.
> Yeah, that's BS when you have software that measures it with
> all cores busy.
> I wrote a benchmark to measure that with all cores busy.
> The SGI box back then had 50 ns switches, which would
> 'in theory' give a latency of 480 ns at 500 CPUs,
> so 960 ns for a blocked read; yet I couldn't get it down to less
> than 5.8 us on average.
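For reference, the paper math versus the measurement in that anecdote, using only the figures given in the message:

```python
one_way_ns = 480                  # theoretical one-way latency at 500 CPUs
blocked_read_ns = 2 * one_way_ns  # a blocked read is a full round trip
measured_ns = 5.8 * 1000          # what the all-cores-busy benchmark saw (5.8 us)

print(blocked_read_ns)                # 960 ns on paper
print(measured_ns / blocked_read_ns)  # roughly 6x worse than paper math
```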
> There are some things that do not scale per hop, such as traversing
> the PCIe link from socket to NIC and back. So, I see it as 1.2 us to
> go to the router and back, and 100 ns per hop.
> On Fri, Nov 9, 2012 at 11:17 AM, Vincent Diepeveen <diep at xs4all.nl> wrote:
> The latency estimate of five hops seems a tad optimistic to me,
> unless I read the English wrong and they mean 1.7 microseconds per
> hop, making it 5 * 1.7 = 8.5 microseconds in total for five hops.
> "Not every node is only one hop away, of course. On a fully
> configured system, you are five hops away maximum from any socket,
> so there is some latency. But the delta is pretty small with
> Dragonfly, with a minimum of about 1.2 microseconds for a short hop,
> an average of 1.5 microseconds, and a maximum of 1.7 microseconds
> for the five-hop jump, according to Bolding."
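Taking the quote at face value (the first reading, per-path rather than per-hop), the per-hop delta is actually small and lines up with the 100-150 ns per hop figure mentioned earlier in the thread:

```python
short_hop_us = 1.2   # quoted minimum: one hop
five_hop_us = 1.7    # quoted maximum: five hops
extra_hops = 5 - 1   # the four additional hops beyond the first

per_hop_ns = (five_hop_us - short_hop_us) / extra_hops * 1000
print(per_hop_ns)  # ~125 ns per additional hop, within the 100-150 ns range
```

On the second reading (1.7 us per hop) the total would be 8.5 us, so the two interpretations differ by a factor of five; the per-hop arithmetic above favors the first one.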
> On Nov 8, 2012, at 7:13 PM, Hearns, John wrote:
> > Well worth a read:
> > http://www.theregister.co.uk/2012/11/08/
> > cray_cascade_xc30_supercomputer/
> > John Hearns | CFD Hardware Specialist | McLaren Racing Limited
> > McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21
> 4YH, UK
> > T: +44 (0) 1483 262000
> > D: +44 (0) 1483 262352
> > F: +44 (0) 1483 261928
> > E: john.hearns at mclaren.com
> > W: www.mclaren.com
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> > Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf