Vincent,

You are changing the item being tested. You disputed my statement that switches can have a latency as low as 100-150 ns. I described how to test the latency of a single hop (I neglected to say that the two NICs must be connected to the same switch chip, i.e. blade, crossbar, etc.). You can additionally measure multi-hop links the same way by choosing your ports correctly.

Please don't change the rules because you cannot admit you are wrong.

Scott

(reposted to the whole group)

On Fri, Nov 9, 2012 at 3:40 PM, Vincent Diepeveen <diep@xs4all.nl> wrote:
That's not how fast you can get the data at each core.

The benchmark I wrote is actually a reflection of how a hashtable works for game tree search in general.
Its speedup is exponential, so we can PROVE (as in a mathematical proof) that if you do it in a
different way you will have trouble reaching the same exponent (which we call the branching factor).

So practical testing of what you can achieve from core to core is what matters.
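The shape of such a test is simple enough to sketch. The snippet below is only an illustration of the idea -- every core issuing dependent random reads into one big shared table at once, so it times latency rather than bandwidth -- and not Vincent's actual benchmark; all sizes and names are made up. Compile with something like gcc -O2 -pthread and compare ns/read at 4 versus 8 threads.

/* allcores_latency.c -- illustrative sketch, not the actual benchmark.
 * Every thread does dependent (serialized) random reads into one big
 * shared table, so the reported figure is latency, not bandwidth. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define TABLE_WORDS (1ULL << 26)        /* 64M entries = 512 MB        */
#define READS_PER_THREAD 10000000ULL
#define MAX_THREADS 64

static uint64_t *table;

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static void *worker(void *arg)
{
    uint64_t x = (uintptr_t)arg;        /* per-thread RNG state        */
    uint64_t sink = 0;                  /* feeds each read's result    */
                                        /* into the next address       */
    double t0 = now_sec();
    for (uint64_t i = 0; i < READS_PER_THREAD; i++) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;   /* xorshift64       */
        sink = table[(x + sink) & (TABLE_WORDS - 1)];
    }
    printf("thread %lu: %.1f ns/read\n", (unsigned long)(uintptr_t)arg,
           (now_sec() - t0) * 1e9 / READS_PER_THREAD + sink * 0.0);
    return NULL;
}

int main(int argc, char **argv)
{
    int n = argc > 1 ? atoi(argv[1]) : 4;
    if (n < 1 || n > MAX_THREADS) return 1;
    table = calloc(TABLE_WORDS, sizeof *table);
    if (!table) return 1;

    pthread_t tid[MAX_THREADS];
    for (int i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, worker, (void *)(uintptr_t)(i + 1));
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    return 0;
}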
The first disappointment happens with the new Opteron cores: AMD has designed
a memory controller which just doesn't scale if you use all cores.

Joel Hruska performed some tests there (not sure where he posted them online).
We see that the Bulldozer-type architecture still scales OK if you run benchmarks on a single core.
No really good latency, sure, but still...

Yet if you move from using 4 processes for the measurement to 8 processes,
at one chip we already land at nearly 200 ns, which is really slow.
The same effect happens when you run a big supercomputer at full throttle with all cores.

Manufacturers can claim whatever they like, but it is always paper math.

If they ever release a number, it's for some sort of single-core run, whereas the
box wasn't ordered to work single-core in the first place.

You don't want the performance of a single core in a lab at temperatures near 0 Kelvin;
you want to see that the box you got performs like this with all cores running :)

And at the numbers posted you already start losing with the Cray, starting with the actual CPUs, which suck when you use all cores.

On Nov 9, 2012, at 8:38 PM, atchley@tds.net wrote:

Vincent, it is easy to measure:

1. Connect two NICs back-to-back.
2. Measure the latency.
3. Connect the machines to the switch.
4. Measure the latency again.
5. Subtract (2) from (4).

That is how we did it at Myricom and that is how we do it at ORNL.
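In code, each of those measurements is just a ping-pong average: halve the round-trip time for the one-way latency. Here is a minimal MPI sketch of the technique -- purely illustrative, not the actual Myricom or ORNL harness, and the iteration counts are arbitrary:

/* pingpong.c -- one-way latency from averaged round trips.  Run it
 * once with the NICs back-to-back and once through the switch; the
 * difference is the switch's contribution.  Illustrative sketch. */
#include <mpi.h>
#include <stdio.h>

#define WARMUP 1000
#define ITERS  100000

static void pingpong(int rank, char *buf)
{
    if (rank == 0) {                    /* rank 0 sends, then waits   */
        MPI_Send(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else {                            /* rank 1 echoes              */
        MPI_Recv(buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
}

int main(int argc, char **argv)
{
    int rank, size;
    char buf = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2)                      /* exactly one rank per host  */
        MPI_Abort(MPI_COMM_WORLD, 1);

    for (int i = 0; i < WARMUP; i++)    /* warm caches and routes     */
        pingpong(rank, &buf);

    double t0 = MPI_Wtime();
    for (int i = 0; i < ITERS; i++)
        pingpong(rank, &buf);
    double one_way_us = (MPI_Wtime() - t0) * 1e6 / ITERS / 2.0;

    if (rank == 0)
        printf("one-way latency: %.3f us\n", one_way_us);
    MPI_Finalize();
    return 0;
}

Launch with two ranks, one per machine (e.g. mpirun -np 2 across the two hosts; exact flags vary by MPI), and subtract the back-to-back number from the through-the-switch number.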

Try it sometime.

Scott

On Fri, Nov 9, 2012 at 2:36 PM, Vincent Diepeveen <diep@xs4all.nl> wrote:

On Nov 9, 2012, at 7:31 PM, atchley@tds.net wrote:

Modern switches need 100-150 ns per hop.

Yeah, that's BS when you have software that measures it with all cores busy.

I wrote a benchmark to measure that with all cores busy.

The SGI box back then had 50 ns switches, which 'in theory' would give a latency of 480 ns at 500 CPUs,
so 960 ns for a blocked read; I couldn't get it below 5.8 us on average.

There are some things that do not scale per hop, such as traversing the PCIe link from socket to NIC and back. So I see it as 1.2 us to go to the router and back, and 100 ns per hop.
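That model is easy to sanity-check against the article numbers quoted further down; a throwaway sketch using only the figures from this thread (the loop bound is just for illustration):

/* latency_model.c -- back-of-envelope: a fixed cost for the PCIe
 * traversal plus the trip to the router and back, then 100 ns for
 * every extra switch hop.  Figures are from this thread. */
#include <stdio.h>

int main(void)
{
    const double base_us    = 1.2;   /* socket -> NIC -> router -> back */
    const double per_hop_us = 0.1;   /* each additional hop             */

    for (int extra = 0; extra <= 5; extra++)
        printf("%d extra hop(s): ~%.1f us\n",
               extra, base_us + extra * per_hop_us);
    return 0;
}

It runs from 1.2 us with no extra hops to 1.7 us with five, bracketing the 1.2 us minimum and 1.7 us maximum Bolding gives in the quote below.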

Scott

On Fri, Nov 9, 2012 at 11:17 AM, Vincent Diepeveen <diep@xs4all.nl> wrote:

The latency estimate for taking 5 hops seems a tad optimistic to me,
unless I read the English wrong and they mean 1.7 microseconds per
hop, which for 5 hops makes 5 * 1.7 = 8.5 microseconds in total.

"Not every node is only one hop away, of course. On a fully
configured system, you are five hops away maximum from any socket,
so there is some latency. But the delta is pretty small with
Dragonfly, with a minimum of about 1.2 microseconds for a short hop,
an average of 1.5 microseconds, and a maximum of 1.7
microseconds for the five-hop jump, according to Bolding."

On Nov 8, 2012, at 7:13 PM, Hearns, John wrote:

> Well worth a read:
>
> http://www.theregister.co.uk/2012/11/08/cray_cascade_xc30_supercomputer/
>
> John Hearns | CFD Hardware Specialist | McLaren Racing Limited
> McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK