[Beowulf] More AMD rumors
diep at xs4all.nl
Mon Nov 19 09:42:49 PST 2012
On Nov 19, 2012, at 6:12 PM, Robert G. Brown wrote:
> On Mon, 19 Nov 2012, Vincent Diepeveen wrote:
>> If you measure memory latency at all 8 cores at the same time, it's
>> even more horrible.
> Thanks for a remarkably clear and useful reply, Vincent. This nearly
> precisely mirrors my own measurements with a more floating point
> intensive task. The larger i7-3770 cache and its 8 operational threads
> (it is a four core system but it maintains two completely independent
> contexts per core, IIRC) seem to give it an overwhelming advantage over
> the FX with its eight "real" cores but much smaller cache.
> to see that this continues with the (I assume) integer/logic intensive
> chess code.
Maybe you meant it correctly but wrote it wrong.
The FX8150 has a huge SLOW L2 cache of 1 MB or so per core (2 MB per module),
and the i7s all have a small FAST L2 cache of around 256 KB.
If we measure accurately, then the FX8150 gets a huge speedup from the
So moving from 4 cores to 8 cores, it benefits a great deal, exactly
what you would expect with a slow L2 cache.
The i7, on the other hand, hardly profits from Hyperthreading. In
general, the higher you clock (or overclock) the i7, the more it profits,
yet we are still talking about a small percentage: 20% at lower clock,
up to 30% at high clock.
For most number-crunching floating point code here (prime numbers), the
speedup from Hyperthreading is more like 5%, so it hardly benefits there.
On the more modern i7s the multiplication unit has been sped up,
so it can deliver a much bigger
The FX8150's multiplication unit, by contrast, has been slowed down by a factor of 2.
> Basically, the i7 looks like a butt-kicking good processor, with the one
> problem being that it doesn't look like a multiprocessing cpu (at
> I can't find a dual i7 motherboard, although in principle it appears to
> be possible, leaving one with Xeons that don't LOOK like they would
> perform as well although I'd be interested in information on that as
The i7-3770k is the latest i7, and it's Ivy Bridge.
It's really low power, though: just around 50 watts.
The Xeons are all an older generation of i7, Sandy Bridge. They consume
a lot of power, yet performance is very good.
Intel wants to cash in on them; AMD really messed up in that market.
For most servers in the server market (not to be confused with HPC),
power consumption does matter, and Intel is winning the battle there.
> At the moment, single processor i7's look like they might actually be
> the world's fastest, at least on a per core basis. OTOH, it might
> be that putting two of them on a single board would horribly saturate
> the memory bus and cause memory management collisions and worse and
> them their advantage.
In itself, AMD's coherency protocol is in some areas superior to Intel's.
Intel has struggled there for many years, which is especially visible
in the 4-socket domain, not to mention 8 sockets.
Note that newer Xeons have a few features which AMD doesn't have, and
which in some software might kick butt: synchronisation within the L3s,
whereas AMD goes via the RAM.
I'm not into patents, yet it's possible that one reason for AMD's success
is that it took over DEC Alpha's master-slave concept. I'm not sure
whether Intel's problem was getting around those patents.
In either case, Intel's latency to the RAM was always lower than AMD's,
except back when Intel still had the memory controller off die and
Opteron was released. AMD then quickly got 50% market share in the
server market with Opteron, for a short while.
I wrote a test program to measure latency to the RAM by doing just
random reads of 8 bytes into a big buffer, with all cores at the same time.
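The post does not include the test program itself. A common way to build such a test, assumed here rather than taken from the post, is to make every random read depend on the previous one, so the CPU cannot overlap the reads and you measure latency instead of bandwidth. The minimal single-threaded sketch below links 8-byte slots of a big buffer into one random cycle with Sattolo's algorithm and then chases the chain. The helper names `sattolo_cycle`, `chase` and `ns_per_read` are hypothetical. In Python the interpreter overhead swamps the DRAM latency, so a real measurement would port the same loop to C and start one copy per core to load all cores at once.

```python
import time
from array import array
from random import Random


def sattolo_cycle(n, seed=None):
    """Return n 8-byte slots where slot i holds the index of the next slot to read.

    Sattolo's variant of the Fisher-Yates shuffle yields a single cycle through
    all n slots, so following the links visits every slot exactly once.
    """
    rng = Random(seed)
    nxt = array("q", range(n))  # 'q' = signed 64-bit, i.e. 8-byte slots
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i; excluding j == i is the Sattolo step
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain; each load needs the previous result, so loads cannot overlap."""
    p = start
    for _ in range(steps):
        p = nxt[p]
    return p


def ns_per_read(n=1 << 22, steps=1 << 22, seed=1):
    """Average time per dependent random read over a 32 MB buffer by default.

    In Python this number is dominated by interpreter overhead, not by the RAM.
    """
    nxt = sattolo_cycle(n, seed)
    t0 = time.perf_counter_ns()
    chase(nxt, steps)
    return (time.perf_counter_ns() - t0) / steps
```

Usage: `print(f"{ns_per_read():.1f} ns per read")`. The buffer must be much larger than the last-level cache, otherwise the test measures cache latency rather than RAM latency.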
From memory, I remember the following numbers:
i7 single chip : 60 - 70 ns
dual i7 Xeon 3.4 GHz : 90 ns
Phenom DDR3 : 100 ns
FX8150 : 160+ ns (thanks to Joel Hruska for benchmarking)
So AMD's decision to design a chip with a latency even worse than
their previous-generation Phenom core is inexplicable for the server
market. They did well last time, when their latency to the RAM was
BETTER than Intel's, so making it worse there is a weird decision.
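To put the quoted figures side by side, here is a small sketch of the ratios. It treats the "60 - 70 ns" entry as a range and reads "160+" for the FX8150 as a lower bound, so the real ratios may be somewhat higher. The dictionary and the `slowdown` helper are illustrative, not from the post.

```python
# Loaded-latency figures quoted in the post, in nanoseconds, as (low, high).
LATENCY_NS = {
    "i7 single chip": (60, 70),       # quoted as a range
    "dual i7 Xeon 3.4 GHz": (90, 90),
    "Phenom DDR3": (100, 100),
    "FX8150": (160, 160),             # quoted as "160+", so a lower bound
}


def slowdown(cpu, baseline):
    """Return the (low, high) ratio of cpu latency to baseline latency."""
    lo_c, hi_c = LATENCY_NS[cpu]
    lo_b, hi_b = LATENCY_NS[baseline]
    return lo_c / hi_b, hi_c / lo_b


for base in ("Phenom DDR3", "dual i7 Xeon 3.4 GHz", "i7 single chip"):
    lo, hi = slowdown("FX8150", base)
    print(f"FX8150 vs {base}: {lo:.2f}x to {hi:.2f}x")
# FX8150 vs Phenom DDR3: 1.60x to 1.60x
# FX8150 vs dual i7 Xeon 3.4 GHz: 1.78x to 1.78x
# FX8150 vs i7 single chip: 2.29x to 2.67x
```

So on these numbers Bulldozer's loaded RAM latency is at least 1.6 times that of its own predecessor.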
This is not just the architects' fault. This is something so important
to a company like AMD that the CEO must be involved in such decisions.
In all server loads, this latency issue is a BIG reason why Bulldozer
is so slow: both the L2 latency and the RAM latency.
Please note that if you measure with a single core up to 4 cores,
Bulldozer's latency is a lot better. It gets much worse when all cores
are put under load.
> I'm getting ready to do some very data intensive stuff -- terabyte datasets
> being chewed to pieces basically -- to the point where my
> "cluster" will probably be a pile of RAIDs each with its own private
> copy of the datasets in question and equipped with an i7 motherboard,
> which seems odd somehow (as the i7 motherboards aren't generally
> configured as "server" motherboards) but the Xeons all run at lower
> clock and are older technology.
> Comments from anyone else?
Cheapskate clusters with low-clocked CPUs are totally unbeatable.
I don't know whether you can use AVX. If not, did you consider buying
a bunch of 2-socket Xeon L5420 nodes (or something similar) with
8 GB RAM, for $150 each?
For the price of a single i7 system you can get 3 to 4 of them.
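The "3 to 4 of them" remark is easy to turn into numbers. This sketch assumes $150 is the price per node, which implies an i7 system costing roughly $450 to $600. It counts physical cores only (the Xeon L5420 is a quad-core part) and ignores the very different per-core speed and power draw of the two machines; the `cluster` helper is hypothetical.

```python
NODE_PRICE_USD = 150   # used 2-socket Xeon L5420 node with 8 GB RAM, as quoted
SOCKETS_PER_NODE = 2
CORES_PER_L5420 = 4    # the Xeon L5420 is a quad-core part
RAM_GB_PER_NODE = 8
I7_CORES = 4           # an i7-3770(K) has 4 physical cores (8 threads)


def cluster(nodes):
    """Cost, physical core count and total RAM for a pile of identical used nodes."""
    cores = nodes * SOCKETS_PER_NODE * CORES_PER_L5420
    return {
        "cost_usd": nodes * NODE_PRICE_USD,
        "cores": cores,
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "cores_vs_i7": cores / I7_CORES,
    }


for n in (3, 4):
    print(n, cluster(n))
```

Three nodes give 24 cores and 24 GB for $450; four give 32 cores and 32 GB for $600, i.e. 6 to 8 times the physical cores of one i7.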
Another idea is a 48-core AMD system. Though the CPUs are a tad more
expensive on eBay now, if you buy four 6180SEs and a motherboard,
you have 48 cores,
huge RAM, and four sockets that each have four DDR3 memory channels:
a total of 16 memory channels.
Until recently these 6180SE CPUs were $450 on eBay, though I now see
them for $650 or so.
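The price change matters more per core than it looks. A small sketch of the CPU cost alone for a 4-socket box at the two quoted eBay prices, assuming 12 cores per Opteron 6180SE (4 x 12 = 48, matching the post); motherboard, RAM and power are not included, and `cpu_cost` is a hypothetical helper.

```python
SOCKETS = 4
CORES_PER_6180SE = 12  # Opteron 6180SE: 12 cores per package


def cpu_cost(price_each_usd):
    """Total CPU cost and CPU cost per core for a 4-socket 6180SE box."""
    total = SOCKETS * price_each_usd
    return total, total / (SOCKETS * CORES_PER_6180SE)


for price in (450, 650):  # eBay prices quoted in the post
    total, per_core = cpu_cost(price)
    print(f"${price} each: ${total} for {SOCKETS * CORES_PER_6180SE} cores, ${per_core:.2f} per core")
# $450 each: $1800 for 48 cores, $37.50 per core
# $650 each: $2600 for 48 cores, $54.17 per core
```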
If your workload parallelizes well, it could be an idea. They do not
have AVX, however.
For what you are going to do, maybe your best pal is eBay, regardless
of what you want to order.
>>> I would have hoped that AMD would dig in and innovate and
>>> regain at least parity if not the lead, because it is good for the
>>> industry for Intel to have serious competition, but while Intel can
>>> make money and survive as second best to AMD, AMD can't make any
>>> as second best to Intel...
>> We must of course split HPC performance into 2 worlds.
>> In fact there are 3, but let's do a rough 2-world division:
>> a) floating point or vectorized performance (can be integers as well)
>> We skip a): the manycores have won there.
>> b) non-vectorized integer performance
>> For integers and branches, let me take a huge program like Diep.
>> More is better.
>> i7-3960X-EE : 2.0 million chess positions a second (12 logical cores)
>> i7-980x turbo : 1.85 million chess positions a second (12 logical cores)
>> i7-3770k : 1.47 million chess positions a second (8 logical cores)
>> AMD Phenom X6 1100T : 1.34 million chess positions a second (6 cores)
>> AMD Phenom X6 1090T : 1.30 million chess positions a second (6 cores)
>> FX-8150 : 1.22 million chess positions a second (8 mini cores)
>> The FX-8150 is AMD's latest 'bulldozer' CPU.
>> The problem is that the new-generation FX-8150, on a NEW process
>> technology and with 2 billion transistors or so (caches counted, per
>> AMD's initial press release, not the later one where, by creatively
>> not counting things, they reached 1.2 billion), is not beating
>> their own old design.
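Normalising the throughput table per hardware thread shows how far behind each Bulldozer "mini core" is. A small sketch over the quoted numbers; it counts the FX-8150 as 8 threads and the Hyperthreaded i7s by logical cores, which is one of several reasonable ways to normalise. The names `KNPS`, `per_thread` and `vs_fx` are illustrative.

```python
# Diep throughput quoted above: (thousand positions per second, hardware threads).
KNPS = {
    "i7-3960X-EE": (2000, 12),
    "i7-980x turbo": (1850, 12),
    "i7-3770k": (1470, 8),
    "Phenom X6 1100T": (1340, 6),
    "Phenom X6 1090T": (1300, 6),
    "FX-8150": (1220, 8),
}


def per_thread(cpu):
    """Thousand positions per second per hardware thread."""
    knps, threads = KNPS[cpu]
    return knps / threads


def vs_fx(cpu):
    """Total throughput relative to the FX-8150."""
    return KNPS[cpu][0] / KNPS["FX-8150"][0]


for cpu in KNPS:
    print(f"{cpu:16s} {per_thread(cpu):6.1f} knps/thread  {vs_fx(cpu):.2f}x FX-8150")
```

On these figures the FX-8150 manages 152.5 thousand positions per second per thread, against about 223 for the six-core Phenom X6 1100T, which still beats it in total by roughly 10%.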
>> Furthermore, another big problem is power usage.
>> Under full load:
>> Phenom X6 1090T : 69.6 watts
>> Phenom X6 1100T : 92 watts
>> We see that AMD already clocked the 1100T a tad too high, which
>> explains the huge power increase.
>> Now the FX-8150 : 115.2 watts
>> As if Moore's Law, guaranteeing progress, didn't exist...
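Combining these wattages with the throughput table gives performance per watt. A sketch, assuming the wattages were measured under the same Diep load as the positions-per-second figures (the post does not say exactly how power was measured); `CHIPS` and `knps_per_watt` are illustrative names.

```python
# Full-load power and Diep throughput quoted in the post: (knps, watts).
CHIPS = {
    "Phenom X6 1090T": (1300, 69.6),
    "Phenom X6 1100T": (1340, 92.0),
    "FX-8150": (1220, 115.2),
}


def knps_per_watt(cpu):
    """Thousand chess positions per second per watt of full-load power."""
    knps, watts = CHIPS[cpu]
    return knps / watts


for cpu in CHIPS:
    print(f"{cpu}: {knps_per_watt(cpu):.1f} thousand positions/s per watt")
# Phenom X6 1090T: 18.7 thousand positions/s per watt
# Phenom X6 1100T: 14.6 thousand positions/s per watt
# FX-8150: 10.6 thousand positions/s per watt
```

By this measure the older 1090T does about 1.76 times as much work per watt as the FX-8150.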
>> As for you, in many of the benchmarks you ran, maybe multiplication was
>> important. Each minicore has its own multiplication unit.
>> Sounds good, huh?
>> So much for the good news. The problem is that this unit is also
>> over 2 times slower...
>> Please note that Bulldozer does have AVX. From benchmarks we know
>> that both Intel and AMD, with this Bulldozer,
>> tried to optimize performance for games. Games using AVX
>> It's in fact not doing badly there, though it is worse than the
>> quad-core Intels. I don't want a quad-core chip, though.
>> I want a million cores.
>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>>>> To change your subscription (digest mode or unsubscribe) visit
>>> Robert G. Brown http://www.phy.duke.edu/~rgb/
>>> Duke University Dept. of Physics, Box 90305
>>> Durham, N.C. 27708-0305
>>> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu