[Beowulf] PCPro: AMD: what went wrong?
Lux, Jim (337C)
james.p.lux at jpl.nasa.gov
Mon Feb 20 12:29:43 PST 2012
Comments below about automated vs manual design.
On 2/20/12 10:10 AM, "Mark Hahn" <hahn at mcmaster.ca> wrote:
>> mid-range Core i5s. The verdict was unanimous; our sister title
>> bit-tech dubbed the FX-8150 a "stinker".
>well, for desktops. specFPrate scores are pretty competitive
>(though sandybridge xeons are reportedly quite a bit better.)
>> Light was shed on Bulldozer's problems when ex-AMD engineer Cliff
>> Maier spoke out about manufacturing issues during the earliest stages
>> of design. "Management decided there should be cross-engineering
>> [between AMD and ATI], which meant we had to stop hand-crafting CPU
>> designs," he said.
>I'm purely armchair when it comes to low-level chip design, but to me,
>this makes it sound like there are problems with their tools. what's
>the nature of the magic that slower/human design makes, as opposed to
>the magic-less automatic design?
One place where humans can do a better job is in the place and route,
particularly if the design is tight on available space. If there's plenty
of room, an autorouter can do pretty well, but if it's tight, you get to
high 90s % routed, and then it gets sticky. It's a very, very complex
problem because you have to not only find room for interconnects, but
trade off propagation delay so that it can actually run at rated speed:
spreading out slows you down. (same basic problem as routing printed
circuit boards)
Granted modern place and route is very sophisticated, but ultimately, it's
a heuristic process (Xilinx had simulated annealing back in the 80s, for
instance) which is trying to capture routine guidelines and rules (as
opposed to trying guided random strategies like GA, etc.)
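As a rough illustration of the annealing-style heuristic mentioned above, here is a toy placement sketch in Python. It is not how any vendor tool works: the grid, the two-pin net list, the cost function (total Manhattan wire length) and names such as `anneal_placement` are all assumptions made up for the example. It also ignores timing entirely, which is the hard part.

```python
import math
import random


def wirelength(pos, nets):
    """Total Manhattan length over two-pin nets; pos[i] is the (x, y) slot of cell i."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)


def anneal_placement(nets, width, height, steps=20000,
                     t_start=5.0, t_end=0.01, seed=0):
    """Toy simulated annealing: swap two cells, keep the swap if it helps,
    and sometimes keep it even if it hurts (more often while 'hot')."""
    rng = random.Random(seed)
    # Every grid slot holds one cell; cells that appear in no net act as empty space.
    pos = [(x, y) for x in range(width) for y in range(height)]
    rng.shuffle(pos)
    n_cells = len(pos)

    cost = wirelength(pos, nets)
    initial_cost = cost
    best_cost, best_pos = cost, list(pos)

    for step in range(steps):
        # Geometric cooling schedule from t_start down towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n_cells), 2)
        pos[i], pos[j] = pos[j], pos[i]
        new_cost = wirelength(pos, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best_cost, best_pos = cost, list(pos)
        else:
            pos[i], pos[j] = pos[j], pos[i]  # reject: undo the swap

    return initial_cost, best_cost, best_pos


# Hypothetical example: 16 cells on a full 4x4 grid, wired as a simple chain.
chain = [(i, i + 1) for i in range(15)]
start, best, placement = anneal_placement(chain, 4, 4)
print("initial wire length:", start)
print("best wire length found:", best)
```

A full grid (no spare slots) is the "tight" case: every improvement has to come from swapping two cells, so early moves that look bad are sometimes the only way to reach a better layout later.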
Skilled humans can "learn" from previous similar experience, which so far,
the automated tools don't. That is, a company doesn't do new CPU designs
every week, so there's not a huge experience base for a "learning" router
to learn from.
The other thing that humans can do is have a better feel for working the
tolerances. That is, they can make use of knowledge that some
variabilities are correlated (e.g. two parts side by side on the die will
"track", something that is poorly captured in a spec for the individual
Pushing the timing margins is where it's all done.
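To put a number on the "tracking" point, here is a small Monte Carlo sketch in Python. It is only a toy model under my own assumptions: each of two neighbouring paths has a Gaussian delay error made of a shared (die-level) part plus a private part, so that the correlation between them is `rho`. The function name `skew_spread` and the parameter values are invented for the example and do not come from any real process data.

```python
import random
import statistics


def skew_spread(rho, sigma=1.0, trials=20000, seed=1):
    """Standard deviation of the delay mismatch between two paths whose
    individual delay errors each have std-dev sigma and correlation rho."""
    rng = random.Random(seed)
    shared_sd = sigma * rho ** 0.5            # part common to both paths
    private_sd = sigma * (1.0 - rho) ** 0.5   # part unique to each path
    diffs = []
    for _ in range(trials):
        common = rng.gauss(0.0, shared_sd)
        d1 = common + rng.gauss(0.0, private_sd)
        d2 = common + rng.gauss(0.0, private_sd)
        diffs.append(d1 - d2)                 # the shared part cancels here
    return statistics.pstdev(diffs)


# Compare independent parts with parts that track each other closely.
for rho in (0.0, 0.5, 0.9):
    print("rho =", rho, "mismatch std-dev ~", round(skew_spread(rho), 3))
```

In this model the mismatch has standard deviation sigma * sqrt(2 * (1 - rho)), so a per-part spec that only gives sigma overstates the mismatch between parts that track each other. That is the margin a designer who knows about the correlation can use.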
> is this a tooling-up issue that would
>only affect the first rev of auto-designed CPUs? does this also imply
>that having humans tweak the design would make the GPU/APU chips faster,
>smaller or more power-efficient?
Historically, the output of the automated tools is very hard to modify by
a human, except in a peephole optimization sense. This is because a human
generated design will typically have some sort of conceptual architecture
that all hangs together. An automated design tends to be, well,
unconstrained by the need for a consistent conceptual view.
It's a lot harder to change something in one place and know that it won't
break something else, if you didn't follow and participate in the design
process from the top.
There's a very distinct parallel here to optimizing compilers and "hand
coded assembly". There are equivalent tools to profilers and such, but
it's the whole thing about how a bad top level design can't be saved by
extreme low level optimization.
Bear in mind that Verilog and VHDL are about like Assembler (even if they
have a "high level" sort of C-like look to them). There are big
subroutine libraries (aka IP cores), but it's nothing like, say, an
automatically parallelizing FORTRAN compiler that makes effective use of a
>presumably this change from semi-manual to automatic design (layout?)
>was motivated by a desire to improve time-to-market. or perhaps improve
>consistency/predictability of development? have any such improvements
>resulted? from here, it looks like BD was a bit of a stinker and that
>the market is to some extent waiting to see whether Piledriver is the
>chip that BD should have been. if PD had followed BD by a few months,
>this discussion would have a different tone.
There is a HUGE desire to do better automated design, for the same reason
we use high level languages to develop software: it greatly improves
productivity (in terms of number of designs that can be produced by one
There aren't all that many people doing high complexity IC development.
Consider something like an IEEE-1394 (Firewire) core. There are probably
only 4 or 5 people in the *world* who are competent to design it or at
least lead a design: not only do you need to know all the idiosyncrasies
of the process, but you also need to really understand IEEE-1394 in all of
its funky protocol details.
Ditto for processor cores. For an example of a fairly simple and well
documented core, take a look at the LEON implementations of the SPARC
(which are available for free as GPLed VHDL). That's still a pretty
complex piece of logic, and not something you just leap into modifying, or
>then again, GPUs were once claimed to have a rapid innovation cycle,
>but afaikt that was a result of immaturity. current GPU cycles are
>pretty long, seemingly as long as, say, Intel's tick-tock. Fermi
>has been out for a long while with no significant successor. ATI
>chips seem to rev a high-order digit about once a year, but I'm not
>sure I'd really call 5xxx a whole different generation than 6xxx.
>(actually, 4xxx (2008) was pretty similar as well...)
I suspect that the "cycle rate" is driven by market forces. At some point,
there's less demand for higher performance, particularly for something
consumer driven like GPUs. At some point, you're rendering all the
objects you need at resolutions higher than human visual resolution, and
you don't need to go faster. Maybe the back-end physics engine could be
improved (render individual sparks in a flame or droplets in a cloud) but
there's a sort of cost benefit analysis that goes into this.
For consumer "single processor" kinds of applications we're probably in
that zone. How much faster do you need to render that spreadsheet or word
document? The bottleneck isn't the processor, it's the data pipe coming
in, whether streamed from a DVD or over the network connection.
>> Production switched to faster automated methods, but Maier says the
>> change meant AMD's chips lost "performance and efficiency" as crucial
>> parts were designed by machines, rather than experienced engineers.
>were these experienced engineers sitting on their hands during this time?
No, they were designing other things (or were hired away by someone else).
There's always more design work to be done than people to do it. Maybe
AMD had some Human Resources/Talent Management/Human Capital issues and
their top talent bolted to somewhere else? (there are people with a LOT
of cash in the financial industry and in government who are interested in
ASIC designs, at least if the ads in the back of IEEE Spectrum and
similar are any sign.)
Being a skilled VLSI designer capable of leading a big CPU design these
days is probably a "guaranteed employment and name your salary" kind of
>> AMD's latest chips haven't stoked the fires of consumers, either.
>> Martin Sawyer, technical director at Chillblast, reports that "demand
>> for AMD has been quite slow", and there's no rush to buy Bulldozer.
>well, APU demand seems OK, though not very exciting because the CPU
>cores in these chips are largely what AMD has been shipping for years.
I would speculate that consumer performance demands have leveled out, for
the data bottleneck reasons discussed above. Sure, I'd like to rip DVDs
to my server a bit faster, but I'm not going to go out and buy a new
computer to do it (and of course, it's still limited by how fast I can
read the DVD)
>> "With no AMD solutions competitive with an Intel Core i5-2500K", he
>> says, "AMD is a tough sell in the mid- and high-end market." Another
>> British PC supplier told us off-the-record that sales are partly
>> propped up by die-hards who only buy AMD "because they don't like
>to some extent. certainly AMD has at various times in the past been able
>to claim the crown in:
> - 64b ISA and performance
> - memory bandwidth and/or cpu:mem balance
> - power efficiency
> - integrated CPU-GPU price/performance.
> - specrate-type throughput/price efficiency
>but Intel has executed remarkably well to take these away. for instance,
>although AMD's APUs are quite nice, Intel systems are power efficient
>enough that you can build a system with an add-in-card and still match
>or beat the APU power envelope. Intel seems to extract more stream-type
>memory bandwidth from the same dimms. and Intel has what seems like a
>pipeline already loaded with promising chips (SB Xeons, and presumably
>ivybridge improvements after that). MIC seems promising, but then again
>with GCN, GPUs are becoming less of an obstacle course for masochists.
Maybe Intel hired all of AMD's top folks away, and that's why AMD is using
more automated design? <grin>
>from the outside, we have very little visibility into what's going on with
>AMD. they seem to be making some changes, which is good, since there have
>been serious problems. whether they're the right changes, I donno. it's
>a little surprising to me how slowly they're moving, since being
>would seem to encourage urgency. in some sense, the current state is near
>market equilibrium, though: Intel has the performance lead and is clearly
>charging a premium, with AMD trailing but arguably offering decent value
>with cheaper chips. this doesn't seem like a way for AMD to grow market
But hasn't that really been the case since the very early days of x86? I
seem to recall some computers out in my garage with AMD 286 and 386 clones
AMD could also attack the embedded processor market with high integration
flavors of the processors.
Does AMD really need to grow market share? If the overall pie keeps
getting bigger, they can grow, keeping constant percentage market share.
They've been around long enough that by no means could they be considered
a start-up in a rapid growth phase.