[Beowulf] itanium vs. x86-64

Toon Knapen toon.knapen at gmail.com
Mon Feb 9 23:42:38 PST 2009

Mark Hahn wrote:
>> I have been working on itanium machines for a year now and I actually
>> found the hw pretty elegant and the dev software stack on top of it
>> (compiler, profiler etc) pretty handy.
> aren't all the same tools available on x86_64?  or were you referring
> to, eg, something SGI-specific?

Actually I'm using an Itanium in an HP Integrity server. The tools I'm 
referring to are the HP C and C++ compilers and their profiler, called 
caliper.

The C compiler, for instance, can add memory-debugging code (I do not know 
of any other compiler with a similar feature, though valgrind is more 
powerful). Next, caliper lets you get a lot of diagnostics from the CPU 
(also because ia64 supports all that while x86-64 does not, AFAICT), such 
as the number of bubbles in the pipeline, L2-cache misses, clock cycles 
per line of C code, etc.

>> but now with the Tukwila switching to the QuickPath, how do you guys
>> think Itanium will perform in comparison to Xeon's and Opteron's ?
> this change would be interesting if it meant that the next-gen numalink
> box could take nehalems rather than ia64.  I can't really understand why 
> Intel has stuck with ia64 this long - perhaps the economy will provide
> the fig-leaf necessary to dump it.

Are you sure nehalem will outperform ia64? I will probably switch from 
ia64 to x86-64, and knowing that my code is mostly memory-bound, I'm 
wondering what I will gain.  Of course the only way to know is to test 
it, but that has not been possible yet.
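
For a first rough comparison of memory-bound behaviour across machines, a 
small copy-bandwidth test can be sketched as below. This is only an 
illustration under assumptions, not code from this thread: the function 
name copy_bandwidth_gb_s and its parameters are made up here, it relies on 
CPython performing the bytearray slice assignment as one bulk memory copy, 
and it measures a single-threaded copy rather than a real application's 
access pattern.

```python
import time


def copy_bandwidth_gb_s(n_bytes=64 * 1024 * 1024, repeats=5):
    """Return the best observed copy bandwidth in GB/s (hypothetical helper).

    Assumption: the buffer is much larger than the last-level cache, so the
    copy is limited by main memory rather than by cache.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy: reads n_bytes and writes n_bytes
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Each pass moves 2 * n_bytes: one read stream plus one write stream.
    return 2 * n_bytes / best / 1e9


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_gb_s():.2f} GB/s")
```

Running the same script on both machines gives only a crude single-core 
figure; an established benchmark such as STREAM would be the more 
trustworthy measure.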

> (why am I down on ia64?  mainly the sense of unfulfilled promise: the
> ISA was supposed to provide some real advantage, and afaikt never has.
> the VLIW-ie ISA was intended to avoid clock scaling problems created by
> CISC decode and OOO, no?  but the ia64 seems to have only distinguished
> itself by relatively large caches and offering cache-coherency hooks to
> SGI.  have other people had the experience ia64 doing OK on code with
> regular/unrollable/prefetchable data patterns, but poorly otherwise?)

I'm not able to compare yet because I have not run the code on anything 
other than ia64. But caliper gave me a lot of CPU diagnostics while 
running my code, which made it easy to optimise.

