>2 p4 processor systems

Steve Gaudet SGaudet at turbotekcomputer.com
Tue Aug 27 13:00:51 PDT 2002

Hello Brian,

> For whatever reason, management has decided that real estate in the server
> room is an *extreme* issue.  There are plenty of empty racks, but hey...what
> do I know.
> After testing, I found a single-cpu p4 system literally did almost exactly
> (averaged, less than 1% difference) the same amount of work as a dual p3 1ghz.
> For whatever reason, it has been decided that getting 1U 6-way p3 1ghz
> systems at extreme costs would be better than simply getting a dual p4
> system.  The dual p4 would do 2/3 the work, at 1/7 the cost.  Maybe I
> didn't take that special math that has caused so many places so many issues
> lately, but...that just doesn't make sense to me.
> So I'm trying to find out if anyone knows of a 4-way p4 system out there.
> I'm wanting to bring a couple dual-p4's in here just so they'll see that the
> performance far surpasses the current per-node performance we have on our
> cluster, but...brick wall.  The guy above me agrees with me, the guy above
> him won't talk to me about it.  He just gets all excited about a 6-way p3
> server in 1u.  Whoopie.
> So...help?  Anyone know of any 4-way p4 systems?  And no, amd isn't an
> option (unfortunately).

Check out SuperMicro's quad-processor system; it currently tops out at
1.6 GHz, but they should release it with 2 GHz support soon.


Intel's NetBurst architecture, used in the Pentium 4 and Xeon processors,
has many new performance-enhancing features (SSE2, Hyper-Threading, etc.)
that require compilers aware of how to use them.  Unfortunately, many
developers use GCC, which has few or poor optimizations for NetBurst.  PGI
is better, and the Intel compilers are very good at extracting the best
performance.  A compiler that doesn't know about NetBurst often won't:
- Deliver the best floating point throughput
- Effectively utilize data prefetching
- Automatically vectorize code (exploit SSE2's SIMD parallelism)

The good news for developers is that the Intel compilers also produce the
best code for most x86 machines; in fact, AMD often uses them to compile its
benchmarks.

Intel also has hand-tuned math libraries (MKL) and performance primitives
that can take full advantage of NetBurst.  Information, free evaluations,
etc. are all available here:

There are also a few third-party reviews.  They show how well the Intel
compilers work, but don't highlight the additional gains available on
NetBurst or show the other tools (VTune, MKL, etc.) that can further enhance
performance:

BOTTOM LINE: AMD's brute-force approach (big, fast L1/L2 caches and faster
x87-like FPUs) doesn't require optimizations to extract performance.  But the
architecture has limits and doesn't scale well, which is why Intel is quickly
increasing its GHz advantage over AMD.  Intel took a different approach and
rearchitected the CPU to scale rapidly.  But this new architecture requires a
new generation of compilers to deliver optimal applications.  A good
explanation of the functional differences between the original Northwood and
the Thunderbird is found here:

Hope this helps.


Steve Gaudet 
Linux Solutions Engineer
| Turbotek Computer Corp.    tel:603-666-3062 ext. 21             |
| 161 Abby Rd                fax:603-666-4519                     |
| Manchester, NH 03103       e-mail:sgaudet at turbotekcomputer.com  |
| toll free:800-573-5393     web: http://www.turbotekcomputer.com |


More information about the Beowulf mailing list