Archives


- Beowulf
- Beowulf Announce
- Scyld-users
- Beowulf on Debian

BLAS-1, AMD, Pentium, gcc

Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.

Search

Don Holmgren djholm at fnal.gov
Fri Apr 12 10:36:22 PDT 2002


On Fri, 12 Apr 2002, Hung Jung Lu wrote:

> Hi,
> 
> I am thinking in migrating some calculation programs
> from Windows to Linux, maybe eventually using a
> Beowulf cluster. However, I am kind of worried after I
> read in the mailing list archive about lack of
> CPU-optimized BLAS-1 code in Linux systems. Currently
> I run on a Wintel (Windows+Pentium) machine, and I
> know it's substantially faster than equivalent AMD
> machine, because I use the Intel's BLAS (MKL) library.
> (I apologize for any misapprehensions in what
> follows... I am only starting to explore in this
> arena.)
> 
> (1) Does anyone know when gcc will have memory
> prefetching features? Any time frame? I can notice
> very significant performance improvement on my Wintel
> machine, and I think it's due to memory prefetching.


If you mean, "when will gcc's optimizer do automatic prefetching?", I
have no idea.  But, many programmers have been doing manual prefetching
with gcc for quite a while. If you don't mind defining and using
assembler macros, gcc handles it just fine now.  Here's an example:

#define prefetch_loc(addr) \
__asm__ __volatile__ ("prefetchnta %0" \
                      : \
                      : \
                      "m" (*(((char*)(((unsigned int)(addr))&~0x7f)))))


> (2) I am a bit confused on the following issue: Intel
> does release MKL for Linux. So, does this mean that if
> I use Pentium, I still get full benefit of the
> CPU-optimized features in BLAS-1, despite of gcc does
> not do memory prefetching? How is this possible?


The Intel compiler produces object files compatible with gcc, and vice
versa.  I would assume they implemented the library with the Intel
compiler, which has full SSE/SSE2 support (including prefetching).  They
list the MKL for Linux as compatible with both gnu and Intel compilers.


> (3) Related to the above: for general linear algebra
> operations, is Pentium processor then better than AMD,
> since Intel has the machine-optimized BLAS library? I
> get contradictory information sometimes... I've seen
> somewhere that Pentium-4 compares unfavorably with AMD
> chips in calculation speed... Any opinions?
> 
> thanks,
> 
> Hung Jung Lu


For the very simple SU3 linear algebra (3X3 complex matrices and 3X1
complex vectors) used in our codes, the Pentium 4 outperforms the Athlon
on most of our SSE-assisted routines.  See the table near the bottom of
   http://qcdhome.fnal.gov/sse/inline.html
for Mflops per gigahertz on various routines for P-III, P4, and Athlon.
Perhaps re-coding in 3DNow! would give the Athlon a boost.

For our codes, which are bound by memory bandwidth, P4's do
significantly better than Athlons because of the faster front side bus
(400 Mhz effective).  See 
   http://qcdhome.fnal.gov/qcdstream/compare.qcdstream
for a table comparing memory bandwidth and SU3 linear algebra
performance on a 1.2 GHz Athlon, 1.4 GHz P4, and 1.7 GHz P7 (see
   http://qcdhome.fnal.gov/qcdstream/   
for information about this benchmark).

Don Holmgren
Fermilab




More information about the Beowulf mailing list