C++ programming (was Newbie Alert: Beginning parallel programming with Scyld)

Gerry Creager N5JXS gerry.creager at tamu.edu
Thu Oct 17 21:42:41 PDT 2002


Caveat:  I couldn't stop myself.  Read at your own risk.

Mark Hahn wrote:
>>Trying to be objective and not fan the flames ..
> 
> 
> hah!

Where'd I leave that gasoline can?

>>Hence a compiler can always optimise Fortran code more than C code because
>>the programmer has expressed the operation at a higher level (e.g. a
>>matrix-multiply rather than a nested set of for-loops). However ultimately
> 
> 
> reasonably true for C vs Fortran, but clearly not for C++ vs Fortran.

And questionable for most F77 implementations, as well.  While matrix 
multiply was part of the standard, it was a part that was frequently 
poorly implemented, in my experience.

>>an expert C-programmer (read 'assembly programmer') may be able to tweak
>>even more performance by hand-unrolling loops, using prefetch, etc.  
> 
> 
> I'm afraid I don't see why C is somehow hostile to compiler-directed
> unrolling and prefetch.  in fact, gcc demonstrates this.

It's probably related to the unnatural use of indentation.  C++ is 
made further problematic by its inclusion of curly braces instead of 
readable code.

>>(Also Mark Hahn suggested that Fortran was not inherently parallel - I would
>>argue that it is since Fortran source code exposes available concurrency of
>>operations : vector notation, loop trip counts are known a-priori, no
>>aliasing of pointers, etc. )
> 
> 
> gcc also demonstrates that C has had optimization-friendly aliasing
> rules since C89.

Strictly speaking, an accomplished Fortran programmer (OK.  I see you 
out there.  Stop giggling!) goes through 3 phases of accomplishment when 
learning about parallelization.

1.  Too dumb to follow convention.  Loops are almost always simply 
unrolled and parallelizable.

2.  Learned to follow the herd.  Loops are consistent with convention. 
Must decompose the entire program to make the one parallelizable loop in 
the middle a little more efficient.  Rest of code should have stayed 
linear but now suffers processing delays when CPU laughs at program 
structure.

3.  Learned what to do to parallelize code.  Segregates parallel code 
from serial code.  Processes each appropriately.  Trusts no compiler. 
Looks at assembly output for flaws.  Lives on Twinkies, Jolt and Cheetos 
(crispy, pepper-hot).

-- 
Gerry Creager -- gerry.creager at tamu.edu
Network Engineering, Academy for Advanced Telecommunications
Texas A&M University, College Station, TX
Office: 979.458.4020 FAX: 979.847.8578 Cell: 979.229.5301 Page: 979.228.0173
