C++ programming (was Newbie Alert: Beginning parallel programming with Scyld)
Gerry Creager N5JXS
gerry.creager at tamu.edu
Thu Oct 17 21:42:41 PDT 2002
Caveat: I couldn't stop myself. Read at your own risk.
Mark Hahn wrote:
>>Trying to be objective and not fan the flames ..
>
>
> hah!
Where'd I leave that gasoline can?
>>Hence a compiler can always optimise Fortran code more than C code because
>>the programmer has expressed the operation at a higher level (e.g. a
>>matrix-multiply rather than a nested set of for-loops). However ultimately
>
>
> reasonably true for C vs Fortran, but clearly not for C++ vs Fortran.
And questionable for most F77 implementations, as well. While matrix
multiply was part of the standard, it was a part that was frequently
poorly implemented, in my past experience.
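To make the contrast concrete, here's a rough C sketch (mine, for illustration; nobody in the thread posted code). The compiler is handed three nested loops and index arithmetic it must analyze before it can vectorize or block anything, whereas the Fortran 90 equivalent is the single statement C = MATMUL(A, B), which can be mapped straight onto a tuned library routine.

/* Naive C matrix multiply: the compiler sees a loop nest and index
   arithmetic to analyze.  The Fortran 90 equivalent is one line:
   C = MATMUL(A, B). */
void matmul(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i*n + k] * b[k*n + j];
            c[i*n + j] = sum;
        }
}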
>>an expert C-programmer (read 'assembly programmer') may be able to tweak
>>even more performance by hand-unrolling loops, using prefetch, etc.
>
>
> I'm afraid I don't see why C is somehow hostile to compiler-directed
> unrolling and prefetch. in fact, gcc demonstrates this.
It's probably related to the unnatural use of indentation. C++ is made
further problematic by its inclusion of curly braces instead of
readable code.
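For what it's worth, here's a minimal sketch of the kind of hand-tweaking the original post had in mind (again my own illustration, not anything from the thread). __builtin_prefetch is a real GCC builtin; the unroll factor and prefetch distance are made-up numbers, not tuned values.

/* Hand-unrolled reduction with an explicit prefetch hint, the sort of
   thing an "expert C programmer" might write by hand.  gcc can perform
   both transformations itself, which is Mark's point.  The unroll
   factor of 4 and the 64-element prefetch distance are arbitrary. */
double sum_unrolled(const double *x, int n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i;
    for (i = 0; i + 3 < n; i += 4) {
        __builtin_prefetch(&x[i + 64]);   /* hint: fetch ahead */
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)                    /* remainder loop */
        s0 += x[i];
    return s0 + s1 + s2 + s3;
}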
>>(Also Mark Hahn suggested that Fortran was not inherently parallel - I would
>>argue that it is since Fortran source code exposes available concurrency of
>>operations : vector notation, loop trip counts are known a-priori, no
>>aliasing of pointers, etc. )
>
>
> gcc also demonstrates that C has had optimization-friendly aliasing
> rules since C89.
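One small sketch on the aliasing point before I go on (mine, not from the thread): strictly, the restrict keyword arrived with C99, while C89's contribution was the type-based aliasing rules, but either way the no-overlap promise Fortran gets for free can be stated in C.

/* Without the qualifiers, the compiler must assume a store through dst
   could overwrite later elements of src, which blocks vectorization.
   'restrict' (C99) makes the Fortran-style no-aliasing promise explicit. */
void scale(double * restrict dst, const double * restrict src,
           double k, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = k * src[i];
}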
Strictly speaking, an accomplished Fortran programmer (OK. I see you
out there. Stop giggling!) goes through three phases when learning
about parallelization.
1. Too dumb to follow convention. Loops are almost always simply
unrolled and parallelizable.
2. Learned to follow the herd. Loops are consistent with convention.
Must decompose the entire program to make the one parallelizable loop in
the middle a little more efficient. Rest of code should have stayed
linear but now suffers processing delays while the CPU laughs at the
program structure.
3. Learned what to do to parallelize code. Segregates parallel code
from serial code. Processes each appropriately. Trusts no compiler.
Looks at assembly output for flaws. Lives on Twinkies, Jolt and Cheetos
(crispy, pepper-hot).
--
Gerry Creager -- gerry.creager at tamu.edu
Network Engineering, Academy for Advanced Telecommunications
Texas A&M University, College Station, TX
Office: 979.458.4020 FAX: 979.847.8578 Cell: 979.229.5301 Page: 979.228.0173