C++ programming (was Newbie Alert: Beginning parallel programming with Scyld)

Gerry Creager N5JXS gerry.creager at tamu.edu
Fri Oct 18 17:01:26 PDT 2002

Now, I must take exception to the fine lecture on "What is a Library." 
I'll have you know *I* was using reusable code back in the days of 029 
keypunches, and IBM's IEBPRINTPUNCH utilities.  I called 'em subroutines 
and functions.  I called them directly or indirectly, and I reused them 
by loading them into the hopper every time I ran the job.  I lived for 
the times when I could justify a data _TAPE_ instead of 3 boxes of data 
_CARDS_, though!

Winkingly yours, Gerry

And for the record... How many REALLY recall the "Do Inspite Of" 
construct in the Fortran77 draft standards?


Robert G. Brown wrote:
> On Thu, 17 Oct 2002, Gerry Creager N5JXS wrote:
>>Strictly speaking, an accomplished Fortran programmer (OK.  I see you 
>>out there.  Stop giggling!) goes through 3 phases of accomplishment when 
>>learning about parallelization.
>>1.  Too dumb to follow convention.  Loops are almost always simply 
>>unrolled and parallelizable.
>>2.  Learned to follow the herd.  Loops are consistent with convention. 
>>Must decompose the entire program to make the one parallelizable loop in 
>>the middle a little more efficient.  Rest of code should have stayed 
>>linear but now suffers processing delays when CPU laughs at program.
>>3.  Learned what to do to parallelize code.  Segregates parallel code 
>>from serial code.  Processes each appropriately.  Trusts no compiler. 
>>Looks at assembly output for flaws.  Lives on Twinkies, Jolt and Cheetos 
>>(crispy, pepper-hot).
> <rgb type="rant" category="ignorable" on_topic_index="somewhat"
> affect="semihumorous">
> And just what does any of this have to do with Fortran?  Especially
> number 3?  Is it that Fortran programmers take two steps to get to step
> 3, while C programmers already live on TJC, trust no compiler, and
> recognize that they'd damn well better learn what to do to parallelize
> code because no silly-ass compiler is gonna be able to do it for them?  Heck,
> C compilers won't even >>serialize<< code for you...I put instructions
> out of order all the time;-)
> A point to raise before we pass into a state of outright war (not with
> you specifically Gerry, just with the discussion:-) is that there are
> Libraries, and there is the Compiler.  The compiler is this thing that
> turns higher level logic/action constructs into machine code, maybe
> taking a detour through Mr. Assembler on the way.  Libraries are these
> collections of reusable code, accessible at a higher level via an API.
> Code that compiles in a lovely way will not run unless linked to these
> libraries.
> This discussion has almost from the beginning confused the two.
> C is arguably the thinnest of skins between the programmer and the
> machine language, the thinnest of interfaces to the operating system and
> peripheral devices.  For that reason it is generally preferred, I
> believe, for doing things like writing operating systems and device
> drivers, where you don't WANT a compiler doing something like
> rearranging your code at some higher level.  It is also one of the most
> powerful languages -- one of the major complaints against C is that it
> provides one with so LITTLE insulation against all that raw power.  Wanna
> write all over your own dataspace and randomly destroy your program?
> With C you can.  Other languages might stop you, which is great a lot of
> the time but then stops you from doing it when it might really be
> clever, deliberate, and useful.  A C programmer has to be the most
> disciplined of programmers because with unholy power comes unholy
> responsibility or you'll spend an unholy amount of time dealing with
> memory leaks, overwriting your own variables, and sloppy evil.  But it
> can be oh, so efficient!
> C++, Fortran, Pascal, Basic all add or modify things to this basic skin.
> A lot of what they modify is syntactical trivia -- braces vs end
> statements to indicate logical code blocks, = vs := for assignment, ==
> vs .eq. for logical equality.  This sort of "difference" is irrelevant
> -- a good perl script could translate from one syntax to the other and
> in fact some good perl scripts do.
> However, the issue of "fortran can parallelize better than C" (or can
> parallelize at all) goes beyond differences in the language syntax.  The
> issue there is whether parallelization is better done (or is done at
> all) at the level of the compiler (translator to machine language
> statements) or with libraries.  Is it, should it be, intrinsic to the
> language constructs themselves or a deliberate choice engineered into
> the code?
> There has been debate about this over the ages, but my own opinion is
> that none of the existing "popular" languages are designed in a way to
> facilitate parallelism at the compiler level or (for that matter)
> vectorization, with the possible exception of APL, which actually had a
> hellacious way with arrays, where formulae like x = Ay with x and y
> vectors and A an array were coded a lot like x <- Ay (where allowance should
> be made by any APL experts out there for the fact that I haven't
> actually used it in about twenty years;-).  C, F-whatever, C++ -- all of
> them would either do this with explicit loops in the code itself or with
> library calls, where the library would HIDE from the programmer the
> loops but they would be there nonetheless, likely written in code that
> was itself compiled from C or F or whatever source.  In APL those loops
> are STILL there, but completely and inextricably hidden from the user in
> the compiler itself.
> This may sound like a silly distinction, but it is not.  Before thinking
> of the really complicated parallel case, consider the much simpler case
> of (single threaded) linear algebra.  If you like, there are many BLAS.
> There are good BLAS, bad BLAS, ATLAS BLAS.  If you don't like your BLAS,
> you can change it, and provided you program via a BLAS-based API, you
> don't even have to change your code, ditto of course for higher order
> linear algebra or other libraries.  Consider how a regular compiler
> could deal with parallel BLAS.  Consider how one could link a regular
> BLAS with code compiled with a "parallel compiler".
> The real question is then, what SHOULD be done by the compiler and what
> SHOULD be done by the programmer with libraries, not just in a parallel
> environment but in any environment? C has always kept the compiler
> itself minimal and simple.  Even math (at one time "intrinsic" to
> fortran) is >>linked<< as a C library, because there are actually good
> ways and bad ways to code something as simple as sin(x), and it doesn't
> make sense to have to completely change compilers to replace the
> operational function calls.  Imagine buying a compiler with intrinsic
> BLAS if you had to buy a different revision for each hardware
> combination (to get ATLAS-like tuning).  Oooo, expensive.
> There it gets down to the hardware.  If the hardware supports just one
> best way of doing something like evaluating e.g. sin(x) or doing an x =
> Ay, then writing a compiler to support it as an elementary operation
> makes sense.  Just in the x86 family's evolution, however, I've watched
> 8-bit-bus 8088 give way to 16-bit-bus 8086, 8086 give way to 8086+8087,
> 8086+8087 give way to 486 (unifying the command operations) and on to P5
> and P6's, just to indicate a single architecture, where I used fortran
> compilers on this lot at least sometimes up to just about the 486.
> Well, the original 8088/8086 fortran just ignored the 8087, and
> replacements were expensive and slow to arrive.  One had to hand code
> assembler subroutine libraries to replace things like sin(x) in order to
> experience about a tenfold speedup in numerical code (to a whopping oh,
> 100 Kflops:-).  I wrote a bunch of them, then fortran started to
> directly support the 8087 instructions, then I stopped using fortran and
> never looked back.
> The moral of this story is that the "parallel constructs" in fortran are
> at least partly an illusion created by building certain classes of
> optimizing library calls into the compiler itself, which is a generally
> dangerous thing to do and also expensive -- requires constant
> maintenance and retuning (which is partly what you pay for with
> compilers that do it, I suppose).  For some, the performance boost you
> can get without rewriting your code is worth it (where the "without
> rewriting your code" is a critical feature, as the most common reason I
> hear for people to request fortran support is "I have this huge code
> base already written in fortran and don't want to port", not "I just
> love fortran and all its constructs":-).  If you DO have to rewrite your
> code anyway, then the thinness of the C interface provides a clear
> advantage that interpolates between the non-portability of naked assembler and
> the convenience of x = Ay constructs at the compiler level, and because
> you will be "forced" to use libraries even for simple math, you'll be
> forced to confront library efficiency and algorithm.  You'll probably do
> better on a rewrite than you ever would with a "parallelizing compiler"
> and no rewrite, which was Gerry's original point.
> This is why I don't think there is really much difference between the
> major procedural or OO compilers for the purposes of writing parallel or
> most other code (lisp, apl etc excepted).  They all have loops,
> conditionals, subroutine and function calls.  They all support a fairly
> wide and remarkably similar range of data types that may or may not
> successfully create a layer of abstraction for the programmer depending
> on how religious the programmer and the compiler are about rules (Wanna
> access your double array in C as a linear char string?  Sure, no
> problem...:-).  Some folks like a more disciplined compiler that spanks
> them should they try this, or forces them to access their data objects
> only through a "protected" interface via a method.  Others like to live
> dangerously and have access to the raw data whenever they like, for good
> or evil purpose.  But this is just a matter of personal preference and
> educational history, no matter what the religious zealots of both sides
> would claim, and not worth fighting about.
> In parallel programming in particular (yes, this post IS relevant to
> beowulfery, yes it is:-) this issue is of extreme importance.  A true
> "parallel compiler" (in my opinion) would be something that could be fed
> very simple constructs (such as x = A*y in code) that would spit out a
> distributable executable that one could then "execute" with a single
> statement and have it run, in automagic parallel, on your particular
> parallel environment.  So far, I don't think there has been a dazzling
> degree of success with this concept even with dedicated parallel
> hardware -- it ends up being done with library calls and not by the
> compiler even then, and even then it doesn't always do it very WELL
> without a lot of work.
> Compared to dedicated hardware, beowulfs can be architected in many
> ways, with lots of combinations of hardware, memory cache, network
> speeds and types, latencies -- dazzlingly complex.  Even with a true
> "beowulf operating system", a flat pid space, a single executable line
> head node interface (such as Scyld is building), writing a parallel
> >>compiler<< would be awe inspiringly difficult.  Much simpler to leave
> the parallelization to either the user (via library calls) in a message
> passing model or at worst foist the problem off on the operating system
> by creating one of the distributed shared memory interfaces -- CC-NUMA
> or the like -- that hides IPC details from even the compiler and
> libraries.
> I won't say it'll never happen, only that I don't THINK that it'll ever
> happen.  Things change too quickly for it to even be a good idea, at
> least using today's COTS hardware.  Until then, I think that all wise
> programmers will pretty much ignore statements like "fortran compilers
> can parallelize better than _____ compilers" -- compilers don't, or
> shouldn't, parallelize at all, and code written for a serial system
> should almost certainly be REwritten to run in parallel, at least if you
> care enough about speedup that you bother to get a parallel machine in
> the first place.  At the library level, you aren't comparing compilers,
> you're comparing libraries, and may even be able to use the same library
> in multiple compilers.
> So let's be very careful, in our religious wars concerning compilers
> suitable for working with parallel computers, beowulfs in particular, to
> differentiate between "true" differences between compilers -- ways they
> do things that are fundamentally different and relevant to
> parallelization -- and their irrelevant syntactical differences, e.g. x**y
> vs pow(x,y) or {} vs do end.  Let us also be sure to leave out the
> equally irrelevant issues of whether or not objects, protection and
> inheritance, classes and so forth are good or bad things -- you may like
> them, I may not, and so what does that have to do with parallelization?
> As far as parallelization is concerned, PVM or MPI or sockets are PVM or
> MPI or sockets, in fortran or in c or in c++.  All that changes is the
> syntax and call methodology of the API, and even that doesn't change
> much.  That there might be trivial advantages here, or disadvantages
> there, for particular problems, comes as no suprise.  It is a GOOD thing
> that these are NOT features of the compiler, and a BAD thing to suggest
> to potential newbie parallel programmers that they "must" use one
> compiler or another to write good parallel code or to suggest that one
> compiler "parallelizes" code at all, let alone better than another.
> </rgb>
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

Gerry Creager -- gerry.creager at tamu.edu
Network Engineering, Academy for Advanced Telecommunications
Texas A&M University, College Station, TX
Office: 979.458.4020 FAX: 979.847.8578 Cell: 979.229.5301 Page: 979.228.0173
