[Beowulf] Teaching Scientific Computation (looking for the perfect text)

Robert G. Brown rgb at phy.duke.edu
Wed Nov 21 06:40:00 PST 2007


On Tue, 20 Nov 2007, Donald Shillady wrote:


> Hi, I am one of those old grey-haired Professors who spent years since
> 1963 using variants of FORTRAN with time out for four years of enforced
> Burroughs ALGOL and then back to FORTRAN.  In Chemistry education there
> is an added problem of students who want to avoid mathematics.  I taught
> Physical Chemistry Laboratory for some 30 years and found BASIC/GWBASIC
> is a way to introduce students to sequential coding that could lead to
> later use of other languages such as f77, C++ or PASCAL.  In a lab
> course it is easy to write some special purpose program in GWBASIC for a
> given lab report and the concept of programming takes second place to
> the formulas used; often such programs are less than a single page.  In
> my area of Quantum Chemistry there are a few folks pushing C++, but I
> would estimate that 95% of most huge programs such as GAMESS or GAUSSIAN
> are in some form of FORTRAN.  Personally I want to "think" in formulas
> not in pointers and the pointers just add another layer of complexity to
> what is probably already complicated mathematics.  After all that is

Sure, but while pointers CAN be used to write obfuscated code in C -- a
practice I totally eschew -- they can also be used to work miracles in C
creating "matrices" that are not square, and do not start with index 0
or 1, and do not waste space, and map into a specific vector so that one
can write general purpose subroutines that act on vectors and use them
for arbitrary tensor forms.  When you combine this with C's ability to
specify and manipulate structs (which are the old-timey objects that
preceded C++ and continue to be the basis of OO programming in C today)
you have a truly amazing ability to generate custom data objects that
are both efficient and natural for -- translating formulas.

To pick a single example, if I were to have a four-dimensional set of
differential equations where the indices were e.g. particle number,
principal quantum number n, angular momentum indices l and m, where n,
l, and m are range restricted so that the most efficient allocation of
memory would be a triangular one and where one wishes to access the
particle by means of something like x[i][n][l][m], it would be nearly
impossible to build the array in Fortran without (effectively)
manipulating pointers.  It is easy in C -- the code to lay out the
matrix would take me maybe thirty minutes to write and another thirty to
test.  If one then wishes to (for whatever reason) write coupled
differential equations to generate x_inlm(t) from some initial state --
well, most ODE solvers only work on vectors of ODEs.

In fortran this leaves one working with an abomination of displacement
arithmetic on a vector -- I'm sure we've all written X[I + IMAX*N +
IMAX*NMAX*L...]  (which isn't even right -- it has been years since I've
had to do this and thank God I never will have to again) to somehow
access a vector in terms of matrix indices.  The resulting code is both
unreadable and impossible to debug.
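
For the plain rectangular case, the displacement arithmetic in question looks
roughly like the sketch below. It is written in C with zero-based indices; the
extents and the name flat_index are hypothetical, chosen only for illustration,
and the first index is assumed to vary fastest, as in Fortran's column-major
layout.

```c
#include <stddef.h>

/* Hypothetical extents for the four indices (illustration only). */
enum { IMAX = 3, NMAX = 4, LMAX = 4, MMAX = 7 };

/* Offset of element (i, n, l, m) in a flat vector, zero-based, with i
 * varying fastest (the Fortran-style column-major convention). */
static size_t flat_index(size_t i, size_t n, size_t l, size_t m)
{
    return i + IMAX * (n + NMAX * (l + LMAX * m));
}
```

Every access then reads X[flat_index(i, n, l, m)], and the full IMAX*NMAX*LMAX*MMAX
block has to be allocated even when most (n, l, m) combinations are never used.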

In C there is simply no problem.  I allocate all the memory required in
a single block and assign it to a vector pointer v.  I then do all the
displacement arithmetic one time using a very simple repacking algorithm
to pack the ADDRESSES of only the last row starting points into a ****x
pointer, which I can then address like x[i][n][l][m].  I can then call
the ODE solver with vector v, but write the derivs routine with
x[i][n][l][m].  Later on I can print out the results as x[i][n][l][m].
No space is wasted, I only write READABLE code for derivatives and so
on, and I can lay out the objects in memory, if I care to, to optimize
streaming data access and speed up the routine by a factor of 2 or
thereabouts relative to the fixed stride and predetermined layout that
one would have to use to allocate a big rectangular block matrix, most of
which would never be used and would have to be stepped over.
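
A minimal sketch of this repacking follows. It is not the code from the post,
only one way to do it under stated assumptions: the index ranges are taken to be
i = 0..np-1, n = 1..nmax, l = 0..n-1, m = -l..l, and the names qstate,
qstate_alloc and qstate_free are hypothetical. The data live in one block v; only
small tables of pointers are built on top of it.

```c
#include <stdlib.h>

/* Ragged view x[i][n][l][m] over one contiguous vector v.
 * Assumed index ranges (hypothetical, for illustration):
 *   i = 0..np-1, n = 1..nmax, l = 0..n-1, m = -l..l          */
typedef struct {
    double *v;      /* contiguous block, what an ODE solver would see */
    size_t len;     /* number of elements in v: np * sum of n*n */
    double ****x;   /* pointer tables giving the x[i][n][l][m] view */
    int np, nmax;
} qstate;

void qstate_free(qstate *s)
{
    if (s->x) {
        for (int i = 0; i < s->np; i++) {
            if (!s->x[i]) continue;
            for (int n = 1; n <= s->nmax; n++)
                free(s->x[i][n]);           /* free(NULL) is a no-op */
            free(s->x[i]);
        }
        free(s->x);
    }
    free(s->v);
    s->x = NULL;
    s->v = NULL;
}

/* Returns 0 on success, -1 on allocation failure. */
int qstate_alloc(qstate *s, int np, int nmax)
{
    size_t per_particle = 0;
    for (int n = 1; n <= nmax; n++)
        per_particle += (size_t)n * n;      /* sum over l of (2l+1) is n*n */

    s->np = np;
    s->nmax = nmax;
    s->len = (size_t)np * per_particle;
    s->v = calloc(s->len, sizeof *s->v);
    s->x = calloc((size_t)np, sizeof *s->x);
    if (!s->v || !s->x) { qstate_free(s); return -1; }

    double *row = s->v;                     /* start of the next (i,n,l) row */
    for (int i = 0; i < np; i++) {
        /* slot 0 is left NULL so that n can start at 1 */
        s->x[i] = calloc((size_t)nmax + 1, sizeof *s->x[i]);
        if (!s->x[i]) { qstate_free(s); return -1; }
        for (int n = 1; n <= nmax; n++) {
            s->x[i][n] = malloc((size_t)n * sizeof *s->x[i][n]);
            if (!s->x[i][n]) { qstate_free(s); return -1; }
            for (int l = 0; l < n; l++) {
                s->x[i][n][l] = row + l;    /* so that m = -l is row[0] */
                row += 2 * l + 1;
            }
        }
    }
    return 0;
}
```

After qstate_alloc(&s, np, nmax), a solver can be handed s.v and s.len while the
derivative routine reads and writes s.x[i][n][l][m]; both refer to the same
memory. The one wasted pointer slot per particle (n = 0) is a design choice that
avoids forming a pointer before the start of an array.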

> where the acronym came from: "Formula Translation".  The idea is to
> think in math terms and let the compiler translate the formulas to
> machine code.  Another factor is the ability to just create variables as
> desired without all the declarations required in PASCAL and ALGOL.

I agree that Pascal is pretty fascist here and I dislike it for that
reason as well.  ALGOL, I am happy to say, was before my time (not much
was, alas:-).  C lets you generate variables on the fly, and pointers
are one of the ways you can in fact dynamically allocate a variable, use
it for a while, free it, and then reuse the name for a new version of the
variable with perhaps different parameters.  I can create an array in
this pass through a loop that is 3x3, next pass 5x5, next pass 2x2, and
don't have to allocate a fixed size array of 5x5 to do so.  The downside
of this absolute freedom is that you have the freedom to leak memory
like a sieve or allocate "untyped" variables and then use them in such a
way that introduces garbage into the code.
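
The 3x3, 5x5, 2x2 case can be sketched as below. The helper names matrix_new,
matrix_free and trace_sum_demo are hypothetical, and the identity-matrix fill is
only there to give the loop something to do.

```c
#include <stdlib.h>

/* Allocate an n x n matrix of doubles as one zeroed block plus a table of
 * row pointers, so it can be addressed as a[r][c].
 * Returns NULL if n is 0 or an allocation fails. */
double **matrix_new(size_t n)
{
    if (n == 0) return NULL;
    double **a = malloc(n * sizeof *a);
    double *block = calloc(n * n, sizeof *block);
    if (!a || !block) { free(a); free(block); return NULL; }
    for (size_t r = 0; r < n; r++)
        a[r] = block + r * n;
    return a;
}

void matrix_free(double **a)
{
    if (a) { free(a[0]); free(a); }     /* a[0] is the start of the block */
}

/* Reuse one name for matrices of different sizes; returns the sum of the
 * traces of the three identity matrices, or -1.0 on allocation failure. */
double trace_sum_demo(void)
{
    const size_t sizes[] = { 3, 5, 2 };
    double total = 0.0;
    for (size_t k = 0; k < 3; k++) {
        size_t n = sizes[k];
        double **a = matrix_new(n);     /* same name, new size each pass */
        if (!a) return -1.0;
        for (size_t r = 0; r < n; r++)
            a[r][r] = 1.0;
        for (size_t r = 0; r < n; r++)
            total += a[r][r];
        matrix_free(a);                 /* omit this and each pass leaks */
    }
    return total;
}
```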

Pascal was fascist because it was basically a learning language and was
trying to force undisciplined students to be absolutely disciplined
about declaring, typing and so on their variables.  While I dislike its
rigidity, its goal is a good one and a programmer in ANY language is
unwise if they are too cavalier about their data typing or too sloppy
about data declarations.  Fortran's implicit types, for example, mean
that variable names often look "odd" (even though I find nearly 20 years
after I last wrote fortran that I tend to use i-n as the first letter of
integer variables in C, sigh).  C is moderately strongly typed, but does
only a little, easily defeated, compile time checking.  Runtime misuse
of typed variables is just plain your fault and you have to fix it.

> Finally, for a long, long time the machine code produced by various
> fortran compilers was testably faster and PASCAL would be a counter
> example of lucid slowness!  As far as a text, I never used one, I am
> coasting along on a two week course in FORTRAN II in 1963 and along the
> way just looking at published routines and learning what works and what
> does not.  For learning I would still suggest BASIC/GWBASIC.

Fortran compilers were, without question, well written and efficient for
a lot of numerical code.  Probably still are.  The very ability to
generate freeform arrays and so on I describe above CAN lead to very
efficient C code programs, or one can generate data structs in orders
that actively defeat streaming memory access optimization because they
are (paradoxically) easier to read and think about that way.  Sometimes
there really is an ease of coding/performance trade off, after all.
Literal translation of formulas may not be efficient, and as has already
been demonstrated, can all too easily lead to wrong answers when e.g.
summing series, especially alternating or long tail series.
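
One standard illustration of the alternating-series problem, offered here as an
assumption about the kind of failure meant rather than the example from earlier
in the thread, is the Taylor series for exp(-x). The function names are
hypothetical.

```c
/* Literal translation of exp(-x) = sum over k of (-x)^k / k!
 * (an alternating series when x > 0). */
double exp_neg_literal(double x, int nterms)
{
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < nterms; k++) {
        term *= -x / k;     /* terms alternate in sign, grow, then shrink */
        sum += term;
    }
    return sum;
}

/* Same series summed for +x (all terms positive), then inverted. */
double exp_neg_stable(double x, int nterms)
{
    double term = 1.0, sum = 1.0;
    for (int k = 1; k < nterms; k++) {
        term *= x / k;
        sum += term;
    }
    return 1.0 / sum;
}
```

At x = 20 the largest terms are around 4e7 while the answer is around 2e-9, so
in double precision the literal version loses essentially all of its
significant digits to cancellation. The rearranged version sums only positive
terms and does not have that problem.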

Fortran has always been great on linear algebra, though, BECAUSE its
square matrices and linear vectors and lack of object/pointer
flexibility made it possible to really work on linear algebra algorithm
optimization.  The one other feature of fortran that I miss is its
binary exponentiation operator.  In C exponentiation is a library
function.  In fortran it PROBABLY is as well -- it certainly requires a
call to a complex piece of code as opposed to a simple code fragment
however it is represented in a program -- but y'know, it really is
easier to write and understand code like a*b**i or a*b^i rather than
a*pow(b,i).
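
When the exponent is a non-negative integer, a small helper is a common
alternative to pow(). The sketch below uses repeated squaring; the name ipow is
hypothetical and is not part of the C standard library.

```c
/* b raised to a non-negative integer power by repeated squaring. */
double ipow(double b, unsigned int i)
{
    double result = 1.0;
    while (i > 0) {
        if (i & 1u)         /* this bit of the exponent is set */
            result *= b;
        b *= b;             /* b, b^2, b^4, b^8, ... */
        i >>= 1;
    }
    return result;
}
```

The Fortran a*b**i then becomes a*ipow(b, i), which is still a function call
rather than an operator.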

Fortran's I/O commands are terrible.  Fortran is miserable if you have
to manipulate text.  Fortran isn't the easiest thing (consequently) to
interface with any sort of GUI or human/interactive code.  Here C has a
really significant advantage, although C is still far short of the ease
of managing strings and the like in e.g. perl.  A second way I'd LOVE to
change C is to fully integrate regular expressions into the language for
string manipulation so that perl-like constructs such as 
if(a =~ /^Start/){
   do something
}
worked.  Yes, you can do it with strcmp or a regexp library and some
effort, but the programming time in perl is vastly lower and the code is
much more readable.  C's "parsing" is simply not what it could be,
although it is entirely understandable and one can easily manipulate
data at the byte by byte level to do whatever you like.
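
For comparison, here is roughly what the /^Start/ test costs in C today. This is
a sketch assuming a POSIX system that provides <regex.h>; the wrapper names
re_match and starts_with are hypothetical.

```c
#include <regex.h>
#include <string.h>

/* Returns 1 if s matches the POSIX extended regex pattern, 0 if not,
 * -1 if the pattern fails to compile. */
int re_match(const char *s, const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* The fixed-prefix case needs no regex at all. */
int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

With these, if (re_match(a, "^Start") == 1) { ... } does the job of the
perl-like test, at the price of compiling the pattern on every call unless the
regex_t is cached.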

    rgb

> Don Shillady
> Emeritus Professor of Chemistry, VCU
> Ashland VA (working at home)
>
> > Date: Tue, 20 Nov 2007 13:26:08 -0800
> > From: lindahl at pbm.com
> > To: diep at xs4all.nl
> > Subject: Re: [Beowulf] Teaching Scientific Computation (looking for the perfect text)
> > CC: Beowulf at beowulf.org
> > X-Frumious: Bandersnatch
> >
> > On Tue, Nov 20, 2007 at 09:46:41PM +0100, Vincent Diepeveen wrote:
> > > There is several ways to look at this issue.
> > > Suppose your students totally fail as physics student and even more
> > > as future manager/teamleader and continue as computer science students.
> > >
> > > Then what language can they use best?
> >
> > ... then they'll be studying many languages, and it won't be any big
> > deal that they studied Fortran, Python, and Mathematica in their first
> > course.
> >
> > It's dumb to act as if these students are never learning another
> > language.
> >
> > -- greg
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Robert G. Brown
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone(cell): 1-919-280-8443
Web: http://www.phy.duke.edu/~rgb
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

