[Beowulf] Teaching Scientific Computation (looking for the perfect text)

Robert G. Brown rgb at phy.duke.edu
Tue Nov 20 12:23:30 PST 2007


On Tue, 20 Nov 2007, Nathan Moore wrote:

> After reflection though, I've started to wonder about the wisdom of my
> choice.  Specifically (like RGB), I love the GSL library, and extending GSL
> to fortran in an intro class is non-trivial.  Additionally, most vendors
> supply "fast" hardware libraries in C (I may be ignorant, but if a student
> wants to call an AMD ACML fast-math function(
> http://developer.amd.com/acml.jsp), or write a linear algebra function to
> run on a graphics card(http://developer.nvidia.com/object/cuda.html), the
> vendors seem to assume that you'll write the code in C).
>
> Also, and more relevant, I assume that most employers word-associate
> "Fortran is to backwards as C is to competence".
>
> So, I'm thinking about reworking the class to favor C, and fearing 3 weeks
> of pointer and addressing hell.  For those of you who teach scientific
> computation (and also those of you who hire undergrads), I'd be grateful for
> your thoughts.  One specific question I have is what text covers scientific
> programming and touches on MPI using the C language.

Ah, grasshopper, you have finally managed to snatch the stone.  (Or is it
drink the Kool-Aid, I never can remember...;-)

Let's see.  There are some truly excellent books that have helped my
independent study students on programming (most of whom are trained in
Java only these days, God help us all:-( learn C.  There are also two
free online C reference manuals, three if you count Gnu's (which
unfortunately is by far the worst of them).  I will endeavor to provide
you with a short shopping list, as I keep pretty much a complete C
programmer's reference toolkit on my laptop at all times including the
two aforementioned C textbook/manuals, various other free/online
manuals, and of course the man pages.  Even though I write a LOT of C I
still have to look things up -- what programmer doesn't?  When I'm
online I have even more access to free resources as GIYF.

To my mind, the most difficult aspect of programming in linux for a
newbie isn't the compiler, it is the programming environment and
learning how to create a project directory, put it under subversion (or
possibly CVS) control, pick a text editor (not a WYSIWYG WP, an editor,
ideally one that groks compilation and make), and write that first hello
world program.  The second longest thing is teaching them a sane way of
getting variables off of the command line and into a program, especially
when some of the kids have never really used a command line at all in
the past.  Drop the mice, guys -- fingers on those home keys!  That's
where you type, after all...

This process is greatly sped up if you provide them with a "standard
program template" like the one here:

   http://www.phy.duke.edu/~rgb/General/project.php

which can create a work directory for you by running a little script and
prepopulate it with a functional Makefile and a set of hello-world
sources complete with a routine that parses the command line and can
fairly easily be hacked to add new CL arguments.  Note that this
template is always changing and you'll likely have to modify it and test
it to meet the needs of your students, but it will TREMENDOUSLY improve
your ability to grade projects as they can "make tgz" and ship you a
project tarball, ready to unpack, build, test.

I personally like the jove (Jonathan's Own Version of Emacs) for all
text editing because emacs is maddeningly complex, crufted, overladen
with features, and hence all but unlearnable.  jove installs easy, there
is a teachjove tutorial that will walk students through all they need to
know to use jove as a semi-IDE, and you're done.  I've got jove
tarball/rpm's, or it is in debian ready to roll from what I've heard
(what isn't?:-).

One of the two C books I'd recommend is here:

   http://www.phy.duke.edu/~rgb/General/c_book.php

which is a legal mirror of the original book.  Note well the license is
a "beer" license.  You have to buy the authors beer if the opportunity
to do so presents itself.  This may present problems for your students
if they are underage, but I imagine the authors of the book will be down
with that.  So each student can mirror my mirror, or they can find the
original online with google or otherwise and mirror that.  That way they
can have an entire, quite readable book on C, replete with cut-n-paste
examples, free, installed on their personal computers where it is always
handy and useful.

This is a "beginner" level book -- complete enough for simple code, but
not really up to snuff on things like threads or advanced data
structures.  For the SERIOUS student, or the student working on an
advanced project where forking, threads, and so on are necessary, I'd
recommend that they use Dave Marshall's book:

   http://www.cs.cf.ac.uk/Dave/C/CE.html

This is a GREAT book.  A full out, pro-grade book on C including some
very advanced stuff.  Parts are a tiny bit dated -- the first chapter of
the book deals with CDE on Solaris -- but it is a killer reference.
Online access is free, and if you contact Dave regarding redistribution
of the book to a class full of students in PDF format (which he has, and
for that matter so do I) he might be open to it.  He sent one to me
willingly enough when I asked him about it, so I've got it on my laptop
for eternity, network or not.  He was going to look into a Gnu OPL for
it when last we communicated.  At this point I'd suggest (to him)
throwing it up onto Lulu for cheap download in PDF or paper print with a
reasonable markup -- he'd make money and students would get a generally
lovely book in whatever form(s) they prefer.

Then there are a variety of algorithm books or programming environment
books I like to recommend to students I'm teaching:

  * King Ables and Graham Glass's book.

http://www.amazon.com/Linux-Programmers-Users-Graham-Glass/dp/0131857487

I've used this book since it was just Glass and Unix instead of Linux.
A classic.  Way too EXPENSIVE a classic at $84, but a classic.

  * Mastering Algorithms with C (O'Reilly Seahorse IIRC).  Excellent
reference for stock-in-trade stuff like data structures, linked lists,
sorting, numerical methods (to a point).

  * Kernighan and Pike's The Practice of Programming.  I'd actually make
this a requirement for any new C programmer.  It covers things like
INDENT YOUR GODDAMN CODE AND I REALLY MEAN IT, and IF YOU DON'T COMMENT
YOUR CODE YOU WON'T UNDERSTAND IT YOURSELF TWO WEEKS FROM NOW, and WHAT
THE HELL IS aRp IN YOUR CODE?  DOES THAT MEAN SOMETHING TO YOU
BESIDES "BARF"?  so that you don't have to yell QUITE so much at your
students and has lots of other wisdom as well.

  * Optionally Kernighan and Pike's The Unix Programming Environment
isn't quite as good as Ables and Glass, but it is still damn good and an
excellent reference.

For numerical methods per se, well, there is STILL Numerical Recipes in
C, but I no longer use it in any code because it sucks in so many ways.
It is encumbered, with a code-reuse license out of the dark ages, I mean
a really, really, really bad one (in spite of the fact that some of the
code they are trying to encumber comes right out of the literature).  I
personally think that all they need is the GSL itself and the GSL's
manual, online or otherwise.  At this point anything you're at all
likely to have beginners doing is well-supported there, with
documentation and code examples, with a list to ask difficult questions
on.  And yes, they can buy a paper copy of this from Amazon (and help
support the project at the same time).  And it is even cheap.

My actual favorite numerical methods book is alas in fortran --
Forsythe, Malcolm and Moler -- but all that they really need from any
such textbook is a two lecture discussion on discretization error and
bad ways of doing what appear to be straightforward tasks because
computers do discrete arithmetic.  If the course were going to FOCUS on
numerical methods, e.g. derive a 4th-5th order RK ODE solver and code
it, you'd likely need a textbook to help, but to USE an RK ODE solver --
just the GSL, just the GSL.
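
To make the discretization-error point concrete, here is a small sketch of one of those "bad ways of doing a straightforward task": a forward-difference derivative.  The test function (x cubed), the function names, and the step sizes are my own choices for illustration, not taken from any of the books above.  Shrinking h reduces the truncation error only until roundoff in the subtraction takes over, after which the answer gets worse again.

```c
#include <stdio.h>

/* Example function and its exact derivative (chosen for illustration). */
static double cube(double x)       { return x * x * x; }
static double cube_deriv(double x) { return 3.0 * x * x; }

/* Forward-difference estimate of f'(x) with step h. */
double forward_diff(double (*f)(double), double x, double h)
{
    return (f(x + h) - f(x)) / h;
}

/* Absolute error of the estimate for the cube function at x. */
double fd_error(double x, double h)
{
    double err = forward_diff(cube, x, h) - cube_deriv(x);
    return err < 0.0 ? -err : err;
}

/* Print the error at x = 1 for steps from 1e-1 down to about 1e-15. */
void fd_table(void)
{
    double h = 1e-1;
    int i;

    for (i = 1; i <= 15; i++, h /= 10.0)
        printf("h = %.0e   error = %.3e\n", h, fd_error(1.0, h));
}
```

Have the students call fd_table() and plot the error against h on a log scale.  The error should fall, bottom out somewhere in the middle of the range, and then climb back up as h approaches machine precision.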

The last reference I like to recommend is (of course) Knuth's TAOCP.
'Nuff said.  Expensive as hell for the full set, but it IS the only
reference you'll ever need, unless you plan to do network programming in
which case you'll likely need one or more of Stevens' excellent books,
or plan to do systems programming at the kernel level, in which case
you'll need several books I'm not going to list here, or MPI or PVM, in
which case you'll need -- what?  Well there are a few excellent books
from IIRC MIT press on them, but they may be dated at this point.  There
are the many superb articles on www.clustermonkey.net and in the Linux
Magazine archives.  Again, having a good template goes a long way --
there is a PVM template on my website but I don't use MPI (or PVM, much,
anymore) so you'll have to get them from other folks.

I've found that some subset of the *'d references and free references
above can take a student that can program in ANY language already and
make a credible C programmer out of them in a semester.

And jeeze, man.  Pointers rock.  If you want students to actually learn
how computers work so that they can UNDERSTAND what they are doing they
are peerless (if painful) instruments of learning.  Want a block of
memory?  Go get it, lay it out, access it, all with fairly high level
commands.  Want to overlay a secondary addressing scheme on top of it?
Sure, why not.  Allocate a vector, pack its addressing into a ***pointer
to make a matrix[][][] that you can still pass to an ODE solver that
wants vectors only, while addressing it in a completely natural way in
the deriv evaluator.  You can do things in C easily that you can't do AT
ALL in other languages, all because of pointers and the ability to
recast variables.  Sure, you can break a program more horribly than you
ever could in Fortran (although I've managed to break fortran pretty
horribly).  That's why you get them K&P and make them read it FIRST, and
why you teach them about adding


#define MYDEBUG(b)  if ( (verbose == b) || (verbose == D_ALL) )

  typedef enum {
    D_QUIET,   /* 0: no debug output */
    D_ALL,     /* 1: every MYDEBUG block fires */
    D_V_1,     /* 2: selected by "-v 2" */
    D_V_2,
    D_V_3,
    N_DEBUG };

into their primary program header and sticking:

x = 1.0;
MYDEBUG(D_V_1){
  fprintf(stderr,"# This is my subroutine 1, and variable x = %f\n",x);
}

chunklets all over their damn code as they write it.  Run the program as

   $ myprog -v 2

and out spews:

# This is my subroutine 1, and variable x = 1.000000

Run it as

   $ myprog

and it doesn't.  I instrument my code so that at any instant I can get a
completely verbose picture of everything it is doing as it goes along,
or "zoom in" on just one subroutine.  It is tricks like this that make C
programming robust and doable.  One should do it in any language, of
course, but in C it is ESSENTIAL to make sure that the program is doing
what it is supposed to be doing, not overwriting array boundaries, and
so on.  In fortran you are lulled into a false sense of security because
you THINK the compiler or runtime engine will catch errors for you when
in fact there are lots of errors they will miss, and then you are REALLY
dead.
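
Going back to the vector-packed-into-a-***pointer trick mentioned above, here is a minimal sketch of how it can be laid out.  The Cube struct and the cube_alloc(), cube_free() and scale_vector() names are invented for this example; scale_vector() is only a stand-in for a library routine, such as an ODE solver, that wants a plain vector.

```c
#include <stdlib.h>

/* A 3-D view laid over one contiguous vector (names are illustrative). */
typedef struct {
    double   *data;   /* flat vector of nx*ny*nz values          */
    double  **rows;   /* nx*ny pointers, one per (i, j) pair     */
    double ***m;      /* m[i][j][k] is data[(i*ny + j)*nz + k]   */
    size_t nx, ny, nz;
} Cube;

/* Allocate the vector and its addressing.  Returns 0 on success, -1 on failure. */
int cube_alloc(Cube *c, size_t nx, size_t ny, size_t nz)
{
    size_t i, j;

    c->data = NULL; c->rows = NULL; c->m = NULL;
    if (nx == 0 || ny == 0 || nz == 0)
        return -1;
    c->nx = nx; c->ny = ny; c->nz = nz;

    c->data = calloc(nx * ny * nz, sizeof *c->data);   /* zero-filled */
    c->rows = malloc(nx * ny * sizeof *c->rows);
    c->m    = malloc(nx * sizeof *c->m);
    if (!c->data || !c->rows || !c->m) {
        free(c->data); free(c->rows); free(c->m);
        c->data = NULL; c->rows = NULL; c->m = NULL;
        return -1;
    }
    for (i = 0; i < nx; i++) {
        c->m[i] = c->rows + i * ny;                    /* slab i    */
        for (j = 0; j < ny; j++)
            c->m[i][j] = c->data + (i * ny + j) * nz;  /* row (i,j) */
    }
    return 0;
}

void cube_free(Cube *c)
{
    free(c->m); free(c->rows); free(c->data);
    c->data = NULL; c->rows = NULL; c->m = NULL;
}

/* Stand-in for a routine that only understands vectors. */
void scale_vector(double *y, size_t n, double a)
{
    size_t i;
    for (i = 0; i < n; i++)
        y[i] *= a;
}
```

After cube_alloc(&c, 2, 3, 4), writing c.m[1][2][3] changes the same double that scale_vector(c.data, 24, 2.0) sees, so the vector-only routine and the naturally indexed code share one block of memory with no copying.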

HTH,

    rgb

> regards,
>
> Nathan Moore
>
>
>

-- 
Robert G. Brown
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone(cell): 1-919-280-8443
Web: http://www.phy.duke.edu/~rgb
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
