[Beowulf] Hadoop

Gus Correa gus at ldeo.columbia.edu
Mon Dec 29 09:18:32 PST 2008

Hello Beowulfers

(This thread should be renamed "Matlab and Octave".)

Matlab is the "lingua franca" for computing among students and young
researchers, at least in Earth Sciences (solid earth, atmosphere/oceans/climate,
geochemistry, etc.), as I observe it here.
A number of our students come from Physics, Chemistry, Biology, etc.,
hence the trend is probably more widespread.
Some get through graduate school with nothing but Excel.

As others observed on this thread, Matlab is a great prototyping tool,
which makes it very attractive.
Its integrated environment, with GUI, editor, online help,
programming examples and tips, and instant visualization of results,
is another strong point.
For most people this type of environment is not only convenient,
but also addictive.
I like Octave, whose command line is virtually identical to Matlab's,
but I couldn't get all of these GUI-ish bells and whistles to work in Octave.
Because of this dependence, our Observatory has a Matlab site license.

Moreover, several top numerical models in oceans and climate depend
heavily on Matlab scripts for post-processing and data analysis.
This may be the case in other areas too.
For instance, not long ago I saw several job ads for Matlab programmers in
the Princeton Plasma Physics Lab.
As Matlab scripts and tasks grow bigger and bigger,
this positive feedback has created the need, and the market, for parallel
versions of Matlab.

In many cases Matlab is the only programming environment that
science and engineering students ever come across.
It is introduced in Linear Algebra, Numerical Analysis, Signal Processing,
and other classes, and it sticks; it settles in.
As James, Gerry and others observed,
a lot of people only need to do prototyping anyway: 
proof of concept, one-time calculations of modest size,
and for this Matlab works very well.

Matlab's cavalier approach to memory management
(or perhaps naive users' inadvertently cavalier approach to Matlab)
may be the main cause of the scaling problems.
Most failures I've seen in Matlab scripts come from exhaustion of
computer resources, particularly memory.
Even when you free memory judiciously, problems may arise.
This happens here very often with people trying to do, say,
singular value decomposition or principal component analysis
on huge and dense matrices / datasets, etc.
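To put rough numbers on the memory problem, here is a back-of-envelope sketch in Python (not Matlab) of the storage needed just to hold a dense matrix and its SVD factors. It assumes 8-byte doubles, counts only the input and the returned factors (no solver workspace), keeps the singular values as a vector, and uses a hypothetical 100,000 x 1,000 data matrix; the helper names are made up for illustration.

```python
# Back-of-envelope memory estimate for a dense SVD.
# Assumptions (hypothetical, for illustration): 8-byte doubles, singular
# values kept as a vector, and no solver workspace counted.

BYTES_PER_DOUBLE = 8


def dense_bytes(rows: int, cols: int) -> int:
    """Bytes needed to store a dense rows x cols matrix of doubles."""
    return rows * cols * BYTES_PER_DOUBLE


def svd_footprint(m: int, n: int, economy: bool) -> int:
    """Bytes for A (m x n) plus its factors U, S, V.

    Full SVD: U is m x m and V is n x n.
    Economy SVD: U is m x k and V is n x k, with k = min(m, n).
    """
    k = min(m, n)
    u_cols = k if economy else m
    v_cols = k if economy else n
    a = dense_bytes(m, n)
    u = dense_bytes(m, u_cols)
    s = k * BYTES_PER_DOUBLE  # singular values only
    v = dense_bytes(n, v_cols)
    return a + u + s + v


if __name__ == "__main__":
    m, n = 100_000, 1_000  # tall data matrix: 100k samples, 1k variables
    print(f"A alone:     {dense_bytes(m, n) / 1e9:.1f} GB")
    print(f"full SVD:    {svd_footprint(m, n, economy=False) / 1e9:.1f} GB")
    print(f"economy SVD: {svd_footprint(m, n, economy=True) / 1e9:.1f} GB")
```

Under these assumptions the data matrix alone is about 0.8 GB, the full factorization needs about 80.8 GB (almost all of it in U), and the economy-size factorization about 1.6 GB. Which form a given tool returns by default, and how much extra workspace it allocates, varies and should be checked against its documentation.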

In the old days of punched cards, Fortran was part of the
engineering and scientific training.
Fortran was king in Intro to Computers classes and the like.
That is no longer true.
Fortran lost its charm and status among computer scientists
(even John Backus abandoned it).
In addition, most college science curricula today take for granted
the computer literacy of their freshmen.
A mistake, I think.
(A few students are great hackers, but most only know Skype,
Facebook, and MS Word.)

I think Intro to Computers courses would continue to be useful for
engineers and science majors.
(Not for prospective computer scientists, of course, who need much more 
than that.)
These courses should include basic Unix/Linux literacy, 
shell scripting (or Perl, or Python), 
the old-fashioned but effective principles of "structured programming"
(call it "modular programming" to make it palatable),
and the rudiments of a language of choice.
This language may be Fortran, which continues to be the dominant one
in science and engineering code, or perhaps C.

However, where these Intro to Computers courses exist,
they try to teach Java, C++, etc., often using Microsoft Visual Studio
or another programming environment that traps the user
and doesn't give students the computer craftsmanship (and autonomy)
they need in their professional lives.

For most prospective engineers and general scientists a computer
is more of a tool than a theoretical model.
Object-oriented languages, Turing machines, cellular automata
make nice class discussion topics,
but can't replace the development of basic computer literacy and skills.

My $0.02.

Gus Correa

Gustavo Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA

Gerry Creager wrote:
> OUR users are willing to pony up the funds to buy Matlab.  We're 
> already running Octave but they claimed they didn't know how to use 
> it.  Even after we showed them Matlab scripts that "just ran" on Octave.
> As for Fortran vs C, "real scientists program in Fortran.  Real Old 
> Scientists program in Fortran-66.  Carbon-dated scientists can still 
> recall IBM FORTRAN-G and -H."
> Actually, a number of our mathematicians use C for their codes, but 
> don't seem to be doing much more than theoretical codes.  The guys 
> who're writing/rewriting practical codes (weather models, 
> computational chemistry, reservoir simulations in solid earth) seem to 
> stick to Fortran here.
> gerry
> Jeff Layton wrote:
>> I hate to tangent (hijack?) this subject, but I'm curious about your 
>> class poll. Did the people who were interested in Matlab consider 
>> Octave?
>> Thanks!
>> Jeff
>> ------------------------------------------------------------------------
>> *From:* Joe Landman <landman at scalableinformatics.com>
>> *To:* Jeff Layton <laytonjb at att.net>
>> *Cc:* Gerry Creager <gerry.creager at tamu.edu>; Beowulf Mailing List 
>> <beowulf at beowulf.org>
>> *Sent:* Saturday, December 27, 2008 11:11:20 AM
>> *Subject:* Re: [Beowulf] Hadoop
>> N.B. the recent MPI class we gave suggested that we need to re-tool it
>> to focus more upon Fortran than C.  There was no interest in Java from
>> the class I polled.  Some researchers want to use Matlab for their work,
>> but most university computing facilities are loath to spend the money
>> to get site licenses for Matlab.  Unfortunate, as Matlab is a very cool
>> tool (been playing with it first in 1988 ...) it's just not fast.  The
>> folks at Interactive Supercomputing might be able to help with this with
>> their compiler.
