[Beowulf] Hadoop

Gerry Creager gerry.creager at tamu.edu
Fri Jan 2 04:30:16 PST 2009


Gus Correa wrote:
> Hello Beowulfers
> 
> (This thread should be renamed "Matlab and Octave".)
> 
> Matlab is the "lingua franca" for computing among students and young 
> scientists,
> at least in Earth Sciences (solid earth, atmosphere/oceans/climate,
> geochemistry, etc), as I observe it here.
> A number of our students come from Physics, Chemistry, Biology, etc.,
> hence the trend is probably more widespread.
> Some get through graduate school using only Excel.

Hrmph.  I know of one "numerical analysis" class taught in Excel.  I've 
more confidence in Matlab for that, but that's an aside.

Do me a favor and say "Hey!" to Rob Arko, please.

> As others observed on this thread,  Matlab is a great prototyping tool,
> which makes it very attractive.
> Its integrated environment, with GUI, editor, online help,
> programming examples and tips, and instant visualization of results,
> is yet another high point of Matlab.
> For most people this type of environment is not only convenient,
> but also addictive.
> I like Octave, the command line is virtually identical to Matlab,
> but couldn't get all these GUI-ish bells and whistles to work in Octave.
> Because of this dependence, our Observatory has a Matlab site license.

I've long suspected that the GUI is the opiate of choice in these cases,
especially when one thinks in terms of the reluctant... or
untrained... programmer.

> Moreover, several top numerical models in oceans and climate depend
> heavily on Matlab scripts for post-processing and data analysis.

There's also lots of work going on in ATMO, including radar analysis. 
However, it's my opinion, having been involved with the products of a 
radar class that was taught here (the prof left for other pursuits) that 
he did the kids a disservice by making things easier with Matlab: They 
knew which scripts and building blocks to use, but had little concept of 
the signal analysis they'd invoked, nor of the underlying 
science/engineering of their code or of a Doppler radar system.

> This may be the case in other areas too.
> For instance, not long ago I saw several job ads for Matlab programmers in
> the Princeton Plasma Physics Lab.
> As Matlab scripts and tasks get bigger and bigger, the positive feedback 
> created the need and market for parallel versions of Matlab.
> 
> In many cases Matlab is the only programming environment that
> science and engineering students come across.
> It is introduced in Linear Algebra, Numerical Analysis, Signal Processing,
> and other classes, and it sticks; it settles in.
> As James, Gerry and others observed,
> a lot of people only need to do prototyping anyway: proof of concept, 
> one-time calculations of modest size,
> and for this Matlab works very well.
> 
> Matlab's cavalier approach to memory management -
> or perhaps the inadvertent cavalier approach to Matlab by naive users -
> may be the main cause for the scaling problems.
> Most failures I've seen in Matlab scripts come from exhaustion of
> computer resources, particularly memory.
> Even when you free memory judiciously, problems may arise.
> This happens here very often with people trying to do, say,
> singular value decomposition or principal component analysis
> on huge and dense matrices / datasets, etc.
> 
> In the old days of punched cards, Fortran was part of the
> engineering and scientific training.
> Fortran was king in Intro to Computers classes or similar.
> That is no longer true.
> Fortran lost its charm and status among computer scientists
> (even John Backus abandoned it).
> In addition, today most college scientific curricula take for granted
> the computer literacy of their freshman students.
> A mistake, I think.
> (A few students are great hackers, but most only know Skype,
> Facebook, MS Word.)

I'll second this, as well.  Fortran, as an intro language, has fallen 
from favor, often because programs DO believe that kids, today, are 
competent with computers.  Since I've yet to see a kid program a 
velocity decoding application in iTunes, or compute a Fibonacci series, 
I suspect that belief is misplaced.  My students, unless they can prove,
via both transcript and actual code writing, previous exposure and a
degree of competency, have to take two semesters of programming.
Depending on what they're working on, I recommend C, Fortran (we still 
have a Computer Science intro course therein) or (shudder) Java. 
Despite having Stroustrup on faculty (and actively teaching) I rarely 
recommend C++ to 'em as I just want them competent, not esoteric.

I might add that anyone working with me in my lab, either on their degrees
or as research assistants, get a healthy dose of Linux and BSD, and 
appropriate open source tools.

> I think Intro to Computers courses would continue to be useful for
> engineers and science majors.
> (Not for prospective computer scientists, of course, who need much more 
> than that.)
> These courses should include basic Unix/Linux literacy, shell scripting 
> (or Perl, or Python), the old-fashioned but effective principles of 
> "structured programming"
> (call it "modular programming" to make it palatable),
> and the rudiments of a language of choice.
> This language may be Fortran, which continues to be the dominant one
> in science and engineering code, or perhaps C.

Strongly agree.

> However, when these Intro to Computers courses exist,
> they try to teach Java, C++, etc, often using Microsoft Studio,
> or another programming environment that traps the user,
> and doesn't give him/her the required computer craftsmanship (and autonomy)
> for their professional life.

All too true.  When I interview research assistant (grad student labor) 
candidates, I get tons who claim proficiency in MS Office, as if that's 
a programming toolset, as well as Studio.  As I've not installed Mono, I 
don't even talk to those who claim all they know is .Net.

> For most prospective engineers and general scientists a computer
> is more of a tool than a theoretical model.
> OO-languages, Turing machines, cellular automata,
> make nice class discussion topics,
> but can't replace the development of basic computer literacy and skills.
> 
> My $0.02.

Gus, good points, all.  Development and evolution of good skills is a 
key element, in my mind, in developing our students and young 
scientists, and in preparing them for their futures.

gerry


> Gerry Creager wrote:
>> OUR users are willing to pony up the funds to buy Matlab.  We're 
>> already running Octave but they claimed they didn't know how to use 
>> it.  Even after we showed them Matlab scripts that "just ran" on Octave.
>>
>> As for Fortran vs C, "real scientists program in Fortran.  Real Old 
>> Scientists program in Fortran-66.  Carbon-dated scientists can still 
>> recall IBM FORTRAN-G and -H."
>>
>> Actually, a number of our mathematicians use C for their codes, but 
>> don't seem to be doing much more than theoretical codes.  The guys 
>> who're writing/rewriting practical codes (weather models,
>> computational chemistry, reservoir simulations in solid earth) seem to 
>> stick to Fortran here.
>>
>> gerry
>>
>> Jeff Layton wrote:
>>> I hate to tangent (hijack?) this subject, but I'm curious about your 
>>> class poll. Did the people who were interested in Matlab consider 
>>> Octave?
>>>
>>> Thanks!
>>>
>>> Jeff
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Joe Landman <landman at scalableinformatics.com>
>>> *To:* Jeff Layton <laytonjb at att.net>
>>> *Cc:* Gerry Creager <gerry.creager at tamu.edu>; Beowulf Mailing List 
>>> <beowulf at beowulf.org>
>>> *Sent:* Saturday, December 27, 2008 11:11:20 AM
>>> *Subject:* Re: [Beowulf] Hadoop
>>>
>>> N.B. the recent MPI class we gave suggested that we need to re-tool it
>>> to focus more upon Fortran than C.  There was no interest in Java from
>>> the class I polled.  Some researchers want to use Matlab for their work,
>>> but most university computing facilities are loath to spend the money
>>> to get site licenses for Matlab.  Unfortunate, as Matlab is a very cool
>>> tool (first played with it in 1988 ...); it's just not fast.  The
>>> folks at Interactive Supercomputing might be able to help with this with
>>> their compiler.
>>>
>>
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit 
> http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
