[Beowulf] Hadoop
Gerry Creager
gerry.creager at tamu.edu
Fri Jan 2 04:30:16 PST 2009
Gus Correa wrote:
> Hello Beowulfers
>
> (This thread should be renamed "Matlab and Octave".)
>
> Matlab is the "lingua franca" for computing among students and young
> scientists,
> at least in Earth Sciences (solid earth, atmosphere/oceans/climate,
> geochemistry, etc), as I observe it here.
> A number of our students come from Physics, Chemistry, Biology, etc,
> hence the trend is probably more widespread.
> Some can get by graduate school even with Excel only.
Hrmph. I know of one "numerical analysis" class taught in Excel. I've
more confidence in Matlab for that, but that's an aside.
Do me a favor and say "Hey!" to Rob Arko, please.
> As others observed on this thread, Matlab is a great prototyping tool,
> which makes it very attractive.
> Integrated environment, with GUI, editor, online help,
> programming examples and tips, and instant visualization of results,
> is yet another high point of Matlab.
> For most people this type of environment is not only convenient,
> but also addictive.
> I like Octave; its command line is virtually identical to Matlab's,
> but I couldn't get all these GUI-ish bells and whistles to work in Octave.
> Because of this dependence, our Observatory has a Matlab site license.
I've long suspected that the GUI is the opiate of choice in these cases.
Especially when one thinks in terms of the reluctant... or
untrained... programmer.
> Moreover, several top numerical models in oceans and climate depend
> heavily on Matlab scripts for post-processing and data analysis.
There's also lots of work going on in ATMO, including radar analysis.
However, it's my opinion, having been involved with the products of a
radar class that was taught here (the prof left for other pursuits) that
he did the kids a disservice by making things easier with Matlab: They
knew which scripts and building blocks to use, but had little concept of
the signal analysis they'd invoked, nor of the underlying
science/engineering of their code or of a Doppler radar system.
> This may be the case in other areas too.
> For instance, not long ago I saw several job ads for Matlab programmers in
> the Princeton Plasma Physics Lab.
> As Matlab scripts and tasks got bigger and bigger, the positive feedback
> created the need and market for parallel versions of Matlab.
>
> In many cases Matlab is the only programming environment that
> science and engineering students have come across.
> It is introduced in Linear Algebra, Numerical Analysis, Signal Processing,
> and other classes, and it sticks, it settles down.
> As James, Gerry and others observed,
> a lot of people only need to do prototyping anyway: proof of concept,
> one-time calculations of modest size,
> and for this Matlab works very well.
>
> Matlab's cavalier approach to memory management -
> or perhaps the inadvertent cavalier approach to Matlab by naive users -
> may be the main cause for the scaling problems.
> Most failures I've seen in Matlab scripts come from exhaustion of
> computer resources, particularly memory.
> Even when you free memory judiciously, problems may arise.
> This happens here very often with people trying to do, say,
> singular value decomposition or principal component analysis
> on huge and dense matrices / datasets, etc.
>
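A rough sketch of the arithmetic behind that memory point, in Python. It only counts the storage for a dense matrix plus its SVD factors; the function names and the example sizes are illustrative assumptions, not figures from this thread, and any solver workspace comes on top of this.

```python
def dense_svd_bytes(rows, cols, itemsize=8, full_matrices=True):
    """Lower bound on bytes needed to hold A together with U, s and Vt.

    itemsize=8 assumes double precision. Solver workspace and any
    temporary copies are NOT counted, so real usage will be higher.
    """
    k = min(rows, cols)
    a = rows * cols                    # the input matrix itself
    if full_matrices:
        u, vt = rows * rows, cols * cols
    else:                              # "economy-size" factors
        u, vt = rows * k, k * cols
    return (a + u + k + vt) * itemsize


def gib(nbytes):
    """Convert bytes to GiB for readability."""
    return nbytes / 2**30


# Hypothetical dataset: 100,000 observations of 5,000 variables.
rows, cols = 100_000, 5_000
print(f"matrix alone : {gib(rows * cols * 8):.1f} GiB")
print(f"full SVD     : {gib(dense_svd_bytes(rows, cols)):.1f} GiB")
print(f"economy SVD  : {gib(dense_svd_bytes(rows, cols, full_matrices=False)):.1f} GiB")
```

For the hypothetical case above the matrix alone is 4e9 bytes, while the full-size U factor is 8e10 bytes by itself. Asking for the economy-size factors keeps the total near twice the size of the input, which is usually all a principal component analysis needs.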
> In the old days of punched cards, Fortran was part of the
> engineering and scientific training.
> Fortran was king in Intro to Computers classes or similar.
> That is no longer true.
> Fortran lost its charm and status among computer scientists
> (even John Backus abandoned it).
> In addition, today most college scientific curricula take for granted
> the computer literacy of their freshman students.
> A mistake, I think.
> (A few students are great hackers, but most only know Skype,
> Facebook, MS Word.)
I'll second this, as well. Fortran, as an intro language, has fallen
from favor, often because programs DO believe that kids, today, are
competent with computers. Since I've yet to see a kid program a
velocity decoding application in iTunes, or compute a Fibonacci series,
I suspect their beliefs misplaced. My students, unless they can prove
via both transcript and actual code writing a previous exposure and
degree of competency, have to take two semesters of programming.
Depending on what they're working on, I recommend C, Fortran (we still
have a Computer Science intro course therein) or (shudder) Java.
Despite having Stroustrup on faculty (and actively teaching) I rarely
recommend C++ to 'em as I just want them competent, not esoteric.
I might add that any working with me in my lab, either on their degrees
or as research assistants, get a healthy dose of Linux and BSD, and
appropriate open source tools.
> I think Intro to Computers courses would continue to be useful for
> engineers and science majors.
> (Not for prospective computer scientists, of course, who need much more
> than that.)
> These courses should include basic Unix/Linux literacy, shell scripting
> (or Perl, or Python), the old-fashioned but effective principles of
> "structured programming"
> (call it "modular programming" to make it palatable),
> and the rudiments of a language of choice.
> This language may be Fortran, which continues to be the dominant one
> in science and engineering code, or perhaps C.
Strongly agree.
> However, when these Intro to Computers courses exist,
> they try to teach Java, C++, etc, often using Microsoft Studio,
> or another programming environment that traps the user,
> and doesn't give him/her the required computer craftsmanship (and autonomy)
> for their professional life.
All too true. When I interview research assistant (grad student labor)
candidates, I get tons who claim proficiency in MS Office, as if that's
a programming toolset, as well as Studio. As I've not installed Mono, I
don't even talk to those who claim all they know is .Net.
> For most prospective engineers and general scientists a computer
> is more of a tool than a theoretical model.
> OO-languages, Turing machines, cellular automata,
> make nice class discussion topics,
> but can't replace the development of basic computer literacy and skills.
>
> My $0.02.
Gus, good points, all. Development and evolution of good skills is a
key element, in my mind, in developing our students and young
scientists, and in preparing them for their futures.
gerry
> Gerry Creager wrote:
>> OUR users are willing to pony up the funds to buy Matlab. We're
>> already running Octave but they claimed they didn't know how to use
>> it. Even after we showed them Matlab scripts that "just ran" on Octave.
>>
>> As for Fortran vs C, "real scientists program in Fortran. Real Old
>> Scientists program in Fortran-66. Carbon-dated scientists can still
>> recall IBM FORTRAN-G and -H."
>>
>> Actually, a number of our mathematicians use C for their codes, but
>> don't seem to be doing much more than theoretical codes. The guys
>> who're writing/rewriting practical codes (weather models,
>> computational chemistry, reservoir simulations in solid earth) seem to
>> stick to Fortran here.
>>
>> gerry
>>
>> Jeff Layton wrote:
>>> I hate to tangent (hijack?) this subject, but I'm curious about your
>>> class poll. Did the people who were interested in Matlab consider
>>> Octave?
>>>
>>> Thanks!
>>>
>>> Jeff
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Joe Landman <landman at scalableinformatics.com>
>>> *To:* Jeff Layton <laytonjb at att.net>
>>> *Cc:* Gerry Creager <gerry.creager at tamu.edu>; Beowulf Mailing List
>>> <beowulf at beowulf.org>
>>> *Sent:* Saturday, December 27, 2008 11:11:20 AM
>>> *Subject:* Re: [Beowulf] Hadoop
>>>
>>> N.B. the recent MPI class we gave suggested that we need to re-tool it
>>> to focus more upon Fortran than C. There was no interest in Java from
>>> the class I polled. Some researchers want to use Matlab for their work,
>>> but most university computing facilities are loath to spend the money
>>> to get site licenses for Matlab. Unfortunate, as Matlab is a very cool
>>> tool (been playing with it first in 1988 ...); it's just not fast. The
>>> folks at Interactive Supercomputing might be able to help with this with
>>> their compiler.
>>>
>>
>
--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843