[Beowulf] Java vs C++ for interfacing to parallel library

Sun Aug 20 13:08:35 PDT 2006

Robert G. Brown wrote:

[...]

> Java, octave, matlab, python, perl etc. are MUCH WORSE in this regard.
> All require NONTRIVIAL encapsulation of the library into the interactive
> environment.  I have never done an actual encapsulation into any of

Can't speak to Octave/Java/Matlab.  Python and Perl make this relatively
easy.  In Perl you have the Inline:: modules.  If you have installed
Inline::C, this example

	#!/usr/bin/perl
	use Inline C;
	greet('Joe');

	__END__
	__C__
	void greet(char* name) {
	  printf("Hello %s!\n", name);
	}

does this

	[landman at balto:~]
	105 >./inline.pl
	Hello Joe!

Obviously this is a trivial example, but if you create a reasonable set
of APIs that you can express as we have indicated, even passing function
prototypes in using a header file, plus a little config stuff at the
front end to give paths to libraries, this is not generally very hard.

It only takes a bit more work when you have some ... odd ... structures
or objects passing back and forth.

Python has similar facilities.  Generally speaking the dynamic
languages (Perl, Python, Ruby) are pretty easy to wrap around things
and link with other stuff, as long as the API/data structures are pretty
clean.
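
As a rough Python counterpart to the Inline::C example above, here is a
minimal sketch using the standard ctypes module.  It assumes a POSIX
system where the C runtime can be loaded this way, and it assumes div_t
is laid out as (quot, rem), which the C standard does not guarantee.
The helper names c_strlen, c_div and DivResult are made up for the
illustration.

```python
import ctypes
import ctypes.util

# Load the C runtime. find_library() may return None on some systems; on
# POSIX, CDLL(None) then exposes the symbols already linked into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Clean API: declare the prototype, much as a header file would.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the byte length of text as measured by C's strlen()."""
    return libc.strlen(text.encode("utf-8"))


# Slightly "odd" case: div() hands back a struct by value, so the layout
# of div_t has to be spelled out on the Python side.
# Assumption: the members appear in the order quot, rem (true for glibc).
class DivResult(ctypes.Structure):
    _fields_ = [("quot", ctypes.c_int), ("rem", ctypes.c_int)]


libc.div.argtypes = [ctypes.c_int, ctypes.c_int]
libc.div.restype = DivResult


def c_div(numerator, denominator):
    """Call C's div() and unpack the returned struct into a tuple."""
    result = libc.div(numerator, denominator)
    return result.quot, result.rem


print(c_strlen("Joe"))
print(c_div(7, 2))
```

The plain function call needs only a prototype.  The struct return is
the "bit more work" case: the caller has to describe the memory layout
by hand.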

> them, but I'll wager that it is really quite difficult because each of
> them has their very own internal data types that are REALLY opaque
> objects that bear little overt resemblance to the simple "all data
> objects can be viewed as a projection onto a block of memory with either
> typed or pointer driven offset arithmetic" view of data in C or for that
> matter C++ or Fortran (with slightly different projective views in both
> cases).

Hmmm... methinks you are thinking of strongly typed languages.  In
non-strongly-typed languages, internal data types are not usually opaque
unless they are objects with well formed classes/accessors behind them.
Even then, the data stores tend to be quite flexible.  In Perl (as an
example) you have several types: SVs (scalar values), AVs (array
values), and HVs (hash values), as well as pointers to the same.
Notice that I didn't talk about ints, floats, etc.  Python has a
similar view, though its data types include "lower level" types (ints,
floats, ...).  In Ruby, everything is an object.
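
To make the "block of memory" point concrete on the Python side, here is
a small sketch using only the standard array and struct modules.  It is
one possible illustration, not a statement about how any particular
extension does it: the same three numbers are held as contiguous C
doubles, dumped as raw bytes, and read back the way a C routine would
see them.

```python
import array
import struct

# Three Python floats stored as contiguous native C doubles.
values = array.array("d", [1.0, 2.0, 3.0])

# The same storage viewed as a flat block of bytes.
raw = values.tobytes()

# Reinterpret that block as three C doubles, the way a C routine would.
# "3d" uses native byte order and size, matching array's "d" typecode.
decoded = struct.unpack("3d", raw)

# buffer_info() reports the address and element count a C call would receive.
address, count = values.buffer_info()

print(len(raw), decoded, count)
```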

> These languages typically permit you to allocate memory by just using a
> named variable.  This is marvelously convenient for an interactive
> environment -- it is marvelously expensive in terms of program
> efficiency because the underlying environment has to manage allocating
> the memory transparently extensibly (most of the languages permit you to
> allocate whole vectors or matrices of variables by just referencing
> them), tracking instances of the memory in code, and freeing the memory
> when it is no longer referenced or being used.  Conservatively, so that

Yes.  Absolutely.  It is easy, but easy things carry a price.

> they tend to keep things if there is ANY CHANCE of their ever being
> referenced, making them typically memory hogs almost as bad as a C
> program would be if every memory reference in the program was to static
> global memory -- no memory allocation or freeing at all, beyond whatever
> goes on stack/heap in the course of subroutine calls or internal
> function execution.  Complicated hashes or advanced list structures are

Well... it doesn't keep everything.  Once a variable's reference count
drops to zero, it can be safely reclaimed.  That is true of the
reference-counted dynamic languages (Perl, Python).  References that
never drop to zero are the source of massive amounts of memory leakage
in long-running programs.  Closures are great.  Just don't allocate
memory in them.
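
A minimal sketch of the closure issue, in Python and assuming CPython's
reference counting (other implementations may reclaim later).  The names
Buffer and make_reader are invented for the example.  The buffer stays
alive for as long as the closure that captured it does, and is reclaimed
once the closure is dropped.

```python
import gc
import weakref


class Buffer:
    """Stand-in for a large allocation."""

    def __init__(self, size):
        self.data = bytearray(size)


def make_reader():
    buf = Buffer(1024 * 1024)
    watcher = weakref.ref(buf)  # lets us observe buf without keeping it alive

    def reader():
        # The closure captures buf, so buf lives as long as reader does.
        return len(buf.data)

    return reader, watcher


reader, watcher = make_reader()
held_by_closure = watcher() is not None

del reader    # last reference to the closure goes away
gc.collect()  # not needed under CPython refcounting; harmless elsewhere
reclaimed = watcher() is None

print(held_by_closure, reclaimed)
```

A long-running program that keeps such closures around in a list or a
callback table keeps every captured buffer as well.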

> used to keep the execution itself moderately efficient (but highly
> INefficient compared to a decent compiler with flat memory outlays).

Yup.  The price you pay for easy development is inefficient execution.
Dynamic languages are quite easy to develop in (generally) for their
intended usage.  I would not want to write a large scale CG solver in
Perl, a large scale nonlinear function solver in Python, or anything
remotely computationally intensive in Ruby.  That said, these languages
make controlling a run using code written in other languages quite easy.

My first experience with Perl (my gosh, more than a decade ago) was
wrapping a long calculation that we were doing to extract total energies
by autogenerating input files and preparing data sets.  Took me a few
hours to write the code, started it running, and two weeks later, we had
our results.  The Perl code did not do the calculations; that was the
Fortran code.  The Perl code drove the Fortran, extracted the relevant
information and wrote it to a file, and prepared the next input.  I
would hate to think how long the dynamics would have taken had it been
written in anything other than Fortran/C.
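
The driver pattern described above might look roughly like this in
Python.  This is only a sketch: FAKE_SOLVER is a hypothetical stand-in
(a one-line Python child process) for the compiled Fortran binary, and
the "ENERGY =" output format is invented for the illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for the compiled solver: reads one number on stdin
# and prints "ENERGY = <that number squared>". A real run would invoke the
# Fortran executable here instead.
FAKE_SOLVER = [
    sys.executable,
    "-c",
    "x = float(input()); print('ENERGY =', x * x)",
]


def run_step(parameter):
    """Feed one input to the solver and extract the energy from its output."""
    result = subprocess.run(
        FAKE_SOLVER,
        input=f"{parameter}\n",
        capture_output=True,
        text=True,
        check=True,
    )
    for line in result.stdout.splitlines():
        if line.startswith("ENERGY"):
            return float(line.split("=")[1])
    raise ValueError("no ENERGY line in solver output")


def drive(parameters):
    """Run the solver once per parameter and collect (input, energy) pairs."""
    return [(p, run_step(p)) for p in parameters]


energies = drive([1.0, 2.0, 3.0])
print(energies)
```

The script only generates input, launches the solver, and parses the
output.  All the arithmetic stays in the external program.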

> The point being that you have to interface these opaque and not
> obviously documented data types to the C library calls.  This is surely
> possible -- it is how all those perl libraries, matlab toolboxes, java
> interfaces come about.  It will probably require that you learn WAY more
> about how the language itself is implemented at the source level than
> you are likely to want to know, and it is probably not going to be
> terribly easy...

Hmmm.  I did speak to Perl and Python above.  Not sure how to do it with
Octave, but the Matlab folks have some good connectivity with external
libraries.  I don't know if it is easy to extend.  Java likes to talk to
Java.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615