[Beowulf] Java vs C++ for interfacing to parallel library

Jonathan Ennis-King Jonathan.Ennis-King at csiro.au
Sun Aug 20 23:20:36 PDT 2006

>> My specific question was whether anyone out there was running parallel
>> codes either written completely in Java, or with Java wrappering some
>> big numerical library for the hard part. Are there any additional issues
>> with parallel performance, or is this just a subcase of Java-C
>> interfacing in a scalar setting?

>> The other option is the Unix-like strategy suggested by rgb, where for
>> example the computational part is completely written in C, and then the
>> pre and post-processing which benefit from a GUI are written in some
>> other language (e.g. Java), or strung together from other unix tools and
>> wrapper languages.

> Or a library-based strategy.  If you take your core code and develop a
> plain old reusable library out of it with a fairly straightforward API
> (which takes a lot of programming discipline and a certain amount of
> practice, I think, but isn't particularly difficult) then you can USE
> the library in a variety of UIs.  You can write a simple
> vanilla-variable C interface to embed in java or perl.  You can write a
> tty/ascii UI for command line users.  You can write a Gtk/Gnome
> interface using glade and callbacks for native X/linux.  The UI code may
> or may not be portable (tty/ascii I/O code using standard posix and
> libc and libm calls is pretty much lowest common denominator and tends
> to compile and run "anywhere", including on Windows boxes with
> minimal tweaking; more complex UIs become successively harder to port,
> as a rule) but the library itself, if written in
> "boring" C with a very clear and simple API, should be able to support
> all the UIs and interactive languages in a straightforward manner.
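
As an illustration of the "vanilla-variable" C interface described above, here is a minimal sketch. Everything in it is an assumption made for the example: the corelib_ prefix, the status codes, and the choice of a tridiagonal solve (Thomas algorithm) as a stand-in for the real sparse matrix routines. The point is only that the API takes plain ints and double arrays and returns a status code, so a tty front end, a GUI callback, or a Java or perl wrapper could all call it the same way.

```c
#include <stdlib.h>

/* Hypothetical status codes for the illustrative library. */
enum {
    CORELIB_OK = 0,
    CORELIB_BAD_ARG = 1,
    CORELIB_NO_MEMORY = 2,
    CORELIB_SINGULAR = 3
};

/*
 * Solve a tridiagonal system with the Thomas algorithm.
 * All arrays have n entries; lower[0] and upper[n-1] are ignored.
 * Only plain ints and double arrays cross the API boundary.
 * No pivoting is done, so this is a sketch rather than a robust solver.
 */
int corelib_solve_tridiag(int n, const double *lower, const double *diag,
                          const double *upper, const double *rhs, double *x)
{
    if (n < 1 || !lower || !diag || !upper || !rhs || !x)
        return CORELIB_BAD_ARG;

    double *c = malloc((size_t)n * sizeof *c);  /* modified upper diagonal */
    if (!c)
        return CORELIB_NO_MEMORY;

    if (diag[0] == 0.0) {
        free(c);
        return CORELIB_SINGULAR;
    }
    c[0] = upper[0] / diag[0];
    x[0] = rhs[0] / diag[0];

    /* Forward elimination. */
    for (int i = 1; i < n; i++) {
        double denom = diag[i] - lower[i] * c[i - 1];
        if (denom == 0.0) {
            free(c);
            return CORELIB_SINGULAR;
        }
        c[i] = upper[i] / denom;
        x[i] = (rhs[i] - lower[i] * x[i - 1]) / denom;
    }

    /* Back substitution. */
    for (int i = n - 2; i >= 0; i--)
        x[i] -= c[i] * x[i + 1];

    free(c);
    return CORELIB_OK;
}
```

Because the function owns no global state and reports errors through its return value, each caller can translate the status code into whatever error mechanism its own environment uses.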

> That's actually what I'd recommend if you really want UI flexibility and
> code maintainability.  The latter is a very important consideration that
> hasn't been touched on yet.  If you actually WRITE your application
> integrated with a complex top level language/tool like perl, python, or
> java, then maintaining it becomes much more complicated (as I've learned
> very much the hard way, alas). If the basic sparse matrix routine
> library changes, it will likely lead to emergent bugs.  If your C/C++
> encapsulation of that routine changes or your application goals change
> over time along with your core code, it's another thing to debug.  If
> java (or any other UI base) changes, it's yet another thing to debug.
> If you get enough layers (and languages and data interfaces) in there,
> running down bugs and deciding "whose fault" they are gets to be really
> quite difficult.  Fault aside, just finding them and fixing them can
> become painful in the extreme.

> For that reason I think it is a really good idea to have as few distinct
> layers as possible to work with and to SEPARATE those so that they can
> be separately debugged with a clean layer (API) in between.  If your
> program is wrapped up in a library with a very simple C tty/ascii UI,
> you control it all and can be pretty certain that any errors you
> encounter are in YOUR code in ONE language and ONE data representation.
> In other words, you have a decent chance of efficiently debugging
> things.

The most primitive way to separate the layers is to have stand-alone
programs to do the various parts. This works easily in a cluster
environment, when I don't want to have interaction with the computation
when it's running. Then the interface is via the file formats (which is
a maintenance issue too), and programs that extract data from the
simulation output and put it in a form for 3D visualization.
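
One way to keep a file-format interface maintainable is to put a magic string and a version number at the top of every output file, so that a post-processing program can refuse a file it does not understand rather than misread it. The sketch below assumes a made-up ASCII format (the SIMOUT name, the header layout, and the simout_ functions are all hypothetical) and is not the format of any real simulator.

```c
#include <stdio.h>
#include <string.h>

#define SIMOUT_MAGIC   "SIMOUT"  /* hypothetical format tag */
#define SIMOUT_VERSION 1         /* bump when the layout changes */

/* Write a header line "SIMOUT <version> <n>" followed by n values,
   one per line. Returns 0 on success, 1 on any error. */
int simout_write(FILE *f, int n, const double *values)
{
    if (!f || n < 0 || (n > 0 && !values))
        return 1;
    if (fprintf(f, "%s %d %d\n", SIMOUT_MAGIC, SIMOUT_VERSION, n) < 0)
        return 1;
    for (int i = 0; i < n; i++) {
        /* 17 significant digits are enough to round-trip a double. */
        if (fprintf(f, "%.17g\n", values[i]) < 0)
            return 1;
    }
    return 0;
}

/* Read a file written by simout_write into values[0..max_n-1].
   Returns 0 on success, 1 on malformed input, 2 on wrong magic,
   3 on unsupported version, 4 if the file holds more than max_n values. */
int simout_read(FILE *f, int max_n, double *values, int *n_read)
{
    char magic[16];
    int version, n;

    if (!f || !values || !n_read)
        return 1;
    if (fscanf(f, "%15s %d %d", magic, &version, &n) != 3)
        return 1;
    if (strcmp(magic, SIMOUT_MAGIC) != 0)
        return 2;
    if (version != SIMOUT_VERSION)
        return 3;
    if (n < 0 || n > max_n)
        return 4;
    for (int i = 0; i < n; i++) {
        if (fscanf(f, "%lf", &values[i]) != 1)
            return 1;
    }
    *n_read = n;
    return 0;
}
```

The simulation would call simout_write at the end of a run, and the extraction or visualization programs would call simout_read, so a format change shows up as an explicit version error in the reader.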

At the other extreme is a parallel programming environment such as
Cactus (http://www.cactuscode.org), which can make use of PETSc, but
only supports Fortran, C and C++. This looks quite promising for PDE
solvers.   It also apparently allows "remote steering" of computations.
Only the name is a little incongruous to me --  in the Australian
vernacular "cactus" means roughly "dead, non-functional" (applied to
objects), or "in a lot of trouble" (applied to people).

The most likely way ahead is that I'll code up some simple test versions
(e.g. interfacing different languages to the solver libraries), and see
how they perform - bearing in mind the wisdom of the elders derived from
this thread.

> OTOH if the only way you observe the failure is by accessing your core
> routines through a big, complicated GUI written in an entirely different
> language with its own data representation and with all sorts of stuff
> happening at the callback level and with multiple layers of event loops
> or even multiple threads (GUI-based programs have the nasty habit of
> blocking when they are in their core work loops UNLESS they are written
> with multiple threads, and multiple threads of course are far more
> complex and enable far more subtle bugs to surface) then you're looking
> at a LOT more work to debug any problems.  I personally would rather
> tattoo the windows logo on my left bicep with a dull needle and food
> coloring than mess with it at all.

> Worse still, if you write a GUI-based program WITHOUT a relatively clean
> API between the UI part and the work part and have NO other UI or
> encapsulation to work on just the actual work routines, then god help
> you if you ever have to debug the code OR rework the GUI.  Even "simple"
> stuff like adding a new graphical display of some result can become
> nightmarish if the UI and operational code are all entangled together.

> So even if you ultimately want to integrate with java, with R, with
> perl, or write a native GUI for some platform or another, I'd strongly
> suggest writing your core code as a de facto library with its own
> #include files that define the shared interface and all externally
> visible data structures.  Develop this code with a simple ASCII front
> end -- basically a command line parser to input program parameters and
> perform any needed initialization, a work routine that takes the input
> data and calls the core library subroutine(s) that do the required work
> and produce a desired result (and does no significant work itself), and
> a minimal output layer that pulls the result out of the standard
> interface variables altered or returned by the library calls according
> to the program API and dumps it onto either stdout or into a
> command-line specified file where it can be verified for selected input
> test data.
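
The three-part ASCII front end described above (input parser, thin work routine, minimal output layer) might be sketched as follows. All names are hypothetical, and corelib_integrate is only a stand-in for the real core library: it integrates x*x over [0, 1] with the trapezoid rule so that the output can be checked by hand for selected inputs.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the core library routine (hypothetical): trapezoid-rule
   integral of x*x over [0, 1] using n intervals. Returns 0 on success. */
int corelib_integrate(int n, double *result)
{
    if (n < 1 || !result)
        return 1;
    double h = 1.0 / n;
    double sum = 0.5 * (0.0 + 1.0);  /* half of f(0) + f(1) */
    for (int i = 1; i < n; i++) {
        double x = i * h;
        sum += x * x;
    }
    *result = sum * h;
    return 0;
}

/* Input layer: parse "prog n" into an interval count. Returns 0 on success. */
int parse_args(int argc, char **argv, int *n)
{
    if (argc != 2)
        return 1;
    char *end;
    long v = strtol(argv[1], &end, 10);
    if (end == argv[1] || *end != '\0' || v < 1 || v > 1000000)
        return 1;
    *n = (int)v;
    return 0;
}

/* Output layer: dump the result in a form that is easy to diff. */
void write_result(FILE *out, int n, double result)
{
    fprintf(out, "n=%d integral=%.6f\n", n, result);
}

/* Work routine: glue only, no numerical work of its own.
   Returns 0 on success, 2 on a usage error, 1 on a library error. */
int frontend_run(int argc, char **argv, FILE *out)
{
    int n;
    double result;

    if (parse_args(argc, argv, &n) != 0) {
        fprintf(stderr, "usage: %s n\n", argc > 0 ? argv[0] : "prog");
        return 2;
    }
    if (corelib_integrate(n, &result) != 0)
        return 1;
    write_result(out, n, result);
    return 0;
}
```

In a real program main() would do nothing except return frontend_run(argc, argv, stdout). With n=2 the trapezoid rule gives 0.375, so running the front end with the argument 2 should print "n=2 integral=0.375000", which is the kind of fixed test case that can be re-checked after every change to the library.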
> With this minimal encapsulation and debugging system, you can then do
> whatever you like with the core work routines, quickly and easily.  For
> example it is absolutely straightforward to replace the command line
> interface with a glade-constructed set of GUI input widgets, the work
> interface with a callback on a "run" button, and to add whatever kind of
> output interface you like (graphical or otherwise).  Or you can fancy up
> your command line interface and wrap it up in python or perl.  Or you
> can modify the command line interface so that it is suitable for turning
> the library calls into java or perl subroutine calls and obtaining the
> input from java variables and delivering the output back to java
> variables.  If you always take care to maintain your minimal C/tty UI
> along with the library, you can easily isolate any problems that emerge
> to JUST one layer in the initialization, execution, postprocessing,
> presentation sequence, in particular keeping the execution part isolated
> from the rest (that are more likely to be tied to some particular UI
> environment with quirks, an API and data representation, and even a
> language of its own).
>    rgb
--
  Jonathan Ennis-King
  email: Jonathan.Ennis-King at csiro.au
  post: CSIRO Petroleum, Private Bag 10, Clayton South, Victoria, 3169,
  ph: +61-3-9545 8355 fax: +61-3-9545 8380
