BERT 77: Automatic Parallelizer

Douglas Eadline deadline at
Wed Mar 28 05:26:45 PST 2001

On Thu, 22 Mar 2001, Ole W. Saastad wrote:

> Hi,
> I have some questions about the Bert 77 Automatic Parallelizer
> from Paralogic. (
> At first sight it looks nice and shiny, but how is the user
> experience?

If you need to do more than the demo version allows, let me know.

#define software_conversion_soap_box

In general FORTRAN conversion (any conversion) is a tricky thing.

First, as far as I know, there is no "silver bullet" tool
that will take any standard F77 code and produce a nice scalable
application for a parallel computer.  As we have found, almost
every code can be better parallelized by user intervention.
In this regard, BERT provides a PROCESS for doing a parallelization.

Second, it is important to understand the difference between
concurrent and parallel. Concurrent parts of your program are parts
that can run independently; parallel parts are concurrent parts
that actually run on different processors.

Presumably you want your parallel parts to run faster than
they would sequentially. Concurrency does not imply parallel
execution. Whether executing concurrent parts of a code in parallel
pays off is a function of the machine (i.e. it is an efficiency
issue). With respect to efficiency, BERT does a good job.
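
To make the concurrent/parallel distinction concrete, here is a small
sketch in Python (not Fortran, and not BERT output). The function
names are made up for illustration. The first loop has independent
iterations, so its blocks are concurrent and could be handed to
different processors; the second loop carries a dependence from one
iteration to the next, so it is not concurrent as written.

```python
def block_ranges(n, workers):
    """Split range(n) into `workers` contiguous blocks of near-equal size."""
    base, extra = divmod(n, workers)
    ranges = []
    start = 0
    for w in range(workers):
        # The first `extra` blocks get one additional element.
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def scale_sequential(a, x):
    """Reference version: one loop over all elements."""
    return [a * xi for xi in x]


def scale_by_blocks(a, x, workers):
    """Same result, computed block by block.

    Each block reads and writes only its own index range, so the
    blocks are concurrent. Here they still run one after another;
    running them on separate processors is what would make them
    parallel.
    """
    out = [0.0] * len(x)
    for lo, hi in block_ranges(len(x), workers):
        for i in range(lo, hi):
            out[i] = a * x[i]
    return out


def running_total(x):
    """Loop-carried dependence: iteration i needs the result of i-1,
    so the iterations are not independent as written."""
    out = []
    total = 0
    for xi in x:
        total += xi
        out.append(total)
    return out
```

Whether splitting the first loop actually buys anything depends on
the machine: the block results have to be sent somewhere, and that
cost is not visible in the code above.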


> Many of my colleagues run climate models with OpenMP on fast
> sequential machines and would consider MPI based clusters if
> they could get some help to make the transition.
> This tool might be useful as a start to switch from OpenMP
> or sequential code to MPI based code. The task of converting
> programs from sequential to parallel is not a trivial one and
> I am very interested in how well Paralogic's program performs.

We don't support OpenMP directives, but existing OpenMP directives
can aid in the placement of BERT directives.

> A test would be to crunch the g98 Fortran source code through
> and see if it gave any reasonable results.

We have done some large codes - in excess of 100K lines.
The first thing to remember is that the global analysis required
for big codes takes some time, so some time is needed to
work with these codes (i.e. it is not a simple point-and-click
afternoon experience).

> Has any of you done tests with programs this size?
> The examples show tremendous speedup by splitting loops over
> many nodes, but in real life things are different.

Of course we give some examples that show good speedup.
But, what is most interesting, some of the examples show very
poor results on some of the example machine profiles.

However, the thing to remember is that the true achievable speed-up
for your application is based on the algorithm (no tool rewrites
algorithms), the properties of the machine (i.e. CPU, interconnect,
compiler, message passing API, etc.) and how efficiently you can map
the concurrent parts of your algorithm to a specific machine.
BERT attempts to help the user do this.
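
As a rough illustration of how algorithm and machine interact, here
is a back-of-the-envelope model in Python. It is only a sketch and is
not how BERT computes its estimates: it assumes a fixed serial part,
a perfectly divisible concurrent part, and a communication cost that
grows linearly with the number of processors. The function name, the
parameters, and the linear cost term are all assumptions made for the
example.

```python
def estimated_speedup(serial_s, concurrent_s, procs, comm_per_proc_s=0.0):
    """Estimate speedup on `procs` processors.

    serial_s         -- seconds of work that cannot be split
    concurrent_s     -- seconds of work that splits evenly across procs
    comm_per_proc_s  -- assumed communication cost added for each
                        processor beyond the first (machine dependent)
    """
    if procs < 1:
        raise ValueError("procs must be at least 1")
    time_one = serial_s + concurrent_s
    time_many = (serial_s
                 + concurrent_s / procs
                 + comm_per_proc_s * (procs - 1))
    return time_one / time_many


if __name__ == "__main__":
    # Same code, two hypothetical machine profiles.
    for comm in (0.0, 2.0):
        for procs in (1, 2, 4, 8):
            s = estimated_speedup(1.0, 9.0, procs, comm)
            print(f"comm={comm:3.1f}s procs={procs} speedup={s:.2f}")
```

With no communication cost the model reduces to Amdahl's law, so the
serial part alone caps the speedup. With a large enough communication
cost the estimate drops below 1, meaning the "parallel" run is slower
than the sequential one on that machine.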

It should also be noted that BERT can provide "negative"
results (i.e. it is very hard to parallelize this application
and speedup will be minimal). Although this is not good news, it is
an important data point about your application. There
have been people who have spent months converting a code
only to learn that it will not work well on a specific
parallel computer.


Paralogic, Inc.           |     PEAK     |      Voice:+610.814.2800
130 Webster Street        |   PARALLEL   |        Fax:+610.814.5844
Bethlehem, PA 18015 USA   |  PERFORMANCE |
