Apps & Design

Alan Ward award at mypic.ad
Tue Jul 4 02:01:03 PDT 2000


The distinction I wanted to make was between two
kinds of situations.

The first is when you can organize the individual calculation
goals as an oriented graph with no cycles. As you are all
aware, this can be parallelized either running concurrent
processes through separate CPUs, or serially dependent
processes through a pipeline, or a mix thereof.

This means that, as Robert Brown pointed out, you can 
parallelize at (goal =book) level, and probably also at 
(goal = paragraph) level - though I would have some doubts 
on this latter for a badly written text! :-(

But what about each individual paragraph? Many paragraphs 
belong to the second kind of situation: when the graph 
has cycles. Maybe an example would help. Say we have three 
consecutive sentences in a paragraph:

A = "He's driving a car."
B = "It's always a Mercedes."
C = "John always buys them german."

Translate these into a language where pronouns carry no 
gender - such as Chinese. You just knocked out the "He" 
in A, "It" in B, and obtained something like:

A = "(he/she) drives car."
B = "Always is Mercedes."
C = "always John german buy them."

I think our Chinese readers will agree this is more or less
a litteral translation of the Chinese equivalent.

Now translate them back to a language with gender in both
singular and plural - such as Spanish.

To get the gender of "(he/she)" back into A , you need the results of C. 
To get "them" into C ("los" or "las"?), you need B. 
To get the gender of "Mercedes" in B ("un" or "una"?), you need to resolve
A. 
Circular deadlock.

Try the same exercise with Sheila and her Suzuki motocycle.

Any program that treats this problem has a granularity that
must encompass al least all three sentences together.

Hope this clarifies my position. If you think the example is
fanciful - just let me say I have translated worse texts,
some of them legal and notarized. 

PS: My current interest in parallel programs is their 
application to run genetic algorithms. 

Best regards,
Alan Ward

----------
De: Kragen Sitaker <kragen at pobox.com>
A: beowulf at beowulf.org
Asunto: Re: Apps & Design
Fecha: dilluns, 3 / juliol / 2000 21:08

Alan Ward writes, quoting Gregory Warnes:
> > Careful.  You are equating "the problem consists of independent pieces"
> > with parallelizable. 
> 
> No. I'm saying that parallelizable implies we have to be able to split it
> into independent pieces, but not the other way round.

If, by "independent", you mean that there is no serial dependency ---
where the outputs of one piece are the inputs of the other --- then you
are right.  It sounds like you mean something different, though.  Can
you explain what you mean?

> Since the translation
> problem cannot be split into "large" independent pieces, it follows that
> the Beowulf approach in not efficient in this case. 

>From looking at SYSTRAN's output, it appears that the translation of
one section of a document has little effect on the translation of other
sections.

However, as Gregory Warnes pointed out, translating multiple documents
is trivially parallel.

> Maybe at lower level (e.g. pattern searching, looking up words in a
table)
> the problem is parallelizable. But then we need access from all
processing
> nodes to all the data, and the bandwidh necessary goes up. This would be
> more for a multiprocessor machine; say 1024 CPUs sharing RAM through
> some sort of multidimensional lattice structure. Not off-the-shelf
> components.

1024 CPUs sharing RAM is a bad idea, unless possibly they're Cray MTA
CPUs.

(What kinds of parallel programs have you written, by the way?)

> > Stating that natural language processing is inherently non-parallel
> > ignores the fact that the only system known to correctly process
natural
> > language (the brain) is made up of a lot of independent processing
units!
> 
> But do these units work in parallel? My take is they work more as a mesh.
> Then again, I'm not a biologist and can't prove this impression.

In the sense that we use the term in parallel computing circles, they
work in parallel.  This simply means that more than one of them is
working at any given time.  Each individual neuron can do nothing
interesting on a time-scale of less than ten milliseconds or so.

-- 
<kragen at pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08.  Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>
The power didn't go out on 2000-01-01 either.  :)






More information about the Beowulf mailing list