Neural Network applications using Beowulf

Eray Ozkural eozk at bicom-inc.com
Wed Nov 20 01:49:59 PST 2002


On Wednesday 20 November 2002 04:42 pm, Robert G. Brown wrote:
>
> Hmm, I should probably take this offline, but I'll risk one more round
> of replies on list since the topic does come up from time to time and is
> not, I think, devoid of interest:-)
>
> Let me itemize a few points:
>
>   a) NN's (both MLFFBP NNs and their more exotic cousins) are extremely
> useful in lots of problems.  As generalized multivariate nonlinear
> function approximators, they can do things that are remarkably difficult
> to do in other bases, such as infer/fit high dimensional (probability)
> distribution functions from noisy data where the distribution(s) have
> highly nontrivial high dimensional correlations.  In fact, for some of
> these problems they are one of the only games in town -- one can
> sometimes do better with specific problems if one has a Statistics
> Ph.D., armed with advanced tools, study the problem for a year and try
> to solve it, but NNs tend to produce comparable results with a lot less
> expertise because they don't care about things like covariance in the
> inputs.
>

True, NNs are useful. But I think they look more appealing to people
outside computer science, and a good reason is that, to them, an ANN
seems more powerful than the traditional statistical algorithms that are
their standard tools. In machine learning, however, there are several
non-linear, high-dimensional learning algorithms for classification,
clustering and regression (and those are not the only kinds of problems
we have! see for instance "outlier detection", "deviation detection",
"summarization"). ANN is just one of them, and it happens to solve only
a subset of those problems well enough. In many knowledge discovery
applications an ANN, or any other machine learning/data mining
algorithm, forms only one piece of the puzzle, and there really is a
myriad of algorithms you could use to realize your learning model. BTW,
almost all major problems do require something more than traditional
statistics, so the appeal makes some sense. If you would like more
information on the scope of practical applications I can give you a few
references. One project I routinely mention is SKICAT from NASA/JPL.

>   b) NN's can be worth a lot of money.  Many of the problems they can be
> applied to in pattern recognition/predictive modelling alone can yield
> really significant marginal gains over e.g. outer-product logistic
> regression (totally abused as the universal hammer for multivariate
> predictive models by corporations and health researchers alike -- turn
> this dial, sweep up this probability in a smooth S... :-p
>

See above.

>   c) There are two distinct components of the applications of NN's to
> real world problems.  Both can be parallelized, but quite differently.
>
>     i) Application of an existing trained net to a large dataset for
> e.g.  classification.  This can be parallelized trivially by
> distributing the dataset (or partitions thereof) and the net in
> question, although NN application is typically so fast that it is hardly
> worth it.  More likely useful in a HA situation, where a web site was
> attempting real-time classification of lots of incoming threads of data
> on a farm of classifier nodes, or in a simple distributed resource
> situation, where the network is installed on all the desktop systems to
> be applied by hand in parallel by worker bees seeking to classify
> particular datasets of interest to them.
>

Application, at least today, is not a good fit for HPC; I think I
already implied that in my previous post. It does parallelize trivially,
as you say; a minimal sketch follows.
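
To make that triviality concrete, here is a minimal Python sketch of
case (i): applying a fixed, already-trained net to partitions of a
dataset in parallel. The weights below are random stand-ins for a real
trained network (an assumption made purely to keep the sketch
self-contained); the whole thing is just an embarrassingly parallel map.

import numpy as np
from multiprocessing import Pool

# Hypothetical stand-ins for the weights of an already-trained 2-4-1
# MLP; a real application would load a trained network from disk.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def apply_net(chunk):
    # Forward pass only: tanh hidden layer, sigmoid output.
    h = np.tanh(chunk @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

if __name__ == "__main__":
    data = rng.normal(size=(100000, 2))    # the dataset to classify
    with Pool(4) as pool:                  # one worker per "node"
        parts = pool.map(apply_net, np.array_split(data, 4))
    print(np.vstack(parts).shape)          # (100000, 1)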

>    ii) Creation of a trained network from data.  This is where things
> get interesting, as networks are complex systems (in the literal, Santa
> Fe institute sense) and hence highly nontrivial to create with high
> nontriviality that scales amazingly poorly with input dimension.
>

There are lots of parallel learning algorithms, since training is the
computationally expensive part of the process. In the case of ANNs,
though, the underlying optimization problem is intractable (training
even a three-node network to optimality is NP-complete, by Blum and
Rivest's result), which makes them a poor choice. The part that does
parallelize easily, a single gradient step, is sketched below.
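
The standard way to parallelize a gradient step is to split each batch
across workers and average the partial gradients. A toy Python sketch,
with a linear model standing in for the network purely for brevity (the
split-and-average structure carries over unchanged to backpropagation):

import numpy as np
from multiprocessing import Pool

# Synthetic regression data; the model and the data are stand-ins.
rng = np.random.default_rng(1)
X = rng.normal(size=(8000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + rng.normal(scale=0.1, size=8000)

def partial_grad(args):
    # Gradient of mean squared error on one partition of the batch.
    Xc, yc, w = args
    return 2.0 * Xc.T @ (Xc @ w - yc) / len(yc)

if __name__ == "__main__":
    chunks = list(zip(np.array_split(X, 4), np.array_split(y, 4)))
    w = np.zeros(5)
    with Pool(4) as pool:
        for step in range(200):
            grads = pool.map(partial_grad,
                             [(Xc, yc, w) for Xc, yc in chunks])
            w = w - 0.1 * np.mean(grads, axis=0)  # average the parts
    print("distance from true weights:", np.linalg.norm(w - true_w))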

>   d) To my direct experience, the ONLY way to get good results in
> problems with high input dimensionality is to eschew anything like
> "simple" back propagation/regression or gradient descent for network
> creation and insert a preliminary genetic algorithm optimization step.
> GA's are LIKEWISE complex systems (which routinely and easily become
> trapped in "inbred" locally stable optima far from -- and not in the
> same valley as -- the real optima).
>

There are several papers in the literature on hybrid systems, though it
is not as if GAs and NNs are siblings of any sort. They have nothing to
do with one another, although the "connectionist camp" does not always
present that fact accurately. Some of the literature carries a subtle
and unwarranted implication that you have to use GAs and NNs together,
and nothing else.

I can use a GA to evolve the topology, initial weights, or other
parameters of an ANN, or feed the output of an ANN into a GA, but there
are several other ways to build hybrid systems. For example, I can use
an ANN as a heuristic repair device in a combinatorial optimization
algorithm, or use a GA to improve the output of any learning algorithm,
and then use a fuzzy algorithm to reduce the result to a sensible rule
set. The range of possibilities is practically unlimited. (A minimal
sketch of the first combination, a GA evolving ANN weights, follows.)
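
Here is that sketch in Python. The 2-3-1 topology, the XOR target, and
every GA parameter (truncation selection, one-point crossover, Gaussian
mutation) are arbitrary choices for illustration, not a recommended
recipe.

import numpy as np

rng = np.random.default_rng(2)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)            # XOR as a toy target

def fitness(g):
    # Decode a 13-gene genome into a 2-3-1 net; higher is better.
    W1, b1 = g[:6].reshape(2, 3), g[6:9]
    W2, b2 = g[9:12].reshape(3, 1), g[12:]
    out = np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
    return -np.mean((out - y) ** 2)

pop = rng.normal(size=(50, 13))
for gen in range(300):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[::-1][:25]]   # truncation selection
    children = []
    for _ in range(25):
        a, b = parents[rng.integers(25, size=2)]
        cut = rng.integers(1, 13)
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        children.append(child + rng.normal(scale=0.1, size=13))  # mutate
    pop = np.vstack([parents, children])
print("best fitness:", max(fitness(g) for g in pop))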

What a machine learning researcher has in mind, though, is finding the
_best_ way to do it. Maybe C4.5 will give much better classification
than a GA/NN/fuzzy hybrid system; it all depends on the problem. No free
lunch.

> FWIW, I have read some things that at least suggest that at least
> trivial GA's (the natural selection part) are used biologically in the
> development of real wetware -- used/successful pathways tend to be
> selected for and reinforced, unused pathways are eliminated or recycled,
> although there may not be crossover and so forth.
>

No crossover means no GA; crossover happens to be one of the key ideas
behind GAs.

> So when I talk about NN's being interesting, I mean specifically that
> training/creating NNs for problems with high nontrivial input
> dimensionality, where "high" means literally "as high as we can
> currently afford to compute at all" (an algorithm dependent,
> implementation dependent, Moore's Law dependent limit), fronted by a GA
> that is ITSELF generally the rate limiting step (although conjugate
> gradient "finishing" of the best network found with the GA can take a
> VERY long time as the valleys one is descending tend to be shallow,
> tortuous in many dimensions, and long) is an interesting problem where
> clever parallelization could conceivably lead to real advances in both
> the largest problems we can tackle at all AND in the time required to
> solve problems that aren't this big but where an answer is valuable in
> as little time as possible.
>

As I said, I know of no such problem, but it would be great to find one
that could indeed benefit from a parallel ANN training algorithm.

> Parallelizing the GA is one area where I think really interesting things
> can be done.  The (general/simple) GA suffers from a number of known
> problems -- the sorting process is limited to "gene" values available in
> the initial pool plus whatever you can introduce by mutation and local
> gradient methodologies without destabilizing the algorithm altogether;
> with small populations it converges rapidly to an inbred population that
> is pretty much the best it can do, subject to aforementioned mutational
> or gradient descent drift, with large populations it takes a long time
> to converge at ALL; plus a slew of technical details associated with the
> optimal ordering of genes in a linear vector, with various crossover
> schema, even with the optimal number of sexes.
>

Well, lots of people have already parallelized GAs. Since "GA" is just
an idea and not a specific algorithm, it is hard to talk about parallel
GAs in a general way; the parallel algorithm depends heavily on your
gene design and fitness criteria. (One common scheme, the island model,
is sketched below.)
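
A minimal Python sketch of the island model: each island evolves
independently (the embarrassingly parallel part) and periodically
migrates its best individual to a neighbour. The fitness function here
is a trivial stand-in for whatever the genes would actually encode.

import numpy as np
from multiprocessing import Pool

POP, GENES, ISLANDS, EPOCHS, GENS = 30, 20, 4, 10, 25

def fitness(g):
    return -np.sum(g ** 2)    # toy objective: drive every gene to zero

def evolve(pop):
    # One island evolving in isolation between migrations.
    rng = np.random.default_rng()
    for _ in range(GENS):
        scores = np.array([fitness(g) for g in pop])
        parents = pop[np.argsort(scores)[::-1][:POP // 2]]
        children = parents + rng.normal(scale=0.05, size=parents.shape)
        pop = np.vstack([parents, children])
    return pop

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    islands = [rng.normal(size=(POP, GENES)) for _ in range(ISLANDS)]
    with Pool(ISLANDS) as pool:
        for _ in range(EPOCHS):
            islands = pool.map(evolve, islands)  # islands in parallel
            # Migration: each island's best displaces a neighbour's
            # worst individual, around a ring.
            bests = [isl[np.argmax([fitness(g) for g in isl])]
                     for isl in islands]
            for i, isl in enumerate(islands):
                worst = np.argmin([fitness(g) for g in isl])
                isl[worst] = bests[(i - 1) % ISLANDS]
    print("best:", max(fitness(g) for isl in islands for g in isl))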

> Nature, of course, does genetic optimization in stochastic parallel all
> the time, so GA's are inherently parallelizable, and developing them IN
> a parallel environment provides one with the opportunity to see if
> algorithms that are "only" efficient in such an environment, with a full
> degree more of algorithmic complexity than a serial GA, can break
> through some of the limitations of simple GAs and extend the
> aforementioned limits in useful and valuable ways.  This would benefit
> NNs as well as many other fields that derive useful local optima from a
> GA.
>

Well, ANNs may get stuck in local optima, but that is not true of every
learning algorithm, mind you. :)

As I said before, it is routine to write parallel GA systems, and of
questionable value. In many cases, it is like doing a random search.
That last sentence was underlined, btw. :)

> Finally, there are still SOME questions associated with high dimensional
> NNs (including mere classifiers, but certainly extending toward more
> interesting constructs that simulate "intelligence") that literally
> cannot be sensibly answered -- yet -- because of their complexity and
> the time required to build them.  I think that it is very much
> worthwhile to meditate upon these questions, as well, while mucking
> about with parallel GAs per se.  At the very least, it might be
> worthwhile to consider the ratio of computation to communication for
> various arrangements of e.g. parallelized layer serialization.  If one
> thinks that real brains (not unreasonably) both do lots of things in
> parallel AND have all sorts of feedback loops, it could easily be
> worthwhile to implement models in "hardware" with task/layer specific
> nodes even at the expense of serial implementation speed, especially
> when faced with the daunting task of simultaneously training all the
> "independent" layer nodes with inputs that are also varying as the
> preceding layers are being optimized.
>

I think we should avoid talking about a correlation between ANNs and
human brains, since the most popular learning algorithms are, strictly
speaking, biologically implausible. As a physicist, you can easily tell
why.

As a matter of fact, that is a rather hideous failure of the
"connectionist camp".

> I don't think it is quite time to say that we know all here and that
> there is nothing "interesting" left to do -- I think that it is truer to
> say that we've discovered the "transistor" -- a simple NN in fact can be
> a very good analog of a transistor and could even doubtless be trained
> to precisely emulate a transistor's I/O characteristics:-) -- but are
> totally clueless about how to wire lots of transistors together to get
> useful work done on complex tasks.
>

That's what the EE people thought: "we now have the intelligent
transistor; now all we have to do is make a chip out of it". So things
like associative memory were indeed imagined as "chip designs" that
would be intelligent counterparts to conventional parts: a Hopfield
network would be an "intelligent memory", and an MLFF network an
"intelligent processor". You put them in a box, connect them on an "AI
board", and there you have an AI.

It turned out not to be the case, though.

> The CPU on which I'm typing this reply IS isomorphic to a particular NN.
> It is an NN that so far we have proven utterly incapable of "growing" --
> it has to be microscopically designed and is built up of concatenations
> of individually designed functional components (where we cannot yet
> "grow" a NN that replicates any but the simplest of the components).
> And, as you've noted, the in-parallel, genetically optimized NN that I'm
> using to DIRECT my fingers as they type was grown, and yet is (in
> principle, not practice:-) capable of designing the CPU.
>

Well, as I said before, there is a difference between an ANN and a
biological NN that forbids us from making such comparisons to biology :)

(Recurrent) ANNs are Turing-complete, so they can compute any computable
function. That is great, but it doesn't mean they are the "one true
intelligent computer". We don't have the theory yet.

Personally, I think our brains, logically, operate in a way more like a
GA and less like an ANN; however, I'm also fairly sure they are unlike
anything we have come up with.

> Surely it might be capable of improving and parallelizing algorithms we
> might use to build NN's and organize them so that they can accomplish
> complex tasks, without having to basically "design the CPU" by hand...
>
> Then there is the issue of time and dynamic networks.  "Most" NN's in
> application are utterly, boringly, Markovian in both their inputs and
> their processing.  Yet in reality, nearly anything of interest (such as
> the good old CPU and my wetware) is horribly, overwhelmingly
> non-Markovian in operation, with both input and state "memory" both
> internal and external that totally alters output in multiple
> applications of the "same" local time input.  There seems to me to be an
> inherently parallel, or maybe loop-serial character to the training
> process of non-Markovian networks, where one may have a non-Markovian
> sequence in the training cycle that can be performed in serialized
> parallel (node A optimizes a layer passed to node B, which optimizes a
> layer passed to node C..., where B, C ... have outputs that are fed back
> to node A for the next full pass of optimization, all in parallel).
>

For conventional networks you would implement a typical pipeline
parallelization, as in several standard parallel algorithms. That much
is obvious; a toy sketch follows.
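
A toy Python sketch of that pipeline: one process per layer, connected
by queues, streaming minibatches forward. The weights are random
stand-ins, and a real trainer would of course also have to stream error
signals back the other way.

import numpy as np
from multiprocessing import Process, Queue

def stage(shape, q_in, q_out, seed):
    # One pipeline stage owns one layer; transforms whatever streams in.
    W = np.random.default_rng(seed).normal(size=shape)
    while True:
        x = q_in.get()
        if x is None:          # poison pill: propagate shutdown
            q_out.put(None)
            return
        q_out.put(np.tanh(x @ W))

if __name__ == "__main__":
    shapes = [(8, 16), (16, 16), (16, 2)]  # three layers, one stage each
    queues = [Queue() for _ in range(len(shapes) + 1)]
    stages = [Process(target=stage, args=(s, queues[i], queues[i + 1], i))
              for i, s in enumerate(shapes)]
    for p in stages:
        p.start()
    rng = np.random.default_rng(4)
    for _ in range(5):                     # stream minibatches through
        queues[0].put(rng.normal(size=(32, 8)))
    queues[0].put(None)
    while (batch := queues[-1].get()) is not None:
        print("batch out:", batch.shape)
    for p in stages:
        p.join()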

My idea about a parallel ANN training algorithm was that different topologies 
would be better suited to parallelization. That belief was reinforced when I 
read that real neural nets had interesting topologies such as conical 
surfaces, etc.

Regards,

-- 
Eray Ozkural (exa) <erayo at cs.bilkent.edu.tr>
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo  Malfunction: http://mp3.com/ariza
GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C



