# [Beowulf] failing forward and learning

Vincent Diepeveen diep at xs4all.nl
Sun May 12 08:35:19 PDT 2013

```Jim, there was a huge difference algorithmic, at least for the
applications i have been programming for,
when moving from 36 ports to 100+ ports.

Above 100 ports life really changes.

All kind of "found while i was in the bathroom" type algorithms no
longer work.

You really need a paper model then that addresses all worst cases.

When using up to 36 ports you can do with just 1 switch. That's never
a problem to design something for.
Above 100 you really can no longer afford to do things in a
centralized manner. So 1 node that's doing critical datastructures
and puts to work all other nodes.

That master slave model functions here easily up to 32 nodes (or
ports) yet somewhere far above that, say a 100+, it no longer works
well.
Of course for workloads that need to nonstop start and stop nodes.

On May 12, 2013, at 5:04 PM, Lux, Jim (337C) wrote:

> In the context of "setups for teaching networks and clusters"..
>
> If you wanted to experiment with various interconnects and
> topologies, what is the minimum number of nodes and "ports" on each
> node.
> For instance, if each node has 2 ports, you're constrained to lines
> and rings. 3 ports gets you a hypercube with 8 nodes, etc.
>
> That's if links are "point to point" and no "switch/bus"
> interconnects.
>
> Say you wanted to do a "fabric" like the surface of a toroid.  That
> implies a 2 D connection, either with 3 or 4 ports (depending on
> whether you are using triangles or squares).  How many to be
> useful: 16 gives you a 4x4 square grid, which could be connected as
> a torus.  9 gives you 3x3.
>
> And then, there's some interesting problems..
>
> Say your problem is "invert an N by N matrix".. An Arduino only has
> 2kbytes of RAM, so that sets one bound.  If you had 9 nodes in a
> line and you put the matrix to be inverted into a node at the end
> of the line, you need to figure out how to send pieces down the
> line (and the farther you go, the more "hops" and comm overhead).
> Or you could do some sort of systolic/pipeline, and do it in stages.
> (for instance, doing a 128 point FFT.. You do the butterflies with
> stride 2, then pass the outputs to the next node which does them in
> stride 4, etc. with all sorts of opportunities to efficiently
> arrange the algorithm and do some overlap.
>
> Then, you allow the pile o' nodes to be rearranged.. How does
> making it a ring change the algorithm? Or a 2D mesh? Can you design
> an algorithm which can work with either? Can you do it so it is
> adaptive, and "discovers" the connectivity.
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

```