The FNN (Flat Neighborhood Network) paradox

Wed Feb 21 16:46:06 PST 2001

> 
> > Thanks for great idea - I will try to check this.
> >
> > I will try to prepare first 4 node beowulf tonight :)
> > I will put to every node 3 nics and I connect this without any switch.
> > After this I will put in every node only one nic and I compare the
> > benchmark results.
> 
> If I understand you, you are planning to build a hypercube.  I've played
> briefly with hypercubes, and concluded that (like the FNN) it is an idea
> that makes sense at all only in a market topology where switches are
> very expensive compared to NICs.
>
the BNN is not quite a hypercube. it is an N-circulant graph. to be
specific it is a G(8;+/-1,4).
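
A quick way to see how this differs from a hypercube is to build the graph and measure it. The sketch below assumes G(8;+/-1,4) means 8 nodes where node v links to v+1, v-1 and v+4 (mod 8); the function names are only illustrative and are not taken from the paper.

```python
from collections import deque

def circulant_neighbors(n, jumps):
    """Adjacency lists for a circulant graph: node v links to v +/- j (mod n)."""
    adj = {}
    for v in range(n):
        nbrs = set()
        for j in jumps:
            nbrs.add((v + j) % n)
            nbrs.add((v - j) % n)
        nbrs.discard(v)  # no self-loops
        adj[v] = sorted(nbrs)
    return adj

def hop_counts(adj, src):
    """Breadth-first search: number of hops from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

adj = circulant_neighbors(8, [1, 4])
degrees = {len(nbrs) for nbrs in adj.values()}
diameter = max(max(hop_counts(adj, s).values()) for s in adj)
print("neighbors of node 0:", adj[0])
print("NICs per node:", degrees, "worst-case hops:", diameter)
```

Under that reading every node needs 3 NICs, the same as an 8-node (3-dimensional) hypercube, but the worst-case distance is 2 hops where the hypercube needs 3.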

http://cersa.admu.edu.ph/wyu/papers/net-csp.pdf

is a paper about it. it starts off by explaining what the other
topologies are and moves towards the circulant graph topology. there are
also some results in this presentation. 

i was thinking first of using a butterfly cayley-graph and even the classic
hypercube. but, my adviser suggested that we use circulant graphs because
he did a thesis on it and it was promising. 

> Remember, even for a tetrahedral four node beowulf, you must spend
> perhaps $20 (or more, of course) per NIC, or $60 per node.  Four nodes
> require 12 NICs and six cables for a total cost of around $270 or more.
> Yes there are cheaper NICs available (e.g. RTL8139's) but I've had
> terrible luck with these in the one hypercube I tried to build with them
> and still have a whole stack of them sitting in my junk hardware pile as
> a consequence.
> 
> A five port switch costs perhaps $70 (or less if you shop hard) -- eight
> port switches are as little as $80.  A switched port costs LESS than a
> NIC these days.  Admittedly these switches are likely to be
> store-and-forward with mediocre latency, but even better switches aren't
> that expensive anymore.  Add in only FOUR NICs and cables @$25 each, and
> you can get effortless connections for only $170 and have an extra port
> to connect up a head node or to another switch.
> 
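
The arithmetic quoted above can be reproduced as a small sketch. The $5 cable price is an assumption: it is what the quoted totals imply ($270 less 12 NICs at $20, spread over 6 cables, and $25 for a NIC plus cable). The function names are illustrative only.

```python
NIC_COST = 20     # dollars per NIC, from the quoted text
CABLE_COST = 5    # assumed: implied by the quoted $270 and $25 figures
SWITCH_COST = 70  # five-port switch, from the quoted text

def full_mesh_cost(nodes):
    """Every node wired directly to every other node, no switch."""
    nics = nodes * (nodes - 1)         # one NIC per link end
    cables = nodes * (nodes - 1) // 2  # one cable per pair of nodes
    return nics * NIC_COST + cables * CABLE_COST

def switched_cost(nodes):
    """One NIC and one cable per node, plus a single switch."""
    return SWITCH_COST + nodes * (NIC_COST + CABLE_COST)

print("4-node full mesh:", full_mesh_cost(4))
print("4-node switched: ", switched_cost(4))
```

With these prices the four-node mesh comes to $270 and the switched setup to $170, matching the figures in the quoted text.
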
> The other things to consider are:
> 
>    a MUCH more complicated topology.  Routing tables have to be built to
> manage each node's path to the other hypercubical nodes.

yup. routing tables are different per group. however, the pipelining is
wonderful with this topology as compared to channel-bonding.

>    a MUCH higher latency if you go beyond the number of NICs your
> PCI bus can hold (also you have to turn on real routing and build really
> complicated routing tables).

well i can write a small script for generating routing tables for it. in
the paper we have a routing algorithm in place too.
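
As a rough idea of what such a script could look like, here is a minimal sketch that builds per-node next-hop tables with a plain breadth-first search. This is NOT the routing algorithm from the paper, just generic shortest-path routing, and it again assumes G(8;+/-1,4) means links to v+1, v-1 and v+4 (mod 8). All names are hypothetical.

```python
from collections import deque

def circulant_neighbors(n, jumps):
    """Adjacency lists for a circulant graph: node v links to v +/- j (mod n)."""
    adj = {}
    for v in range(n):
        nbrs = {(v + j) % n for j in jumps} | {(v - j) % n for j in jumps}
        nbrs.discard(v)
        adj[v] = sorted(nbrs)
    return adj

def next_hop_table(adj, src):
    """Map each destination to the neighbor of src that starts a shortest path."""
    first_hop = {}
    dist = {src: 0}
    queue = deque()
    for nbr in adj[src]:
        dist[nbr] = 1
        first_hop[nbr] = nbr  # direct link: the neighbor is its own next hop
        queue.append(nbr)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                first_hop[w] = first_hop[u]  # inherit the first hop used to reach u
                queue.append(w)
    return first_hop

adj = circulant_neighbors(8, [1, 4])
tables = {node: next_hop_table(adj, node) for node in adj}
for dest, via in sorted(tables[0].items()):
    print(f"node 0 -> node {dest}: send via neighbor {via}")
```

Ties between equally short paths are broken by neighbor order here; a real table generator would probably want to spread traffic across the links instead.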

>    a MUCH greater human cost to build it and maintain it (see
> "complicated" in the previous two entries).

because it is.

>    finally, in order to get the possible advantage of higher aggregate
> bisection bandwidth, you mustn't be blocked at e.g. the kernel level so
> that you are effectively only using one NIC at a time anyway.  Expensive
> NICs may use DMA and a carefully written application (one with
> nonblocking I/O, for example) may then allow you to get some advantage
> in terms of aggregate bandwidth, but cheap NICs or careless applications
> probably won't.
> 
so so true.
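
For the application side of this, a minimal sketch of nonblocking sends over several links might look like the following. It uses local socket pairs as stand-ins for three NICs (an assumption made only so the example runs on one machine), and it says nothing about whether the kernel or the NIC driver actually overlaps the transfers. The function name is hypothetical.

```python
import selectors
import socket

def send_on_all_links(links, payloads):
    """Send payloads[i] on links[i], servicing whichever link is writable."""
    sel = selectors.DefaultSelector()
    pending = {}
    for sock, data in zip(links, payloads):
        sock.setblocking(False)
        pending[sock] = memoryview(data)
        sel.register(sock, selectors.EVENT_WRITE)
    while pending:
        for key, _ in sel.select():
            sock = key.fileobj
            try:
                sent = sock.send(pending[sock])
            except BlockingIOError:
                continue  # link not ready after all; try again on the next pass
            pending[sock] = pending[sock][sent:]
            if not pending[sock]:
                sel.unregister(sock)
                del pending[sock]
    sel.close()

# Three local socket pairs stand in for three point-to-point links.
pairs = [socket.socketpair() for _ in range(3)]
payloads = [bytes([i]) * 1024 for i in range(3)]
send_on_all_links([a for a, _ in pairs], payloads)

received = []
for _, b in pairs:
    buf = b""
    while len(buf) < 1024:
        buf += b.recv(4096)
    received.append(buf)
print([len(r) for r in received])

for a, b in pairs:
    a.close()
    b.close()
```

The point of the loop is that no single slow link holds up the others; a blocking send on each NIC in turn would serialize them.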

> In my own experimental hypercube, aggregate internode performance was
> actually measurably worse than on a switch and attempting to talk on
> all channels in parallel actually destabilized the kernel of that day
> and caused system crashes (early 2.2.x's).  Which made average
> internode performance REALLY bad when the crash recovery program was
> taken into account.  This could likely all have been resolved -- with a
> lot of work.  Instead I went out and bought a (then) $220 8 port switch
> and never looked back.
> 
true. we were actually looking into IP and kernel problems. however, one
glimmer of hope was the successful implementation of the FNN.

> In conclusion, one is as likely to get WORSE overall performance (unless
> one works very hard to tune up or uses a package like the channel
> bonding package where others have done the work for you), work MUCH
> harder (which is a real cost), and pay a lot more (which is a real
> cost).  Higher cost, less benefit.
> 
> The FNN solution shares some of the features of the hypercube -- if one
> has to buy the switches it is more expensive unless you're talking about
> a really big flat network.  One has to manage routing tables so
> complicated that only an optimization (e.g. simulated annealing,
> genetic) algorithm is capable of building them -- they are analogous to
> solving the N Queens problem in chess (in fact they are probably
> equivalent to the inverse of the N Queens problem or something like
> that).
> 
> One still has to worry about the kernel's ability to manage network
> transactions on 3 channels as efficiently as on one.  It may or may not
> lead to greater aggregate bisection bandwidth, depending on DMA and how
> the application is written and how reliably the NIC device driver is
> integrated with the kernel (interrupts on multiple devices using the
> same driver obviously have to be carefully and predictably resolvable).
> A NIC without DMA will obviously just block anyway until a transmission
> is completed -- two transmissions can never be resolved in parallel.
> 
> Channel bonding, on the other hand, solves a very different problem --
> how to get more raw internode bandwidth using any given kind of NIC.
> This is also "expensive" in human time and possibly system time, but it
> may be the cheapest (or only!) way to get internode bandwidth in certain
> exotic ranges.  If you have a parallel application that is not
> particularly sensitive to latency but needs huge interprocessor
> bandwidth to scale well, it can easily be your ticket to a functional
> design (for a largish but still COTS price).  If I recall what Don
> Becker once told me, your aggregate bandwidth increases, not quite
> linearly, for up to three NICs but the fourth (at the time of the
> discussion, not necessarily now) was either not worth it or actually
> decreased aggregate bandwidth a bit.
> 

--------------------------------------
William Emmanuel S. Yu
Ateneo Cervini-Eliazo Networks (ACENT)
email  :  william.s.yu at ieee.org
web    :  http://cersa.admu.edu.ph/
phone  :  63(2)4266001-5925/5904

Man is the best computer we can put aboard a spacecraft ... and the
only one that can be mass produced with unskilled labor.
		-- Wernher von Braun