[freenet-devl] Looking for a research project?

Timm Murray timm at madtimes.com
Mon Apr 16 15:56:00 PDT 2001


I'm CCing this to the Beowulf mailing list, as there are people there who
will know a lot more about this than me.

For the benefit of the Beowulf people:  On the Freenet mailing list (see
www.freenetproject.org if you're unfamiliar with Freenet), we're talking
about doing a research project involving Freenet, particularly doing
simulations of the network.

----- Original Message -----
From: "Ian Clarke" <ian at hawk.freenetproject.org>
To: <devl at freenetproject.org>
Sent: Monday, April 16, 2001 12:58 PM
Subject: Re: [freenet-devl] Looking for a research project?

>On Mon, Apr 16, 2001 at 02:21:32PM -0700, Timm Murray wrote:
>> Anyone else around here an armchair Chaos Theorist (or even a real Chaos
>> Theorist)?  If the simulations don't incorporate the size-bias, they will
>> be fatally flawed.
>
>It depends what you are using the simulations for.  We have primarily
>used them to test the scalability of the general system.  I don't think
>the size bias would affect that; it will really affect documents'
>longevity in the system, which we haven't measured yet.
>
>> Freenet is a natural system, and thus any simulations are
>> susceptible to the Butterfly Effect.
>
>Not all complex systems are chaotic.  I disagree that Freenet is a
>chaotic system since it doesn't seem to be particularly reliant on the
>initial configuration of the network (a chaotic system would be).

I would argue that the initial configuration of the /network/ isn't chaotic,
but the path a document takes upon initial insertion and the subsequent
requests are.  You were right to say that it depends on what part of Freenet
you're simulating.

I think the best simulation you could have would be a Beowulf cluster with
each node in the cluster running multiple Freenet nodes (each on a different
port and with its own data store).  If Fred (for the Beowulf people: Fred is
the Freenet Reference program, which is written in Java) can someday work
well with a GCJ compile, you could put something like 20 Freenet nodes on a
single Beowulf node, even with each node being only a 486.  Note that our
little Freenet-in-Beowulf never talks to the Freenet on the Internet itself.
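
As a rough illustration, each Beowulf node could run a small wrapper that
starts its batch of Freenet nodes.  This is a minimal sketch only: the
"freenet_node" command and its flags below are stand-ins I made up, not
Fred's real invocation.

    #!/usr/bin/env python
    # launch_nodes.py -- start several Freenet nodes on one Beowulf node.
    # HYPOTHETICAL: "freenet_node" and its flags are placeholders, not
    # Fred's actual command line.
    import os
    import subprocess

    NODES_PER_HOST = 20
    BASE_PORT = 20000
    STORE_ROOT = "/mnt/headnode/stores"     # NFS mount from the head node

    procs = []
    for i in range(NODES_PER_HOST):
        store = os.path.join(STORE_ROOT, os.uname()[1], "node%02d" % i)
        os.makedirs(store, exist_ok=True)
        procs.append(subprocess.Popen([
            "freenet_node",                 # placeholder launcher
            "--port", str(BASE_PORT + i),   # unique port per node
            "--store", store,               # unique 200 MB data store
            "--store-size", "200M",
        ]))

    for p in procs:                         # keep the wrapper alive
        p.wait()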

I would suggest each node being something like this:
-- 486 processor or low-end Pentium
-- 64 MB RAM
-- 200 MB hard drive
-- 10/100 Mb (or maybe just 10) Ethernet card

The head node should be beefier, as it needs to handle a lot of bandwidth:
-- Pentium III or Athlon
-- 256 MB RAM
-- ?? GB hard drive (see below)
-- Gigabit / Myrinet / multiple 10/100 Ethernet

The general nodes NFS mount a partition on the head node and store their
Freenet data store there.  Thus, the head node needs to be able to handle
the bandwidth.  Let's give each Freenet node within the Beowulf a 200 MB
store.  So, running 20 Freenet nodes per Beowulf node, you need:

200 * 20 * n = x

amount of hard drive space on that particular partition, where n is the
number of Beowulf nodes.  20 Beowulf nodes gives you 200 * 20 * 20 = 80,000
MB (~80 GB).  The head node will also be responsible for running a script
which parses logs from the Freenet nodes.  If done correctly, the script
should be able to figure out where each piece of data exists on the network
and which nodes know about each other (in Freenet terms, "how well connected
the network is").  This script (or set of scripts) is probably the most
complex part.
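
Here's a minimal sketch of what that parser might look like, assuming the
Freenet nodes log stores and node references in some greppable form.  The
log line formats matched below are pure invention on my part; Fred's real
logs will look different.

    #!/usr/bin/env python
    # parse_logs.py -- build a picture of data placement and connectivity
    # from the Freenet nodes' logs.  The log formats matched here are
    # HYPOTHETICAL placeholders, not Fred's actual output.
    import re
    import sys
    from collections import defaultdict

    key_locations = defaultdict(set)   # key -> nodes holding a copy
    neighbors = defaultdict(set)       # node -> nodes it has references to

    STORED = re.compile(r"node=(\S+) stored key=(\S+)")
    REF = re.compile(r"node=(\S+) learned ref=(\S+)")

    for line in sys.stdin:
        m = STORED.search(line)
        if m:
            key_locations[m.group(2)].add(m.group(1))
        m = REF.search(line)
        if m:
            neighbors[m.group(1)].add(m.group(2))

    copies = [len(v) for v in key_locations.values()]
    degrees = [len(v) for v in neighbors.values()]
    if copies:
        print("avg copies per key: %.2f" % (sum(copies) / len(copies)))
    if degrees:
        print("avg refs per node:  %.2f" % (sum(degrees) / len(degrees)))

You'd run it on the head node as something like
`cat /mnt/stores/*/node*.log | ./parse_logs.py`.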

The most difficult part, as stated before, is replicating user behavior.  I
suggest the following method, which admittedly isn't perfect, but it's the
best automated solution I could think of while sitting here scratching my
head:

Consider that any document has four levels of popularity (high, medium, low,
and none) and three levels of size (high, medium, low).  You may get better
results if you add more levels in between.  Then, a stream of bytes is
grabbed out of /dev/urandom that corresponds to its size (a la `dd
if=/dev/urandom of=/tmp/somefile bs=1M count=<size in MB>`; /dev/random
would block waiting for entropy at these sizes).  I suggest making the
levels of size a range of sizes (like "10-30 MB") instead of a fixed length.
The head node then gives the file to a random Beowulf node, to be inserted
on one of its Freenet nodes (again, exactly which Freenet node is chosen at
random).  The
head node now asks other, random Beowulf nodes to request the file using
random Freenet nodes.  Exactly how many Freenet nodes should be requesting
this depends on the document's preselected popularity.  Information such as
the HTL needed to request the file should be recorded and sent back to the
head node for processing (for the Beowulf people: HTL is "Hops To Live", or
how many nodes the request had to pass through to get the data).
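
To make that concrete, here is a minimal sketch of the driver loop the head
node could run.  The insert_file() and request_file() stubs, the hostnames,
and the request counts per popularity class are all assumptions of mine,
standing in for whatever client interface Fred actually exposes:

    #!/usr/bin/env python
    # traffic.py -- head-node driver: insert random documents, then request
    # them with a frequency set by a preselected popularity class.
    # insert_file() and request_file() are HYPOTHETICAL stubs for Fred's
    # real client interface.
    import random
    import subprocess

    BEOWULF_HOSTS = ["bw%02d" % i for i in range(20)]   # assumed hostnames
    NODES_PER_HOST = 20

    SIZE_MB = {"low": (1, 5), "medium": (5, 10), "high": (10, 30)}
    REQUESTS = {"none": 0, "low": 3, "medium": 20, "high": 100}  # guesses

    def make_file(path, size_mb):
        # /dev/urandom rather than /dev/random: we need bulk random data
        subprocess.run(["dd", "if=/dev/urandom", "of=" + path,
                        "bs=1M", "count=%d" % size_mb], check=True)

    def pick_node():
        return random.choice(BEOWULF_HOSTS), random.randrange(NODES_PER_HOST)

    def insert_file(host, node, path):
        # HYPOTHETICAL stub: wire this to Fred's client interface
        raise NotImplementedError

    def request_file(host, node, key):
        # HYPOTHETICAL stub: should return the HTL the request needed
        raise NotImplementedError

    def run_document(doc_id):
        size_class = random.choice(list(SIZE_MB))
        popularity = random.choice(list(REQUESTS))
        path = "/tmp/doc%06d" % doc_id
        make_file(path, random.randint(*SIZE_MB[size_class]))
        host, node = pick_node()
        key = insert_file(host, node, path)
        for _ in range(REQUESTS[popularity]):
            host, node = pick_node()
            htl = request_file(host, node, key)
            print(doc_id, size_class, popularity, htl)  # log for the parser

    if __name__ == "__main__":
        for doc in range(100):
            run_document(doc)

Making the popularity class a property of the document, chosen at insert
time, is what would let the log parser correlate request load with where
copies of the data end up.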

This could probably use some refinement, but it's a start.



Timm Murray
--------------------

Life is like a perl script:  Really short and messy.





