[Beowulf] Help req: Building a disked Beowulf

Chaitanya Krishna icymist at gmail.com
Wed Aug 24 14:22:46 PDT 2005


Hi,

I am posting to the list for the first time.

First, a little background; please bear with me.

I am doing my research in Molecular Dynamics, and we have a very badly
running Beowulf with 10 nodes in our lab. The current state of the
cluster is that the master can connect to the outside world (it has
two network cards), and from the master we can access all the nodes
(each has one network card), which are all connected through a switch.
We are able to run serial jobs on all the nodes but not parallel jobs.
All of them have SuSE 9.1 installed, but not with exactly the same
partitions or the same software, as the last few nodes were added by a
person different from the one who originally built the cluster.

I have been entrusted with the responsibility of getting the cluster
to run parallel jobs, as I am considered to be a computer geek here
(which, as you will soon see, I am not). Hence this request for help.
You can assume that I know next to nothing. I have already Googled and
read some interesting material on the net about building a cluster,
but I am writing to get the views of you experienced people out there
in cyberspace.

The resources that I have are these:

    Item                                Qty
1   Intel Pentium 4 3 GHz processors    10
2   Intel motherboards                  10
3   200 GB SATA hard disks              10
4   120 GB IDE hard disks               10
5   Network cards                       10 + 1 (1 extra for the master)
6   Some switches that are already present

All the IDE drives will be primary (the OS will reside on these) and
the SATA drives will be used as secondary drives for storage.

My plan (and requirement) is the following:

1 To get the cluster up and running parallel jobs.
2 The way I intend to do 1 is this: install the OS (SuSE 9.3 Pro) on
the master and install a bare-bones system (I am not sure, but maybe
something like the kernel, NFS and/or NIS, SSH, etc.) on the rest of
the nodes, so that I am able to run parallel jobs as well as serial
jobs on the nodes. I will need help with this.
3 Whatever software I install on the master should be available on the
nodes too (I guess this is possible with either NIS or NFS). I will
need some help here too!
4 I should have no need to propagate my executable to all the nodes
manually to run a parallel job. I guess it should be possible if 3 is
possible.
5 All the nodes should be able to store data on the drives attached to
them. Storage is very important.
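
As a rough illustration of items 3 and 4, here is a sketch only, not a
tested setup. It assumes the master exports a couple of directories
over NFS to a private cluster subnet and that parallel jobs are
started from a plain host list. The hostnames (node01 to node10), the
subnet, and the shared directories /home and /opt are all hypothetical
placeholders, not something that exists on this cluster.

```python
# Sketch: generate an /etc/exports fragment for the master and a plain
# host list (one hostname per line) that an MPI launcher could read.
# All names, addresses, and paths below are hypothetical placeholders.

NUM_NODES = 10                          # machines in the cluster
SUBNET = "192.168.1.0/255.255.255.0"    # assumed private cluster network
SHARED_DIRS = ["/home", "/opt"]         # assumed directories shared over NFS


def node_names(count):
    """Return hostnames node01 .. nodeNN (the naming scheme is an assumption)."""
    return ["node%02d" % i for i in range(1, count + 1)]


def exports_lines(dirs, subnet):
    """Build /etc/exports lines sharing each directory read-write with the subnet."""
    return ["%s %s(rw,sync,no_root_squash)" % (d, subnet) for d in dirs]


def machinefile_text(names):
    """Join hostnames into a host list, one per line, with a trailing newline."""
    return "\n".join(names) + "\n"


if __name__ == "__main__":
    print("# lines for /etc/exports on the master")
    for line in exports_lines(SHARED_DIRS, SUBNET):
        print(line)
    print()
    print("# host list for the parallel job launcher")
    print(machinefile_text(node_names(NUM_NODES)), end="")
```

If the nodes mount the shared directories at the same paths as on the
master, an executable built once on the master would be visible on
every node, which is what item 4 asks for.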

I haven't yet checked the archives of the Beowulf list, but it would
be very helpful if someone could tell me whether all or some of the
above is possible and give me some pointers as to where I can go next
for more information.

Regards,
Chaitanya.

Indian Institute of Science.
Bangalore.
India.

-- 
To err is human, but to really screw up you need a computer.



