[Beowulf] NFS+XFS+SMP on kernel 2.6
Suvendra Nath Dutta
sdutta at cfa.harvard.edu
Wed Jun 15 09:47:31 PDT 2005
sauron:~ # uname -a
Linux sauron 2.6.8.1-suse91-osmp #1 SMP Thu Sep 2 01:10:09 EDT 2004
x86_64 x86_64 x86_64 GNU/Linux
Robert raised a similar question just before. Usage is really
restricted to just logging in and qsubbing a job. I've asked users to
start an interactive qsub when compiling, but that rarely happens. I've
removed all visualization and analysis software so people don't do
anything bad. The users usually don't even rcp files out, because the
filesystem is NFS shared (over a different port, see below) to a
biggish analysis machine where they run their analysis jobs. They do
ssh in and use X tunneling, but that goes over a different network port
from the ones used for NFS. One port NFS-exports the filesystem to the
nodes (plain gigabit Ethernet). Another port NFS-exports the same
filesystem to a multiprocessor analysis machine. A third port is used
by users to connect to the server. In addition, I run Ganglia, which
chews up bandwidth. I've suggested people use the
node's scratch disk to output data and copy over the files at the end
of the job, but not everyone listens.
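
The scratch-disk pattern suggested above can be sketched roughly as
follows. This is only an illustration in Python, not a script from the
cluster: the function name run_with_scratch and its parameters are
hypothetical, and in practice the same steps would more likely live in
the job script that gets qsubbed.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_scratch(job, results_dir, scratch_root=None):
    """Run job(workdir) in a node-local scratch directory, then copy
    whatever it wrote to results_dir on the NFS-shared filesystem.

    Hypothetical helper: scratch_root is assumed to be a local disk on
    the compute node; the system temp directory is used when it is None.
    """
    results_dir = Path(results_dir)
    # Each job gets its own private directory under scratch_root.
    workdir = Path(tempfile.mkdtemp(prefix="job-", dir=scratch_root))
    try:
        job(workdir)  # intermediate I/O stays on the local disk
        # Only a successful job reaches this point: one bulk copy over
        # NFS at the end instead of many small writes during the run.
        results_dir.mkdir(parents=True, exist_ok=True)
        for item in workdir.iterdir():
            target = results_dir / item.name
            if item.is_dir():
                shutil.copytree(item, target, dirs_exist_ok=True)
            else:
                shutil.copy2(item, target)
    finally:
        # Free the scratch space whether or not the job succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
    return results_dir
```

The point of the design is that the file server sees a single copy per
job rather than the job's whole output stream.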
I guess I could update the kernel. It's certainly not automated, but
updating by hand is fairly simple. I have also considered adding 2 GB
more memory to the head node. Maybe that is a cheaper solution than
buying a different machine. Also, is it possible to have a redundant
head node? It sounds impossible, but is it?
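
On the memory question, it may be worth checking first whether the low
figure is memory that is truly unused or memory the kernel is holding
as buffers and page cache, which a busy file server tends to fill. A
minimal sketch, assuming the standard MemFree, Buffers and Cached
fields of /proc/meminfo; the function name memory_summary is
hypothetical, and this says nothing about the kernel bug itself.

```python
def memory_summary(meminfo_text):
    """Return (free_kb, cache_kb) from /proc/meminfo-style text.

    free_kb is the MemFree value; cache_kb is Buffers + Cached, which
    the kernel can usually give back when applications need memory.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # values are in kB
    free_kb = fields.get("MemFree", 0)
    cache_kb = fields.get("Buffers", 0) + fields.get("Cached", 0)
    return free_kb, cache_kb


def read_memory_summary(path="/proc/meminfo"):
    """Convenience wrapper that reads the live file on a Linux host."""
    with open(path) as f:
        return memory_summary(f.read())
```

Calling read_memory_summary() on the head node returns both numbers in
kB. If the second one is large, more RAM would mostly add cache, which
may or may not be what the head node needs.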
Suvendra.
On Jun 15, 2005, at 12:27 PM, Joe Landman wrote:
>
> Hi Suvendra:
>
>
> Suvendra Nath Dutta wrote:
>> We set up a 160-node cluster with a dual-processor head node with 2GB
>> RAM. The head node also has two RAID devices attached to two SCSI
>> cards. These have an XFS filesystem on them and are NFS exported to
>> the cluster. The head node runs very low on memory (7-8 MB). And
>> today I ran into a kernel bug that crashed the system. Google
>> suggests that I should upgrade to kernel 2.6.11, but that sounds very
>> unpleasant. I am thinking of putting the raid boxes on a different
>> box. Will separating the file-server and the head node give me back
>> stability on the head node?
>
> What distribution are you using today? Which kernel?
>
> What does your usage pattern look like on the head node and on the
> file server?
>
> Joe
>
>> Suvendra.
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web : http://www.scalableinformatics.com
> phone: +1 734 786 8423
> fax : +1 734 786 8452
> cell : +1 734 612 4615