[Beowulf] NFS+XFS+SMP on kernel 2.6

Suvendra Nath Dutta sdutta at cfa.harvard.edu
Wed Jun 15 09:47:31 PDT 2005


sauron:~ # uname -a
Linux sauron 2.6.8.1-suse91-osmp #1 SMP Thu Sep 2 01:10:09 EDT 2004 x86_64 x86_64 x86_64 GNU/Linux

Robert raised a similar question just before. Usage is really 
restricted to just logging in and qsubbing a job. I've asked users to 
start an interactive qsub when compiling, but that rarely happens. I've 
removed all visualization and analysis software so people don't do 
anything bad. The users usually don't even rcp files out, because the 
filesystem is NFS-shared (over a different port, see below) to a 
biggish analysis machine where they run their analysis jobs. They do 
ssh in and use X tunneling, but that goes over a different network 
port from the one used for NFS. One port NFS-exports the filesystem 
to the nodes (plain gigabit Ethernet). Another port NFS-exports the 
same filesystem to a multiprocessor analysis machine. A third port is 
used by users to connect to the server. In addition I run ganglia, 
which chews up bandwidth. I've suggested people write output to the 
node's scratch disk and copy the files over at the end of the job, 
but not everyone listens.
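
The scratch-disk suggestion above might look roughly like the following 
inside a job script. This is only a sketch under stated assumptions: the 
function name run_with_scratch and the SCRATCH_ROOT variable are 
hypothetical (not part of any batch system), and the node's scratch 
area is assumed to be a local directory the job can write to.

```shell
#!/bin/sh
# Sketch: run a command in node-local scratch, then copy the output
# to the NFS-shared filesystem once, at the end of the job.
# SCRATCH_ROOT is a hypothetical variable naming the local scratch area.

run_with_scratch() {
    # $1 = destination directory on the shared filesystem
    # remaining arguments = the command to run
    result_dir=$1
    shift

    # Private working directory on the local scratch disk.
    scratch=$(mktemp -d "${SCRATCH_ROOT:-/tmp}/job.XXXXXX") || return 1

    # Run the command with scratch as its working directory, so files
    # it writes with relative paths land on local disk rather than NFS.
    ( cd "$scratch" && "$@" )
    job_status=$?

    # One bulk copy at the end; remove scratch only if the copy worked.
    if mkdir -p "$result_dir" && cp -R "$scratch"/. "$result_dir"/; then
        rm -rf "$scratch"
    else
        echo "copy failed; output left in $scratch" >&2
        return 1
    fi
    return "$job_status"
}

# Example call (paths and program name are made up):
#   run_with_scratch /shared/results/run1 ./my_simulation input.cfg
```

The point of the design is that the NFS server sees a single copy per 
job instead of a stream of small writes while the job is running.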

I guess I could update the kernel. It's certainly not automated, but 
updating by hand is fairly simple. I have also considered adding 2 GB 
more memory to the head node. Maybe that is a cheaper solution than 
buying a different machine. Also, is it possible to have a redundant 
head node? It sounds impossible, but is it?

										Suvendra.


On Jun 15, 2005, at 12:27 PM, Joe Landman wrote:

>
> Hi Suvendra:
>
>
> Suvendra Nath Dutta wrote:
>> We set up a 160 node cluster with a dual processor head node with 2GB 
>> RAM. The head node also has two RAID devices attached to two SCSI 
>> cards. These have an XFS filesystem on them and are NFS-exported to 
>> the cluster. The head node runs very low on memory (7-8 MB). And 
>> today I ran into a kernel bug that crashed the system. Google 
>> suggests that I should upgrade to kernel 2.6.11, but that sounds very 
>> unpleasant. I am thinking of putting the RAID boxes on a different 
>> box. Will separating the file-server and the head node give me back 
>> stability on the head node?
>
>   What distribution are you using today?   Which kernel?
>
> What does your usage pattern look like on the head node and on the 
> file server?
>
> Joe
>
>> Suvendra.
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit 
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> -- 
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 734 786 8452
> cell : +1 734 612 4615



