[Beowulf] motherboards for diskless nodes

Fri Feb 25 10:21:30 PST 2005

Craig,

Reasons to run disks for physics work.

1. Large tmp files and checkpoints.

2. Ability for distributed jobs to continue if master node fails.

3. Saving network I/O for jobs rather than admin.

I actually seldom update compute nodes (unless an update is needed by
software required for research). I mount a /usr/global that does
contain software. I also mount /home on each node.

An example of item 1 above is the Gaussian jobs we are now running,
which require >40GB of tmp space. For these jobs I have both a 20GB OS
disk and a 100GB tmp disk in each node. Due to a problematic SCSI-to-IDE
converter, I have experienced item 2 too many times with one cluster,
but even on the others I like knowing that work can continue even if the
host is down (facilitated by a separate NFS server).

Of course, I am definitely old school. I use static IPs, individual
passwd files, and simple scripts to handle administration.

Mike

Craig Tierney wrote:

>On Fri, 2005-02-25 at 01:16, John Hearns wrote:
>
>>On Thu, 2005-02-24 at 18:20 -0500, Jamie Rollins wrote:
>>
>>>Hello.  I am new to this list, and to beowulfery in general.  I am working
>>>at a physics lab and we have decided to put together a relatively small
>>>beowulf cluster for doing data analysis.  I was wondering if people on
>>>this list could answer a couple of my newbie questions.
>>>
>>>The basic idea of the system is that it would be a collection of 16 to 32
>>>off-the-shelf motherboards, all booting off the network and operating
>>>completely disklessly.  We're looking at amd64 architecture running
>>>Debian, although we're flexible (at least with the architecture ;).  Most
>>>of my questions have to do with diskless operation.
>>>
>>Jamie, 
>>  why are you going diskless?
>>IDE hard drives cost very little, and you can still do your network
>>install.
>>Pick your favourite toolkit, Rocks, Oscar, Warewulf and away you go.
>
>IDE drives fail, they use power, you waste time cloning, and
>depending on the toolkit you use you will run into problems
>with image consistency.
>
>I have run large systems of both kinds.  The last system was
>diskless and I don't see myself going back.  I like changing
>one file in one place and having the changes show up immediately.
>I like installing a package once, and having it show up immediately,
>so I don't have to reclone or take the node offline to update
>the image.
>
>Craig
>
>>BTW, have a look at Clusterworld http://www.clusterworld.com
>>They have a project for a low-cost cluster which is similar to your
>>thoughts.
>>
>>
>>Also, with the caveat that I work for a clustering company,
>>why not look at a small turnkey cluster?
>>I fully acknowledge that building a small cluster from scratch will be
>>a good learning exercise, and you can get to grips with the motherboard,
>>PXE etc. 
>>However if you are spending a research grant, I'd argue that it would be
>>cost effective to buy a system with support from any one of the
>>companies that do this.
>>If you get a prebuilt cluster, the company will have done the research
>>on PXE booting, chosen gigabit interfaces and switches which perform
>>well, chosen components which will last. And when your power supplies
>>fail, or a disk fails someone will come round to replace them.
>>And you can get on with doing your science.
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf