[Beowulf] cluster building advice?
Jörg Saßmannshausen
j.sassmannshausen at ucl.ac.uk
Tue Sep 18 01:56:56 PDT 2012
Dear all,
really good advice here!
I would like to add something: for a smaller cluster, if you don't want to
use Puppet, what I do is rsync the nodes from a local directory on the
headnode. That way I can update the software easily by adding it to the
node directory on the headnode and running rsync on the nodes.
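A minimal sketch of that rsync step, in Python. The host name `headnode` and the directory `/export/node-image` are placeholders for illustration only, and the sketch defaults to a dry run so nothing is changed until the output has been checked:

```python
import subprocess


def build_rsync_command(headnode, node_dir, target="/", dry_run=True):
    """Build the rsync command a node would run to pull its files.

    headnode and node_dir are hypothetical names; adjust to the real setup.
    """
    # Trailing slash: copy the *contents* of node_dir, not the directory itself.
    source = f"{headnode}:{node_dir.rstrip('/')}/"
    cmd = ["rsync", "-a", "--numeric-ids"]
    if dry_run:
        # Only report what would be transferred.
        cmd.append("--dry-run")
    cmd += [source, target]
    return cmd


def sync_node(headnode, node_dir, dry_run=True):
    """Run the sync on the node; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(headnode, node_dir, dry_run=dry_run),
                   check=True)


if __name__ == "__main__":
    print(" ".join(build_rsync_command("headnode", "/export/node-image")))
```

The sketch deliberately leaves out `--delete`: with `/` as the target it would remove every file on the node that is not in the node directory.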
As you are PXE booting the nodes, you might want to add something like
memtest to the boot options; I have also set up an NFS boot here. So if there
is a problem with a node I can look into it (memtest for memory; for any other
issue, such as disc problems, the NFS boot is good). I am also using the NFS
boot for the installation (same as above: copy the files over via rsync).
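As a rough illustration of the two extra boot options, here is a Python sketch that writes PXELINUX-style menu entries for memtest and an NFS root. It assumes PXELINUX is the boot loader; the file names `memtest`, `vmlinuz` and `initrd.img`, the server address and the export path are all made up for the example:

```python
def pxe_entry(label, kernel, append=None):
    """Format one PXELINUX menu entry as text."""
    lines = [f"LABEL {label}", f"  KERNEL {kernel}"]
    if append:
        lines.append(f"  APPEND {append}")
    return "\n".join(lines)


def rescue_entries(nfs_server, nfs_root):
    """Return entries for a memory test and an NFS-root rescue boot.

    nfs_server and nfs_root are hypothetical values supplied by the caller.
    """
    # root=/dev/nfs, nfsroot= and ip=dhcp are standard Linux kernel
    # parameters for mounting the root filesystem over NFS.
    nfs_args = (f"initrd=initrd.img root=/dev/nfs "
                f"nfsroot={nfs_server}:{nfs_root} ip=dhcp ro")
    entries = [
        pxe_entry("memtest", "memtest"),
        pxe_entry("nfsboot", "vmlinuz", nfs_args),
    ]
    return "\n\n".join(entries) + "\n"


if __name__ == "__main__":
    print(rescue_entries("10.0.0.1", "/export/nfsroot"), end="")
```

The kernel used for the NFS boot needs NFS-root support, either built in or provided by its initrd.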
I hope that helps a bit.
All the best from London
Jörg
On Tuesday 18 September 2012 07:42:58 Bill Broadley wrote:
> On 09/16/2012 02:52 PM, Jeffrey Rossiter wrote:
> > The intention is for the system to be
> > used for scientific computation.
>
> That doesn't narrow it down much.
>
> > I am trying to decide on a linux
> > distribution to use.
>
> I suggest doing it yourself based on whatever popular Linux distro you
> have experience with. Assuming general Linux systems administrator
> proficiency, it's not particularly hard. I'd suggest starting with
> Scientific Linux (especially if your applications assume it) or
> Debian/Ubuntu (which seem to have larger repositories). I'd lean
> towards Ubuntu if you are running new hardware, since Sandy Bridge (new
> Intel) and Bulldozer (new AMD) seem to benefit from the latest kernels.
>
> Then add:
> * Cobbler for PXE installing (or functionally similar software), network
> configuration, dhcp, dns, mac address, IP addresses, etc.
> * Puppet/Chef for configuration management (everything post-install)
> * Torque/Slurm for batch queue
> * Environment Modules or similar to let users easily load the
> libraries/apps/environment they need in a reproducible way.
> * Ganglia/cacti/munin for graphing resource utilization.
> * /share/apps/<application name>-<version number> for anything you
> install that's not in the repositories.
>
> Get nodes to netboot, netinstall, and mount a shared /home. Once users
> start using it listen to their needs and adapt accordingly.
>
> Some suggestions:
> * If your campus has a standard username for each user, use it.
> * Use ssh certs for user authentication; you really don't want your
> users' passwords, nor do they want to type them often.
> * Start a wiki for documentation, allow users to edit it.
> * Have Environment Modules output the name/version on module load;
> it is much easier to figure out what a user has done when you have the
> exact info to reproduce a run in the run's output.
> * Set hardware physically to always netboot, then depend on the
> central server to decide if it should be from local disk or a new
> install.
> * Have compute nodes use host based ssh keys for auth (not user ssh
> keys)
> * Have head node use user based keys for login, do not allow
> ~/.ssh/authorized_keys
> * Allow exactly one ssh key per user.
> * Keep your configuration files in git or similar version control. Or
> if managed by puppet/chef, keep puppet/chef files in version control.
> * Strongly encourage any users writing source code to use a distributed
> version control system like git.
> * Be very very clear on the status/lack of backups. Be clear that loss
> of files will happen and it's only a matter of time.
> * Use software RAID.
>
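The "always netboot, let the central server decide" suggestion can be sketched as follows, assuming PXELINUX is the boot loader (the thread does not say which one is used). PXELINUX looks for a per-node config file named `01-` plus the MAC address in lower case with dashes, so the server can switch a node between local boot and reinstall by rewriting that one file. The kernel and initrd names are hypothetical:

```python
def pxelinux_config_name(mac):
    """Per-node config file name: '01-' (Ethernet) + MAC, lower case, dashes."""
    return "01-" + mac.lower().replace(":", "-")


def node_config(action):
    """Return the config text for one node.

    action is 'local' (boot from local disk) or 'install' (netinstall).
    """
    if action == "local":
        # LOCALBOOT 0 hands control back to the local disk.
        return "DEFAULT local\nLABEL local\n  LOCALBOOT 0\n"
    if action == "install":
        # vmlinuz / initrd.img are placeholder installer file names.
        return ("DEFAULT install\nLABEL install\n"
                "  KERNEL vmlinuz\n  APPEND initrd=initrd.img\n")
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    print(pxelinux_config_name("00:1A:2B:3C:4D:5E"))
    print(node_config("local"), end="")
```

Tools such as Cobbler manage this kind of per-node boot state themselves, so a hand-written version like this is only relevant for a do-it-yourself setup.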
> > Does it matter all that much?
>
> Not particularly. Random commercial software seems to assume RHEL based
> distros. Ubuntu/Debian seems to have the largest repositories (read
> that as the most likely to have a user request handled by apt-get install).
>
> > Any advice would be
> > greatly appreciated.
>
> You didn't mention your current experience; if the above sounds daunting,
> then start with Warewulf.
>
--
*************************************************************
Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ
email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html