[Beowulf] Re: TORQUE issues
Marc Noguera Julian
marc at klingon.uab.cat
Sun Apr 13 12:38:00 PDT 2008
Hi,
I have TORQUE installed on my cluster (40 AMD Opteron nodes) and it is running
fine. My advice is to rename the nodes as you indicate, that is, so that they
begin with a letter, which is good general practice, as Reuti says. In my
experience TORQUE will not accept a job whose "name" parameter (qsub -N)
begins with a digit, so I would expect similar behaviour with node names.
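
If you want to check a list of names before touching the nodes file, here is
a minimal sketch in plain sh. It only tests whether a name begins with an
ASCII letter, which is what the "doesn't start with alpha" message quoted
below suggests pbs_server expects. It is not TORQUE's own check, and the
function names starts_with_alpha and report_bad_names are made up for
illustration.

```shell
#!/bin/sh
# Succeed if the name begins with an ASCII letter.
# Assumption: this mirrors the "doesn't start with alpha" complaint from
# pbs_server; it is not TORQUE's actual parsing code.
starts_with_alpha() {
    case "$1" in
        [A-Za-z]*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Print every name passed as an argument that fails the check above.
report_bad_names() {
    for name in "$@"; do
        starts_with_alpha "$name" || printf '%s\n' "$name"
    done
}
```

For example, report_bad_names 1of12 n2of12 node01 prints only 1of12.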
Hope it helps
Marc
------------------------------------------------------
Marc Noguera i Julian, PhD
System Manager / Researcher
Despatx C7-149. Edifici Cn.
Campus UAB. Bellaterra
08193. Barcelona
email: marc_at_klingon.uab.es
web: http://klingon.uab.es/marc
Tlf/Phone: 00 34 935812173
-------------------------------------------------------
>
> Message: 2
> Date: Sun, 13 Apr 2008 19:53:53 +0200
> From: Reuti <reuti at Staff.Uni-Marburg.DE>
> Subject: Re: [Beowulf] TORQUE issues
> To: "Lance S. Jacobsen" <lance at gohypersonic.com>
> Cc: Beowulf at beowulf.org
> Message-ID:
> <22C3BC2E-BFF3-41AE-9477-196A02958541 at staff.uni-marburg.de>
> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
>
> Hi,
>
> Am 13.04.2008 um 04:52 schrieb Lance S. Jacobsen:
> > I recently put together a small cluster of Xeons using CentOS 5.1
> > x86_64. This cluster is my first real big experience with Linux
> > and administration. It took some learning and such to install NIS,
> > NFS, etc., but now the machines seem to be working well, and so I
> > am working on the next step: installing a queue scheduler. I decided
> > on TORQUE 2.3.0 since it's free and I don't know any better. I have
> > installed this and am having trouble getting it to detect my nodes.
> >
> > I think the problem is that I named them starting with numbers in
> > my /etc/hosts file: 1of12 , 2of12, ... 12of12. Instead of something
> > like node01, node02, ...
> >
> > After the installation, TORQUE did not create a file called 'nodes'
> > which it told me that I needed, and so after searching the web I
> > found the command to create it:
> >
> > # qmgr -c "create node 2of12"
> >
> > When I do this it gives me the following reply:
> >
> > qmgr: syntax error - checklist failed
> > create node 2of12
> > /\
> >
> > If I do this naming my node with a letter in front (n2of12) then it
> > seems to work and generate the nodes file.
> >
> > Now if I then go and do the "pbsnodes -a" command it tells me:
> >
> > n2of12
> >
> > state = down
> > np = 1
> > ntype = cluster
> >
> > seems fine... should be down since there is no n2of12 in my hosts
> > file.
> >
> > Now if I then go and rename the node in the node file back to 2of12
> > and type the following to kill and restart the server:
> >
> > # qterm
> > # pbs_server
> >
> > I get the following reply:
> >
> > PBS_Server: pbsd_init(setup_nodes), token "2of12" doesn't start
> > with alpha on line 1.
> >
> > PBS_Server: PBS_Server, pbsd_init failed
> >
> > Now I am reluctant to go and change all of my node names (IP
> > aliases) since everything else about my cluster is finally working
> > well and so I have been trying to find out why pbsd_init will not
> > accept host names that start with numbers. Also, I would hate to go
> > and change this if it is not the problem.
> >
> > Does anyone know if I might be able to edit the setup files
> > associated with pbsd_init to get this to work (or any other ways to
> > do this)?
>
> In general I wouldn't use a digit as the first character, as
> outlined here:
>
> http://rfc.net/rfc1178.html page 4.
>
> Some programs might simply check the first character to decide
> whether it's a hostname or a TCP/IP address. Thinking long term,
> and about additional software in your cluster (maybe even parallel
> apps), I would suggest changing the names of the machines.
>
> -- Reuti
>
> BTW: Torque has its own list at: http://www.clusterresources.com
>
> > Thanks,
> >
> > Lance
> >
> > --
> > Lance S. Jacobsen, Ph.D.
> > President
> > GoHypersonic Incorporated
> > 714 E. Monument Ave., Suite 201
> > Dayton, OH 45402-1382
> > Tel: 937-531-6678
> > Fax: 937-531-6679
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
>
> ------------------------------
>
> End of Beowulf Digest, Vol 50, Issue 24
> ***************************************