[Beowulf] TORQUE issues
Reuti
reuti at Staff.Uni-Marburg.DE
Sun Apr 13 10:53:53 PDT 2008
Hi,
On 13.04.2008 at 04:52, Lance S. Jacobsen wrote:
> I recently put together a small cluster of Xeons using CentOS 5.1
> x86_64. This cluster is my first real big experience with Linux
> and administration. It took some learning and such to install NIS,
> NFS, etc., but now the machines seem to be working well, and so I
> am working on the next step: installing a queue scheduler. I decided
> on TORQUE 2.3.0 since it's free and I don't know any better. I have
> installed this and am having trouble getting it to detect my nodes.
>
> I think the problem is that I named them starting with numbers in
> my /etc/hosts file: 1of12 , 2of12, ... 12of12. Instead of something
> like node01, node02, ...
>
> After the installation, TORQUE did not create a file called 'nodes'
> which it told me that I needed, and so after searching the web I
> found the command to create it:
>
> # qmgr -c "create node 2of12"
>
> When I do this it gives me the following reply:
>
> qmgr: syntax error - checklist failed
> create node 2of12
> /\
>
> If I do this naming my node with a letter in front (n2of12) then it
> seems to work and generate the nodes file.
>
> Now if I then go and do the "pbsnodes -a" command it tells me:
>
> n2of12
>
> state = down
> np =1
> ntype = cluster
>
> seems fine... should be down since there is no n2of12 in my hosts
> file.
>
> Now if I then go and rename the node in the node file back to 2of12
> and type the following to kill and restart the server:
>
> # qterm
> # pbs_server
>
> I get the following reply:
>
> PBS_Server: pbsd_init(setup_nodes), token "2of12" doesn't start
> with alpha on line 1.
>
> PBS_Server: PBS_Server, pbsd_init failed
>
> Now I am reluctant to go and change all of my node names (IP
> aliases) since everything else about my cluster is finally working
> well and so I have been trying to find out why pbsd_init will not
> accept host names that start with numbers. Also, I would hate to go
> and change this if it is not the problem.
>
> Does anyone know if I might be able to edit the setup files
> associated with pbsd_init to get this to work (or any other ways to
> do this)?
In general, I wouldn't use a digit as the first character, as
outlined here:
http://rfc.net/rfc1178.html page 4.
Some programs might simply check the first character to decide
whether it's a hostname or a TCP/IP address. Thinking long-term, and
about additional software in your cluster (maybe even parallel apps), I
would suggest changing the names of the machines.
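As a rough illustration of that kind of first-character check, here is a
minimal Python sketch. The function names are made up for this example,
and it only mirrors the "doesn't start with alpha" message quoted above;
it is not TORQUE's actual code. It could be used to spot problem names
before putting them into the nodes file:

```python
def starts_with_alpha(name: str) -> bool:
    """Return True if the name begins with an ASCII letter."""
    # Hypothetical helper: an empty name or a leading digit fails the check.
    return bool(name) and name[0].isascii() and name[0].isalpha()


def find_bad_node_names(names):
    """Return the names that a 'must start with alpha' check would reject."""
    return [n for n in names if not starts_with_alpha(n)]


if __name__ == "__main__":
    # Sample names taken from the thread, plus a conventional one.
    candidates = ["1of12", "2of12", "n2of12", "node01"]
    for bad in find_bad_node_names(candidates):
        print(f"{bad}: does not start with a letter")
```

Names such as n2of12 or node01 pass this check, while 1of12 and 2of12
do not.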
-- Reuti
BTW: Torque has its own mailing list at: http://www.clusterresources.com
> Thanks,
>
> Lance
>
> --
> Lance S. Jacobsen, Ph.D.
> President
> GoHypersonic Incorporated
> 714 E. Monument Ave., Suite 201
> Dayton, OH 45402-1382
> Tel: 937-531-6678
> Fax: 937-531-6679
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf