[Beowulf] PBS scheduler error
Onur Destanoğlu
odestanoglu at gmail.com
Thu Aug 18 06:16:59 PDT 2005
Hi all,
This is my script file:
#PBS -N firstscp
#PBS -l nodes=1:ppn=2
#PBS -l mem=4mb
#PBS -l walltime=1:00:00
#PBS -V
#PBS -m bea
#PBS -o /home/niyazi/cikislog
cd /home/niyazi
mpirun -np 2 first
This is the configuration of the server and queue:
# Create queues and set their attributes.
#
#
# Create and define queue startup
#
create queue startup
set queue startup queue_type = Execution
set queue startup acl_user_enable = True
set queue startup acl_users = niyazi@bee00.bee-hive
set queue startup resources_default.mem = 400mb
set queue startup resources_default.ncpus = 2
set queue startup enabled = True
set queue startup started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = root@bee00.bee-hive
set server operators = niyazi@bee00.bee-hive
set server operators += root@bee00.bee-hive
set server default_queue = startup
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.mem = 9gb
set server resources_available.ncpus = 12
set server resources_max.mem = 9gb
set server resources_max.ncpus = 12
set server scheduler_iteration = 600
set server node_ping_rate = 300
set server node_check_rate = 150
set server tcp_timeout = 6
set server default_node = bee01
set server node_pack = True
set server job_stat_rate = 30
These are the three mails that arrived after I submitted the job
(qsub firstscp):
1.
PBS Job Id: 31.bee00.bee-hive
Job Name: firstscp
Begun execution
2.
PBS Job Id: 31.bee00.bee-hive
Job Name: firstscp
Execution terminated
Exit_status=215
resources_used.cput=00:00:00
resources_used.mem=0kb
resources_used.vmem=0kb
resources_used.walltime=00:00:00
3.
PBS Job Id: 31.bee00.bee-hive
Job Name: firstscp
File stage in failed, see below.
Job will be retried later, please investigate and correct problem.
Post job file processing error; job 31.bee00.bee-hive on host bee01/1+bee01/0
Unable to copy file 31.bee00.be.OU to bee00.bee-hive:/home/niyazi/cikislog
>>> error from copy
rcmd: getaddrinfo: Temporary failure in name resolution
>>> end error output
Output retained on that host in: /var/spool/torque/undelivered/31.bee00.be.OU
Unable to copy file 31.bee00.be.ER to bee00.bee-hive:/home/niyazi/firstscp.e31
>>> error from copy
rcmd: getaddrinfo: Temporary failure in name resolution
>>> end error output
Output retained on that host in: /var/spool/torque/undelivered/31.bee00.be.ER
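The rcmd error above is a name lookup failure on the execution host (bee01), so one thing worth checking there is whether bee00.bee-hive can be resolved at all. Below is a minimal sketch, assuming the cluster relies on /etc/hosts rather than DNS (an assumption; nothing above says which is used). host_in_hosts_file is a hypothetical helper written for this check, not a PBS or TORQUE command.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS/TORQUE): succeed if NAME appears as a
# host name or alias in a hosts-format file (default: /etc/hosts).
# Usage: host_in_hosts_file NAME [FILE]
host_in_hosts_file() {
    name=$1
    file=${2:-/etc/hosts}
    # Drop comments, then compare NAME against every field after the address.
    awk -v n="$name" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Example, to be run on the execution host (bee01):
#   host_in_hosts_file bee00.bee-hive || echo "bee00.bee-hive not in /etc/hosts"
```

If the name is supposed to come from DNS or another source instead, running `getent hosts bee00.bee-hive` on bee01 goes through the system's configured name service lookup and is probably the more telling check.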
So why can't my pretty system finish any job that I submit?