DQS drops jobs on SuSE 6.3 cluster

Kris Thielemans kris.thielemans at csc.mrc.ac.uk
Thu Nov 2 03:52:58 PST 2000


Hi,

I'm trying to get DQS running on our cluster of 4 SuSE 6.3 systems. I tried
3 different versions of DQS:
- the RPM package on the original CD
- the RPM package provided on the SuSE website to update it to fix a y2k
problem (version 3.2.7)
- the newest version (3.3.1) from ftp.scri.fsu.edu (compiled from
sources)

All 3 versions have the same problem:
jobs are occasionally dropped from the queue, or are never started at all.

Symptoms:
qsub somejob.sh    -> works ok
qstat -f           -> lists job

(a little bit later)
qstat -f           -> job gone

This happens with the simple dqs.sh example script that they provide for
testing.
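
To pin down when a job vanishes, something like the following could be used. This is only a rough sketch, not part of DQS: watch_job and QSTAT_CMD are made-up names, and it assumes that the text passed as the pattern (for example the script name) shows up in the "qstat -f" listing while the job is queued or running. It polls the listing and prints a timestamp the first time the pattern is no longer there.

```shell
#!/bin/sh
# Sketch: poll the queue listing and report when a job stops being listed.
# QSTAT_CMD defaults to "qstat -f" (the command used above). Whether the
# pattern appears in that listing is an assumption about its output format.

QSTAT_CMD=${QSTAT_CMD:-"qstat -f"}

# watch_job PATTERN [INTERVAL_SECONDS] [MAX_POLLS]
# Returns 0 as soon as PATTERN is absent from the listing,
# 1 if it is still listed after MAX_POLLS polls.
watch_job() {
    pattern=$1
    interval=${2:-5}
    max_polls=${3:-120}
    n=0
    while [ "$n" -lt "$max_polls" ]; do
        # $QSTAT_CMD is left unquoted on purpose so "qstat -f" splits into words
        if ! $QSTAT_CMD 2>/dev/null | grep -q -- "$pattern"; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') '$pattern' no longer listed (poll $n)"
            return 0
        fi
        n=$((n + 1))
        sleep "$interval"
    done
    echo "'$pattern' still listed after $max_polls polls"
    return 1
}
```

After sourcing the file, "qsub somejob.sh" followed by "watch_job somejob 5 120" would check every 5 seconds for up to 10 minutes. A job that finishes normally also leaves the listing, so the timestamp only helps when compared with the expected run time and with whether the job's output files exist.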

There is NO error message in the DQS err_file, and nothing in the log_file.

This problem also occurs when I disable all queues except 1 (on the same
node as the qmaster).


Any ideas?

Thanks,

Kris Thielemans

MRC Cyclotron Unit,
Hammersmith Hospital,
DuCane Rd, London W12 0NN, United Kingdom




