gaussian98/rh 6.2 trouble
Eugene.Leitl at lrz.uni-muenchen.de
Wed Mar 7 12:54:30 PST 2001
Please reply to chemistry at ccl.net as well as original requesters,
not to me.
------------------forwarded message---------------------------------------
Subject:
No Subject Given By The Author
To:
chemistry at ccl.net
Dear CCL members,
We have two clusters of Intel-based machines.
Cluster 1
* Intel Pentium III 600 MHz CPU with fan
* ASUS P3B-F Motherboard
* 512MB SDRAM PC100MHz (2x256MB), Cas III Type and Low Profile.
* 22GB IBM IDE HDD 7200RPM
* 4MB AGP ATI Display Card
Cluster 2
* (1)Intel Pentium III 800MHz CPU with fan, 256K cache
* Intel L440GX+ Dual Pentium III Motherboard
* SCSI, Lan and VGA on Board
* 1024MB SDRAM PC100MHz (4x256MB), ECC Registered
* 40GB IBM IDE HDD 7200RPM
We ran Gaussian 98 in parallel using Linda on cluster 1
under Red Hat 6.1 for several months, and everything appeared
to function perfectly. Cluster 2 came with Red Hat 6.2, and
initially everything seemed to be fine, so cluster 1 was upgraded
to 6.2. Under 6.2, Gaussian jobs fail on cluster 1 with
a relatively short mean time between failures. We have
now discovered that cluster 2 also seems to have problems
under 6.2, but its mean time between failures is much
longer.
The failures come in two types:
1) the calculation appears to run, but the values are all garbage
   (the values print as "nan");
2) one of the processes dies; most commonly the message is:
   process 0 failed to complete.
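The first failure type is easy to miss, because the job appears to finish normally. A minimal sketch of a check that flags it early is given here, assuming the garbage values show up in the output file as standalone "nan" tokens. The function names and the regular expression are illustrative and are not part of Gaussian or Linda.

```python
import re

# Matches a standalone "nan" token (any case, optionally signed).
# Words that merely start with "nan", such as "nanometer", do not match.
NAN_TOKEN = re.compile(r"(?<![A-Za-z0-9_])[-+]?nan(?![A-Za-z0-9_])", re.IGNORECASE)


def find_nan_lines(lines):
    """Return (line_number, text) pairs for lines that contain a NaN token."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if NAN_TOKEN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan one output file; 'path' is whatever file the job writes to."""
    with open(path, errors="replace") as handle:
        return find_nan_lines(handle)
```

Calling scan_log on a job's output file (the file name is up to the site) returns an empty list for a clean run, so a non-empty result can be used to stop a batch before more time is spent on garbage values.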
Before restoring 6.1 to the machine, does anyone know
what this problem is and how to fix it?
Thanks,
Alessandra
---------------------------------------------------------------------
Alessandra Ricca Mail: NASA Ames Research Center
Senior Research Scientist Mail Stop 230-3
ELORET Corporation Moffett Field, CA 94035-1000
http://www.eloret.com
Ph: +1-650-604-5410 Email: ricca at pegasus.arc.nasa.gov
Fax: +1-650-604-0350
-= This is automatically added to each message by mailing script =-
CHEMISTRY at ccl.net -- To Everybody | CHEMISTRY-REQUEST at ccl.net -- To Admins
MAILSERV at ccl.net -- HELP CHEMISTRY or HELP SEARCH
CHEMISTRY-SEARCH at ccl.net -- archive search | Gopher: gopher.ccl.net 70
Ftp: ftp.ccl.net | WWW: http://www.ccl.net/chemistry/ | Jan: jkl at osc.edu
---------------------------------------------------------------------------------------------------------------
From:
Gerardo Andres Cisneros <andres at chem.duke.edu>
19:59
Subject:
CCL:No Subject Given By The Author
To:
Alessandra Ricca <ricca at pegasus.arc.nasa.gov>
CC:
chemistry at ccl.net
Hello
We have a similar problem. We've just built an 8-node PIII cluster running
RH 6.2 (2.2.16-3 kernel), and I'm testing Gaussian with Linda on this
cluster.
However, if the calculation takes more than 1 hour, one of the slave
nodes invariably runs out of memory: Gaussian fails to kill its
processes, so a good number of phantom processes are left on the nodes,
just sitting there occupying memory.
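One way to confirm the phantom processes is to list them on each node after a job ends. The sketch below is only an illustration: it assumes the nodes are reachable through a remote shell such as rsh and that "ps -eo pid,rss,comm" is available there. The name fragment passed in is a placeholder for whatever name the leftover processes show under in ps.

```python
import subprocess


def parse_ps(ps_text, name_fragment):
    """Return (pid, rss_kb, command) for commands containing name_fragment."""
    matches = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # blank or malformed line
        pid, rss, command = fields
        if name_fragment in command:
            matches.append((int(pid), int(rss), command.strip()))
    return matches


def leftover_on_node(node, name_fragment, remote_shell="rsh"):
    """Run ps on one node through a remote shell and filter its output."""
    result = subprocess.run(
        [remote_shell, node, "ps", "-eo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout, name_fragment)
```

Summing the rss column of the returned entries gives a rough idea of how much memory the leftovers hold on a node. The parsing is kept separate from the remote call so it can be checked against saved ps output.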
I would really appreciate it if you could post a summary of any replies
you might get.
Thanks in advance
Andres
--
G. Andres Cisneros
Department of Chemistry
Duke University
andres at chem.duke.edu