gaussian98/rh 6.2 trouble

Wed Mar 7 12:54:30 PST 2001

Please reply to chemistry at ccl.net as well as original requesters,
not to me.

------------------forwarded message---------------------------------------

Subject: 
         No Subject Given By The Author
     To: 
         chemistry at ccl.net

Dear CCL members,

We have two clusters of Intel-based machines.

Cluster 1 
 * Intel Pentium III 600 MHz CPU with fan
 * ASUS P3B-F motherboard
 * 512MB SDRAM PC100 (2x256MB), CAS 3, low profile
 * 22GB IBM IDE HDD 7200RPM
 * 4MB AGP ATI Display Card
Cluster 2
 * (1) Intel Pentium III 800MHz CPU with fan, 256K cache
 * Intel L440GX+ dual Pentium III motherboard
 * SCSI, LAN and VGA on board
 * 1024MB SDRAM PC100 (4x256MB), ECC registered
 * 40GB IBM IDE HDD 7200RPM

For several months we ran Gaussian 98 in parallel with Linda on
cluster 1 under Red Hat 6.1, and everything appeared to work
perfectly. Cluster 2 came with Red Hat 6.2. At first everything
seemed fine on it, so we upgraded cluster 1 to 6.2 as well. Under
6.2, Gaussian jobs on cluster 1 fail with a relatively short mean
time between failures. We have now found that cluster 2 also seems
to have problems under 6.2, although its mean time between failures
is much longer.
The failures are of two types:
 1) the calculation appears to run, but all values are garbage
    (they print as "nan");
 2) one of the processes dies. The most common message is
    "process 0 failed to complete".

Before we restore 6.1 on the machines, does anyone know what
causes this problem and how to fix it?

Thanks,

Alessandra

---------------------------------------------------------------------
Alessandra Ricca                 Mail:  NASA Ames Research Center    
Senior Research Scientist               Mail Stop 230-3         
ELORET Corporation                      Moffett Field, CA 94035-1000 
http://www.eloret.com

Ph:  +1-650-604-5410             Email: ricca at pegasus.arc.nasa.gov
Fax: +1-650-604-0350

-= This is automatically added to each message by mailing script =-
CHEMISTRY at ccl.net -- To Everybody  | CHEMISTRY-REQUEST at ccl.net -- To Admins
MAILSERV at ccl.net -- HELP CHEMISTRY or HELP SEARCH
CHEMISTRY-SEARCH at ccl.net -- archive search    |    Gopher: gopher.ccl.net 70
Ftp: ftp.ccl.net  |  WWW: http://www.ccl.net/chemistry/   | Jan: jkl at osc.edu

---------------------------------------------------------------------------------------------------------------

From: 
         Gerardo Andres Cisneros <andres at chem.duke.edu>

19:59

 Subject: 
         CCL:No Subject Given By The Author
     To: 
         Alessandra Ricca <ricca at pegasus.arc.nasa.gov>
     CC: 
         chemistry at ccl.net

Hello

We have a similar problem. We have just built an 8-node PIII cluster
running RH 6.2 (2.2.16-3 kernel), and I am testing Gaussian with Linda
on it.

However, whenever a calculation runs for more than 1 hour, one of the
slave nodes invariably runs out of memory. Gaussian fails to kill its
processes, so a good number of phantom processes accumulate on the
nodes, sitting idle and occupying memory.

I would really appreciate it if you could post a summary of any replies
you receive.

Thanks in advance

Andres

--
G. Andres Cisneros
Department of Chemistry 
Duke University
andres at chem.duke.edu
