[Beowulf] shared memory error

Jörg Saßmannshausen j.sassmannshausen at ucl.ac.uk
Mon Apr 18 15:31:27 PDT 2016


Hi all,

sorry for the lack of reply, but I have done some testing and now have some 
updates.

I am using GAMESS version 5 DEC 2014 (R1) compiled with gfortran 4.9.2 on 
Debian Linux Jessie. ATLAS is used for BLAS/LAPACK. All the test jobs have 
passed.

What I have noticed is that the problem is not really reproducible. The same 
input file runs well on my machine at home (same GAMESS version but 
gfortran 4.7.2 and Debian Wheezy) but not on the machines at work. To make 
things more interesting:
- it might run perfectly OK on one machine but not on another. They are 
identical nodes with an identical OS; all the nodes were installed 
from one master image.
- it might start and generate the error very soon
- it might run for ages and suddenly generate the error
- the binary from my machine at home also generates the error on the machines at work

- I am lost in the mist. :-)

I cannot see a pattern here. I am still wondering whether my shared memory 
settings are correct, as the only difference I can see between my 
machine at home (48 GB of RAM) and the machines at work (64 GB of RAM) is the 
amount of memory. Having said that, as I have less RAM at home and it works 
there, I would have thought that settings which are OK for less RAM should 
work with more RAM as well.
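
As a rough sanity check, the sysctl values quoted further down can be
converted into bytes and put next to the memory request. This is only a
sketch under stated assumptions: a 4096-byte page size (check with
`getconf PAGE_SIZE`) and 8 bytes per Fortran word. The function names are
made up for illustration. Whether DDI actually places the 1550000000-word
request into one System V segment is not confirmed by anything in this
thread, so the comparison is a hint, not a diagnosis.

```python
def shm_limits_in_bytes(shmmax, shmall, page_size):
    """Express the System V shared memory limits in bytes.

    kernel.shmmax is already in bytes (largest single segment).
    kernel.shmall is counted in pages (system-wide total).
    """
    return {
        "max_segment_bytes": shmmax,
        "total_bytes": shmall * page_size,
    }


def words_to_bytes(words, bytes_per_word=8):
    """Convert a word count to bytes (assumes 8-byte words)."""
    return words * bytes_per_word


def read_live_limits():
    """Read the current limits from /proc; returns None if unavailable."""
    values = {}
    for name in ("shmmax", "shmall", "shmmni"):
        try:
            with open("/proc/sys/kernel/" + name) as fh:
                values[name] = int(fh.read().split()[0])
        except (OSError, ValueError, IndexError):
            return None
    return values


if __name__ == "__main__":
    page_size = 4096  # assumption, not taken from the nodes themselves
    limits = shm_limits_in_bytes(6923000000, 25165824, page_size)
    request = words_to_bytes(1550000000)

    print("largest single segment:", limits["max_segment_bytes"], "bytes")
    print("system-wide total:     ", limits["total_bytes"], "bytes")
    print("memory request:        ", request, "bytes")
    print("request fits in one segment:",
          request <= limits["max_segment_bytes"])
    print("live values on this host:", read_live_limits())
```

Running the same script on a working and a failing node makes it easy to
confirm that the live values really are identical on both.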

Unfortunately I never got a reply from the GAMESS group, which usually means 
nobody knows the answer. 

Any ideas?

All the best from a cold London

Jörg




On Tuesday 05 April 2016 Rafael R. Pappalardo wrote:
> Could you share with us the input file(s)? Which version of GAMESS-US?
> 
> On Monday, 4 April 2016 22:29:11 (CEST) Jörg Saßmannshausen wrote:
> > Dear all,
> > 
> > I was wondering whether somebody might be able to shed some light on this
> > problem I am having with a chemistry code (GAMESS-US):
> > 
> > DDI Process 15: semop return an error performing 1 operation(s) on semid
> > 98307.
> > semop errno=EINVAL.
> > 
> > This sometimes happens when I need quite a bit of memory for the Fortran
> > code (1550000000 words). Originally I thought it had to do with the
> > hardware I am running it on, but I have since seen it all over the place,
> > i.e. on some older Opterons and on some newer Ivy and Haswell CPUs.
> > 
> > It is not quite reproducible, unfortunately. A run might work ok for a
> > few days and then the problem kicks in and the logfile explodes from
> > around 14 MB to 17 GB, or it might just work.
> > 
> > Some system information: I am running Debian Jessie with gcc / gfortran
> > version 4.9.2-10. The nodes have 64 GB of RAM and 16 or 20 cores. As the
> > default shared memory settings in Linux are not suitable for GAMESS
> > (there is a note in the documentation), I am using these settings on the
> > 64 GB RAM machines:
> > 
> > kernel.shmmax = 6923000000
> > kernel.shmall = 25165824
> > kernel.shmmni = 32768
> > 
> > I have the feeling that the problem lies buried in these settings, but my
> > knowledge here is not sufficient to solve it. Could somebody
> > point me in the right direction?
> > 
> > All the best from London
> > 
> > Jörg


-- 
*************************************************************
Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
20 Gordon Street
London
WC1H 0AJ 

email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html