[Beowulf] Any Gaussian users out there?

Rafael R. Pappalardo rafapa at us.es
Tue Jan 9 00:33:42 PST 2007

On Monday 08 January 2007 04:49, Joe Landman wrote:
> I found a neat ... feature ... of Linux while getting g03 running in SMP
> on cluster nodes.  Long story, but the folks I am doing this for don't
> have/want to use Linda.  They asked us to help them get g03 operational
> in SMP parallel.  This wasn't painful.  Have it integrated into SGE and
> our SICE interface now as well.
> Basic idea is that we are getting a kernel exception in the VFS layer
> only when running with 2 or more CPUs on an SMP node.  Shows up only on
> SuSE 9.3 nodes.  The other nodes are RHEL 3 based (2.4 kernel, but hey,
> it's really stable).
> I don't want to post a nasty-looking trap here.
> The problem occurs with both xfs and jfs.  Haven't had the chance to try
> ext3 yet, though if the issue is in the vfs layer, I can't see how
> changing the underlying block device is going to alter the layers (VFS)
> above it.
> The net effect of this is that it runs great on the 2.4 based machines,
> but gets SIGKILLs when running on the 2.6 based SuSE 9.3 machines.
> Looks like the app is tickling the OS bug.  I can repeatably cause this
> trap, though it seems to occur at "random" places, well, not really.
> The way Gaussian runs, it has "links" which are binary modules which
> execute a particular portion of the calculation (it's pretty neat
> really).  Each link is read in from the disk.  This VFS bug gets
> triggered regardless of local or remote FS.
> Any Gaussian users out there see that?  Does a kernel upgrade fix it?
> Inquiring minds want to know ...

I don't know if it's thread-related, but sometimes setting
LD_ASSUME_KERNEL to 2.4.1 in the environment solves this kind of problem.
There are other possible values; you can have a look at:
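
To make the LD_ASSUME_KERNEL suggestion concrete, here is a minimal sketch of
launching a job with the variable set only for the child process, so the
rest of the session is unaffected. The helper names are made up for this
example, and the g03 command line in the trailing comment is an assumption,
not something taken from this thread. Whether the variable has any effect
depends on the glibc and thread libraries installed on the node.

```python
import os
import subprocess


def env_with_assumed_kernel(version="2.4.1", base_env=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL set.

    The input mapping is not modified; os.environ is used when
    base_env is None.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_ASSUME_KERNEL"] = version
    return env


def run_with_assumed_kernel(argv, version="2.4.1"):
    """Run argv with LD_ASSUME_KERNEL set for the child process only."""
    return subprocess.run(argv, env=env_with_assumed_kernel(version), check=False)


# Hypothetical usage (the command line is an assumption):
# result = run_with_assumed_kernel(["g03", "input.com"])
# print(result.returncode)
```

Setting the variable per job rather than in a login script makes it easy to
compare runs with and without it on the same node.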

Best regards,

Dr. Rafael R. Pappalardo
Dept. Physical Chemistry, Univ. de Sevilla (Spain)
e-mail: rafapa at us.es
