BEOWULF cluster hangs

Josip Loncaric josip at icase.edu
Thu Sep 26 13:04:13 PDT 2002


Regarding VM/SMP/IDE issues in 2.4 kernels:

We still see some VM problems on SMP machines running memory-intensive jobs,
even with the latest Red Hat kernel, 2.4.18-10smp.  Single-CPU machines running
2.4.18-10 are generally stable (but see below).  Also, support for the
ServerWorks chipset is worse in 2.4 kernels than in 2.2 kernels, resulting in
degraded IDE performance (no UDMA) and outright crashes when the kernel
detects that the OSB4 is in an "impossible state".

Sincerely,
Josip

P.S.  "Optimistic memory allocation" in 2.4 kernels can misbehave.  A user
application typically gets no indication of a memory shortage when it asks for
memory, but when it tries to use the allocated memory, the application (or
another process) can be terminated without warning by the kernel's
out-of-memory (OOM) killer.  Given this design, I would not rely on any
application staying up under heavy memory demand.  Moreover, while this at
least seems to work as designed on uniprocessor machines, our experience is
that when swap is enabled on SMP machines, even the OOM killer often cannot
prevent system crashes during OOM conditions (the machine crashes trying to
find a free memory page).
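
To illustrate the two-step behavior described in the P.S. above, here is a
minimal sketch in Python (not something taken from our cluster setup).  It
assumes a Linux system; the function names reserve() and touch() and the
16 MB size are hypothetical choices for illustration only.  The request
for memory normally succeeds right away, and physical pages are only
committed when each page is first written, which is the point where a real
shortage would show up.

```python
import mmap

PAGE = mmap.PAGESIZE  # page size reported by the running system


def reserve(size_bytes):
    """Request anonymous private memory (hypothetical helper).

    With optimistic allocation this call usually succeeds even if the
    physical memory is not actually available yet.
    """
    return mmap.mmap(-1, size_bytes,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)


def touch(region):
    """Write one byte to every page so the kernel must back it (hypothetical helper).

    Returns the number of pages written.  Under real memory pressure,
    this loop is where a process could be killed, not in reserve().
    """
    touched = 0
    for offset in range(0, len(region), PAGE):
        region[offset] = 1
        touched += 1
    return touched


if __name__ == "__main__":
    size = 16 * 1024 * 1024  # deliberately small so the sketch is safe to run
    region = reserve(size)
    print("reserved", size, "bytes")
    print("touched", touch(region), "pages")
    region.close()
```

The size is kept small on purpose.  Raising it toward the machine's RAM plus
swap would be the way to provoke the OOM behavior, but that should only be
tried on a node you can afford to lose.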


-- 
Dr. Josip Loncaric, Research Fellow               mailto:josip at icase.edu
ICASE, Mail Stop 132C           PGP key at http://www.icase.edu./~josip/
NASA Langley Research Center             mailto:j.loncaric at larc.nasa.gov
Hampton, VA 23681-2199, USA    Tel. +1 757 864-2192  Fax +1 757 864-6134


