<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.6000.16788" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN lang=EN>
<P><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=082052818-10032009>I have a small test cluster built on Novell SUSE Linux
Enterprise Server 10.2 that is giving me fits. </SPAN>It seems that every
time the hardware is physically moved <SPAN class=082052818-10032009>(I keep
getting kicked out of the space I'm using), </SPAN>I end up with any number of
different problems. </FONT></FONT></FONT></P>
<P><FONT><FONT><FONT face=Arial><FONT color=#0000ff><FONT size=2>Personally I
suspect some type of hardware issue (this equipment is about 5 years old), but
one of my co-workers isn't so sure<SPAN class=082052818-10032009> hardware is in
play</SPAN>.<SPAN class=082052818-10032009> After one earlier move the RAID
failed to initialize, which I resolved by reseating
the RAID controller card.</SPAN></FONT></FONT></FONT></FONT></FONT></P>
<P><FONT><FONT><FONT face=Arial><FONT color=#0000ff><FONT size=2><SPAN
class=082052818-10032009>This time </SPAN>it appears that the file system and
configuration databases became corrupted after moving the equipment. Several
services aren't starting up <SPAN class=082052818-10032009>(LDAP, DHCP, PBS
to name a few) </SPAN>and YaST2 hangs any time an attempt is made to use it, for
example when adding a printer or <SPAN class=082052818-10032009>software
</SPAN>package. My co-worker feels the issue may be related to the ReiserFS file
system<SPAN class=082052818-10032009> with AMD processors.</SPAN> The
ReiserFS file system was the default presented when I initially installed SLES
so I went with it.</FONT></FONT></FONT></FONT></FONT></P>
<P><FONT face=Arial color=#0000ff size=2>Do you know of any issues with using
the ReiserFS file system on AMD-based systems, or have any other ideas about what I
may be facing?</FONT></P></SPAN></DIV>
<DIV> </DIV>
<DIV align=left>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt" align=left><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><STRONG>Steven A.
Herborn</STRONG></SPAN></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><STRONG><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial">U.S. Naval
Academy</SPAN></STRONG></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><STRONG>Advanced
Research Computing</STRONG></SPAN></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><STRONG>410-293-6480
(Desk)</STRONG></SPAN></P>
<P class=MsoNormal style="MARGIN: 0in 0in 0pt"><SPAN
style="FONT-SIZE: 10pt; COLOR: navy; FONT-FAMILY: Arial"><STRONG>757-418-0505
(Cell)</STRONG></SPAN></P></DIV>
<DIV> </DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> beowulf-bounces@beowulf.org
[mailto:beowulf-bounces@beowulf.org] <B>On Behalf Of </B>gossips
J<BR><B>Sent:</B> Monday, March 09, 2009 5:08 AM<BR><B>To:</B>
beowulf@beowulf.org<BR><B>Subject:</B> [Beowulf] HPCC "intel_mpi"
error<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV><SPAN class=Apple-style-span
style="FONT-SIZE: 16px; FONT-FAMILY: 'times new roman'; -webkit-border-horizontal-spacing: 5px; -webkit-border-vertical-spacing: 5px"><PRE>Hi,
We are using ICR validation.
We are facing the following problem while running the command below:
cluster-check --debug --include_only intel_mpi /root/sample.xml
The problem is:
The output of cluster checker shows us that "intel_mpi" FAILED, whereas
the debug.out file shows that "Hello World" is returned from
all nodes.
I have a 16-node configuration and we are running 8 proc/node.
The same behavior is observed with 1 proc/node, 2 proc/node, and 4 proc/node
as well. I also tried "rdma" and "rdssm" as the DEVICE in the XML file, but no luck.
If anyone can shed some light on this issue, it would be a great help.
<BR></PRE><PRE>Another thing I would like to know is:
</PRE><PRE>Is there a way to specify the "-env RDMA_TRANSLATION_CACHE" option with Intel Cluster Checker?</PRE><PRE>Awaiting your kind response,</PRE><PRE><BR></PRE><PRE>Thanks in advance,</PRE><PRE>Polk.</PRE></SPAN></DIV></BODY></HTML>