[Beowulf] Multisocket mainboard hardware problems

Jon Aquilina eagles051387 at gmail.com
Thu Jan 15 13:21:27 PST 2009


Try running Memtest86+. It is a CD that you boot from, and it tests the
memory. Leave it running for a few hours to check whether the problem is
the RAM or the sockets. I am not sure how to test the CPU.
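
On the software side, one rough check is to ask the running kernel how
much memory it attributes to each NUMA node. On Opteron boards each
socket normally appears as its own node, so a node that reports little
or no memory points at that socket's memory controller or its DIMM
slots. The sketch below is only an illustration, not a diagnostic tool:
it assumes a Linux kernel that exposes /sys/devices/system/node, and the
function names are made up for this example. It cannot tell a bad
controller from a bad slot, and a node with no memory may be missing
from the list altogether.

```python
import glob
import os
import re

# Assumption: the kernel exposes NUMA nodes at the usual sysfs location.
NODE_ROOT = "/sys/devices/system/node"


def parse_node_meminfo(text):
    """Return MemTotal in kB from the text of a nodeN/meminfo file, or None."""
    match = re.search(r"^Node\s+\d+\s+MemTotal:\s+(\d+)\s+kB", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def memory_per_node(root=NODE_ROOT):
    """Map each NUMA node number to its MemTotal in kB (None if unreadable)."""
    totals = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        node = int(os.path.basename(path)[len("node"):])
        try:
            with open(os.path.join(path, "meminfo")) as handle:
                totals[node] = parse_node_meminfo(handle.read())
        except OSError:
            totals[node] = None
    return totals


if __name__ == "__main__":
    for node, kb in sorted(memory_per_node().items()):
        if kb is None:
            print(f"node {node}: meminfo not readable")
        else:
            print(f"node {node}: {kb / 1024 / 1024:.1f} GB")
```

Comparing the per-node figures before and after moving DIMMs between
banks should show whether the missing memory follows the modules or
stays with the socket.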

On Tue, Jan 13, 2009 at 10:26 AM, Francesco Pietra <
francesco.pietra at accademialucchese.it> wrote:

> Hi:
>
> I am posting here from a suggestion on the Debian amd64 site. My
> original posting to the mainboard factory/vendor in Europe only
> resulted in uninteresting suggestions, and they did not answer any
> more.
>
> My question is directed to the attention of users familiar with
> multisocket NUMA-type mainboards based on dual-core AMD Opteron 875
> CPUs. My own is a Supermicro H8QC8 with the nVidia CK804 chipset and
> AMD 8132, running Debian Linux amd64 lenny.
>
> One of the CPUs has suddenly lost access to its 4-slot memory bank (I
> shut down the machine in an orderly way; the problem arose on the next
> Linux boot). Still, the CPU cores are OK, the HyperTransport links are
> fully working, and parallelization for both Amber 10 and NWChem 5.1 is
> fully available, but one of the CPUs must be slower, having to borrow
> memory from the other banks. The hardware status, after a period of
> complete darkness, is described in the attached
> lshw_deb64_7Jan2009.txt.
>
> As each bank of Kingston DDR1 is filled with 2+2+1+1 GB, I identified
> the faulty bank, removed all modules from it, and replaced the 1+1 GB
> modules in another bank with the 2+2 GB modules from the faulty bank,
> so that the computer is now at 20 GB. The situation is described in
> the attached lshw_deb64_lessCPU2_scrambling1G_2G_CPU4_7Jan2009.txt.
> Actually, the identification of the CPU (CPU2) related to the faulty
> memory bank is uncertain: I just took the CPU nearest to the faulty
> bank. The manual is not helpful in this regard.
>
> I understand that, in order to rule out non-mainboard causes, I should
> be certain that a CPU has not lost control of its memory.
> Unfortunately, replacing (I have one spare second-hand CPU) or
> swapping the CPUs is quite troublesome, and risky, in my context
> (there is very little space around the mainboard in the rack that I
> engineered to accept it). Ventilation is excellent, however.
>
> Therefore, is there any software way to check whether the CPUs are
> fully in order, including the memory controller? lshw and other
> software provided only partial help in my hands.
>
> Also any other suggestion would be greatly appreciated.
>
> Thanks for your kind attention
>
> francesco pietra
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>



-- 
Jonathan Aquilina