I added the mailing list back to this, since you did not hit reply-all and I have been the only one getting the replies. I think that is not fair, and you should be allowed to contact the manufacturer directly. I did that with Corsair because of some faulty RAM, and I am RMAing the paired set that I have back to them. In all honesty, I would contact the manufacturer and bypass the vendor altogether.<br>
<br><div class="gmail_quote">On Fri, Jan 16, 2009 at 10:11 PM, Francesco Pietra <span dir="ltr"><<a href="mailto:chiendarret@gmail.com">chiendarret@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
To conclude, as it will be uninteresting to subscribers from here on,<br>
in Europe the customer can only contact the vendor of the Supermicro<br>
product. That gave no useful hint and the vendor does not answer any<br>
more. I asked which kind of test he wants to have in order to accept<br>
the mainboard for repair and he did not answer. Therefore, it could be<br>
a waste of time replacing the CPU (I have a spare one) unless it is<br>
just the CPU that is faulty, which (I believe) is unlikely. If I prove that<br>
the CPU was not faulty, I could inform Beowulf and some friends here<br>
around about that discovery, or start a legal international action.<br>
Therefore, unless the CPU can be fully tested by software (and if<br>
faulty be replaced), I will do nothing other than look for another<br>
mainboard and assemble a new machine, this time for 16 logical<br>
processors. The more I have, the faster the work goes. I understand that<br>
suggestions about the brand (obviously Supermicro is ruled out) can't<br>
be expected here.<br>
Thanks for all<br>
<font color="#888888">francesco<br>
</font><div><div class="Wj3C7c"><br>
On Fri, Jan 16, 2009 at 8:10 PM, Jon Aquilina <<a href="mailto:eagles051387@gmail.com">eagles051387@gmail.com</a>> wrote:<br>
> In that case you need to contact them by phone and request an RMA.<br>
><br>
> On Fri, Jan 16, 2009 at 3:48 PM, Francesco Pietra <<a href="mailto:chiendarret@gmail.com">chiendarret@gmail.com</a>><br>
> wrote:<br>
>><br>
>> I already tried that. The modules from the bad bank are OK on another<br>
>> motherboard. Vice versa, good modules from another mainboard do not work<br>
>> in the bad bank.<br>
>><br>
>> I am no system expert, just a chemist, but I can only figure that the<br>
>> memory controller of the CPU is damaged. Otherwise the fault has<br>
>> arisen in the motherboard (voltage controller or something else).<br>
>><br>
>> francesco<br>
>><br>
>> On Fri, Jan 16, 2009 at 10:10 AM, Jon Aquilina <<a href="mailto:eagles051387@gmail.com">eagles051387@gmail.com</a>><br>
>> wrote:<br>
>> > I don't know about another type of motherboard, but do you have another<br>
>> > stick of RAM you can try in those sockets instead? If so, it could be<br>
>> > that you just have bad RAM.<br>
>> ><br>
>> > On Fri, Jan 16, 2009 at 9:46 AM, Francesco Pietra<br>
>> > <<a href="mailto:chiendarret@gmail.com">chiendarret@gmail.com</a>><br>
>> > wrote:<br>
>> >><br>
>> >> Hi:<br>
>> >> Running memtest86+ v. 2.11 was the first test I carried out, repeatedly<br>
>> >> and until completion. It did not detect the modules in the faulty bank<br>
>> >> and did not show errors for the remaining RAM (18GB). The 6GB of modules<br>
>> >> from the faulty bank are otherwise OK. I would like to test via software<br>
>> >> the memory controller of the CPU at the faulty bank, which I believe is<br>
>> >> the last chance that the mainboard is not damaged. All CPUs have<br>
>> >> correct HyperTransport, and I have replaced two 1GB modules with 2GB<br>
>> >> modules. Still, the 20GB fall short for some of my calculations.<br>
>> >><br>
>> >> As the Supermicro mainboard is only 8 months old (during which period<br>
>> >> it managed all 24GB RAM), I expected Supermicro Europe to take<br>
>> >> action in some way. They simply stopped answering after having<br>
>> >> suggested something totally unhelpful.<br>
>> >><br>
>> >> Therefore, in assembling a new 4 quad-core UMA system, I am looking<br>
>> >> for another brand of mainboards. Suggestions?<br>
>> >><br>
>> >> francesco<br>
>> >><br>
>> >> On Thu, Jan 15, 2009 at 10:21 PM, Jon Aquilina <<a href="mailto:eagles051387@gmail.com">eagles051387@gmail.com</a>><br>
>> >> wrote:<br>
>> >> > Try running memtest86+; it's a CD that you boot from that tests the<br>
>> >> > memory. Leave it running for a few hours to find out whether it is<br>
>> >> > the RAM or the sockets. I am not sure how to test the CPU.<br>
>> >> ><br>
>> >> > On Tue, Jan 13, 2009 at 10:26 AM, Francesco Pietra<br>
>> >> > <<a href="mailto:francesco.pietra@accademialucchese.it">francesco.pietra@accademialucchese.it</a>> wrote:<br>
>> >> >><br>
>> >> >> Hi:<br>
>> >> >><br>
>> >> >> I am posting here from a suggestion on the Debian amd64 site. My<br>
>> >> >> original posting to the mainboard factory/vendor in Europe only<br>
>> >> >> resulted in unhelpful suggestions, and they did not answer any<br>
>> >> >> more.<br>
>> >> >><br>
>> >> >> My question is directed to users familiar with multisocket<br>
>> >> >> UMA-type mainboards based on dual-core AMD Opteron 875 CPUs. My<br>
>> >> >> own is a Supermicro H8QC8 with the nVidia CK804 and AMD 8132<br>
>> >> >> chipsets, driven by Debian Linux amd64 lenny.<br>
>> >> >><br>
>> >> >> One of the CPUs has suddenly lost access to its 4-slot memory bank<br>
>> >> >> (I shut down the machine in an orderly way; the problem arose on the<br>
>> >> >> next boot of Linux). Still, the CPU cores are OK, the HyperTransport<br>
>> >> >> links are fully working, and parallelization with both Amber 10 and<br>
>> >> >> NWChem 5.1 works fully, but one of the CPUs must be slower, having to<br>
>> >> >> borrow memory from the other banks. The hardware status, after a<br>
>> >> >> period of complete darkness, is described in the attached<br>
>> >> >> lshw_deb64_7Jan2009.txt.<br>
>> >> >><br>
>> >> >> As each bank of Kingston DDR1 is filled 2+2+1+1 GB, I identified the<br>
>> >> >> faulty bank, removed all modules from there, and replaced the 1+1 GB<br>
>> >> >> modules at another bank with 2+2 GB from the faulty bank, so that now<br>
>> >> >> the computer is at 20GB. The situation is described in the attached<br>
>> >> >> lshw_deb64_lessCPU2_scrambling1G_2G_CPU4_7Jan2009.txt. Actually, the<br>
>> >> >> identification of the CPU (CPU2) related to the faulty memory bank is<br>
>> >> >> uncertain: I just took the CPU nearest to the faulty bank. The<br>
>> >> >> manual is not helpful in this regard.<br>
>> >> >><br>
>> >> >> I understand that, in order to rule out non-mainboard causes, I should<br>
>> >> >> be certain that a CPU has not lost memory control. However, replacing<br>
>> >> >> (I have one spare second-hand CPU) or swapping the CPUs is quite<br>
>> >> >> troublesome, and risky, in my context (there is very little space<br>
>> >> >> around the mainboard in the rack that I engineered to accept the<br>
>> >> >> mainboard). Ventilation is excellent, however.<br>
>> >> >><br>
>> >> >> Therefore, is there any software way to check whether the CPUs are<br>
>> >> >> fully in order, including the memory controller? lshw and other<br>
>> >> >> software provided only partial help in my hands.<br>
>> >> >><br>
>> >> >> Also any other suggestion would be greatly appreciated.<br>
>> >> >><br>
>> >> >> Thanks for your kind attention<br>
>> >> >><br>
>> >> >> francesco pietra<br>
>> >> >> _______________________________________________<br>
>> >> >> Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org">Beowulf@beowulf.org</a><br>
>> >> >> To change your subscription (digest mode or unsubscribe) visit<br>
>> >> >> <a href="http://www.beowulf.org/mailman/listinfo/beowulf" target="_blank">http://www.beowulf.org/mailman/listinfo/beowulf</a><br>
>> >> ><br>
>> >> ><br>
>> >> ><br>
>> >> > --<br>
>> >> > Jonathan Aquilina<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>Jonathan Aquilina<br>