Memory selection

Art Edwards edwardsa at plk.af.mil
Thu Jan 16 10:44:51 PST 2003


I tried to look on the AMD site, but the links to recommended
motherboards gave errors. I have purchased K7VTA3 motherboards for AMD
XP-2100 processors. I'm having MAJOR difficulties. The system boots
(Debian Linux with either the 2.2.20 or the 2.4.19 kernel) and
everything looks fine. However, when we run large jobs, the systems
crash.

Any insight would be welcome, as we have tried many things.

Art Edwards

On Thu, Jan 16, 2003 at 11:59:20AM -0500, Robert G. Brown wrote:
> On Thu, 16 Jan 2003, Dave Lane wrote:
> 
> > Hi all,
> > 
> > I'm going to be building an AMD MP-based beowulf system this spring and have 
> > a question or two about memory selection for the nodes.
> > 
> > First question is which type of memory to choose: ECC, non-ECC, Registered 
> > or not. My understanding is that registered memory is slower, but allows for 
> > more capacity (more chips) on a DIMM. This should not be an issue for me, 
> > since I expect that 512M in each node will be enough.
> 
> Read AMD's webpage(s) for their memory recommendations and follow them.
> AMD MPs in my experience are very sensitive, period, to just about
> everything in their engineering recommendations.  Use an "approved"
> power supply, approved memory, approved motherboard.
> 
> My recollection (without digging out a motherboard manual or rechecking
> their website myself) is that the recent dual AMDs all require
> registered ECC, and pretty high quality recc at that.  As always YMMV
> and somebody will probably chime in with how they tried other memory
> types and it worked, but our luck in that regard has not been good.
> 
> There are still plenty of memory vendors that make DIMMs that meet AMD's
> specs.  ECC runs a bit ($50?) above non-ECC for 512 MB PC2100 DIMMs;
> registered ECC costs about $10 more than that, or in the ballpark of
> $200 for 512 MB (probably less if you pricewatch for bleeding edge lows
> -- these are OTC retail). 
> 
> Quite a bit more than SDRAM...
> 
> > Second question: if memory errors occur with ECC memory, does Linux know 
> > about them and report problems in the logs? (This does occur on Sun Solaris 
> > systems.)
> 
> I think this is a FAQ -- do a search on the list archives, as I think
> Don Becker (?) may have answered it two or three times now, and it has
> been discussed fairly extensively. ;-)
> 
> If I recall the discussion correctly, some listvolken swear by ECC and
> run code on systems where they see memory errors turn up.  Others use
> any-old memory and don't see the errors, but neither do they see
> overwhelming evidence that their systems are constantly becoming
> corrupted.  But I could be misremembering.
> 
>    rgb
> 
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

-- 
Art Edwards
Senior Research Physicist
Air Force Research Laboratory
Electronics Foundations Branch
KAFB, New Mexico

(505) 853-6042 (v)
(505) 846-2290 (f)


