[Beowulf] NASTRAN on cluster
Joe Landman landman at scalableinformatics.com
Tue Apr 12 06:58:27 PDT 2005
Mark Hahn wrote:
>>Nastran doesn't really want to run more than one job (MPI rank) per
>>node.
>
> I bet that isn't true on dual-opterons.

NASTRAN loves memory bandwidth, and hates sharing it. A properly built dual, which does not have node-interleaving on, does pretty well.

>>The distro can/will have a significant impact on allocatable memory.
>>Nastran uses brk(2) to allocate memory, so the TASK_UNMAPPED_BASE is
>>significant.
>
> can nastran run on amd64? it might even run nicely as a ia32 process

Yes. This was just released.

> on amd64. just for curiosity:

It does.

[...]

>>I can't comment on SATA, but PATA disks are a really bad choice, as they
>>require too much effort from the CPU to drive them--SCSI is MUCH
>>preferred in that case.
>
> this is one of the longest-lived fallacies I've ever personally experienced.
> it was true 10+ years ago when PIO was the norm for ATA disks. busmastering
> has been the norm for PATA for a long while.

Actually under very intensive I/O load, cheap/crappy IDE controllers flood the CPU with interrupts. Good quality ones do not. David (when he was at MSC) and the rest of the team working on this stuff used machines that had ... issues ... with their IDE interfaces. The SCSI interfaces they used were top notch, most vendors tend to toss in IDE as an afterthought. Very little attention was/is paid to building good controllers. That leads to analyses/statements like this, which are sometimes part of biases, but sometimes not (as in this case, they were not).

Basically, I am saying he is right about the effect (poor PATA experience on NASTRAN on their test systems), though the cause is not because PATA is crappy, but because PATA chip sets are usually quite crappy, and flood the CPU with interrupts. Then again, I think it is arguable that the net effect of a crappy PATA chip set is a crappy PATA experience. Which would tend to re-inforce this viewpoint.

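One way to check whether a disk controller is flooding the CPU with interrupts on a Linux node is to sample /proc/interrupts twice while the job is running and compare the counts. The following is a minimal sketch under stated assumptions, not anything used in the testing described above: the function names and the five-second sampling interval are hypothetical, and which line corresponds to the disk controller (for example one labelled ide0) depends on the machine.

```python
import time


def parse_interrupts(text):
    """Parse the text of /proc/interrupts into {irq_label: (total_count, description)}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Leading numeric fields are per-CPU counts; some rows (e.g. ERR) have fewer.
        for tok in fields[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        desc = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), desc)
    return totals


def interrupt_rates(before, after, seconds):
    """Return {irq_label: (interrupts_per_second, description)} between two snapshots."""
    rates = {}
    for irq, (count, desc) in after.items():
        prev = before.get(irq, (0, desc))[0]
        rates[irq] = ((count - prev) / seconds, desc)
    return rates


def read_snapshot(path="/proc/interrupts"):
    with open(path) as f:
        return parse_interrupts(f.read())


if __name__ == "__main__":
    interval = 5.0  # seconds; arbitrary choice for illustration
    before = read_snapshot()
    time.sleep(interval)
    after = read_snapshot()
    rates = interrupt_rates(before, after, interval)
    # Show the ten busiest interrupt sources over the interval.
    busiest = sorted(rates.items(), key=lambda kv: -kv[1][0])[:10]
    for irq, (rate, desc) in busiest:
        print(f"{irq:>4} {rate:10.1f}/s  {desc}")
```

Run during a heavy scratch I/O phase, a disk controller line that dominates this list would be consistent with the interrupt-flooding explanation; a quiet one would point elsewhere.
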
Note also: I believe that when they tested this, they went in with an open mind, and real test cases. As debugging IO systems wasn't their purpose in life, they probed it as far as they needed, and drew their conclusions.

>
>>As for CPU v. I/O. The factors are (in no order):
>>
>>fp performance
>>memory b/w
>>disk b/w
>>memory size
>>
>>Which of the above dominates the analysis depends on the analysis.
>
> for the reported symptoms (poor scaling when using the second processor),
> the first doesn't fit. memory bw certainly does, and is Intel's main
> weak spot right now. then again, disk bw and memory size could also fit

Note where David works now.

> the symptoms (since they're also resources shared on a dual), but would be
> diagnosable by other means (namely, both would result in low %CPU utilization;
> the latter (thrashing) would be pretty obvious from swap traffic/utilization.)

I have a ... sneaking suspicion ... that David knows what he is talking about w.r.t. NASTRAN.

>
> like I said, I bet the problem is memory bandwidth. mainly because I just don't
> see programs waiting on disk that much anymore - it does happen, but large
> writes these days will stream at 50+ MB/s, and reads are often cached.

No Mark, MSC.NASTRAN is in a collection of programs that you generally call IO monsters. They can and will consume every last MB/s that you can throw at them on the IO channel for good sized problems. You most definitely do not want a single spindle as your scratch space.

At SGI years ago, we were looking to try to provide GB/s sustainable performance for NASTRAN on the file systems. With NASTRAN, you want the widest non-blocking IO channel you can get (widest in terms of MB/s, not physical bit depth). To get this, you need to look at various striping techniques. You really do not want a RAID5, a RAID3, or anything that calculates a parity structure getting in the way of moving data.

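As a back-of-the-envelope illustration of why a single spindle falls short: taking the 50 MB/s streaming figure quoted above and a target on the order of 1 GB/s (treated here as 1000 MB/s), an idealized parity-free stripe needs about 20 disks. The sketch below only does that arithmetic. The function name and the efficiency parameter are hypothetical, and the assumption that striped bandwidth scales linearly with disk count is a simplification, not a measurement.

```python
import math


def spindles_needed(target_mb_s, per_disk_mb_s, efficiency=1.0):
    """Estimate disks needed in a parity-free stripe to reach a target bandwidth.

    Assumes aggregate bandwidth = disks * per_disk_mb_s * efficiency, i.e. ideal
    linear scaling derated by a single fudge factor (an assumption for illustration).
    """
    if per_disk_mb_s <= 0 or not (0 < efficiency <= 1):
        raise ValueError("per_disk_mb_s must be > 0 and efficiency in (0, 1]")
    return math.ceil(target_mb_s / (per_disk_mb_s * efficiency))


if __name__ == "__main__":
    print(spindles_needed(1000, 50))  # 20
```

Real controllers, buses, and file systems will cap the aggregate well before the ideal figure, so the result is best read as a lower bound on the number of spindles.
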
NASTRAN does pound on the memory bandwidth (large linear algebra systems), and on the FPU. I think they have been using the Intel compilers so they get some advantage on Intels (and software disabling of fast SSE paths on Opteron) for the IA32 code. The AMD64 variant code was built with other compilers from what I understand. Here NASTRAN should benefit significantly from a non-segmented memory system (the 64bit AMD64 space is not a segment+offset addressing schema) in terms of memory bandwidth.

> I should mention that if HT is enabled on these duals, the problem could be
> poor HT support in your kernels. (HT creates two virtual processors for
> each physical one. if the scheduler treats HT-virtual processors as real,
> you will get very poor speedup. this would also be diagnosable by simply
> running 'top' during a test.)

HT can help a few codes, but it will not generally help IO bound codes. I have seen it as a win in a very restricted subset of computationally intensive codes, and these are usually home-grown.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
