[Beowulf] NASTRAN on cluster
Joe Landman
landman at scalableinformatics.com
Tue Apr 12 06:58:27 PDT 2005
Mark Hahn wrote:
>>Nastran doesn't really want to run more than one job (MPI rank) per
>>node.
>
> I bet that isn't true on dual-opterons.
NASTRAN loves memory bandwidth, and hates sharing it. A properly built
dual Opteron, with node interleaving turned off, does pretty well.
>>The distro can/will have a significant impact on allocatable memory.
>>Nastran uses brk(2) to allocate memory, so the TASK_UNMAPPED_BASE is
>>significant.
>
> can nastran run on amd64? it might even run nicely as a ia32 process
Yes. This was just released.
> on amd64. just for curiosity:
It does.
[...]
>>I can't comment on SATA, but PATA disks are a really bad choice, as they
>>require too much effort from the CPU to drive them--SCSI is MUCH
>>preferred in that case.
>
> this is one of the longest-lived fallacies I've ever personally experienced.
> it was true 10+ years ago when PIO was the norm for ATA disks. busmastering
> has been the norm for PATA for a long while.
Actually, under very intensive I/O load, cheap/crappy IDE controllers
flood the CPU with interrupts. Good quality ones do not. David (when
he was at MSC) and the rest of the team working on this stuff used
machines that had ... issues ... with their IDE interfaces. The SCSI
interfaces they used were top notch, while most vendors tend to toss in
IDE as an afterthought. Very little attention was/is paid to building
good controllers. That leads to analyses/statements like this, which
are sometimes the product of bias and sometimes not (in this case, they
were not).
Basically, I am saying he is right about the effect (poor PATA
experience with NASTRAN on their test systems), though the cause is not
that PATA itself is crappy, but that PATA chip sets are usually quite
crappy, and flood the CPU with interrupts. Then again, I think it is
arguable that the net effect of a crappy PATA chip set is a crappy PATA
experience, which would tend to reinforce this viewpoint.
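One way to check whether a given controller really is flooding the CPU
is to sample the kernel's interrupt counters before and after a heavy
I/O run. The following is only a rough sketch, assuming a Linux system
that exposes /proc/interrupts in the usual layout (a header row of CPU
names, then one row per IRQ). The function names are made up for
illustration and are not part of any tool mentioned in this thread.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: total count summed over CPUs} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row looks like: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the controller/device description
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each IRQ between two parsed snapshots."""
    return {irq: (after[irq] - before.get(irq, 0)) / seconds for irq in after}


def sample_rates(seconds=5.0, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and return per-IRQ rates (Linux only)."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return interrupt_rates(before, after, seconds)
```

Calling sample_rates() while the job is hammering scratch, and again
while the machine is idle, gives a crude per-IRQ comparison. What rate
counts as "flooding" depends on the hardware, so no threshold is
suggested here.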
Note also: I believe that when they tested this, they went in with an
open mind, and real test cases. As debugging IO systems wasn't their
purpose in life, they probed it as far as they needed, and drew their
conclusions.
>
>>As for CPU v. I/O. The factors are (in no order):
>>
>>fp performance
>>memory b/w
>>disk b/w
>>memory size
>>
>>Which of the above dominates the analysis depends on the analysis.
>
> for the reported symptoms (poor scaling when using the second processor),
> the first doesn't fit. memory bw certainly does, and is Intel's main
> weak spot right now. then again, disk bw and memory size could also fit
Note where David works now.
> the symptoms (since they're also resources shared on a dual), but would be
> diagnosable by other means (namely, both would result in low %CPU utilization;
> the latter (thrashing) would be pretty obvious from swap traffic/utilization.)
I have a ... sneaking suspicion ... that David knows what he is talking
about w.r.t. NASTRAN.
>
> like I said, I bet the problem is memory bandwidth. mainly because I just don't
> see programs waiting on disk that much anymore - it does happen, but large
> writes these days will stream at 50+ MB/s, and reads are often cached.
No Mark, MSC.NASTRAN belongs to the class of programs generally called
IO monsters. They can and will consume every last MB/s that you
can throw at them on the IO channel for good sized problems. You most
definitely do not want a single spindle as your scratch space. At SGI
years ago, we were trying to provide sustained GB/s performance
for NASTRAN on the file systems.
With NASTRAN, you want the widest non-blocking IO channel you can get
(widest in terms of MB/s, not physical bit depth). To get this, you
need to look at various striping techniques. You really do not want a
RAID5, a RAID3, or anything that calculates a parity structure getting
in the way of moving data.
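To put a number on "widest in terms of MB/s" for a candidate scratch
area, a crude sequential-write timing can be enough to compare a single
spindle against a striped volume. This is a minimal sketch, not a real
benchmark and not how NASTRAN itself does I/O: the function name, the
default sizes, and the choice to fsync once at the end are all
assumptions made for illustration.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=256, block_kb=1024):
    """Write total_mb of zeros into `directory`, fsync, and return MB/s."""
    block = b"\0" * (block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb", buffering=0) as f:
            for _ in range(n_blocks):
                f.write(block)
            os.fsync(f.fileno())  # make sure the data has left the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # do not leave the test file on scratch
    written_mb = n_blocks * block_kb / 1024
    return written_mb / elapsed
```

For a meaningful figure, total_mb should be well above the machine's
RAM, otherwise the result mostly reflects caching. Running it against
each candidate scratch directory gives a like-for-like comparison, but
it says nothing about the mixed read/write patterns of a real solve.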
NASTRAN does pound on the memory bandwidth (large linear algebra
systems), and on the FPU. I think they have been using the Intel
compilers for the IA32 code, so they get some advantage on Intel chips
(and the fast SSE paths get disabled in software on Opteron). The AMD64
variant was built with other compilers, from what I understand.
Here NASTRAN should benefit significantly, in terms of memory
bandwidth, from a non-segmented memory system (the 64-bit AMD64 address
space is not a segment+offset addressing scheme).
> I should mention that if HT is enabled on these duals, the problem could be
> poor HT support in your kernels. (HT creates two virtual processors for
> each physical one. if the scheduler treats HT-virtual processors as real,
> you will get very poor speedup. this would also be diagnosable by simply
> running 'top' during a test.)
HT can help a few codes, but it will not generally help IO bound codes.
I have seen it as a win in a very restricted subset of computationally
intensive codes, and these are usually home-grown.
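Before blaming or crediting HT, it helps to confirm whether it is
actually switched on. The sketch below compares the "siblings" and
"cpu cores" fields that x86 Linux kernels report in /proc/cpuinfo. It
assumes a kernel recent enough to expose both fields (older kernels may
not), and the function name is hypothetical.

```python
def ht_enabled(cpuinfo_text):
    """Return True if any CPU reports more siblings than cores,
    False if the fields are present and equal, None if they are missing."""
    result = None
    for block in cpuinfo_text.strip().split("\n\n"):  # one block per logical CPU
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "siblings" in fields and "cpu cores" in fields:
            if int(fields["siblings"]) > int(fields["cpu cores"]):
                return True
            result = False
    return result


def ht_enabled_on_this_host(path="/proc/cpuinfo"):
    """Convenience wrapper for the local machine (Linux only)."""
    with open(path) as f:
        return ht_enabled(f.read())
```

A None result means the kernel did not report the fields, in which case
the BIOS setup screen or the vendor documentation is the fallback.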
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615