[Beowulf] Re: Problems scaling performance to more than one node, GbE
Tiago Marques a28427 at ua.pt
Mon Feb 16 07:54:50 PST 2009
- Previous message: [Beowulf] Re: Problems scaling performance to more than one node, GbE
- Next message: [Beowulf] Re: Problems scaling performance to more than one node, GbE
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Sat, Feb 14, 2009 at 6:43 PM, David Mathog <mathog at caltech.edu> wrote:
> Tiago Marques <a28427 at ua.pt>
> >
> > I've been trying to get the best performance on a small cluster we have here
> > at University of Aveiro, Portugal, but I've not been able to get most
> > software to scale to more than one node.
> >
> <SNIP>
> >
> > The problem with this setup is that even calculations that take more than 15
> > days don't scale to more than 8 cores, or one node. Usually performance is
> > lower with 16 cores, 12 cores, than with just 8. From what I've been reading,
> > I should be able to scale fine at least up to 16 cores, and 32 for some
> > software.
> >
> <SNIP>
> >
> > I tried with Gromacs to have two nodes using one processor each, to check if
> > 8 cores were stressing the GbE too much, and the performance dropped too
> > much compared with running two CPUs on the same node.
>
> Lots of possibilities here. Most of them are probably coming down to the
> code not being written to make good use of a cluster environment, and/or
> there not being any way to do that (single threaded code with a lot of
> unpredictable branching).
>
> For Gromacs I suggest you ask on that mailing list. My recollection is
> that it was known to scale poorly, but that was a couple of years ago,
> and maybe they have improved it since then. If it doesn't scale you can
> always get more throughput by running one independent job on each of
> your nodes, using local storage to avoid network contention to the file
> server. It may take 15 days to finish a run, but at least you'll have N
> times more work completed. Running N independent jobs will give you at
> least as much throughput as running 1 job on N cores. Admittedly it is
> nice to have the results in 1/Nth the time.
>

Already did that, but there weren't many helpful people on the Gromacs list. They just told me to wait for version 4.0, which I did; it scales better, though still not as well as I hoped.
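David's throughput argument can be written out as a little arithmetic. The sketch below is an illustration added here, not part of the original thread: it assumes a single parallel efficiency E (speedup divided by node count) describes the whole run, the helper names are hypothetical, and the E = 0.6 value is made up for the example. Only the 15-day runtime comes from the thread.

```python
"""Throughput vs. turnaround: N independent jobs vs. one job on N nodes.

Illustrative sketch only. Assumes one parallel-efficiency figure
describes the whole run; helper names are hypothetical.
"""


def throughput_independent(n_nodes: int, t_single: float) -> float:
    """Jobs completed per unit time when every node runs its own job."""
    return n_nodes / t_single


def wall_time_parallel(n_nodes: int, t_single: float, efficiency: float) -> float:
    """Wall-clock time of one job spread over n_nodes.

    efficiency = speedup / n_nodes. Usually efficiency <= 1; superlinear
    values (> 1, e.g. from cache effects) are the only case in which the
    single parallel job beats independent jobs on throughput.
    """
    if efficiency <= 0.0:
        raise ValueError("efficiency must be positive")
    return t_single / (efficiency * n_nodes)


def throughput_parallel(n_nodes: int, t_single: float, efficiency: float) -> float:
    """Jobs completed per unit time when all nodes cooperate on one job."""
    return 1.0 / wall_time_parallel(n_nodes, t_single, efficiency)


if __name__ == "__main__":
    T_SINGLE_DAYS = 15.0  # single-node runtime mentioned in the thread
    EFFICIENCY = 0.6      # hypothetical parallel efficiency on 2 nodes
    n = 2
    print("independent:", throughput_independent(n, T_SINGLE_DAYS), "jobs/day")
    print("parallel:   ", throughput_parallel(n, T_SINGLE_DAYS, EFFICIENCY), "jobs/day")
    print("one parallel job takes", wall_time_parallel(n, T_SINGLE_DAYS, EFFICIENCY), "days")
```

With E = 0.6 on two nodes, a 15-day run would finish in about 12.5 days, but the pair of nodes completes 0.08 jobs/day instead of about 0.133 jobs/day with two independent runs. The two approaches only break even at E = 1, which is why faster turnaround, not throughput, is the reason to spread one job across nodes.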
We were already running a single job per node for months, but it would be good to be able to run jobs faster; sometimes that's needed.

> Some of what you may be seeing with poorer performance on more cores on
> one node is probably related to the effect on memory access, especially
> through cache. Code that can go in and out of cache runs much faster
> than anything which has to go to main memory, and as soon as you run two
> competing (which depends on architecture) processes you may find that
> the two programs are throwing each other's data out of any shared cache,
> which can result in dramatic slowdowns.
>
> Give gprof a shot too. You want to see where your code is spending most
> of its time. If it spends 95% of its time in routines with no network
> IO, then the network is likely not your issue. And vice versa.
>

I have thought of that, but I couldn't get it working on the more important codes: they compile, but just don't produce any profiling output. I used "iftop" to measure network usage and it's probably around 300-400 Mbit/s, so I suspected latency as the problem; throughput seems fine. While copying files with "scp", I can get 93 MB/s.

> > unexpected for me, since the benchmarks I've seen on the Gromacs website state
> > that I should be able to have 100% scaling in this case, sometimes more.
>
> Contact the person who said that, get the exact conditions, and see if
> you can replicate them. You might have a network issue, but unless you
> are comparing apples to apples it may be hard to figure it out.

True. Thanks for the help.

I must ask: doesn't anybody on this list get good performance running 16 cores on two nodes, for a code and job that completes in about a week? Or does most code that takes one to two weeks to finish only scale with InfiniBand and the like, in something like 99% of cases?
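The bandwidth figures in this message can be sanity-checked with a few lines of arithmetic. This sketch is an illustration added here, not something from the thread: it assumes decimal units (1 MB = 10^6 bytes, 1 Mbit = 10^6 bits), takes 1000 Mbit/s as the nominal GbE line rate, and uses a simple latency-plus-bandwidth message model in which the 50 µs per-message latency is a hypothetical placeholder, not a measurement from this cluster. It also turns David's gprof rule of thumb into an Amdahl-style bound. All function names are hypothetical.

```python
"""Back-of-the-envelope checks for the GbE numbers in this thread.

Assumptions (not measurements): decimal units (1 MB = 10**6 bytes),
a 1000 Mbit/s nominal GbE line rate, and a hypothetical per-message
latency used only to illustrate the latency/bandwidth trade-off.
"""

GBE_LINE_RATE_MBIT = 1000.0  # nominal Gigabit Ethernet rate, Mbit/s


def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert MB/s to Mbit/s (8 bits per byte)."""
    return mbytes_per_s * 8.0


def link_utilization(mbit_per_s: float, line_rate: float = GBE_LINE_RATE_MBIT) -> float:
    """Fraction of the nominal link rate in use."""
    return mbit_per_s / line_rate


def message_time_us(size_bytes: int, latency_us: float, bandwidth_mbit: float) -> float:
    """Latency-bandwidth model: t = latency + size / bandwidth.

    1 Mbit/s is exactly 1 bit per microsecond, so bits / (Mbit/s) is in us.
    """
    return latency_us + size_bytes * 8.0 / bandwidth_mbit


def latency_share(size_bytes: int, latency_us: float, bandwidth_mbit: float) -> float:
    """Fraction of one message's transfer time spent on fixed latency."""
    return latency_us / message_time_us(size_bytes, latency_us, bandwidth_mbit)


def max_gain_from_network(network_fraction: float) -> float:
    """Amdahl-style bound: best overall speedup if network time dropped to zero."""
    return 1.0 / (1.0 - network_fraction)


if __name__ == "__main__":
    scp = mbytes_to_mbits(93.0)  # scp rate quoted in the thread
    print(f"scp: {scp:.0f} Mbit/s = {link_utilization(scp):.0%} of GbE")
    for observed in (300.0, 400.0):  # iftop range quoted in the thread
        print(f"iftop {observed:.0f} Mbit/s = {link_utilization(observed):.0%} of GbE")

    HYPOTHETICAL_LATENCY_US = 50.0
    for size in (1_000, 1_000_000):
        share = latency_share(size, HYPOTHETICAL_LATENCY_US, GBE_LINE_RATE_MBIT)
        print(f"{size} B message: {share:.1%} of the time is latency")

    # David's example: 95% of runtime in routines with no network IO
    print(f"max gain from a perfect network: {max_gain_from_network(0.05):.3f}x")
```

Under these assumptions scp reaches about 74% of the nominal line rate and iftop shows 30-40%, so raw bandwidth does not look saturated, while for 1 kB messages the fixed latency would dominate the transfer time. Conversely, if gprof showed only 5% of runtime in network code, even a perfect network would buy at most about 5%.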
Best regards,
Tiago Marques

>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
