[Beowulf] Shared memory
Vincent Diepeveen diep at xs4all.nl
Thu Jun 23 10:33:15 PDT 2005
Of course, apart from the embarrassingly parallel nature of certain software, which game tree search and artificial intelligence in general do not fall under, there is a simple difference in programming skill between the average unix tool and commercial software.

Probably the difference is that I've put 7 years of programming into the parallel algorithm of DIEP, and 2 years full-time into modifying an excellent algorithm to work well under NUMA conditions. It is commercial software, the best you can get in its field, especially with respect to the actual speedup the algorithm gets out of a NUMA environment such as a quad Opteron dual core provides. A single dual core cpu clocked at 1.8GHz costs $823 or something similar; I'll leave it up to you to call that 'expensive'.

Compare that with the average 'mpi' program, which first gets slowed down by a factor 40 or so, after which the scientist in question just collects some signatures and gets 512 processors for a long period of time. Effectively that is 512 / 40 = roughly a factor 12 speedup. The scientist doesn't care at all, goes on holiday, comes back, and writes a positive report.

That's the difference between that scientist and me. I try to make software that also runs really fast on a SINGLE cpu for my clients. If you take software that has been optimized to the utmost with 50 algorithms to run as fast as possible on a single cpu, and try to run it over a cluster, then getting a good speedup out of that cluster is not so easy. It's a simple difference in programming skill, nothing else.

Now it is of course possible to get some sort of speedup out of a cluster, but you cannot compare a 500MHz MIPS R14000 cluster of 512 processors with a quad dual core 2.2GHz Opteron. The quad dual core 2.2GHz Opteron just eats it alive. And this while the quad Opteron is roughly a factor 1000 cheaper than the SGI Origin 3800 cluster.
I deliberately call it a cluster, because at 512 cpu's the latency is similar to what today's network cards deliver, and a factor 50 away from the latency a quad Opteron dual core delivers (one-way ping-pong, TLB-thrashing memory reads and writes).

Just take the speedup of, say, 10%-30% or so that I effectively managed to get on such a cluster at slow time controls (let's not discuss the first 30 seconds, as that's not fair to a cluster which needs 3 hours of starting time just to allocate shared memory, let alone wake up 500 processors). I actually used 460 cpu's when running on that partition, as 500 cpu's didn't work out really well.

The scientist will then claim 20% in his report, which indeed was the average speedup I had, but the worst case is what counts in competitive environments. The worst case was in fact around 10% (still a guess, it could be worse; I didn't have enough system time to do ANY statistically significant test, as that would run for a week).

10% * 460 * 0.5GHz = 23GHz

By any measure my average speedup of 20% was really good. The Deep Blue team claimed around 5% speedup (no evidence given though). The 20% speedup I calculated for my program on the big machine is also based on a lot of statistical inaccuracy; at such big chicken machines you never get enough system time to do some serious testing!

Now convert that to Opteron speeds. With the recent improvements in compilers, an Opteron is 2 times faster per cycle than that R14000 (off chip L2 cache, YES BABY!). So that 23GHz ==> 11.5GHz Opteron.

Actually a quad Opteron dual core 2.2GHz = 8 x 2.2 = 17.6GHz

And the speedup on it, even in the worst case, is really good, for sure far superior to 11.5GHz effectively. Exact speedup numbers I'll have for you in not too many days from now.

So a quad Opteron dual core just completely outperforms such hardware, simply because you can never test seriously on a 512 processor Origin 3800.
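The back-of-the-envelope comparison above can be written out as a small sketch. The function and variable names are hypothetical, and the inputs (10% worst-case efficiency on 460 cpu's at 0.5GHz, a factor 2 per cycle in favour of the Opteron, 8 cores at 2.2GHz) are the rough estimates from this post, not measurements.

```python
def effective_ghz(efficiency, cpus, clock_ghz):
    """Aggregate clock that is effectively put to use, in GHz.

    efficiency is the fraction of linear speedup actually achieved
    (0.10 means 10% of the cpus' combined clock does useful work).
    """
    return efficiency * cpus * clock_ghz


# Origin 3800 partition, worst case from the post: 10% of 460 x 0.5GHz.
origin_worst_ghz = effective_ghz(0.10, 460, 0.5)

# Assumed per-cycle advantage of the Opteron over the R14000 (post's estimate).
OPTERON_PER_CYCLE_FACTOR = 2.0
origin_in_opteron_ghz = origin_worst_ghz / OPTERON_PER_CYCLE_FACTOR

# Quad dual core Opteron: 8 cores at 2.2GHz, before any parallel losses.
opteron_raw_ghz = 8 * 2.2

# Parallel efficiency the Opteron box needs on its 8 cores to match the Origin.
break_even_efficiency = origin_in_opteron_ghz / opteron_raw_ghz

print(f"Origin worst case:      {origin_worst_ghz:.1f} GHz effective")
print(f"In Opteron terms:       {origin_in_opteron_ghz:.1f} GHz")
print(f"Quad dual core Opteron: {opteron_raw_ghz:.1f} GHz raw")
print(f"Break-even efficiency:  {break_even_efficiency:.0%}")
```

Under these assumptions the Opteron box only has to keep about two thirds of its 8 cores effectively busy to match the worst case of the 460 cpu partition.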
Of course this 512 processor machine would completely outperform the quad dual core Opteron left and right, if each processor inside that machine were a good cpu... like a dual core Opteron! However that's not the case; such big iron machines usually have outdated cpu's. Whereas a quad dual core Opteron you just order, and you have it at home within a few working days.

That is what the beowulf system engineers have to fight against.

"If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?"
Seymour Cray

Vincent

At 09:14 AM 6/23/2005 -0700, Michael Will wrote:
>Michael Will wrote:
>
>> I was just yesterday benchmarking our A3400 quad-opteron with dual cores
>> using UnixBench 4.1 which is not really an SMP benchmark except for the
>> 8 and 16-concurrent shell script runs, and was not too impressed with
>> the speed increase of those runs either, judging how much more the
>> CPUs cost.
>>
>> Compare A3140 (dual opteron 248 single core) with A3400 (quad opteron
>> 875 dual core):
>>
>> A3150/raid5 dual opteron 248 8G FC3 668 859 443
>> A1300 dual opteron 852 4G FC3 806 964 497
>> A1300 dual opteron 875 4G RHEL3u5 724 1329 744
>> A3400 quad opteron 875 32G RHEL3u5 736 1691 1030
>
>Sorry for the messed up table. Here we go. Index Score is a compound of
>the weighted results of UnixBench 4.1; 8-scripts and 16-scripts are the
>specific results in lines per second achieved when running 8 resp. 16
>shell-scripts concurrently, which are two partial tests of the
>Benchmark Suite.
>
>Machine      CPU               RAM  OS       Index Score  8-scripts lps  16-scripts lps
>A3150/raid5  dual opteron 248  8G   FC3      668          859            443
>A1300        dual opteron 852  4G   FC3      806          964            497
>A1300        dual opteron 875  4G   RHEL3u5  724          1329           744
>A3400        quad opteron 875  32G  RHEL3u5  736          1691           1030
>
>Michael Will
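For reference, the ratios behind the quoted table can be worked out as below. This is only a sketch: the dictionary keys and the helper `ratio` are made-up names, and the numbers are copied from the quoted UnixBench 4.1 table without any re-measurement.

```python
# (index score, 8-scripts lps, 16-scripts lps) per machine, from the quoted table.
results = {
    "A3150 dual 248": (668, 859, 443),
    "A1300 dual 852": (806, 964, 497),
    "A1300 dual 875": (724, 1329, 744),
    "A3400 quad 875": (736, 1691, 1030),
}

COLUMNS = ("index", "8-scripts", "16-scripts")


def ratio(machine, baseline):
    """Per-column ratio of one machine's scores to a baseline machine's."""
    return {
        col: a / b
        for col, a, b in zip(COLUMNS, results[machine], results[baseline])
    }


for base in ("A3150 dual 248", "A1300 dual 875"):
    r = ratio("A3400 quad 875", base)
    line = ", ".join(f"{col} x{val:.2f}" for col, val in r.items())
    print(f"A3400 quad 875 vs {base}: {line}")
```

The compound index barely moves between the machines, while the concurrent shell-script runs are where the extra sockets show up at all.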
