[Beowulf] Stream numbers for SiCortex's MIPS based SOC ...
Larry Stewart larry.stewart at sicortex.com
Thu Dec 20 12:19:13 PST 2007
richard.walsh at comcast.net wrote:
> All,
>
> Anyone seen Stream numbers for one and/or more cores from SiCortex, say
> a SiCortex Catapult System? The chip has two memory controllers, and I
> have heard it provides:
>
> "more than 10 Terabytes of bandwidth"
>
> in the largest configuration, but I have not seen any measured memory
> bandwidth numbers for this box. Come to think of it, I have not seen
> measured numbers for its interconnect performance either. Sustaining a
> reasonable ratio of bytes delivered from memory to flops should be
> easier on this processor with its lower clock, but it does have 2
> cores. I am interested in how it looks compared to Opteron, etc. It is
> supposed to be a balanced design, but it seems there are few measured
> results available to validate this.
>
> As always your thoughts are appreciated ...
>
> Regards,
>
> rbw
> --

The usual caveats apply: these are microbenchmarks; delivered
application performance and scalability are what matter. The metrics of
interest may include absolute performance, cost/performance, and
power/performance. The SiCortex machines have a substantially different
balance of processing, memory, and communications than desktop
machines. And don't forget they use about 600 milliwatts per core or 12
watts per node including 4 GB memory and the interconnect. Read on...

Regarding the interconnect, we've got some published results in the
2007 EuroPVM/MPI conference last October. I've just realized that that
paper is not on our website, but I'll get that fixed.

We've measured short message latency at 1.4 microseconds half-round
trip (ping pong). This isn't as fast as some ping pong results, but
when running at scale, the HPCC Random Ring latency is under 2
microseconds when all 648 cores of an SC648 are active at once. The
fastest machine with 512 or more cores in the current HPCC results
reports 2.3 microseconds.

For large messages, the point to point bandwidth off-node is about 1.1
gigabytes/sec.
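
To see how the latency and bandwidth figures trade off against message
size, here is a minimal sketch using the common first-order model,
time = latency + size / bandwidth. The linear model, the function
names, and the choice of 1 gigabyte = 1e9 bytes are assumptions made
for illustration; only the 1.4 microsecond and 1.1 gigabytes/sec inputs
come from the measurements quoted above.

```python
# First-order (latency + size / bandwidth) message cost model.
# Assumption: a straight-line model; real machines show protocol switch
# points and contention effects that this sketch ignores.

LATENCY_S = 1.4e-6       # short-message half round trip, from the text
BANDWIDTH_BPS = 1.1e9    # off-node large-message bandwidth, bytes/sec
                         # (assumes 1 gigabyte = 1e9 bytes)


def transfer_time(nbytes: int) -> float:
    """Modelled one-way time in seconds for a message of nbytes."""
    return LATENCY_S + nbytes / BANDWIDTH_BPS


def effective_bandwidth(nbytes: int) -> float:
    """Modelled delivered bandwidth in bytes/sec for a message of nbytes."""
    return nbytes / transfer_time(nbytes)


def half_bandwidth_size() -> float:
    """Message size at which the model delivers half the peak bandwidth."""
    return LATENCY_S * BANDWIDTH_BPS


if __name__ == "__main__":
    for n in (8, 1024, 65536, 1 << 20):
        print(f"{n:>8} bytes: {transfer_time(n) * 1e6:8.2f} us, "
              f"{effective_bandwidth(n) / 1e6:8.1f} MB/sec")
    print(f"half-bandwidth size: {half_bandwidth_size():.0f} bytes")
```

Under this model the half-bandwidth point is latency times bandwidth,
about 1540 bytes, so messages much shorter than that are dominated by
latency and messages much longer by bandwidth.
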
That aggregate capacity seems to be shared fairly among all cores
reading and writing, so HPCC random ring gets about 600 MB/sec per node
on 108 nodes (1 core/node) and about 100 MB/sec per core when all 648
cores of the SC648 are running at once. Looking on the HPCC Results
page for machines of that scale I find that the NEC SX-8, the Cray
XT3s, Columbia (Altix) and the new Intel Endeavor cluster are faster.

Stream Triad gives 360 megabytes/sec when one core is active, and 340
megabytes/sec per core when all six cores are active at once. We're
pleased that we can run all six cores at once with little degradation.
The core we are currently using supports only a single outstanding
cache miss and does not have a prefetch unit. The memory controllers
themselves have enough bandwidth to supply all six cores, the DMA
engine running the interconnect and the PCI Express on I/O nodes, all
at once. (At SC07 we measured 1100 MB/sec to a Myricom 10G running MX.)

The main memory latency is about 104 nanoseconds, load to use, so the
number of clock cycles to main memory is quite low.

As a consequence of this balance: moderate speed cores, reasonably low
latency memory (although not extreme bandwidth), and quite fast
communications, benchmarks like HPCC Random Access run very well. The
SC5832, for example, measures around 2.25 using the Sandia Labs version
of the code, putting it sixth in current rankings behind the big
BlueGene, the big XT3s, a Cray X1, and the Intel Endeavor cluster.

Cost and power consumption comparisons are left as an exercise for the
reader.

-Larry
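
As a quick cross-check of the per-node figures quoted in this message,
the arithmetic can be sketched as below. Every input is a number from
the text; the function and key names are hypothetical, and the derived
values are simple products and ratios, not additional measurements.

```python
# Cross-check of the per-node figures quoted in the message.
# Inputs come from the text; names are illustrative only.

CORES_PER_NODE = 6
TOTAL_CORES = 648            # SC648
TRIAD_ONE_CORE_MB = 360.0    # Stream Triad, MB/sec, one core active
TRIAD_ALL_CORES_MB = 340.0   # Stream Triad, MB/sec per core, six active
RING_PER_CORE_MB = 100.0     # HPCC random ring, all 648 cores active
NODE_POWER_W = 12.0          # per node, incl. 4 GB memory and interconnect
CORE_POWER_W = 0.6           # per core


def node_summary() -> dict:
    """Derive per-node totals and ratios from the quoted figures."""
    return {
        # 648 cores at six per node
        "nodes": TOTAL_CORES // CORES_PER_NODE,
        # aggregate Triad bandwidth with all six cores running
        "triad_per_node_mb": TRIAD_ALL_CORES_MB * CORES_PER_NODE,
        # fraction of single-core Triad bandwidth kept under full load
        "triad_retained": TRIAD_ALL_CORES_MB / TRIAD_ONE_CORE_MB,
        # random ring bandwidth per node with all cores running
        "ring_per_node_mb": RING_PER_CORE_MB * CORES_PER_NODE,
        # share of node power drawn by the six cores themselves
        "core_power_share": CORE_POWER_W * CORES_PER_NODE / NODE_POWER_W,
    }


if __name__ == "__main__":
    for name, value in node_summary().items():
        print(f"{name:>20}: {value:g}")
```

The 648 cores work out to 108 nodes, matching the one-core-per-node
random ring run, and six cores at about 100 MB/sec each give the same
600 MB/sec per node as that run, which is what fair sharing of the
node's link capacity would predict.
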
