[Beowulf] CPU Benchmark Result
Robert G. Brown rgb at phy.duke.edu
Wed Dec 8 09:50:57 PST 2004
- Previous message: [Beowulf] CPU Benchmark Result
- Next message: [Beowulf] NEC4 and beowulf
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Wed, 8 Dec 2004, Rajiv wrote:

> Dear Sir,
> I would like to get CPU benchmark results for various architectures.
> Any good sites where I could find this information. I found the site
> http://www.unc.edu/atn/hpc/performance/ useful. But I am unable to
> access - I am asked for username and password. How can I access this
> site.

Here are at least some of the primary/famous benchmarks:

SPEC -- probably the "best" of the application-level benchmark suites. Fairly tight rules, but deep-pocketed vendors doubtless maintain an edge beyond just having decent hardware.

lmbench -- I think without question the best of the microbenchmark suites. If you want to find out how fast the CPU does any basic operation, this is probably the first place to look. This suite is heavily used by the linux kernel developers, including Linus Himself, because it provides accurate and reproducible timings of things like interrupt handling rates, context switch rates, memory latency and bandwidth, and some selected CPU operational rates.

stream -- If you are interested in CPU-memory combined rates in operations on streaming vectors (e.g. copy, add, multiply-add), stream is the microbenchmark of choice. Its one weakness is that it doesn't (easily) give you a picture of rates as a function of vector size, so you cannot observe how the rates vary as the vector size grows past each of the CPU cache sizes. It is therefore better suited (as a predictor of application performance) for people running large applications involving linear algebra than for people operating on small blocks of data. Oh, and another weakness is that it doesn't provide any measure that includes the division operation. This is important because some code REQUIRES division, in a streaming context or otherwise, and division is often several times slower than multiplication.

linpack -- Another linear algebra type benchmark.
Not terribly relevant at the application level any more, and a bit too complex to be a microbenchmark -- IMHO this is a benchmark that could be retired without anyone really missing it for practical reasons. However, it has been around a long time and there is a fair bit of data derived from it. When someone tells you how many "MFLOPS" a system has, they are probably referring to Linpack MFLOPS. Historically, this has been a highly misleading predictor of relative systems performance at the application level and has also proven relatively easy to "cheat" on a bit at the hardware and software level, but there it is.

savage -- This is a nearly forgotten benchmark that measures how fast a system does transcendental function evaluations (e.g. sin, tan). These are typically library calls, but some CPUs have had them built into microcode so that they execute several times faster (typically) than library code. Libraries can also exhibit some variation depending on the algorithms used for evaluation.

Some of these benchmarks are wrapped into one another. For example, the HPC Challenge suite will contain stream, and I recall that lmbench has stream available in it as well now (don't shoot me if that is wrong -- I'm just remembering and could be mistaken).

My own benchmark wrapper, cpu_rate (available on my website below under either General or Beowulf, can't remember which), contains stream WITH a variable length vector size, a stream-like measure of "bogomflops" (arithmetic mean of +-*/ times/rates), savage, and a memory read/write test that permits one to shuffle the order of access to compare streaming with random access rates. It is still a bit buggy and is on my list for more work (along with about four other projects:-) over Xmas break, but what it is really designed to be is a shell for drop-in microbenchmarks of your own design (arbitrary code fragments).

Benchmarking whole applications is easy -- just use wall-clock time.
Benchmarking small code fragments is remarkably difficult, especially if their execution time is comparable to the time required to read the most accurate system clock available (typically the onboard CPU cycle counter). Benchmarking e.g. library calls is difficult to do completely accurately, but you can get a decent idea from using the -p flag and gmon (profiler). There is a bit of Heisenberg uncertainty in all of these -- the process of measurement can change the results, hopefully not so much that they stop being useful.

I'm not providing URLs because all of the above can easily be found with google, and because I don't know the exact URLs of lists of results derived from the benchmarks anyway. SPEC is pretty good about publishing a result list per submitted architecture. stream has started to do this as well, although it is also (unfortunately) playing a variant of the "Top X" Game where vendors get to tune and are "ranked". lmbench has the strictest rules of them all -- no vendor tuning whatsoever, and you have to publish a whole SUITE of results if you publish any one.

The more I look at and write about this stuff, the more I appreciate what Larry (McVoy) is fighting against...

   rgb

> Regards,
> Rajiv

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
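
To make the point above about timing small code fragments concrete, here is a minimal sketch in Python (not taken from cpu_rate or any of the suites named above). It assumes one common approach: run the fragment many times between two clock reads so that the cost of reading the clock is amortized, then subtract the time of an otherwise identical loop that calls an empty function. The names clock_read_cost_ns and time_fragment_ns, and the default iteration counts, are made up for this illustration.

```python
import math
import time


def clock_read_cost_ns(samples=10000):
    """Smallest observed gap between two back-to-back clock reads, in ns.

    This is only a rough lower bound on what one clock read costs; on a
    coarse clock it can come out as 0.
    """
    best = None
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        gap = t1 - t0
        if best is None or gap < best:
            best = gap
    return best


def time_fragment_ns(fragment, iterations=100000, repeats=5):
    """Estimate nanoseconds per call of fragment().

    The clock is read only twice per batch of `iterations` calls, and the
    best (least disturbed) batch out of `repeats` is kept. An empty
    function is timed the same way and subtracted as the baseline.
    """
    def empty():
        pass

    def best_total(fn):
        best = None
        for _rep in range(repeats):
            start = time.perf_counter_ns()
            for _i in range(iterations):
                fn()
            elapsed = time.perf_counter_ns() - start
            if best is None or elapsed < best:
                best = elapsed
        return best

    baseline = best_total(empty)      # loop + call overhead only
    measured = best_total(fragment)   # loop + call overhead + fragment
    # Clamp at zero: noise can make the baseline exceed the measurement.
    return max(measured - baseline, 0) / iterations


if __name__ == "__main__":
    print("clock read cost (ns):", clock_read_cost_ns())
    print("sin() per call (ns): ", time_fragment_ns(lambda: math.sin(0.5)))
```

The numbers this prints depend on the machine, the interpreter, and whatever else is running, and the measurement itself perturbs the result, so treat them as an indication rather than an exact figure. If the per-call estimate is not well above the clock read cost, raise iterations.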
