Athlon SDR/DDR stats for *specific* gaussian98 jobs
Robert G. Brown
rgb at phy.duke.edu
Fri May 4 08:13:54 PDT 2001
On Thu, 3 May 2001, Velocet wrote:
> IIRC, Moore's law was at 18 months now. From everything2.com (because I knew
> I'd find it there, not because it's authoritative):
>
> The observation that the logic density of silicon integrated circuits has
> closely followed the curve (bits per square inch) = 2^(t - 1962) where t is
> time in years; that is, the amount of information storable on a given amount
> of silicon has roughly doubled every year since the technology was invented.
> This relation, first uttered in 1964 by semiconductor engineer Gordon Moore
> (who co-founded Intel four years later) held until the late 1970s, at which
> point the doubling period slowed to 18 months. The doubling period remained
> at that value through time of writing (late 1999).
>
> This doesn't talk about the speed of the chips. Assuming it applies,
> however, as you have:
It does and it doesn't. Chip design over this period introduced the
notions of cache, on-chip parallelism, RISC (allowing less logic on the
CPU for greater effect), and much more.
Nothing like direct anecdotes. My own personal measurements on the Intel
architecture (with a very early precursor of cpu-rate :-) are:
~ end of 1982: original IBM PC, 8088 @ 4.77 MHz (8 bit), ~10^4 flops
(peak double precision) under BASICA. (I didn't have access to a real
numerical compiler -- IBM's Fortran -- for a year or so, and it still
yielded order of 10^4 flops, which went up to 10^5 or so with an 8087.)

2001: P3 @ 933 MHz, ~2x10^8 flops (cpu-rate peak double precision).
If we allow for 1.4 GHz in the P4 (which I haven't yet benched, but
maybe this weekend or next week) and multiply by a bit for architectural
improvements, we might reasonably call this a factor of 30,000 to 40,000
over around 18-19 years. Log base 2 of this is around 15 -- fifteen
doublings in 18-19 years, a bit over a year per doubling -- so Intel has
been just off a doubling a year. However, the pattern has been very
irregular; if the Itanium is released before year end at a decent clock
and doubles rates at constant clock (yielding perhaps 1 GFLOP?), then we
get log base 2 of 10^5, or more like 17. If we include Athlons as
"Intel-like" CPUs we are already at about 16, although there are
better/faster AMDs waiting in the wings as well, likely to arrive before
year's end. Of course, other people with other benchmarks may get other
numbers.

So a speed doubling time of a year is perhaps optimistic, but only by
weeks, and even the weeks can depend on what year (sometimes what month
of what year) you measure in.
> So it's close, but slightly losing. There are a few caveats here of course.
>
> - The people controlling the money may want to see a fair number of results
> early on, instead of waiting the full 3 years.
<etc -- lots of good observations>
All fair enough. Still, all other things being equal, production is
optimized by finding a purchase schedule that properly tracks the
Moore's Law curve, whatever it might be. This is a substantial advantage of the
beowulf architecture. It is one of the FEW supercomputing architectures
around with a smooth, consistent upgrade path at remarkably predictable
cost. One of my big early mistakes in this game involved buying a
"refrigerator" SGI 220S with two processors, thinking that in a few
years we could upgrade to 8 processors at a reasonable cost. Never
happened. By the time we saw the COTS light and just quit, one could
buy single-CPU systems as fast as all six upgrade CPUs put together
would have been, for what was STILL the very high cost of the upgrade.
When we finally sold the $75,000 box (only five years old) for $3,000,
just about the cost of its software maintenance agreement bought a
system that was faster on a single-CPU basis than both of its
processors put together, and then some. Not to dis SGI -- they were filling a niche and
COTS clusters were still an idea in the process of happening (inspired
in part by the ubiquity of this general experience). However, Moore's
Law is particularly cruel to big-iron-style, all-at-once purchases.
If we'd spent that $75K at the rate of $15K/year over five years, we
would have gotten MUCH more work done, as by the end of that period we
were just getting to where clusters with $5K nodes were really a decent
proposition, with Sun workstations (usually) being the commodity nodes
or COW components.
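To put a toy number on that, here is a sketch of the lump-sum versus
spread-out comparison under a Moore's-Law-like price/performance curve.
The 18-month doubling time, the five-year horizon, and the assumption
that delivered work scales as (speed bought) x (years of remaining use)
are all illustrative assumptions, not the actual SGI figures, and the
model ignores the big-iron price premium that made the real gap much
larger:

# Toy model: spend a fixed budget all at once at t=0, or in equal annual
# installments, when performance per dollar doubles every DOUBLING_YEARS.
# All numbers here are illustrative assumptions, not the actual SGI figures.

DOUBLING_YEARS = 1.5     # assumed price/performance doubling time
BUDGET = 75_000.0        # total dollars
HORIZON = 5.0            # years over which the hardware does useful work

def perf_per_dollar(t):
    """Relative performance per dollar, t years after the start."""
    return 2.0 ** (t / DOUBLING_YEARS)

def total_work(purchases):
    """Sum over purchases of (speed bought) * (years of remaining use)."""
    return sum(dollars * perf_per_dollar(t) * (HORIZON - t)
               for t, dollars in purchases)

lump_sum = [(0.0, BUDGET)]                              # $75K up front
staggered = [(year, BUDGET / 5) for year in range(5)]   # $15K/year

print("all at once:", total_work(lump_sum))
print("$15K/year  :", total_work(staggered))
print("ratio      :", total_work(staggered) / total_work(lump_sum))

Even this crude model gives the staggered schedule roughly a third more
total work; add the fact that each year's $15K buys commodity rather
than proprietary hardware, and the real-world difference is far larger.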
BTW, a related and not irrelevant question.
You have said that G98 is your dominant application -- are you doing,
e.g., quantum chemistry? There is a faculty person here (Weitao Yang)
who is very interested in building a cluster to run quantum chemistry
codes that use Gaussian 98 and FFTs and that sort of thing, and he's
getting mediocre results with straight P3s on 100BT. I'm not familiar
enough with the problem to know if his results are poor because they are
IPC bound (and he should get a better network) or memory bound (get
Alphas) or whatever. But I'd like to. Any general list-wisdom for
quantum chemistry applications? Is this an application likely to need a
high-end architecture (e.g. Myrinet and Alpha or P4), or would a tuned
combination of something cheaper do as well?
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu