[Beowulf] Power calculations, double precision, ECC and power of APU's

Craig Tierney - NOAA Affiliate craig.tierney at noaa.gov
Mon Mar 18 12:49:44 PDT 2013

On Mon, Mar 18, 2013 at 1:04 PM, Mark Hahn <hahn at mcmaster.ca> wrote:
>> flame-wars?  The people in HPC who care about SP gflops are those who
>> understand the mathematics in their algorithms and don't want to waste
>> very precious memory bandwidth by unnecessarily promoting their
> I'm not disagreeing, but wonder if you'd mind talking about SP-friendly
> algorithms a bit.  I can see it in principle, but always come up short
> when thinking about, eg simulation-type modeling with such low-precision
> numbers.  does someone really comb through all the calculations, looking for
> iffy stuff?  (iffy stuff can be pretty subtle - ordering and theoretically
> equivalent substitutions.  maybe there are compiler-based
> tools that perform this sort of analysis automatically?)
> your mention of precious bandwidth is actually relevant to one of the
> earlier threads - that there is an upcoming opportunity for cpu vendors
> to integrate decent (much bigger than cache) amounts of wide storage
> within the package, using 2.5d integration.  if there were enough fast
> in-package ram, it would presumably not be worthwhile to drive any
> off-package ram - any speculation on that threshold?
> regards, mark hahn.


My experience has been with weather and ocean models.  In the current
weather business, we care about models that can run out to about 14
days, so I am not talking about the 100s of years of simulation done
in climate.  Our next-generation weather and hurricane models are
still single precision, with a scattering of double precision in those
places where it is required.  I don't have an exact answer as to how
the computational scientists figure out where double precision is
needed, but I know they do (I will go ask for some examples).

I know that in my previous life (15 years ago) I did the same thing.
I was working with an ocean model, tailoring it to do ocean tide
prediction.  The model came from a Cray, where all ops were DP to
start with.  While optimizing and porting (i.e., bug fixing), I found
that on non-Cray systems the code ran just fine (within precision) in
SP except for one routine: the calculation of the tidal forcing.
Comparisons to real data showed no significant difference between the
mixed SP/DP version and the all-DP version.  How did I find it?  I was
a grad student, so trial and error.  Thinking back on it now, I
probably could have worked out which quantities were small enough to
have issues with SP precision, and started with those.
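To make that concrete, here is a minimal sketch of the failure mode
(illustrative numbers only, not the actual tide code): a small
forcing-like term added repeatedly to a much larger field vanishes
entirely in SP, while a DP accumulator, used only in that one spot,
keeps it.

```python
import numpy as np

# Illustrative only -- not the actual tide model. A small correction
# added to a much larger value falls below float32's resolution
# (half an ulp of 1.0e4 in float32 is ~4.9e-4), so every SP add
# rounds straight back to the old value.
big = 1.0e4      # large background field value
tiny = 1.0e-4    # small tidal-forcing-like correction

sp = np.float32(big)
for _ in range(10_000):
    sp = np.float32(sp + np.float32(tiny))   # rounds back to 10000.0

dp = np.float64(big)
for _ in range(10_000):
    dp += tiny                               # DP keeps the correction

print(float(sp))   # 10000.0 -- the forcing vanished in SP
print(float(dp))   # ~10001.0
```

This is also why trial and error works: routines dominated by small
quantities riding on large ones show up immediately when compared
against a DP run.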

We have been working with GPUs, and now MIC, for the last 5 years to
find more cost-effective architectures for our dwindling hardware
budgets.  We liked GPUs in the beginning because a GPU's internal
memory bandwidth was about 10x that of an Intel/AMD socket.  As the
years go on, that ratio keeps narrowing, and we get less and less out
of each new architecture.  Yes, the new stackable memory is something
we are looking forward to, but I will wait until I can run on it to
determine whether we get our memory bandwidth performance curve back
(which I highly doubt).  Our memory footprint generally is not that
high, so keeping the memory in-package can work (as it generally does
with GPUs and MICs now).  I will wait until I can actually run things
before passing judgement.
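The bandwidth side of the SP argument is easy to put rough numbers on.
A back-of-envelope sketch, where every figure is an assumption for
illustration (none are measurements), using the ~10x GPU/CPU ratio
mentioned above:

```python
# Back-of-envelope only: bandwidth figures are assumed, not measured.
# A streaming (bandwidth-bound) sweep over a grid moves roughly one
# read and one write per point, so SP halves the traffic and thus
# roughly halves the runtime regardless of flop rate.
POINTS = 1000 ** 3          # hypothetical 1000^3 grid
CPU_BW = 50e9               # assumed CPU socket bandwidth, bytes/s
GPU_BW = 500e9              # assumed GPU bandwidth (~10x), bytes/s

def sweep_time(bytes_per_value: int, bandwidth: float) -> float:
    """Lower-bound seconds per sweep if purely bandwidth-limited."""
    traffic = POINTS * 2 * bytes_per_value   # one read + one write
    return traffic / bandwidth

dp_cpu = sweep_time(8, CPU_BW)   # double precision on the CPU socket
sp_cpu = sweep_time(4, CPU_BW)   # single precision: half the traffic
sp_gpu = sweep_time(4, GPU_BW)   # SP on the assumed GPU

print(dp_cpu, sp_cpu, sp_gpu)
```

Under this model, SP on the same socket is exactly 2x DP, which is why
promoting everything to DP "just to be safe" wastes precious bandwidth
on a bandwidth-bound code.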

I know that there are many domains where DP (or more) is going to be
needed.  I know that as we extend our model from weather prediction (<
14 days) to climate prediction (~90-180 days), SP may not be adequate.
My point was that we do real HPC, SP performance is something we still
care about, and saying HPC == DP is not accurate.

