[Beowulf] difference between accelerators and co-processors

Sun Mar 10 13:03:14 PDT 2013

> Is there any line/point to make distinction between accelerators and
> co-processors (that are used in conjunction with the primary CPU to boost
> up the performance)? or these terms can be used interchangeably?

IMO, a coprocessor executes the same instruction stream as the
"primary" processor.  this was the case with the x87, for instance,
though the distinction became less significant once the x87 came on-chip.
(though you certainly notice that the FPU on any of these chips is mostly
separate - not sharing functional units or register files, sometimes even
with separate micro-op schedulers.)

> Specifically, the word "accelerator" is used commonly with GPU. On the
> other hand  the word "co-processors" is used commonly with Xeon Phi.

I don't think it is a useful distinction: both are basically independent
computers.  obviously, the programming model of Phi is dramatically more
like a conventional processor than Nvidia's is.

there is a meaningful distinction between offload and coprocessor approaches.
that is, offload means you use the device to accelerate a set of libraries
(offload matrix multiply, eig, fft, etc).  to use a coprocessor, I think the
expectation is that the main code will be very much aware of the state of the
PCIe-attached hardware.
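
to make the offload/coprocessor contrast concrete, here is a minimal sketch
in Python.  everything in it is hypothetical: FakeDevice is a pure-Python
stand-in for a PCIe-attached device (not any vendor's API), and the names
offload_matmul and coprocessor_power are made up for illustration.  the only
thing it models is who is aware of device state, and how many host<->device
copies result.

```python
class FakeDevice:
    """Hypothetical stand-in for a PCIe-attached device.

    It only stores named buffers and counts host<->device copies.
    """

    def __init__(self):
        self.transfers = 0
        self.memory = {}

    def upload(self, name, data):
        # Host -> device copy.
        self.transfers += 1
        self.memory[name] = [row[:] for row in data]

    def download(self, name):
        # Device -> host copy.
        self.transfers += 1
        return [row[:] for row in self.memory[name]]

    def matmul(self, a, b, out):
        # "Runs on the device": operands and result stay in device memory.
        x, y = self.memory[a], self.memory[b]
        self.memory[out] = [
            [sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x
        ]


def offload_matmul(dev, a, b):
    """Offload style: the caller sees an ordinary library call.

    Every call copies its inputs in and its result out.
    """
    dev.upload("a", a)
    dev.upload("b", b)
    dev.matmul("a", "b", "c")
    return dev.download("c")


def coprocessor_power(dev, a, n):
    """Coprocessor style: the main code manages device state itself.

    Computes a**n (n >= 1), keeping the accumulator resident on the device.
    """
    dev.upload("a", a)
    dev.upload("acc", a)
    for _ in range(n - 1):
        dev.matmul("acc", "a", "acc")
    return dev.download("acc")


a = [[1, 1], [0, 1]]

# Offload: two library calls to get a**3, each with its own copies.
dev1 = FakeDevice()
r = a
for _ in range(2):
    r = offload_matmul(dev1, r, a)

# Coprocessor: one upload phase, work stays on the device, one download.
dev2 = FakeDevice()
s = coprocessor_power(dev2, a, 3)

print(r, dev1.transfers)
print(s, dev2.transfers)
```

both paths compute the same matrix; the difference is that the offload
version pays for copies on every call, while the coprocessor version has to
know what currently lives on the device.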

I suppose one might suggest that "accelerator" to some extent implies 
offload usage: you're accelerating a library.

another interesting example is AMD's upcoming HSA concept: since nearly all
GPUs are now on-chip, AMD wants to integrate the CPU and GPU programming
models (at least to some extent).  as far as I understand it, HSA is based
on introducing a quite general intermediate ISA that can be executed using
all available hardware resources: CPU and/or GPU.  although Nvidia does have
its own intermediate ISA, they don't seem to be trying to make it general,
*and* they don't seem interested in making it work on both C/GPU.  (well,
so far at least - I wouldn't be surprised if they _did_ have a PTX JIT for
their ARM-based C/GPU chips...)

I think HSA is potentially interesting for HPC, too.  I really expect 
AMD and/or Intel to ship products this year that have a C/GPU chip mounted on
the same interposer as some high-bandwidth ram.  a fixed amount of very high
performance memory sounds very tasty to me.  a surprising amount of power
in current systems is spent getting high-speed signals off-socket.

imagine a package dissipating, say, 40W and containing, say, 4 CPU cores,
256 GPU ALUs and 2GB of gddr5.  the point would be to tile 32 of them
in a 1U box.  (dropping socketed, off-package dram would probably make
it uninteresting for memcached and some space-intensive HPC.)
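
the per-box totals implied by those figures are simple multiplication; a
small sketch, using only the numbers above.  the variable names are made up,
and the power figure counts packages only (fans, power supply losses and
networking are not modelled).

```python
# Per-package figures from the hypothetical package described above.
PACKAGES_PER_1U = 32
WATTS_PER_PACKAGE = 40
CPU_CORES_PER_PACKAGE = 4
GPU_ALUS_PER_PACKAGE = 256
MEM_GB_PER_PACKAGE = 2

# Totals for one 1U box (package power only, nothing else in the chassis).
box = {
    "watts": PACKAGES_PER_1U * WATTS_PER_PACKAGE,
    "cpu_cores": PACKAGES_PER_1U * CPU_CORES_PER_PACKAGE,
    "gpu_alus": PACKAGES_PER_1U * GPU_ALUS_PER_PACKAGE,
    "mem_gb": PACKAGES_PER_1U * MEM_GB_PER_PACKAGE,
}

for name, value in box.items():
    print(f"{name}: {value}")
```

that works out to 1280W, 128 CPU cores, 8192 GPU ALUs and 64GB per 1U.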

then again, if you think carefully about the numbers, any code today 
that has a big working set is almost as anachronistic as codes that use 
disk-based algorithms.  (same conceptual thing happening: capacity is 
growing much faster than the pipe.)
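
one way to put a number on "capacity growing faster than the pipe" is the
time needed to touch all of memory once, i.e. capacity divided by bandwidth.
a rough sketch follows; the two "generations" in it are made-up illustrative
values, not measurements of any real system, and sweep_seconds is a
hypothetical helper name.

```python
def sweep_seconds(capacity_gb, bandwidth_gb_per_s):
    """Time to read every byte of memory once at full bandwidth."""
    return capacity_gb / bandwidth_gb_per_s


# Illustrative, made-up generations: capacity grows 4x, bandwidth only 2x.
old = sweep_seconds(capacity_gb=16, bandwidth_gb_per_s=32)
new = sweep_seconds(capacity_gb=64, bandwidth_gb_per_s=64)

print(old, new, new / old)
```

whenever capacity grows faster than bandwidth, that ratio goes up: a code
that really touches its whole working set gets relatively slower each
generation, which is the same squeeze that made disk-based algorithms
unattractive.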

regards, mark hahn.