> I mean that all of the memory on the card can be seen from the host via the
> PCI, and, therefore, that PCI transfers to the memory on the card can be
> directly to the final destination.

How about turning this on its head and installing a Phi directly on
the mainboard, with lots of memory attached (unless it has some
built-in limit, I haven't checked), and using the PCIe connection
only for I/O - for which it should be fast enough? To avoid changing
the architecture too much, attach the whole thing to an Atom CPU - low
power, but fast enough to handle the I/O and to control the
computation on the Phi.


