You may be right.  The system I described is vintage mid-late 80s 
(when hot rod numeric computing was adding a coprocessor to the 
286).  However, the comment was directed more at one poster's 
observation that interconnect speed is unimportant to most 
developers.  I suspect this isn't the case, and neither is compiler 
technology at a point (some 20 years later!) that it can be done 

The early hypercubes from Intel were essentially an early poke at the 
Beowulf concept.  Lots of computation with basically stock compute 
nodes and basically stock interconnects (albeit on Intel Multibus or 
something similar; I don't recall, and I'm not motivated enough to 
go out to the garage to find the docs).

In any case, I think we're a ways away from "automatic utilization of 
parallelizable resources", at least for general applications.  For 
some applications, the return on investment of figuring out how to do 
it is sufficiently great that it's worth doing, but those are, I 
think, more in the nature of "point designs", rather than generalized 
solutions.  Perhaps the existence of parallel libraries for things 
like BLAS is a step in that direction.

As for myself, I'd love to be able to spend my time figuring out how 
to automatically utilize a given interconnect method (given the 
interconnect properties).  It's a fascinating optimization problem; 
gosh, just figuring out how to describe the interconnect properties 
in an abstract way is an interesting problem. Sadly, there are other 
claims on my time, and they're the ones that pay me to eat.
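
To give a flavor of what "describing the interconnect in an abstract way" 
might look like, here is a minimal sketch that assumes the simplest 
possible description: a link is just a startup latency plus a bandwidth. 
The names (Interconnect, parallel_time, best_node_count) and the cost 
model (work splits evenly, results are gathered one message at a time) 
are my own assumptions for illustration, not a description of any real 
machine or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interconnect:
    """Hypothetical abstract link description: latency plus bandwidth."""
    latency_s: float       # fixed per-message startup cost, in seconds
    bandwidth_bps: float   # sustained bandwidth, in bytes per second

    def transfer_time(self, nbytes: int) -> float:
        # Time to move one message of nbytes across the link.
        return self.latency_s + nbytes / self.bandwidth_bps


def parallel_time(serial_s: float, nodes: int,
                  link: Interconnect, nbytes: int) -> float:
    # Assumed model: compute divides evenly across nodes, then every node
    # except the master sends one result message, one after another.
    return serial_s / nodes + (nodes - 1) * link.transfer_time(nbytes)


def best_node_count(serial_s: float, max_nodes: int,
                    link: Interconnect, nbytes: int) -> int:
    # Brute-force search for the node count with the lowest estimate.
    return min(range(1, max_nodes + 1),
               key=lambda n: parallel_time(serial_s, n, link, nbytes))


if __name__ == "__main__":
    slow_link = Interconnect(latency_s=1e-3, bandwidth_bps=1e6)
    n = best_node_count(serial_s=1.0, max_nodes=64,
                        link=slow_link, nbytes=1000)
    print(n, parallel_time(1.0, n, slow_link, 1000))
```

Even with a model this crude, adding nodes stops paying off well before 
you run out of them, which is why I doubt interconnect speed is 
unimportant.  A real description would need topology, contention and 
so on, and that's where it gets interesting.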

