[Beowulf] fast interconnects, HT 3.0 ...

Richard Walsh rbw at ahpcrc.org
Wed May 24 07:09:23 PDT 2006

Jim Lux wrote:
> At 06:49 AM 5/23/2006, Richard Walsh wrote:
>> Eugen Leitl wrote:
>>> In regards to keeping the wires short, does this IBM trick of keeping all
>>> wires equal-length work well on 3d lattices, and above? This would seem to
>>> be a must for those coming (hopefully) Hypertransport motherboards with
>>> connectors.
>>    Speaking of HyperTransport 3.0 and its AC chassis-to-chassis capabilities
>>    and 10 to 20 Gbps performance maximums one-way (non-coherent, off chassis
>>    I believe), what do the people that know say about scalability?  Are we
>>    looking at coherency within the board complex and basic reference ability
>>    off board or something else?
> Where coherency means precisely what?  I don't see lockstep execution,
> because when you're talking tens or hundreds of picoseconds per bit, 
> "simultaneous" is hard to define.  Two boxes a foot apart are going to 
> be many bit times different.  Something's got to take up the timing 
> slack, and while the classic "everything synchronous to a master 
> clock" has a simplicity of design, it has real speed limits (the light 
> time across the physical extent of the computational unit being but one).
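
A rough back-of-the-envelope check of the "many bit times" remark can be sketched as follows. It treats the 10 to 20 Gbps figures mentioned in this thread as a serial bit rate, which is a simplifying assumption, and the velocity factor of 0.7 is an illustrative guess for a cable or trace, not a measured value.

```python
# Rough propagation arithmetic; the rates and velocity factors are illustrative assumptions.
C_VACUUM_M_PER_S = 299_792_458.0   # speed of light in vacuum
FOOT_M = 0.3048                    # one foot in metres


def bit_time_ps(rate_gbps):
    """Duration of one bit, in picoseconds, at a serial rate of rate_gbps."""
    return 1000.0 / rate_gbps


def bits_in_flight(distance_m, rate_gbps, velocity_factor=1.0):
    """Number of bit times that fit into the one-way propagation delay."""
    delay_s = distance_m / (C_VACUUM_M_PER_S * velocity_factor)
    return delay_s * rate_gbps * 1e9


if __name__ == "__main__":
    for rate in (10.0, 20.0):
        for vf in (1.0, 0.7):
            n = bits_in_flight(FOOT_M, rate, vf)
            print(f"{rate:.0f} Gbps, velocity factor {vf}: "
                  f"bit time {bit_time_ps(rate):.0f} ps, about {n:.1f} bits per foot")
```

Even at the vacuum speed of light, a foot of separation corresponds to roughly ten bit times at 10 Gbps, so the two ends cannot share a meaningful notion of "the same bit" without some elastic buffering.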
    Jim, I meant cache coherence.  As we know, HT provides both cache-coherent
    and non-cache-coherent memory management.  Typically, within the board
    complex of an SMP device, we want cache coherency.  The HT 3.0 standard,
    as I understand it, offers off-chassis memory access at lower bit rates
    using AC coupling, but without cache coherence.  This is quite similar to
    the approach taken on the Cray X1, with cache-coherent on-board images and
    non-coherent access off-board.  The Cray X1 supports the Partitioned
    Global Address Space (pGAS) programming models of UPC and CAF.

    The question here was: What do those who understand HT 3.0 better than I
    do think about its ability to similarly support the pGAS programming style
    efficiently?  The follow-up question was: What might be the implications
    for commodity parallel programming in MPI?  I want to get a feel for HT
    3.0's scalability in this context, including the need for, and density of,
    potential HT switches.
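
    For readers less familiar with pGAS, here is a minimal sketch of how a UPC
    array declared as `shared [B] T a[N]` is partitioned: element i has
    affinity to thread (i / B) mod THREADS.  References to elements with local
    affinity are ordinary loads and stores; the rest must cross the
    interconnect, which is where non-coherent off-board access would come in.
    The helper names below are hypothetical and only model the language-level
    layout; they say nothing about how HT 3.0 hardware would actually carry
    the remote references.

```python
def upc_affinity(i, block, threads):
    """Thread that owns element i of a UPC array declared `shared [block] T a[N]`."""
    return (i // block) % threads


def upc_local_index(i, block, threads):
    """Position of element i within its owner's local portion of the array."""
    return (i // (block * threads)) * block + (i % block)


def classify_access(my_thread, i, block, threads):
    """Label a reference to element i as local or remote for my_thread (hypothetical helper)."""
    owner = upc_affinity(i, block, threads)
    return "local" if owner == my_thread else f"remote (thread {owner})"


if __name__ == "__main__":
    block, threads = 2, 3
    for i in range(8):
        print(i, upc_affinity(i, block, threads),
              upc_local_index(i, block, threads),
              classify_access(0, i, block, threads))
```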

    The discussion on signal coherence was of course interesting ... ;-) ...
>>    Sounds like the Cray X1E pGAS memory model.  Is there a role for switches?
>>    And then there is its intersection with the pGAS language extensions (UPC
>>    and CAF) ... raising the prospect of much better performance in a
>>    commodity regime, with possible implications for MPI use.
> Get to nanosecond latencies for messages, and corresponding fine 
> grained computation, and you're looking at algorithm design that is 
> latency tolerant, or, at least, latency aware.  As long as your 
> "computational granules" are microseconds, propagation delay can 
> probably be subsumed into a basic "computation interval", some part of 
> which is the time required to get the data from one place to another.
> At finer times, you're looking at things like systolic arrays and 
> pipelines.
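
The granularity argument above can be made concrete with a small sketch. The 100 ns message latency used here is a hypothetical figure chosen only for illustration, and the two functions model the simplest cases: no overlap at all, and perfect overlap of communication with computation.

```python
def communication_fraction(granule_ns, latency_ns):
    """Share of each compute-then-communicate step spent waiting, assuming no overlap."""
    return latency_ns / (granule_ns + latency_ns)


def overlapped_step_ns(granule_ns, latency_ns):
    """Step time if communication is fully hidden behind computation (pipelined)."""
    return max(granule_ns, latency_ns)


if __name__ == "__main__":
    latency_ns = 100.0  # hypothetical one-way message latency
    for granule_ns in (10_000.0, 1_000.0, 100.0, 10.0):
        frac = communication_fraction(granule_ns, latency_ns)
        print(f"granule {granule_ns:8.0f} ns: {frac:6.1%} of each step is latency, "
              f"overlapped step {overlapped_step_ns(granule_ns, latency_ns):.0f} ns")
```

With microsecond granules the latency is a small fraction of each step and can be folded into the computation interval; once the granule shrinks to the scale of the latency itself, the wait dominates unless the algorithm overlaps it, which is the pipeline or systolic-array regime.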
>>    Anyone have a crystal ball or insights on this?
>>    rbw
> Jim


Richard B. Walsh

Project Manager
Network Computing Services, Inc.
Army High Performance Computing Research Center (AHPCRC)
rbw at ahpcrc.org  |  612.337.3467

