[Beowulf] fast interconnects, HT 3.0 ...
Richard Walsh
rbw at ahpcrc.org
Wed May 24 07:09:23 PDT 2006
Jim Lux wrote:
> At 06:49 AM 5/23/2006, Richard Walsh wrote:
>> Eugen Leitl wrote:
>>> In regards to keeping the wires short, does this IBM trick of
>>> keeping all wires equal-length work well on 3d lattices, and above?
>>> This would seem to be a must for those coming (hopefully)
>>> Hypertransport motherboards with connectors.
>>>
>> Speaking of Hyper Transport 3.0 and its AC chassis-to-chassis
>> capabilities and 10 to 20 Gbps performance maximums one-way
>> (non-coherent, off chassis I believe), what do the people that know
>> say about scalability? Are we looking at coherency within the board
>> complex and basic reference ability off board, or something else?
>
> Where coherency means precisely what? I don't see lockstep execution,
> because when you're talking tens or hundreds of picoseconds per bit,
> "simultaneous" is hard to define. Two boxes a foot apart are going to
> be many bit times different. Something's got to take up the timing
> slack, and while the classic "everything synchronous to a master
> clock" has a simplicity of design, it has real speed limits (the light
> time across the physical extent of the computational unit being but one).
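
As a rough check on the "many bit times" point above, here is a minimal
sketch of the arithmetic. The 10 and 20 Gbps rates are the figures quoted
earlier in the thread; the 0.7 velocity factor is an assumed, illustrative
value for signal speed in a cable or board trace, not something taken from
the HT 3.0 specification.

```python
# Rough arithmetic: how many bit times fit into the flight time over one foot?
C_VACUUM_M_PER_S = 299_792_458.0   # speed of light in vacuum
FOOT_M = 0.3048                    # one foot in metres


def bit_time_ps(gbps):
    """Duration of one bit in picoseconds at the given line rate in Gbps."""
    return 1000.0 / gbps


def bit_times_in_flight(distance_m, gbps, velocity_factor=0.7):
    """Bits in flight over distance_m.

    velocity_factor is an ASSUMED fraction of c for the medium.
    """
    delay_ps = distance_m / (C_VACUUM_M_PER_S * velocity_factor) * 1e12
    return delay_ps / bit_time_ps(gbps)


if __name__ == "__main__":
    for rate in (10, 20):
        print(rate, "Gbps:", bit_time_ps(rate), "ps per bit,",
              round(bit_times_in_flight(FOOT_M, rate), 1), "bit times per foot")
```

Under those assumptions a one-foot separation is on the order of 15 bit
times at 10 Gbps and about twice that at 20 Gbps, so the two ends cannot
share a meaningful notion of "the same bit".
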
Jim, I meant cache coherence. As we know, HT supports both cache-coherent
and non-coherent memory access. Typically, within the board complex of
an SMP device, we want cache coherency. The HT 3.0 standard, as I
understand it, offers off-chassis memory access at lower bit rates using
its AC-coupled mode, but without cache coherence. This is quite similar
to the approach taken on the Cray X1, with cache-coherent on-board images
and non-coherent access off-board. The Cray X1 supports the partitioned
Global Address Space (pGAS) programming models of UPC and CAF.

The question here was: what do those who understand HT 3.0 better than
I do think about its ability to similarly support the pGAS programming
style efficiently? The follow-up question was: what might the
implications be for commodity parallel programming in MPI? I want to get
a feel for HT 3.0's scalability in this context, the need for and
density of potential HT switches, etc.
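
To make the pGAS style concrete for anyone who has not used it, here is a
toy sketch in Python. It is not UPC or CAF, and it says nothing about how
HT 3.0 would implement remote references; the class and function names
(PartitionedArray, neighbor_sums) are made up for illustration. The point
is only that every partition addresses one global index space, each
element has affinity to one partition, and a remote reference is
one-sided, with no matching receive on the owning side.

```python
class PartitionedArray:
    """Toy PGAS-style array: one global index space, block-distributed."""

    def __init__(self, n_elems, n_parts):
        self.n_elems = n_elems
        self.block = -(-n_elems // n_parts)   # ceiling division: elements per partition
        self.data = [0.0] * n_elems           # stands in for all partitions' memory
        self.local_refs = 0                   # references to elements the caller owns
        self.remote_refs = 0                  # references that would go off-board

    def owner(self, i):
        """Partition with affinity to global index i."""
        return i // self.block

    def _touch(self, me, i):
        """Classify a reference by partition `me` to index i as local or remote."""
        if not 0 <= i < self.n_elems:
            raise IndexError(i)
        if self.owner(i) == me:
            self.local_refs += 1
        else:
            self.remote_refs += 1

    def get(self, me, i):
        """One-sided read of global index i by partition `me`."""
        self._touch(me, i)
        return self.data[i]

    def put(self, me, i, value):
        """One-sided write of global index i by partition `me`."""
        self._touch(me, i)
        self.data[i] = value


def neighbor_sums(n_elems=8, n_parts=2):
    """Each owner initialises its elements, then adds in its right neighbor."""
    a = PartitionedArray(n_elems, n_parts)
    for i in range(n_elems):
        a.put(a.owner(i), i, float(i))
    sums = [a.get(a.owner(i), i) + a.get(a.owner(i), (i + 1) % n_elems)
            for i in range(n_elems)]
    return sums, a.local_refs, a.remote_refs


if __name__ == "__main__":
    print(neighbor_sums())
```

In an MPI version of the same loop, the references that cross a partition
boundary would each need a matched send and receive (or an explicit
one-sided window), which is where cheap non-coherent remote loads and
stores could change the picture.
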
The discussion on signal coherence was of course interesting ... ;-) ...
>
>> Sounds like the Cray X1E pGAS memory model. Is there a role for
>> switches? And then there is its intersection with the pGAS language
>> extensions (UPC and CAF) ... raising the prospect of much better
>> performance in a commodity regime, with possible implications for
>> MPI use.
>
> Get to nanosecond latencies for messages, and corresponding fine
> grained computation, and you're looking at algorithm design that is
> latency tolerant, or, at least, latency aware. As long as your
> "computational granules" are microseconds, propagation delay can
> probably be subsumed into a basic "computation interval", some part of
> which is the time required to get the data from one place to another.
>
> At finer times, you're looking at things like systolic arrays and
> pipelines.
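
A back-of-the-envelope version of the granularity argument above, as a
minimal sketch. It assumes the simplest possible model, in which each step
is one compute granule followed by one message latency with no overlap;
the 1 microsecond and 10 nanosecond figures are illustrative assumptions,
not measurements of any interconnect.

```python
def compute_fraction(granule_ns, latency_ns):
    """Fraction of each step spent computing when communication is NOT overlapped.

    Assumed model: step time = compute granule + one message latency.
    """
    return granule_ns / (granule_ns + latency_ns)


if __name__ == "__main__":
    # Hypothetical numbers: a fixed 10 ns latency against shrinking granules.
    for granule_ns in (1000.0, 100.0, 10.0):
        print(granule_ns, "ns granule:",
              round(compute_fraction(granule_ns, 10.0), 3))
```

With a microsecond granule the latency is about 1% of the step and can be
absorbed; once the granule shrinks to the latency itself, half of every
step is waiting, which is the regime where latency-aware designs such as
pipelines start to matter.
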
>
>
>> Anyone have a crystal ball or insights on this?
>>
>> rbw
>
> Jim
>
--
Richard B. Walsh
Project Manager
Network Computing Services, Inc.
Army High Performance Computing Research Center (AHPCRC)
rbw at ahpcrc.org | 612.337.3467