[Beowulf] Questions regarding interconnects

Fri Mar 25 13:02:24 PST 2005

Hi Vincent,

Vincent Diepeveen wrote:
> I feel it is very important to look at 'shmem' capabilities.

> In order for B to receive, it has to have a special thread
> that regularly polls. If you have a thread that polls, say, every 10
> milliseconds, then what's the use of a high-end network
> card (other than its DMA capabilities)?

You are in a situation where you don't have to wait for the message to
arrive: you can move on and check again 10 ms later. In this case, you
don't care about network speed.
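To put numbers on this, here is a back-of-the-envelope sketch in Python.
It assumes the message lands at a uniformly random moment within the
polling period, so detection waits half the interval on average. The
5 us and 50 us wire latencies are hypothetical figures, not measurements
of any particular card.

```python
def mean_detection_delay_us(poll_interval_ms: float, wire_latency_us: float) -> float:
    """Average time from send to detection when the receiver checks every
    poll_interval_ms milliseconds. Assumes arrival is uniformly distributed
    within a polling period, so the message sits poll_interval / 2 on average
    after it lands in memory."""
    return wire_latency_us + poll_interval_ms * 1000.0 / 2.0


poll_ms = 10.0
for latency_us in (5.0, 50.0):  # hypothetical fast vs. slow interconnect
    total = mean_detection_delay_us(poll_ms, latency_us)
    print(f"{latency_us:5.1f} us wire latency -> {total:8.1f} us mean delay")
```

With a 10 ms polling interval, the gap between the two hypothetical
interconnects is under 1% of the total delay; the polling interval
dominates.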

> However, it's very expensive to poll.

No, it's not. Not in the OS-bypass world.

> On the other hand using the 'shmem', what happens is that A ships a
> nonblocking write to B of just a few bytes. The network card in B simply
> writes it in the RAM.
>
> Now and then the searching process at B only has to poll its own main
> memory to see whether it has a '1'. So sometimes you lose a TLB-thrashing
> call to it, but other times it comes from L2 cache.
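A minimal sketch of the flag-polling scheme described above, using Python
threads as a stand-in for two nodes. The shared bytearray plays the role
of B's memory that the network card writes into; the names, the 8-byte
payload, and the "flag written last" convention are assumptions for
illustration. On real hardware the sender must also guarantee that the
payload becomes visible before the flag, which this model glosses over.

```python
import threading

# Hypothetical stand-in for B's registered memory: payload bytes plus a
# trailing flag byte that the sender sets last.
PAYLOAD_LEN = 8
FLAG = PAYLOAD_LEN
remote_mem = bytearray(PAYLOAD_LEN + 1)


def sender_put(data: bytes) -> None:
    """Model of A's non-blocking write: deposit the payload, then the flag.
    A real NIC/RMA layer must ensure the flag is seen only after the payload
    (via ordering guarantees or a fence); Python threads hide that detail."""
    remote_mem[:len(data)] = data
    remote_mem[FLAG] = 1


def receiver_poll() -> bytes:
    """B spins on its own memory until the flag turns to 1.
    Each iteration is a plain memory read; no system call is involved."""
    while remote_mem[FLAG] != 1:
        pass
    return bytes(remote_mem[:PAYLOAD_LEN])


t = threading.Thread(target=sender_put, args=(b"job:0042",))
t.start()
received = receiver_poll()
t.join()
print(received)
```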

It's still polling. With message passing, you actually poll a queue in
the MPI library instead of a specific location in the user application.
That helps when you are looking for several messages from several
sources (in your model, you would have to poll several locations).
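A rough illustration of that difference, with hypothetical names: polling
one flag per possible source costs a scan proportional to the number of
sources, while polling a single incoming queue only touches what has
actually arrived. This is a conceptual model, not a description of how
any specific MPI implementation is structured internally.

```python
from collections import deque

NUM_SOURCES = 4

# Model 1: one flag slot per possible source ("poll several locations").
flags = [0] * NUM_SOURCES


def poll_flags() -> list[int]:
    """Receiver must scan every slot, even the ones nobody wrote to."""
    return [src for src in range(NUM_SOURCES) if flags[src]]


# Model 2: a single incoming queue, like the one a message-passing library keeps.
incoming: deque[int] = deque()


def poll_queue() -> list[int]:
    """Receiver checks one place; the cost depends on arrivals, not NUM_SOURCES."""
    arrived = []
    while incoming:
        arrived.append(incoming.popleft())
    return arrived


# Sources 1 and 3 have sent something.
flags[1] = flags[3] = 1
incoming.extend([1, 3])
print(poll_flags(), poll_queue())  # [1, 3] [1, 3]
```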

> So for short messages which are latency sensitive that 'shmem' of quadrics
> is just far superior.

You are confusing terms. "SHMEM" is a legacy shared memory interface
that was used on Cray machines like the T3D. It's not a standard per se;
it's a software interface. The implementations usually rest on top of
remote memory operations (PUT/GET).
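To make the layering concrete, here is a toy model in which every
process (PE) owns a byte array that other PEs can PUT into or GET from
without the target running any code. The class and method names are
hypothetical; real SHMEM libraries expose C calls and map them onto the
network's remote memory operations.

```python
class ToyRMA:
    """Hypothetical model: each PE owns a fixed-size byte array, and any PE
    may write to (PUT) or read from (GET) another PE's array directly."""

    def __init__(self, num_pes: int, heap_size: int) -> None:
        self.heaps = [bytearray(heap_size) for _ in range(num_pes)]

    def put(self, target_pe: int, offset: int, data: bytes) -> None:
        # One-sided write: the target PE takes no action.
        self.heaps[target_pe][offset:offset + len(data)] = data

    def get(self, target_pe: int, offset: int, length: int) -> bytes:
        # One-sided read: again, the target PE takes no action.
        return bytes(self.heaps[target_pe][offset:offset + length])


rma = ToyRMA(num_pes=2, heap_size=16)
rma.put(target_pe=1, offset=4, data=b"abc")      # PE 0 writes into PE 1
print(rma.get(target_pe=1, offset=4, length=3))  # b'abc'
```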

It always strikes me when people put "one-sided" and "latency sensitive"
in the same sentence. "One-sided" means that you don't want to involve
the remote side in the communication, and "latency sensitive" means the
other side is waiting for the communication.

In your example, you will be checking every X ms whether someone has
written into your memory. In that case, why do you care about latency?

> Do other cards implement something similar?

You can do PUT on most high-speed networks; this is pretty basic
functionality. The SHMEM interface itself may not be used much, because
it mostly makes sense for former Cray customers, but look for portable
RMA implementations such as ARMCI.

> As far as i know they do not.

Do more research.

> The overhead of the MPI implementation layer *receiving* bytes is just
> so huge. A card's theoretical one-way ping-pong latency is irrelevant
> to that, because the one-way ping-pong programs on all cards eat 100%
> system time, effectively losing a full CPU.

You are mistaken about the MPI receive overhead. You are also mistaken
in your belief that one-sided operations are a silver bullet. RMA
operations may fit some application designs better, but they share many
constraints with message passing: you have to poll to know when an
operation is done, and you have to tell the other side where to write
(the equivalent of posting a receive). They also have drawbacks, such as
usually not scaling in space (each sender must write to a different
location).
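A sketch of the space-scaling point, under the assumption that each
sender gets its own fixed-size landing slot in the receiver's memory.
LandingZone and its fields are hypothetical. The point is that the
receiver has to hand each sender an offset in advance (much like posting
a receive), and the memory it reserves grows linearly with the number of
potential senders.

```python
from dataclasses import dataclass


@dataclass
class LandingZone:
    """Hypothetical receive-side layout for one-sided messaging: each sender
    gets a private slot so concurrent PUTs never overwrite each other."""
    num_senders: int
    slot_bytes: int

    def offset_for(self, sender: int) -> int:
        # This offset must be communicated to the sender ahead of time,
        # which plays the same role as posting a receive.
        return sender * self.slot_bytes

    def total_bytes(self) -> int:
        # Reserved memory grows linearly with the number of potential senders.
        return self.num_senders * self.slot_bytes


for n in (16, 1024, 65536):
    zone = LandingZone(num_senders=n, slot_bytes=4096)
    print(n, "senders ->", zone.total_bytes() // 1024, "KiB reserved")
```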

Patrick
-- 

Patrick Geoffray
Myricom, Inc.
http://www.myri.com