[Beowulf] 1.2 us IB latency?
Steffen Persvold
steffen.persvold at scali.com
Fri Apr 20 08:08:17 PDT 2007
> -----Original Message-----
> From: Patrick Geoffray [mailto:patrick at myri.com]
> Sent: Thursday, April 19, 2007 6:47 PM
> To: Steffen Persvold; Beowulf Mailing List
> Subject: Re: [Beowulf] 1.2 us IB latency?
>
> greg.lindahl at qlogic.com wrote:
> >> Back then we were struggling with PIO transfers and how they were
> >> treated in the CPU/North bridge (write combining and all that). I
> >> believe this might still be an issue, correct ?
>
> WC is well implemented on Opteron, it will aggregate consecutive PIO
> writes at 16, 32 and 64 Bytes smoothly. On Intel processors, this is
> more painful: WC is only 64 Bytes. If you flush the WC buffer with less
> than 64 bytes in it, you will see multiple 8-byte PIO writes, and not
> always in order.
Yeah, that rings a bell... :)
So I'm guessing that both Myrinet MX and QLogic InfiniPath (confirmed)
are using PIO for "small" messages. Are we sure that Mellanox ConnectX
doesn't? It seems they would have to in order to get the 1.2us numbers.
There's nothing that stops them from doing:
verbs_post_rdma_write() {
    ...
    if (msg_size < MAX_PIO_THRESHOLD) {
        copy_buffer_to_remote_with_pio();
    } else {
        setup_dma_engine();
    }
    ...
}
Or something along those lines. However, they claim that it's "fully
offloaded", so I'm not sure.
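To make the idea concrete, here is a minimal, self-contained sketch of such
a size-based dispatch in plain C. Everything in it is an assumption for
illustration: the 64-byte cutoff, the names post_rdma_write, pio_window and
dma_desc, and the use of ordinary memory in place of a write-combining
mapped device window. It is not how ConnectX, MX or InfiniPath actually
implement their send paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff; a real driver would tune this per platform. */
#define MAX_PIO_THRESHOLD 64u

enum xfer_path { XFER_PIO, XFER_DMA };

/* Stand-in for a write-combining mapped NIC window (plain memory here). */
static uint8_t pio_window[MAX_PIO_THRESHOLD];

/* Stand-in for a descriptor that the NIC's DMA engine would fetch. */
struct dma_desc {
    const void *src;
    size_t      len;
};
static struct dma_desc last_desc;

/* Pick the transfer path from the message size alone. */
static enum xfer_path post_rdma_write(const void *buf, size_t len)
{
    assert(buf != NULL);

    if (len < MAX_PIO_THRESHOLD) {
        /* Small message: the CPU copies the payload into the window. */
        memcpy(pio_window, buf, len);
        return XFER_PIO;
    }

    /* Large message: only describe the buffer; the NIC moves the data. */
    last_desc.src = buf;
    last_desc.len = len;
    return XFER_DMA;
}
```

With this sketch an 8-byte message takes the PIO branch and a 256-byte
message only fills in a descriptor. The trade-off is the usual one: PIO
avoids the descriptor fetch and DMA read on the send side, at the cost of
CPU cycles per byte, so it only pays off below some small size.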
>
> > cases we can manipulate the mtrrs after boot to fix this. Getting
> > formal support for PAT in the Linux kernel is the long-term fix for
> > this.
>
> It's interesting to note that most current OSes have native PAT support,
> except Linux. Even Windows does it well :-)
>
Hmm, I seem to remember having PAT support working fine with SCI on
Linux a couple of years ago. We started using PAT on x86_64 because of
the nightmare with MTRRs and memory holes/overlapping regions (BIOSes
never seemed to get it right), especially on boxes with >4GB of memory
(which became more and more common with the introduction of x86_64).
Cheers,
Steffen Persvold
Technical Director Americas
tel. 508-281-7100 x401
fax. 508-281-7171
http://www.scali.com/
Higher Performance Computing