[Beowulf] Performance tuning for Jumbo Frames
Patrick Geoffray patrick at myri.com
Sat Dec 12 08:40:49 PST 2009
Rahul,

Rahul Nabar wrote:
> I have seen a considerable performance boost for my codes by using
> Jumbo Frames. But are there any systematic tools or strategies to
> select the optimum MTU size?

There is no optimal MTU size. The MTU is the maximum payload you can fit in one packet, so there is no drawback to a bigger MTU. Actually, there is one in terms of wormhole switching, but switch contention is an issue happily ignored by most HPC users.

> external world required of the interfaces) Have you guys found
> performance to be MTU sensitive?

A large MTU means fewer packets for the same amount of data transferred. In all stack processing, there is a per-packet overhead (decoding header, integrity, sequence number, etc) and a per-byte overhead (copy). A large MTU reduces the total per-packet overhead because there are fewer packets to process. Most 10GE NICs have no problem reaching line rate at 1500 Bytes (the standard Ethernet MTU); the problem is the host OS stack (mainly TCP), where the per-packet overhead is important.

One trick that all 10GE NICs worth their salt are doing these days is to fake a large MTU at the OS level, while keeping the wire MTU at 1500 Bytes (for compatibility). This is called TSO (TCP Segmentation Offload) on the send side and LRO (Large Receive Offload) on the receive side. The OS stack is using a virtual MTU of 64K and the NIC does segmentation/reassembly in hardware, sort of.

> Also, are there any switch side parameters that can affect the
> performance of HPC codes? Specifically I was trying to run VASP which
> is known to be latency sensitive.

A large MTU has little to no impact on latency.

> I have a 10 Gig E network with a
> RDMA offload card and am getting average latencies (ping pong) using
> rping of around 14 microsecs in the MPI tests.

It is most likely due to the switch. Try back-to-back to measure without it. I don't know what hardware you are using, but you can get close to 10us latency over TCP with a standard 10GE NIC and interrupt coalescing disabled. With a NIC supporting OS-bypass (RDMA only makes sense for bandwidth), you should get at least half that, ideally below 3us.

Patrick
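
To put rough numbers on the per-packet versus per-byte argument above, here is a minimal Python sketch. It is only an illustration, not a measurement tool: the 40-byte header figure assumes IPv4 and TCP with no options, and the names packets_needed and stack_cost_us as well as the cost parameters are made up for this example.

```python
import math

# Assumption: 20-byte IPv4 header + 20-byte TCP header, no options.
IP_TCP_HEADERS = 40


def packets_needed(payload_bytes, mtu):
    """Number of TCP segments needed to carry payload_bytes at a given MTU."""
    mss = mtu - IP_TCP_HEADERS  # payload bytes that fit in one packet
    return math.ceil(payload_bytes / mss)


def stack_cost_us(payload_bytes, mtu, per_packet_us, per_byte_us):
    """Toy cost model: a fixed cost per packet plus a copy cost per byte.

    per_packet_us and per_byte_us are hypothetical parameters; real values
    depend on the host, the OS stack and the NIC.
    """
    packets = packets_needed(payload_bytes, mtu)
    return packets * per_packet_us + payload_bytes * per_byte_us


if __name__ == "__main__":
    payload = 1_000_000  # 1 MB of application data
    for mtu in (1500, 9000):
        print(mtu, packets_needed(payload, mtu))
    # prints:
    # 1500 685
    # 9000 112
```

The per-byte term is the same at both MTUs, so in this model only the per-packet term shrinks with jumbo frames, which is why the gain shows up in the host stack rather than on the wire.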
