[Beowulf] how fast can NFS run?
Joe Landman
landman at scalableinformatics.com
Tue Jan 31 20:56:40 PST 2006
Hi Bruce
Bruce Allen wrote:
> I'd like to know the fastest that anyone has seen an NFS server run,
> over either a 10Gb/s ethernet link or a handful of link aggregated
> (channel-bonded) Gb/s ethernet lines.
If you allow us to go into the world of NFS-alike things, the Panasas
file system and server hit about 2 GB/s in some testing we did more
than a year ago.
We ran the same problem with NFS on the same hardware (different code
paths/file system name space) and it was suffering along at about 300 MB/s.
>
> This would be with a small number of clients making large file
> sequential reads from the same NFS host/server. Please assume that the
> NFS server has 'infinitely fast' disks.
This was ~32 compute nodes talking over a gigabit switch of some sort
(Nortel I think).
> I am told by one vendor that "NFS can't run faster than 100MB/sec". I
Hmmmm....
Maybe theirs can't ...
... or they are trying to sell you something ... :)
> don't understand or believe this. If the server's local disks can
> read/write at 300MB/s and the networking can run substantially faster
> than 100 MB/s, I don't see any constraint to faster operation. But
> perhaps someone on this list can provide real-world data (or say why it
> can't work).
... OK, there are a number of different issues going on here:
a) The 300 MB/s (SATA II, right?) is the maximum theoretical interface
speed. You will get something close to it only for pure
buffer-to-memory transactions in specialized cases. Normally you will
see 50-70 MB/s from these disks for large-block sequential reads.
SATA also does a fair bit of interrupting... you need a *good* SATA
controller, or you will see your interrupt rate go up 10x under heavy
disk load. Software RAID will increase this a bit as well. (There is
a quick timing sketch after point b below if you want to measure it.)
b) If this is gigabit, you get about 110 MB/s max in best-case
scenarios, with the wind at your packets' backs, a nice gravitational
potential, and a good switch to direct packets by. If this is IB, you
should be able to see quite a bit more, though your PCI bus is going
to limit you. PCI-e is better (and HTX is *awesome*).
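If you want to see what your disks actually deliver, a quick Python
sketch along these lines (the device path and sizes are just
examples, and it needs root) will show the large-block sequential
read rate:

import os, time

DEV = "/dev/sda"     # example device; point this at the disk you care about
BLOCK = 1 << 20      # 1 MB reads, i.e. large-block sequential
TOTAL = 1 << 30      # read 1 GB worth and time it

fd = os.open(DEV, os.O_RDONLY)
start = time.time()
done = 0
while done < TOTAL:
    buf = os.read(fd, BLOCK)
    if not buf:
        break
    done += len(buf)
os.close(fd)
print("%.1f MB/s large-block sequential read" % (done / (time.time() - start) / 1e6))

Run it on an otherwise idle box; anything already sitting in cache
will skew the number high.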
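On the network side, the ~110 MB/s figure falls right out of the
header overhead. A quick back-of-envelope, assuming standard Ethernet
framing and plain TCP (no jumbo frames):

line_rate = 1e9 / 8          # gigabit line rate in bytes/s = 125 MB/s raw
wire_oh = 14 + 4 + 8 + 12    # Ethernet header + FCS + preamble + inter-frame gap
ip_tcp = 20 + 20             # IPv4 + TCP headers come out of the 1500 byte MTU
mtu = 1500
goodput = line_rate * (mtu - ip_tcp) / (mtu + wire_oh)
print("best-case TCP payload: %.0f MB/s" % (goodput / 1e6))
# ~119 MB/s; RPC/NFS headers and real-world stacks pull you toward 110 MB/s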
> Note: I am free to use modern versions of the NFS protocol, jumbo
> frames, large rsize/wsize, etc.
We had some issues about a year ago (not revisited recently) with
RHEL3, jumbo frames, and Broadcom gigabit adapters (the tg3 driver
was flaky; bcm5700 was much more stable and faster). We reported it
to RH, whose response at the time was basically "go away". It wasn't
an issue on the same hardware using other distros.
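The reason jumbo frames are worth the driver pain is simple
arithmetic: bigger frames mean fewer packets (and interrupts) per
second and a little less header overhead. Rough numbers, assuming
plain TCP over gigabit:

line_rate = 1e9 / 8        # gigabit in bytes/s
wire_oh = 14 + 4 + 8 + 12  # per-frame Ethernet overhead on the wire
ip_tcp = 20 + 20           # IPv4 + TCP headers inside the MTU
for mtu in (1500, 9000):
    frame = mtu + wire_oh
    pps = line_rate / frame
    goodput = line_rate * (mtu - ip_tcp) / frame
    print("MTU %4d: ~%.0f frames/s, ~%.1f MB/s TCP payload" % (mtu, pps, goodput / 1e6))

The payload gain is modest; the real win is the roughly 6x drop in
packet (and interrupt) rate on the server.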
With NFS, you are moving through a protocol stack (NFS) as well as a
transport stack (TCP/IP). This is not cheap. However, there can be a
number of reasons why NFS appears slow for you or your vendor.
FWIW, we have customers with units we have built that happily sustain
200-400 MB/s over NFS without complaint, over gigabit (multiple
simultaneous clients hammering on the server). There are multiple
problems to overcome to get this working correctly and efficiently.
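The tests behind that number are nothing exotic: a per-client timer
along the lines of the sketch below (the mount point and file name
are placeholders), run simultaneously on several nodes, with the
per-client rates summed.

import sys, time

BLOCK = 1 << 20    # 1 MB reads
path = sys.argv[1] if len(sys.argv) > 1 else "/mnt/nfs/testfile"   # placeholder

start = time.time()
total = 0
f = open(path, "rb", 0)    # unbuffered file object
while True:
    buf = f.read(BLOCK)
    if not buf:
        break
    total += len(buf)
f.close()
print("%s: %.0f MB/s" % (path, total / (time.time() - start) / 1e6))

Use files bigger than client RAM (or drop the caches first), or the
client-side page cache will flatter the numbers.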
<speculation>
From what I can see on a 4-way system, I think it could support at
most about 2 GB/s of disk I/O (DMA access to RAM) per CPU connected
to an I/O channel (most 4-ways have a single CPU connected to their
I/O channels). The protocol is not cheap, and the processing overhead
could easily pare this down to 600-900 MB/s over a fast enough
network fabric. With some tweaking and tuning, you might be able to
get this going a little faster. You would need to speak to the IB
folks or the 10 GbE folks to see what they are really seeing. 1 GB/s
per adapter (10 GbE) is doable over PCIe/HTX (if there were HTX cards
for it). If they have RDMA and TCP offload capability, you will
likely get a win and somewhat better performance.
</speculation>
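To put rough numbers on that (and these are guesses, not
measurements): take the ~2 GB/s of DMA-able I/O per CPU and knock off
what the NFS/RPC/TCP processing eats:

dma_budget = 2.0e9               # assumed ~2 GB/s of disk DMA per CPU I/O channel
for efficiency in (0.30, 0.45):  # guessed fraction surviving protocol processing
    print("NFS-visible: ~%.0f MB/s" % (dma_budget * efficiency / 1e6))
# -> roughly 600-900 MB/s, which is where the estimate above comes from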
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615