NFS Performance (was Re: [Beowulf] GPFS on Linux (x86))
Joe Landman
landman at scalableinformatics.com
Fri Sep 15 09:39:52 PDT 2006
Michael Will wrote:
> We are not using jumbo packets (they are not on by default, are they?)
Do an
    ifconfig | grep -i mtu
and see if anything other than lo (the loopback interface, which normally
reports a much larger MTU) is above 1500.
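On a 2.6 kernel the same check can be scripted against sysfs instead of grepping ifconfig output. This is a minimal sketch that assumes Linux's /sys/class/net/<iface>/mtu files; the function name jumbo_ifaces is made up for illustration, and it skips lo on purpose.

```shell
#!/bin/sh
# Print "<iface> <mtu>" for every interface whose MTU is above the standard
# 1500-byte Ethernet size, i.e. jumbo frames are enabled.
# Reads Linux sysfs; the optional directory argument only exists so the
# function can be pointed at a scratch copy for testing.
jumbo_ifaces() {
    netdir="${1:-/sys/class/net}"
    for f in "$netdir"/*/mtu; do
        [ -r "$f" ] || continue               # glob matched nothing
        iface=$(basename "$(dirname "$f")")
        [ "$iface" = lo ] && continue         # loopback MTU is always large
        mtu=$(cat "$f")
        if [ "$mtu" -gt 1500 ]; then
            echo "$iface $mtu"
        fi
    done
}

# Usage: jumbo_ifaces    (prints nothing if no interface uses jumbo frames)
```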
> and I have tried both UDP and TCP mounts, same symptom. The NICs tested
> were SysKonnect as well as tg3, and they behaved identically.
We had Intel NICs that worked great (on the server), and Broadcom (tg3)
on the client. When I moved the client to an Intel NIC, the problem went
away (kind of expensive for a cluster if you have to buy the cards and
install them).
When we moved to the bcm5700 driver and forced TCP mounts and the "normal"
1500-byte MTU, the problems went away. I seem to remember trying a
SysKonnect card using the sk98lin module on the clients, and it worked
fine, but I had pulled down the new version of the driver for some reason.
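For reference, a sketch of the client-side settings described above (TCP mounts, standard 1500-byte MTU). The server name nfsserver, the export /export/home, the mount point /home and the interface eth0 are hypothetical placeholders; the commands would be run as root on each compute node.

```shell
# /etc/fstab entry (hypothetical server and paths): force NFS over TCP
# nfsserver:/export/home  /home  nfs  tcp  0 0

# Equivalent one-off mount on the client:
mount -t nfs -o tcp nfsserver:/export/home /home

# Put the client NIC back to the standard 1500-byte MTU (no jumbo frames):
ifconfig eth0 mtu 1500
```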
You might also be filtering RPC and portmap traffic (portmap listens on
port 111), or the "smart" switch could be doing some of that filtering as well.
What does rpcinfo report? Here is what a CentOS 4.3 client sees when
querying a SuSE 10.1 server:
[root@crunch-r ~]# rpcinfo -p dualcore
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32768  status
    100021    1   udp  32768  nlockmgr
    100021    3   udp  32768  nlockmgr
    100021    4   udp  32768  nlockmgr
    100024    1   tcp  53220  status
    100021    1   tcp  53220  nlockmgr
    100021    3   tcp  53220  nlockmgr
    100021    4   tcp  53220  nlockmgr
[root@crunch-r ~]#
I am not exporting any file systems right now, and the listing above has
no nfs (100003) or mountd (100005) entries, so showmount fails:
[root@crunch-r ~]# showmount -e dualcore
mount clntudp_create: RPC: Program not registered
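That check can be automated. The sketch below reads `rpcinfo -p` output on stdin and prints which NFS server programs are missing. It assumes the standard ONC RPC program numbers 100003 (nfs) and 100005 (mountd); the function name missing_nfs_programs is hypothetical.

```shell
#!/bin/sh
# Report NFS-server RPC programs absent from `rpcinfo -p` output (read on stdin).
# 100003 = nfs, 100005 = mountd. mount and showmount both need mountd
# registered with the portmapper, or they fail as shown above.
missing_nfs_programs() {
    # Skip the header line and keep only the program-number column.
    progs=$(awk 'NR > 1 { print $1 }')
    for p in 100003:nfs 100005:mountd; do
        num=${p%%:*}     # program number, e.g. 100003
        name=${p#*:}     # human-readable name, e.g. nfs
        if ! printf '%s\n' "$progs" | grep -qx "$num"; then
            echo "$name"
        fi
    done
}

# Usage: rpcinfo -p dualcore | missing_nfs_programs
```

Against the listing above this would report both nfs and mountd as missing, matching the showmount error.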
>
> -----Original Message-----
> From: Joe Landman [mailto:landman at scalableinformatics.com]
> Sent: Friday, September 15, 2006 9:28 AM
> To: Michael Will
> Cc: Brent Franks; Chris Samuel; beowulf at beowulf.org
> Subject: Re: NFS Performance (was Re: [Beowulf] GPFS on Linux (x86))
>
> Michael Will wrote:
>> I am puzzled by an sles9sp3 (2.6.9 kernel) NFS server that serves RHEL3
>> (2.4.21 kernel) compute nodes. For some reason the mounts often fail
>> (with default as well as modified parameters). The symptom is
>> "mount: rpc timeout". The server logs all authentication requests as
>> successful. The switch is an oversubscribed HP 4108gl.
>
> Yes, this is what we ran into last year: a SuSE box serving a Rocks 4.0
> cluster. Basic idea: use TCP mounts and turn off jumbo packets. Also, we
> had major issues with the tg3 driver, and moved the RHEL units to the
> BCM5700 driver. After this, most of the problems went away.
>
>
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615