[Beowulf] automount on high ports
Perry E. Metzger
perry at piermont.com
Wed Jul 2 07:26:13 PDT 2008
Skip to the bottom for advice on how to make NFS use only non-privileged
ports. My guess is still that it isn't privileged ports that are causing
the trouble, but I describe at the bottom what you need to do to get
rid of that issue entirely. I'd advise reading the rest, but the part
about how to disable the privileged ports stuff is after the --- near
the bottom.
Carsten Aulbert <carsten.aulbert at aei.mpg.de> writes:
> Well, I understand your reasoning, but that contradicts what we actually see
>
> netstat -an|awk '/2049/ {print $4}'|sed 's/10.10.13.41://'|sort -n
>
> shows us the following:
Are those all mounts to ONE HOST? Because if they are, you're going to
run out of ports. If you're connecting to multiple hosts you should
be okay, but you certainly could run out of ports between two
hosts -- you only have 1023 privileged ports, and thus at most 1023 such
connections, from a given host to a single port on another box.
Of course, one might validly ask why the other 650-odd ports aren't
usable -- clearly they should be, right? The limit is 1023, not
358. It might be that there is some Linux oddness here.
Anyway, this shouldn't be a problem if you're connecting to MANY
servers, but maybe there's some Linux weirdness here. See below.
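One quick way to see whether those connections all go to a single
server or are spread across many is something along these lines (just
a sketch; it counts NFS connections per remote address):

    netstat -an | awk '$5 ~ /:2049$/ {print $5}' | sort | uniq -c

Lots of distinct remote addresses would mean the two-host privileged
port limit above isn't the whole story.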
> Which corresponds exactly to the maximum of 358 mounts achievable right
> now. Besides, I'm far from being an expert on TCP/IP, but is it possible
> for a local process to bind to a port which is already in use but to
> another host?
Of course! You can use the same local port number with connections to
different remote hosts. You can even use the same local port number
with multiple connections to the same remote host provided the remote
host is using different port numbers on its end.
Every open connection is identified by a 4-tuple: localip:localport:remoteip:remoteport.
Provided two sockets don't share that entire 4-tuple, you can have both.
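For instance (the addresses here are made up, in your 10.10.13.x
range), these two connections can coexist even though they share the
same local port, because their 4-tuples differ in the remote address:

    10.10.13.41:723 -> 10.10.13.1:2049
    10.10.13.41:723 -> 10.10.13.2:2049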
Now, a given OS may screw up how it handles this, but the *protocol*
certainly permits it. Perhaps you're right and Linux isn't dealing
with this gracefully. We can check that.
> I don't think so, but I may be wrong.
Then how does an SMTP server handle thousands of simultaneous
connections all coming to port 25? :)
In any case, this is what the NFS FAQ says. It does mention the priv
port problem, but only in a context that makes me think it is
talking about two given hosts and not one client and many
hosts. However, I might be wrong. See below:
From http://nfs.sourceforge.net/
B3. Why can't I mount more than 255 NFS file systems on my client?
Why is it sometimes even less than 255?
A. On Linux, each mounted file system is assigned a major number,
which indicates what file system type it is (e.g. ext3, nfs,
isofs); and a minor number, which makes it unique among the file
systems of the same type. In kernels prior to 2.6, Linux major and
minor numbers have only 8 bits, so they may range numerically from
zero to 255. Because a minor number has only 8 bits, a system can
mount only 255 file systems of the same type. So a system can
mount up to 255 NFS file systems, another 255 ext3 file systems,
255 more isofs file systems, and so on. Kernels after 2.6 have
20-bit wide minor numbers, which alleviates this restriction.
For the Linux NFS client, however, the problem is somewhat worse
because it is an anonymous file system. Local disk-based file
systems have a block device associated with them, but anonymous
file systems do not. /proc, for example, is an anonymous file
system, and so are other network file systems like AFS. All
anonymous file systems share the same major number, so there can
be a maximum of only 255 anonymous file systems mounted on a
single host.
Usually you won't need more than ten or twenty total NFS mounts on
any given client. In some large enterprises, though, your work and
users might be spread across hundreds of NFS file servers. To work
around the limitation on the number of NFS file systems you can
mount on a single host, we recommend that you set up and run one
of the automounter daemons for Linux. An automounter finds and
mounts file systems as they are needed, and unmounts any that it
finds are inactive. You can find more information on Linux
automounters here.
You may also run into a limit on the number of privileged network
ports on your system. The NFS client uses a unique socket with its
own port number for each NFS mount point. Using an automounter
helps address the limited number of available ports by
automatically unmounting file systems that are not in use, thus
freeing their network ports. NFS version 4 support in the Linux
NFS client uses a single socket per client-server pair, which also
helps increase the allowable number of NFS mount points on a
client.
Now, until you brought this up, I would have guessed that this meant
you could run out of priv ports between host A and host B -- i.e. host
B is the client, connecting to one port on host A, trying to mount
more than 1023 file systems from host A, and failing because it runs
out of priv ports. However, if your test is not between two hosts but
is rather between multiple hosts, perhaps for whatever reason Linux is
braindead and is not allowing you to re-use the same local socket
ports. We can diagnose that later.
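When we do get to diagnosing it, something like this would show whether
any local port is currently being reused across your NFS connections
(again just a sketch; empty output means no local port is reused):

    netstat -an | awk '$5 ~ /:2049$/ {split($4, l, ":"); print l[2]}' | sort | uniq -d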
---
So, here are the things you need to do to take privileged ports out
of the picture entirely:
1) On the server, in your exports file you have to put the "insecure"
option on every exported file system. Otherwise mountd will demand
that the remote side connect from a "secure" (privileged) port.
You've already done this according to the initial mail message.
However, that only tells the server not to care if the client comes
in from a port at or above 1024. (An example exports line is
sketched after step 2 below.)
2) The client side is where the action is -- the client picks the port
it opens, after all. Unfortunately, Linux DOES NOT have a mount
option to request a non-privileged source port. BSD, Solaris, etc.
do, but not Linux. You need to hack the source to make it happen.
On a reasonably current source tree, go to:
/usr/src/linux/fs/nfs/mount_clnt.c
and look for the argument structure being built for rpc_create. You
need to OR RPC_CLNT_CREATE_NONPRIVPORT into the .flags member. For
example (depending on your version; this is from 2.6.24), change:
    .flags = RPC_CLNT_CREATE_INTR,
to:
    .flags = RPC_CLNT_CREATE_INTR | RPC_CLNT_CREATE_NONPRIVPORT,
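For context, the argument structure you're editing looks roughly like
this in the 2.6.24 tree -- I'm sketching it from memory, so check the
exact field names and values against your own source before patching:

    struct rpc_create_args args = {
            .protocol       = protocol,
            .address        = addr,
            .addrsize       = len,
            .servername     = hostname,
            .program        = &mnt_program,
            .version        = version,
            .authflavor     = RPC_AUTH_UNIX,
            /* OR in RPC_CLNT_CREATE_NONPRIVPORT so the client
               binds a non-privileged local port */
            .flags          = RPC_CLNT_CREATE_INTR |
                              RPC_CLNT_CREATE_NONPRIVPORT,
    };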
This is a bloody ugly hack that will make ALL connections use
non-privileged ports, so you might have trouble with "normal" mounts.
This can be done more cleanly, but it would require more than a
one-line patch. However, it would get you through testing. If it works
for you and you really need it, a clean mount option could be added.
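To make step 1 concrete, an exports entry with the "insecure" option
might look something like this -- the path and network below are just
placeholders for whatever you actually export:

    /export/data    10.10.13.0/24(rw,insecure)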
My guess is that this is not your problem! However, you can check and
see if I'm wrong, and if I am, then we can move on to fixing it better.
Perry
--
Perry E. Metzger perry at piermont.com