[Beowulf] automount on high ports

Carsten Aulbert carsten.aulbert at aei.mpg.de
Wed Jul 2 00:26:58 PDT 2008


Hi Perry,

Perry E. Metzger wrote:

> 
> All NFS clients are connecting to a single port, not to a different
> port for every NFS export. You do not need 1400 listening TCP ports on
> a server to export 1400 different file systems. Only one port is
> needed, whether you are exporting one file system or one million, just
> as only one SMTP port is needed whether you are receiving mail from
> one client or from one million.
> 
That's clear, and it is not the problem.

> The clients are connecting from ports below 1024 because Berkeley set
> up a hack in the original BSD stack so that only root could open ports
> below 1024. This way, you could "know" the process on the remote host
> was a root process, thus you could feel "secure" [sic]. It doesn't add
> any real security any more, but it is also not the cause of any
> problem you are experiencing.

We might run out of "secure" ports.

> We can help you figure this out, but you will have to give a lot more
> detail about the problem. Please describe your network setup. How many
> servers do you have? How many clients? How many file systems are those
> servers exporting? How many is a typical client mounting, and why?
> Start there and we can try to move forward.
> 

OK, we have 1342 nodes which act as servers as well as clients. Every
node exports a single local directory and all other nodes can mount this.

What we do now to optimize the available bandwidth and I/O is spread
millions of files across all nodes according to a hash algorithm (with
multiple copies as well) and then run a few thousand jobs, each opening
one file from one box, then one file from another box, and so on. With a
short autofs timeout that ought to work. Typically a single process
opens about 10-15 files per second, i.e. makes 10-15 mounts per second.
With 4 parallel processes per node that's 40-60 mounts/second. With a
timeout of 5 seconds we should have roughly 200-300 concurrent mounts
(on average, no idea about the variance).
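The estimate above is just arrival rate times residence time (Little's
law). A minimal sketch of that arithmetic follows. The function and
parameter names are hypothetical, and it assumes every mount lives for
exactly the autofs timeout and ignores the time a file stays open:

```python
def concurrent_mounts(mounts_per_proc_per_s, procs_per_node, timeout_s):
    """Average number of live mounts = arrival rate * residence time.

    Assumption (not from the original post): each mount expires exactly
    timeout_s seconds after it is made.
    """
    rate_per_node = mounts_per_proc_per_s * procs_per_node  # mounts/second
    return rate_per_node * timeout_s

# Figures quoted in the mail: 10-15 mounts/s per process, 4 processes, 5 s timeout.
low = concurrent_mounts(10, 4, 5)
high = concurrent_mounts(15, 4, 5)
print(low, high)  # 200 300
```

This only gives the mean; bursts, or mounts that autofs fails to expire,
would push the real number higher.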

Our tests so far have shown that sometimes a node keeps a few mounts
open (autofs4 problems AFAIK) and at some point is not able to mount
more shares. Usually this occurs at about 350 mounts and we are not yet
100% sure if we are running out of secure ports.
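One way to check whether reserved ports are really the limit would be to
count how many distinct local ports below 1024 a client has in use
towards NFS servers. The sketch below parses text in the format of
/proc/net/tcp. It assumes NFS over TCP on the standard port 2049, and the
function name and sample data are made up for illustration. The range
the kernel's RPC client actually draws from may be narrower than 1-1023,
so treat the cutoff as an assumption too:

```python
def count_reserved_local_ports(proc_net_tcp_text, remote_port=2049, limit=1024):
    """Count distinct local ports below `limit` connected to `remote_port`.

    Expects text laid out like /proc/net/tcp: one header line, then rows
    whose 2nd and 3rd fields are HEXADDR:HEXPORT for local and remote ends.
    """
    used = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        rem_port = int(fields[2].rsplit(":", 1)[1], 16)
        if rem_port == remote_port and local_port < limit:
            used.add(local_port)
    return len(used)

# Made-up sample: two NFS connections from reserved ports (1023, 1022),
# one from a high port (50000), and one unrelated ssh connection.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue ...
   0: 0A00000A:03FF 0A00000B:0801 01 ...
   1: 0A00000A:03FE 0A00000C:0801 01 ...
   2: 0A00000A:C350 0A00000D:0801 01 ...
   3: 0A00000A:0016 0A00000E:D431 01 ...
"""
print(count_reserved_local_ports(SAMPLE))  # 2
```

On a real node the input would be something like
open("/proc/net/tcp").read(). If the count plateaus at the point where
new mounts start failing, that would support the port-exhaustion theory.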

All our boxes now export with the "insecure" option (NFSv3), but our
clients still all connect from a "secure" port. Can anyone here give us
a hint on how to force the clients to use non-reserved (high) ports in
Linux?

Thanks a lot

Carsten



