[Beowulf] automount on high ports

Perry E. Metzger perry at piermont.com
Wed Jul 2 05:28:48 PDT 2008


Carsten Aulbert <carsten.aulbert at aei.mpg.de> writes:
>> The clients are connecting from ports below 1024 because Berkeley set
>> up a hack in the original BSD stack so that only root could open ports
>> below 1024. This way, you could "know" the process on the remote host
>> was a root process, thus you could feel "secure" [sic]. It doesn't add
>> any real security any more, but it is also not the cause of any
>> problem you are experiencing.
>
> We might run out of "secure" ports.

A given client would need to hold more than roughly a thousand
simultaneous connections to a given server's NFS port for that to be a
problem. This is not going to happen. The protocol doesn't work in
such a way as to cause that to occur.

>> We can help you figure this out, but you will have to give a lot more
>> detail about the problem. Please describe your network setup. How many
>> servers do you have? How many clients? How many file systems are those
>> servers exporting? How many is a typical client mounting, and why?
>> Start there and we can try to move forward.
>
> OK, we have 1342 nodes which act as servers as well as clients. Every
> node exports a single local directory and all other nodes can mount this.

Okay. In this instance, you're not going to run out of ports. Every
machine might get 1341 connections from clients, and every machine
might make 1341 client connections going out to other machines. None
of this should cause you to run out of ports, period. If you don't
understand that, refer back to my original message. A TCP socket is a
unique 4-tuple. The host:port 2-tuples are NOT unique and not an
exhaustible resource. There is no way that your case is going to
even remotely exhaust the 4-tuple space.
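To make the 4-tuple arithmetic concrete, here is a small sketch (the
node names, the source port, and the one-connection-per-pair pattern
are illustrative assumptions; 2049 is the standard NFS port):

```python
# A TCP connection is identified by the full 4-tuple
# (local_host, local_port, remote_host, remote_port).  Even if one
# client reuses the same single "secure" source port for every server,
# the connections never collide, because the remote host differs.
conns = set()
n_nodes = 1342  # cluster size from the thread

for server in range(1, n_nodes):
    # hypothetical: node0 connects to every other node's NFS port
    # from local source port 1000
    tup = ("node0", 1000, "node%d" % server, 2049)
    assert tup not in conns  # no collision: remote host differs
    conns.add(tup)

print(len(conns))  # 1341 distinct connections, one local port
```

In practice each local socket does consume its own source port on the
client, but even then one client needs well over a thousand concurrent
server connections before the sub-1024 range is exhausted.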

> What we do now to optimize the available bandwidth and IOs is spread
> millions of files according to a hash algorithm to all nodes (multiple
> copies as well) and then run a few 1000 jobs opening one file from one
> box then one file from the other box and so on. With a short autofs
> timeout that ought to work.

I think there is no point in having a short autofs timeout here, and
it is likely to radically increase your overhead: every expired mount
has to be re-established before the next file on that node can be
opened.
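For reference, the expiry timeout is typically set per map in the
auto.master file; the mount point and map name below are hypothetical,
not taken from the poster's setup:

```
# /etc/auto.master (hypothetical entry)
# A timeout of a few seconds forces constant mount/umount churn under
# the access pattern described; a value on the order of minutes lets
# repeated opens reuse an existing mount.
/data  /etc/auto.data  --timeout=300
```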

> Our tests so far have shown that sometimes a node keeps a few mounts
> open (autofs4 problems AFAIK) and at some point is not able to mount
> more shares. Usually this occurs at about 350 mounts and we are not yet
> 100% sure if we are running out of secure ports.

You probably aren't running out of ports per se. You may be running
out of some other OS resource, such as file descriptors or a similar
kernel limit.
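A quick way to rule file descriptors in or out is to check the limits
before blaming ports; a minimal diagnostic sketch (Linux-specific
/proc path, stdlib only):

```python
# Check per-process and system-wide file-descriptor limits, which are
# a plausible resource to exhaust with hundreds of concurrent mounts.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("per-process fd limit: soft=%s hard=%s" % (soft, hard))

# System-wide view; /proc/sys/fs/file-nr holds three fields:
# allocated, unused-but-allocated, and the system maximum.
try:
    with open("/proc/sys/fs/file-nr") as f:
        allocated, unused, maximum = f.read().split()
    print("system fds allocated: %s of %s" % (allocated, maximum))
except FileNotFoundError:
    pass  # not a Linux /proc layout
```

If the failures at ~350 mounts track one of these limits, raising it
(ulimit -n, or the corresponding sysctl) is the fix rather than
anything port-related.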


-- 
Perry E. Metzger		perry at piermont.com


