[Beowulf] Fwd: NIS limitations question
Donald Becker
becker at scyld.com
Mon Feb 6 14:36:51 PST 2006
On Sun, 5 Feb 2006, Walid wrote:
> I believe I have seen on this mailing list*, and other internet forums**, some
> limitations of NIS, but I have failed to find a documented limitation from
> Sun, or from the various Linux distributions. Did anyone try to research
> the scalability of NIS servers?
The scalability depends on many details of your environment, and the
timing of the requests.
Remember that NIS was designed for a workstation environment, where
human-rate requests generate asynchronous events. It wasn't designed for
a cluster environment where a single application generates queries from
every node simultaneously, and where the system state (e.g. number of
nodes) might change frequently and new applications expect the state to be
current.
There are ways to tune NIS (increase the backlog) to minimize the
observable problems. But that doesn't fix them, it only makes them
less obvious for the current cluster scale and application set.
> The reason I am asking: on a 256-node cluster using GigE with two Linux NIS
> slaves we do see lots of RPC timeouts. The moment we added an extra slave
> we have not experienced many, but on the other hand our Solaris NIS slaves
> handle triple the number of clients, and users have not reported problems.
>
> So my question: in these big clusters that have 256 nodes and more, what do
> people use for host and name lookups? How many NIS slaves, if any, do
> they deploy? And does anyone know how many concurrent connections an NIS
> server can handle?
We developed a cluster-specific name service / directory service called
BeoNSS. It uses knowledge about the cluster structure to cache, compute,
or avoid name lookups. Some examples:
Host map
We number cluster compute nodes sequentially starting at '0', and
map them to sequential IP addresses.
We then use names based on these numbers: node 23 is named
".23", with aliases "cluster.23", "23.cluster", "23.cluster0", and
"<prefix>23". BeoNSS knows these formats and returns the address
calculated from the known IP address of node 0 and other info (node
count, netmask, preferred interface, cluster name).
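Roughly, the computed lookup works like the sketch below. This is not
the actual BeoNSS code; the base address, cluster name, and name
patterns are made up for illustration:

    import ipaddress
    import re

    # Hypothetical address of node 0; BeoNSS derives this from the cluster
    # configuration (node count, netmask, preferred interface).
    NODE0_ADDR = ipaddress.IPv4Address("10.0.0.100")
    CLUSTER = "cluster"

    # Accept the name forms described above: ".23", "cluster.23", "23.cluster"
    PATTERNS = [
        re.compile(r"^\.(\d+)$"),
        re.compile(r"^%s\.(\d+)$" % CLUSTER),
        re.compile(r"^(\d+)\.%s0?$" % CLUSTER),
    ]

    def node_address(name):
        """Return the computed IP for a node name, or None (soft fail)."""
        for pat in PATTERNS:
            m = pat.match(name)
            if m:
                return NODE0_ADDR + int(m.group(1))
        return None   # "don't know, ask the next service on the list"

    print(node_address("cluster.23"))   # 10.0.0.123
    print(node_address("fileserver"))   # None -> fall back to NIS/DNS

The point is that node 23's address is arithmetic, not a map lookup, so
it costs nothing on the network and is always consistent with the
cluster configuration.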
Netgroup map
Netgroups are used for file server exports and security.
We use much the same approach to generate the list of compute node
names in the cluster.
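In the same spirit, a netgroup entry for the compute nodes can be
generated from the node count instead of being stored. A rough sketch
(the group and host names here are hypothetical; the (host,user,domain)
triples are standard netgroup syntax):

    def compute_netgroup(node_count, prefix="cluster"):
        """Generate a netgroup naming every compute node, e.g. for NFS exports."""
        triples = " ".join("(%s.%d,,)" % (prefix, n) for n in range(node_count))
        return "compute " + triples

    print(compute_netgroup(4))
    # compute (cluster.0,,) (cluster.1,,) (cluster.2,,) (cluster.3,,)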
Password and group
We send credentials out with each job, so that the process has a
preserved passwd and group entry. BeoNSS uses that information to
generate a getpwent() entry for the user and a synthetic entry for
"root". (Note that this approach automatically handles disjoint user
sets from multiple masters, and is one element of highly secure
servers, since the process doesn't have access to the list of
other users.)
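As a rough illustration of the passwd idea (the field layout is
standard passwd(5) syntax, but the capture-and-ship mechanism shown
here is invented for the sketch, not how BeoNSS actually packages
credentials):

    import os, pwd

    def synthetic_passwd(cred):
        """Build the only two passwd(5) lines a compute node needs to see."""
        root = "root:x:0:0:root:/root:/bin/sh"
        user = "%s:x:%d:%d:%s:%s:%s" % (cred["name"], cred["uid"], cred["gid"],
                                        cred["gecos"], cred["home"], cred["shell"])
        return root + "\n" + user

    # On the master: capture the submitting user's entry to ship with the job.
    p = pwd.getpwuid(os.getuid())
    cred = dict(name=p.pw_name, uid=p.pw_uid, gid=p.pw_gid,
                gecos=p.pw_gecos, home=p.pw_dir, shell=p.pw_shell)
    print(synthetic_passwd(cred))

The node-side getpwent()/getpwuid() answers then come from the shipped
entry, so no passwd map is consulted at all.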
These are not the only name services that BeoNSS provides, but they are
good examples of how a cluster-specific name service can make the cluster
faster, easier to scale and more consistent.
BeoNSS works with other name services. If a cluster requires other name
services, it's easy to configure them as fall-back services. This
works very well, since BeoNSS handles the really troublesome queries (an
application generating an all-to-all IP address map on each node
simultaneously, or libc looking up a user name at start-up from a 10,000
entry passwd map), while taking a negligible amount of time to return a
soft fail ("don't know, ask the next service on the list").
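To make the fall-back behavior concrete, here is a toy version of the
lookup order (real BeoNSS plugs into the libc name service switch
instead; the address and name pattern are made up):

    import ipaddress, re, socket

    BASE = ipaddress.IPv4Address("10.0.0.100")   # hypothetical node-0 address

    def lookup(name):
        """Answer computed node names locally; soft-fail to the next service."""
        m = re.match(r"^cluster\.(\d+)$", name)
        if m:
            return str(BASE + int(m.group(1)))   # no network traffic at all
        # Soft fail: defer to whatever the system is configured to ask next
        # (files, NIS, DNS, ...).
        return socket.gethostbyname(name)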
There are other approaches that clusters have used:
The most obvious is copying out files to each node's /etc/. This has the usual
problems of consistency and synchronization. You might think that you'll remember
to push out new copies with each update. But what about machines that are
down? Or booting? Or up but not responding right now?
I've seen systems that use NSCD, the Name Service Caching Daemon.
It's another "it seems to work for me, at least today" solution. Like
most caching systems, it reduces traffic in the common case. But it
doesn't handle update consistency, and won't handle the start-up backlog
and dropped-request problem.
--
Donald Becker becker at scyld.com
Scyld Software Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220 www.scyld.com
Annapolis MD 21403 410-990-9993