From saville at comcast.net  Sat Dec  8 09:16:24 2007
From: saville at comcast.net (Gregg Germain)
Date: Tue Nov  9 01:14:29 2010
Subject: [scyld-users] Cluster up - no action on slaves
Message-ID: <475AD168.4040207@comcast.net>

Hi all,

I have the freeware version of SCYLD Beowulf up and running on a 5
node system. I've added the 4 slaves to the Master using Beosetup. The
slaves boot and the status monitor shows them as being up. I can ping
them using their IP address. I ran the beofdisk, beoboot-install, and
bpctl commands as instructed by SCYLD.

I have a number of questions, but basically I think all processes are
running on the Master and none on the slaves:

1) What are the node names of the slaves? Are they 0,1,2,3? Or are they
.0, .1, .2 and .3?

2) I can't ssh into a slave from the master - connection refused. Is
this normal? Is there an account on each slave that I can log into? What
would its username and password be?

3) I ran a simple Hello World program (on the Master and two slaves),
using MPI calls (not BeoMPI) and I get the following output:

$ mpirun -np 3 HelloWorld
I am the Master! Rank 0, size 3, name localhost.localdomain
Rank 1, size 3, name .0
Rank 2, size 3, name .1

So things SEEM to be working. However, the statistics portion of the
Beowulf Status Monitor for the Slave nodes never budges. OK, maybe the
program runs too quickly to get a reaction.

4) I run the program shown below. I don't have confidence that any
process is actually running on a slave. So I have the slave (rank > 0)
do an ifconfig and send the results to a file. I have it open the file
and extract the IP address, and send that back to the Master for
printing. I always get the Master's IP address - never the slaves':

the Master's IP address is 192.168.0.3
the slaves' IP addresses are: 192.168.1.100 and 192.168.1.101

Program output:

I am the Master! Rank 0, size 3, name localhost.localdomain
Rank 1, size 3, name .0 Extracted IP address: inet addr:192.168.0.3  Bcast:192.168.0.255  Mask:255.255.255.0
Rank 2, size 3, name .1 Extracted IP address: inet addr:192.168.0.3  Bcast:192.168.0.255  Mask:255.255.255.0

//
// stand-alone program to extract an IP address from an ifconfig call
//
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

//
// MPI includes
//
#include <mpi.h>

/*using namespace std;*/

int main(int argc, char **argv)
{
    int rank, size, partner;
    int namelen;
    char name[MPI_MAX_PROCESSOR_NAME];
    char greeting[sizeof(name) + 100];
    char IPline[sizeof(name) + 100];
    char IPaddress[256];
    char *startstring, *startpos, *endpos;
    int cmpval;
    FILE *IPfile;
    MPI_Status stat;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &namelen);

    sprintf(greeting, "Rank %d, size %d, name %s\n", rank, size, name);

    //
    // Now do the important stuff based upon rank
    //
    if (rank == 0)
    {
        sprintf(greeting, "I am the Master! Rank %d, size %d, name %s\n",
                rank, size, name);
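        /*
         * Master side: print this rank's greeting, then collect one
         * greeting from each slave rank in turn and print it.
         */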
        fputs(greeting, stdout);

        for (partner = 1; partner < size; partner++)
        {
            MPI_Recv(greeting, sizeof(greeting), MPI_BYTE, partner, 1,
                     MPI_COMM_WORLD, &stat);
            fputs(greeting, stdout);
        }
    }
    else  /* rank > 0: you are a slave */
    {
        system("ifconfig > IP.txt");

        IPfile = fopen("IP.txt", "r");
        if (!IPfile)
        {
            sprintf(greeting, "\n ERROR - cannot find the file!\n");
            /* return -1; */
        }
        else
        {
            fgets(IPline, 128, IPfile);
            fgets(IPline, 128, IPfile);

            startpos = strstr(IPline, "192.168");
            if (startpos == NULL)
                sprintf(greeting, "\n sorry didn't find the IP address\n");
            else
            {
                endpos = strstr(IPline, "Bcast");
                startstring = IPline;
                sprintf(greeting,
                        "\nRank %d, size %d, name %s Extracted IP address: %s\n",
                        rank, size, name, IPline);
            }
            fclose(IPfile);
        } /* end if (!IPfile) else */

        MPI_Send(greeting, strlen(greeting) + 1, MPI_BYTE, 0, 1, MPI_COMM_WORLD);
    } /* end you are a slave */

    MPI_Finalize();
    exit(0);
} /* end MAIN */

5) Lastly, I took the above program and inserted a 3-level, time-wasting
loop (for all ranks > 0) which causes the program to take 24 seconds to
run. When I run it, the stats for the slaves in the Beo Status Monitor
never budge. The Master's stats fluctuate.

In short, I think all the processes are running on the Master and none on
the slaves. Running "top" on the Master shows all 3 processes on the
Master.

What is missing? Is there a networking step that I have to perform to get
all this to work? /etc/hosts shows only one line:

127.0.0.1   localhost.localdomain   localhost

There's nothing in the SCYLD instructions that indicates any other setup
steps that I have to do.

Thanks for any help you can offer.

Gregg


From becker at scyld.com  Sat Dec  8 10:42:39 2007
From: becker at scyld.com (Donald Becker)
Date: Tue Nov  9 01:14:29 2010
Subject: [scyld-users] Cluster up - no action on slaves
In-Reply-To: <475AD168.4040207@comcast.net>
Message-ID:

On Sat, 8 Dec 2007, Gregg Germain wrote:

> I have the freeware version of SCYLD Beowulf up and running on a 5
> node system. I've added the 4 slaves to the Master using Beosetup. The
> slaves boot and the status monitor shows them as being up. I can ping
> them using their IP address. I ran the beofdisk, beoboot-install, and
> bpctl commands as instructed by SCYLD.
>
> I have a number of questions, but basically I think all processes are
> running on the Master and none on the slaves:
> 1) What are the node names of the slaves? Are they 0,1,2,3? Or are they
> .0, .1, .2 and .3?

The slave nodes are permanently numbered, starting with 0. The ".23"
names are interpreted by the Scyld tools and the BeoNSS name service.

The first time a node boots to the point where it's a usable slave node,
it's assigned a node number. That node number is associated with the
network MAC address and written to the /etc/beowulf/config file. It stays
the same until it's manually changed in the config file. (The 'beosetup'
program changes the configuration file and then signals the booting and
BProc subsystems so that they immediately notice the changes.)

Most of our tools accept just the node number, but other tools and
applications expect host names. The BeoNSS subsystem is a plug-in name
service that translates ".23" into the proper IP address.

Over time BeoNSS has evolved to accept a wider range of node names. Older
versions only accepted ".23", "cluster.23" and "23.cluster", and returned
".23" as the only name on reverse look-ups. New versions allow specifying
the node name format in the config file, so you can name your nodes
"myclusternode23" and that will be returned as the preferred host name.

The 'hostname' part of BeoNSS uses the knowledge that the Scyld boot
system assigns nodes sequential IP addresses. It only needs to know the
IP address of node '0' (changing this will trigger a cluster reboot) and
the maximum node number (updated consistently in the rare case it
changes) to calculate any node's IP address. Not only is this a
consistency and reliability improvement over the traditional approaches,
it's a major performance win when establishing all-to-all communication
on large clusters.
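Purely to illustrate that arithmetic (this is a sketch, not the BeoNSS
source; the base address here is just your node .0 address, used as an
example), computing a node's IP address from node 0's address looks like
this:

#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Return node 'node' as a dotted-quad string, given node 0's address.
   Slave nodes get sequential addresses, so it is simply "base + node". */
static const char *node_ip(const char *node0_addr, int node)
{
    struct in_addr addr;
    static char buf[INET_ADDRSTRLEN];

    if (inet_pton(AF_INET, node0_addr, &addr) != 1)
        return NULL;

    /* Do the addition in host byte order, then convert back. */
    addr.s_addr = htonl(ntohl(addr.s_addr) + (unsigned)node);

    return inet_ntop(AF_INET, &addr, buf, sizeof(buf));
}

int main(void)
{
    int n;

    for (n = 0; n < 4; n++)
        printf(".%d -> %s\n", n, node_ip("192.168.1.100", n));
    return 0;
}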
As a convenience, we have a few special cases. The current master has a
number and nickname of '.-1'. The current node is '.-2'. This allows you
to store node numbers as integers without special-casing the output.

> 2) I can't ssh into a slave from the master - connection refused. Is
> this normal?

Yes, it's the expected behavior. It's part of the most interesting aspect
of the Scyld cluster architecture.

On a Scyld system, instead of using 'ssh' or 'rsh', you start an
interactive shell on the slave node just as you would start an
application:

   bpsh .23 /bin/sh -i

You will automatically be using the master's current version of /bin/sh
and your current environment variables. You will be placed into your
current working directory (or '/' if the path doesn't exist on the slave
node).

We call them "slave nodes" instead of "compute nodes" in the basic
configuration because they are directly controlled by a master node. One
aspect of this is that basic slave nodes have no daemons or services
running on them. They are running only the programs the master started.
This results in a very fast boot and a lightweight environment, leaving
almost all of the memory, and a clean environment, for applications to
run in.

Having a very simple slave node environment doesn't limit what you can do
or run on the slave node, but it does mean you have to change your
perspective slightly, and sometimes create a more traditional environment
for some applications.

Starting an interactive shell on the compute node is actually a simpler
and more natural solution if you aren't already experienced with 'ssh' or
'rsh'. Traditionally you would need to create an account on the compute
node (perhaps by copying out /etc/passwd and /etc/group), make certain
that you have a home directory, and make certain that you have 'rsh' or
'ssh' configured (which can be quite tricky from scratch). And when you
do log into the compute node using 'ssh', you may not have the
environment you expect: the user shell might be different, your
environment variables are probably not the same, and any environment
variables you set interactively are certainly not the same.

I could continue with all of the other reasons why this is a much better
approach (security, no scheduling interference, consistency) if you are
interested.

> 3) I ran a simple Hello World program (on the Master and two slaves),
> using MPI calls (not BeoMPI) and I get the following output:
>
> $ mpirun -np 3 HelloWorld
> I am the Master! Rank 0, size 3, name localhost.localdomain
> Rank 1, size 3, name .0
> Rank 2, size 3, name .1
>
> So things SEEM to be working. However, the statistics portion of the
> Beowulf Status Monitor for the Slave nodes never budges. OK, maybe the
> program runs too quickly to get a reaction.

It's likely that you are not seeing anything on the display because your
program is so trivial. Compounding that is that the display tool,
Beostatus, uses a 5-second display update period by default. Beostat
reports from each node once per second, so it's best to change the
Beostatus update period to once per second.

'Beostat' is the name of the subsystem that gathers per-node state,
status and statistics. It's also the name of the user-level program that
displays some of those statistics. The Beostat system needs each node to
send only a single performance report: the nodes report once per second,
and a 'recvstat' process on the master writes the reports into a shared
memory region. Any program on the master can read this memory, so you can
have many schedulers, display tools and mappers running without
increasing the load on the compute nodes.
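If it helps to picture the arrangement, the sketch below shows that
one-writer/many-readers pattern from a reader's point of view. It is not
the Beostat interface (the segment name and the record layout are made
up); it only illustrates why extra readers on the master cost the compute
nodes nothing:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

struct node_report {            /* made-up per-node record */
    unsigned int timestamp;
    float cpu_load;
    unsigned long mem_free_kb;
};

int main(void)
{
    /* "/node_reports" is a hypothetical segment name; a daemon in the
       role of recvstat would create and update it, readers only map it. */
    int fd = shm_open("/node_reports", O_RDONLY, 0);
    if (fd < 0) {
        perror("shm_open");
        return 1;
    }

    struct node_report *reports =
        mmap(NULL, 4 * sizeof(*reports), PROT_READ, MAP_SHARED, fd, 0);
    if (reports == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Readers just look at the latest records; the single writer keeps
       refreshing them once per second. */
    for (int n = 0; n < 4; n++)
        printf("node %d: load %.2f, %lu kB free\n",
               n, reports[n].cpu_load, reports[n].mem_free_kb);

    munmap(reports, 4 * sizeof(*reports));
    close(fd);
    return 0;
}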
> 4) I run the program shown below. I don't have confidence that any
> process is actually running on a slave. So I have the slave (rank > 0)
> do an ifconfig and send the results to a file. I have it open the file
> and extract the IP address, and send that back to the Master for
> printing. I always get the Master's IP address - never the slaves':

The run above certainly spread the processes out over the cluster. This
run didn't seem to.

You can test how a job will be 'mapped' (spread out over the cluster) by
running the 'beomap' program. This calls the mapping function that MPI
will use and shows the output. With four single-core nodes the output
will likely look like:

prompt> beomap -np 4
-1:0:1:2
prompt> beomap -np 4 -no-local
0:1:2:3
prompt> export NP=4 NO_LOCAL=1
prompt> beomap
0:1:2:3

-- 
Donald Becker                           becker@scyld.com
Penguin Computing / Scyld Software
www.penguincomputing.com                www.scyld.com
Annapolis MD and San Francisco CA