[Beowulf] LAM -beowulf problems
Reuti
reuti at staff.uni-marburg.de
Sun Dec 24 13:53:02 PST 2006
Hi,
First of all, I would suggest looking at the most recent version of
LAM/MPI, which is 7.1.2, or at Open MPI.
Which shell are you using? For bash, you may have to add the directory
with your LAM/MPI binaries to PATH in .bashrc, i.e. a file that is
sourced for a non-interactive shell such as the one ssh starts for
lamboot. Your .bash_profile is only read by login shells.
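A minimal sketch of such a .bashrc entry, assuming LAM/MPI is installed under /usr/local/lam (a hypothetical prefix, as are the LAM_HOME variable and the path_prepend helper; adjust them to your installation). It has to come before any line in .bashrc that returns early for non-interactive shells:

```shell
# Hypothetical install prefix; replace with where LAM/MPI actually lives.
LAM_HOME="${LAM_HOME:-/usr/local/lam}"

# Prepend a directory to PATH only if it is not already listed,
# so that sourcing this file repeatedly does not grow PATH.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$1:$PATH" ;;      # not present, put it first
    esac
}

path_prepend "$LAM_HOME/bin"
export PATH
```

To check the result from the master, something like `ssh surya02 which hboot` should then print the path to hboot instead of failing.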
-- Reuti
On 20.12.2006 at 17:45, Mr. Sumit Saxena wrote:
> Hi
> I am new to Linux as well as Beowulf, please help me.
> I tried to hook up two machines and run LAM, but I am not able to
> lamboot. I can lamboot on each machine individually, but not from the
> master to both master and slave. I have added the path to the LAM
> libraries to my ld.so.conf as well as to my .bash_profile, but I still
> see the following error message. I am also able to ssh into the
> machines without passwords. I followed this document to set up my
> machines:
> http://tldp.org/HOWTO/html_single/Beowulf-HOWTO/
> ++++++++++++++++++++++++++++++++++++++++++++++
> LAM 6.5.9/MPI 2 C++ - Indiana University
>
> Executing hboot on n0 (surya01 - 1 CPU)...
> Executing hboot on n1 (surya02 - 1 CPU)...
> bash: line 1: hboot: command not found
> -----------------------------------------------------------------------------
> LAM failed to execute a LAM binary on the remote node "surya02".
> Since LAM was already able to determine your remote shell as "hboot",
> it is probable that this is not an authentication problem.
>
> LAM tried to use the remote agent command "ssh"
> to invoke the following command:
>
> ssh -x surya02 -n hboot -t -c lam-conf.lam -v -s -I "-H 192.168.13.1 -P 33628 -n 1 -o 0 "
>
> This can indicate several things. You should check the following:
>
> - The LAM binaries are in your $PATH
> - You can run the LAM binaries
> - The $PATH variable is set properly before your
> .cshrc/.profile exits
>
> Try to invoke the command listed above manually at a Unix prompt.
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> lamboot encountered some error (see above) during the boot process,
> and will now attempt to kill all nodes that it was previously able to
> boot (if any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
> wipe ...
>
> LAM 6.5.9/MPI 2 C++ - Indiana University
>
> Executing tkill on n0 (surya01)...
>
> ++++++++++++++++++++++++++++++++++++++++++++++
> please help
> kind regards
> Sumit