[Beowulf] Cannot use more than two nodes on cluster

Andrew Holway andrew.holway at gmail.com
Thu Sep 20 13:10:16 PDT 2012


Hi,

Are you sure that you replicated your hostfile to all of your nodes?

Could I please see the contents of your hosts file?
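
One quick way to sanity-check a hostfile is to parse it and add up the slots. The sketch below is only an illustration and is not taken from this thread. It assumes the Open MPI hostfile layout of one `hostname slots=N` entry per line with `#` comments, and it assumes that a host without `slots=` counts as one slot. The host names (`master`, `slave1`, `slave2`) are made up, and `hostfile_digest` is a hypothetical helper for comparing the copy found on each node.

```python
import hashlib


def parse_hostfile(text):
    """Return {hostname: slots} for an Open MPI style hostfile.

    Assumption: a host listed without slots=N counts as one slot, and a
    host listed on several lines accumulates its slots.
    """
    hosts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line:
            continue
        fields = line.split()
        slots = 1
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "slots":
                slots = int(value)
        hosts[fields[0]] = hosts.get(fields[0], 0) + slots
    return hosts


def hostfile_digest(text):
    """Hash the hostfile text so copies from different nodes can be compared."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical contents of the "machines" file; host names are invented.
EXAMPLE = """\
# three nodes, four cores each
master slots=4
slave1 slots=4
slave2 slots=4
"""

hosts = parse_hostfile(EXAMPLE)
total_slots = sum(hosts.values())
print(hosts, total_slots)
```

If the slot total does not match the `-np` value, or the digest of the file differs from node to node, that would be worth sorting out first.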

Thanks,

Andrew


2012/9/20 Antti Korhonen <akorhonen at theranos.com>:
> Hi Vincent
>
> Master works with all slaves.
> M0+S1 works, M0+S2 works, M0+S3 works.
> All nodes work fine as single nodes.
>
> Here is my start command (trying to use 3 nodes with 4 cores on each):
>
> Executing: /mirror/OpenFOAM/ThirdParty-2.1.x/platforms/linux64Gcc/openmpi-1.5.3/bin/mpirun -np 12 -hostfile machines /mirror/OpenFOAM/OpenFOAM-2.1.x/bin/foamExec -prefix /mirror/OpenFOAM interFoam -parallel | tee log
>
> I will search for node limitations in configs.
>
>   Antti
>
>
> -----Original Message-----
> From: Vincent Diepeveen [mailto:diep at xs4all.nl]
> Sent: Thursday, September 20, 2012 8:01 AM
> To: Antti Korhonen
> Cc: 'Jörg Saßmannshausen'; beowulf at beowulf.org
> Subject: Re: [Beowulf] Cannot use more than two nodes on cluster
>
> Hi Antti,
>
> You describe that only 1 master plus 1 slave works.
> Is it 1 specific slave that works and not the other slaves?
>
> So you have machines M0, S0,S1,S2
>
> Is only M0 + S0 working and not M0+S1 nor M0+S2 ?
>
> What parallel shell are you using to start the jobs?
> Is it the free pdsh?
>
> What command do you issue to start the jobs?
> How many processes do you start at once, and do the 3 slave nodes have the same number of cores?
>
> Somewhere there must be a limit set; in most environments it's possible to restrict how many processes users are allowed to execute simultaneously.
>
> Maybe the default of the environment you use has this limit set to 2 nodes.
>
> What network is your cluster using?
>
> On Sep 20, 2012, at 4:37 PM, Antti Korhonen wrote:
>
>> I tested ssh with all combinations and that part is working as
>> designed.
>>
>> I can start a job manually on any single node.
>> I can start jobs on any two nodes, as long as one of them is the master.
>> All other combinations hang and the jobs do not start.
>>
>> I read through a few install guides and did not find any steps I missed.
>> I am using Ubuntu 12.04, in case that makes any difference.
>>
>>   Antti
>>
>> -----Original Message-----
>> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Jörg Saßmannshausen
>> Sent: Thursday, September 20, 2012 1:42 AM
>> To: beowulf at beowulf.org
>> Subject: Re: [Beowulf] Cannot use more than two nodes on cluster
>>
>> Hi all,
>>
>> have you tried the following: ssh master -> node1 -> node2, i.e.
>> ssh from the master to node1 and from there to node2?
>> You do not have a situation where the remote host-key is not in the
>> database and hence you get asked about adding that key to the local
>> database?
>>
>> If that is working with all permutations, another possibility is that
>> your host list is somehow messed up when you are submitting parallel
>> jobs. Can you start the jobs manually by providing a host list to the
>> MPI program you are using? Does that work or do you have problems here
>> as well?
>>
>> My two pennies
>>
>> Jörg
>>
>>
>> On Thursday 20 September 2012 07:40:56 Antti Korhonen wrote:
>>> Passwordless SSH works between all nodes.
>>> Firewalls are disabled.
>>>
>>>
>>> From: greg at r-hpc.com [mailto:greg at r-hpc.com] On Behalf Of Greg Keller
>>> Sent: Wednesday, September 19, 2012 8:43 PM
>>> To: beowulf at beowulf.org; Antti Korhonen
>>> Subject: Re: [Beowulf] Cannot use more than two nodes on cluster
>>>
>>> I am going to bet $0.25 that SSH or TCP/IP is configured to allow the
>>> master to get to the nodes without a password, but not from one
>>> Compute to the other Compute.
>>>
>>> Test by sshing to Compute1, then from Compute1 to Compute2.
>>> Depending on how you built the cluster, it's also possible that
>>> iptables is running on the compute nodes, but my money is on the
>>> ssh keys needing to be reconfigured.
>>> Let us know what you find.
>>>
>>> Cheers!
>>> Greg
>>>
>>> Date: Wed, 19 Sep 2012 16:11:21 +0000
>>> From: Antti Korhonen <akorhonen at theranos.com>
>>> Subject: [Beowulf] Cannot use more than two nodes on cluster
>>> To: "beowulf at beowulf.org" <beowulf at beowulf.org>
>>> Message-ID: <B9D51F953BEE5C42BC2B503D288542992DD935FE at SRW004PA.theranos.local>
>>> Content-Type: text/plain; charset="us-ascii"
>>>
>>> Hello
>>>
>>> I have a small Beowulf cluster (master and 3 slaves).
>>> I can run jobs on any single node.
>>> Running on two nodes sort of works: jobs on the master and 1 slave
>>> work (all combos: master+slave1, master+slave2 or master+slave3),
>>> but running jobs on two slaves hangs.
>>> Running jobs on master + any two slaves hangs.
>>>
>>> Would anybody have any troubleshooting tips?
>>
>> --
>> *************************************************************
>> Jörg Saßmannshausen
>> University College London
>> Department of Chemistry
>> Gordon Street
>> London
>> WC1H 0AJ
>>
>> email: j.sassmannshausen at ucl.ac.uk
>> web: http://sassy.formativ.net
>>
>> Please avoid sending me Word or PowerPoint attachments.
>> See http://www.gnu.org/philosophy/no-word-attachments.html
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> Computing To change your subscription (digest mode or unsubscribe)
>> visit http://www.beowulf.org/mailman/listinfo/beowulf


