[Beowulf] ssh connection problem
Ruhollah Moussavi Baygi
ruhollah.mb at gmail.com
Fri Jun 1 13:41:05 PDT 2007
Hi,
Thank you for your answers.
Please ignore the 'links' in my earlier post; I didn't mean to send them.
I had googled the error message 'Disconnecting:…' to find a solution for our
cluster's problem. Because I couldn't find a proper solution that way, I
posted the problem to Beowulf and copy-pasted the sentence 'Disconnecting:…'
into Gmail. That's why you can see 'links' in my email.
Returning to our problem, the output of 'netstat -i' and 'netstat -s' is
given below, in that order.
Please note that:
a) I use Cat 6 cabling.
b) Electrical noise is very unlikely.
c) The head-node has two NICs. eth0 is for the internal zone, i.e. the
computing nodes, and it runs with no problem. eth1 is for the external zone,
i.e. it is the interface our users connect to via ssh. This is the one with
the disconnection problem.
d) There doesn't seem to be any SW/router problem, because another machine
on the same network is reached by users via ssh with no problem.
___________________________________________________________________
[root at node01 ~]# netstat -i
Kernel Interface table
Iface   MTU Met     RX-OK RX-ERR RX-DRP RX-OVR     TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0 586745989      0      0      0 598858710      0      0      0 BMRU
eth1   1500   0    701868      0      0      0    325542      0      0      0 BMRU
lo    16436   0      1959      0      0      0      1959      0      0      0 LRU
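
The interface table can be checked mechanically rather than by eye. The following is only a minimal sketch: the sample rows are copied from the 'netstat -i' output above, while the function name interface_errors and the column list are illustrative assumptions, not part of any tool.

```python
# Sketch: sum the error/drop/overrun counters per interface from
# `netstat -i` style rows. Names here are hypothetical, for illustration.

SAMPLE = """\
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 586745989 0 0 0 598858710 0 0 0 BMRU
eth1 1500 0 701868 0 0 0 325542 0 0 0 BMRU
lo 16436 0 1959 0 0 0 1959 0 0 0 LRU
"""

COLUMNS = ["Iface", "MTU", "Met", "RX-OK", "RX-ERR", "RX-DRP", "RX-OVR",
           "TX-OK", "TX-ERR", "TX-DRP", "TX-OVR", "Flg"]


def interface_errors(text):
    """Return {iface: sum of RX/TX ERR, DRP and OVR counters}."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip titles, blank lines and wrapped fragments
        row = dict(zip(COLUMNS, fields))
        if not row["RX-OK"].isdigit():
            continue  # skip the header row
        totals[row["Iface"]] = sum(
            int(row[name]) for name in COLUMNS
            if name.endswith(("-ERR", "-DRP", "-OVR"))
        )
    return totals


if __name__ == "__main__":
    for iface, errors in interface_errors(SAMPLE).items():
        print(iface, errors)
```

With the rows above, every interface, including eth1, reports zero link-level errors, drops and overruns.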
[root at node01 ~]# netstat -s
Ip:
585891011 total packets received
0 forwarded
0 incoming packets discarded
585887228 incoming packets delivered
597668214 requests sent out
Icmp:
34 ICMP messages received
21 input ICMP message failed.
ICMP input histogram:
destination unreachable: 25
timeout in transit: 5
echo requests: 4
601 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 597
echo replies: 4
Tcp:
78 active connections openings
360 passive connection openings
0 failed connection attempts
18 connection resets received
8 connections established
585798178 segments received
597666644 segments send out
16197 segments retransmited
94 bad segments received.
1682 resets sent
Udp:
1005 packets received
596 packets to unknown port received.
0 packet receive errors
1019 packets sent
TcpExt:
2 resets received for embryonic SYN_RECV sockets
26 packets pruned from receive queue because of socket buffer overrun
ArpFilter: 0
60 TCP sockets finished time wait in fast timer
1 packets rejects in established connections because of timestamp
734435 delayed acks sent
127 delayed acks further delayed because of locked socket
Quick ack mode was activated 7963 times
724 packets directly queued to recvmsg prequeue.
6030 packets directly received from backlog
164431 packets directly received from prequeue
571897537 packets header predicted
138 packets header predicted and directly queued to user
TCPPureAcks: 44870
TCPHPAcks: 458279645
TCPRenoRecovery: 0
TCPSackRecovery: 2875
TCPSACKReneging: 0
TCPFACKReorder: 0
TCPSACKReorder: 0
TCPRenoReorder: 0
TCPTSReorder: 0
TCPFullUndo: 0
TCPPartialUndo: 0
TCPDSACKUndo: 1
TCPLossUndo: 7099
TCPLoss: 626
TCPLostRetransmit: 0
TCPRenoFailures: 0
TCPSackFailures: 1635
TCPLossFailures: 169
TCPFastRetrans: 4294
TCPForwardRetrans: 23
TCPSlowStartRetrans: 1130
TCPTimeouts: 8329
TCPRenoRecoveryFail: 0
TCPSackRecoveryFail: 279
TCPSchedulerFailed: 0
TCPRcvCollapsed: 2731
TCPDSACKOldSent: 8194
TCPDSACKOfoSent: 0
TCPDSACKRecv: 7125
TCPDSACKOfoRecv: 0
TCPAbortOnSyn: 0
TCPAbortOnData: 28
TCPAbortOnClose: 8
TCPAbortOnMemory: 0
TCPAbortOnTimeout: 12
TCPAbortOnLinger: 0
TCPAbortFailed: 0
TCPMemoryPressures: 0
___________________________________________________________________
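
To put the TCP counters in proportion, the retransmission ratio can be computed from the 'netstat -s' lines above. This is a rough sketch under stated assumptions: the helper names counter and retransmit_ratio are hypothetical, and the labels are matched exactly as netstat printed them here (including its own spelling). Note that 'netstat -s' counters are system-wide, so the ratio is dominated by the far heavier eth0 traffic and may hide a problem that affects only eth1.

```python
# Sketch: derive the TCP retransmission ratio from `netstat -s` lines.
# Helper names are hypothetical; sample lines are copied from the output above.

SAMPLE = """\
585798178 segments received
597666644 segments send out
16197 segments retransmited
94 bad segments received.
"""


def counter(text, label):
    """Return the integer that precedes `label` on its line."""
    for line in text.splitlines():
        value, _, rest = line.strip().partition(" ")
        if rest == label and value.isdigit():
            return int(value)
    raise KeyError(label)


def retransmit_ratio(text):
    """Fraction of sent TCP segments that were retransmissions."""
    sent = counter(text, "segments send out")
    retransmitted = counter(text, "segments retransmited")
    if sent == 0:
        raise ValueError("no segments sent")
    return retransmitted / sent


if __name__ == "__main__":
    print(f"retransmit ratio: {retransmit_ratio(SAMPLE):.4%}")
```

For the figures above the ratio is roughly 0.003% of all segments sent, which looks small only because it is averaged over both interfaces.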
--
Best,
Ruhollah Moussavi Baygi
On 5/29/07, Robert G. Brown <rgb at phy.duke.edu> wrote:
>
> On Sun, 27 May 2007, Ruhollah Moussavi Baygi wrote:
>
> > Hi everybody at Beowulf,
> >
> > I have a serious problem with ssh connection to our cluster. Every
> > hint/help/suggestion, which can help me to solve it, is highly
> > appreciated.
> >
> > Most of the time, when users want to connect and run their programs from
> > their own PCs, the ssh connection failed, especially during transfer
> > files from/to head-node. Our user's PCs are mainly WindowsXP, so they use
> > packages like SSH Secure Shell for connection and file transfer, or Putty
> > for connection and WinSCP for file transfer.
> >
> >
> > The error message is as follows:
> >
> > 'Disconnecting: Corrupted MAC on input'
>
> This sounds to me like hardware problems. What does your physical
> network look like? Is it built with the right cables, within spec, with
> decent switches? Do you see other evidence of network packet
> corruption?
>
> > <
> http://www.google.com/history/url?url=http://ubuntuforums.org/showthread.php%3Ft%3D202076&ei=wkJZRsGfHZf-0gTehKXrDQ&sig2=lIzQGYq3zN0Tz2EC8b4dAw&zx=JGkABbsjtaA&ct=w
> >
> >
> > or
> >
> > 'Disconnecting: bad packet
>
> Yes, sounds like bad hardware. Perhaps your cables aren't cat 5?
> Perhaps your electrical power has noise? Perhaps your switch(es) are
> broken or have been taken over by trolls? This sounds like you're
> failing packet checksum tests or experiencing pretty serious TCP
> collision problems.
>
> What do the network statistics look like on the interfaces in question?
>
> rgb
>
> > length...<
> http://www.google.com/search?q=disconnecting:+bad+packet+length+from+windows+to+linux+machine&hl=en
> >',
> > followed by a long integer.
> >
> >
> > This problem has practically made our cluster unusable. So, I would be
> > thankful for any coming advice.
> >
>
> --
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
>
>
>