Subject: [libssh2] #198: _libssh2_channel_close() may hang

From: libssh2 Trac <trac_at_libssh2.stuge.se>
Date: Wed, 17 Nov 2010 10:18:40 -0000

#198: _libssh2_channel_close() may hang
---------------------+------------------------------------------------------
  Reporter: fd64 | Owner:
      Type: defect | Status: new
  Priority: normal | Milestone: 1.2.7
 Component: API | Version: 1.2.7
  Keywords: | Blocks:
Blocked By: |
---------------------+------------------------------------------------------
 An application is using libssh2 to execute commands remotely. It makes
 intensive use of libssh2 with many threads to maintain a lot of
 connections to several test openssh server boxes. We are using iptables to
 randomly drop packets on the remote box to simulate very bad network
 conditions. We want to be sure that the application will never hang or
 crash even when connections are unstable.

 The socket was created in non-blocking mode because we want to be sure
 that the application will never hang if something goes wrong. When the
 application detects a problem with a connection, it disconnects using
 these functions: libssh2_channel_close() + libssh2_channel_free() +
 libssh2_session_disconnect() + libssh2_session_free().

 In one of our tests, one thread was stuck for at least 60 seconds in
 _libssh2_transport_read() after the application called
 libssh2_channel_close().

 (gdb) thread 6
 [Switching to thread 6 (Thread 10476)]#0 0x00000037f6219752 in
 _libssh2_transport_read (session=<value optimized out>) at transport.c:601
 601 }
 (gdb) bt
 #0 0x00000037f6219752 in _libssh2_transport_read (session=<value
 optimized out>) at transport.c:601
 #1 0x00000037f620548e in _libssh2_channel_close (channel=0x7f0124040550)
 at channel.c:2257
 #2 0x00000037f62056f8 in libssh2_channel_close (channel=0x7f0124040550)
 at channel.c:2292

 _libssh2_channel_close() seems to be hanging in the following loop:

     if (channel->close_state == libssh2_NB_state_sent) {
         /* We must wait for the remote SSH_MSG_CHANNEL_CLOSE message */

         while (!channel->remote.close && !rc) {
             rc = _libssh2_transport_read(session);
         }
     }

 Maybe this problem only happens in very specific conditions. For instance
 it may only happen if the connection is still alive when
 _libssh2_channel_close() starts and is lost in the middle of this
 function. Anyway, I think it would be great if you could either check that
 the connection is still alive in this loop, or if you could implement a
 sort of timeout (give up after X retries/seconds).
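
To illustrate the timeout idea, here is a minimal, self-contained sketch of a bounded wait loop. It is not libssh2 code: struct close_wait, read_fn, wait_for_remote_close(), WAIT_TIMEOUT and the three demo readers are hypothetical names invented for this example, and the read callback only stands in for _libssh2_transport_read(). It uses time(), so the resolution is one second and it follows the wall clock; a real fix would probably use a monotonic clock.

```c
#include <stdbool.h>
#include <time.h>

/* All names below are hypothetical; none of them exist in libssh2. */
#define WAIT_OK       0
#define WAIT_TIMEOUT  (-1001)

/* Stand-in for the part of the channel that the loop looks at. */
struct close_wait {
    bool remote_close;   /* plays the role of channel->remote.close */
};

/* Stand-in for _libssh2_transport_read(): 0 on success, nonzero on error. */
typedef int (*read_fn)(void *ctx);

/* Same loop as in the report, but it gives up once timeout_s seconds
 * have elapsed instead of reading forever. */
int wait_for_remote_close(struct close_wait *ch, read_fn read_packet,
                          void *ctx, double timeout_s)
{
    time_t start = time(NULL);
    int rc = 0;

    while (!ch->remote_close && rc == 0) {
        if (difftime(time(NULL), start) >= timeout_s)
            return WAIT_TIMEOUT;      /* peer never sent its close */
        rc = read_packet(ctx);
    }
    return rc;                        /* 0 if closed, else the read error */
}

/* Demo reader: the peer stays silent, every read "succeeds" with no close. */
int read_nothing(void *ctx)
{
    (void)ctx;
    return 0;
}

/* Demo reader: the close message arrives on the first read. */
int read_close(void *ctx)
{
    ((struct close_wait *)ctx)->remote_close = true;
    return 0;
}

/* Demo reader: the transport reports an error. */
int read_error(void *ctx)
{
    (void)ctx;
    return -1;
}
```

With read_nothing the function returns WAIT_TIMEOUT once the deadline passes, whereas the loop quoted above would keep calling the reader. The sketch spins on the reader; inside the library it would presumably make more sense to wait on the socket between reads.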

 The program is running on a fully updated Fedora-12-amd64 system with
 libssh2-1.2.7 (RPM package rebuilt for fedora-12 from the official,
 unpatched 1.2.7).

 I can send you the core dump + binary to a private mail address if it can
 help.

 Here are the two iptables commands that run on the openssh client and
 that make the connection very unstable to simulate a bad network. It is a
 very aggressive configuration, but half of the connections still manage to
 successfully execute a simple command.

 iptables -A INPUT -i ${IFACE} -s ${SRCIP} -p tcp --dport 22 -m statistic
 --mode random --probability 0.60 -j breakpkt
 iptables -A INPUT -i ${IFACE} -s ${SRCIP} -p tcp --dport 22 ! --syn -m
 statistic --mode nth --every 10 -j breakpkt

 The test machine is also running Fedora-12-amd64 and the following ssh
 server:
 kernel-2.6.32.23-170.fc12.x86_64
 openssh-clients-5.3p1-19.fc12.x86_64
 openssh-server-5.3p1-19.fc12.x86_64
 openssh-5.3p1-19.fc12.x86_64

 Many thanks

-- 
Ticket URL: <http://trac.libssh2.org/ticket/198>
libssh2 <http://trac.libssh2.org/>
C library for writing portable SSH2 clients
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Received on 2010-11-17