#198: _libssh2_channel_close() may hang
---------------------+------------------------------------------------------
Reporter: fd64 | Owner:
Type: defect | Status: new
Priority: normal | Milestone: 1.2.7
Component: API | Version: 1.2.7
Keywords: | Blocks:
Blocked By: |
---------------------+------------------------------------------------------
An application is using libssh2 to execute commands remotely. It uses
libssh2 heavily, with many threads maintaining a large number of
connections to several test OpenSSH servers. We are using iptables to
randomly drop packets on the remote box to simulate very bad network
conditions. We want to be sure that the application will never hang or
crash even when connections are unstable.
The socket has been created in non-blocking mode because we want to be
sure that the application will never hang if something goes wrong. When
the application detects a problem with a connection, it disconnects using
these functions: libssh2_channel_close() + libssh2_channel_free() +
libssh2_session_disconnect() + libssh2_session_free().
In one of our tests, one thread stopped responding for at least 60
seconds in _libssh2_transport_read() after the application called
libssh2_channel_close().
(gdb) thread 6
[Switching to thread 6 (Thread 10476)]#0 0x00000037f6219752 in _libssh2_transport_read (session=<value optimized out>) at transport.c:601
601 }
(gdb) bt
#0 0x00000037f6219752 in _libssh2_transport_read (session=<value optimized out>) at transport.c:601
#1 0x00000037f620548e in _libssh2_channel_close (channel=0x7f0124040550) at channel.c:2257
#2 0x00000037f62056f8 in libssh2_channel_close (channel=0x7f0124040550) at channel.c:2292
_libssh2_channel_close() seems to be hanging in the following loop:
    if (channel->close_state == libssh2_NB_state_sent) {
        /* We must wait for the remote SSH_MSG_CHANNEL_CLOSE message */
        while (!channel->remote.close && !rc) {
            rc = _libssh2_transport_read(session);
        }
    }
Maybe this problem only happens in very specific conditions. For instance
it may only happen if the connection is still alive when
_libssh2_channel_close() starts and is lost in the middle of this
function. Either way, I think it would be great if this loop could either
check that the connection is still alive, or implement some kind of
timeout (give up after X retries or seconds).
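
To make the timeout suggestion concrete, here is a minimal sketch of a
bounded version of the wait loop. It is not libssh2 code: fake_session,
fake_channel, fake_transport_read(), wait_for_remote_close() and
WAIT_TIMED_OUT are all hypothetical stand-ins, and the fake read always
returns 0 purely to reproduce a loop that would otherwise never exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical return code for this sketch; not a libssh2 error code. */
#define WAIT_TIMED_OUT (-1000)

/* Hypothetical stand-ins for the channel and session state. */
struct fake_channel { bool remote_close; };
struct fake_session {
    int reads_until_close; /* negative: the remote close never arrives */
    int reads_done;
};

/* Stand-in for the transport read: always reports success (0) and only
 * marks the channel closed after a configured number of calls. */
int fake_transport_read(struct fake_session *s, struct fake_channel *c)
{
    s->reads_done++;
    if (s->reads_until_close >= 0 && s->reads_done >= s->reads_until_close)
        c->remote_close = true;
    return 0;
}

/* Same shape as the loop quoted above, but it gives up after max_reads
 * attempts or timeout_s seconds, whichever comes first. time() has
 * one-second resolution, which is coarse but enough for a sketch. */
int wait_for_remote_close(struct fake_session *s, struct fake_channel *c,
                          double timeout_s, long max_reads)
{
    time_t start = time(NULL);
    long reads = 0;
    int rc = 0;

    assert(max_reads > 0 && timeout_s > 0.0);

    while (!c->remote_close && !rc) {
        if (reads >= max_reads || difftime(time(NULL), start) >= timeout_s)
            return WAIT_TIMED_OUT;
        rc = fake_transport_read(s, c);
        reads++;
    }
    return rc;
}
```

With a session whose close never arrives, wait_for_remote_close() returns
WAIT_TIMED_OUT once the retry budget is used up instead of spinning
forever. A real fix would also need to decide what state to leave the
channel in after giving up, which this sketch does not address.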
The program is running on a fully up-to-date Fedora-12-amd64 system with
libssh2-1.2.7 (RPM package rebuilt for Fedora 12 from the official,
unpatched 1.2.7 sources).
I can send you the core dump + binary to a private mail address if it can
help.
Here are the two iptables commands that run on the openssh client and
make the connection very unstable to simulate a bad network. It is a
very aggressive configuration, but about half of the connections still
manage to execute a simple command successfully.
iptables -A INPUT -i ${IFACE} -s ${SRCIP} -p tcp --dport 22 \
    -m statistic --mode random --probability 0.60 -j breakpkt
iptables -A INPUT -i ${IFACE} -s ${SRCIP} -p tcp --dport 22 ! --syn \
    -m statistic --mode nth --every 10 -j breakpkt
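
As a rough sanity check on how aggressive these rules are, the sketch
below estimates the chance that a single inbound packet gets past both of
them. It rests on assumptions not stated in this report: that the
user-defined breakpkt chain ends up dropping every packet sent to it, and
that the second rule only sees packets the first rule did not match.
pass_probability() is a hypothetical helper, not part of any tool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: probability that one inbound packet passes both rules.
 *   random_drop: --probability of the first rule (0.60 in this report)
 *   every_nth:   --every of the second rule (10 in this report)
 *   is_syn:      SYN packets skip the second rule because of "! --syn"
 * Assumes breakpkt drops every packet it receives. */
double pass_probability(double random_drop, int every_nth, bool is_syn)
{
    double pass_first;
    double pass_second;

    assert(every_nth > 0);
    assert(random_drop >= 0.0 && random_drop <= 1.0);

    pass_first = 1.0 - random_drop;
    pass_second = is_syn ? 1.0 : 1.0 - 1.0 / (double)every_nth;
    return pass_first * pass_second;
}
```

Under these assumptions, pass_probability(0.60, 10, false) comes out at
about 0.36 and pass_probability(0.60, 10, true) at about 0.40, so roughly
two out of three packets on an established connection would be lost.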
The test machine is also running Fedora-12-amd64 and the following ssh
server:
kernel-2.6.32.23-170.fc12.x86_64
openssh-clients-5.3p1-19.fc12.x86_64
openssh-server-5.3p1-19.fc12.x86_64
openssh-5.3p1-19.fc12.x86_64
Many thanks
--
Ticket URL: <http://trac.libssh2.org/ticket/198>
libssh2 <http://trac.libssh2.org/>
C library for writing portable SSH2 clients
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Received on 2010-11-17