Subject: [libssh2] [ libssh2-Bugs-1826989 ] libssh2_sftp_readdir() hangs forever

[libssh2] [ libssh2-Bugs-1826989 ] libssh2_sftp_readdir() hangs forever

From: SourceForge.net <noreply_at_sourceforge.net>
Date: Thu, 08 Nov 2007 06:10:22 -0800

Bugs item #1826989, was opened at 2007-11-06 16:50
Message generated for change (Comment added) made by tonyspoken
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=703942&aid=1826989&group_id=125852

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: SFTP
Group: None
Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: antonio (tonyspoken)
Assigned to: Daniel Stenberg (bagder)
Summary: libssh2_sftp_readdir() hangs forever

Initial Comment:
An abnormal situation of hanging is noted when doing a remote dir read on a particular directory with the ibssh2_sftp_readdir_ex().
In attach the complete stdout log.

----------------------------------------------------------------------

>Comment By: antonio (tonyspoken)
Date: 2007-11-08 14:10

Message:
Logged In: YES
user_id=177324
Originator: YES

Yes. That fixed our problems. We made some basic functional tests and it
worked nicely.
Could you put the status of this as "Closed"?
Thanks for all the support

----------------------------------------------------------------------

Comment By: Daniel Stenberg (bagder)
Date: 2007-11-08 13:52

Message:
Logged In: YES
user_id=1110
Originator: NO

It makes a lot of sense. Thanks for the hard work and suggested fix. I
committed it just now. I take it this fixes this problem of yours?

----------------------------------------------------------------------

Comment By: antonio (tonyspoken)
Date: 2007-11-07 19:35

Message:
Logged In: YES
user_id=177324
Originator: YES

Hi.

Here it is only an idea of how we could manage, but i expect feedbacks
from Daniel.
Probably the check "if (numbytes < blocksize)" should be done, but only
the very first time we get a chunk, not always.

So my suggestion would be to move this check after we're doing the other
check (the one that controls whether we're at first chunk or not)

The resulting code would look like this:
............

if (!p->total_num) {
            /* No payload package area allocated yet. To know the
               size of this payload, we need to decrypt the first
               blocksize data. */

// Beginning of the moved piece
if (numbytes < blocksize) {
            /* we can't act on anything less than blocksize */
            return PACKET_EAGAIN;
        }
// End of the moved piece
if (!p->total_num) {

...........

I don't know if it makes sense.

----------------------------------------------------------------------

Comment By: antonio (tonyspoken)
Date: 2007-11-07 17:38

Message:
Logged In: YES
user_id=177324
Originator: YES

Hi Daniel,

i'm utilizing the latest CVS version.
from a further analysis it seems that the problem relies on transport.c.
I explain you briefly what happens.
After a call to libssh2_sftp_readdir(), the libssh2_packet_read() is
called.
The SSH server is responding with a packet of 4100 Bytes.
Since we put 4096 as embedded PACKETBUFSIZE in libssh2_priv.h and the
recv()in libssh2_packet_read() takes exactly PACKETBUFSIZE bytes from
network.
When the 5 initial bytes are decoded, looking at the size we have to deal
with it is clear that we have to do another recv(), because there are more
bytes waiting for us.
These are 4100 - 4096 = 4 bytes more to take.

Here it is the code that gives problem
numbytes = remainbuf; // Here numbytes = 4

        if (numbytes < blocksize) { // blocksize is 16 here.
            /* we can't act on anything less than blocksize */
            return PACKET_EAGAIN;
        }

So we enter in a PACKET_EAGAIN loop. Deleting this check everything seems
working, but i don't know which side effects can rise from this deletion.
This means that every time the fragmentation of packets is in a number
inferior to 16 the read will hang.
In our case this faulty situation will rise if SSH server return 4096+X
byte where 1 < X <= 15.
Don't know if the situation is now clear. I can send network dump for a
follow-up, or other details if it's needed.

----------------------------------------------------------------------

Comment By: Daniel Stenberg (bagder)
Date: 2007-11-07 15:41

Message:
Logged In: YES
user_id=1110
Originator: NO

Well, I wrote the transport.c file and functions but what you describe
sounds like there's a bug somewhere that makes the code get stuck in a way
it shouldn't. And then I doubt it helps much that I wrote it, since I
didn't mean to introduce bugs of this kind and I can't say I know where
they are now...

Your report doesn't say what libssh2 version you're using, but if it's not
the current CVS (recent daily snapshot) please try that since I've fixed a
few bugs since 0.17 that would result in something like this.

I would also be interested in knowing how to repeat this problem so that I
can debug/assist in my end, should I get time for it.

----------------------------------------------------------------------

Comment By: antonio (tonyspoken)
Date: 2007-11-07 14:25

Message:
Logged In: YES
user_id=177324
Originator: YES

I'm further investigating this problem.
I got one interesting thing from logs and network snoops.
The problem seems connected to the amount of datas we're grabbing from the
network.
In particular after calling libssh2_sftp_readdir() the SSH server is
responding us with a 4100 bytes packet, that from libssh2 point of view is
got in two different stages (4096 bytes raw data + further 4 bytes raw
data, as can be seen from the logs) because 4096 is the actual fixed
dimension of the chunk size, as defined in lissh2_priv.h (PACKETBUFSIZE).

The faulty and hanging situation is not appearing anynmore in two
different cases:

- for example if we add a new file in the directory
- we change the PACKETBUFSIZE value

Both of the two options leads to a change in the number of bytes actually
exchanged between server and client (for example when adding a new file in
the remote directory to be ls-ed, the received 4100 bytes returned from SSH
server became 4196, and everything went fine).

I can deep analyze it more, but probably someone who's inside the
transport and packet routines could speed up things a lot.
Any help is appreciated.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=703942&aid=1826989&group_id=125852

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
libssh2-devel mailing list
libssh2-devel_at_lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libssh2-devel
Received on 2007-11-08