Subject: Re: [libssh2] How to increase performance of libssh2 SFTP Read/Write

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Wed, 13 Jun 2007 13:38:47 +0200 (CEST)

On Wed, 13 Jun 2007, Mononen Jussi wrote:

Thanks for your patience and explanations.

> The most critical part is file transfer.

You're referring to SFTP here, then. But AFAIK, libssh2 is also slower on pure
SCP transfers... Isn't it? Has anyone actually measured libssh2 transfers
recently and compared them to the openssh tools?

> Consider that you are transferring a 2MB file: you send one request
> (libssh2_sftp_read, SSH_FXP_READ) for 32KB of data. Then you wait for the
> response packet (libssh2_sftp_packet_requirev, SSH_FXP_DATA,
> SSH_FXP_STATUS). Then you send the next request and wait a while to get the
> response. And so on.

I get it; it's quite similar to doing HTTP with or without pipelining. Each
request/response adds a protocol roundtrip.
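
To make the roundtrip cost concrete, here's roughly what an application's
download loop looks like today (a minimal sketch with session setup and error
handling left out; it assumes an already-opened LIBSSH2_SFTP_HANDLE):

  #include <libssh2.h>
  #include <libssh2_sftp.h>
  #include <stdio.h>
  #include <sys/types.h>

  /* sketch: synchronous download loop - every libssh2_sftp_read() sends one
     SSH_FXP_READ and then blocks for the matching SSH_FXP_DATA (or
     SSH_FXP_STATUS at EOF), so each lap of the loop costs a full roundtrip */
  static void sftp_download(LIBSSH2_SFTP_HANDLE *handle, FILE *out)
  {
      char buf[32768];
      ssize_t got;

      while((got = libssh2_sftp_read(handle, buf, sizeof(buf))) > 0)
          fwrite(buf, 1, (size_t)got, out);
  }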

> If the first request is successful you could send three requests in a row
> and then wait for the responses and data. If they are successful, then
> let's send 10 requests and wait for responses. This would consume the
> bandwidth more efficiently as the bandwidth would be in use while the server
> processes our request. This is what I mean by asynchronous transfer.

I wouldn't call it asynchronous, I would call it pipelining (since the data
would still be sent synchronously), but then I might just be HTTP damaged! ;-)
But yeah, I can see how that approach could boost performance a fair amount,
especially on connections with high latency.
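
To spell out the idea in code: this is only a sketch of the proposal, and
send_read_request() and recv_data_response() are made-up helpers for
illustration, not anything libssh2 exposes today.

  /* hypothetical pipelined download: fire off a batch of read requests back
     to back, then drain the responses, so the link stays busy while the
     server works on our requests */
  #define CHUNK 32768
  #define DEPTH 10                        /* requests kept in flight */

  static int pipelined_download(LIBSSH2_SFTP_HANDLE *handle, FILE *out)
  {
      libssh2_uint64_t offset = 0;
      char buf[CHUNK];
      ssize_t got;
      int i;

      for(;;) {
          /* send DEPTH SSH_FXP_READs for consecutive offsets at once */
          for(i = 0; i < DEPTH; i++)
              send_read_request(handle,
                                offset + (libssh2_uint64_t)i * CHUNK, CHUNK);

          /* only now wait for the SSH_FXP_DATA responses */
          for(i = 0; i < DEPTH; i++) {
              got = recv_data_response(handle, buf, CHUNK);
              if(got <= 0)
                  return 0;               /* EOF or error */
              fwrite(buf, 1, (size_t)got, out);
          }
          offset += (libssh2_uint64_t)DEPTH * CHUNK;
      }
  }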

> This could be implemented by adding a full function that gets/puts one file
> similarly to the command line tools. Now that the API provides read/write
> functions to get a certain amount of data, we are bound to a synchronous
> approach.

I disagree with that conclusion. First, there's of course nothing that
prevents us from adding another or a modified API that makes this easier. But
I could also very well consider just an added function/option to the library
that would make libssh2 ask for several chunks of data at once and build an
internal buffer to return from when the API asks for data (limited, of course,
to a certain extent beyond what has actually been asked for). The option would
be needed because it would risk asking for and transferring more data than the
application otherwise wants. And when sending data to a peer, the function
could accept a large chunk of data, split it up and send it off in many pieces
more or less at once, without waiting for each packet's success response
first. That will of course require that apps pass very large data chunks to
the libssh2 API, but that should just be a matter of documentation.
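
To make that a bit more concrete, something along these lines is what I mean;
note that libssh2_sftp_readahead() is purely a made-up name to illustrate the
option, nothing like it exists in the library today:

  /* hypothetical option: allow up to 256KB of read-ahead on this handle,
     letting the library keep several SSH_FXP_READ requests in flight and
     buffer data beyond what the app has asked for so far */
  libssh2_sftp_readahead(handle, 256 * 1024);

  /* the application loop stays the same as before; reads would then mostly
     be served from libssh2's internal buffer instead of each one causing a
     fresh roundtrip */
  while((got = libssh2_sftp_read(handle, buf, sizeof(buf))) > 0)
      fwrite(buf, 1, (size_t)got, out);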

Of course this kind of pipelining approach will make error handling somewhat
trickier.

Looking at the code, the current approach is even more limited than what has
been explained here. Each call to libssh2_sftp_read() (and libssh2_sftp_write())
is directly mapped to the size of the underlying SFTP data request, so if you
happen to call libssh2_sftp_read() with a buffer smaller than 32KB or 40KB or
so, you will simply get worse performance, since every call to this function
causes a protocol round-trip.
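
In other words, the buffer the application hands in is what ends up as the
request size on the wire. A quick illustration (assuming handle is an open
LIBSSH2_SFTP_HANDLE):

  char tiny[512], chunk[32768];
  ssize_t got;

  /* each call becomes one SFTP read request sized after the buffer, so
     reading a file 512 bytes at a time costs roughly 64 times as many
     roundtrips as reading it 32KB at a time */
  got = libssh2_sftp_read(handle, tiny, sizeof(tiny));    /* 512-byte request */
  got = libssh2_sftp_read(handle, chunk, sizeof(chunk));  /* 32KB request */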

Personally, I would prefer to start with improving SCP, as that's a much
simpler protocol. If that is still slower than the openssh tools, I think we
should focus on getting it up to speed first and then attack SFTP.

_______________________________________________
libssh2-devel mailing list
libssh2-devel_at_lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libssh2-devel
Received on 2007-06-13