Subject: Re: Pipelining and recent sftp upload improvements

From: Will Cosgrove <will_at_panic.com>
Date: Mon, 29 Nov 2010 11:02:45 -0800

Hi Daniel (et al.),
The pipelining API I created has two calls: one for the initial sending/requesting of data and one for the ack/response. It's modeled loosely on the API in libssh and on code found in openssh. The API looks like this:

LIBSSH2_API ssize_t
libssh2_sftp_write_async_begin(LIBSSH2_SFTP_HANDLE *handle,
                               libssh2_uint64_t offset,
                               char *buffer, size_t buffer_len,
                               unsigned long *request_id);

LIBSSH2_API int
libssh2_sftp_write_async(LIBSSH2_SFTP_HANDLE *handle,
                         unsigned long request_id);

LIBSSH2_API int
libssh2_sftp_read_async_begin(LIBSSH2_SFTP_HANDLE *handle,
                              libssh2_uint64_t offset,
                              size_t buffer_maxlen,
                              unsigned long *out_request_id);

LIBSSH2_API int
libssh2_sftp_read_async(LIBSSH2_SFTP_HANDLE *handle,
                        char *buffer, size_t buf_len,
                        unsigned long request_id);

For uploading, write_async_begin sends a buffer of a given size at a given offset. Writing the full buffer is not guaranteed, so the ack call, write_async, returns the number of bytes actually handled. It is then up to the caller to re-send any remaining bytes at the correct offset. Writes are paired with their acks using the request ID.

For downloading, it's the same concept. read_async_begin requests up to a given number of bytes at a given offset; read_async then returns the actual data, which may be fewer bytes than requested, so the caller needs to re-request the remainder at the correct offset. Reads are paired using the request ID.

The advantage of this is that you can call write_async_begin, say, 10 times in a row and then start draining the acks with write_async. This minimizes time lost to network latency (i.e. sitting in a select() call) and does a better job of maxing out the available bandwidth. The same goes for downloading. The advantage of this method over the new write pipelining on the 1.2.8 branch is that you don't have to pre-read a large buffer of data into memory. The disadvantage is that it's more leg-work for the implementer, who has to track the offsets and drain the replies manually. Thinking out loud, it might be worth adding a convenience API that takes a file path and does all of this behind the scenes, like openssh's do_upload/do_download functions do.

Now for the ever-so-important speed improvements. I'm testing against an internal server (with RAID storage) over gigabit Ethernet. My before benchmark was about 12 MB/sec upload using the 1.2.7 stable release; after, it's about 53 MB/sec. Shelling out to openssh's sftp gets about 73 MB/sec. Ideally I'd like to get libssh2 up to openssh's performance level, but that's for another day.

Cheers,
Will

On Nov 29, 2010, at 5:09 AM, Daniel Stenberg wrote:

> On Wed, 17 Nov 2010, Will Cosgrove wrote:
>
>> The reason I'm posting is that I recently added my own upload/download pipelining API to libssh2, modeled after libssh's download pipelining API. It pushes a bit more state management onto the user, but it seems to work fairly well and doesn't require a large input buffer to be filled beforehand. I was wondering if there is any interest in my additions (which admittedly need to be code-reviewed by someone more familiar with libssh2 than me), or if the current method is going to be applied to downloading at some point and I should just keep my changes in my own projects.
>
> I might be interested. I have no idea how that pipeline API works so you'd have to explain that, and then I would also like to see some numbers or metrics that show it making a difference!
>
> --
>
> / daniel.haxx.se
> _______________________________________________
> libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel

_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Received on 2010-11-29