Subject: Enhanced buffering made faster uploads!

From: Daniel Stenberg <>
Date: Sun, 24 Oct 2010 19:43:00 +0200 (CEST)


I've committed the changes I've done so far to the buffering/sending parts of
the code. I'll present some speed test results below but first I'll briefly
describe what I've done:

transport send

   - sends everything without doing any extra alloc, and if it compresses the
     sent data it also avoids an extra memcpy. This is achieved by having a
     large buffer area within the session handle struct that is used for this.

   - the compression functions are now split into separate compression and
     decompression functions

   - _libssh2_transport_write was converted to _libssh2_transport_send, which
     takes two data areas to send. This makes it easier for lots of code to
     pass on the payload part without doing an extra memcpy, and with a
     smaller allocation or none at all.
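A minimal sketch of what the two-area send can look like. Everything here
(the names, the buffer size, the skipped compress/encrypt step) is my
illustration, not libssh2's actual code: the session holds one large, fixed
send buffer, so assembling a packet needs no per-call malloc, and callers can
pass the packet header and the payload as two separate areas instead of
memcpy'ing them together first.

```c
#include <stddef.h>
#include <string.h>

#define SEND_BUF_SIZE 32768  /* illustrative size only */

struct session {
    unsigned char send_buf[SEND_BUF_SIZE];
    size_t send_len;
};

/* Assemble a packet from up to two data areas into the session's fixed
   send buffer. Returns 0 on success, -1 if the packet would not fit. */
int transport_send(struct session *s,
                   const unsigned char *data, size_t data_len,
                   const unsigned char *data2, size_t data2_len)
{
    if(data_len + data2_len > SEND_BUF_SIZE)
        return -1;
    memcpy(s->send_buf, data, data_len);
    if(data2)
        memcpy(s->send_buf + data_len, data2, data2_len);
    s->send_len = data_len + data2_len;
    /* a real version would compress/encrypt in place here and then
       write s->send_buf to the socket */
    return 0;
}
```

The point of the second area is that a caller holding a header and a payload
in separate buffers never has to concatenate them itself.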

channel write

   - I cleaned up and simplified the function even further.

   - I removed the 32500 size limit and instead gave the
     _libssh2_channel_write function looping logic that splits larger input
     buffers into smaller fragments that are passed one by one to the
     transport layer.
   - Removed the 32500 limit from the SFTP write function so now it can create
     much larger SFTP packets and pass those on to _libssh2_channel_write as
     that now supports them properly.
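The fragmenting loop can be sketched roughly like this; the fragment size
and function names are illustrative stand-ins, not libssh2's actual values:

```c
#include <stddef.h>

#define MAX_FRAGMENT 32768  /* illustrative, not libssh2's actual limit */

static size_t fragments_sent;  /* counts sends, for demonstration */

/* stand-in for the real transport send; network I/O elided */
static int send_fragment(const unsigned char *data, size_t len)
{
    (void)data;
    (void)len;
    fragments_sent++;
    return 0;
}

/* Instead of rejecting buffers above a fixed limit (the old 32500-byte
   cap), split the input into fragments and pass them on one at a time.
   Returns the number of bytes consumed from buf. */
size_t channel_write_loop(const unsigned char *buf, size_t buflen)
{
    size_t done = 0;
    while(done < buflen) {
        size_t chunk = buflen - done;
        if(chunk > MAX_FRAGMENT)
            chunk = MAX_FRAGMENT;
        if(send_fragment(buf + done, chunk))
            break;  /* a real version would handle EAGAIN here */
        done += chunk;
    }
    return done;
}
```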

Speed Comparisons

- SFTP upload

I built libssh2 1.2.7 and my current dev version with CFLAGS=-O2 and I ran
this example on both:

  time ./sftp_write_nonblock user password /bigfile /tmp/remove

The 1.2.7 version averaged:

  1024000000 bytes in 43 seconds = 23813953 bytes/sec

The current git version averages at:

  1024000000 bytes in 30 seconds = 34133333 bytes/sec

... roughly 43% faster. OpenSSH's sftp tool still makes the same upload in 19
seconds.
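The quoted rates are simply the byte count divided by the elapsed time:

```c
/* throughput in bytes/second, as computed for the figures above */
long bytes_per_sec(long bytes, long secs)
{
    return bytes / secs;
}
```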

- SCP upload

   I only did a few tests with SCP and there wasn't a very big difference,
   although I consistently measured the git version to be faster than 1.2.7.
   With SCP uploads we're already much closer to OpenSSH speeds, so there's
   much less motivation for me to spend a lot of effort there.

Further Improvements

I have some more ideas for how to improve further:

  - I wanted to avoid the memcpy done in the sftp_write function as well, but
    that will require that we make a channel_write() function in a similar
    style to transport_send(), and that isn't as straightforward, so I've
    decided to hold off on that and consider another area instead...

  - The multiple outgoing packets thing. I think I'll proceed and do some
    experiments with a SFTP write function that (assuming that more than a
    certain amount of data is sent) sends the data in two SFTP packets, and as
    soon as the first packet is ACKed the function will return that amount.

    When the function is called again with a data pointer pointing to the
    second chunk, the SFTP write function is already waiting for the ACK for
    that so it sends off the second part only and returns as soon as the first
    part is ACKed etc.

    The idea here being that while waiting for an ACK we are better kept busy
    by sending the next part than by just waiting.

    I've seen that OpenSSH does something similar to this, but of course this
    operation is much easier for a "simple" command line tool rather than a
    library function like ours.

  - There's now a bunch of places in the code where we can skip the
    alloc/free for channel stuff and instead use a fixed-size array within
    the session struct. They're not likely to be significant or even
    measurable, but that still fits my general idea of restricting the
    number of mallocs to a minimum in the "normal" code flows.
  - If anyone has any (other?) bright idea, I'm all ears!
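The pipelined SFTP write idea above can be sketched roughly like this.
Everything here (the names, the two-packet depth, the instant-ACK
simulation) is my illustration under stated assumptions, not actual libssh2
code: keep up to two SFTP packets in flight, and return from each call as
soon as the oldest outstanding packet is acknowledged.

```c
#include <stddef.h>

#define CHUNK 30000  /* illustrative SFTP packet payload size */

struct pipeline {
    size_t in_flight[2];  /* payload sizes of the outstanding packets */
    int count;            /* how many packets are outstanding */
};

static void send_packet(size_t len) { (void)len; /* network write elided */ }
static int ack_arrived(void) { return 1; /* simulation: ACKs are instant */ }

/* Returns the number of bytes confirmed written by this call. */
size_t sftp_write_pipelined(struct pipeline *p,
                            const unsigned char *data, size_t len)
{
    size_t queued = 0;
    (void)data;  /* payload contents don't matter for this sketch */

    /* keep the pipeline filled: up to two packets in flight */
    while(p->count < 2 && queued < len) {
        size_t chunk = len - queued;
        if(chunk > CHUNK)
            chunk = CHUNK;
        send_packet(chunk);
        p->in_flight[p->count++] = chunk;
        queued += chunk;
    }

    /* return as soon as the oldest packet is ACKed; a real non-blocking
       version would return EAGAIN when the ACK has not arrived yet */
    if(p->count > 0 && ack_arrived()) {
        size_t acked = p->in_flight[0];
        p->in_flight[0] = p->in_flight[1];
        p->count--;
        return acked;
    }
    return 0;
}
```

When the caller comes back with the unconsumed tail of its buffer, the
second packet is already on the wire, so the link stays busy while we wait
for ACKs.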

It is a bit tricky to figure out exactly what to do to go faster. Profiling
the code doesn't really help much: the vast majority of the time was already
spent in the crypto parts, so profiles don't reveal where we waste time in
the ways addressed above.

Received on 2010-10-24