Hi
I've committed the changes I've made so far to the buffering/sending parts of
the code. I'll present some speed test results below but first I'll briefly
describe what I've done:
transport send
- sends everything without doing any extra alloc, and if it compresses the
sent data it also avoids an extra memcpy. This is achieved by having a
large buffer area within the session handle struct that is used for this.
- the compression functions are now split into separate compression and
decompression functions
- _libssh2_transport_write was converted to _libssh2_transport_send which
takes two data areas to send, as this makes it easier for lots of code to
pass on the payload part without doing an extra memcpy and by doing a
smaller or no allocation at all
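
To illustrate the two-area idea above, here is a minimal sketch, not the actual libssh2 code. The struct, the buffer size and the sketch_ names are made up for this example; it only shows how a header and a payload can be assembled straight into a buffer that lives in the session struct, without any allocation and without the caller joining the two areas first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_OUTBUF_SIZE 35000 /* hypothetical size, not libssh2's */

/* Hypothetical stand-in for the session handle; only the fields that
   matter for this sketch are shown. */
struct sketch_session {
    unsigned char outbuf[SKETCH_OUTBUF_SIZE]; /* lives as long as the session */
    size_t outlen;                            /* bytes assembled in outbuf */
};

/* Assemble one outgoing packet from two separate areas (typically a
   small header and the caller's payload) directly in the session-owned
   buffer. No malloc is done, and the caller never has to copy the two
   areas into a temporary buffer of its own.
   Returns 0 on success, -1 if the two areas do not fit. */
static int sketch_transport_send(struct sketch_session *s,
                                 const unsigned char *data, size_t data_len,
                                 const unsigned char *data2, size_t data2_len)
{
    assert(s != NULL);

    if (data_len > SKETCH_OUTBUF_SIZE ||
        data2_len > SKETCH_OUTBUF_SIZE - data_len)
        return -1;

    if (data_len)
        memcpy(s->outbuf, data, data_len);
    if (data2_len)
        memcpy(s->outbuf + data_len, data2, data2_len);
    s->outlen = data_len + data2_len;

    /* A real implementation would now pad, encrypt and MAC the packet
       and write it to the socket. */
    return 0;
}
```

A caller that only has a single area would pass NULL and 0 for the second one.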
channel write
- I cleaned up and simplified the function even further, now it approaches
readable
- I removed the 32500 size limit and instead made the
_libssh2_channel_write function have a looping logic that splits up a larger
input buffer into smaller fragments that are passed one by one to
_libssh2_transport_send().
- Removed the 32500 limit from the SFTP write function so now it can create
much larger SFTP packets and pass those on to _libssh2_channel_write as
that now supports them properly.
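
The looping logic described above can be sketched roughly like this. It is an illustration only: the callback type, the sketch_ names and the fragment size (reused here from the old 32500 limit) are assumptions for the example, not what the real function uses.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_FRAGMENT 32500 /* assumed per-fragment cap */

/* Hypothetical callback that transmits one fragment; 0 means success. */
typedef int (*sketch_send_fn)(void *ctx, const unsigned char *frag,
                              size_t len);

/* Split buf into fragments of at most SKETCH_MAX_FRAGMENT bytes and
   hand them to send_one one by one. Returns the number of bytes that
   were passed on, or -1 if not even the first fragment could be sent. */
static long sketch_channel_write(const unsigned char *buf, size_t buflen,
                                 sketch_send_fn send_one, void *ctx)
{
    size_t done = 0;

    assert(send_one != NULL);

    while (done < buflen) {
        size_t frag = buflen - done;
        if (frag > SKETCH_MAX_FRAGMENT)
            frag = SKETCH_MAX_FRAGMENT;
        if (send_one(ctx, buf + done, frag) != 0)
            return done ? (long)done : -1;
        done += frag;
    }
    return (long)done;
}

/* Trivial sender for trying the loop out: it only counts what it gets. */
struct sketch_counter {
    size_t fragments;
    size_t bytes;
};

static int sketch_count_send(void *ctx, const unsigned char *frag,
                             size_t len)
{
    struct sketch_counter *c = ctx;
    (void)frag;
    c->fragments++;
    c->bytes += len;
    return 0;
}
```

With these assumed numbers a 100000 byte buffer goes out as four fragments: three of 32500 bytes and one of 2500 bytes.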
Speed Comparisons
=================
- SFTP upload
I built libssh2 1.2.7 and my current dev version with CFLAGS=-O2 and I ran
this example on both:
time ./sftp_write_nonblock 127.0.0.1 user password /bigfile /tmp/remove
The 1.2.7 version averaged at:
1024000000 bytes in 43 seconds = 23813953 bytes/sec
The current git version averages at:
1024000000 bytes in 30 seconds = 34133333 bytes/sec
... roughly 43% faster. OpenSSH's sftp tool still makes the same upload in 19
seconds.
- SCP upload
I only did a few tests with SCP and there wasn't a very big difference, even
if I consistently measured the git version to be faster than 1.2.7. With SCP
uploads we're much closer to OpenSSH speeds already, so I have much less
motivation to put a lot of effort into it.
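
For anyone who wants to redo the arithmetic behind the SFTP upload figures, a small sketch follows. The helper names are made up for this example; the byte count and the timings are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Whole bytes per second for a transfer of total bytes in seconds. */
static uint64_t sketch_bytes_per_sec(uint64_t total, uint64_t seconds)
{
    assert(seconds > 0);
    return total / seconds;
}

/* Throughput gain over a baseline in whole percent. For the same
   amount of data this is (old_seconds / new_seconds - 1) * 100. */
static uint64_t sketch_percent_faster(uint64_t old_seconds,
                                      uint64_t new_seconds)
{
    assert(new_seconds > 0 && old_seconds >= new_seconds);
    return (old_seconds - new_seconds) * 100 / new_seconds;
}
```

sketch_bytes_per_sec(1024000000, 43) gives 23813953, sketch_bytes_per_sec(1024000000, 30) gives 34133333, and sketch_percent_faster(43, 30) gives 43. By the same arithmetic the 19 second OpenSSH run corresponds to 53894736 bytes/sec.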
Further Improvements
====================
I have some more ideas for how to get further:
- I wanted to prevent the memcpy done in the sftp_write function as well, but
that will require that we make a channel_write() function in a similar
style to the transport_send() and that isn't as straightforward, so I've
decided to hold off on that and consider another area instead...
- The multiple outgoing packets thing. I think I'll proceed and do some
experiments with a SFTP write function that (assuming that more than a
certain amount of data is sent) sends the data in two SFTP packets, and as
soon as the first packet is ACKed the function will return that amount.
When the function is called again with a data pointer pointing to the
second chunk, that chunk has already been sent and is waiting for its ACK,
so the function only sends off the part that follows it and returns as soon
as the already-sent chunk is ACKed, and so on.
The idea here being that while waiting for an ACK we are better kept busy
by sending the next part than by just waiting.
I've seen that OpenSSH does something similar to this, but of course this
operation is much easier for a "simple" command line tool than for a
library function like ours.
- There are now a bunch of places in the code where we can skip the
alloc/free for channel stuff and instead use a fixed-size array within the
session struct. They're not likely to be significant or even measurable,
but that still fits my general idea of restricting the number of mallocs
to a minimum for the "normal" code flows, for all functions that are used
frequently.
- If anyone has any (other?) bright idea, I'm all ears!
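
The two-packet idea from the list above could look roughly like the sketch below. This is only a model of the bookkeeping, under assumptions: the struct, the sketch_ names and the 30000 byte chunk size are invented for the example, and the "network" is a pair of counters instead of real SFTP packets and ACKs.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CHUNK 30000 /* assumed payload size per SFTP packet */

struct sketch_sftp {
    size_t sent_ahead;        /* bytes sent during the previous call that
                                 have not been reported to the caller */
    unsigned outstanding;     /* packets sent but not yet ACKed */
    unsigned max_outstanding; /* highest value outstanding has reached */
    unsigned packets;         /* total packets sent */
};

/* Stand-in for sending one SFTP write packet. */
static void sketch_send_packet(struct sketch_sftp *h,
                               const unsigned char *p, size_t n)
{
    (void)p;
    (void)n;
    h->packets++;
    h->outstanding++;
    if (h->outstanding > h->max_outstanding)
        h->max_outstanding = h->outstanding;
}

/* Stand-in for waiting until the oldest outstanding packet is ACKed. */
static void sketch_wait_ack(struct sketch_sftp *h)
{
    assert(h->outstanding > 0);
    h->outstanding--;
}

/* Returns how many bytes from the start of buf have been ACKed. The
   caller must call again with buf advanced by exactly that amount. */
static size_t sketch_sftp_write(struct sketch_sftp *h,
                                const unsigned char *buf, size_t len)
{
    size_t first, second;

    assert(len > 0);

    if (h->sent_ahead == 0) {
        /* Nothing in flight: send the first chunk now. */
        first = len < SKETCH_CHUNK ? len : SKETCH_CHUNK;
        sketch_send_packet(h, buf, first);
    }
    else {
        /* The start of buf was already sent during the previous call. */
        first = h->sent_ahead;
        assert(first <= len);
    }

    /* Send the following chunk before waiting, so the wire stays busy. */
    second = len - first < SKETCH_CHUNK ? len - first : SKETCH_CHUNK;
    if (second)
        sketch_send_packet(h, buf + first, second);

    sketch_wait_ack(h); /* ACK for the first chunk */
    h->sent_ahead = second;
    return first;
}

/* Example caller: loops until everything is ACKed, returns call count. */
static unsigned sketch_upload(struct sketch_sftp *h,
                              const unsigned char *buf, size_t len)
{
    unsigned calls = 0;
    while (len) {
        size_t n = sketch_sftp_write(h, buf, len);
        buf += n;
        len -= n;
        calls++;
    }
    return calls;
}
```

The awkward part for a library shows up in the contract: the function has to trust that the next call really continues at the returned offset with the same data, which a command line tool can simply guarantee for itself.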
It is a bit tricky to figure out exactly what to do to go faster. Profiling
the code doesn't really help much, since the vast majority of the time was
already spent in the crypto parts before these changes, and it doesn't reveal
much about where we waste time as described above.
--
 / daniel.haxx.se
Received on 2010-10-24