Subject: Libssh2 usage from cURL with various buffer sizes.

From: Patrik Thunström <patrik.thunstrom_at_bassetglobal.com>
Date: Fri, 18 Nov 2011 11:46:24 +0100

Hi!

I just sent a report of our findings to the libcURL dev mailing list, mainly showing that when libcURL is used with various internal buffer sizes, download and upload performance move in opposite directions: upload only improves with a larger buffer, whereas download is severely degraded.

As far as I can tell from the libcURL code, there is no large difference in the calls made to the sftp_recv and sftp_send functions, which in turn call libssh2_sftp_read and libssh2_sftp_write respectively, but I believe there are more knowledgeable people on this mailing list who can say whether the two paths are handled very differently.
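
For reference, this is roughly what I understand the two paths to boil down to at the libssh2 level; a minimal sketch assuming an already-opened LIBSSH2_SFTP_HANDLE in blocking mode, with all error handling omitted:

    #include <stdio.h>
    #include <libssh2.h>
    #include <libssh2_sftp.h>

    /* Download path: curl's sftp_recv ends up in libssh2_sftp_read. */
    static void sftp_download(LIBSSH2_SFTP_HANDLE *h, FILE *out,
                              char *buf, size_t bufsize)
    {
        ssize_t n;
        while ((n = libssh2_sftp_read(h, buf, bufsize)) > 0)
            fwrite(buf, 1, (size_t)n, out);
    }

    /* Upload path: curl's sftp_send ends up in libssh2_sftp_write.
     * A real caller would also have to handle short writes. */
    static void sftp_upload(LIBSSH2_SFTP_HANDLE *h, FILE *in,
                            char *buf, size_t bufsize)
    {
        size_t r;
        while ((r = fread(buf, 1, bufsize, in)) > 0)
            libssh2_sftp_write(h, buf, r);
    }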

I give a bit more info in the curl mailing list post, but the starting point was a customer that had performance issues with SFTP uploads against a low-latency, high-bandwidth OpenSSH 4.3 server. Speeds were roughly 450kB/s with libcURL where FileZilla did ~20MB/s. Increasing CURL_MAX_WRITE_SIZE from the default 16kB to 16MB gave great upload speeds of upwards of ~25MB/s in the customer's environment.
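
For anyone wanting to reproduce this: CURL_MAX_WRITE_SIZE is a compile-time constant in curl/curl.h, so changing it means rebuilding libcURL, and the transfer itself is just a plain libcurl SFTP upload. A minimal sketch, with made-up host and file names and error handling omitted:

    /* Rebuild libcurl with e.g. -DCURL_MAX_WRITE_SIZE=16777216 to get
     * the 16MB internal buffer mentioned above. */
    #include <stdio.h>
    #include <curl/curl.h>

    int main(void)
    {
        FILE *src = fopen("bigfile.bin", "rb");   /* hypothetical local file */
        CURL *curl;

        curl_global_init(CURL_GLOBAL_ALL);
        curl = curl_easy_init();
        curl_easy_setopt(curl, CURLOPT_URL,
                         "sftp://user@example.com/upload/bigfile.bin");
        curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
        /* the default read callback fread()s from CURLOPT_READDATA */
        curl_easy_setopt(curl, CURLOPT_READDATA, src);
        curl_easy_perform(curl);

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        fclose(src);
        return 0;
    }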

This was using libcURL 7.22.0, libssh2 1.3.0 and OpenSSL 1.0.0e, on a Win32 platform (the customer was running Windows Server 2003, but our local testing was done on Windows 7).

The target machines for local testing were a Win7 machine running the CoreSFTP server and a Xubuntu virtual machine with an OpenSSH 5.8 sshd.

All numbers mentioned here include a bit of local file shuffling, file verification and a few other tasks, so they are not exact indications of transfer speeds. However, the only difference between the test runs is the libcURL internal buffer size, so the local file operations contribute only a minimal, negligible difference in total time.

Uploads, as mentioned, only got better and better performance with larger buffer sizes. We did not see any disadvantages or bad behavior when simply uploading data.

Doing downloads with a larger buffer size, however, severely impacted performance.

The worst-case test scenario was a set of 100 x 1MB files, which starts choking straight away when run with larger buffers. With buffers from 16kB up to 1MB the set still always finished, with increasingly bad performance (13 seconds with 16kB, 212 seconds with 512kB, 847 seconds with 1MB). From a 4MB buffer and up, the full set of files would not finish against the CoreSFTP server at all (I'm not sure of the error code given from libssh2; it was reported onwards as CURLE_SSH from libcURL), while against the OpenSSH server it just performed even worse (peaking at 45 minutes with a 16MB buffer, where the 16kB buffer finished in 18 seconds against the same server). The failures were also seemingly random: sometimes on the first file, sometimes 15-20 files into the set.

One thing worth noticing is that it does not seem to be the actual transfer that takes the time; libcURL's progress callback reports that the full amount of data has been downloaded a lot earlier than control is returned to the program.
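
This can be seen with nothing fancier than the old-style progress callback; a minimal sketch of how we log it (on_progress and enable_progress are our own hypothetical helpers, the options are standard libcurl):

    #include <stdio.h>
    #include <time.h>
    #include <curl/curl.h>

    /* Fires as soon as libcURL claims the whole file has arrived; with
     * large buffers curl_easy_perform() returns long after this point. */
    static int on_progress(void *clientp, double dltotal, double dlnow,
                           double ultotal, double ulnow)
    {
        (void)clientp; (void)ultotal; (void)ulnow;
        if (dltotal > 0.0 && dlnow >= dltotal)
            fprintf(stderr, "100%% reported at t=%ld\n", (long)time(NULL));
        return 0;   /* returning non-zero would abort the transfer */
    }

    static void enable_progress(CURL *curl)
    {
        curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);
        curl_easy_setopt(curl, CURLOPT_PROGRESSFUNCTION, on_progress);
    }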

Another observation is that the performance seems to be quite related to the number of files already transferred without tearing down and resetting the connection. Running a set of 1000 x 20kB files performs gradually worse with each repetition. It is not a straight correlation with the number of files transferred, however, as the 100 x 1MB set gives really bad performance from the start, no matter how many files have been transferred before.
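
Our test loop reuses one easy handle, and with it libcURL's cached SSH connection, across all files in a set, roughly like the following hypothetical sketch (the URL pattern is made up):

    #include <stdio.h>
    #include <time.h>
    #include <curl/curl.h>

    /* Downloads `count` files over one easy handle / one SSH connection,
     * timing each transfer to show the gradual slowdown. */
    static void timed_batch(CURL *curl, const char *url_fmt, int count)
    {
        char url[256];
        int i;
        for (i = 0; i < count; i++) {
            time_t t0 = time(NULL);
            snprintf(url, sizeof(url), url_fmt, i);  /* e.g. "sftp://host/set/file%04d" */
            curl_easy_setopt(curl, CURLOPT_URL, url);
            curl_easy_perform(curl);   /* connection is kept alive and reused */
            fprintf(stderr, "file %d took %lds\n", i, (long)(time(NULL) - t0));
        }
    }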

I did a profiling run of our application halfway through our test suite (which consists of first uploading four sets of files, 1 x 1.8GB, 10 x 12MB, 100 x 1MB and 1000 x 20kB, and then downloading the same files back again), just as it hit the download of some 12MB files. For the profiling I ran Very Sleepy, a simple sampling profiler.

For some reason it does not want to show the symbol information for libcURL properly, even though it is built with the same flags as the other libraries, but as the profiler shows, almost none of the time is spent in actual curl code; it is spent inside libssh2.

If anyone wants a copy of the complete Sleepy session, just send me a mail and I'll pass it on. I'm attaching the parts I believe are relevant to this mail as a CSV export.

The profiling was isolated to the thread that calls libcURL, to avoid extra noise in the sampling. For the numbers included here, I also selected the transferFile function in our code as the root for all time calculations, since that is where all the libcURL calls are made; that is why transferFile's inclusive time is 100%. The list is then sorted on inclusive %.

Call stack looks like the following (including it here in the mail, since it is not part of the CSV export):

sftp_packet_ask
sftp_packetlist_flush
sftp_read
libssh2_sftp_read
curl_global_cleanup    <- these are most probably due to the messed-up debug symbols, as mentioned
curl_global_cleanup
curl_global_cleanup
curl_global_cleanup
curl_global_cleanup
curl_global_cleanup
curl_global_cleanup
curl_global_cleanup
transferFile

As for sftp_packet_ask, which according to the profiler uses 95% inclusive time and a whopping 87% exclusive time, its list of callers is:

sftp_packetlist_flush (49.97%)
sftp_packet_requirev (28.89%)
sftp_packet_require (21.14%)

The difference between inclusive and exclusive time for sftp_packet_ask seems to come from the calls to _libssh2_list_next (whose caller list states that 99.92% of its calls came from sftp_packet_ask).
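
If sftp_packet_ask really does walk the whole packet list with _libssh2_list_next on every call, that would fit the symptoms: the work to find one packet grows with the number of packets queued, so the total work grows quadratically. A toy model of that effect (not libssh2's actual code):

    #include <stdio.h>

    /* Handling each of n packets by linearly scanning a queue that is
     * itself ~n entries long costs n(n+1)/2 steps in total, i.e. O(n^2);
     * each inner iteration stands in for one _libssh2_list_next call. */
    static long list_steps(long n_packets)
    {
        long total = 0, queued, i;
        for (queued = 1; queued <= n_packets; queued++)
            for (i = 0; i < queued; i++)
                total++;
        return total;
    }

    int main(void)
    {
        printf("64 packets   -> %ld list steps\n", list_steps(64));
        printf("1024 packets -> %ld list steps\n", list_steps(1024));
        return 0;
    }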

The profiling was performed over 450 seconds, during which a total of three files were transferred as far as I can recall (it could have been only two, but I'm not sure).

While profiling, the CPU usage was not all that high, but when not profiling the CPU usage is fairly constant at 25% (on a 4-core CPU, so it is using all the processing power a single thread can get).

As I'm no expert in libssh2 hacking, I'm passing this info on in the hope that it helps you find what the issue might be. If there are any flags I should enable in a libssh2 build to do more debugging/profiling, please let me know.

Maybe the bottom line is that simply using a static buffer of the same size for both download and upload is not a good approach, and that this is something cURL needs to handle, but I believe there is also some unnecessary overhead going on here.

Luckily, the bad upload performance only affects a single customer, and on that setup the need is only to upload data. If one needed to both download and upload, there is currently no way to accommodate both when going via libcURL (which of course can be improved and fixed there).

Best regards

Patrik Thunström / patrik.thunstrom_at_bassetglobal.com

_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel

Received on 2011-11-18