Hi friends,
I've been working on introducing the asynchronous approach for SFTP downloads.
Downloads differ somewhat in nature from uploads: for example, we will
normally get a largish buffer in each call and we will read until EOF.
I first experimented with sending very large FXP_READ requests, and I learned
that OpenSSH sends back at most 64K of data per request, never more. The SFTP
spec is a bit vague but seems to say that implementations are only obliged to
support 32K.
So, to read SFTP really fast we create a queue of outgoing READ packets,
send them out one by one and return data as soon as any arrives. This way we
get the pipelining effect we want and thus avoid waiting a full round trip
for each request. The fact that we don't know the file size beforehand,
combined with this sort of pre-reading, makes us send a lot of READs beyond
the end of the file. There's definitely room for improvement there.
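To make the queueing idea a bit more concrete, here is a toy model of it in C. It is only a sketch and not the actual libssh2 code: every name in it (read_queue, fill_queue, complete_oldest, MAX_OUTSTANDING) is made up for illustration, and it only tracks offsets and lengths instead of sending real FXP_READ packets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not libssh2's internal layout. */
#define MAX_OUTSTANDING 256

struct read_req {
    uint64_t offset;   /* file offset this READ asks for */
    uint32_t len;      /* number of bytes requested */
};

struct read_queue {
    struct read_req req[MAX_OUTSTANDING]; /* ring buffer of in-flight READs */
    size_t head;             /* index of the oldest in-flight request */
    size_t count;            /* number of requests currently in flight */
    uint64_t next_offset;    /* offset the next queued READ will ask for */
    uint64_t inflight_bytes; /* sum of len over all in-flight requests */
};

/* Queue new READs until 'window' bytes are outstanding (or the ring is
 * full). Real code would send one FXP_READ packet per queued entry.
 * Returns the number of requests added. */
size_t fill_queue(struct read_queue *q, uint64_t window, uint32_t chunk)
{
    size_t added = 0;
    assert(chunk > 0);
    while (q->count < MAX_OUTSTANDING &&
           q->inflight_bytes + chunk <= window) {
        size_t tail = (q->head + q->count) % MAX_OUTSTANDING;
        q->req[tail].offset = q->next_offset;
        q->req[tail].len = chunk;
        q->next_offset += chunk;
        q->inflight_bytes += chunk;
        q->count++;
        added++;
    }
    return added;
}

/* Retire the oldest request once its reply has arrived, which frees room
 * in the window. Returns 0 on success, -1 if nothing was in flight. */
int complete_oldest(struct read_queue *q, struct read_req *done)
{
    if (q->count == 0)
        return -1;
    *done = q->req[q->head];
    q->head = (q->head + 1) % MAX_OUTSTANDING;
    q->count--;
    q->inflight_bytes -= done->len;
    return 0;
}
```

The intended loop is: call fill_queue(), wait for a reply, hand the data to the caller, call complete_oldest(), then fill_queue() again. Note that nothing in this model knows where the file ends, so it keeps queueing READs past EOF just like described above.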
SOME NUMBERS
(As usual all numbers are rough and possibly wrong due to my mistakes. I ran
all libssh2 tests using debug builds (without any kind of compiler
optimizations enabled) and some of them produced a fair amount of printf
output during their operations.)
- I made the pre-reading use 'buffer_size * 4' as maximum outstanding reads.
- I used the sftp_nonblock.c code and bumped its own buffer to 1MB - yes that
makes libssh2 pre-read 4MB! Setting down the max to 'buffer_size * 4' did
cut the transfer speed by 20%! 4MB makes ~135 outstanding READ packets when
30000 is requested in each.
- I modified the code to not write() the received data anywhere
- For my test with the 1.2.7 code, I modified the buffer to 100K just to make
sure it was as big as possible for that code.
HIGH LATENCY
To simulate a far-away server, I used a nice new trick I've learned to add
latency on the loopback interface:
$ tc qdisc add dev lo root handle 1:0 netem delay 100msec
Restore it back to normal again with:
$ tc qdisc del dev lo root
The added 100 millisecond delay applies once in each direction, so this makes
a 200ms RTT when I ping localhost.
A test with the original 1.2.7 code first:
Got 10240000 bytes in 64238 ms = 159407.2 bytes/sec
Yes, it really does perform that terribly. OpenSSH's sftp tool does the
upload at 7.5MB/sec over the same connection.
My first test with my new code, using the 4MB/30000 sizes:
Got 102400000 bytes in 20585 ms = 4974496.0 bytes/sec
Correct. Check the number of zeroes. Ten times the data in a third of the
time: 31 times faster in total...
So I started to experiment with sizes. My thinking is that with a 200 ms
latency, we might want more than 200 requests in the pipe to be really
efficient. And what do you know? If I cut down the outgoing data requests to
ask for just 2000 bytes per "piece" I'm able to bump it up another 40%:
Got 102400000 bytes in 14695 ms = 6968356.6 bytes/sec
At almost 7MB/sec we're now very close to OpenSSH and roughly 43 times faster
than 1.2.7...
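For anyone who wants to double-check the arithmetic behind these figures, a small sketch follows. The helper names bytes_per_sec() and speedup() are mine and purely illustrative; the inputs are just the byte counts and times from the three "Got ..." lines above.

```c
#include <assert.h>
#include <stdint.h>

/* Transfer rate in bytes per second, from a byte count and a duration
 * given in milliseconds. */
double bytes_per_sec(uint64_t bytes, uint64_t ms)
{
    assert(ms > 0);
    return (double)bytes * 1000.0 / (double)ms;
}

/* How many times faster new_rate is compared to old_rate. */
double speedup(double new_rate, double old_rate)
{
    assert(old_rate > 0.0);
    return new_rate / old_rate;
}
```

Feeding in 10240000 bytes in 64238 ms, 102400000 bytes in 20585 ms and 102400000 bytes in 14695 ms gives roughly 159407.2, 4974496.0 and 6968356.6 bytes/sec. The ratios come out at about 31.2 (first new test versus 1.2.7), about 1.40 (2000-byte pieces versus 30000-byte pieces) and about 43.7 (2000-byte pieces versus 1.2.7).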
ZERO LATENCY
When I removed the added latency again and ran the test against localhost, my
test app seemed to get a quite stable 25MB/sec while OpenSSH ran like the wind
at 44MB/sec. I've tried changing the packet sizes between 2000 and 30000 as I
suspected that localhost might perform better with larger sizes, but I
didn't see any significant difference. I believe this difference is more due
to something in our regular transport/channel handling, as we are noticeably
slower than OpenSSH already with plain SCP, and as long as that is the case
we can't make SFTP compare either.
DOWNSIDE
When we use this approach we have a significant over-read for small files. If
we for example were to write an application that transfers a directory with
100 files, each being 20 bytes, we would perform terribly slowly and waste a
lot of bandwidth.
IMPROVEMENTS
I think that we should consider having the SFTP code do an SSH_FXP_STAT
query first to figure out the size of the remote file so that _no_
"over-read" will be done and thus there will be no punishment for small
files. Of course this will then not work exactly like today in cases when for
example the file is being written to while the download begins.
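A rough sketch of what knowing the size buys us, under simplifying assumptions: the helper names reads_needed() and reads_to_queue() are hypothetical, and the model assumes the whole pre-read window gets queued up front. In real code the size could come from something like libssh2_sftp_fstat() on the open handle (the filesize field, when the LIBSSH2_SFTP_ATTR_SIZE flag is set); here it is simply passed in.

```c
#include <assert.h>
#include <stdint.h>

/* READ requests needed to cover 'remaining' bytes when each request asks
 * for 'chunk' bytes (rounded up). */
uint64_t reads_needed(uint64_t remaining, uint32_t chunk)
{
    assert(chunk > 0);
    return (remaining + chunk - 1) / chunk;
}

/* Number of READs to put in flight. Without a known size (size_known == 0)
 * the whole pre-read window is used, which models the over-read described
 * above. With a known size the count is capped by what is left of the
 * file. */
uint64_t reads_to_queue(uint64_t window, uint32_t chunk,
                        int size_known, uint64_t remaining)
{
    uint64_t by_window;
    uint64_t by_size;
    assert(chunk > 0);
    by_window = window / chunk;
    if (!size_known)
        return by_window;
    by_size = reads_needed(remaining, chunk);
    return by_size < by_window ? by_size : by_window;
}
```

With a 4MB window (4194304 bytes) and 30000 bytes per request, this model queues 139 READs for a 20 byte file when the size is unknown, and exactly one when it is known. It also shows the caveat mentioned above: data appended to the file after the size query would not be covered by the capped count.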
I think we should consider an API that limits or disables this read-ahead
concept for low-memory situations, or just situations where it doesn't
behave in a way that is favourable to the application.
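Just to illustrate what such a limit could mean, here is one possible shape for it. This is purely hypothetical: readahead_window() and its max_readahead parameter do not exist in libssh2, and the only thing taken from my tests is the 'buffer_size * 4' default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a libssh2 function. Returns how many bytes may
 * be outstanding for a read call with a 'buffer_size' byte buffer.
 * The default window is buffer_size * 4, clamped to 'max_readahead'.
 * It never drops below buffer_size, so a max_readahead of 0 disables
 * read-ahead and only the caller's own buffer gets requested. */
size_t readahead_window(size_t buffer_size, size_t max_readahead)
{
    size_t window;
    assert(buffer_size > 0);
    /* guard the multiplication against overflow */
    window = buffer_size > SIZE_MAX / 4 ? SIZE_MAX : buffer_size * 4;
    if (window > max_readahead)
        window = max_readahead;
    if (window < buffer_size)
        window = buffer_size;
    return window;
}
```

With a 1MB buffer, a limit of 0 gives a 1MB window (no read-ahead), a limit of 2MB gives 2MB, and a limit of 8MB leaves the default 4MB untouched.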
WHAT NOW
I'll be committing my changes soonish. I have come to think of a few quirks I
want to look over first - not really related to my changes but I think my
changes expose these problems more.
I would really appreciate it if everyone would consider taking the new
code for a little spin to see in which ways it breaks and what mistakes I
haven't yet found myself. My tests seem to run rather solidly, but I have a
rather limited test environment and quite likely too little imagination to
cause the real disasters!
--
 / daniel.haxx.se

Received on 2010-12-14