In my previous post, I wrote about how to download an image from a URL using requests. In this post, I want to share ways to make the download faster.
Various ways exist for downloading an image using the requests package.
Persistent session or not?
There are two ways to make an HTTP request using requests:
- using requests.get()
- using session.get(), where session is a requests.Session() object.
The difference is that with requests.get(), a new session object is created each time we make a request.
This means that every request has to establish a new TCP connection to the remote host, and the connection is closed when the request finishes.
Since establishing a connection takes time (a TCP handshake, plus a TLS handshake for HTTPS URLs), downloading images using requests.get() incurs extra overhead on every call.
With session.get(), if we make several requests to the same host, the underlying connection is reused, so no time is wasted re-establishing it.
This is known as an HTTP persistent connection (keep-alive).
In summary, using a session reduces the image download time when many images come from the same host.
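The two call styles described above can be sketched as follows. This is a minimal illustration, not the benchmark code: the function names, the timeout value, and the idea of passing the session in as an argument are my own assumptions.

```python
import requests


def download_without_session(urls, timeout=10):
    """Fetch each URL with requests.get(); every call uses a throwaway session."""
    images = []
    for url in urls:
        resp = requests.get(url, timeout=timeout)
        resp.raise_for_status()
        images.append(resp.content)
    return images


def download_with_session(urls, session, timeout=10):
    """Fetch each URL through one shared session so connections can be reused."""
    images = []
    for url in urls:
        resp = session.get(url, timeout=timeout)
        resp.raise_for_status()
        images.append(resp.content)
    return images
```

A caller would typically create the session once, for example with `with requests.Session() as session:`, and pass it to download_with_session() for the whole batch of URLs.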
stream or not?
As mentioned in the previous post, for large files we may want to use the stream parameter (stream=True) when making the request, which can reduce memory overhead because the body is not downloaded until we read it.
So we have two ways to get the binary image data from the response:
- using response.content
- using response.raw.read() (on a request made with stream=True)
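As a rough sketch of the two options, assuming a session object is already available (the function names and the timeout are illustrative, not taken from the benchmark code):

```python
import requests


def image_via_content(session, url, timeout=10):
    """Return the image bytes via response.content."""
    resp = session.get(url, timeout=timeout)
    resp.raise_for_status()
    return resp.content  # requests reads and decodes the whole body here


def image_via_raw(session, url, timeout=10):
    """Return the image bytes via response.raw.read() on a streamed response."""
    with session.get(url, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()
        # raw.read() does not undo gzip/deflate Content-Encoding unless asked to
        resp.raw.decode_content = True
        return resp.raw.read()
```

Both functions return the full image as bytes. The raw variant only saves memory if the data is read in chunks or copied straight to a file instead of being read in one call.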
Benchmark
By combining session options and stream options, we have four different ways to download images using requests:
- r.raw with session
- r.content with session
- r.raw without session
- r.content without session
To find out which is fastest, I ran a small benchmark.
I combined concurrent.futures and requests to download several images concurrently using the four settings above.
The complete code can be found here.
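For readers who just want the shape of such a benchmark, here is a minimal sketch. It is not the linked code: the helper names, the use of ThreadPoolExecutor, the number of repeats, and averaging wall-clock time over repeats are all assumptions on my part, and the list of image URLs has to be supplied by the caller.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests


def fetch_content(get, url):
    """Download via response.content (whole body buffered by requests)."""
    return get(url, timeout=10).content


def fetch_raw(get, url):
    """Download via response.raw.read() on a streamed response."""
    with get(url, stream=True, timeout=10) as resp:
        return resp.raw.read()


def avg_time(fetch, get, urls, repeats=3, max_workers=20):
    """Average wall-clock seconds needed to fetch all urls concurrently."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # list() drains the results so any download error is raised here
            list(pool.map(lambda url: fetch(get, url), urls))
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


def run_benchmark(urls):
    """Time the four settings; `urls` is a caller-supplied list of image URLs."""
    with requests.Session() as session:
        settings = [
            ("r.raw with session", fetch_raw, session.get),
            ("r.content with session", fetch_content, session.get),
            ("r.raw no session", fetch_raw, requests.get),
            ("r.content no session", fetch_content, requests.get),
        ]
        for name, fetch, get in settings:
            print(f"avg time ({name}): {avg_time(fetch, get, urls)}")
```

Note that this sketch shares a single session across threads, which works in practice for simple GET requests but is not something the requests documentation formally guarantees to be thread-safe.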
According to the benchmark results, using a session is faster than making requests without an explicit session.
For the stream option, using r.raw is generally faster than using r.content, but this is not always the case. If the images are not large, either r.raw or r.content is fine.
The benefit of using an explicit session is more apparent when we download more images concurrently. On my Mac, when downloading 20 images concurrently, I get the following result:
avg time (r.raw with session): 0.2751150131225586
avg time (r.content with session): 0.2750370740890503
avg time (r.raw no session): 1.5932393550872803
avg time (r.content no session): 0.8408806085586548
With 40 images, the output is:
avg time (r.raw with session): 0.3991949200630188
avg time (r.content with session): 0.42252095937728884
avg time (r.raw no session): 1.937515652179718
avg time (r.content no session): 1.912631618976593
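To put these numbers in perspective, the speedup from using a session can be computed directly from the averages above. The snippet below is only a small worked calculation; the dictionary layout and the function name are my own.

```python
# Average times copied from the two runs above, as (with session, no session).
results = {
    20: {
        "r.raw": (0.2751150131225586, 1.5932393550872803),
        "r.content": (0.2750370740890503, 0.8408806085586548),
    },
    40: {
        "r.raw": (0.3991949200630188, 1.937515652179718),
        "r.content": (0.42252095937728884, 1.912631618976593),
    },
}


def speedups(results):
    """Return {(n_images, mode): no-session time divided by with-session time}."""
    return {
        (n_images, mode): no_session / with_session
        for n_images, modes in results.items()
        for mode, (with_session, no_session) in modes.items()
    }


for (n_images, mode), ratio in speedups(results).items():
    print(f"{n_images} images, {mode}: {ratio:.1f}x faster with a session")
```

In these runs the ratios fall between roughly 3x and 6x.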
Conclusion
From the above results, we can conclude that using requests with a session reduces image download time substantially: in these runs, downloads with a session were roughly three to six times faster than those without.