Downloading Images Faster with requests Sessions
In my previous post, I wrote about how to download an image from a URL using requests. In this post, I want to share ways to make the download faster.
Various ways exist for downloading an image using the requests package.
Persistent session or not?
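There are two ways to make an HTTP request using requests: calling requests.get() directly, or creating a requests.Session object and calling session.get() on it.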
The difference is that with requests.get(), a new session object is created each time we make a request. This means that every request has to establish a new TCP connection to the remote host, and the connection is closed when the request finishes. Since establishing a connection to the server takes some time, downloading images using requests.get() causes extra overhead.
With session.get(), if we are making several requests to the same host, the connection is reused, so no time is wasted establishing new connections. This is also known as an HTTP persistent connection.
In summary, using a session reduces the image download time.
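To make the difference concrete, here is a minimal sketch of both styles. The function names and the 10-second timeout are my own choices for illustration, not something from the benchmark code.

```python
import requests


def fetch_without_session(urls):
    """Fetch each URL with requests.get().

    Every call builds a throwaway session internally, so each request
    has to open (and then close) its own connection.
    """
    return [requests.get(url, timeout=10).content for url in urls]


def fetch_with_session(urls):
    """Fetch each URL through one shared Session.

    Requests to the same host can reuse the pooled connection instead
    of opening a new one every time.
    """
    with requests.Session() as session:
        return [session.get(url, timeout=10).content for url in urls]
```

Both functions return a list of bytes objects, one per URL. The gain from the second version should mostly show up when many of the URLs point to the same host.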
stream or not?
As mentioned in the previous post, for large files we may want to set the stream parameter (stream=True) when making the request, which reduces the memory overhead.
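So we have two ways to get the binary image from the response: r.content, which reads the whole body into memory, and r.raw, the underlying file-like stream, which requires stream=True.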
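The two approaches might look like this when saving an image to disk. This is a sketch under my own assumptions: the function names, the timeout, and the use of shutil.copyfileobj are illustrative choices.

```python
import shutil

import requests


def save_with_content(url, path):
    """Read the whole body into memory via r.content, then write it out."""
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    with open(path, "wb") as f:
        f.write(r.content)


def save_with_raw(url, path):
    """Copy the body from r.raw to disk without holding it all in memory."""
    with requests.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        # Ask urllib3 to undo gzip/deflate Content-Encoding, if the server used it.
        r.raw.decode_content = True
        with open(path, "wb") as f:
            shutil.copyfileobj(r.raw, f)
```

Note that r.raw is only useful when the request was made with stream=True. Without it, the body has already been read by the time the response is returned.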
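By combining the session option and the stream option, we have four different ways to download images using requests: r.raw with a session, r.content with a session, r.raw without a session, and r.content without a session.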
To find out which is faster, I ran a small benchmark. I used concurrent.futures and requests to download several images concurrently, using each of the four settings above.
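A benchmark along these lines could be sketched as follows. This is not the benchmark code itself, only a rough approximation: the helper names, the worker count, and the timeout are assumptions of mine, and it measures the total wall-clock time per setting rather than an average. It also shares one Session across threads, which is a simplification on my part.

```python
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO

import requests


def download(url, getter, stream):
    """Return the image bytes, using getter (requests.get or session.get)."""
    r = getter(url, stream=stream, timeout=10)
    r.raise_for_status()
    if not stream:
        return r.content
    # Streaming case: copy from the raw file-like object into a buffer.
    buf = BytesIO()
    r.raw.decode_content = True
    shutil.copyfileobj(r.raw, buf)
    return buf.getvalue()


def time_setting(urls, use_session, stream, workers=8):
    """Download all urls concurrently; return elapsed wall-clock seconds."""
    session = requests.Session() if use_session else None
    getter = session.get if session else requests.get
    start = time.perf_counter()
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() forces every download to finish and re-raises any error.
            list(pool.map(lambda u: download(u, getter, stream), urls))
    finally:
        if session:
            session.close()
    return time.perf_counter() - start


def run_benchmark(urls):
    """Time all four session/stream combinations on the same URLs."""
    results = {}
    for use_session in (True, False):
        for stream in (True, False):
            body = "r.raw" if stream else "r.content"
            sess = "with session" if use_session else "no session"
            results[f"{body} {sess}"] = time_setting(urls, use_session, stream)
    return results
```

Calling run_benchmark() with a list of image URLs returns a dict that maps each setting to its elapsed time. For steadier numbers, it would make sense to repeat each setting several times and average the results.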
The complete code can be found here.
According to the benchmark results, using sessions is faster than making requests without an explicit session. For the stream option, using r.raw is generally faster than using r.content, but this is not always the case. If the image is not very large, r.content is fine.
The benefit of using explicit sessions is more apparent when we are downloading more images concurrently. On my Mac, when downloading 20 images concurrently, I get the following result:
avg time (r.raw with session): 0.2751150131225586
avg time (r.content with session): 0.2750370740890503
avg time (r.raw no session): 1.5932393550872803
avg time (r.content no session): 0.8408806085586548
With 40 images, the output is:
avg time (r.raw with session): 0.3991949200630188
avg time (r.content with session): 0.42252095937728884
avg time (r.raw no session): 1.937515652179718
avg time (r.content no session): 1.912631618976593
From the above results, we can conclude that using requests with sessions reduces image download time substantially, by a factor of roughly three to six in these runs.
License CC BY-NC-ND 4.0