There are various ways to download an image using the requests package.
Persistent session or not?
There are two ways to make an HTTP request using requests: call requests.get() directly, or create a session with requests.Session() and call session.get() on it.
The difference is as follows.
With requests.get(), a new session object is created each time we make a request. This means that every request has to establish a new TCP connection to the remote host, and the connection is closed when the request finishes.
With session.get(), if we make several requests to the same host, the connection is reused, so no time is wasted re-establishing connections. This is known as an HTTP persistent connection.
In summary, using a session reduces the download time when we fetch several images from the same host.
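The two call styles can be sketched as follows. This is a minimal illustration rather than code from the original post: the helper name fetch_all and its timeout default are assumptions made for the example.

```python
import requests


def fetch_all(urls, session=None, timeout=10):
    """Return the response body (bytes) for each URL, in order.

    fetch_all is a hypothetical helper. If a session is passed in,
    session.get() is used and connections to the same host can be reused;
    otherwise requests.get() is used and each call sets up its own connection.
    """
    get = session.get if session is not None else requests.get
    bodies = []
    for url in urls:
        r = get(url, timeout=timeout)
        r.raise_for_status()  # fail loudly on 4xx/5xx responses
        bodies.append(r.content)
    return bodies
```

To use the persistent-connection variant, one would create the session once and share it across calls, for example `with requests.Session() as s: fetch_all(urls, session=s)`, where urls is a list of image URLs on the same host.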
stream or not?
As mentioned in the previous post, for large files we may want to pass the stream=True parameter when making the request, which reduces the memory overhead. So we have two ways to get the binary image data from the response: r.raw (with stream=True) or r.content.
By combining session options and stream options, we have four different ways to download images using requests.
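The four combinations can be sketched with two small helpers that each accept either requests.get or a session's get method. This is an illustrative sketch under stated assumptions, not the code used in the post: the function names save_with_content and save_with_raw, the get argument, and the timeout default are all made up for the example.

```python
import shutil

import requests


def save_with_content(get, url, path, timeout=10):
    """Download url and write r.content to path (whole body held in memory)."""
    r = get(url, timeout=timeout)
    r.raise_for_status()
    with open(path, "wb") as f:
        f.write(r.content)


def save_with_raw(get, url, path, timeout=10):
    """Download url with stream=True and copy r.raw to path in chunks."""
    with get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        # Ask urllib3 to undo gzip/deflate content encoding while reading.
        r.raw.decode_content = True
        with open(path, "wb") as f:
            shutil.copyfileobj(r.raw, f)
```

Passing requests.get as the get argument gives the two variants without an explicit session; passing session.get from a requests.Session() gives the two variants with a session.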
To find out which is fastest, I ran a small benchmark. I combined
concurrent.futures and requests to download several images concurrently using
the above four different settings of requests. The complete code can be found
According to my benchmark, using sessions is faster than making requests without explicit sessions. For the stream option, using r.raw is generally faster than using r.content, but this is not always the case. If the images are not large, using either r.raw or r.content is fine.
The benefit of using explicit sessions is more apparent when we download more images concurrently. On my Mac, when downloading 20 images concurrently, the output is:
avg time (r.raw with session): 0.2751150131225586
avg time (r.content with session): 0.2750370740890503
avg time (r.raw no session): 1.5932393550872803
avg time (r.content no session): 0.8408806085586548
With 40 images, the output is:
avg time (r.raw with session): 0.3991949200630188
avg time (r.content with session): 0.42252095937728884
avg time (r.raw no session): 1.937515652179718
avg time (r.content no session): 1.912631618976593
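A timing harness of the kind described above might look like the following sketch. It is not the benchmark code behind the numbers shown: the names timed_download and average_time, the max_workers default, and the choice to average the per-image elapsed time are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_download(download, url):
    """Run download(url) and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    download(url)
    return time.perf_counter() - start


def average_time(download, urls, max_workers=8):
    """Download all urls concurrently and return the mean per-URL time.

    download is any callable taking a single URL, e.g. a function that
    wraps requests.get or session.get (hypothetical usage).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        durations = list(pool.map(lambda u: timed_download(download, u), urls))
    if not durations:
        raise ValueError("urls must not be empty")
    return sum(durations) / len(durations)
```

For instance, given an existing requests.Session() named session, `average_time(lambda u: session.get(u).content, urls)` would time the "r.content with session" setting, while swapping in requests.get would time the variant without a session.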
So the takeaway from this post is that using sessions makes image downloads faster.
License CC BY-NC-ND 4.0