Downloading Images Faster with requests Sessions

In my previous post, I wrote about how to download an image from a URL using requests. In this post, I want to share ways to make the downloads faster.

The requests package offers several ways to download an image.

Persistent session or not?

There are two ways to make an HTTP request using requests:

  • calling requests.get() directly
  • creating a requests.Session object and calling session.get()

The difference is that requests.get() creates a new session object for every request. Each request therefore has to establish a new TCP connection to the remote host, and that connection is closed once the request finishes. Since establishing a connection takes time, downloading many images with requests.get() adds overhead.

With session.get(), if we make several requests to the same host, the underlying connection is reused, so no time is wasted establishing new connections. This is known as an HTTP persistent connection (keep-alive). In summary, using a session reduces the image download time.
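As a minimal sketch of the two approaches, the functions below download a list of URLs first with requests.get() and then through a single requests.Session. The function names and the timeout value are made up for illustration and are not from the benchmark code.

```python
import requests


def fetch_without_session(urls, timeout=10):
    """Download each URL with requests.get().

    Every call builds a throwaway session, so each request sets up
    (and then tears down) its own connection.
    """
    return [requests.get(url, timeout=timeout).content for url in urls]


def fetch_with_session(urls, timeout=10):
    """Download each URL through one shared Session.

    Requests to the same host can reuse the pooled connection,
    provided the server keeps it alive.
    """
    with requests.Session() as session:
        return [session.get(url, timeout=timeout).content for url in urls]
```

Both functions return a list of bytes objects, one per URL, so they can be swapped for each other when timing a batch of downloads from the same host.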

stream or not?

As mentioned in the previous post, for large files we may want to set the stream parameter when making the request, which reduces memory overhead. So we have two ways to get the binary image data from the response:

  • using response.content
  • using response.raw.read()
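The two options can be sketched as follows. This is only an illustration: the function names and the timeout are hypothetical, and decode_content=True is an extra I added so that a gzip- or deflate-encoded body comes back decoded, as it does with response.content.

```python
import requests


def image_bytes_via_content(url, timeout=10):
    """Let requests download and buffer the whole body, then return it."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content


def image_bytes_via_raw(url, timeout=10):
    """Defer the body download with stream=True and read the raw stream."""
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        # response.raw is the underlying urllib3 response object.
        # Without an amount argument, read() still pulls the whole body
        # into memory in one go.
        return response.raw.read(decode_content=True)
```

Note that calling response.raw.read() with no size reads the entire body at once, so any memory saving from stream=True comes from reading the stream in chunks (for example, copying it to a file piece by piece) rather than from the call shown here.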

Benchmark

By combining session options and stream options, we have four different ways to download images using requests:

  • r.raw with session
  • r.content with session
  • r.raw without session
  • r.content without session

To find out which is faster, I ran a small benchmark. I combined concurrent.futures and requests to download several images concurrently using each of the four settings above. The complete code can be found here.
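A benchmark of this kind might look roughly like the sketch below. It is not the original benchmark code: the function names, the thread pool size, and the choice to share one Session across worker threads are all my assumptions (sharing a Session between threads is common practice, but requests does not formally guarantee it is thread-safe).

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests


def download(url, session=None, use_raw=False, timeout=10):
    """Fetch one image; `session` and `use_raw` pick one of the four settings."""
    getter = session.get if session is not None else requests.get
    with getter(url, stream=use_raw, timeout=timeout) as response:
        response.raise_for_status()
        if use_raw:
            return response.raw.read(decode_content=True)
        return response.content


def time_setting(urls, use_session, use_raw, max_workers=8):
    """Download all URLs concurrently; return (elapsed_seconds, image_bytes_list)."""
    # Assumption: one Session shared by all worker threads.
    session = requests.Session() if use_session else None
    start = time.perf_counter()
    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            images = list(pool.map(lambda u: download(u, session, use_raw), urls))
    finally:
        if session is not None:
            session.close()
    return time.perf_counter() - start, images
```

Calling time_setting(urls, use_session, use_raw) for each of the four combinations of the two flags, and averaging over a few repetitions, gives numbers comparable in spirit to the ones reported in this post. The default of 8 workers stays below the default connection pool size of 10 per host in requests, so pooled connections are not discarded.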

According to the benchmark results, using a session is faster than making requests without an explicit session. For the stream option, r.raw was often faster than r.content in my runs, but not always, and the numbers below show no consistent winner. If the images are not large, either r.raw or r.content is fine.

The benefit of using explicit sessions is more apparent when we download more images concurrently. On my Mac, when downloading 20 images concurrently, I got the following result:

avg time (r.raw with session): 0.2751150131225586
avg time (r.content with session): 0.2750370740890503
avg time (r.raw no session): 1.5932393550872803
avg time (r.content no session): 0.8408806085586548

With 40 images, the output is:

avg time (r.raw with session): 0.3991949200630188
avg time (r.content with session): 0.42252095937728884
avg time (r.raw no session): 1.937515652179718
avg time (r.content no session): 1.912631618976593

Conclusion

From the above results, we can conclude that using requests with sessions reduces image download time considerably: in these runs, the session-based downloads were roughly 3 to 6 times faster.
