Recently, I want to download some images using Python. This is what I’ve learned after survey.

Using urllib package

The native and naive way is to use urllib.request module to download an image.

import urllib.request

url = "https://cdn.pixabay.com/photo/2020/05/12/17/04/wind-turbine-5163993_960_720.jpg"

r = urllib.request.urlopen(url)
with open("wind_turbine.jpg", "wb") as f:
    f.write(r.read())

However, the above code may error out with following message:

urllib.error.HTTPError: HTTP Error 403: Forbidden

In this case, we need to add a HTTP header to the request:

import urllib.request

# The following way works. Ref: https://stackoverflow.com/a/45358832/6064933
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
with open("wind_turbine.jpg", "wb") as f:
    with urllib.request.urlopen(req) as r:
        f.write(r.read())

Using requests package

A better way is to use requests package. Here is a simple example to download an image using requests:

import requests

url = "https://cdn.pixabay.com/photo/2020/05/12/17/04/wind-turbine-5163993_960_720.jpg"

r = requests.get(url)
with open("wind-turbine.jpg", "wb") as f:
    f.write(r.content)

Downloading large files with streaming

response.iter_content

In the above code, all content of the image will be read into memory at once. If the image is large, it may consume too much memory.

Alternatively, we can set stream parameter to True to stream request. In this case, only the response header is downloaded. We can retrieve the image in a whole using response.content1 or chunk by chunk by using response.iter_content method:

# Using requests to download large files.
with requests.get(url, stream=True) as r:
    with open("wind-turbine.jpg", "wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

response.raw

When stream is True, we can also use response.raw to stream the download. response.raw is a file-like object. With the help of shutil.copyfileobj(), we can save the image like this:

# using r.raw
with requests.get(url, stream=True) as r:
    with open("wind-turbine.jpg", "wb") as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
        # or f.write(r.raw.read())

References


  1. You may want to avoid this for large files! ↩︎