In this post, I will summarize how to convert a PDF to images using the pdftoppm command-line tool. pdftoppm is part of Poppler, which can be installed with conda:
conda install -c conda-forge poppler
On Windows, the pdftoppm tool will be installed in ANACONDA_ROOT/Library/bin. You should add this directory to the Windows PATH.
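To confirm that the PATH change took effect, a quick check from Python can help. The snippet below is a minimal sketch using only the standard library; the helper name find_tool is my own and not part of Poppler.

```python
import shutil


def find_tool(name="pdftoppm"):
    """Return the full path of an executable found on PATH, or None."""
    # shutil.which searches the directories listed in PATH
    # (and honors PATHEXT on Windows, so "pdftoppm" also matches pdftoppm.exe).
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("pdftoppm not found; check that the Poppler bin directory is on PATH")
    else:
        print(f"pdftoppm found at {path}")
```

If the script reports that pdftoppm was not found, open a new terminal first, since PATH changes usually do not reach terminals that are already running.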
How to use
To convert a single page of a PDF to an image, you can use the following command:
pdftoppm -singlefile -f 4 -r 72 -jpeg -jpegopt quality=90 presentation.pdf test_poppler
The PDF file we want to convert to images is presentation.pdf. The generated image name prefix is test_poppler. The image extension is decided by the exported image format. A little explanation of the options:
-singlefile: only convert one page of the PDF. It is used together with the -f option to convert a single PDF page.
-f: index of the PDF page you want to convert. The page index starts at 1.
-r: image DPI in both the x and y directions. If you want to set the DPI in the x and y directions separately, use -rx and -ry.
-jpeg: convert PDF page to JPEG format.
-jpegopt: options used when converting PDF pages to JPEG images. For the available options and their meanings, see the pdftoppm man page.
According to my test, the pdftoppm command works well and is quick to produce the needed images.
If you want to use Python, there is also pdf2image, a package that wraps pdftoppm. Make sure you have installed pdftoppm and that its directory is on your PATH.
In the following script, I show an example of how to use the package.
from pdf2image import convert_from_path


def main():
    # convert_from_path() returns a list of images, even with single_file=True
    pages = convert_from_path("presentation.pdf", first_page=2, single_file=True)
    # save the only page in the list as a JPEG
    pages[0].save("test_pdf2image.jpg", quality=85)


if __name__ == "__main__":
    main()
convert_from_path() converts the PDF to a list of PIL Image objects. You can then manipulate the images with the powerful functionality provided by PIL.
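Because the result is a list, converting several pages is a small extension of the script above. The following is a rough sketch, not a definitive recipe: the helpers output_name and save_pages and the file naming scheme are my own assumptions, while first_page and last_page are parameters of convert_from_path().

```python
def output_name(prefix, page_number):
    """Build a file name such as 'slides-3.jpg' (hypothetical naming scheme)."""
    return f"{prefix}-{page_number}.jpg"


def save_pages(pdf_path, prefix, first_page, last_page, quality=85):
    """Convert a range of PDF pages and save each one as a JPEG file."""
    # Imported here so that output_name() can be used without pdf2image installed.
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, first_page=first_page, last_page=last_page)
    saved = []
    # Number the output files by their page index in the PDF.
    for number, page in enumerate(pages, start=first_page):
        name = output_name(prefix, number)
        page.save(name, quality=quality)
        saved.append(name)
    return saved
```

For example, save_pages("presentation.pdf", "test_pdf2image", 2, 4) should write one JPEG file each for pages 2, 3, and 4.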
Note that older versions of pdftoppm only support the PPM and PNG formats. Newer versions also support exporting to JPEG and TIFF images. You should check whether exporting to JPEG is supported by running pdftoppm --help in the command line.
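The same check can be scripted. The sketch below looks for a -jpeg entry in the help text. The function names are hypothetical, and I am assuming the help text lists each option at the start of a line. Both stdout and stderr are captured because the help text may be printed to either stream.

```python
import re
import subprocess


def help_mentions_jpeg(help_text):
    """Return True if the help text lists a -jpeg option."""
    # \b keeps "-jpegopt" alone from counting as a match for "-jpeg".
    return re.search(r"^\s*-jpeg\b", help_text, flags=re.MULTILINE) is not None


def pdftoppm_supports_jpeg():
    """Run 'pdftoppm --help' and look for the -jpeg option in its output."""
    result = subprocess.run(
        ["pdftoppm", "--help"], capture_output=True, text=True
    )
    return help_mentions_jpeg(result.stdout + result.stderr)
```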
License CC BY-NC-ND 4.0