For PyTorch to use the host GPU inside a Docker container, the CUDA versions must match: the CUDA version in the base image and the CUDA version your PyTorch build targets.
Use the correct nvidia/cuda image as the base image
First, to use the GPU we cannot use just any regular Docker image as the base image; we need one of the images provided by nvidia/cuda. For example, to use CUDA 10.1 as the base image:
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
# ... other build steps follow
Otherwise, even if you have installed PyTorch inside the container, torch.cuda.is_available() will still return False.
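Putting this together, a minimal Dockerfile might look like the following sketch. The Python/pip steps are illustrative, not a verbatim recipe; adjust them to your project:

```dockerfile
# CUDA 10.1 + cuDNN 7 base image from nvidia/cuda
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04

# Illustrative: install Python and pip
RUN apt-get update && apt-get install -y python3 python3-pip

# Install a PyTorch build that matches CUDA 10.1
RUN pip3 install torch==1.6.0+cu101 torchvision==0.7.0 \
    -f https://download.pytorch.org/whl/torch_stable.html
```

Note that the container must also be started with GPU access (e.g. `docker run --gpus all ...`, which requires the NVIDIA Container Toolkit on the host) for the GPU to be visible at all.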
PyTorch and torchvision versions
Another point to remember is to install a PyTorch build that supports your CUDA version. If you install PyTorch without specifying the version, the latest one is installed, and it may require a newer CUDA version than the one in your base image. In that case, any attempt to use CUDA will fail with an error.
To check which version of CUDA torch was built with, use torch.version.cuda (source here).
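The CUDA build is also encoded in the wheel's version string itself, e.g. `1.6.0+cu101` means a build for CUDA 10.1. As a sketch, here is a hypothetical helper (`cuda_version_from_wheel` is my own name, not a PyTorch API) that extracts the CUDA version from such a string:

```python
# Hypothetical helper: parse the CUDA tag out of a PyTorch wheel version
# string such as "1.6.0+cu101". Returns e.g. "10.1", or None for a
# CPU-only or untagged build.
def cuda_version_from_wheel(version: str):
    if "+" not in version:
        return None
    local = version.split("+", 1)[1]        # e.g. "cu101" or "cpu"
    if not local.startswith("cu"):
        return None
    digits = local[2:]                      # "101"
    return f"{digits[:-1]}.{digits[-1]}"    # "10.1"

print(cuda_version_from_wheel("1.6.0+cu101"))  # 10.1
print(cuda_version_from_wheel("1.6.0+cpu"))    # None
```

In a real container you would read the string from `torch.__version__` instead of a literal.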
Find a suitable PyTorch version
To find the PyTorch versions built with CUDA 10.1, search for cu101 in the PyTorch stable release page.
We can see that there is a v1.6.0 build for CUDA 10.1, so we can install it with the following command:
pip install torch==1.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
Match the torchvision version with the torch version
The torchvision package version must also match the torch version. This page shows a table of corresponding versions; for example, torch 1.6.0 pairs with torchvision 0.7.0. So our final install command is:
pip install torch==1.6.0+cu101 torchvision==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
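The steps above can be sketched programmatically. Here is a hypothetical helper (`pip_install_command` is my own name for illustration) that turns a CUDA version plus a matching torch/torchvision pair into the install command used above:

```python
# Hypothetical helper: build the pip install command for a given
# torch/torchvision pair and CUDA version. The version pairing itself
# must still come from the official compatibility table.
def pip_install_command(torch_ver: str, vision_ver: str, cuda_ver: str) -> str:
    cu_tag = "cu" + cuda_ver.replace(".", "")   # "10.1" -> "cu101"
    return (
        f"pip install torch=={torch_ver}+{cu_tag} "
        f"torchvision=={vision_ver} "
        f"-f https://download.pytorch.org/whl/torch_stable.html"
    )

print(pip_install_command("1.6.0", "0.7.0", "10.1"))
# pip install torch==1.6.0+cu101 torchvision==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
```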