In PyTorch, `Tensor` is the primary object that we deal with (`Variable` is just a thin wrapper class for `Tensor`). In this post, I will give a summary of pitfalls that we should avoid when using Tensors. Since `FloatTensor` and `LongTensor` are the most popular `Tensor` types in PyTorch, I will focus on these two data types.
Tensor operations#
Tensor and Tensor operation#
For operations between Tensors, the rule is strict: both Tensors in the operation must have the same data type, or you will see an error message like

```
TypeError: sub received an invalid combination of arguments - got (float), but expected one of:
 * (int value)
      didn't match because some of the arguments have invalid types: (!float!)
 * (torch.LongTensor other)
      didn't match because some of the arguments have invalid types: (!float!)
 * (int value, torch.LongTensor other)
```
As another example, several loss functions such as `CrossEntropyLoss` require the target to be a torch `LongTensor`. So before doing operations, make sure that your input `Tensor` types match the function definitions.
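As a minimal sketch (written against the current API where `Tensor` and `Variable` are merged; on older versions you would wrap the inputs in `Variable` first), `CrossEntropyLoss` takes a `FloatTensor` of scores and a `LongTensor` of class indices:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
scores = torch.randn(4, 3)               # FloatTensor of shape (batch, num_classes)
target = torch.LongTensor([0, 2, 1, 0])  # class indices must be a LongTensor

loss = criterion(scores, target)         # works
# criterion(scores, target.float())      # complains about the target type
```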
It is easy to convert a `Tensor` from one type to another. Suppose `x` and `y` are Tensors of different types. You can use `x.type(y.type())` or `x.type_as(y)` to convert `x` to the type of `y`.
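For example (a small sketch with made-up values):

```python
import torch

x = torch.LongTensor([1, 2, 3])
y = torch.FloatTensor([0.5, 1.5, 2.5])

x1 = x.type(y.type())    # y.type() is the string 'torch.FloatTensor'
x2 = x.type_as(y)        # shorter, same result
print(x1.type(), x2.type())
```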
Tensor and scalar operation#
For `FloatTensor`, you can do math operations (multiplication, addition, division, etc.) with a scalar of type `int` or `float`. But for `LongTensor`, you can only do math operations with an `int` scalar, not a `float`.
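A small sketch of this behavior (note that newer PyTorch versions may type-promote instead of raising an error):

```python
import torch

f = torch.FloatTensor([1.0, 2.0, 3.0])
l = torch.LongTensor([1, 2, 3])

print(f + 2)    # fine: int scalar with FloatTensor
print(f * 0.5)  # fine: float scalar with FloatTensor
print(l + 2)    # fine: int scalar with LongTensor
# l * 0.5       # fails on the PyTorch version described here;
#               # newer versions promote the result to a float Tensor instead
```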
Why do some losses require target to be LongTensor?#
According to PyTorch developers, some use cases require the target to be of `LongTensor` type, and an `int` simply cannot hold the target value.
FloatTensor or DoubleTensor#
For deep learning, precision is not a very important issue. Moreover, GPUs do not handle double precision very well. So `FloatTensor` is enough, and it is also the default type for model parameters.
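A quick way to check this, assuming a simple `nn.Linear` layer:

```python
import torch.nn as nn

layer = nn.Linear(4, 2)
print(layer.weight.data.type())  # torch.FloatTensor -- single precision by default
```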
NumPy array and torch Tensor#
Shared memory or not?#
You can use the `torch.from_numpy()` method to convert a NumPy array to the corresponding torch `Tensor`, which will share its underlying memory with the NumPy array. To convert a `Tensor` `x` to a NumPy array, use `x.numpy()`; the resulting array also shares memory with the original `Tensor`.
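A short demonstration of the sharing (with made-up values):

```python
import numpy as np
import torch

a = np.ones(3, dtype=np.float32)
t = torch.from_numpy(a)   # shares memory with `a`
a[0] = 100.0
print(t)                  # the change shows up in the Tensor

b = t.numpy()             # shares memory with `t`
t[1] = -1.0
print(b)                  # the change shows up in the array
```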
Do a torch `Tensor` and a NumPy array always share the underlying memory? The short answer is no. If their underlying data types are not compatible, a copy of the original data will be made. For example, if you try to save a torch `FloatTensor` as a NumPy array of type `np.float64`, it will trigger a deep copy.
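For instance, forcing the dtype change with `astype` breaks the sharing:

```python
import numpy as np
import torch

t = torch.FloatTensor([1.0, 2.0, 3.0])
a = t.numpy().astype(np.float64)  # the dtype change forces a copy
a[0] = 100.0
print(t[0])                       # still 1.0 -- the Tensor is unaffected
```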
Correspondence between NumPy and torch data types#
It should be noted that not all NumPy arrays can be converted to a torch `Tensor`. Below is a table showing the NumPy data types that are convertible to a torch `Tensor` type.
| NumPy data type | Tensor data type |
|---|---|
| numpy.uint8 | torch.ByteTensor |
| numpy.int16 | torch.ShortTensor |
| numpy.int32 | torch.IntTensor |
| numpy.int | torch.LongTensor |
| numpy.int64 | torch.LongTensor |
| numpy.float32 | torch.FloatTensor |
| numpy.float | torch.DoubleTensor |
| numpy.float64 | torch.DoubleTensor |
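You can verify a few rows of this table directly (a quick sanity check, not an exhaustive one):

```python
import numpy as np
import torch

print(torch.from_numpy(np.zeros(2, dtype=np.uint8)).type())    # torch.ByteTensor
print(torch.from_numpy(np.zeros(2, dtype=np.int64)).type())    # torch.LongTensor
print(torch.from_numpy(np.zeros(2, dtype=np.float32)).type())  # torch.FloatTensor
```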
Speed comparison between NumPy and torch operations#
I am curious about the speed difference between torch Tensor operations and the equivalent NumPy ndarray operations, so I measured it in the Jupyter console using the built-in magic `%timeit`.
```python
import torch
import numpy as np

# torch Tensor on CPU
x = torch.rand(1, 64)
y = torch.rand(5000, 64)
%timeit z = (x*y).sum(dim=1)

# torch Tensor on GPU
x, y = x.cuda(), y.cuda()
%timeit z = (x*y).sum(dim=1)

# numpy ndarray on CPU
x = np.random.random((1, 64))
y = np.random.random((5000, 64))
%timeit z = (x*y).sum(axis=1)
```
The results are listed in the following table:
| Data type and device | Average operation time |
|---|---|
| Tensor on CPU | 938 $\mu s$ |
| Tensor on GPU | 38.9 $\mu s$ |
| NumPy ndarray (on CPU) | 623 $\mu s$ |
It is pretty clear that Tensor operations on the GPU run more than an order of magnitude faster than on the CPU. NumPy, thanks to its core being implemented in C, runs a bit faster than Tensor operations on the CPU.
Convert scalar to torch Tensor#
You might try to convert a scalar to a `Tensor` by passing the scalar to the `Tensor` constructor, but this will not achieve what you want. For example, `torch.Tensor(1)` will not give you a `Tensor` containing the float 1. Instead, it creates an uninitialized `Tensor` of size 1, which prints as something like

```
1.00000e-20 *
  5.4514
[torch.FloatTensor of size 1]
```

To achieve what you want, you have to pass a list with the single element 1 to the `Tensor` constructor.
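To illustrate the difference (on newer PyTorch, `torch.tensor(1.0)` with a lowercase `t` is the usual way to wrap a scalar):

```python
import torch

print(torch.Tensor(1))    # size-1 Tensor with uninitialized (arbitrary) content
print(torch.Tensor([1]))  # FloatTensor of size 1 holding the value 1.0
```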