Preface
One barrier to entry for PyTorch is that, even though it looks like NumPy and keeps much of the API
(verbs like arange are shared between both libraries), you're no longer familiar with
some of the quirks and hidden defaults.
Here I'll run through some learnings I have from using PyTorch, and similarities/differences to NumPy.
Working with tensors
The common intro is "think of tensors as multidimensional arrays", but let's cover their practical side.
if [tensors] was the only thing PyTorch provided, we'd basically just be a Numpy clone.
In fact you can convert a PyTorch tensor to a NumPy array with the numpy() method
(see [pytorchtensormethods#typeconversion]).
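As a minimal sketch of that conversion (assuming a CPU tensor that doesn't require gradients), the round trip below also shows a detail that's easy to miss: numpy() and torch.from_numpy() share memory with the original rather than copying it. The variable names are just for illustration.

```python
import numpy as np
import torch

t = torch.tensor([[1, 2, 3]])
a = t.numpy()               # NumPy array backed by the same memory as t
t2 = torch.from_numpy(a)    # back to a tensor, again without copying

a[0, 0] = 10                # writing through the NumPy array...
print(t)                    # ...is visible from the tensor: tensor([[10,  2,  3]])
```

If you want an independent copy, calling t.clone() before converting avoids the shared memory.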
Just like in NumPy, we can call the tolist() method to retrieve values
(and as in NumPy, we get base Python types in that list):
>>> torch.tensor([[1,2,3]]).tolist()
[[1, 2, 3]]
>>> torch.tensor([[1,2,3]])[0].tolist()
[1, 2, 3]
In NumPy, I can get an individual integer (albeit coerced to np.int64!) stored in an array by
simply indexing into it:
>>> type(np.array([[1,2]])[0,0])
<class 'numpy.int64'>
In PyTorch, however, indexing into an individual entry of a tensor yields another tensor, containing a single value.
>>> torch.tensor([[1,2,3]])[0,0]
tensor(1)
To retrieve that value you call the item() method:
>>> torch.tensor([[1,2,3]])[0,0].item()
1
>>> type(torch.tensor([[1,2,3]])[0,0].item())
<class 'int'>
Note that you get the base Python type back when calling item().
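One caveat worth sketching: item() only works on tensors holding exactly one element. The exception type has differed between PyTorch versions as far as I can tell, so the example below catches both RuntimeError and ValueError as an assumption; tolist() is the method to reach for when there is more than one value.

```python
import torch

single = torch.tensor([[1, 2, 3]])[0, 0]   # a 0-dim tensor
value = single.item()                      # plain Python int

row = torch.tensor([[1, 2, 3]])[0]         # a 1-D tensor with three elements
try:
    row.item()                             # not a single value, so this fails
    raised = False
except (RuntimeError, ValueError):
    raised = True

values = row.tolist()                      # use tolist() for several values
print(value, raised, values)
```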
Tensors also have NumPy-like fancy indexing:
>>> xs = torch.arange(3)
>>> xs
tensor([0, 1, 2])
>>> ys = torch.tensor([[0,1],[2,3],[4,5],[6,7],[8,9]])
>>> ys
tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])
>>> ys[xs]
tensor([[0, 1],
        [2, 3],
        [4, 5]])
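To make the parallel with NumPy concrete, here is a small sketch using the same xs and ys as above. It checks that indexing with a tensor of indices matches torch.index_select along dimension 0 and matches NumPy's fancy indexing, and it adds a boolean mask for good measure (the mask is my own addition, not part of the example above).

```python
import numpy as np
import torch

ys = torch.tensor([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
xs = torch.arange(3)

picked = ys[xs]                           # rows 0, 1, 2
same = torch.index_select(ys, 0, xs)      # explicit spelling of the same thing
np_picked = ys.numpy()[xs.numpy()]        # NumPy fancy indexing on the same data

mask = ys[:, 0] >= 4                      # boolean tensor, one entry per row
masked = ys[mask]                         # rows whose first column is >= 4
print(masked)
```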
To find the number of items in a NumPy array you access the size attribute;
in PyTorch, however, size() is a method that returns an instance of the torch.Size class
(also returned by accessing the shape attribute):
>>> torch.tensor([[1,2,3],[4,5,6]]).size()
torch.Size([2, 3])
>>> torch.tensor([[1,2,3],[4,5,6]]).shape
torch.Size([2, 3])
Instead, the nelement() method gives the number of elements in a PyTorch tensor (it's an alias for numel()).
>>> torch.tensor([[1,2,3],[4,5,6]]).numel()
6
>>> torch.tensor([[1,2,3],[4,5,6]]).nelement()
6
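Side by side, the naming difference looks like this. It's a minimal sketch with illustrative variable names; the one extra thing it relies on is that torch.Size behaves like a tuple, so it compares equal to NumPy's shape tuple and can be unpacked.

```python
import numpy as np
import torch

data = [[1, 2, 3], [4, 5, 6]]
a = np.array(data)
t = torch.tensor(data)

n_numpy = a.size          # NumPy: size is the element count
n_torch = t.numel()       # PyTorch: numel() is the element count
dim0 = t.size(0)          # PyTorch: size(dim) is the length of one dimension
rows, cols = t.shape      # torch.Size unpacks like a tuple

print(n_numpy, n_torch, dim0, t.shape == a.shape)
```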
Be careful when creating tensors
One thing to remember with tensors is that there are two very similarly named ways to create them:
the torch.Tensor class and the torch.tensor function.
(Unhelpfully the headers on the docs are shown in all-capitals!)
The key distinction is that with a capital T, torch.Tensor will coerce to torch.float32
(single-precision floating point number) while lowercase torch.tensor infers dtype
from the data provided.
Hence creating a tensor from some integers will preserve the integer type as the tensor dtype if you
use the lowercase tensor:
>>> torch.tensor([[1,2,3]])
tensor([[1, 2, 3]])
>>> torch.tensor([[1,2,3]]).dtype
torch.int64
>>> torch.tensor([[1,2,3]])[0,0].item()
1
>>> type(torch.tensor([[1,2,3]])[0,0].item())
<class 'int'>
but will coerce them to float if you use the uppercase Tensor:
>>> torch.Tensor([[1,2,3]])
tensor([[1., 2., 3.]])
>>> torch.Tensor([[1,2,3]]).dtype
torch.float32
>>> torch.Tensor([[1,2,3]])[0,0].item()
1.0
>>> type(torch.Tensor([[1,2,3]])[0,0].item())
<class 'float'>
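If you do want floats, a sketch of the more explicit route is to pass dtype to the lowercase function. The example also shows a second trap I'd watch for (assuming the default dtype hasn't been changed): given a bare integer, the uppercase constructor treats it as a size, not as data.

```python
import torch

explicit = torch.tensor([[1, 2, 3]], dtype=torch.float32)  # float32 on request
coerced = torch.Tensor([[1, 2, 3]])                        # float32 by coercion

scalar = torch.tensor(3)   # 0-dim tensor holding the value 3
uninit = torch.Tensor(3)   # 1-D tensor of length 3 with uninitialised contents

print(explicit.dtype, coerced.dtype)
print(scalar.shape, uninit.shape)
```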
Functional Python
There's an odd idiom in PyTorch you won't see in NumPy/Pandas code:
>>> import torch.nn.functional as F
Many of the functions defined in this namespace are also present in the torch.nn namespace,
the difference being that "F" does not have 'state' (it doesn't hold weights),
or in other words requires you to pass in and manage any weights yourself, whereas in the torch.nn
namespace you get learnable state within the modules you use.
From this you can access many of the "basic building blocks for graphs" (neural net call graphs).
>>> F.
Display all 134 possibilities? (y or n)
F.adaptive_max_pool1d_with_indices( F.grad F.pairwise_distance(
F.adaptive_max_pool2d( F.grid_sample( F.pdist(
F.adaptive_max_pool2d_with_indices( F.group_norm( F.pixel_shuffle(
...
Compare to the torch.nn
namespace:
>>> torch.nn.
Display all 147 possibilities? (y or n)
torch.nn.AdaptiveAvgPool1d( torch.nn.GroupNorm( torch.nn.RNNCellBase(
torch.nn.AdaptiveAvgPool2d( torch.nn.Hardshrink( torch.nn.RReLU(
torch.nn.AdaptiveAvgPool3d( torch.nn.Hardsigmoid( torch.nn.ReLU(
...
(You get the idea!)
To take an example, torch.nn.functional.group_norm() is a function,
whereas torch.nn.GroupNorm is a class inheriting from torch.nn.Module.
Compare the signatures:
class GroupNorm(torch.nn.modules.module.Module)
 GroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True) -> None

 Applies Group Normalization over a mini-batch of inputs as described in
 the paper `Group Normalization <https://arxiv.org/abs/1803.08494>`__
...
Whereas the function's docstring shows it takes weight and bias terms:
group_norm(input, num_groups, weight=None, bias=None, eps=1e-05)
Applies Group Normalization for last certain number of dimensions.
See :class:`~torch.nn.GroupNorm` for details.
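To see the stateful/stateless split in action, here's a minimal sketch that runs the same input through both. The input shape and group count are arbitrary choices of mine; the point is that the module owns its weight and bias, while the function needs them handed over explicitly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 4, 5)   # (batch, channels, length), arbitrary example input

# Stateful: the module creates and stores its own learnable weight and bias.
layer = nn.GroupNorm(num_groups=2, num_channels=4)
out_module = layer(x)

# Stateless: the function has no parameters, so we pass the module's in by hand.
out_functional = F.group_norm(
    x, num_groups=2, weight=layer.weight, bias=layer.bias, eps=layer.eps
)

print(torch.allclose(out_module, out_functional))
print([name for name, _ in layer.named_parameters()])
```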
Indexing with one-hot matrix multiplication
Accessing a given row of a tensor is equivalent to left-multiplying it by a one-hot vector.
>>> torch.arange(9).reshape(3,3)
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
>>> torch.arange(9).reshape(3,3)[1]
tensor([3, 4, 5])
>>> F.one_hot(torch.tensor(1), num_classes=3)
tensor([0, 1, 0])
>>> F.one_hot(torch.tensor(1), num_classes=3) @ torch.arange(9).reshape(3,3)
tensor([3, 4, 5])
Trying to index into a row of a floating point tensor in this way, you'll find that dtypes must match: otherwise an error is raised.
>>> torch.arange(9, dtype=torch.float32).reshape(3,3)
tensor([[0., 1., 2.],
        [3., 4., 5.],
        [6., 7., 8.]])
>>> F.one_hot(torch.tensor(1), num_classes=3)
tensor([0, 1, 0])
>>> F.one_hot(torch.tensor(1), num_classes=3) @ torch.arange(9, dtype=torch.float32).reshape(3,3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: expected scalar type Long but found Float
You also can't use a floating point tensor as the input to the one_hot() function:
>>> torch.tensor(1.)
tensor(1.)
>>> F.one_hot(torch.tensor(1.), num_classes=3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: one_hot is only applicable to index tensor.
Instead, you must cast the one-hot vector to fp32 with the float() method.
>>> F.one_hot(torch.tensor(1), num_classes=3).float()
tensor([0., 1., 0.])
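Putting the pieces together, the float version of the row lookup might look like the sketch below: build the one-hot vector from an integer index, cast it, then multiply. The variable names are mine.

```python
import torch
import torch.nn.functional as F

m = torch.arange(9, dtype=torch.float32).reshape(3, 3)

# one_hot needs an integer index; cast the result to match the matrix dtype.
selector = F.one_hot(torch.tensor(1), num_classes=3).float()
row = selector @ m

print(row)                       # tensor([3., 4., 5.])
print(torch.equal(row, m[1]))    # same as plain indexing
```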
Note that NumPy has no such restriction; it just coerces the array dtype quietly:
>>> np.eye(3)[1]
array([0., 1., 0.])
>>> np.arange(9).reshape(3,3)
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> np.eye(3)[1] @ np.arange(9).reshape(3,3)
array([3., 4., 5.])
>>> np.eye(3, dtype=np.int64)[1] @ np.arange(9).reshape(3,3)
array([3, 4, 5])
Unbinding
In PyTorch you can pull out a dimension into a sequence of tensors without that dimension.
>>> t = torch.tensor([[[1,2,3],[4,5,6]]])
>>> t
tensor([[[1, 2, 3],
         [4, 5, 6]]])
>>> t.unbind(dim=1)
(tensor([[1, 2, 3]]), tensor([[4, 5, 6]]))
>>> t.unbind(dim=2)
(tensor([[1, 4]]), tensor([[2, 5]]), tensor([[3, 6]]))
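One way to convince yourself of what unbind() does is to undo it. In this sketch, torch.stack along the same dimension rebuilds the original tensor, and slicing by hand along that dimension gives the same pieces.

```python
import torch

t = torch.tensor([[[1, 2, 3], [4, 5, 6]]])    # shape (1, 2, 3)

parts = t.unbind(dim=1)                       # two tensors of shape (1, 3)
rebuilt = torch.stack(parts, dim=1)           # stacking restores shape (1, 2, 3)

# The same pieces, obtained by slicing dimension 1 manually.
by_hand = [t[:, i] for i in range(t.shape[1])]

print(torch.equal(rebuilt, t))
print(all(torch.equal(p, q) for p, q in zip(parts, by_hand)))
```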
In NumPy you might describe this as 'extracting submatrices' with some combination of np.take
(I can't figure its equivalent out unfortunately!)
You can equivalently view() a torch tensor (see Andrej Karpathy's example and explanation), which is more efficient, and can automatically infer the necessary dimension for an operation by passing -1 as the shape, just like in NumPy's reshape().
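Two small sketches to round this off, both my own rather than taken from the linked explanation. The first shows view() inferring a dimension from -1 and sharing storage with the original tensor. The second is one possible NumPy counterpart to unbind(), under the assumption that moving the chosen axis to the front with np.moveaxis and iterating over it is an acceptable equivalent.

```python
import numpy as np
import torch

t = torch.tensor([[[1, 2, 3], [4, 5, 6]]])    # shape (1, 2, 3)

flat = t.view(-1)         # -1 is inferred as 6
pairs = t.view(-1, 2)     # -1 is inferred as 3
print(flat.shape, pairs.shape)
print(flat.data_ptr() == t.data_ptr())        # a view, not a copy

# Possible NumPy analogue of t.unbind(dim=2): move axis 2 to the front, then
# iterate over it, which yields one sub-array per index along that axis.
a = np.array([[[1, 2, 3], [4, 5, 6]]])
np_parts = tuple(np.moveaxis(a, 2, 0))
print(np_parts)
```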