PyTorch for NumPy fans

Different APIs, same old scientific Python


One barrier to entry for PyTorch is that, even though much of the API carries over (verbs like arange are shared between both libraries), you can no longer rely on the quirks and hidden defaults you know from NumPy.

Here I'll run through some things I've learned from using PyTorch, and its similarities to and differences from NumPy.

Working with tensors

The common intro is "think of tensors as multi-dimensional arrays", but let's cover their practical side.

Edward Z Yang has noted,

if [tensors] was the only thing PyTorch provided, we'd basically just be a Numpy clone.

In fact you can convert a PyTorch tensor to a NumPy array with the numpy() method (see [pytorch-tensor-methods#type-conversion]).
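As a minimal sketch of that round trip (assuming a CPU tensor; the variable names are just for illustration), note that numpy() and torch.from_numpy() share memory with the original rather than copying it:

```python
import numpy as np
import torch

t = torch.arange(3)         # int64 tensor on the CPU
a = t.numpy()               # NumPy array backed by the same memory as t
a[0] = 10                   # writing through the array changes the tensor too
first = t[0].item()         # the tensor now sees the new value
back = torch.from_numpy(a)  # converting back does not copy either
```

If you want an independent copy, calling clone() on the tensor before converting (or copy() on the array afterwards) avoids the aliasing.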

Just like in NumPy, we can call the tolist() method to retrieve values (and as in NumPy, we get base Python types in that list):

>>> torch.tensor([[1,2,3]]).tolist()
[[1, 2, 3]]
>>> torch.tensor([[1,2,3]])[0].tolist()
[1, 2, 3]

In NumPy, I can get an individual integer (albeit coerced to np.int64!) stored in an array by simply indexing into it:

>>> type(np.array([[1,2]])[0,0])
<class 'numpy.int64'>

In PyTorch, however, indexing an individual entry of a tensor yields another tensor, containing a single value:

>>> torch.tensor([[1,2,3]])[0,0]
tensor(1)

To retrieve that value you call the item() method:

>>> torch.tensor([[1,2,3]])[0,0].item()
1
>>> type(torch.tensor([[1,2,3]])[0,0].item())
<class 'int'>

Note that you get the base Python type back when calling item().
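One caveat worth sketching: item() is only for tensors holding exactly one element, while tolist() handles any number. The exception type raised for a multi-element tensor has differed between PyTorch releases as far as I can tell, so this sketch catches both candidates:

```python
import torch

t = torch.tensor([[1, 2, 3]])

single = t[0, 0].item()     # one element: plain Python int
row = t[0].tolist()         # several elements: use tolist() instead

try:
    t[0].item()             # three elements, so item() refuses
    item_failed = False
except (ValueError, RuntimeError):
    # Assumption: the exact exception class depends on the PyTorch version.
    item_failed = True
```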

Tensors also have NumPy-like fancy indexing:

>>> xs = torch.arange(3)
>>> xs
tensor([0, 1, 2])
>>> ys = torch.tensor([[0,1],[2,3],[4,5],[6,7],[8,9]])
>>> ys
tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])
>>> ys[xs]
tensor([[0, 1],
        [2, 3],
        [4, 5]])
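A short sketch of two further fancy-indexing forms that behave as they do in NumPy, repeated/reordered integer indices and boolean masks (the index values and the mask condition are arbitrary choices for illustration):

```python
import numpy as np
import torch

ys = torch.tensor([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])

# Integer index tensors may repeat and reorder rows.
idx = torch.tensor([4, 0, 0])
picked = ys[idx]                # rows 4, 0, 0

# Boolean masks select the rows where the mask is True.
mask = ys[:, 0] > 3             # first column greater than 3
masked = ys[mask]               # rows 2, 3, 4

# The same integer indexing on the NumPy side gives the same values.
ys_np = ys.numpy()
same = np.array_equal(ys_np[idx.numpy()], picked.numpy())
```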

To find the number of items in a NumPy array you access its size attribute. In PyTorch, however, size() is a method that returns an instance of the torch.Size class (the same object you get from the shape attribute):

>>> torch.tensor([[1,2,3],[4,5,6]]).size()
torch.Size([2, 3])
>>> torch.tensor([[1,2,3],[4,5,6]]).shape
torch.Size([2, 3])

Instead, the nelement() method gives the number of elements in a PyTorch tensor (it's an alias for numel()).

>>> torch.tensor([[1,2,3],[4,5,6]]).numel()
6
>>> torch.tensor([[1,2,3],[4,5,6]]).nelement()
6
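Putting the two libraries side by side, as a rough sketch (variable names are mine):

```python
import math

import numpy as np
import torch

a = np.array([[1, 2, 3], [4, 5, 6]])
t = torch.tensor([[1, 2, 3], [4, 5, 6]])

n_np = a.size                    # NumPy: attribute holding the element count
n_torch = t.numel()              # PyTorch: method returning the element count
rows = t.size(0)                 # size() with a dim argument returns a plain int
n_from_shape = math.prod(t.shape)  # torch.Size is a tuple subclass, so this works
```

Because torch.Size subclasses tuple, comparisons like t.shape == (2, 3) behave as you would expect coming from NumPy.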

Be careful when creating tensors

One thing to remember with tensors is that there are two very similarly named ways to create them: the torch.Tensor class and the torch.tensor function.

(Unhelpfully the headers on the docs are shown in all-capitals!)

The key distinction is that with a capital T, torch.Tensor will coerce to torch.float32 (single-precision floating point number) while lower-case torch.tensor infers dtype from the data provided.

Hence creating a tensor from some integers will preserve the integer type as the tensor dtype if you use the lower-case tensor:

>>> torch.tensor([[1,2,3]])
tensor([[1, 2, 3]])
>>> torch.tensor([[1,2,3]]).dtype
torch.int64
>>> torch.tensor([[1,2,3]])[0,0].item()
1
>>> type(torch.tensor([[1,2,3]])[0,0].item())
<class 'int'>

but will coerce to float if you use the upper-case Tensor:

>>> torch.Tensor([[1,2,3]])
tensor([[1., 2., 3.]])
>>> torch.Tensor([[1,2,3]]).dtype
torch.float32
>>> torch.Tensor([[1,2,3]])[0,0].item()
1.0
>>> type(torch.Tensor([[1,2,3]])[0,0].item())
<class 'float'>
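If you do want floats, a sketch of the more explicit route is to pass dtype to the lower-case function. The same snippet shows a second difference between the two spellings that can bite: given a bare integer, the upper-case constructor treats it as a size rather than a value.

```python
import torch

# Explicit dtype with the lower-case factory function.
as_float = torch.tensor([[1, 2, 3]], dtype=torch.float32)

# A bare integer means different things to the two spellings.
scalar = torch.tensor(3)     # 0-dimensional tensor holding the value 3
uninit = torch.Tensor(3)     # 1-D float32 tensor of length 3, contents uninitialised
```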

Functional Python

There's an odd idiom in PyTorch you won't see in NumPy/Pandas code:

>>> import torch.nn.functional as F

Many of the functions defined in this namespace have counterparts in the torch.nn namespace. The difference is that the functions in F are stateless: they don't hold weights, so you have to create and pass in any weights and biases yourself, whereas the classes in torch.nn are modules that carry their own learnable state.

From F you can access many of the "basic building blocks for graphs" (neural net call graphs).

>>> F.
Display all 134 possibilities? (y or n)
F.adaptive_max_pool1d_with_indices(    F.grad                        F.pairwise_distance(
F.adaptive_max_pool2d(                 F.grid_sample(                F.pdist(
F.adaptive_max_pool2d_with_indices(    F.group_norm(                 F.pixel_shuffle(

Compare to the torch.nn namespace:

>>> torch.nn.
Display all 147 possibilities? (y or n)
torch.nn.AdaptiveAvgPool1d(              torch.nn.GroupNorm(         torch.nn.RNNCellBase(
torch.nn.AdaptiveAvgPool2d(              torch.nn.Hardshrink(        torch.nn.RReLU(
torch.nn.AdaptiveAvgPool3d(              torch.nn.Hardsigmoid(       torch.nn.ReLU(

(You get the idea!)

To take an example, torch.nn.functional.group_norm() is a function, whereas torch.nn.GroupNorm is a class inheriting from torch.nn.Module.

Compare the signatures:

class GroupNorm(torch.nn.modules.module.Module)
 |  GroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True) -> None
 |  Applies Group Normalization over a mini-batch of inputs as described in
 |  the paper `Group Normalization <>`__

Whereas the function's docstring shows it takes weight and bias terms:

group_norm(input, num_groups, weight=None, bias=None, eps=1e-05)
    Applies Group Normalization for last certain number of dimensions.

    See :class:`~torch.nn.GroupNorm` for details.
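To see the two styles side by side, here is a small sketch (the input shape, group count and seed are arbitrary choices of mine). The module owns its weight and bias; the function produces the same result only once those are passed in by hand:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 4, 5)    # (batch, channels, length)

# Stateful: the module creates and stores learnable weight and bias.
gn = nn.GroupNorm(num_groups=2, num_channels=4)
out_module = gn(x)

# Stateless: the same computation, with the parameters supplied explicitly.
out_functional = F.group_norm(
    x, num_groups=2, weight=gn.weight, bias=gn.bias, eps=gn.eps
)

n_params = len(list(gn.parameters()))   # weight and bias
```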

Indexing with one-hot matrix multiplication

Accessing a given row of a 2D tensor is equivalent to left-multiplying the tensor by a one-hot vector.

>>> torch.arange(9).reshape(3,3)
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
>>> torch.arange(9).reshape(3,3)[1]
tensor([3, 4, 5])
>>> F.one_hot(torch.tensor(1), num_classes=3)
tensor([0, 1, 0])
>>> F.one_hot(torch.tensor(1), num_classes=3) @ torch.arange(9).reshape(3,3)
tensor([3, 4, 5])
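The same equivalence extends to several rows at once: stacking one-hot vectors into a matrix selects one row per one-hot. A minimal sketch, with row indices picked arbitrarily:

```python
import torch
import torch.nn.functional as F

m = torch.arange(9).reshape(3, 3)
rows = torch.tensor([2, 0])

one_hots = F.one_hot(rows, num_classes=3)   # shape (2, 3), dtype int64
via_matmul = one_hots @ m                   # one selected row per one-hot
via_index = m[rows]                         # plain fancy indexing
```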

Trying to index into a row of a floating point tensor in this way, you'll find that dtypes must match: otherwise an error is raised.

>>> torch.arange(9, dtype=torch.float32).reshape(3,3)
tensor([[0., 1., 2.],
        [3., 4., 5.],
        [6., 7., 8.]])
>>> F.one_hot(torch.tensor(1), num_classes=3)
tensor([0, 1, 0])
>>> F.one_hot(torch.tensor(1), num_classes=3) @ torch.arange(9, dtype=torch.float32).reshape(3,3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected scalar type Long but found Float

You also can't use a floating point tensor as the input to the one_hot() function:

>>> torch.tensor(1.)
tensor(1.)
>>> F.one_hot(torch.tensor(1.), num_classes=3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: one_hot is only applicable to index tensor.

Instead, build the one-hot vector from an integer index and then cast it to float32 with the float() method:

>>> F.one_hot(torch.tensor(1), num_classes=3).float()
tensor([0., 1., 0.])
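With the cast in place the multiplication goes through. As a sketch, casting with to(m.dtype) instead of float() keeps the one-hot in step with whatever dtype the matrix happens to have:

```python
import torch
import torch.nn.functional as F

m = torch.arange(9, dtype=torch.float32).reshape(3, 3)
one_hot = F.one_hot(torch.tensor(1), num_classes=3)   # int64

row = one_hot.to(m.dtype) @ m    # equivalent to one_hot.float() @ m here
```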

Note that NumPy has no such restriction: it just quietly coerces the array dtype.

>>> np.eye(3)[1]
array([0., 1., 0.])
>>> np.arange(9).reshape(3,3)
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> np.eye(3)[1] @ np.arange(9).reshape(3,3)
array([3., 4., 5.])
>>> np.eye(3, dtype=np.int64)[1] @ np.arange(9).reshape(3,3)
array([3, 4, 5])
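PyTorch is not strict everywhere, though. As far as I can tell, element-wise operations do promote mixed dtypes much as NumPy does, and it is matrix products that insist on matching dtypes. A small sketch of the contrast (values are arbitrary):

```python
import torch

ints = torch.tensor([1, 2, 3])            # int64
floats = torch.tensor([0.5, 0.5, 0.5])    # float32

summed = ints + floats                    # element-wise ops promote to float32
promoted = torch.result_type(ints, floats)

try:
    ints @ floats                         # dot product with mixed dtypes
    matmul_failed = False
except RuntimeError:
    matmul_failed = True
```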


In PyTorch you can pull out a dimension into a sequence of tensors without that dimension, using the unbind() method.

>>> t = torch.tensor([[[1,2,3],[4,5,6]]])
>>> t
tensor([[[1, 2, 3],
         [4, 5, 6]]])
>>> t.unbind(dim=1)
(tensor([[1, 2, 3]]), tensor([[4, 5, 6]]))
>>> t.unbind(dim=2)
(tensor([[1, 4]]), tensor([[2, 5]]), tensor([[3, 6]]))

In NumPy you might describe this as 'extracting submatrices', perhaps with some combination of np.take (unfortunately I can't figure out its exact equivalent!)
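One possible NumPy counterpart, offered as a sketch rather than an official equivalent: move the chosen axis to the front with np.moveaxis and then iterate over it. The helper name np_unbind is hypothetical, not a NumPy function.

```python
import numpy as np


def np_unbind(a, axis=0):
    """Hypothetical helper: split `a` along `axis`, dropping that axis."""
    # Iterating over an array walks its first axis, so bring `axis` to the front.
    return tuple(np.moveaxis(a, axis, 0))


a = np.array([[[1, 2, 3], [4, 5, 6]]])   # shape (1, 2, 3)

along_1 = np_unbind(a, axis=1)   # two arrays of shape (1, 3)
along_2 = np_unbind(a, axis=2)   # three arrays of shape (1, 2)
```

The pieces returned this way are views into the original array, so copy them if you intend to modify them independently.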