# PyTorch for NumPy fans

## Preface

One barrier to entry for PyTorch is that even where the API looks familiar (verbs like `arange` are shared between both libraries), you may not be familiar with some of the quirks and hidden defaults.

Here I'll run through some things I've learned from using PyTorch, and its similarities to and differences from NumPy.

## Working with tensors

The common intro is "think of tensors as multi-dimensional arrays", but let's cover their practical side.

If [tensors] were the only thing PyTorch provided, we'd basically just be a NumPy clone.

In fact you can convert a PyTorch tensor to a NumPy array with the `numpy()` method (see [pytorch-tensor-methods#type-conversion]).
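
As a quick sketch (values chosen arbitrarily): for CPU tensors the resulting array shares memory with the tensor, and `torch.from_numpy()` goes the other way:

``````
>>> t = torch.tensor([[1, 2, 3]])
>>> t.numpy()  # shares memory with t on CPU
array([[1, 2, 3]])
>>> torch.from_numpy(t.numpy())  # and back again
tensor([[1, 2, 3]])
``````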

Just like in NumPy, we can call the `tolist()` method to retrieve values (and as in NumPy, we get base Python types in that list):

``````
>>> torch.tensor([[1,2,3]]).tolist()
[[1, 2, 3]]
>>> torch.tensor([1,2,3]).tolist()
[1, 2, 3]
``````
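
The NumPy equivalent behaves the same way, down to the base Python types in the result:

``````
>>> np.array([[1,2,3]]).tolist()
[[1, 2, 3]]
>>> type(np.array([[1,2,3]]).tolist()[0][0])
<class 'int'>
``````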

In NumPy, I can get an individual integer (albeit coerced to `np.int64`!) stored in an array by simply indexing into it:

``````
>>> type(np.array([[1,2]])[0,0])
<class 'numpy.int64'>
``````

In PyTorch however, indexing to an individual entry in a tensor yields another tensor, containing a single value:

``````
>>> torch.tensor([[1,2,3]])[0,0]
tensor(1)
``````

To retrieve that value you call the `item()` method:

``````
>>> torch.tensor([[1,2,3]])[0,0].item()
1
>>> type(torch.tensor([[1,2,3]])[0,0].item())
<class 'int'>
``````

Note that you get the base Python type back when calling `item()`.
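
NumPy scalars have an `item()` method too, which likewise returns the base Python type:

``````
>>> type(np.array([[1,2]])[0,0].item())
<class 'int'>
``````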

Tensors also have NumPy-like fancy indexing:

``````
>>> xs = torch.arange(3)
>>> xs
tensor([0, 1, 2])
>>> ys = torch.tensor([[0,1],[2,3],[4,5],[6,7],[8,9]])
>>> ys
tensor([[0, 1],
        [2, 3],
        [4, 5],
        [6, 7],
        [8, 9]])
>>> ys[xs]
tensor([[0, 1],
        [2, 3],
        [4, 5]])
``````
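
The NumPy version, for comparison (same values, built with `arange` and `reshape`):

``````
>>> ys = np.arange(10).reshape(5, 2)
>>> ys[np.arange(3)]
array([[0, 1],
       [2, 3],
       [4, 5]])
``````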

To find the number of items in a NumPy array you access the `size` attribute; in PyTorch, however, `size()` is a method that returns an instance of the `torch.Size` class (also available via the `shape` attribute):

``````
>>> torch.tensor([[1,2,3],[4,5,6]]).size()
torch.Size([2, 3])
>>> torch.tensor([[1,2,3],[4,5,6]]).shape
torch.Size([2, 3])
``````

Instead, the `numel()` method gives the number of elements in a PyTorch tensor (with `nelement()` as an alias):

``````
>>> torch.tensor([[1,2,3],[4,5,6]]).numel()
6
>>> torch.tensor([[1,2,3],[4,5,6]]).nelement()
6
``````
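
For comparison, NumPy's `size` attribute:

``````
>>> np.array([[1,2,3],[4,5,6]]).size
6
``````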

## Be careful when creating tensors

One thing to remember with tensors is that there are two very similarly named ways to create them: the `torch.Tensor` class and the `torch.tensor` function.

The key distinction is that with a capital T, `torch.Tensor` will coerce to `torch.float32` (single-precision floating point number) while lower-case `torch.tensor` infers `dtype` from the data provided.

Hence creating a tensor from some integers will preserve the integer type as the tensor dtype if you use the lower-case `tensor`:

``````
>>> torch.tensor([[1,2,3]])
tensor([[1, 2, 3]])
>>> torch.tensor([[1,2,3]]).dtype
torch.int64
>>> torch.tensor([[1,2,3]])[0,0].item()
1
>>> type(torch.tensor([[1,2,3]])[0,0].item())
<class 'int'>
``````

but coerce to float if you use the upper-case `Tensor`:

``````
>>> torch.Tensor([[1,2,3]])
tensor([[1., 2., 3.]])
>>> torch.Tensor([[1,2,3]]).dtype
torch.float32
>>> torch.Tensor([[1,2,3]])[0,0].item()
1.0
>>> type(torch.Tensor([[1,2,3]])[0,0].item())
<class 'float'>
``````
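
If you want the float dtype without the ambiguity, you can pass `dtype` explicitly to the lower-case `tensor`:

``````
>>> torch.tensor([[1,2,3]], dtype=torch.float32)
tensor([[1., 2., 3.]])
>>> torch.tensor([[1,2,3]], dtype=torch.float32).dtype
torch.float32
``````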

## Functional Python

There's an odd idiom in PyTorch you won't see in NumPy/Pandas code:

``````
>>> import torch.nn.functional as F
``````

Many of the functions defined in this namespace are also present in the `torch.nn` namespace. The difference is that "`F`" is stateless: its functions don't hold weights, so you pass parameters like `weight` and `bias` in yourself, whereas the classes in `torch.nn` are modules that carry that learnable state for you.

From this namespace you can access many of the "basic building blocks for graphs" (neural net call graphs):

``````
>>> F.
Display all 134 possibilities? (y or n)
...
``````

Compare to the `torch.nn` namespace:

``````
>>> torch.nn.
Display all 147 possibilities? (y or n)
...
``````

(You get the idea!)

To take an example, `torch.nn.functional.group_norm()` is a function, whereas `torch.nn.GroupNorm` is a class inheriting from `torch.nn.Module`.

Compare the signatures:

``````
class GroupNorm(torch.nn.modules.module.Module)
 |  GroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True) -> None
 |
 |  Applies Group Normalization over a mini-batch of inputs as described in
 |  the paper `Group Normalization <https://arxiv.org/abs/1803.08494>`__
...
``````

The function's docstring, by contrast, shows it takes explicit `weight` and `bias` terms:

``````
group_norm(input, num_groups, weight=None, bias=None, eps=1e-05)
    Applies Group Normalization for last certain number of dimensions.

    See :class:`~torch.nn.GroupNorm` for details.
``````
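
To see the two lining up (a minimal sketch; the shapes here are chosen arbitrarily), you can pass the module's own parameters to the function and get the same result:

``````
>>> x = torch.randn(2, 6, 4)      # (batch, channels, length)
>>> m = torch.nn.GroupNorm(3, 6)  # module owns its weight and bias
>>> torch.allclose(m(x), F.group_norm(x, 3, weight=m.weight, bias=m.bias))
True
``````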

## Indexing with one-hot matrix multiplication

Accessing a given row of a tensor is equivalent to multiplying it by a one-hot vector.

``````
>>> torch.arange(9).reshape(3,3)
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
>>> torch.arange(9).reshape(3,3)[1]
tensor([3, 4, 5])
>>> F.one_hot(torch.tensor(1), num_classes=3)
tensor([0, 1, 0])
>>> F.one_hot(torch.tensor(1), num_classes=3) @ torch.arange(9).reshape(3,3)
tensor([3, 4, 5])
``````

If you try to index into a row of a floating point tensor this way, you'll find that the dtypes must match; otherwise an error is raised:

``````
>>> torch.arange(9, dtype=torch.float32).reshape(3,3)
tensor([[0., 1., 2.],
        [3., 4., 5.],
        [6., 7., 8.]])
>>> F.one_hot(torch.tensor(1), num_classes=3)
tensor([0, 1, 0])
>>> F.one_hot(torch.tensor(1), num_classes=3) @ torch.arange(9, dtype=torch.float32).reshape(3,3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected scalar type Long but found Float
``````

You also can't use a floating point tensor as the input to the `one_hot()` function:

``````
>>> torch.tensor(1.)
tensor(1.)
>>> F.one_hot(torch.tensor(1.), num_classes=3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: one_hot is only applicable to index tensor.
``````

Instead, you must cast the one-hot vector to float with the `float()` method:

``````
>>> F.one_hot(torch.tensor(1), num_classes=3).float()
tensor([0., 1., 0.])
``````
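
With the dtypes matching, the multiplication from before goes through:

``````
>>> F.one_hot(torch.tensor(1), num_classes=3).float() @ torch.arange(9, dtype=torch.float32).reshape(3,3)
tensor([3., 4., 5.])
``````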

Note that NumPy has no such restriction; it just coerces the array dtype quietly:

``````
>>> np.eye(3)[1]
array([0., 1., 0.])
>>> np.arange(9).reshape(3,3)
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> np.eye(3)[1] @ np.arange(9).reshape(3,3)
array([3., 4., 5.])
>>> np.eye(3, dtype=np.int64)[1] @ np.arange(9).reshape(3,3)
array([3, 4, 5])
``````

## Unbinding

In PyTorch you can pull a dimension out of a tensor with `unbind()`, yielding a tuple of tensors without that dimension:

``````
>>> t = torch.tensor([[[1,2,3],[4,5,6]]])
>>> t
tensor([[[1, 2, 3],
         [4, 5, 6]]])
>>> t.unbind(dim=1)
(tensor([[1, 2, 3]]), tensor([[4, 5, 6]]))
>>> t.unbind(dim=2)
(tensor([[1, 4]]), tensor([[2, 5]]), tensor([[3, 6]]))
``````

In NumPy you might describe this as 'extracting submatrices', perhaps with some combination of `np.take` (I can't figure out an exact equivalent, unfortunately!).
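
That said, one candidate that seems to reproduce `unbind` (an assumption on my part, not a documented equivalence): move the target axis to the front with `np.moveaxis` and splat the result into a tuple:

``````
>>> a = np.array([[[1,2,3],[4,5,6]]])
>>> tuple(np.moveaxis(a, 1, 0))   # cf. t.unbind(dim=1)
(array([[1, 2, 3]]), array([[4, 5, 6]]))
>>> tuple(np.moveaxis(a, 2, 0))   # cf. t.unbind(dim=2)
(array([[1, 4]]), array([[2, 5]]), array([[3, 6]]))
``````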

- You can equivalently `view()` a torch tensor (see Andrej Karpathy's example and explanation), which is more efficient, and which can automatically infer the necessary dimension for an operation when you pass `-1` as the shape, just like NumPy's `reshape()`.