PyTorch

1. Introduction to PyTorch

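PyTorch is an open-source machine learning library for Python. It is based on Torch and is used for applications such as natural language processing.
In January 2017, Facebook AI Research (FAIR) released PyTorch, building on Torch. It is a Python-based scientific computing package that provides two high-level features: 1. Tensor computation (like NumPy) with strong GPU acceleration. 2. Deep neural networks built on an automatic differentiation system.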

2. PyTorch Basics

2.1 Tensors

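A tensor is a specialized data structure, very similar to an array or a matrix. In PyTorch, tensors encode a model's inputs and outputs as well as its parameters.

Tensors are similar to NumPy's ndarray, except that tensors can run on GPUs or other hardware accelerators. In fact, tensors and NumPy arrays can often share the same underlying memory, which removes the need to copy data.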

2.1.1 Creating tensors

Creating a tensor directly from data:

import torch

data = [[1, 2], [3, 4], [5, 6]]
data_t = torch.tensor(data)
print(f"Tensor from Data:\n {data_t} \n")

# Tensor from Data:
#  tensor([[1, 2],
#         [3, 4],
#         [5, 6]])

Creating a tensor from a NumPy array:

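import numpy as np

np_array = np.array(data)
x_np = torch.from_numpy(np_array)
print(f"Tensor from Numpy:\n {x_np} \n")

# Tensor from Numpy:
#  tensor([[1, 2],
#         [3, 4],
#         [5, 6]], dtype=torch.int32)
# The dtype follows NumPy's default integer type. Where that default is 64-bit,
# the tensor is int64 and the dtype suffix is not printed.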

2.1.2 Tensor attributes

A tensor's attributes include its shape, its data type, and the device it is stored on.

tensor = torch.rand(3, 4)
print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

# Shape of tensor: torch.Size([3, 4])
# Datatype of tensor: torch.float32
# Device tensor is stored on: cpu

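2.1.3 Tensor operations

PyTorch offers more than 100 tensor operations, including arithmetic, linear algebra, matrix manipulation (transposing, indexing, slicing), sampling and more. All of them can run on the GPU, usually faster than on the CPU.
By default, tensors are created on the CPU. To use a GPU, a tensor has to be moved there explicitly with the .to method, after checking that a GPU is available.

# Move the tensor to the GPU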
if torch.cuda.is_available():
    tensor = tensor.to("cuda")
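
A common way to keep the same script runnable with or without a GPU is to pick the device once and reuse it. The following is a minimal sketch of that pattern; the names `device`, `t` and `t_cpu` are illustrative and not part of the original example.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device ...
t = torch.rand(3, 4, device=device)

# ... or move an existing tensor with .to().
t_cpu = t.to("cpu")

print(t.device, t_cpu.device)
```

Creating the tensor on the target device avoids allocating it on the CPU first and copying it afterwards.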

2.1.3.1 Indexing and slicing

tensor = torch.ones(4, 4)
print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")
tensor[:, 1] = 0
print(tensor)

# First row: tensor([1., 1., 1., 1.])
# First column: tensor([1., 1., 1., 1.])
# Last column: tensor([1., 1., 1., 1.])
# tensor([[1., 0., 1., 1.],
#         [1., 0., 1., 1.],
#         [1., 0., 1., 1.],
#         [1., 0., 1., 1.]])

2.1.3.2 Joining tensors

t1 = torch.cat([tensor, tensor, tensor], dim=1)  # concatenate along dimension 1, i.e. horizontally
print(t1)
 
# tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
#         [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
#         [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
#         [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])

2.1.3.3 Arithmetic operations

# Matrix multiplication; y1, y2 and y3 have the same value
y1 = tensor @ tensor.T
y2 = tensor.matmul(tensor.T)
 
y3 = torch.rand_like(tensor)
torch.matmul(tensor, tensor.T, out=y3)
print(y1)
 
# tensor([[3., 3., 3., 3.],
#         [3., 3., 3., 3.],
#         [3., 3., 3., 3.],
#         [3., 3., 3., 3.]])

# Element-wise product; z1, z2 and z3 have the same value
z1 = tensor * tensor
z2 = tensor.mul(tensor)
 
z3 = torch.rand_like(tensor)
torch.mul(tensor, tensor, out=z3)
print(z1)
# tensor([[1., 0., 1., 1.],
#         [1., 0., 1., 1.],
#         [1., 0., 1., 1.],
#         [1., 0., 1., 1.]])

2.1.3.4 Single-element tensors

A tensor with exactly one element can be converted to a Python number with the item() method:

agg = tensor.sum()
agg_item = agg.item()
print(agg_item, type(agg_item))
 
# 12.0 <class 'float'>

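2.1.3.5 In-place operations

An operation that stores its result in the operand is called an in-place operation. In-place operations are marked by a _ suffix. For example, x.copy_(y) and x.t_() change x.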

print(f"{tensor} \n")
tensor.add_(5)
print(tensor)
 
# tensor([[1., 0., 1., 1.],
#         [1., 0., 1., 1.],
#         [1., 0., 1., 1.],
#         [1., 0., 1., 1.]]) 
# 
# tensor([[6., 5., 6., 6.],
#         [6., 5., 6., 6.],
#         [6., 5., 6., 6.],
#         [6., 5., 6., 6.]])

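In-place operations can save some memory, but they can cause problems when computing derivatives because the history is lost immediately. Their use is therefore discouraged.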
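
To make the problem concrete, here is a small sketch of one way an in-place update can break the backward pass. It relies on exp() keeping its own output for the gradient computation; the variable names and the `failed` flag are illustrative only.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.exp()      # exp() saves its output for use in the backward pass
b.add_(1)        # the in-place update overwrites that saved output

try:
    b.sum().backward()
    failed = False
except RuntimeError as err:
    # autograd notices that a tensor it needs was modified in place
    failed = True
    print("backward failed:", str(err)[:60])
```

Replacing `b.add_(1)` with the out-of-place `b = b + 1` keeps the saved output intact, and the backward pass then succeeds.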

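2.2 Tensors and NumPy

A tensor on the CPU and a NumPy array created from it share the same memory location, so changing one changes the other.
Converting a tensor to a NumPy array: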

t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")
 
# t: tensor([1., 1., 1., 1., 1.])
# n: [1. 1. 1. 1. 1.]

Changing the tensor also changes the NumPy array, because the memory is shared.

t.add_(1)
print(f"t: {t}")
print(f"n: {n}")
 
# t: tensor([2., 2., 2., 2., 2.])
# n: [2. 2. 2. 2. 2.]

Converting a NumPy array to a tensor:

n = np.ones(5)
print(f"n: {n}")
t = torch.from_numpy(n)
print(f"t: {t}")
 
# n: [1. 1. 1. 1. 1.]
# t: tensor([1., 1., 1., 1., 1.], dtype=torch.float64)

Likewise, changing the NumPy array also changes the tensor.
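
The reverse direction can be checked in the same way. The sketch below repeats the conversion and then updates the NumPy array in place with np.add(..., out=n), so that the same buffer is written to rather than a new array being created.

```python
import numpy as np
import torch

n = np.ones(5)
t = torch.from_numpy(n)   # t reuses n's buffer, no copy is made

np.add(n, 1, out=n)       # in-place update of the NumPy array
print(f"t: {t}")
print(f"n: {n}")

# t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
# n: [2. 2. 2. 2. 2.]
```

Note that `n = n + 1` would not have this effect: it binds the name n to a new array and leaves the buffer shared with t untouched.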

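3. Autograd: Automatic Differentiation

The autograd package is at the core of PyTorch. We first take a brief look at it, and then use PyTorch to train a first neural network. autograd provides automatic differentiation for all operations on tensors. A few simple examples illustrate its basic usage.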

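3.1 Tensors from the autograd point of view

torch.Tensor is the central class of the package. If its requires_grad attribute is True, PyTorch tracks every operation performed on it. Once the forward computation is finished, calling backward() makes PyTorch compute all the gradients automatically. The gradients for this tensor are accumulated into its grad attribute.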
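
Because gradients are accumulated rather than overwritten, running backward() twice adds the second result to the first. A minimal sketch, with illustrative names `w`, `first` and `second`:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()     # gradient of the first pass: 3 for each element

(w * 3).sum().backward()   # a second backward pass adds to w.grad
second = w.grad.clone()    # now 6 for each element

w.grad.zero_()             # reset before the next, independent computation
print(first, second, w.grad)
```

This is why training loops typically clear the gradients before each backward pass.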

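To stop tracking a tensor, call detach(): the result is detached from the computation history and no longer takes part in gradient computation. To prevent PyTorch from recording the information needed for gradients at all (and thereby save memory), wrap the code in with torch.no_grad(). This is especially useful at inference time, when no gradients are needed; otherwise we would have to change the requires_grad attribute of each tensor one by one, which is tedious.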
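
The two mechanisms can be compared side by side. This is a small sketch; the names `d`, `y` and `z` are chosen for illustration.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)

d = x.detach()             # same values, but cut off from the graph
print(d.requires_grad)     # False

with torch.no_grad():
    y = x * 2              # computed without recording any history
print(y.requires_grad, y.grad_fn)   # False None

z = x * 2                  # outside the block, tracking is active again
print(z.requires_grad)     # True
```

detach() acts on a single tensor, while torch.no_grad() switches tracking off for every operation inside the block.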

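One more class is essential to the autograd implementation: Function. Tensors and Functions are connected to one another and form a directed acyclic graph that records the complete history of the computation. Every tensor has a grad_fn attribute that references the Function that created it (for tensors created directly by the user, grad_fn is None).

To compute gradients, call backward() on a tensor. If the tensor is a scalar (it holds a single element), no argument is needed. If it holds more than one element, you must pass a gradient argument with the same shape as the tensor, representing the gradient flowing back from downstream.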
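
The difference between the scalar and the non-scalar case can be sketched as follows. The upstream gradient `v` is a made-up value used only for illustration, and `needs_arg` is a helper flag.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                       # non-scalar output with shape (3,)

try:
    y.backward()                 # no argument: only valid for a scalar
    needs_arg = False
except RuntimeError:
    needs_arg = True

v = torch.tensor([1.0, 0.1, 0.01])   # hypothetical upstream gradient
y.backward(v)                        # computes v * dy/dx = v * 2x
print(needs_arg, x.grad)
```

Here dy/dx is 2x element-wise, so x.grad comes out as v * 2x, which is approximately [2.0, 0.4, 0.06].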

Setting requires_grad=True when creating a tensor makes PyTorch record the information needed for the backward pass:

x = torch.ones(2, 2, requires_grad=True)
print(x)

# tensor([[1., 1.],
#         [1., 1.]], requires_grad=True)

Then we produce a new tensor through an operation:

y = x + 2
print(y)
# tensor([[3., 3.],
#         [3., 3.]], grad_fn=<AddBackward0>)

Because y was produced by an operation, its grad_fn is not None.

print(y.grad_fn)
# <AddBackward0 object at 0x7f35409a68d0>

Next, compute z and out from y:

z = y * y * 3
out = z.mean()

print(z, out)
# tensor([[27., 27.],
#         [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

requires_grad_() changes a tensor's requires_grad flag in place:

a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

# False
# True
# <SumBackward0 object at 0x000001F99DD08130>

3.2 梯度

Now we backpropagate to compute the gradients. Because out is a scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

out.backward()

We can print the gradient d(out)/dx:

print(x.grad)
# tensor([[4.5000, 4.5000],
#         [4.5000, 4.5000]])

Verifying the result by hand:
$$
\begin{aligned}
out &= \frac{1}{4}\sum_i z_i \\
z_i &= 3(x_i+2)^2 \\
z_i|_{x_i=1} &= 27
\end{aligned}
$$

Therefore:
$$
\frac{\partial out}{\partial x_i}=\frac{3}{2}(x_i+2)
$$

Evaluated at x_i = 1:
$$
\frac{\partial out}{\partial x_i}|_{x_i=1}=\frac{9}{2}=4.5
$$
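
The hand calculation can also be checked numerically. The sketch below rebuilds the same computation in a self-contained form and compares autograd's result with the closed-form derivative (3/2)(x_i + 2); the name `manual` is illustrative.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()      # same function as out above
out.backward()

manual = 1.5 * (x.detach() + 2)      # closed-form derivative from the derivation
print(torch.allclose(x.grad, manual))
# True
```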

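autograd can also do some rather unusual things. For example, y can depend on x through a while loop, which seems hard to express as a single closed-form function: y keeps being doubled until its norm reaches 1000, so the number of doublings depends on the value of x.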

x = torch.randn(3, requires_grad=True)

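y = x * 2
while y.detach().norm() < 1000:  # detach() is the recommended replacement for .data
    y = y * 2

print(y)
# tensor([ -692.4808,  1686.1211,   667.7313], grad_fn=<MulBackward0>)  (values vary from run to run)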
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)

print(x.grad)
# tensor([  102.4000,  1024.0000,     0.1024])
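
The printed gradient has a simple explanation: after the loop, y equals x multiplied by some power of two, so the Jacobian dy/dx is that factor times the identity matrix, and backward(gradients) returns the factor times gradients. The sketch below tracks the factor explicitly; the variable `factor` is an addition for illustration and is not part of the original example.

```python
import torch

x = torch.randn(3, requires_grad=True)

y = x * 2
factor = 2                       # invariant: y == factor * x
while y.detach().norm() < 1000:
    y = y * 2
    factor *= 2

gradients = torch.tensor([0.1, 1.0, 0.0001])
y.backward(gradients)            # vector-Jacobian product with gradients

# dy/dx is factor * I, so x.grad should equal factor * gradients.
print(factor, torch.allclose(x.grad, factor * gradients))
```

In the run shown above the factor was 1024, which is why the gradient came out as 1024 times [0.1, 1.0, 0.0001].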

We can use with torch.no_grad() to stop gradient tracking:

print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

# True
# True
# False
