5. Deep Learning Computation

5.1 Layers and Blocks

5.2 Parameter Management

\[ w \sim \begin{cases} U(5, 10) & \text{with probability } \frac{1}{4} \\ 0 & \text{with probability } \frac{1}{2} \\ U(-10, -5) & \text{with probability } \frac{1}{4} \end{cases} \tag{5.2.1} \]

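Once we have chosen an architecture and set the hyperparameters, we enter the training phase. The goal is to find parameter values that minimize the loss function; after training, we use those parameters to make predictions.

Sometimes we also need to extract the parameters so that they can be reused in other contexts. This section covers:

Accessing parameters for debugging, diagnostics, and visualization
Parameter initialization
Sharing parameters across different models

We start with a multilayer perceptron with a single hidden layer.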

import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))  # 2 samples, 4 features each
net(X)

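5.2.1 Parameter Access

We can access the parameters of an existing model layer by layer. For example, to inspect the last fully connected layer:

print(net[2].state_dict())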

We can also print the network itself. Calling print(net) gives:

Sequential(
  (0): Linear(in_features=4, out_features=8, bias=True)
  (1): ReLU()
  (2): Linear(in_features=8, out_features=1, bias=True)
)

We can also loop over the layers and print the state of each one:

for i in range(len(net)):
    print(net[i].state_dict())

OrderedDict([('weight', tensor([[-0.0986,  0.2455, -0.0860,  0.0680],
        [-0.1355,  0.1110, -0.2777, -0.2812],
        [ 0.0556, -0.1747,  0.4240,  0.0055],
        [ 0.4844, -0.2566, -0.2516, -0.0411],
        [-0.4566, -0.2377, -0.0310, -0.1946],
        [-0.1937,  0.4383, -0.2517,  0.3974],
        [ 0.3601,  0.0696, -0.1477, -0.1017],
        [ 0.0562,  0.1064, -0.1053, -0.0964]])), ('bias', tensor([ 0.1770,  0.3808, -0.0480, -0.2350, -0.1038, -0.3598,  0.3858,  0.2915]))])
OrderedDict()
OrderedDict([('weight', tensor([[ 0.2666, -0.3127, -0.2842, -0.0652, -0.0895,  0.2961, -0.1265, -0.0633]])), ('bias', tensor([-0.2793]))])

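As we can see, every linear layer has a weight and a bias, while the ReLU layer has no parameters (its state is an empty OrderedDict). If one layer maps m inputs to n outputs, the next layer maps n inputs to k outputs. Note that PyTorch stores the weight of nn.Linear(m, n) with shape (n, m), which is why the first weight matrix above is 8 by 4.

This model takes 4 inputs and produces 1 output.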

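5.2.1.1 Targeted Parameters

Each parameter is represented as an instance of the parameter class. To do anything with a parameter, we first need to access its underlying numerical values.

For example: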

print(type(net[2].bias))
print(net[2].bias)
print(net[2].bias.data)

<class 'torch.nn.parameter.Parameter'>
Parameter containing:
tensor([-0.2793], requires_grad=True)
tensor([-0.2793])

A parameter is a compound object that holds a value, a gradient, and additional information.
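Besides its value, each parameter also carries a gradient. The following minimal sketch rebuilds the same small network so that it runs on its own, and shows that the gradient stays empty until backpropagation has been invoked. Summing the output before calling backward is only an illustrative choice made here to obtain a scalar.

```python
import torch
from torch import nn

# Rebuild the single-hidden-layer MLP so the snippet is self-contained.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
X = torch.rand(size=(2, 4))

# No backward pass has run yet, so the gradient is still None.
grad_before = net[2].bias.grad
print(grad_before is None)

# Reduce the output to a scalar (an illustrative choice) and backpropagate.
net(X).sum().backward()
grad_after = net[2].bias.grad
print(grad_after.shape)
```

After the backward pass, the gradient has the same shape as the parameter it belongs to.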

5.2.1.2 All Parameters at Once

print(*[(name, param.shape) for name, param in net[0].named_parameters()])
print(*[(name, param.shape) for name, param in net.named_parameters()])

('weight', torch.Size([8, 4])) ('bias', torch.Size([8]))
('0.weight', torch.Size([8, 4])) ('0.bias', torch.Size([8])) ('2.weight', torch.Size([1, 8])) ('2.bias', torch.Size([1]))
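The names printed above double as keys of the network's state dictionary, which gives another way to reach a parameter. A minimal sketch, rebuilding the network so that the snippet runs on its own:

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Keys in the state dict follow the "<layer index>.<parameter name>" pattern.
bias_by_name = net.state_dict()['2.bias']
bias_by_attr = net[2].bias.data

# Both routes lead to the same values.
print(torch.equal(bias_by_name, bias_by_attr))
```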

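5.2.1.3 Collecting Parameters from Nested Blocks

Let us see how the parameter naming convention works when we nest multiple blocks inside each other.

We first define a "block factory" and then combine the blocks it produces into larger blocks.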

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        # Nest four copies of block1 under the names block0 ... block3
        net.add_module(f'block{i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))
rgnet(X)

tensor([[-0.1032], [-0.1033]], grad_fn=)

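block1 builds a block with two linear layers and two ReLU nonlinearities; it maps 4 inputs to 4 outputs.

block2 stacks four copies of block1 in a loop. rgnet then appends one more linear layer, so the whole network maps 4 inputs to 1 output and has 17 layers in total (4 blocks of 4 layers each plus the final linear layer, counting each ReLU as a layer).

Finally, we pass X through the network to obtain the output.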

print(rgnet)

Sequential(
  (0): Sequential(
    (block0): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block1): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block2): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
    (block3): Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=4, bias=True)
      (3): ReLU()
    )
  )
  (1): Linear(in_features=4, out_features=1, bias=True)
)
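Because the layers are nested hierarchically, we can reach any of them by indexing through the nested containers as if they were nested lists. The sketch below rebuilds rgnet so that it runs on its own, reads the bias of the first layer of the second sub-block, and looks up the same parameter under its fully qualified name.

```python
import torch
from torch import nn

def block1():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                         nn.Linear(8, 4), nn.ReLU())

def block2():
    net = nn.Sequential()
    for i in range(4):
        net.add_module(f'block{i}', block1())
    return net

rgnet = nn.Sequential(block2(), nn.Linear(4, 1))

# First child -> second sub-block -> first layer -> bias.
bias = rgnet[0][1][0].bias.data
print(bias.shape)

# The same parameter appears under a dotted name that mirrors the nesting.
names = [name for name, _ in rgnet.named_parameters()]
print('0.block1.0.bias' in names)
```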

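5.2.2 Parameter Initialization

By default, PyTorch initializes weight and bias matrices uniformly, drawing from a range that is computed from the input and output dimensions.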
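As far as I can tell, current PyTorch releases draw both the weight and the bias of nn.Linear from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), where fan_in is the number of inputs. This is an implementation detail that may change between versions, so the bound in the sketch below should be read as an assumption rather than a guarantee.

```python
import math
import torch
from torch import nn

layer = nn.Linear(4, 8)                   # fan_in = 4
bound = 1 / math.sqrt(layer.in_features)  # assumed default range: [-0.5, 0.5]

# Largest absolute values actually drawn for this layer.
max_weight = layer.weight.data.abs().max().item()
max_bias = layer.bias.data.abs().max().item()
print(max_weight <= bound, max_bias <= bound)
```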

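5.2.2.1 Built-in Initialization

The code below initializes all weight parameters as Gaussian random variables with standard deviation 0.01 and sets all bias parameters to zero.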

def init_normal(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, mean=0, std=0.01)
        nn.init.zeros_(m.bias)

net.apply(init_normal)
net[0].weight.data[0], net[0].bias.data[0]

To initialize all weights to a constant value instead (here 1):

def init_constant(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 1)
        nn.init.zeros_(m.bias)
net.apply(init_constant)
net[0].weight.data[0], net[0].bias.data[0]
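Different blocks can also receive different initializers, because apply can be called on a single layer as well as on the whole network. The following sketch assumes we want Xavier initialization for the first layer and the constant 42 for the last one; both choices are purely illustrative.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

def init_xavier(m):
    if type(m) == nn.Linear:
        nn.init.xavier_uniform_(m.weight)

def init_42(m):
    if type(m) == nn.Linear:
        nn.init.constant_(m.weight, 42)

# Apply each initializer only to the layer it is meant for.
net[0].apply(init_xavier)
net[2].apply(init_42)
print(net[0].weight.data[0])
print(net[2].weight.data)
```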

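5.2.2.2 Custom Initialization

Sometimes the deep learning framework does not provide the initialization method we need. In the example below, we define an initializer for any weight parameter $w$ that draws from the distribution given in (5.2.1).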

def my_init(m):
    if type(m) == nn.Linear:
        print("Init", *[(name, param.shape)
                        for name, param in m.named_parameters()][0])
        nn.init.uniform_(m.weight, -10, 10)
        m.weight.data *= m.weight.data.abs() >= 5
        # The line above implements the custom rule: weights with |w| < 5 are set to 0
net.apply(my_init)
net[0].weight[:2]
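The masking step works because a draw from U(-10, 10) has absolute value below 5 with probability 1/2, and otherwise lands in (5, 10) or (-10, -5) with probability 1/4 each. A rough empirical check, assuming a large layer (1000 by 1000, an arbitrary size chosen here) so that the fractions are stable:

```python
import torch
from torch import nn

def my_init(m):
    if type(m) == nn.Linear:
        nn.init.uniform_(m.weight, -10, 10)
        # Keep only the weights whose magnitude is at least 5.
        m.weight.data *= m.weight.data.abs() >= 5

torch.manual_seed(0)
layer = nn.Linear(1000, 1000)  # large on purpose, for stable statistics
layer.apply(my_init)

w = layer.weight.data
zero_frac = (w == 0).float().mean().item()
pos_frac = (w >= 5).float().mean().item()
neg_frac = (w <= -5).float().mean().item()
print(zero_frac, pos_frac, neg_frac)
```

The three fractions should come out close to 1/2, 1/4, and 1/4.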

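5.2.3 Tied Parameters

Often we want to share parameters across multiple layers.

We can define a dense layer and then use its parameters to set those of another layer.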

# We need to give the shared layer a name so that we can refer to its parameters
shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))
net(X)
# Check whether the parameters are the same
print(net[2].weight.data[0] == net[4].weight.data[0])
net[2].weight.data[0, 0] = 100
# Make sure that they are actually the same object rather than just having the same value
print(net[2].weight.data[0] == net[4].weight.data[0])
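A more direct way to confirm the tying is to test object identity and to count the distinct parameter tensors. The sketch below rebuilds the network so that it runs on its own. Since the shared layer is used twice in the forward pass, the gradients from both uses are added together during backpropagation.

```python
import torch
from torch import nn

shared = nn.Linear(8, 8)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                    shared, nn.ReLU(),
                    shared, nn.ReLU(),
                    nn.Linear(8, 1))

# Both positions hold the very same Parameter object.
same_object = net[2].weight is net[4].weight

# parameters() reports each shared tensor only once.
num_params = len(list(net.parameters()))
print(same_object, num_params)
```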

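5.3 Deferred Initialization

When a network is defined without specifying its input dimensionality, the deep learning framework has no way of knowing the sizes of the parameters. The trick is deferred initialization:

the framework infers the size of each layer dynamically the first time data passes through the model.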
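In PyTorch, this behavior is available through the lazy modules. The sketch below assumes a PyTorch version that ships nn.LazyLinear, which takes only the output size; the layer sizes (256 and 10) and the 20 input features are arbitrary choices for illustration.

```python
import torch
from torch import nn

net = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.LazyLinear(10))

# Before any data is seen, the weight is a placeholder without a shape.
uninitialized = isinstance(net[0].weight, nn.parameter.UninitializedParameter)

X = torch.rand(2, 20)
net(X)  # the first forward pass fixes the input dimensions

print(uninitialized, net[0].weight.shape, net[2].weight.shape)
```

After the first forward pass, the parameters are ordinary tensors whose shapes were inferred from the data.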

5.4 Custom Layers

5.4.1 Layers without Parameters

First, we build a custom layer that has no parameters of its own.

import torch
import torch.nn.functional as F
from torch import nn

class CenteredLayer(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, X):
        # Subtract the mean of all entries from the input
        return X - X.mean()

layer = CenteredLayer()
layer(torch.FloatTensor([1, 2, 3, 4, 5]))

tensor([-2., -1., 0., 1., 2.])

We can now incorporate our layer as a component in more complex models.

net=nn.Sequential(nn.Linear(8,128),CenteredLayer())
Y = net(torch.rand(4, 8))
Y.mean()

tensor(-9.3132e-09, grad_fn=)

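Explanation:

The custom layer subtracts the mean of all entries from its input, so its output has mean zero. Note that X.mean() averages over every entry of the tensor, not over each feature separately.

The fully connected layer nn.Linear(8, 128) maps 8 features to 128.

We build net, pass 4 randomly generated samples with 8 features each through it, and compute the mean of the output. Because of floating-point arithmetic, the result is a very small number rather than exactly zero.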
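If the goal is instead to center each feature separately, the mean has to be taken along the batch dimension. The following is a minimal sketch of such a variant; the class name FeatureCenteredLayer is hypothetical and does not appear in the original notes.

```python
import torch
from torch import nn

class FeatureCenteredLayer(nn.Module):  # hypothetical name, for illustration only
    def forward(self, X):
        # Subtract each column's mean, computed across the batch dimension.
        return X - X.mean(dim=0, keepdim=True)

layer = FeatureCenteredLayer()
X = torch.rand(4, 8)          # 4 samples, 8 features
Y = layer(X)

# Every feature column of the output now averages to (almost) zero.
col_means = Y.mean(dim=0)
print(col_means.abs().max())
```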

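5.4.2 Layers with Parameters

A fully connected layer needs two parameters: a weight and a bias.

The arguments in_units and units denote the number of inputs and the number of outputs, respectively.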

class MyLinear(nn.Module):
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # Y = XW + b, followed by ReLU. Note that .data bypasses autograd.
        linear = torch.matmul(X, self.weight.data) + self.bias.data
        return F.relu(linear)

linear = MyLinear(5, 3)
linear.weight

Parameter containing:
tensor([[-1.9495, -0.6294,  0.2030],
        [ 0.6147,  0.1423,  0.9114],
        [ 0.0044, -0.2396,  0.0970],
        [ 1.2450,  1.0536,  0.4277],
        [ 0.7913, -0.5266,  2.0368]], requires_grad=True)

We can carry out forward propagation directly with the custom layer.

linear(torch.rand(10, 5))

tensor([[0.1387, 1.0461, 3.2200],
        [0.0000, 0.8358, 1.6702],
        [0.0000, 1.0186, 1.4001],
        [0.1124, 1.4258, 2.9308],
        [0.2609, 0.7889, 2.3094],
        [0.7560, 1.1164, 2.2255],
        [0.2433, 0.5997, 3.4911],
        [1.4168, 1.4965, 3.3127],
        [0.4558, 1.6162, 2.4079],
        [0.0000, 0.8643, 1.9909]])
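The MyLinear layer above reads self.weight.data and self.bias.data in forward. These are plain tensors detached from autograd, so gradients would not reach the parameters during training. A minimal sketch of a trainable variant, assuming we simply use the parameters directly; the class name TrainableLinear and the layer sizes are hypothetical.

```python
import torch
import torch.nn.functional as F
from torch import nn

class TrainableLinear(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_units, units):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_units, units))
        self.bias = nn.Parameter(torch.randn(units,))

    def forward(self, X):
        # Use the parameters themselves (not .data) so autograd tracks them.
        linear = torch.matmul(X, self.weight) + self.bias
        return F.relu(linear)

# Custom layers compose like built-in ones.
net = nn.Sequential(TrainableLinear(64, 8), TrainableLinear(8, 1))
out = net(torch.rand(2, 64))
out.sum().backward()
print(out.shape, net[0].weight.grad is not None)
```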

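5.5 File I/O

5.5.1 Loading and Saving Tensors

For individual tensors, we can call the load and save functions directly to read and write them, respectively.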

import torch
from torch import nn
from torch.nn import functional as F
X=torch.arange(4)
torch.save(X,'x-file')
x2=torch.load('x-file')
x2

tensor([0, 1, 2, 3])
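The same two functions also handle lists and dictionaries of tensors. The sketch below writes into a temporary directory (an assumption made here so that the example leaves no files behind) and reads both containers back.

```python
import os
import tempfile
import torch

x = torch.arange(4)
y = torch.zeros(4)

with tempfile.TemporaryDirectory() as tmp:
    # Save and reload a list of tensors.
    list_path = os.path.join(tmp, 'x-files')
    torch.save([x, y], list_path)
    x2, y2 = torch.load(list_path)

    # Save and reload a dictionary that maps strings to tensors.
    dict_path = os.path.join(tmp, 'mydict')
    torch.save({'x': x, 'y': y}, dict_path)
    mydict2 = torch.load(dict_path)

print(x2, y2)
print(mydict2)
```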

5.5.2 Loading and Saving Model Parameters

We define a multilayer perceptron, create an instance of it, and save its parameters.

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)
torch.save(net.state_dict(), 'mlp.params')

To recover the model, we create a new instance and load the stored parameters into it.

clone = MLP()
clone.load_state_dict(torch.load('mlp.params'))
clone.eval()

MLP(
  (hidden): Linear(in_features=20, out_features=256, bias=True)
  (output): Linear(in_features=256, out_features=10, bias=True)
)
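Since both instances now hold the same parameters, they should produce the same output for the same input. A self-contained sketch of that check, again writing to a temporary directory rather than the working directory (an assumption made for tidiness):

```python
import os
import tempfile
import torch
from torch import nn
from torch.nn import functional as F

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(F.relu(self.hidden(x)))

net = MLP()
X = torch.randn(size=(2, 20))
Y = net(X)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mlp.params')
    torch.save(net.state_dict(), path)
    clone = MLP()
    clone.load_state_dict(torch.load(path))
clone.eval()

# The clone reproduces the original model's output on the same input.
Y_clone = clone(X)
print(torch.equal(Y_clone, Y))
```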