2020/05/02

# Preface

Thanks to its maturity and ease of use, PyTorch has been widely adopted and is currently one of the most popular deep learning frameworks in academia.

# Part 1: How to Use the Conversion Script

Pass your PyTorch source code to `convert` as a string; it returns the equivalent Jittor code:

``````
from jittor.utils.pytorch_converter import convert

pytorch_code = """
import torch
from torch import nn

class AlexNet(nn.Module):

    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
"""

jittor_code = convert(pytorch_code)
print(jittor_code)
``````

The printed result:

``````
import jittor as jt
from jittor import init
from jittor import nn

class AlexNet(nn.Module):

    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv(3, 64, 11, stride=4, padding=2),
            nn.ReLU(),
            nn.Pool(3, stride=2, op='maximum'),
            nn.Conv(64, 192, 5, padding=2),
            nn.ReLU(),
            nn.Pool(3, stride=2, op='maximum'),
            nn.Conv(192, 384, 3, padding=1),
            nn.ReLU(),
            nn.Conv(384, 256, 3, padding=1),
            nn.ReLU(),
            nn.Conv(256, 256, 3, padding=1),
            nn.ReLU(),
            nn.Pool(3, stride=2, op='maximum')
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(((256 * 6) * 6), 4096),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, num_classes)
        )

    def execute(self, x):
        x = self.features(x)
        x = jt.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x
``````

Note how the converter renames `forward` to `execute`, maps `MaxPool2d` to `nn.Pool(..., op='maximum')`, and turns `torch.flatten` into `jt.flatten`.

# Part 2: How the Conversion Works

`pytorch_converter.py` contains a mapping table called `pjmap`; this table is what translates each `Pytorch` function into its `Jittor` counterpart. For example, PyTorch defines the `AvgPool2d` module as:

``````
class AvgPool2d(Module):
    ...

    def forward(self, x):
        ...
``````
In Jittor, the corresponding module is `Pool`:

``````
class Pool(Module):
    def __init__(self, kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False, op="maximum"):
        ...

    def execute(self, x):
        ...
``````
The `pjmap` entry that ties the two together:

``````
# the PyTorch function name
'AvgPool2d': {
    'pytorch': {
        'args': 'kernel_size, stride=None, padding=0, dilation=1, return_indices=False', # PyTorch arguments
    },
    'jittor': {
        'module': 'nn', # the Jittor module this function lives in
        'name': 'Pool', # the corresponding Jittor function name
        'args': 'kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False, op="maximum"' # Jittor arguments
    },
    'extras': { # extra values forced on the Jittor side: average pooling is Pool with op='mean'
        "op": "'mean'",
    },
},
``````
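To illustrate how such a table can drive the rewrite, here is a toy, regex-based sketch. The real converter works on the parsed source and also remaps arguments; `mini_pjmap` and `rename_calls` below are simplified, hypothetical stand-ins:

```python
import re

# A miniature pjmap-style table: PyTorch name -> Jittor module and name.
mini_pjmap = {
    'MaxPool2d': {'module': 'nn', 'name': 'Pool'},
    'AvgPool2d': {'module': 'nn', 'name': 'Pool'},
}

def rename_calls(code):
    """Rewrite nn.<PytorchName>( into <module>.<JittorName>( using the table."""
    for torch_name, entry in mini_pjmap.items():
        code = re.sub(rf"nn\.{torch_name}\(",
                      f"{entry['module']}.{entry['name']}(", code)
    return code

print(rename_calls("nn.MaxPool2d(kernel_size=3, stride=2)"))
```

Names that have no entry in the table are left untouched, which is also why the real converter reports functions it does not know how to map.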

`links` handles arguments whose names differ but whose meanings are the same: it maps one name onto the other. For example, the arguments of `uniform_` have different names but identical meanings in the two frameworks, so the entry can be written as follows.

``````
'uniform_': {
    'pytorch': {
        'args': "tensor, a=0.0, b=1.0",
    },
    'jittor': {
        'module': 'init',
        'name': 'uniform_',
        'args': 'var, low, high'
    },
    'links': {'tensor': 'var', 'a': 'low', 'b': 'high'},
    'extras': {},
},
``````

`delete` removes arguments that `Jittor` no longer uses. For example, the `inplace` argument of `ReLU` has no `Jittor` equivalent, so it can simply be added to `delete`.

``````
'ReLU': {
    'pytorch': {
        'args': 'inplace=False',
    },
    'jittor': {
        'module': 'nn',
        'name': 'ReLU',
        'args': ''
    },
    'extras': {},
    'delete': ['inplace'],
},
``````

- `extras`: assigns extra values to arguments
- `links`: maps arguments whose names differ but whose meanings are the same
- `delete`: removes arguments that Jittor no longer uses
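Taken together, these three fields amount to one keyword-argument rewriting step. A minimal sketch, assuming a hypothetical `adapt_kwargs` helper (not part of the converter):

```python
def adapt_kwargs(kwargs, links=None, delete=None, extras=None):
    """Rewrite PyTorch-style kwargs into Jittor-style kwargs."""
    links, delete, extras = links or {}, delete or [], extras or {}
    out = {}
    for name, value in kwargs.items():
        if name in delete:                   # delete: drop args Jittor no longer uses
            continue
        out[links.get(name, name)] = value   # links: rename args that differ only in name
    out.update(extras)                       # extras: force extra values on the Jittor side
    return out

# ReLU: `inplace` is deleted
print(adapt_kwargs({'inplace': True}, delete=['inplace']))

# uniform_: tensor/a/b are linked to var/low/high
print(adapt_kwargs({'tensor': 't', 'a': 0.0, 'b': 1.0},
                   links={'tensor': 'var', 'a': 'low', 'b': 'high'}))
```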

If a function you need is not yet in `pjmap`, you can register it yourself with `pjmap_append`. Its parameters are:

- `pytorch_func_name`: the PyTorch function name
- `pytorch_args`: the PyTorch argument list
- `jittor_func_module`: the Jittor module the function lives in
- `jittor_func_name`: the Jittor function name
- `jittor_args`: the Jittor argument list
- `extras`: extra argument assignments
- `links`: argument name mappings
- `delete`: arguments to delete

``````
from jittor.utils.pytorch_converter import pjmap_append

pjmap_append(pytorch_func_name='AvgPool2d',
             jittor_func_module='nn',
             jittor_func_name='Pool',
             jittor_args='kernel_size, stride=None, padding=0, dilation=None, return_indices=None, ceil_mode=False, op="maximum"',
             extras={"op": "'mean'"})
``````

# Part 3: Measuring Runtime Performance

``````
import time
import numpy as np
import torch
import jittor as jt
jt.flags.use_cuda = 1

# define the input as a numpy array
bs = 32
test_img = np.random.random((bs, 3, 224, 224)).astype('float32')

# wrap it as pytorch & jittor input tensors
pytorch_test_img = torch.Tensor(test_img).cuda()
jittor_test_img = jt.array(test_img)

# run `turns` forward passes and average the time
turns = 100

# build the pytorch & jittor versions of your model (e.g. VGG), written here as xxx
pytorch_model = xxx().cuda()
jittor_model = xxx()

# set both models to eval mode so dropout does not randomize the outputs
pytorch_model.eval()
jittor_model.eval()

# load the pytorch model's initial parameters into the jittor model
# so that the two models share exactly the same weights
``````

``````
# measure the average time of one PyTorch forward pass
for i in range(10):
    pytorch_result = pytorch_model(pytorch_test_img)  # warm up PyTorch
torch.cuda.synchronize()
sta = time.time()
for i in range(turns):
    pytorch_result = pytorch_model(pytorch_test_img)
torch.cuda.synchronize()  # only after torch.cuda.synchronize() has the queued work really run and the measured time is valid, so call it both before and after the timed forward passes
end = time.time()
tc_time = round((end - sta) / turns, 5)  # average time over `turns` runs, kept to 5 decimal places
tc_fps = round(bs * turns / (end - sta), 0)  # compute FPS
# `key` stands for your model's name
print(f"- Pytorch {key} forward average time cost: {tc_time}, Batch Size: {bs}, FPS: {tc_fps}")
``````

``````
# measure the average time of one Jittor forward pass
for i in range(10):
    jittor_result = jittor_model(jittor_test_img)  # warm up Jittor
    jittor_result.sync()
jt.sync_all(True)
# jt.sync_all(True) launches the computation graph on the device and waits for it to finish;
# only then has the queued work really run and the measured time is valid,
# so call it both before and after the timed forward passes
sta = time.time()
for i in range(turns):
    jittor_result = jittor_model(jittor_test_img)
    jittor_result.sync()  # sync launches this result's computation graph on the device
jt.sync_all(True)
end = time.time()
jt_time = round((end - sta) / turns, 5)  # average time over `turns` runs, kept to 5 decimal places
jt_fps = round(bs * turns / (end - sta), 0)  # compute FPS
print(f"- Jittor {key} forward average time cost: {jt_time}, Batch Size: {bs}, FPS: {jt_fps}")
``````
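The warm-up / sync / time / sync pattern used for both frameworks can be factored into one framework-agnostic helper. A sketch under stated assumptions: `avg_forward_time` and its `sync` hook are not converter or framework API, and `sync` stands for whatever call forces the framework to finish its queued work (`torch.cuda.synchronize` for PyTorch, `jt.sync_all(True)` for Jittor):

```python
import time

def avg_forward_time(fn, sync=lambda: None, warmup=10, turns=100):
    """Average wall-clock time of one call to fn over `turns` timed runs."""
    for _ in range(warmup):  # warm-up runs are excluded from the measurement
        fn()
    sync()                   # finish all queued work before starting the clock
    sta = time.time()
    for _ in range(turns):
        fn()
    sync()                   # finish all queued work before stopping the clock
    end = time.time()
    return (end - sta) / turns

elapsed = avg_forward_time(lambda: sum(range(1000)), warmup=2, turns=10)
```

Without the final `sync`, both frameworks would report only the time to enqueue the work, not to execute it.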

``````
threshold = 1e-3
# compute the relative error between the pytorch & jittor forward results;
# the test passes if the error is below `threshold`
# (the +1 shifts values away from zero so the division stays stable)
x = pytorch_result.detach().cpu().numpy() + 1
y = jittor_result.numpy() + 1
relative_error = abs(x - y) / abs(y)
diff = relative_error.mean()
assert diff < threshold, f"[*] {yourmodelname} forward fails..., Relative Error: {diff}"
print(f"[*] {yourmodelname} forward passes with Relative Error {diff}")
``````
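The same check can be restated as a standalone numpy-only function and exercised on synthetic outputs; the `relative_error` helper is purely illustrative:

```python
import numpy as np

def relative_error(pytorch_out, jittor_out):
    """Mean element-wise relative error, with the same +1 shift as above."""
    x = np.asarray(pytorch_out) + 1
    y = np.asarray(jittor_out) + 1
    return (np.abs(x - y) / np.abs(y)).mean()

out_a = np.random.random((32, 1000)).astype('float32')
out_b = out_a + np.float32(1e-6)  # simulate tiny numerical drift between frameworks
assert relative_error(out_a, out_b) < 1e-3
```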

# References

1. Krizhevsky, Alex. “One weird trick for parallelizing convolutional neural networks.” arXiv preprint arXiv:1404.5997 (2014).
2. Simonyan, Karen, and Andrew Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint arXiv:1409.1556 (2014).
3. He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
4. Iandola, Forrest N., et al. “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size.” arXiv preprint arXiv:1602.07360 (2016).
5. Zagoruyko, Sergey, and Nikos Komodakis. “Wide residual networks.” arXiv preprint arXiv:1605.07146 (2016).