
Building the VGGNet Architecture with PyTorch (with Code)

Published: 2023-11-29

Model Overview

VGGNet is a deep convolutional neural network developed by the Visual Geometry Group at the University of Oxford together with researchers at Google DeepMind. It explored the relationship between a convolutional network's depth and its performance: by repeatedly stacking small 3×3 convolution kernels and 2×2 max-pooling layers, it successfully built networks 16 to 19 layers deep. VGGNet took second place in the ILSVRC 2014 classification task and first place in the localization task, with a top-5 error rate of 7.3%. To this day, VGGNet is still used to extract image features. Its core idea is to use small convolution kernels while increasing network depth. The architecture is clean and simple, and although the paper was published years ago, it remains very widely used.

The two commonly used VGGNet variants are VGG16 and VGG19.

VGG16 has 13 convolutional layers with (3, 3) kernels, 5 max-pooling layers, and 3 fully connected layers. VGG19 has 16 convolutional layers with (3, 3) kernels, 5 max-pooling layers, and 3 fully connected layers.

Model Structure

The following table is a screenshot from the VGGNet paper. We focus on columns D and E.

They correspond to the VGG16 and VGG19 models, respectively.
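As a cross-check on the layer counts above, the column-D and column-E configurations can be written out as Python lists in the style torchvision uses internally (numbers are conv output channels, 'M' marks a max-pool); counting the numeric entries recovers 13 and 16 convolutional layers:

```python
# VGG configurations from columns D and E of the paper:
# numbers are conv output channels, 'M' marks a 2x2 max-pooling layer.
cfg_D = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
         512, 512, 512, 'M', 512, 512, 512, 'M']            # VGG16
cfg_E = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
         512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M']  # VGG19

def count_convs(cfg):
    """Number of convolutional layers in a configuration."""
    return sum(1 for v in cfg if v != 'M')

print(count_convs(cfg_D))  # 13 conv layers (+3 FC layers = 16)
print(count_convs(cfg_E))  # 16 conv layers (+3 FC layers = 19)
```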

Key Innovations

All of VGGNet's convolutional layers use (3, 3) kernels, because a stack of three (3, 3) convolutions has the same receptive field as a single (7, 7) convolution.
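The receptive-field claim can be checked with the standard recurrence: each stride-1 k×k convolution grows the receptive field by k − 1. A minimal sketch:

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions:
    each k x k layer adds (k - 1) pixels to the field of view."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

print(receptive_field([3, 3, 3]))  # 7 -> same field of view as one 7x7 conv
print(receptive_field([7]))        # 7
```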

A single (7, 7) kernel has 7 × 7 = 49 parameters (per input/output channel pair).

Three stacked (3, 3) kernels have 3 × 3 × 3 = 27 parameters.

This reduces the number of parameters and speeds up training.
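The same ratio holds when channels are taken into account. A quick calculation, ignoring biases and using an arbitrary example channel width:

```python
def conv_params(k, c_in, c_out):
    """Weight count of a single k x k convolution, ignoring biases."""
    return k * k * c_in * c_out

C = 256  # example channel width; any value gives the same ratio
one_7x7 = conv_params(7, C, C)        # 49 * C^2 weights
three_3x3 = 3 * conv_params(3, C, C)  # 27 * C^2 weights

print(one_7x7, three_3x3)
print(three_3x3 / one_7x7)  # 27/49 ~ 0.55: roughly 45% fewer weights
```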

Dataset

Both the VGG16 and VGG19 models below use this dataset.

The CIFAR-10 dataset consists of 60,000 32 × 32 color images in 10 classes, with 6,000 images per class: 50,000 training images and 10,000 test images.

The dataset covers 10 classes, including airplanes and automobiles:
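For reference, the ten CIFAR-10 class names in the label order used by torchvision, together with a check of the split sizes stated above:

```python
# The ten CIFAR-10 classes, in label order 0-9.
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

train_images, test_images = 50_000, 10_000
per_class = (train_images + test_images) // len(classes)
print(per_class)  # 6000 images per class
```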

VGG16 Implementation

import torchvision.transforms as transforms

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ToTensor(),
    transforms.Normalize([0.49139968, 0.48215841, 0.44653091],
                         [0.24703223, 0.24348513, 0.26158784])
])

valid_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.49139968, 0.48215841, 0.44653091],
                         [0.24703223, 0.24348513, 0.26158784])
])
import torch
from torch.utils.data import DataLoader
import torchvision.datasets as dsets
import torchvision.transforms as transforms  # conversions between PIL Image, numpy and Tensor, plus preprocessing ops

batch_size = 64

# CIFAR-10 dataset
train_dataset = dsets.CIFAR10(root='/ml/cifar',     # root directory for the data
                              train=True,           # training split
                              transform=train_tf,   # training transforms defined above
                              download=True)        # download if not already present
test_dataset = dsets.CIFAR10(root='/ml/cifar',
                             train=False,           # test split
                             transform=valid_tf,    # validation transforms defined above
                             download=True)

# wrap the datasets in loaders
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)             # shuffle the training data
test_loader = DataLoader(dataset=test_dataset,
                         batch_size=12,
                         shuffle=False)
from torch import nn

class VGG16(nn.Module):
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            # block 1: 224 -> 112
            nn.Conv2d(in_dim, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # block 2: 112 -> 56
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # block 3: 56 -> 28
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # block 4: 28 -> 14
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # block 5: 14 -> 7
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 512 * 7 * 7)
        x = self.classifier(x)
        return x
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
net = VGG16(3, 10)
net.to(device)
learning_rate = 1e-2
num_epoches = 20
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)  # SGD with momentum

net.train()  # switch to training mode
for epoch in range(num_epoches):
    print('current epoch = %d' % epoch)
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        # images.shape is [batch, channel, height, width]
        outputs = net(images)              # forward pass
        loss = criterion(outputs, labels)  # compute the loss
        optimizer.zero_grad()              # clear old gradients before backprop
        loss.backward()                    # backpropagate the loss
        optimizer.step()                   # update the parameters
        if i % 100 == 0:
            print('current loss = %.5f' % loss.item())
print('finished training')
# run prediction on the test set
total = 0
correct = 0
net.eval()  # switch to evaluation mode
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicts = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicts == labels).cpu().sum()
print(total)
print('Accuracy = %.2f%%' % (100 * correct / total))

The code has been debugged and runs; final accuracy is about 88%.

VGG19 Implementation

import torchvision.transforms as transforms

train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ToTensor(),
    transforms.Normalize([0.49139968, 0.48215841, 0.44653091],
                         [0.24703223, 0.24348513, 0.26158784])
])

valid_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.49139968, 0.48215841, 0.44653091],
                         [0.24703223, 0.24348513, 0.26158784])
])
import torch
from torch.utils.data import DataLoader
import torchvision.datasets as dsets
import torchvision.transforms as transforms  # conversions between PIL Image, numpy and Tensor, plus preprocessing ops

batch_size = 64

# CIFAR-10 dataset
train_dataset = dsets.CIFAR10(root='I:/datasets/cifar',  # root directory for the data
                              train=True,                # training split
                              transform=train_tf,        # training transforms defined above
                              download=True)             # download if not already present
test_dataset = dsets.CIFAR10(root='I:/datasets/cifar',
                             train=False,                # test split
                             transform=valid_tf,         # validation transforms defined above
                             download=True)

# wrap the datasets in loaders
train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)                  # shuffle the training data
test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)
from torch import nn

class VGG19(nn.Module):
    def __init__(self, in_dim, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            # block 1: 224 -> 112
            nn.Conv2d(in_dim, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # block 2: 112 -> 56
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # block 3: 56 -> 28
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # block 4: 28 -> 14
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # block 5: 14 -> 7
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 512 * 7 * 7)
        x = self.classifier(x)
        return x
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
net = VGG19(3, 10)
net.to(device)
learning_rate = 1e-2
num_epoches = 20
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)  # SGD with momentum

net.train()  # switch to training mode
for epoch in range(num_epoches):
    print('current epoch = %d' % epoch)
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)
        # images.shape is [batch, channel, height, width]
        outputs = net(images)              # forward pass
        loss = criterion(outputs, labels)  # compute the loss
        optimizer.zero_grad()              # clear old gradients before backprop
        loss.backward()                    # backpropagate the loss
        optimizer.step()                   # update the parameters
        if i % 100 == 0:
            print('current loss = %.5f' % loss.item())
print('finished training')
# run prediction on the test set
total = 0
correct = 0
net.eval()  # switch to evaluation mode
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicts = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicts == labels).cpu().sum()
print(total)
print('Accuracy = %.2f%%' % (100 * correct / total))

References

The VGGNet paper: Simonyan & Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition"
