References

These are my notes for the lectures and assignments of Stanford's CS231n course. Since my own understanding is limited, parts of them draw on the excellent notes compiled by others (linked below) for further study. These notes are intended for personal use only; I strongly recommend reading those translated notes directly. They are very well written, and if the original lecture videos are unclear, read those notes carefully first and then go back to the videos.

CS231n Course Notes Translation: Linear Classification Notes (Part 1) - by 杜客 - Zhihu

CS231n Course Notes Translation: Linear Classification Notes (Part 2) - by 杜客 - Zhihu

CS231n Course Notes Translation: Linear Classification Notes (Part 3) - by 杜客 - Zhihu

The notes are organized in the way I understand the material, so that I can read and review them anywhere (which is why this blog was built with a responsive layout).

The assignment walkthroughs can also be followed on NetEase Cloud Classroom:

https://study.163.com/course/courseLearn.htm?courseId=1003223001&from=study#/learn/text?lessonId=1050980818&courseId=1003223001

The code references lightaime's repository: https://github.com/lightaime/cs231n

These notes are written from the starting point of my own understanding, so topics I already understand well are not covered in much detail.

Lecture Notes

Linear Classifiers

A linear classifier computes the class scores as a matrix product between the weights and the values of all pixels across the image's 3 color channels. Depending on the values we choose for the weights, the function has the capacity to like or dislike (depending on the sign of each weight) certain colors at certain positions in the image. For instance, you can imagine that the "ship" class tends to be surrounded by a lot of blue (corresponding to water). The "ship" classifier would then have many positive weights across its blue channel (their presence increases the "ship" score), and mostly negative weights across the green and red channels (their presence decreases the "ship" score).

[Figure: mapping an image to class scores with a weight matrix W and a bias vector b]

As shown in the figure above. For ease of visualization, we assume the image has only 4 pixels (grayscale, ignoring the RGB channels) and that there are 3 classes (red for cat, green for dog, blue for ship; note that red, green and blue here only mark the classes and have nothing to do with the RGB channels). The image pixels are first stretched into a column vector, which is multiplied with W to get the score for each class. Note that this particular W is not good at all: the cat score is very low; in fact, with this W the algorithm thinks the image is a dog.
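A minimal numpy sketch of this score computation for the toy 4-pixel, 3-class setting (the numbers below are made up for illustration):

import numpy as np

x = np.array([56.0, 231.0, 24.0, 2.0])          # 4 pixel values stretched into a vector
W = np.array([[ 0.2, -0.5,  0.1,  2.0],         # one row of weights per class (cat, dog, ship)
              [ 1.5,  1.3,  2.1,  0.0],
              [ 0.0,  0.25, 0.2, -0.3]])
b = np.array([1.1, 3.2, -1.2])                  # one bias per class

scores = W.dot(x) + b                           # shape (3,): one score per class
print(scores)                                   # the class with the highest score is the prediction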

Interpreting images as high-dimensional points: since each image is stretched into a high-dimensional column vector, we can interpret each image as a single point in that space (e.g. every image in CIFAR-10 is a point in 3072-dimensional space). The whole dataset is then a set of points, each with a class label. Since the score of each class is defined as a matrix product of the weights with the image, every class score is a linear function over this space. We cannot visualize a linear function over 3072 dimensions, but if we imagine squashing all those dimensions into two, we can try to visualize what the classifiers are doing:

[Figure: images as points in space, with each class score visualized as a linear decision boundary]

Weights and Biases

The score function with weights and bias is defined by the following formula, which we will use constantly:

f(x[i], W, b) = W x[i] + b
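A common simplification, used later in the assignment's preprocessing step, is the bias trick: append a constant 1 to every x[i] and fold b into W as an extra column, so that only a single matrix W has to be optimized. A minimal sketch with made-up shapes:

import numpy as np

D, C = 4, 3                                   # toy dimensions: 4 features, 3 classes
x = np.random.randn(D)
W = np.random.randn(C, D)
b = np.random.randn(C)

scores = W.dot(x) + b                         # explicit bias term

x_ext = np.hstack([x, 1.0])                   # append 1 -> shape (D + 1,)
W_ext = np.hstack([W, b.reshape(-1, 1)])      # append b as a column -> shape (C, D + 1)
scores_trick = W_ext.dot(x_ext)

print(np.allclose(scores, scores_trick))      # True: both give the same scores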

Loss Function

Above we classified an image of a cat, and the cat score came out very low: cat: -96.8, dog: 437.9, ship: 61.95. The two wrong classes scored higher than the correct one. We will use a loss function (also called the cost function or the objective) to measure how unhappy we are with outcomes like this: the larger the discrepancy between the scores produced by the score function and the true labels, the larger the loss, and vice versa.

Multiclass Support Vector Machine Loss

There are many kinds of loss functions; let's first look at the SVM loss. The SVM loss wants the score of the correct class to be higher than the scores of the incorrect classes by at least a margin Δ. If we think of the loss function as a person with its own taste, then whenever an outcome yields a lower loss, the SVM "likes" it more.

We denote the i-th image as x[i] and its correct label as y[i]. Feeding the image data into the score function, we obtain the scores for the different classes via s = f(x[i], W):

L[i] = sum_{j != y[i]} max(0, s[j] - s[y[i]] + Δ)

Suppose that after the computation an image gets the scores s = [3.2, 5.1, -1.7], and that the first score, 3.2, belongs to the correct class. With Δ set to 1, we get:

L[i] = max(0 , 5.1 - 3.2 + 1) + max( 0 , -1.7 - 3.2 + 1 )
      = max(0 , 2.9) + max(0 , -3.9)
      = 2.9 + 0
      = 2.9

The formula iterates over the incorrect classes, keeps only the terms that are greater than 0, and sums them up. You can see that the second term, the one involving -1.7, comes out negative and is therefore clamped to 0.
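A quick numpy check of the example above (assuming Δ = 1 and that the correct class has index 0):

import numpy as np

s = np.array([3.2, 5.1, -1.7])    # class scores
y = 0                             # index of the correct class
delta = 1.0

margins = np.maximum(0, s - s[y] + delta)
margins[y] = 0                    # do not count the correct class against itself
print(margins.sum())              # -> 2.9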

In this model we are dealing with the linear score function f(x[i], W) = W x[i], so we can rewrite the loss function slightly:

L[i] = sum_{j != y[i]} max(0, w[j]^T x[i] - w[y[i]]^T x[i] + Δ)

Here w[j] is the j-th row of W reshaped as a column vector. Written this way, we no longer need to go through the original score function to obtain the scores; each class score is just the dot product w[j]^T x[i].

Assignment Q2

Linear SVM Classifier

A linear classifier assigns every sample one score per possible class, and the score of the correct class should be larger than the scores of the incorrect classes. To make the classifier more robust when classifying unseen samples, we want the correct score to be larger than the incorrect scores by a comfortable amount, so we introduce a threshold Δ: the correct class's score must exceed every incorrect score by at least Δ, our desired safety margin. This is exactly what the hinge loss expresses.

L[i] = sum_{j != y[i]} max(0, s[j] - s[y[i]] + Δ)

Code. First, preprocess the dataset:

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)


# As a sanity check, we print out the size of the training and test data.
print ('Training data shape: ', X_train.shape)
print ('Training labels shape: ', y_train.shape)
print ('Test data shape: ', X_test.shape)
print ( 'Test labels shape: ', y_test.shape)
Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()

[Plot: sample training images from each CIFAR-10 class]

We split the data into training, validation and test sets. In addition, we create a small "development set" as a subset of the training data; we can use it while developing the algorithm so that the code runs faster.

# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print ('Train data shape: ', X_train.shape)
print ('Train labels shape: ', y_train.shape)
print ('Validation data shape: ', X_val.shape)
print ('Validation labels shape: ', y_val.shape)
print ('Test data shape: ', X_test.shape)
print ('Test labels shape: ', y_test.shape)
Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print( 'Training data shape: ', X_train.shape)
print( 'Validation data shape: ', X_val.shape)
print( 'Test data shape: ', X_test.shape)
print( 'dev data shape: ', X_dev.shape)
Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print (mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()
[130.64189796 135.98173469 132.47391837 130.05569388 135.34804082
 131.75402041 130.96055102 136.14328571 132.47636735 131.48467347]

[Plot: the mean training image]

# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print (X_train.shape, X_val.shape, X_test.shape, X_dev.shape)
(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)

SVM Classifier

Implement svm_loss_naive

Open the file classifiers/linear_svm.py and fill in the code of svm_loss_naive as instructed.

# Evaluate the naive implementation of the loss we've provided for you

from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# Generate a small random SVM weight matrix
# (really small: random values scaled by 0.00001)

W = np.random.rand(3073,10)*0.00001

loss,grad = svm_loss_naive(W,X_dev,y_dev,0.000005)
print('loss: %f' % (loss, ))

W
loss: 9.017356





array([[9.93865772e-06, 7.56864193e-06, 5.89375652e-06, ...,
        6.77515037e-06, 9.10267182e-06, 6.48091044e-07],
       [1.27947426e-06, 5.75214312e-06, 2.09397700e-06, ...,
        5.36760790e-07, 4.77017524e-06, 2.28127052e-06],
       [5.18547255e-06, 4.19426411e-06, 1.92434361e-06, ...,
        2.84480154e-06, 2.94404937e-06, 2.89615838e-06],
       ...,
       [7.56519950e-06, 4.57873627e-07, 6.21950921e-06, ...,
        8.03356542e-06, 2.40128822e-07, 8.33492116e-06],
       [1.10009417e-06, 5.41340929e-06, 2.37965695e-07, ...,
        2.98218212e-06, 3.22471411e-06, 8.73107731e-06],
       [3.86575240e-07, 7.78574769e-06, 1.92784563e-06, ...,
        1.55515629e-06, 3.14074781e-06, 5.88494769e-06]])

The grad returned above is all zeros. Derive and implement the gradient of the SVM loss function inside svm_loss_naive.

To verify that you have correctly implemented the gradient, you can numerically estimate the gradient of the loss and compare the numerical estimate with the value you computed analytically. Reference code:

import numpy as np
from random import shuffle

def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
"""
   使用循环实现SVM损失函数
   输入维数为D,有C类,我们使用N个样本作为一批数据的输入

   -W:一个numpy array ,形状为(D,C),存储权重
   -X:一个numpy array,形状为(N,D),存储一个小批数据
   -y:有个numpy array,形状为(N,)存储训练标签。y[i] = c表示x[i]的标签 0<=c<=C
   -reg:float,正则化强度
"""
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        loss += margin
        dW[:, j] += X[i]
        dW[:, y[i]] -= X[i]

  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train
  dW /= num_train

  # Add regularization to the loss.
  loss += reg * np.sum(W * W)
  dW += reg * W * 2
  #############################################################################
  # TODO:                                                                     #
  # Compute the gradient of the loss function and store it dW.                #
  # Rather that first computing the loss and then computing the derivative,   #
  # it may be simpler to compute the derivative at the same time that the     #
  # loss is being computed. As a result you may need to modify some of the    #
  # code above to compute the gradient.                                       #
  #############################################################################


  return loss, dW
# Once you've implemented the gradient, run the code below to recompute it
# The output comes from grad_check_sparse: in both cases (without and with regularization), the relative error between the numerical and analytic gradients is nearly zero
loss ,grad = svm_loss_naive(W,X_dev,y_dev,0.0)

# Numerically compute the gradient along a few randomly chosen dimensions and compare it with your analytic gradient; they should match almost exactly in every dimension
from cs231n.gradient_check import grad_check_sparse

f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f , W , grad)

# do the gradient check once more with regularization turned on
print('turn on reg')
loss , grad = svm_loss_naive(W , X_dev , y_dev , 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f , W , grad)
numerical: 2.349021 analytic: 2.349021, relative error: 1.180449e-10
numerical: 4.804235 analytic: 4.804235, relative error: 5.747734e-12
numerical: 1.327000 analytic: 1.327000, relative error: 1.575626e-10
numerical: 3.088571 analytic: 3.088571, relative error: 7.266453e-11
numerical: -2.552727 analytic: -2.552727, relative error: 1.547920e-11
numerical: 5.595337 analytic: 5.595337, relative error: 3.142359e-11
numerical: -17.572298 analytic: -17.572298, relative error: 7.754295e-12
numerical: 19.780034 analytic: 19.780034, relative error: 1.729385e-11
numerical: 16.555597 analytic: 16.555597, relative error: 8.886292e-13
numerical: -35.181096 analytic: -35.181096, relative error: 4.288773e-12
turn on reg
numerical: 9.329604 analytic: 9.329604, relative error: 4.093753e-11
numerical: 38.069406 analytic: 38.069406, relative error: 4.712668e-12
numerical: -2.631211 analytic: -2.631211, relative error: 2.318176e-11
numerical: 25.142093 analytic: 25.142093, relative error: 1.526339e-11
numerical: -0.256059 analytic: -0.256059, relative error: 2.609563e-09
numerical: 0.825197 analytic: 0.825197, relative error: 1.383475e-10
numerical: 1.610363 analytic: 1.610363, relative error: 5.504740e-11
numerical: 7.908189 analytic: 7.908189, relative error: 1.615372e-12
numerical: -17.057089 analytic: -17.057089, relative error: 2.334215e-11
numerical: 3.835728 analytic: 3.835728, relative error: 7.055238e-11

Inline Question 1:

Occasionally, a dimension in the gradient check will not match exactly. What could cause this? Is it a factor we need to worry about? What is a simple example in which the gradient check could fail?

Hint: the SVM loss function is not, strictly speaking, differentiable.

Answer:

This is the difference between the analytic and the numerical solution. The numerical gradient is estimated with a small finite step (for example 0.00001) on either side of the current point, so when the loss is not differentiable there the two can disagree. A simple failure case is a margin term that sits exactly at the kink, e.g. when s[y[i]] is exactly Δ = 1 larger than s[j], so that max(0, ·) is evaluated right at 0.
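A minimal sketch of the issue, using a centered difference on f(x) = max(0, x) at the kink (my own illustrative example, not part of the assignment):

def f(x):
    return max(0.0, x)

def numerical_grad(f, x, h=1e-5):
    # centered-difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

print(numerical_grad(f, 1.0))   # ~1.0, matches the analytic gradient away from the kink
print(numerical_grad(f, 0.0))   # 0.5 at the kink, while a subgradient could be 0 or 1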

Implement svm_loss_vectorized

Let's compute the loss first; the gradient follows right after.

def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.

  Inputs and outputs are the same as svm_loss_naive.
  """
  """
      结构化SVM损失函数,使用向量来实现
      输入和输出和svm_loss_naive一致
  """

  loss = 0.0
  # initialize the gradient as zero
  dW = np.zeros(W.shape) 

  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the structured SVM loss, storing the    #
  # result in loss.                                                           #
  #############################################################################
  score = X.dot(W)
  correct_class_score = score[range(X.shape[0]), y]
  correct_class_score = correct_class_score.reshape(X.shape[0], -1)
  margin = score - correct_class_score + 1
  margin = np.maximum(margin, 0)
  margin[range(X.shape[0]), y] = 0
  loss = np.sum(margin)
  loss /= X.shape[0]
  loss += reg * np.sum(W * W)

  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################


  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the gradient for the structured SVM     #
  # loss, storing the result in dW.                                           #
  #                                                                           #
  # Hint: Instead of computing the gradient from scratch, it may be easier    #
  # to reuse some of the intermediate values that you used to compute the     #
  # loss.   
  #############################################################################
  # Reuse the margins: mark each positive margin with a 1; each such class
  # contributes +X[i] to its column of dW.
  margin[margin > 0] = 1
  # The correct class of each sample receives -X[i] once per positive margin.
  rowSum = np.sum(margin, axis=1)
  margin[range(margin.shape[0]), y] = -rowSum
  dW = np.dot(X.T, margin) / X.shape[0]
  dW += reg * W * 2

  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################

  return loss, dW

Back in the notebook:

# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print ('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print ('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# The losses should match but your vectorized implementation should be much faster.
print ('difference: %f' % (loss_naive - loss_vectorized))
Naive loss: 9.023527e+00 computed in 0.106438s
Vectorized loss: 9.023527e+00 computed in 0.003772s
difference: 0.000000
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.

tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print( 'Naive loss and gradient: computed in %fs' % (toc - tic))

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print ('Vectorized loss and gradient: computed in %fs' % (toc - tic))

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print( 'difference: %f' % difference)
Naive loss and gradient: computed in 0.112543s
Vectorized loss and gradient: computed in 0.003465s
difference: 0.000000

Implement the SGD function LinearClassifier.train()

def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
            batch_size=200, verbose=False):
    """
    Train this linear classifier using stochastic gradient descent.

    Inputs:
    - X: A numpy array of shape (N, D) containing training data; there are N
      training samples each of dimension D.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c
      means that X[i] has label 0 <= c < C for C classes.
    - learning_rate: (float) learning rate for optimization.
    - reg: (float) regularization strength.
    - num_iters: (integer) number of steps to take when optimizing
    - batch_size: (integer) number of training examples to use at each step.
    - verbose: (boolean) If true, print progress during optimization.

    Outputs:
    A list containing the value of the loss function at each training iteration.
    """

    """
    使用随机梯度下降来训练这个分类。
    输入:
    -X:numpy array ,形状为(N,D),存储训练的数据,共有N个训练数据,每个维度有D个训练样本
    -Y:numpy array ,形状为(N,),存储训练数据标签。y[i]=c表示x[i]的标签为在C类别中0<=c<C
    -learning rate : float,优化的学习率
    -reg:float ,正则化强度
    -num_iters:inter,优化时训练的步数
    -batch_size : integer,每一步使用的训练样本数
    -verbose:boolean,若为真,优化打印过程

    输出:
    一个存储每次训练的损失函数值的list
    """

    num_train, dim = X.shape
    # assume y takes values 0...K-1 where K is number of classes
    num_classes = np.max(y) + 1 
    if self.W is None:
      # lazily initialize W
      self.W = 0.001 * np.random.randn(dim, num_classes)

    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in range(num_iters):
        X_batch = None
        y_batch = None

      #########################################################################
      # TODO:                                                                 #
      # Sample batch_size elements from the training data and their           #
      # corresponding labels to use in this round of gradient descent.        #
      # Store the data in X_batch and their corresponding labels in           #
      # y_batch; after sampling X_batch should have shape (dim, batch_size)   #
      # and y_batch should have shape (batch_size,)                           #
      #                                                                       #
      # Hint: Use np.random.choice to generate indices. Sampling with         #
      # replacement is faster than sampling without replacement.              #
      #########################################################################
        idxs = np.random.choice(num_train, batch_size)
        X_batch = X[idxs, :]
        y_batch = y[idxs]
      #########################################################################
      #                       END OF YOUR CODE                                #
      #########################################################################

      # evaluate loss and gradient
        loss, grad = self.loss(X_batch, y_batch, reg)
        loss_history.append(loss)

      # perform parameter update
      #########################################################################
      # TODO:                                                                 #
      # Update the weights using the gradient and the learning rate.          #
      #########################################################################
        self.W += - learning_rate * grad
      #########################################################################
      #                       END OF YOUR CODE                                #
      #########################################################################

        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))

    return loss_history

Back in the notebook:

from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,num_iters=1500, verbose=True)
toc = time.time()
print( 'That took %fs' % (toc - tic))
iteration 0 / 1500: loss 1545.066031
iteration 100 / 1500: loss 208.050508
iteration 200 / 1500: loss 32.622470
iteration 300 / 1500: loss 9.515445
iteration 400 / 1500: loss 6.096698
iteration 500 / 1500: loss 5.704047
iteration 600 / 1500: loss 5.456522
iteration 700 / 1500: loss 6.016809
iteration 800 / 1500: loss 5.202152
iteration 900 / 1500: loss 5.296231
iteration 1000 / 1500: loss 5.584359
iteration 1100 / 1500: loss 5.772275
iteration 1200 / 1500: loss 5.622784
iteration 1300 / 1500: loss 5.495696
iteration 1400 / 1500: loss 5.581319
That took 3.586979s
# Plot the loss values
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()

[Plot: training loss as a function of iteration number]
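The notebook next calls LinearSVM.predict, which the assignment asks you to implement in cs231n/classifiers/linear_classifier.py. A minimal sketch, assuming self.W has shape (D, C) and X has shape (N, D):

    def predict(self, X):
        # Scores are X.dot(self.W), shape (N, C); the prediction for each sample
        # is the index of its highest-scoring class.
        y_pred = np.argmax(X.dot(self.W), axis=1)
        return y_pred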

# Write the LinearSVM.predict function and evaluate the performance on the training and validation sets
y_train_pred = svm.predict(X_train)
print ('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print( 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))
training accuracy: 0.357122
validation accuracy: 0.368000

Use the validation set to tune the hyperparameters (regularization strength and learning rate)

# Use the validation set to tune hyperparameters (regularization strength and learning rate).
# Experiment with different learning rates and regularization strengths; if you are careful
# you should get a classification accuracy of about 0.4 on the validation set.
# Set a few sensible values for the learning rate and regularization strength; more may help.
# You can start with a coarse search over large steps and then fine-tune.

learning_rates = [2e-7, 0.75e-7, 1.5e-7]
regularization_strengths = [(1+i*0.1)*1e4 for i in range(-3,3)] + [(2+0.1*i)*1e4 for i in range(-3,3)]
# regularization_strengths = [3e4 , 3.25e4 , 3.5e4 , 3.75e4 , 4e4, 4.25e4 , 4.5e4]

# results is a dictionary mapping tuples of the form (learning_rate, regularization_strength) to tuples of (training_accuracy, validation_accuracy)

results = {}
best_val = -1   # the highest validation accuracy seen so far
best_svm = None # the LinearSVM object that achieved the highest validation accuracy


################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################

for rate in learning_rates:
    for regular in regularization_strengths:
        svm = LinearSVM()
        svm.train(X_train,y_train,learning_rate=rate,reg=regular,num_iters=1000)
        y_train_pred = svm.predict(X_train)
        accuracy_train = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val)
        accuracy_val = np.mean(y_val == y_val_pred)
        results[(rate , regular)] = (accuracy_train , accuracy_val)
        if (best_val<accuracy_val):
            best_val = accuracy_val
            best_svm = svm

########################END##############################

for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print ('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))

print ('best validation accuracy achieved during cross-validation: %f' % best_val)
lr 7.500000e-08 reg 7.000000e+03 train accuracy: 0.335776 val accuracy: 0.334000
lr 7.500000e-08 reg 8.000000e+03 train accuracy: 0.341429 val accuracy: 0.378000
lr 7.500000e-08 reg 9.000000e+03 train accuracy: 0.350204 val accuracy: 0.350000
lr 7.500000e-08 reg 1.000000e+04 train accuracy: 0.357490 val accuracy: 0.357000
lr 7.500000e-08 reg 1.100000e+04 train accuracy: 0.360816 val accuracy: 0.376000
lr 7.500000e-08 reg 1.200000e+04 train accuracy: 0.365592 val accuracy: 0.369000
lr 7.500000e-08 reg 1.700000e+04 train accuracy: 0.369796 val accuracy: 0.393000
lr 7.500000e-08 reg 1.800000e+04 train accuracy: 0.375265 val accuracy: 0.380000
lr 7.500000e-08 reg 1.900000e+04 train accuracy: 0.369245 val accuracy: 0.369000
lr 7.500000e-08 reg 2.000000e+04 train accuracy: 0.372143 val accuracy: 0.376000
lr 7.500000e-08 reg 2.100000e+04 train accuracy: 0.370959 val accuracy: 0.382000
lr 7.500000e-08 reg 2.200000e+04 train accuracy: 0.375367 val accuracy: 0.385000
lr 1.500000e-07 reg 7.000000e+03 train accuracy: 0.379878 val accuracy: 0.380000
lr 1.500000e-07 reg 8.000000e+03 train accuracy: 0.381184 val accuracy: 0.392000
lr 1.500000e-07 reg 9.000000e+03 train accuracy: 0.379367 val accuracy: 0.407000
lr 1.500000e-07 reg 1.000000e+04 train accuracy: 0.381082 val accuracy: 0.368000
lr 1.500000e-07 reg 1.100000e+04 train accuracy: 0.385041 val accuracy: 0.392000
lr 1.500000e-07 reg 1.200000e+04 train accuracy: 0.377245 val accuracy: 0.385000
lr 1.500000e-07 reg 1.700000e+04 train accuracy: 0.366429 val accuracy: 0.365000
lr 1.500000e-07 reg 1.800000e+04 train accuracy: 0.371918 val accuracy: 0.387000
lr 1.500000e-07 reg 1.900000e+04 train accuracy: 0.369551 val accuracy: 0.384000
lr 1.500000e-07 reg 2.000000e+04 train accuracy: 0.371245 val accuracy: 0.382000
lr 1.500000e-07 reg 2.100000e+04 train accuracy: 0.362184 val accuracy: 0.366000
lr 1.500000e-07 reg 2.200000e+04 train accuracy: 0.369612 val accuracy: 0.377000
lr 2.000000e-07 reg 7.000000e+03 train accuracy: 0.366469 val accuracy: 0.380000
lr 2.000000e-07 reg 8.000000e+03 train accuracy: 0.384184 val accuracy: 0.385000
lr 2.000000e-07 reg 9.000000e+03 train accuracy: 0.383878 val accuracy: 0.390000
lr 2.000000e-07 reg 1.000000e+04 train accuracy: 0.372939 val accuracy: 0.376000
lr 2.000000e-07 reg 1.100000e+04 train accuracy: 0.379694 val accuracy: 0.391000
lr 2.000000e-07 reg 1.200000e+04 train accuracy: 0.372776 val accuracy: 0.366000
lr 2.000000e-07 reg 1.700000e+04 train accuracy: 0.364796 val accuracy: 0.390000
lr 2.000000e-07 reg 1.800000e+04 train accuracy: 0.365388 val accuracy: 0.372000
lr 2.000000e-07 reg 1.900000e+04 train accuracy: 0.365082 val accuracy: 0.382000
lr 2.000000e-07 reg 2.000000e+04 train accuracy: 0.355184 val accuracy: 0.376000
lr 2.000000e-07 reg 2.100000e+04 train accuracy: 0.358633 val accuracy: 0.373000
lr 2.000000e-07 reg 2.200000e+04 train accuracy: 0.363469 val accuracy: 0.391000
best validation accuracy achieved during cross-validation: 0.407000

Visualize the results

import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot the training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot the validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.tight_layout() # adjust the spacing between subplots
plt.show()

[Plot: training and validation accuracy as a function of learning rate and regularization strength]

Evaluate the best SVM on the test set

y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print ('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)
linear SVM on raw pixels final test set accuracy: 0.380000

The result is on the low side, but considering this is a 10-way classification problem, where random guessing under a uniform distribution gets 0.1, it is at least doing something.

# For each class, visualize the learned weights
# Depending on your choice of learning rate and regularization strength, these may or may not look clear
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)

    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

[Plot: learned weight templates for each of the 10 CIFAR-10 classes]

Inline Question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way they do.

Answer:

From the images, the learned weights act as a kind of feature extractor over the raw images and are strongly tied to them. The intuition is simple: the sample whose projection onto a classifier's weight vector is largest should get the highest score, so in the best case the weight vector learned from the training samples points in the direction of the features shared by that class's training images; it behaves like a template, or a matched filter, for that class.
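One way to make the "template" intuition concrete: the score of class j is just the inner product of the image with the j-th column of W (plus the bias folded into the last row), so each score measures how well the image matches that class's template. A minimal sketch, reusing best_svm, X_test and classes defined above:

templates = best_svm.W[:-1, :]                  # each column is a 3072-dim class template
img = X_test[0, :-1]                            # one preprocessed image, bias column dropped

# score of class j = template_j . image + bias_j; the best-matching template wins
scores = img.dot(templates) + best_svm.W[-1, :]
print(classes[np.argmax(scores)])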

