Project 07: Recognizing captcha CAPTCHAs with a Multi-Output Neural Network

CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart": a fully automated public program that distinguishes human users from computers. It helps prevent abuse such as brute-force password cracking, ballot stuffing, and forum spam, and in particular can stop an attacker from using a program to make endless login attempts against one specific registered account; CAPTCHAs are now standard practice on many websites, and here we implement the idea in a fairly simple form. The challenge can be generated and graded by a computer, but it is designed so that only a human should be able to answer it; since a computer supposedly cannot solve a CAPTCHA, a user who answers correctly can be presumed human. Computing has advanced rapidly, however, and machines keep getting better at imitating human recognition, so in this project we try to break that old "cannot".

1. Preparation

Create a directory for this project at a location of your choice. On Linux or macOS you can use the mkdir command; on Windows, simply right-click in the file explorer and create a new folder. Here we name the project directory project07:

    (dlwork) jingyudeMacBook-Pro:~ jingyuyan$ mkdir project07

Enter the project07 folder and start Jupyter, then create a notebook to begin the experiment.

    (dlwork) jingyudeMacBook-Pro:~ jingyuyan$ cd project07

    (dlwork) jingyudeMacBook-Pro:project07$ jupyter notebook

2. Dataset Processing

The dataset for this experiment is special compared with the previous ones: the training, validation, and test sets all have to be generated by ourselves. We will use a Python CAPTCHA-generation library to produce the data we need.

2.1 The captcha Library

captcha is a Python CAPTCHA-generation library: given your parameters it produces random image CAPTCHAs, and it supports audio CAPTCHAs as well. We will use its image-CAPTCHA feature to supply training and test data for the task ahead, then build and train a model; the end goal is to recognize captcha-generated CAPTCHAs with the trained model. Let's first see what a captcha-generated CAPTCHA looks like.

# Import the packages we need
import numpy as np
from captcha.image import ImageCaptcha
import matplotlib.pyplot as plt
import random

Let's first try the ImageCaptcha class by passing in a few arbitrary parameters and generating a CAPTCHA. We set the source string to 'HOW ARE YOU' and generate a 400×200 CAPTCHA image.

code = 'HOW ARE YOU'
img = ImageCaptcha(width=400, height=200).generate_image(code)
plt.imshow(img)
plt.title(code)
plt.show()

[figure: CAPTCHA image generated from 'HOW ARE YOU']

Generating CAPTCHA data really is that simple. Next we build a generator and plan how the dataset arrays will be laid out.

2.2 Building the CAPTCHA Generator

We fix the CAPTCHA format in advance: 4 characters per image (num_len), 170 pixels wide (width), and 80 pixels high (height). We also build a character dictionary (characters) containing the 26 uppercase letters and the 10 digits, the familiar uppercase-letters-plus-digits style of everyday CAPTCHAs; the 36 characters in this dictionary are exactly the 36 classes (class_num) our network has to distinguish.

CHARACTERS = 'QWERTYUIOPASDFGHJKLZXCVBNM0123456789'
WIDTH = 170
HEIGHT = 80
NUM_LEN = 4
CLASS_NUM = len(CHARACTERS)
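
As a quick sanity check (an optional cell we add here, not part of the original steps), we can confirm the dictionary size and the character/index round trip:

# Optional sanity check: 26 letters + 10 digits = 36 classes
assert CLASS_NUM == 36
print(CHARACTERS.find('Q'), CHARACTERS[35])  # 0 9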

2.2.1 Building the Random CAPTCHA Function

We build a random_code_generator function that produces one CAPTCHA image together with the list of dictionary indices of its source characters.

def random_code_generator():
    generator = ImageCaptcha(width=WIDTH, height=HEIGHT) 
    char_list = []
    char_index_list = []
    for _ in range(NUM_LEN):
        char = random.choice(CHARACTERS)
        char_list.append(char)
        char_index_list.append(CHARACTERS.find(char))
    random_str = ''.join(char_list)
    img = generator.generate_image(random_str)

    return img, char_index_list

Test the data produced by random_code_generator:

# Generate a random CAPTCHA
img, idx_list = random_code_generator()
# Display the CAPTCHA image
plt.imshow(img)
plt.show()

[figure: randomly generated CAPTCHA image]

# Show the index list
idx_list
[29, 29, 14, 27]

As you can see, idx_list is a list of indices. We convert it back into the corresponding dictionary characters:

# Convert the indices back to characters
[CHARACTERS[idx] for idx in idx_list]
['3', '3', 'G', '1']

2.2.2 Building the Dataset Generator

Next, on top of the random_code_generator function we just defined, we build a dataset generator that can feed data directly into training. Generators are a common Python construct. In earlier experiments the datasets were mostly prepared in full ahead of time, cleaned, and then used directly; here we instead generate data on the fly during training, producing batches as the model consumes them. Compared with the first approach, a generator does not need to materialize a large dataset up front, and data generation and preprocessing can make good use of the CPU while training runs.
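
If generators are unfamiliar, the following minimal sketch (a toy example, unrelated to CAPTCHAs) shows the yield semantics we rely on: each next() call resumes the function and produces one fresh batch.

# Minimal illustration of generator semantics (toy example, not the CAPTCHA generator)
def counting_batches(batch_size=4):
    batch_id = 0
    while True:                        # infinite, like the data generator we build below
        yield [batch_id] * batch_size  # hand one batch to the caller, then pause
        batch_id += 1

g = counting_batches()
print(next(g))  # [0, 0, 0, 0]
print(next(g))  # [1, 1, 1, 1]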

First we define the shapes of the dataset arrays. The dataset contains data (X) and labels (y). X holds three-channel images, with the preset shape (batch_size, height, width, 3).

y, on the other hand, is preset as a list of four arrays in One-Hot format, each of shape (batch_size, class_num); with only one image that means four (1, 36) arrays, and so on for larger batches.

# Define the shapes of y (data_y) and x (data_x) first, using zeros as placeholders
y = [np.zeros((1, CLASS_NUM), dtype=np.uint8) for i in range(NUM_LEN)]
x = np.zeros((1, HEIGHT, WIDTH, 3), dtype=np.uint8)

As you can see, y is a list of four arrays that will hold the One-Hot codes of the four CAPTCHA characters, zero-filled for now as placeholders. Let's print y and x to see what they look like.

plt.imshow(x[0])
plt.show()
x[0].shape

[figure: all-black placeholder image x[0]]

(80, 170, 3)
y, y[0].shape
([array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8),
  array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8),
  array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8),
  array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)],
 (1, 36))

From the output above: x holds a single image of shape (80, 170, 3) whose pixels are all preset to 0, so it displays as an all-black placeholder; y is a list of four (1, 36) arrays that will store the One-Hot codes of the four CAPTCHA characters. Next, let's build one real data pair.

from keras.utils import np_utils
img, index_list = random_code_generator()
for idx, item in enumerate(index_list):
    # One-Hot encode each index with to_categorical from keras.utils.np_utils
    one_hot_code = np_utils.to_categorical([item], CLASS_NUM).astype(np.uint8)
    # Fill the corresponding array in y
    y[idx] = one_hot_code
# Fill the x array
x[0] = img

We store the generated random CAPTCHA and its image in the dataset arrays, then inspect them.

# View y's One-Hot codes and the actual CAPTCHA string
print("".join([CHARACTERS[np.argmax(np.array(item))]for item in y]))
y
Q72T

[array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8),
 array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]], dtype=uint8),
 array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8),
 array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=uint8)]
# View the generated CAPTCHA image
plt.imshow(x[0])
plt.show()
x[0].shape

[figure: the generated CAPTCHA image for 'Q72T']

(80, 170, 3)

With the functions above, we now wrap everything into a generator for convenient data production later on. See a Python primer for background on generator usage.

from keras.utils import np_utils
def gen(batch_size=64):
    data_y = [np.zeros((batch_size, CLASS_NUM), dtype=np.uint8) for i in range(NUM_LEN)]
    data_x = np.zeros((batch_size, HEIGHT, WIDTH, 3), dtype=np.uint8)
    while True:
        for i in range(batch_size):
            x, index_list = random_code_generator()
            for idx, item in enumerate(index_list):
                one_hot_code = np_utils.to_categorical([item], CLASS_NUM).astype(np.uint8)
                data_y[idx][i] = one_hot_code
            data_x[i] = x
        yield data_x, data_y

2.2.3 Building Visualization Functions

With the generator built, let's first produce one small batch of data.

X, y = next(gen())

Define a decode function that converts One-Hot codes back into strings.

def decode(y, idx):
    return "".join([CHARACTERS[np.argmax(np.array(item)[idx])]for item in y])

Define a show_data function that displays one data pair: the image together with its ground truth (and, optionally, a prediction).

from PIL import Image

def show_data(X, y, idx, axis, pred_y=None):
    im = Image.fromarray(X[idx].astype('uint8')).convert('RGB')
    axis.imshow(im)
    real = decode(y, idx)
    res = 'real : ' + real
    color = 'black'
    if pred_y is not None:
        pred = decode(pred_y, idx)
        res = res + ', pred : ' + pred
        if pred != real:
            # When the prediction differs from the truth, append an X and turn the title red
            res = res + ' X'
            color = 'red'
    plt.title(res, color=color, fontsize=15)

Pick one data pair and view it:

idx = 3
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[])
show_data(X, y, idx, ax)

[figure: one CAPTCHA with its real label]

Define show_img_list to visualize a grid of data pairs at once.

def show_img_list(X, y, begin=0, pred_y=None):
    fig = plt.figure(figsize=(15, 15))
    fig.subplots_adjust(left=0, right=1, bottom=0, top=0.6, hspace=0.05, wspace=0.05)
    for i in range(16):
        ax = fig.add_subplot(4, 4, i + 1, xticks=[], yticks=[])
        show_data(X, y, i+begin, ax, pred_y=pred_y)
    plt.show()
show_img_list(X, y)

[figure: 4x4 grid of CAPTCHAs with their real labels]

3. Deep Neural Network Model

3.1 Building the Deep Convolutional Neural Network

Because this task must recognize four characters in one image, each of which is a 36-way classification, the model needs multiple outputs. This differs from the Sequential models used in earlier experiments: Sequential is a simple and convenient way to define a model, but for multiple outputs we need Keras's functional API. The model is defined below.

from keras.models import *
from keras.layers import *
from keras import callbacks

inputs = Input((HEIGHT, WIDTH, 3))
x = inputs
x = Conv2D(32, (3, 3), activation='relu')(x)
x = Conv2D(32, (3, 3), activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)

x = Conv2D(64, (3, 3), activation='relu')(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)

x = Conv2D(128, (3, 3), activation='relu')(x)
x = Conv2D(128, (3, 3), activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)

x = Conv2D(256, (3, 3), activation='relu')(x)
x = Conv2D(256, (3, 3), activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2))(x)

x = Flatten()(x)
x = Dropout(0.25)(x)

# The final outputs are predictions for the four characters, each through a softmax
d1 = Dense(CLASS_NUM, activation='softmax', name='d1')(x)
d2 = Dense(CLASS_NUM, activation='softmax', name='d2')(x)
d3 = Dense(CLASS_NUM, activation='softmax', name='d3')(x)
d4 = Dense(CLASS_NUM, activation='softmax', name='d4')(x)

model = Model(inputs=inputs, outputs=[d1, d2, d3, d4])

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
Model: "model_6"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_8 (InputLayer)            (None, 80, 170, 3)   0                                            
__________________________________________________________________________________________________
conv2d_57 (Conv2D)              (None, 78, 168, 32)  896         input_8[0][0]                    
__________________________________________________________________________________________________
conv2d_58 (Conv2D)              (None, 76, 166, 32)  9248        conv2d_57[0][0]                  
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 76, 166, 32)  128         conv2d_58[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_29 (MaxPooling2D) (None, 38, 83, 32)   0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv2d_59 (Conv2D)              (None, 36, 81, 64)   18496       max_pooling2d_29[0][0]           
__________________________________________________________________________________________________
conv2d_60 (Conv2D)              (None, 34, 79, 64)   36928       conv2d_59[0][0]                  
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 34, 79, 64)   256         conv2d_60[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_30 (MaxPooling2D) (None, 17, 39, 64)   0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
conv2d_61 (Conv2D)              (None, 15, 37, 128)  73856       max_pooling2d_30[0][0]           
__________________________________________________________________________________________________
conv2d_62 (Conv2D)              (None, 13, 35, 128)  147584      conv2d_61[0][0]                  
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 13, 35, 128)  512         conv2d_62[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_31 (MaxPooling2D) (None, 6, 17, 128)   0           batch_normalization_3[0][0]      
__________________________________________________________________________________________________
conv2d_63 (Conv2D)              (None, 4, 15, 256)   295168      max_pooling2d_31[0][0]           
__________________________________________________________________________________________________
conv2d_64 (Conv2D)              (None, 2, 13, 256)   590080      conv2d_63[0][0]                  
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 2, 13, 256)   1024        conv2d_64[0][0]                  
__________________________________________________________________________________________________
max_pooling2d_32 (MaxPooling2D) (None, 1, 6, 256)    0           batch_normalization_4[0][0]      
__________________________________________________________________________________________________
flatten_7 (Flatten)             (None, 1536)         0           max_pooling2d_32[0][0]           
__________________________________________________________________________________________________
dropout_7 (Dropout)             (None, 1536)         0           flatten_7[0][0]                  
__________________________________________________________________________________________________
d1 (Dense)                      (None, 36)           55332       dropout_7[0][0]                  
__________________________________________________________________________________________________
d2 (Dense)                      (None, 36)           55332       dropout_7[0][0]                  
__________________________________________________________________________________________________
d3 (Dense)                      (None, 36)           55332       dropout_7[0][0]                  
__________________________________________________________________________________________________
d4 (Dense)                      (None, 36)           55332       dropout_7[0][0]                  
==================================================================================================
Total params: 1,395,504
Trainable params: 1,394,544
Non-trainable params: 960
__________________________________________________________________________________________________

Each pair of convolution layers in the model is followed by a BatchNormalization layer, which standardizes activations batch-wise and makes the model converge more easily. The model's input layer takes the (batch_size, 80, 170, 3) shape defined in the previous section, and its output layer produces four (batch_size, 36) tensors: the final predictions.
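
As an optional check (our addition), we can confirm the multi-output structure directly from the compiled model:

# Optional check: four output heads, each a 36-way softmax
print(len(model.outputs))  # 4
print(model.output_shape)  # [(None, 36), (None, 36), (None, 36), (None, 36)]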

[figure: model architecture diagram (model.png)]

3.2 Training the Model

We train with the generator defined in the previous section: 8 epochs, a batch size of 64, and 2000 freshly generated batches per epoch. A callback saves the best-performing model seen during training. For convenience, the functions defined earlier that we need are repeated here.

import numpy as np
from captcha.image import ImageCaptcha
import matplotlib.pyplot as plt
import random
from keras.utils import np_utils

CHARACTERS = 'QWERTYUIOPASDFGHJKLZXCVBNM0123456789'
WIDTH = 170
HEIGHT = 80
NUM_LEN = 4
CLASS_NUM = len(CHARACTERS)

def random_code_generator():
    generator = ImageCaptcha(width=WIDTH, height=HEIGHT) 
    char_list = []
    char_index_list = []
    for _ in range(NUM_LEN):
        char = random.choice(CHARACTERS)
        char_list.append(char)
        char_index_list.append(CHARACTERS.find(char))
    random_str = ''.join(char_list)
    img = generator.generate_image(random_str)

    return img, char_index_list

def gen(batch_size=64):
    data_y = [np.zeros((batch_size, CLASS_NUM), dtype=np.uint8) for i in range(NUM_LEN)]
    data_x = np.zeros((batch_size, HEIGHT, WIDTH, 3), dtype=np.uint8)
    while True:
        for i in range(batch_size):
            x, index_list = random_code_generator()
            for idx, item in enumerate(index_list):
                one_hot_code = np_utils.to_categorical([item], CLASS_NUM).astype(np.uint8)
                data_y[idx][i] = one_hot_code
            data_x[i] = x
        yield data_x, data_y
# Set the model and training parameters
# Number of validation batches
VALIDATION_STEPS = 50
# Number of training epochs; 8 is enough here
EPOCHS = 8
# Batch size
BATCH_SIZE = 64
# Number of freshly generated batches per epoch
STEPS_PER_EPOCH = 2000
# Verbosity of the training log
VERBOSE = 2

Start training. It takes quite a while, so a GPU is recommended.

cbks = [callbacks.ModelCheckpoint("best_model.h5", save_best_only=True)]
history = model.fit_generator(gen(batch_size=BATCH_SIZE),     
                    steps_per_epoch=STEPS_PER_EPOCH,   
                    epochs=EPOCHS,               
                    callbacks=cbks,         
                    validation_data=gen(),  
                    validation_steps=VALIDATION_STEPS      
                    )
Epoch 1/10
d2_accuracy: 0.2066 - d3_accuracy: 0.1918 - d4_accuracy: 0.1492 - val_loss: 1.6052 - val_d1_loss: 0.2337 - val_d2_loss: 0.1911 - val_d3_loss: 0.3142 - val_d4_loss: 0.6982 - val_d1_accuracy: 0.9269 - val_d2_accuracy: 0.9256 - val_d3_accuracy: 0.9006 - val_d4_accuracy: 0.8375
Epoch 2/10
d2_accuracy: 0.9683 - d3_accuracy: 0.9521 - d4_accuracy: 0.8981 - val_loss: 0.0334 - val_d1_loss: 0.0214 - val_d2_loss: 0.0129 - val_d3_loss: 0.0202 - val_d4_loss: 0.1488 - val_d1_accuracy: 0.9919 - val_d2_accuracy: 0.9950 - val_d3_accuracy: 0.9950 - val_d4_accuracy: 0.9581
Epoch 3/10
d2_accuracy: 0.9896 - d3_accuracy: 0.9849 - d4_accuracy: 0.9624 - val_loss: 0.2863 - val_d1_loss: 0.0064 - val_d2_loss: 0.0108 - val_d3_loss: 0.0192 - val_d4_loss: 0.0757 - val_d1_accuracy: 0.9994 - val_d2_accuracy: 0.9975 - val_d3_accuracy: 0.9962 - val_d4_accuracy: 0.9781
Epoch 4/10
d2_accuracy: 0.9912 - d3_accuracy: 0.9887 - d4_accuracy: 0.9717 - val_loss: 0.3038 - val_d1_loss: 0.0141 - val_d2_loss: 0.0226 - val_d3_loss: 0.0274 - val_d4_loss: 0.0415 - val_d1_accuracy: 0.9987 - val_d2_accuracy: 0.9975 - val_d3_accuracy: 0.9944 - val_d4_accuracy: 0.9937
Epoch 5/10
d2_accuracy: 0.9919 - d3_accuracy: 0.9901 - d4_accuracy: 0.9776 - val_loss: 0.1098 - val_d1_loss: 0.0139 - val_d2_loss: 0.0146 - val_d3_loss: 0.0197 - val_d4_loss: 0.0902 - val_d1_accuracy: 0.9994 - val_d2_accuracy: 0.9981 - val_d3_accuracy: 0.9975 - val_d4_accuracy: 0.9812
Epoch 6/10
d2_accuracy: 0.9930 - d3_accuracy: 0.9914 - d4_accuracy: 0.9791 - val_loss: 0.2746 - val_d1_loss: 0.0686 - val_d2_loss: 0.0812 - val_d3_loss: 0.0513 - val_d4_loss: 0.1238 - val_d1_accuracy: 0.9900 - val_d2_accuracy: 0.9887 - val_d3_accuracy: 0.9950 - val_d4_accuracy: 0.9731
Epoch 7/10
d2_accuracy: 0.9926 - d3_accuracy: 0.9909 - d4_accuracy: 0.9796 - val_loss: 0.1662 - val_d1_loss: 0.0508 - val_d2_loss: 0.0363 - val_d3_loss: 0.0753 - val_d4_loss: 0.0683 - val_d1_accuracy: 0.9919 - val_d2_accuracy: 0.9969 - val_d3_accuracy: 0.9900 - val_d4_accuracy: 0.9881
Epoch 8/10
d2_accuracy: 0.9912 - d3_accuracy: 0.9897 - d4_accuracy: 0.9789 - val_loss: 0.1168 - val_d1_loss: 0.0055 - val_d2_loss: 0.0137 - val_d3_loss: 0.0048 - val_d4_loss: 0.0214 - val_d1_accuracy: 0.9969 - val_d2_accuracy: 0.9894 - val_d3_accuracy: 0.9975 - val_d4_accuracy: 0.9944
Epoch 9/10
d2_accuracy: 0.9930 - d3_accuracy: 0.9917 - d4_accuracy: 0.9821 - val_loss: 0.0442 - val_d1_loss: 0.0089 - val_d2_loss: 0.0089 - val_d3_loss: 0.0082 - val_d4_loss: 0.0475 - val_d1_accuracy: 0.9950 - val_d2_accuracy: 0.9975 - val_d3_accuracy: 0.9975 - val_d4_accuracy: 0.9844
Epoch 10/10
d2_accuracy: 0.9924 - d3_accuracy: 0.9914 - d4_accuracy: 0.9815 - val_loss: 0.0248 - val_d1_loss: 0.0325 - val_d2_loss: 0.0296 - val_d3_loss: 0.0346 - val_d4_loss: 0.0439 - val_d1_accuracy: 0.9987 - val_d2_accuracy: 0.9981 - val_d3_accuracy: 0.9906 - val_d4_accuracy: 0.9919

Define a function to plot the training history.

def plot_train_history(history, train_metrics, val_metrics):
    plt.plot(history.history.get(train_metrics))
    plt.plot(history.history.get(val_metrics))
    plt.ylabel(train_metrics)
    plt.xlabel('Epochs')
    plt.legend(['train', 'validation'])

Plot the overall training loss:

plot_train_history(history, 'loss', 'val_loss')
plt.show()

[figure: training and validation loss curves]

Plot the accuracy of the four output heads:

plt.figure(figsize=(12,6))
plt.subplot(2,2,1)
plot_train_history(history, 'd1_accuracy','val_d1_accuracy')
plt.subplot(2,2,2)
plot_train_history(history, 'd2_accuracy','val_d2_accuracy')
plt.subplot(2,2,3)
plot_train_history(history, 'd3_accuracy','val_d3_accuracy')
plt.subplot(2,2,4)
plot_train_history(history, 'd4_accuracy','val_d4_accuracy')
plt.show()

[figure: training and validation accuracy curves for d1-d4]

Notice that the model ran into some trouble around epoch 3 but still converged afterwards; readers can adjust the parameters based on their own training results and retrain to track down the cause.
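
As one possible starting point (an untested suggestion on our part, not something done in the original experiment), a learning-rate schedule such as Keras's ReduceLROnPlateau sometimes smooths out mid-training bumps like this:

from keras.callbacks import ReduceLROnPlateau

# Untested tuning sketch: halve the learning rate whenever val_loss plateaus for one epoch
lr_cb = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=1, verbose=1)
# It would be passed alongside the checkpoint callback, e.g.:
# cbks = [callbacks.ModelCheckpoint("best_model.h5", save_best_only=True), lr_cb]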

4. Model Evaluation and Prediction

4.1 Evaluating Model Accuracy

Now let's evaluate the model and run some predictions. First load the model; readers who did not train one themselves can download the model provided in the appendix and use it for prediction.

from keras.models import load_model
# The model from the appendix lives in this experiment's model folder
# If you trained the model in the previous section, the code saved it to the project root by default
model = load_model('./model/best_model.h5')
# model = load_model('best_model.h5')

Define an evaluate function that computes the model's accuracy, counting a CAPTCHA as correct only when all four characters are predicted correctly.

from tqdm import tqdm
def evaluate(model, batch_num=32):
    batch_acc = 0
    batch_count = 0

    generator = gen()
    for i in tqdm(range(batch_num)):
        X, y = next(generator)
        y_pred = model.predict(X)
        # (4, batch, 36) -> argmax over the class axis -> transpose to (batch, 4)
        y_pred = np.argmax(y_pred, axis=2).T
        y_true = np.argmax(y, axis=2).T

        for i in range(len(y_pred)):
            batch_count += 1
            # A CAPTCHA counts as correct only if all four characters match
            if np.array_equal(y_true[i], y_pred[i]):
                batch_acc += 1

    return batch_acc / batch_count

Here we generate 100 batches of data and evaluate the model with the evaluate function:

evaluate(model, 100)
100%|██████████| 100/100 [02:09<00:00,  1.29s/it]
0.9125

The model reaches an accuracy of about 0.91, a fairly satisfying result, though there is still plenty of room for improvement; readers can experiment with the network architecture, number of training batches, and so on.

4.2 Prediction on Generated Data

With roughly 0.91 accuracy on our generated test data, let's look at where the model tends to go wrong. Generate one batch of 64 samples and visualize them.

import matplotlib.pyplot as plt
from PIL import Image

def decode(y, idx):
    return "".join([CHARACTERS[np.argmax(np.array(item)[idx])]for item in y])

def show_data(X, y, idx, axis, pred_y=None):
    im = Image.fromarray(X[idx].astype('uint8')).convert('RGB')
    axis.imshow(im)
    real = decode(y, idx)
    res = 'real : ' + real
    color = 'black'
    if pred_y is not None:
        pred = decode(pred_y, idx)
        res = res + ', pred : ' + pred
        if pred != real:
            # When the prediction differs from the truth, append an X and turn the title red
            res = res + ' X'
            color = 'red'
    plt.title(res, color=color, fontsize=15)

def show_img_list(X, y, begin=0, pred_y=None):
    fig = plt.figure(figsize=(15, 15))
    fig.subplots_adjust(left=0, right=1, bottom=0, top=0.6, hspace=0.05, wspace=0.05)
    for i in range(16):
        ax = fig.add_subplot(4, 4, i + 1, xticks=[], yticks=[])
        show_data(X, y, i+begin, ax, pred_y=pred_y)
    plt.show()
X, y = next(gen(64))
pred_y = model.predict(X)
show_img_list(X, y, begin=0, pred_y=pred_y)

[figure: 4x4 grid of CAPTCHAs with real and predicted labels]

As you can see, two predictions in this group are wrong, and the errors involve easily confused characters such as '0' vs 'O' and 'O' vs 'D'. People occasionally mistype CAPTCHAs in daily life for exactly the same reason.
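
To put numbers on this, the following sketch (our own addition, reusing the gen, model, and CHARACTERS already defined above) tallies the most frequent true-to-predicted character confusions over a few batches:

from collections import Counter

# Tally (true char, predicted char) pairs over 10 freshly generated batches
confusions = Counter()
generator = gen(64)
for _ in range(10):
    X_batch, y_batch = next(generator)
    pred = model.predict(X_batch)
    y_true = np.argmax(np.array(y_batch), axis=2).T  # shape (batch, 4)
    y_hat = np.argmax(np.array(pred), axis=2).T
    for true_row, pred_row in zip(y_true, y_hat):
        for t, p in zip(true_row, pred_row):
            if t != p:
                confusions[(CHARACTERS[t], CHARACTERS[p])] += 1

print(confusions.most_common(5))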

Conclusion

The model trained in this chapter recognizes 4-character CAPTCHAs. Readers may wonder: what about 6- or 8-character ones? Variable-length ones? How could a recurrent neural network (RNN) or an LSTM handle variable length? How about mixing uppercase and lowercase letters with digits, or recognizing Chinese-character CAPTCHAs? These are left as directions for further study.
