
GoAI
2023/04/08
Paddle Beginner Hands-On Series (5): Handwritten English Word Recognition with CRNN

❝👨💻 「About the author:」 A master's student in big data, CSDN blog expert in artificial intelligence, and Alibaba Cloud expert blogger, focused on sharing big data and AI knowledge. 「WeChat official account: GoAI的学习小屋」
❞
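❝「OCR column:」 深入浅出OCR (OCR Made Simple)
🍀 This column series covers optical character recognition (OCR) in computer vision. Its chapters look at the history of OCR technology, research directions, concepts, algorithms, papers, datasets, existing platforms, and future directions, combining fundamentals with hands-on practice. The series is organized into preliminary, basic, and advanced parts; 「the advanced part builds on the basic part with a comprehensive summary and walks through the most classic papers and the latest algorithms」. Topics currently include, but are not limited to, text detection, text recognition, and table analysis. NLP material may be added later. The column is aimed mainly at readers studying deep learning and CV. Discussion is welcome, and if you spot a mistake, please point it out in the comments.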
❞
OCR hands-on project roadmap:

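I. Project Overview
This project uses the open-source PaddleOCR framework to recognize handwritten English words. The workflow covers dataset construction, data processing, model building, training, and prediction/inference. The dataset is an open-source one provided by TAL Education (好未来). Each image contains several words, which makes the task harder than traditional recognition of single handwritten digits. The project uses the CRNN+CTC approach, with the relevant parameters set in a config file, to recognize variable-length handwritten English text.
「✨ Results preview:」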

「🌐 Project link:」 To be published next week; subscribe for updates!
1️⃣ Sample data:

2️⃣ Environment setup
!git clone https://gitee.com/paddlepaddle/PaddleOCR
%cd PaddleOCR
!git checkout -b release/2.4 remotes/origin/release/2.4
!pip install -r requirements.txt
II. Getting the Pretrained Model
This project uses a pretrained model provided by PaddleOCR (model link).
# Download the pretrained model
!wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/en_number_mobile_v2.0_rec_slim_train.tar
!tar -xf /home/aistudio/PaddleOCR/pretrain_models/en_number_mobile_v2.0_rec_slim_train.tar -C /home/aistudio/PaddleOCR/pretrain_models
III. Data Processing
- Unzip the dataset
!unzip /home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集.zip -d /home/aistudio/data/data128403/
- Split the dataset
# Split the dataset
# Label format example: 1016_752_1.jpg I'm Li Hua,chairman of the Student Union from
data_root = '/home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集'
with open(f'{data_root}/label.txt', encoding='utf-8') as f:
    lines = f.readlines()
# The first 9000 lines are used for training, the remaining 1000 for testing
with open(f'{data_root}/train.txt', 'w', encoding='utf-8') as f1, \
        open(f'{data_root}/test.txt', 'w', encoding='utf-8') as f2:
    for index, line in enumerate(lines):
        # Replace only the first space with a tab, giving "<image name>\t<label text>"
        firstSpaceIndex = line.find(' ')
        line2 = line[0:firstSpaceIndex] + '\t' + line[firstSpaceIndex+1:]
        if index < 9000:
            f1.write(line2)
        else:
            f2.write(line2)
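The conversion step above can also be written as a small, testable helper. This is a minimal sketch, not part of the original notebook: the names `to_paddle_label` and `split_lines` are hypothetical. Unlike `line.find(' ')`, it raises an error for a line that contains no space instead of silently producing a malformed label.

```python
def to_paddle_label(line: str) -> str:
    """Turn '<image> <text>' into '<image>\t<text>' by replacing only the first space."""
    name, sep, text = line.partition(' ')
    if not sep:
        # No space at all: the line has no label text, so fail loudly.
        raise ValueError(f"no space-separated label in line: {line!r}")
    return name + '\t' + text


def split_lines(lines, n_train=9000):
    """Convert every line, then cut the list into a training part and a test part."""
    converted = [to_paddle_label(line) for line in lines]
    return converted[:n_train], converted[n_train:]


sample = "1016_752_1.jpg I'm Li Hua,chairman of the Student Union from\n"
print(repr(to_paddle_label(sample)))
```

Only the first space is replaced, so spaces inside the label text itself are preserved.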
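IV. Model Training
Model overview
❝「Model overview and parameter configuration:」 With PaddleOCR as the framework, CRNN+CTC is used to recognize handwritten English words, covering the full cycle of building, training, evaluating, and predicting with the recognition model. Before training you can edit the config file by hand (data loading, training, evaluation, and validation parameters). The backbone is MobileNetV3 and the loss is the CTC loss. The optimizer is Adam with a cosine learning rate schedule, training runs for 200 epochs, and the config also sets the dictionary path, the training and test sets, and the output path.
❞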
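To make the cosine learning rate schedule concrete, here is a minimal sketch of the standard cosine annealing formula in plain Python. It assumes decay from the base rate of 0.005 down to 0 with no warmup; PaddleOCR's own scheduler may differ in such details. The function name `cosine_lr` is hypothetical, and the total step count assumes 200 epochs of 35 iterations each (9000 training samples at batch size 256, dropping the last partial batch).

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 0.005, min_lr: float = 0.0) -> float:
    """Cosine annealing: start at base_lr and decay smoothly to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


total = 200 * 35  # assumed: epochs * iterations per epoch
for s in (0, total // 4, total // 2, total):
    print(s, cosine_lr(s, total))
```

The rate changes slowly at the start and at the end of training and fastest in the middle, which is the usual reason for choosing this schedule over a step decay.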
CRNN architecture:

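The CRNN network has three parts, from bottom to top:
(1) Convolutional layers. They extract a feature sequence from the input image.
(2) Recurrent layers. They predict a label distribution for each frame of the feature sequence produced by the convolutional layers.
(3) Transcription layer. It converts the per-frame label distributions from the recurrent layers into the final recognition result by merging repeated labels and similar operations.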
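The transcription step can be illustrated with greedy CTC decoding: take the most likely class in every frame, merge consecutive repeats, then drop the blank symbol. The sketch below is a plain-Python illustration, not PaddleOCR's `CTCLabelDecode`; the function name, the toy character set, and the probabilities are made up, and index 0 is assumed to be the blank.

```python
def ctc_greedy_decode(frame_probs, charset, blank=0):
    """Greedy CTC decoding: per-frame argmax, merge repeats, remove blanks."""
    # Index of the most likely class in each frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best:
        # Emit a character only when the class changes and is not the blank.
        if idx != prev and idx != blank:
            chars.append(charset[idx - 1])  # class 1 maps to charset[0]
        prev = idx
    return "".join(chars)


# Toy example: classes are [blank, 't', 'o'], seven frames.
frames = [
    [0.1, 0.8, 0.1],    # t
    [0.2, 0.7, 0.1],    # t (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # o
    [0.1, 0.2, 0.7],    # o (repeat, merged)
    [0.8, 0.1, 0.1],    # blank separates the two o's
    [0.1, 0.1, 0.8],    # o
]
print(ctc_greedy_decode(frames, "to"))
```

The blank between the last two `o` frames is what allows a genuine double letter to survive the merging of repeats.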

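CRNN model training:
❝During training, a standard CNN first extracts features from the text image. A BLSTM then fuses the feature vectors to capture the context of the character sequence and produces a probability distribution for each feature column. Finally, the transcription layer (CTC) turns these distributions into the predicted text sequence.
❞
The training pipeline in detail:
1. Resize every input image to 32 × W × 3 (height × width × channels).
2. Extract convolutional features with the CNN; the resulting feature map has size 1 × W/4 × 512.
3. Feed these features to the LSTM to extract sequence features, producing a W/4 × n posterior probability matrix, where n is the number of character classes.
4. Train with the CTC loss, which aligns the labels with the per-frame outputs.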
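The tensor sizes in the steps above can be traced with a few lines of arithmetic. This sketch follows the sizes stated in the text (height 32, width reduced by a factor of 4, 512 feature channels); the function name is hypothetical, and the MobileNetV3 backbone configured later may produce a different channel count.

```python
def crnn_shapes(width: int, num_classes: int, height: int = 32):
    """Trace the shapes described in the text for an input of the given width."""
    if width % 4 != 0:
        raise ValueError("width is assumed to be a multiple of 4")
    steps = width // 4  # number of feature columns, i.e. time steps for the LSTM
    return {
        "input": (height, width, 3),
        "cnn_features": (1, steps, 512),
        "sequence_probs": (steps, num_classes),
    }


# Width 320 as in the config's image_shape; 97 classes is a made-up number.
for name, shape in crnn_shapes(320, 97).items():
    print(name, shape)
```

With a width of 320 the sequence has 80 time steps, so each step covers a 4-pixel-wide column of the resized image.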
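About the CTC loss:
CTC (Connectionist Temporal Classification) is a way of computing the loss. Using CTC in place of a per-frame softmax loss removes the need to align training samples with their labels. It introduces a blank symbol to handle positions that contain no character, and it computes the loss and its gradient efficiently through a recursive (dynamic programming) procedure.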
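The recursion mentioned above can be sketched as the CTC forward algorithm: it sums the probability of every frame-level path that collapses to the target label, without enumerating the paths. The code below is a small pure-Python illustration (no log-space arithmetic, so it suits short sequences only), checked against brute-force enumeration. All names and the toy probabilities are made up, and class 0 is assumed to be the blank.

```python
import math
from itertools import product


def ctc_label_prob(probs, label, blank=0):
    """Probability that frame-wise outputs `probs` collapse to `label` (forward algorithm)."""
    ext = [blank]
    for c in label:  # interleave blanks: [b, c1, b, c2, b, ...]
        ext += [c, blank]
    size = len(ext)
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            # Skip over a blank only between two different characters.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def brute_force(probs, label, blank=0):
    """Reference value: enumerate every path and keep those that collapse to `label`."""
    total = 0.0
    for path in product(range(len(probs[0])), repeat=len(probs)):
        collapsed, prev = [], None
        for k in path:
            if k != prev and k != blank:
                collapsed.append(k)
            prev = k
        if collapsed == list(label):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4], [0.5, 0.2, 0.3]]
p = ctc_label_prob(probs, [1, 2])
print(p, brute_force(probs, [1, 2]), -math.log(p))  # the last value is the CTC loss
```

The forward pass costs time proportional to the number of frames times the label length, whereas the brute-force check grows exponentially with the number of frames.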
「Training config file:」
Global:
  use_gpu: True
  epoch_num: 200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_en_number_lite
  save_epoch_step: 3
  # evaluation is run every 100 iterations after the 0th iteration
  eval_batch_step: [0, 100]
  # if pretrained_model is saved in static mode, load_static_weights must set to True
  cal_metric_during_train: True
  pretrained_model: ./pretrain_models/en_number_mobile_v2.0_rec_slim_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: True
  infer_img:
  # for data or label process
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: 250
  infer_mode: False
  use_space_char: True
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.005
  regularizer:
    name: 'L2'
    factor: 0.00001
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride: [1, 2, 2, 2]
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.00001
Loss:
  name: CTCLoss
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition
    label_file_list: ["/home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - RecAug:
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    # batch_size_per_card: 1024
    batch_size_per_card: 256
    drop_last: True
    num_workers: 4
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition
    label_file_list: ["/home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/test.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 8
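A few numbers implied by this config can be worked out by hand. The sketch below computes the iterations per epoch from the 9000/1000 split and the batch size of 256, and estimates the width an image ends up with under `image_shape: [3, 32, 320]`. The resize rule is an assumption on my part (scale to height 32 keeping the aspect ratio, cap the width at 320), not something confirmed by the article, and both function names are hypothetical.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Number of batches per epoch, with or without the final partial batch."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def resized_width(h: int, w: int, target_h: int = 32, max_w: int = 320) -> int:
    """Assumed rule: keep the aspect ratio at height target_h, cap the width at max_w."""
    return min(max_w, math.ceil(target_h * w / h))


train_steps = steps_per_epoch(9000, 256, drop_last=True)
eval_steps = steps_per_epoch(1000, 256, drop_last=False)
print("train iterations per epoch:", train_steps)
print("eval batches:", eval_steps)
print("total iterations over 200 epochs:", 200 * train_steps)
print("width of a 64x400 image:", resized_width(64, 400))
print("width of a 64x1000 image:", resized_width(64, 1000))
```

Under these assumptions, an evaluation every 100 iterations happens roughly once every three epochs, and any line whose resized width would exceed 320 pixels is squeezed horizontally to fit.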
# If training complains about missing packages, install them with the commands below
!pip install imgaug
!pip install Levenshtein
!pip install pyclipper
!pip install lmdb
# Start training
%cd /home/aistudio/PaddleOCR
!python tools/train.py -c /home/aistudio/work/rec_en_number_lite_train.yml
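During training the log reports the metric named by `Metric.main_indicator` (`acc` in this config). The sketch below shows one plausible way such numbers can be computed: exact-match accuracy plus a score based on normalized edit distance. It is a pure-Python illustration under my own assumptions, not PaddleOCR's `RecMetric` code; the function names and the sample pairs are made up.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def rec_metrics(pairs):
    """pairs: list of (prediction, ground truth) strings."""
    correct = 0
    norm_dist = 0.0
    for pred, target in pairs:
        norm_dist += edit_distance(pred, target) / max(len(pred), len(target), 1)
        correct += pred == target
    n = max(len(pairs), 1)
    return {"acc": correct / n, "norm_edit_dis": 1 - norm_dist / n}


# Made-up predictions and labels, for illustration only.
pairs = [("hello world", "hello world"), ("helo", "hello")]
print(rec_metrics(pairs))
```

Exact-match accuracy is strict for long handwritten lines, since a single wrong character makes the whole line count as an error; the edit-distance score gives partial credit and is often the more informative of the two while tuning.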
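Resuming training
Training is often interrupted by one problem or another; when that happens, you can resume from a saved checkpoint.
# Run this cell to resume the previous training run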
!python tools/train.py -c /home/aistudio/work/rec_en_number_lite_train.yml -o Global.checkpoints=/home/aistudio/PaddleOCR/output/rec_en_number_lite/best_accuracy
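# Display an image
import matplotlib.pyplot as plt
import cv2
def imshow(img_path):
    im = cv2.imread(img_path)
    # OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
    plt.imshow(cv2.cvtColor(im, cv2.COLOR_BGR2RGB))
path = '/home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition/1016_1396_3.jpg'
imshow(path)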
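V. Model Prediction
# The text lines in this dataset are long, so training accuracy still has room to improve; try tuning the hyperparameters or using a different network structure.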
!python tools/infer_rec.py -c /home/aistudio/work/rec_en_number_lite_train.yml \
-o Global.infer_img="/home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition/1016_1396_3.jpg" \
Global.pretrained_model="/home/aistudio/PaddleOCR/output/rec_en_number_lite/best_accuracy"
# Display the image
path = '/home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition/1016_1396_3.jpg'
imshow(path)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
def convert_to_list(value, n, name, dtype=np.int):
[2023/03/26 15:36:27] root INFO: Architecture :
[2023/03/26 15:36:27] root INFO: Backbone :
[2023/03/26 15:36:27] root INFO: model_name : small
[2023/03/26 15:36:27] root INFO: name : MobileNetV3
[2023/03/26 15:36:27] root INFO: scale : 0.5
[2023/03/26 15:36:27] root INFO: small_stride : [1, 2, 2, 2]
[2023/03/26 15:36:27] root INFO: Head :
[2023/03/26 15:36:27] root INFO: fc_decay : 1e-05
[2023/03/26 15:36:27] root INFO: name : CTCHead
[2023/03/26 15:36:27] root INFO: Neck :
[2023/03/26 15:36:27] root INFO: encoder_type : rnn
[2023/03/26 15:36:27] root INFO: hidden_size : 48
[2023/03/26 15:36:27] root INFO: name : SequenceEncoder
[2023/03/26 15:36:27] root INFO: Transform : None
[2023/03/26 15:36:27] root INFO: algorithm : CRNN
[2023/03/26 15:36:27] root INFO: model_type : rec
[2023/03/26 15:36:27] root INFO: Eval :
[2023/03/26 15:36:27] root INFO: dataset :
[2023/03/26 15:36:27] root INFO: data_dir : /home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition
[2023/03/26 15:36:27] root INFO: label_file_list : ['/home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/test.txt']
[2023/03/26 15:36:27] root INFO: name : SimpleDataSet
[2023/03/26 15:36:27] root INFO: transforms :
[2023/03/26 15:36:27] root INFO: DecodeImage :
[2023/03/26 15:36:27] root INFO: channel_first : False
[2023/03/26 15:36:27] root INFO: img_mode : BGR
[2023/03/26 15:36:27] root INFO: CTCLabelEncode : None
[2023/03/26 15:36:27] root INFO: RecResizeImg :
[2023/03/26 15:36:27] root INFO: image_shape : [3, 32, 320]
[2023/03/26 15:36:27] root INFO: KeepKeys :
[2023/03/26 15:36:27] root INFO: keep_keys : ['image', 'label', 'length']
[2023/03/26 15:36:27] root INFO: loader :
[2023/03/26 15:36:27] root INFO: batch_size_per_card : 256
[2023/03/26 15:36:27] root INFO: drop_last : False
[2023/03/26 15:36:27] root INFO: num_workers : 8
[2023/03/26 15:36:27] root INFO: shuffle : False
[2023/03/26 15:36:27] root INFO: Global :
[2023/03/26 15:36:27] root INFO: cal_metric_during_train : True
[2023/03/26 15:36:27] root INFO: character_dict_path : ppocr/utils/en_dict.txt
[2023/03/26 15:36:27] root INFO: checkpoints : None
[2023/03/26 15:36:27] root INFO: debug : False
[2023/03/26 15:36:27] root INFO: distributed : False
[2023/03/26 15:36:27] root INFO: epoch_num : 200
[2023/03/26 15:36:27] root INFO: eval_batch_step : [0, 100]
[2023/03/26 15:36:27] root INFO: infer_img : /home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition/1016_1396_3.jpg
[2023/03/26 15:36:27] root INFO: infer_mode : False
[2023/03/26 15:36:27] root INFO: log_smooth_window : 20
[2023/03/26 15:36:27] root INFO: max_text_length : 250
[2023/03/26 15:36:27] root INFO: pretrained_model : /home/aistudio/PaddleOCR/output/rec_en_number_lite/best_accuracy
[2023/03/26 15:36:27] root INFO: print_batch_step : 10
[2023/03/26 15:36:27] root INFO: save_epoch_step : 3
[2023/03/26 15:36:27] root INFO: save_inference_dir : None
[2023/03/26 15:36:27] root INFO: save_model_dir : ./output/rec_en_number_lite
[2023/03/26 15:36:27] root INFO: use_gpu : True
[2023/03/26 15:36:27] root INFO: use_space_char : True
[2023/03/26 15:36:27] root INFO: use_visualdl : True
[2023/03/26 15:36:27] root INFO: Loss :
[2023/03/26 15:36:27] root INFO: name : CTCLoss
[2023/03/26 15:36:27] root INFO: Metric :
[2023/03/26 15:36:27] root INFO: main_indicator : acc
[2023/03/26 15:36:27] root INFO: name : RecMetric
[2023/03/26 15:36:27] root INFO: Optimizer :
[2023/03/26 15:36:27] root INFO: beta1 : 0.9
[2023/03/26 15:36:27] root INFO: beta2 : 0.999
[2023/03/26 15:36:27] root INFO: lr :
[2023/03/26 15:36:27] root INFO: learning_rate : 0.005
[2023/03/26 15:36:27] root INFO: name : Cosine
[2023/03/26 15:36:27] root INFO: name : Adam
[2023/03/26 15:36:27] root INFO: regularizer :
[2023/03/26 15:36:27] root INFO: factor : 1e-05
[2023/03/26 15:36:27] root INFO: name : L2
[2023/03/26 15:36:27] root INFO: PostProcess :
[2023/03/26 15:36:27] root INFO: name : CTCLabelDecode
[2023/03/26 15:36:27] root INFO: Train :
[2023/03/26 15:36:27] root INFO: dataset :
[2023/03/26 15:36:27] root INFO: data_dir : /home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition
[2023/03/26 15:36:27] root INFO: label_file_list : ['/home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/train.txt']
[2023/03/26 15:36:27] root INFO: name : SimpleDataSet
[2023/03/26 15:36:27] root INFO: transforms :
[2023/03/26 15:36:27] root INFO: DecodeImage :
[2023/03/26 15:36:27] root INFO: channel_first : False
[2023/03/26 15:36:27] root INFO: img_mode : BGR
[2023/03/26 15:36:27] root INFO: RecAug : None
[2023/03/26 15:36:27] root INFO: CTCLabelEncode : None
[2023/03/26 15:36:27] root INFO: RecResizeImg :
[2023/03/26 15:36:27] root INFO: image_shape : [3, 32, 320]
[2023/03/26 15:36:27] root INFO: KeepKeys :
[2023/03/26 15:36:27] root INFO: keep_keys : ['image', 'label', 'length']
[2023/03/26 15:36:27] root INFO: loader :
[2023/03/26 15:36:27] root INFO: batch_size_per_card : 256
[2023/03/26 15:36:27] root INFO: drop_last : True
[2023/03/26 15:36:27] root INFO: num_workers : 4
[2023/03/26 15:36:27] root INFO: shuffle : True
[2023/03/26 15:36:27] root INFO: profiler_options : None
[2023/03/26 15:36:27] root INFO: train with paddle 2.0.2 and device CUDAPlace(0)
W0326 15:36:27.990514 24951 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 10.1
W0326 15:36:27.996276 24951 device_context.cc:372] device: 0, cuDNN Version: 7.6.
[2023/03/26 15:36:36] root INFO: load pretrain successful from /home/aistudio/PaddleOCR/output/rec_en_number_lite/best_accuracy
[2023/03/26 15:36:36] root INFO: infer_img: /home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition/1016_1396_3.jpg
[2023/03/26 15:36:36] root INFO: result: for you a wonderful trip. 0.9326488
[2023/03/26 15:36:36] root INFO: success!

<Figure size 640x480 with 1 Axes>
# Training takes a long time, so a previously trained model is used here to show the results (it performs better than the run above).
!python tools/infer_rec.py -c /home/aistudio/work/rec_en_number_lite_train.yml \
-o Global.infer_img="/home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition/1016_1396_3.jpg" \
Global.pretrained_model="/home/aistudio/work/en_number_model/best_accuary"
[2023/03/26 15:37:24] root INFO: load pretrain successful from /home/aistudio/work/en_number_model/best_accuary
[2023/03/26 15:37:24] root INFO: infer_img: /home/aistudio/data/data128403/TAL_OCR_ENG手写英文数据集/data_composition/1016_1396_3.jpg
[2023/03/26 15:37:24] root INFO: result: for you a wonderful trip. 0.97006863
[2023/03/26 15:37:24] root INFO: success!
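Each inference run above ends with a `result:` line that carries the recognized text followed by a confidence score. To compare runs programmatically, such lines can be parsed as sketched below. This is a hypothetical helper that assumes the score is the last whitespace-separated token on the line; the two sample lines are copied from the logs above.

```python
def parse_result(line: str):
    """Return (text, score) from an infer_rec 'result:' log line, or None if absent."""
    marker = "result:"
    pos = line.find(marker)
    if pos < 0:
        return None
    payload = line[pos + len(marker):].strip()
    parts = payload.rsplit(None, 1)  # split off the last whitespace-separated token
    if len(parts) != 2:
        return None
    text, score = parts
    try:
        return text, float(score)
    except ValueError:
        return None


logs = [
    "[2023/03/26 15:36:36] root INFO: result: for you a wonderful trip. 0.9326488",
    "[2023/03/26 15:37:24] root INFO: result: for you a wonderful trip. 0.97006863",
]
for line in logs:
    print(parse_result(line))
```

Both runs recognize the same text; only the confidence differs, with the longer-trained model scoring higher.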
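VI. Conclusion
This project recognizes handwritten English words and serves as an introduction to how PaddleOCR is structured. In practice you only need to edit the config file, which is very convenient. The overall workflow covers environment setup, data processing and format conversion, model building, model training, and model prediction and inference. The core is the CRNN+CTC algorithm from the PaddleOCR framework. After many epochs of training and tuning, the model recognizes handwritten English text reasonably well; a natural next step is to experiment with other network models.
Note: this project is part of the third PaddlePaddle (飞桨) training camp, and I developed it as a mentor for the event.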
