
FightingCV
2022/10/07
What tricks are there for chasing SOTA in deep learning?
【Foreword】
In deep learning, both academic research and real-world deployment call for squeezing out as much model performance as possible, and that usually relies on tricks. Some tricks are broadly applicable (cyclical learning rates, BN, and so on), while others are task-specific (data augmentation in CV, masking in NLP, down-sampling in recommendation).
Some of these tricks improve accuracy, some speed up convergence, and some yield gains even larger than switching models.
In your own field, which tricks are common, easy to put into practice, and easy to generalize?
Source: https://www.zhihu.com/question/540433389/answer/2549775065[1]
Author: Gordon Lee
- R-Drop: two forward passes on the same batch plus a KL-divergence consistency loss (see the sketch after this list).
- MLM: further masked-language-model pre-training on in-domain corpora (post-training).
- EFL: in few-shot settings, recast classification as a matching problem by building the input in NSP-task form.
- Mixed-precision fp16: speeds up training, usually with little or no loss in accuracy.
- In multi-GPU DDP training with gradient accumulation, use no_sync to skip unnecessary gradient synchronization and save time (sketch after this list).
- When the validation or test set is very large, try multi-GPU inference with dist.all_gather; for non-tensor[2] objects there is also all_gather_object (sketch after this list).
- PET: in few-shot settings, turn classification into mask-position prediction and build a verbalizer; see the EACL 2021 PET paper.
- ArcFaceLoss: for two-tower sentence matching, change the NT-Xent loss into the arccos[3] form; see ArcCSE, ACL 2022.
- Data augmentation for zero-shot cross-lingual transfer: code switching, machine translation, etc.; remember to add a consistency loss at the end. See "Consistency Regularization for Cross-Lingual Fine-Tuning".
- SimCSE: continue simcse[4]-style pre-training on in-domain corpora.
- Focal loss: for handling class imbalance (sketch after this list).
- Two-tower late interaction[5] with the MaxSim operator: score every query token representation against every doc token representation, take the max per query token, then sum. A good balance of speed and accuracy; see ColBERT.
- Continual learning with less forgetting: EWC plus a strong pre-trained model works well. It simply adds a regularizer that keeps important parameters from drifting too far, with importance measured by Fisher information.
- Adversarial training: FGM, PGD. They do lift scores, but training gets slower.
- Memory bank to enlarge the effective batch size[6], though I sometimes find it underwhelming.
- PolyLoss: -log(p_t) + eps * (1 - p_t). The effect is debatable; it did nothing for me, though others report gains (noted in the focal-loss sketch below).
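To make the R-Drop item concrete, here is a minimal PyTorch sketch, assuming a classifier with dropout active during training; the function name and the alpha weight are illustrative choices, not taken from the original answer.

```python
import torch.nn.functional as F

def r_drop_loss(logits1, logits2, labels, alpha=4.0):
    """R-Drop: cross-entropy on two dropout-perturbed forward passes
    plus a symmetric KL term that penalizes their disagreement."""
    ce = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))
    p, q = F.log_softmax(logits1, dim=-1), F.log_softmax(logits2, dim=-1)
    kl = 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean")
                + F.kl_div(q, p, log_target=True, reduction="batchmean"))
    return ce + alpha * kl

# Usage: run the same batch through the model twice; dropout makes the
# two passes differ, and the KL term constrains that gap.
# loss = r_drop_loss(model(x), model(x), labels)
```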
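For the DDP no_sync item, a sketch of how gradient accumulation can skip the all-reduce on non-update steps; model (already wrapped in DistributedDataParallel), loader, criterion, and optimizer are assumed to be set up elsewhere.

```python
# Assumes: model = torch.nn.parallel.DistributedDataParallel(...), plus a
# loader, a criterion, and an optimizer created in the usual DDP setup.
accum_steps = 4
for step, (x, y) in enumerate(loader):
    if (step + 1) % accum_steps != 0:
        # Accumulation step: no_sync() suppresses the gradient all-reduce.
        with model.no_sync():
            (criterion(model(x), y) / accum_steps).backward()
    else:
        # Update step: gradients are synchronized once, then applied.
        (criterion(model(x), y) / accum_steps).backward()
        optimizer.step()
        optimizer.zero_grad()
```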
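For the multi-GPU inference item, a sketch around dist.all_gather; distributed_predict is a made-up helper name, and it assumes each rank's shard yields a prediction tensor of the same size (otherwise pad, or fall back to all_gather_object).

```python
import torch
import torch.distributed as dist

@torch.no_grad()
def distributed_predict(model, loader, world_size):
    # Each rank runs inference on its own shard (e.g. via DistributedSampler).
    local_preds = torch.cat([model(x).argmax(dim=-1) for x, _ in loader])
    # Gather the per-rank tensors; all_gather requires equal sizes across ranks.
    gathered = [torch.zeros_like(local_preds) for _ in range(world_size)]
    dist.all_gather(gathered, local_preds)
    return torch.cat(gathered)

# For non-tensor results (e.g. lists of decoded strings):
# objs = [None] * world_size
# dist.all_gather_object(objs, local_list)
```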
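For the focal-loss item, a common multi-class sketch; gamma=2 and the optional per-class alpha weights are conventional defaults rather than anything the answer prescribes, and the PolyLoss bullet is covered by a trailing comment.

```python
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=2.0, alpha=None):
    """Down-weights easy examples by (1 - p_t)^gamma to focus on hard ones.
    `alpha` is an optional per-class weight tensor."""
    log_pt = F.log_softmax(logits, dim=-1).gather(1, labels.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    loss = -((1 - pt) ** gamma) * log_pt
    if alpha is not None:
        loss = loss * alpha[labels]
    return loss.mean()

# The PolyLoss (Poly-1) variant from the last bullet is one extra term
# on plain cross-entropy:  ce + eps * (1 - pt).mean()
```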
Author: 昆特Alex
One-sentence principle: AI performance = data (70%) + model (CNN, RNN, Transformer, BERT, GPT; 20%) + tricks (loss, warmup[7], optimizer, adversarial training, etc.; 10%). Remember: the data sets the ceiling of AI performance; the model and the tricks only get you closer to that ceiling. As the old saying goes: garbage in, garbage out. Below are some concrete tricks for NLP.
I. Data Augmentation
1. Removing noisy data (maximum-entropy filtering[8], cleanlab, etc.).
2. Fixing mislabeled data: train several models with cross-validation[9] and trust the samples on which the models agree with predicted probability above a threshold (or the top N). The models can differ in random seed[10], in train/test split, or in architecture (BERT vs. TextCNN, etc.); samples where most model predictions disagree with the given labels[11] are likely annotation errors to be corrected.
3. Data augmentation (a sketch of two of these operations follows this list):
- Synonym Replacement: randomly pick n non-stopword words in the sentence and replace each with a randomly chosen synonym;
- Random Insertion: randomly pick a non-stopword word in the sentence, take a random synonym of it, and insert that synonym at a random position; repeat n times;
- Random Swap: randomly pick two words in the sentence and swap their positions; repeat n times;
- Random Deletion: remove each word in the sentence independently with probability p;
- Back translation: translate the text into a pivot language, then translate it back.
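As a rough illustration of two of the operations above, a minimal sketch of random swap and random deletion; the function names are illustrative, and the other two operations would additionally need stopword and synonym lists.

```python
import random

def random_swap(words, n=1):
    """Swap two randomly chosen word positions, repeated n times."""
    words = words[:]
    for _ in range(n):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Drop each word independently with probability p; keep at least one."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]
```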
II. Model backbone
Transformers have swept the field since BERT, and different pre-trained backbones suit different scenarios. If in-domain data and resources allow, pre-train on industry corpora; failing that, do further in-domain pre-training; only then fall back to fine-tuning[12] a public checkpoint. Small tricks for picking among the public backbones:
- roberta_wwm_ext: performs well on single-sentence NLU tasks such as text classification and NER;
- SimBERT: works well for sentence-similarity computation and sentence-pair tasks;
- GPT family: better suited to NLG tasks such as translation and summarization.
III. Loss functions and other training tricks
- Class imbalance: besides the data augmentation and over-sampling described above, try focal loss and loss weighting[13].
- Optimizer, learning rate, warmup, and batch_size[14] tuned together can also give surprising gains (for instance, a larger batch size usually tolerates a proportionally larger learning rate).
- Training tricks: adversarial training[15] (FGM, PGD; see the sketch after this list).
- Multi-task learning: add an auxiliary loss.
- Label smoothing[16]: worth a try when accuracy still falls short after noise removal and data augmentation.
- etc.
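For the adversarial-training bullet, a sketch of FGM in the form widely shared in the community: perturb the embedding weights along the gradient direction, run a second forward/backward pass, then restore the weights. The emb_name default is an assumption and must match the model's actual embedding parameter name.

```python
import torch

class FGM:
    """Fast Gradient Method on the embedding weights."""
    def __init__(self, model, epsilon=1.0, emb_name="embedding"):
        self.model, self.epsilon, self.emb_name = model, epsilon, emb_name
        self.backup = {}

    def attack(self):
        for name, p in self.model.named_parameters():
            if p.requires_grad and self.emb_name in name and p.grad is not None:
                self.backup[name] = p.data.clone()
                norm = torch.norm(p.grad)
                if norm != 0:
                    p.data.add_(self.epsilon * p.grad / norm)  # step along the gradient

    def restore(self):
        for name, p in self.model.named_parameters():
            if name in self.backup:
                p.data = self.backup[name]
        self.backup = {}

# Per batch: loss.backward(); fgm.attack(); loss_adv = criterion(model(x), y);
# loss_adv.backward(); fgm.restore(); optimizer.step(); optimizer.zero_grad()
```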
Last but not least: AI performance = data (70%) + model (20%) + other tricks (10%). Spend your time on what improves performance most instead of stacking fancy tricks as buffs; tricks only add the finishing touches, while the data and the chosen backbone are the core of the picture.
Author: 爱睡觉的KKY
Here are some fairly general tricks[17], all of which I have validated in my own papers or competitions:
- Try different weight-initialization methods, distributions, and distribution parameters (a sketch follows this list). For a comparison of network performance under different initializations, see Kaiming[18]'s paper "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification"[19].
- Initialize from different pre-training tasks. In the recent Google Universal Embedding competition, models pre-trained on ImageNet-1K (1K-class classification) and on ImageNet-21K performed very differently; the main reason is that the 21K task has finer-grained classes, so the model attends more to fine-grained image detail and its output serves better as an image embedding vector. See this discussion[20].
- Warmup + cosine LR scheduler: ramp the learning rate up first, then apply cosine decay[21]. Very effective for large models, and Hugging Face Transformers[22] ships it ready-made (sketch after this list).
- Layer-wise learning rates or per-layer learning-rate schedules.
- Adversarial training for robustness. Many methods exist; my usual choice is Adversarial Weight Perturbation[23] (AWP), and this article[24] has a reference implementation.
- Stochastic Weight Averaging[25] (SWA): average the model weights over the course of training to improve robustness; PyTorch has an official implementation (sketch after this list).
- Pseudo label[26] / meta pseudo label, a semi-supervised staple in competitions:
  - Meta Pseudo Labels (https://arxiv.org/abs/2003.10580[27])
  - Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach (https://arxiv.org/pdf/2010.07835.pdf[28])
- TTA (test-time augmentation)[29]: pairs well with data augmentation.
- Data augmentation:
  - NLP: back translation, part-of-speech-based replacement, etc.
  - CV: resize[30], crop, flip, rotate, blur, HSV jitter, affine, perspective, Mixup, Cutout, CutMix, Random Erasing, Mosaic, CopyPaste, GAN-based domain transfer, etc.
- Distillation; see "Can Students Outperform Teachers in Knowledge Distillation Based Model Comparison?" (https://openreview.net/pdf?id=XZDeL25T12l[31]).
- Structural re-parameterization; see the RepVGG paper for details (https://arxiv.org/abs/2101.03697[32]).
- Gradient checkpointing: saves GPU memory and buys you more modeling[33] freedom (sketch after this list).
- The ultimate answer: change the random seed[34] (kidding).
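To accompany the initialization item, a minimal sketch of applying Kaiming-normal initialization, assuming ReLU activations; init_weights and the toy model are illustrative.

```python
import torch.nn as nn

def init_weights(module):
    """Kaiming-normal init for conv/linear layers (assumes ReLU activations)."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(), nn.Conv2d(64, 128, 3))
model.apply(init_weights)  # .apply() visits every submodule recursively
```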
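For the warmup + cosine item, the ready-made scheduler in Hugging Face Transformers can be wired up as below; the stand-in model, the step count, and the 10% warmup fraction are assumptions.

```python
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(768, 2)  # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

num_training_steps = 1000        # in practice: len(loader) * num_epochs
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # assumed 10% warmup
    num_training_steps=num_training_steps,
)
# In the training loop, call scheduler.step() after each optimizer.step().
```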
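For the SWA item, a sketch around PyTorch's official torch.optim.swa_utils; the toy model and data, the swa_start epoch, and the swa_lr are placeholder choices.

```python
import torch
import torch.nn.functional as F
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model = torch.nn.Linear(16, 2)   # stand-in model and data
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loader = [(torch.randn(8, 16), torch.randint(0, 2, (8,))) for _ in range(10)]

swa_model = AveragedModel(model)              # running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=1e-3) # constant LR while averaging
swa_start, epochs = 75, 100                   # assumed: average over last 25%

for epoch in range(epochs):
    for x, y in loader:                       # ordinary training step
        loss = F.cross_entropy(model(x), y)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)
        swa_scheduler.step()

update_bn(loader, swa_model)  # refresh BN statistics for the averaged weights
```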
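For the gradient-checkpointing item, a sketch with torch.utils.checkpoint, assuming the model can be expressed as an nn.Sequential; Hugging Face models expose the same idea through model.gradient_checkpointing_enable().

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep stand-in stack; any nn.Sequential works.
model = nn.Sequential(*(nn.Sequential(nn.Linear(512, 512), nn.ReLU())
                        for _ in range(24)))
x = torch.randn(8, 512, requires_grad=True)

# Split into 4 segments: activations inside each segment are recomputed
# during backward instead of being stored, trading compute for memory.
out = checkpoint_sequential(model, 4, x)
out.sum().backward()
```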
All of the above are general tricks for a single model. Domain-specific techniques are not listed here, but there is plenty of literature to draw on, for example:
- Fine-grained classification: A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification[35]
- GAN training tips: Training Language GANs from Scratch[36]
【Recommended projects】
Beginner-friendly library of core code from top-conference papers: https://github.com/xmu-xiaoma666/External-Attention-pytorch[37]
Must-read AI papers and video tutorials: https://github.com/xmu-xiaoma666/FightingCV-Course[38]
Beginner-friendly YOLO object-detection library: https://github.com/iscyy/yoloair[39]
Beginner-friendly walkthroughs of top-venue papers: https://github.com/xmu-xiaoma666/FightingCV-Paper-Reading[40]


References
[1] Source answer: https://www.zhihu.com/question/540433389/answer/2549775065
[2] non-tensor: https://www.zhihu.com/search?q=非张量&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2549775065}
[3] arccos: https://www.zhihu.com/search?q=arccos&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2549775065}
[4] simcse: https://www.zhihu.com/search?q=simcse&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2549775065}
[5] two-tower late interaction: https://www.zhihu.com/search?q=双塔迟交互&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2549775065}
[6] bsz: https://www.zhihu.com/search?q=bsz&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2549775065}
[7] warmup: https://www.zhihu.com/search?q=warmup&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2576569581}
[8] maximum-entropy filtering: https://www.zhihu.com/search?q=最大熵删除法&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2576569581}
[9] cross-validation: https://www.zhihu.com/search?q=交叉验证&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2576569581}
[10] seed: https://www.zhihu.com/search?q=seed&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2576569581}
[11] labeled data: https://www.zhihu.com/search?q=标柱数据&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2576569581}
[12] finetune: https://www.zhihu.com/search?q=finetune&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2576569581}
[13] loss weighting: https://www.zhihu.com/search?q=loss加权&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2576569581}
[14] batch_size: https://www.zhihu.com/search?q=batch_size&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2576569581}
[15] adversarial training: https://www.zhihu.com/search?q=对抗训练&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2576569581}
[16] label smoothing: https://www.zhihu.com/search?q=label+smoothing&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2576569581}
[17] trick: https://www.zhihu.com/search?q=trick&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[18] kaiming: https://www.zhihu.com/search?q=kaiming&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[19] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification: https://arxiv.org/abs/1502.01852
[20] this discussion: https://www.kaggle.com/competitions/google-universal-image-embedding/discussion/339554
[21] cosine decay: https://www.zhihu.com/search?q=余弦衰减&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[22] Transformers: https://www.zhihu.com/search?q=mers&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[23] adversarial weight perturbation: https://www.zhihu.com/search?q=对抗权重扰动&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[24] this article: https://www.kaggle.com/code/junkoda/fast-awp
[25] stochastic weight averaging: https://www.zhihu.com/search?q=随机权重平均&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[26] pseudo label: https://www.zhihu.com/search?q=pseudo+label&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[27] https://arxiv.org/abs/2003.10580
[28] https://arxiv.org/pdf/2010.07835.pdf
[29] test time augmentation: https://www.zhihu.com/search?q=test+time+augmentation&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[30] resize: https://www.zhihu.com/search?q=resize&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[31] https://openreview.net/pdf?id=XZDeL25T12l
[32] https://arxiv.org/abs/2101.03697
[33] modeling: https://www.zhihu.com/search?q=建模&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[34] random seed: https://www.zhihu.com/search?q=random+seed&search_source=Entity&hybrid_search_source=Entity&hybrid_search_extra={"sourceType":"answer","sourceId":2626853486}
[35] A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification: https://arxiv.org/abs/1906.08332
[36] Training Language GANs from Scratch: https://proceedings.neurips.cc/paper/2019/file/a6ea8471c120fe8cc35a2954c9b9c595-Paper.pdf
[37] https://github.com/xmu-xiaoma666/External-Attention-pytorch
[38] https://github.com/xmu-xiaoma666/FightingCV-Course
[39] https://github.com/iscyy/yoloair
[40] https://github.com/xmu-xiaoma666/FightingCV-Paper-Reading
About the author
FightingCV (WeChat official account: FightingCV)