Deep Learning Notes (19): HAMBox: Delving into Online High-quality Anchors Mining for Detecting Outer Faces

Posted: 2020-03-26 15:25:34

paper link

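Keywords

outer faces: anomalous faces that get matched to too few anchors during training (fewer than a threshold K). This happens because the face is too small, or its scale does not fit the anchor scales. It hurts the recall of these faces.

HAMBox: Online High-quality Anchor Mining Strategy.

high-quality anchor: an anchor whose box, after regression by the network, reaches $IoU \ge 0.5$ with the ground-truth (GT) face box.

matched anchor: during training, an anchor with $IoU \ge 0.35$ to the target face.

unmatched anchor: during training, an anchor with $IoU < 0.35$ to the target face.

PBB: 'Predicted Bounding Boxes', i.e. the boxes obtained by regressing the matched anchors.

CPBB: 'Correctly Predicted Bounding Boxes'. If a matched anchor's regressed box reaches $IoU \ge 0.5$ with the GT, that box is a CPBB. In other words, a CPBB is the regressed box of a matched high-quality anchor.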
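To pin down these terms, here is a minimal pure-Python sketch. It assumes boxes are given as (x1, y1, x2, y2) and uses the thresholds stated above: 0.35 for matching and 0.5 for high quality. The names `iou` and `categorize` are my own and do not come from the paper's code.

```python
MATCH_THRESH = 0.35        # anchor vs GT IoU needed to count as a matched anchor
HIGH_QUALITY_THRESH = 0.5  # regressed box vs GT IoU needed to count as high quality


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def categorize(anchor, regressed_box, gt):
    """Label one anchor with respect to one GT face using the definitions above."""
    matched = iou(anchor, gt) >= MATCH_THRESH
    high_quality = iou(regressed_box, gt) >= HIGH_QUALITY_THRESH
    if matched and high_quality:
        return "matched high-quality"    # its regressed box is a CPBB
    if matched:
        return "matched"                 # a PBB that is not a CPBB
    if high_quality:
        return "unmatched high-quality"  # ignored by standard matching
    return "unmatched"


print(categorize((20, 20, 40, 40), (11, 11, 31, 31), (10, 10, 30, 30)))
```

Take a face at (10, 10, 30, 30). The anchor (20, 20, 40, 40) has IoU of only about 0.14 with it, so it is unmatched. If the network regresses that anchor to (11, 11, 31, 31), the IoU rises to about 0.82. That makes it an unmatched high-quality anchor, which is exactly the kind of anchor the paper says standard matching throws away.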

Abstract

However, we observe that more than 80% correctly predicted bounding boxes are regressed from the unmatched anchors (the IoUs between anchors and target faces are lower than a threshold) in the inference phase.

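The authors find that during inference, more than 80% of the correctly regressed boxes (predicted box vs GT $IoU \ge 0.5$) come from unmatched anchors. These are anchors whose IoU with the GT is below the matching threshold, 0.35 here. That came as quite a shock! It means the anchors we ignore during training, which take no part in regression training, actually have excellent regression ability.

Based on this, the authors propose HAMBox to improve performance on outer faces. They also stress that it is a general strategy, applicable to anchor-based single-stage face detectors. As for code, they say it will be released in Baidu's PaddlePaddle.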

Introduction

Current state-of-the-art face detectors are usually based on anchor-based deep CNNs, inspired by their successes on the general object detection.

Different from general object detectors, face detectors often face smaller variations of aspect ratios (from 1:1 to 1:1.5) but much larger scale variations (face area, from several pixels to thousands of pixels).

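Like general object detectors, mainstream face detectors are built on anchors (although anchor-free methods are also developing fast). Unlike general object detectors, however, faces vary little in aspect ratio but enormously in scale.

One answer to this is FPN plus dense anchors, but that greatly increases inference time. For efficiency, the simpler the anchor design, the better. The $S^3FD$ paper uses anchors with a single scale and a single aspect ratio per layer. This looks simple and efficient, but choosing suitable anchor scales in practice is still a big challenge, largely because of the following mismatch between anchors and faces: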
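To get a feel for the cost argument, here is a rough anchor count for a single-scale, 1:1 layout like $S^3FD$'s. The configuration is my recollection of S3FD and should be read as an assumption:

- strides 4 to 128;
- anchor scale = 4 × stride;
- one anchor per location;
- a 640×640 input;
- feature-map sides computed by ceiling division.

`anchors_per_location` is a hypothetical knob that stands in for a denser multi-scale or multi-ratio design.

```python
STRIDES = (4, 8, 16, 32, 64, 128)  # assumed detection-layer strides (S3FD-like)


def anchor_layout(input_size, strides=STRIDES, anchors_per_location=1):
    """Return (stride, anchor_scale, num_anchors) per layer for a square input."""
    layout = []
    for s in strides:
        side = -(-input_size // s)  # ceil division: feature-map side length
        layout.append((s, 4 * s, side * side * anchors_per_location))
    return layout


def total_anchors(input_size, anchors_per_location=1):
    layout = anchor_layout(input_size, anchors_per_location=anchors_per_location)
    return sum(n for _, _, n in layout)


for stride, scale, n in anchor_layout(640):
    print(f"stride {stride:3d}  anchor {scale:3d}px  {n} anchors")
print(total_anchors(640))     # 34125
print(total_anchors(640, 3))  # 102375
```

With one anchor per location, this gives 34,125 anchors. Three per location triples that to 102,375. Every extra anchor needs classification and regression outputs, plus decoding and NMS at inference.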


[Figure 1 panels: (a) Average Number of Anchors Matched to Each Face; (b) Proportion of Faces that can Match with Anchors]
Figure 1. Two crucial factors in designing anchor scales on the WIDER FACE dataset. (a) As the scale of anchor increases, the average number of anchors matched to each face also increases. (b) The proportion of faces that can match the anchors decreases significantly outside a specific interval ([0.43, 0.7])

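The average number of anchors matched to each face: the figure shows that enlarging the anchor scale increases the number of anchors matched to each GT. This agrees with my own experiments, where small objects match few anchors and large objects match many.
The proportion of all faces that can match the anchors: this plot shows that simply enlarging the anchor scale eventually increases the number of faces that fail to match (usually small objects).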

So both aspects are worth keeping in mind when designing anchors.
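Both trends can be reproduced with a toy setup:

- square anchors on a stride-4 grid over a 256×256 image;
- one square face at the image center;
- the 0.35 matching threshold.

These settings are assumptions for illustration. The numbers are not the WIDER FACE statistics from Figure 1.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matched(face_size, anchor_scale, stride=4, image_size=256, thresh=0.35):
    """Count grid anchors whose IoU with a centered square face reaches thresh."""
    c = image_size / 2.0
    h = face_size / 2.0
    face = (c - h, c - h, c + h, c + h)
    half = anchor_scale / 2.0
    n = image_size // stride
    count = 0
    for i in range(n):
        for j in range(n):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride  # anchor center
            anchor = (cx - half, cy - half, cx + half, cy + half)
            if iou(anchor, face) >= thresh:
                count += 1
    return count


print(count_matched(16, 16))  # small face, same-size anchors
print(count_matched(64, 64))  # large face, same-size anchors: many more matches
print(count_matched(16, 32))  # small face, anchors too large: no match at all
```

Here a 16-pixel face matched against 16-pixel anchors gets 12 anchors, and a 64-pixel face against 64-pixel anchors gets many more. Once the anchors grow to 32 pixels, the 16-pixel face gets none: even a perfectly centered anchor only reaches IoU 256/1024 = 0.25.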

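$S^3FD$ forces enough anchors onto outer faces by lowering the matching threshold. The EMO paper instead uses an Expected Maximum Overlap criterion to choose a suitable anchor stride and receptive field. Experiments show, however, that these compensation methods introduce a large number of low-quality anchors and do not actually perform well; see Figure 2(b):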
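The $S^3FD$ compensation mentioned above can be sketched roughly as follows, based on my reading of that paper:

- Stage 1 matches anchors at IoU 0.35.
- Stage 2 lowers the threshold to 0.1 for faces with too few anchors.
- It then tops those faces up to N anchors, where N is the average stage-1 match count.

Treat the finer details as assumptions: tie-breaking, assigning each anchor to at most one face, and the rounding of N.

```python
def s3fd_style_compensation(iou_matrix, stage1_thresh=0.35, stage2_thresh=0.1):
    """iou_matrix[f][a] is the IoU between face f and anchor a.

    Returns a dict mapping face index -> list of matched anchor indices.
    """
    num_faces = len(iou_matrix)
    if num_faces == 0:
        return {}
    num_anchors = len(iou_matrix[0])
    matches = {f: [] for f in range(num_faces)}
    assigned = set()

    # Stage 1: standard matching; each anchor goes to its best face if IoU >= 0.35.
    for a in range(num_anchors):
        best_f = max(range(num_faces), key=lambda f: iou_matrix[f][a])
        if iou_matrix[best_f][a] >= stage1_thresh:
            matches[best_f].append(a)
            assigned.add(a)

    # N = average number of stage-1 matches per face (at least 1).
    n = max(1, round(sum(len(m) for m in matches.values()) / num_faces))

    # Stage 2: faces with fewer than N anchors take the best remaining anchors
    # whose IoU is only >= 0.1; these are the low-quality compensated anchors.
    for f in range(num_faces):
        missing = n - len(matches[f])
        if missing <= 0:
            continue
        candidates = [a for a in range(num_anchors)
                      if a not in assigned and iou_matrix[f][a] >= stage2_thresh]
        candidates.sort(key=lambda a: iou_matrix[f][a], reverse=True)
        for a in candidates[:missing]:
            matches[f].append(a)
            assigned.add(a)
    return matches


ious = [[0.5, 0.4, 0.05, 0.0, 0.0],
        [0.0, 0.0, 0.2, 0.15, 0.12]]
print(s3fd_style_compensation(ious))  # {0: [0, 1], 1: [2]}
```

In the example, face 1 ends up with only anchor 2, at IoU 0.2. That is exactly the kind of low-quality compensated anchor whose regressed IoU stays low in Figure 2(b).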

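In Figure 2(b), the olive curve shows what happens during training under the standard anchor matching strategy. It plots the IoU between the GT and the regressed boxes of the compensated (matched) anchors. The purple curve shows the authors' HAMBox.

Because the anchors matched by the standard strategy are of low quality, their average regressed IoU is only about 0.4. This is despite the training objective pushing every matched anchor toward the GT. HAMBox, by contrast, reaches an average regressed IoU of about 0.8!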


[Figure 2 panels: (a) Cumulative Density Curve of IoU; (b) Performance of Compensated Anchors; (c) Proportion of unmatched High-quality Anchors; (d) Performance of Matched High-quality Anchors]
Figure 2. The problem of standard anchor matching strategy during training and inference (on the WIDER FACE dataset). (a) During inference, only 11% of all correctly predicted bounding boxes are regressed by matched anchors. (b) PBB represents ‘Predicted Bounding Boxes’. When using our HAMBox strategy, the IoUs between ground-truths and predicted bounding boxes regressed by compensated anchors are much higher than standard anchor matching strategy during training. (c) During training, the average number of unmatched high-quality anchors occupies a surprisingly 65% proportion of all high-quality anchors. (d) CPBB represents ‘Correctly Predicted Bounding Boxes’. During inference, the number of matched high-quality anchors dramatically decreases after NMS, representing some unmatched anchors have higher regression ability. All these results demonstrate that the standard anchor matching strategy can not utilize high-quality negative anchors effectively, which play essential roles whatever during training or inference.

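The paper uses PyramidBox as the baseline for studying anchor matching. In Figure 2(a), the x-axis is the IoU between a high-quality anchor and the GT at the matching stage (i.e. before regression). The y-axis is the cumulative distribution F(x).

High-quality anchors whose matching IoU is below 0.35 make up 89% of all high-quality anchors. In other words, with 0.35 as the matching threshold, only 11% of the high-quality boxes produced at inference come from matched anchors. The other 89% come from negative anchors, which never took part in regression training. Amazing!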
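The 89% / 11% split is read off a CDF. Suppose that for each anchor we have its IoU with its GT face before regression and after regression. The statistic can then be computed as below. The input pairs are toy numbers for illustration, not the paper's data.

```python
def high_quality_anchor_ious(pairs, hq_thresh=0.5):
    """pairs: (anchor_iou, regressed_iou) per anchor, both w.r.t. the same GT face.
    Returns the pre-regression IoUs of the high-quality anchors."""
    return [anchor_iou for anchor_iou, regressed_iou in pairs if regressed_iou >= hq_thresh]


def cdf(values, x):
    """Empirical CDF F(x) = fraction of values <= x (the y-axis of Figure 2(a))."""
    return sum(1 for v in values if v <= x) / len(values) if values else 0.0


def unmatched_share(pairs, match_thresh=0.35, hq_thresh=0.5):
    """Fraction of high-quality anchors that were unmatched (anchor IoU < threshold)."""
    hq = high_quality_anchor_ious(pairs, hq_thresh)
    return sum(1 for v in hq if v < match_thresh) / len(hq) if hq else 0.0


pairs = [(0.6, 0.9), (0.2, 0.7), (0.1, 0.55), (0.3, 0.8), (0.05, 0.3)]
print(cdf(high_quality_anchor_ious(pairs), 0.35))  # 0.75
print(unmatched_share(pairs))                      # 0.75
```

The CDF at 0.35 also counts anchors sitting exactly at 0.35, which are matched under $IoU \ge 0.35$. That is why `unmatched_share` uses a strict `<`. On real data with continuous IoUs, the two values practically coincide.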

During training, as Figure 2(c) shows, unmatched anchors account for 65% of all high-quality anchors on average.

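Figure 2(d) looks further at how matched anchors perform. Take the group of bars at x = 1. With a threshold of 0.35, 2492 GT faces can be matched to 2 anchors. After regression, the number of faces whose matched anchors reach $IoU \ge 0.5$ with the GT (i.e. matched high-quality anchors) drops to 1968. So 524 GT faces cannot be recalled: their originally matched anchors are too poor for the regression branch to raise the IoU.

The green bars confused me a bit, and I can see two readings.

First reading: NMS is applied over all matched high-quality anchors, and only 343 GT faces survive. Setting aside occlusion between faces, the faces lost here were probably dropped because their classification scores fell below the threshold. That would mean low-quality anchors still get eliminated for low classification scores, even when they could regress well.

Second reading: NMS is applied over all high-quality anchors, both matched and unmatched. Only 343 GT faces are still recalled through matched anchors. The other matched high-quality anchors are suppressed by overlapping unmatched high-quality anchors that have higher classification scores.

My guess is that the second reading fits the paper better, since the caption says "some unmatched anchors have higher regression ability". Either way, the bottom line is this: most of the low-quality matched anchors added for outer faces are junk and do not improve outer-face recall!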
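The second reading of the green bars is easy to see with plain greedy NMS. Suppose an unmatched high-quality anchor regresses to almost the same box as a matched one but scores higher. Then the matched one is suppressed. The boxes, the scores and the 0.3 NMS threshold below are made-up illustration values.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.3):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [
    (10, 10, 30, 30),      # 0: regressed from a matched high-quality anchor
    (11, 11, 31, 31),      # 1: regressed from an unmatched high-quality anchor
    (100, 100, 120, 120),  # 2: an unrelated face elsewhere
]
scores = [0.6, 0.9, 0.5]
print(nms(boxes, scores))  # [1, 2]: the matched anchor's box is suppressed
```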

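To address these problems, the authors propose HAMBox, which mines more high-quality anchors for outer faces online. In short, as training progresses, the algorithm gradually mines high-quality anchors for outer faces, which gives better regression. After mining, a regression-aware focal loss reweights the classification-branch loss of the newly compensated high-quality anchors.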
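Below is a minimal sketch of the online mining idea as described above, under my own assumptions.

- **Mining:** in each iteration, faces with fewer than K standard matches are topped up, best first. The extra anchors are unmatched anchors whose *current* regressed boxes reach $IoU \ge 0.5$ with that face. K = 3 is a hypothetical value.
- **Reweighting:** the regressed IoU is used as a multiplier on a standard focal loss for compensated anchors.

The paper's exact regression-aware focal loss may differ, so treat `weighted_focal_loss` as a stand-in. Neither function is the official PaddlePaddle implementation.

```python
import math


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mine_high_quality_anchors(gts, anchors, pred_boxes, k=3,
                              match_thresh=0.35, hq_thresh=0.5):
    """One mining step (hypothetical helper).

    pred_boxes[i] is the box currently regressed from anchors[i].
    Returns (matched, compensated): face -> anchor indices, and
    face -> list of (anchor index, regressed IoU).
    """
    if not gts:
        return {}, {}
    matched = {f: [] for f in range(len(gts))}
    assigned = set()
    # Step 1: standard matching on the anchors' own IoU with each face.
    for a, anchor in enumerate(anchors):
        ious = [iou(anchor, g) for g in gts]
        best = max(range(len(gts)), key=ious.__getitem__)
        if ious[best] >= match_thresh:
            matched[best].append(a)
            assigned.add(a)
    # Step 2: top up outer faces (< k matches) with unmatched anchors whose
    # regressed boxes already overlap the face well (high-quality anchors).
    compensated = {f: [] for f in range(len(gts))}
    for f, g in enumerate(gts):
        missing = k - len(matched[f])
        if missing <= 0:
            continue
        candidates = [(iou(pred_boxes[a], g), a)
                      for a in range(len(anchors)) if a not in assigned]
        candidates = [c for c in candidates if c[0] >= hq_thresh]
        candidates.sort(reverse=True)  # highest regressed IoU first
        for regressed_iou, a in candidates[:missing]:
            compensated[f].append((a, regressed_iou))
            assigned.add(a)
    return matched, compensated


def weighted_focal_loss(p, weight, alpha=0.25, gamma=2.0):
    """Focal loss of a positive anchor with predicted face probability p,
    scaled by weight (1.0 for standard matches, regressed IoU for
    compensated anchors in this sketch)."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)
    return -weight * alpha * (1.0 - p) ** gamma * math.log(p)


gts = [(10, 10, 30, 30)]
anchors = [(20, 20, 40, 40), (100, 100, 120, 120), (9, 9, 29, 29)]
pred_boxes = [(11, 11, 31, 31), (100, 100, 120, 120), (10, 10, 30, 30)]
matched, compensated = mine_high_quality_anchors(gts, anchors, pred_boxes)
print(matched)      # {0: [2]}
print(compensated)  # anchor 0 is compensated, regressed IoU about 0.82
a, r = compensated[0][0]
print(weighted_focal_loss(0.5, r))  # loss for the compensated anchor
```

`pred_boxes` changes as the regression branch improves, so the set of compensated anchors has to be recomputed every iteration. That recomputation is what makes the mining "online".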


Original post: https://www.cnblogs.com/xuanyuyt/p/12566024.html
