In this paper, we provide a much simpler and more efficient alternative. We represent objects by a single point at their bounding box center (see Figure 2). Other properties, such as object size, dimension, 3D extent, orientation, and pose, are then regressed directly from image features at the center location. Object detection is then a standard keypoint estimation problem [3, 39, 60]. We simply feed the input image to a fully convolutional network [37, 40] that generates a heatmap. Peaks in this heatmap correspond to object centers. Image features at each peak predict the object's bounding box height and width. The model trains using standard dense supervised learning [39, 60]. Inference is a single network forward pass, without non-maximal suppression for post-processing.
Introduction
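In this paper, we provide a simpler and more efficient approach. We represent an object by a single point at the center of its bounding box; other properties, such as object size, dimension, 3D extent, orientation, and pose, are regressed from the image features at the center location. Object detection then becomes a standard keypoint estimation problem. We simply feed the input image to a fully convolutional network, which generates a heatmap; peaks in the heatmap correspond to object centers. The image features at each peak predict the height and width of the object's bounding box. The model is trained with standard supervised learning. Inference is a single forward pass of the network, with no NMS required as post-processing.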
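To make the decoding step concrete, here is a minimal NumPy sketch of how peaks in a heatmap could be turned into boxes without NMS. It is an illustration under stated assumptions, not the paper's implementation: the function name `decode_centers`, the 3x3 local-maximum test, the `threshold` value, and the `(2, H, W)` layout of `size_map` (width, then height) are all choices made for this example.

```python
import numpy as np


def decode_centers(heatmap, size_map, threshold=0.3):
    """Turn a center heatmap into boxes (hypothetical helper).

    heatmap:  (H, W) array of center scores for one class.
    size_map: (2, H, W) array; size_map[0] is box width and
              size_map[1] is box height predicted at each location
              (this layout is an assumption of the sketch).
    Returns a list of (x1, y1, x2, y2, score) tuples.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape

    # Pad with -inf so border pixels are compared only to real neighbors.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)

    # Maximum over each 3x3 neighborhood, built from 9 shifted views.
    neighborhood_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )

    # A peak equals its neighborhood maximum and clears the threshold.
    # This local-maximum test stands in for NMS.
    is_peak = (heatmap == neighborhood_max) & (heatmap >= threshold)
    ys, xs = np.nonzero(is_peak)

    boxes = []
    for y, x in zip(ys, xs):
        bw = float(size_map[0, y, x])
        bh = float(size_map[1, y, x])
        score = float(heatmap[y, x])
        # The peak is the box center; width and height come from features there.
        boxes.append((x - bw / 2, y - bh / 2, x + bw / 2, y + bh / 2, score))
    return boxes
```

Because each object is reduced to one local maximum, a weaker response adjacent to a stronger peak is dropped by the neighborhood comparison alone, which is why no separate box-overlap suppression pass is needed in this sketch.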
Related work