Object detection is a fundamental problem in computer vision. Fully supervised object detection (FSOD) methods have achieved impressive results on large-scale detection benchmarks. However, FSOD approaches require a large number of instance-level annotations, which are time-consuming to collect. In contrast, weakly supervised object detection (WSOD) exploits easily collected image-level labels but suffers from comparatively inferior detection performance. This thesis studies hybrid learning methods for object detection: we train an object detector on a dataset in which both instance-level and image-level labels are used. Extensive experiments on the challenging PASCAL VOC 2007 and 2012 benchmarks demonstrate the effectiveness of our method, which offers a trade-off between collecting fewer annotations and building a more accurate object detector. Our method also serves as a strong baseline that bridges the wide gap between FSOD and WSOD performance. Building on the hybrid learning framework, we further study object detection from a novel perspective in which annotation budget constraints are taken into consideration. Given a fixed budget, we propose a strategy for building a diverse and informative dataset that can be used to optimally train a robust detector. We investigate both optimization-based and learning-based methods for selecting which images to annotate and at which level of annotation (strong or weak) to annotate them. By combining an optimal image/annotation selection scheme with hybrid supervised learning, we show that one can achieve the performance of a strongly supervised detector on PASCAL VOC 2007 while saving 12.8% of its original annotation budget. Furthermore, when 100% of the budget is used, our approach surpasses this performance by 2.0 mAP percentage points.
KAUST Research Repository