TY - GEN
T1 - SPDA-CNN: Unifying semantic part detection and abstraction for fine-grained recognition
AU - Zhang, Han
AU - Xu, Tao
AU - Elhoseiny, Mohamed
AU - Huang, Xiaolei
AU - Zhang, Shaoting
AU - Elgammal, Ahmed
AU - Metaxas, Dimitris
N1 - Generated from Scopus record by KAUST IRTS on 2019-11-20
PY - 2016/12/9
Y1 - 2016/12/9
AB - Most convolutional neural networks (CNNs) lack mid-level layers that model semantic parts of objects. This limits CNN-based methods from reaching their full potential in detecting and utilizing small semantic parts in recognition. Introducing such mid-level layers can facilitate the extraction of part-specific features which can be utilized for better recognition performance. This is particularly important in the domain of fine-grained recognition. In this paper, we propose a new CNN architecture that integrates semantic part detection and abstraction (SPDA-CNN) for fine-grained classification. The proposed network has two sub-networks: one for detection and one for recognition. The detection sub-network has a novel top-down proposal method to generate small semantic part candidates for detection. The classification sub-network introduces novel part layers that extract features from parts detected by the detection sub-network, and combine them for recognition. As a result, the proposed architecture provides an end-to-end network that performs detection, localization of multiple semantic parts, and whole object recognition within one framework that shares the computation of convolutional filters. Our method outperforms state-of-the-art methods by a large margin for small parts detection (e.g. our precision of 93.40% vs the best previous precision of 74.00% for detecting the head on CUB-2011). It also compares favorably to the existing state-of-the-art on fine-grained classification, e.g. it achieves 85.14% accuracy on CUB-2011.
UR - http://ieeexplore.ieee.org/document/7780498/
UR - http://www.scopus.com/inward/record.url?scp=84986309458&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2016.129
DO - 10.1109/CVPR.2016.129
M3 - Conference contribution
SN - 9781467388504
BT - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
PB - IEEE Computer Society
ER -