MVTN: Multi-View Transformation Network for 3D Shape Recognition

Abdullah Hamdi, Silvio Giancola, Bernard Ghanem

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

108 Scopus citations

Abstract

Multi-view projection methods have demonstrated their ability to reach state-of-the-art performance on 3D shape recognition. Those methods learn different ways to aggregate information from multiple views. However, the camera view-points for those views tend to be heuristically set and fixed for all shapes. To circumvent the lack of dynamism of current multi-view methods, we propose to learn those view-points. In particular, we introduce the Multi-View Transformation Network (MVTN) that regresses optimal view-points for 3D shape recognition, building upon advances in differentiable rendering. As a result, MVTN can be trained end-to-end along with any multi-view network for 3D shape classification. We integrate MVTN in a novel adaptive multi-view pipeline that can render either 3D meshes or point clouds. MVTN exhibits clear performance gains in the tasks of 3D shape classification and 3D shape retrieval without the need for extra training supervision. In these tasks, MVTN achieves state-of-the-art performance on ModelNet40, ShapeNet Core55, and the most recent and realistic ScanObjectNN dataset (up to 6% improvement). Interestingly, we also show that MVTN can provide network robustness against rotation and occlusion in the 3D domain. The code is available at https://github.com/ajhamdi/MVTN.
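The core idea the abstract describes — a small network that regresses per-shape camera view-points instead of using fixed heuristic ones, kept trainable end-to-end through a differentiable renderer — can be sketched at a high level. The NumPy sketch below is illustrative only: the actual MVTN uses a learned point-cloud encoder and a differentiable renderer (details in the paper and repository), and every name, shape, and bound here is an assumption. It shows one plausible mechanism: a regressed offset, squashed by tanh, perturbs a set of initial view-points within a bounded angular range so each shape gets its own cameras.

```python
import numpy as np

def regress_viewpoints(shape_feat, W, b, init_azim, init_elev, max_offset=90.0):
    """Map a global shape descriptor to M (azimuth, elevation) view-points.

    Hypothetical sketch (not the authors' implementation): a single linear
    layer (W, b) regresses 2*M raw values; tanh bounds each learned
    view-point within +/- max_offset degrees of its heuristic
    initialization (e.g. a circular camera configuration).
    """
    M = len(init_azim)
    raw = shape_feat @ W + b                 # raw regression, shape (2*M,)
    offsets = max_offset * np.tanh(raw)      # bounded offsets in degrees
    azim = np.asarray(init_azim, dtype=float) + offsets[:M]
    elev = np.asarray(init_elev, dtype=float) + offsets[M:]
    return azim, elev

# Toy usage: 4 views initialized on a circle, feature dimension 8.
rng = np.random.default_rng(0)
feat = rng.normal(size=8)
W = rng.normal(scale=0.1, size=(8, 8))       # 8 features -> 2*M outputs, M=4
b = np.zeros(8)
azim, elev = regress_viewpoints(feat, W, b,
                                init_azim=[0, 90, 180, 270],
                                init_elev=[30, 30, 30, 30])
```

In training, the regressed view-points would be fed to a differentiable renderer so the classification loss can back-propagate through the rendered images into `W` and `b`; the bounded-offset design keeps the learned cameras near a sensible initialization early in training.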
Original language: English (US)
Title of host publication: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Publisher: IEEE
ISBN (Print): 978-1-6654-2813-2
State: Published - 2021
