TY - GEN
T1 - PMR: Prototypical Modal Rebalance for Multimodal Learning
AU - Fan, Yunfeng
AU - Xu, Wenchao
AU - Wang, Haozhao
AU - Wang, Junxiao
AU - Guo, Song
N1 - Acknowledgements: Our work was supported by funding from the Key-Area Research and Development Program of Guangdong Province (No. 2021B0101400003), Hong Kong RGC Research Impact Fund (No. R5060-19), Areas of Excellence Scheme (AoE/E-601/22-R), General Research Fund (No. 152203/20E, 152244/21E, 152169/22E, PolyU15222621), Shenzhen Science and Technology Innovation Commission (JCYJ20200109142008673) and the grant from Establishment of Distributed Artificial Intelligence Laboratory for Interdisciplinary Research (UGC/IDS(R)11/19).
PY - 2023/8/22
Y1 - 2023/8/22
N2 - Multimodal learning (MML) aims to jointly exploit the common priors of different modalities to compensate for their inherent limitations. However, existing MML methods often optimize a uniform objective for different modalities, leading to the notorious “modality imbalance” problem and counterproductive MML performance. To address the problem, some existing methods modulate the learning pace based on the fused modality, which is dominated by the better modality and eventually results in only limited improvement on the worse modality. To better exploit the features of multimodal data, we propose Prototypical Modal Rebalance (PMR) to stimulate the particular slow-learning modality without interference from the other modalities. Specifically, we introduce prototypes, which represent the general features of each class, to build non-parametric classifiers for uni-modal performance evaluation. Then, we accelerate the slow-learning modality by enhancing its clustering toward the prototypes. Furthermore, to alleviate suppression from the dominant modality, we introduce a prototype-based entropy regularization term during the early training stage to prevent premature convergence. Besides, our method relies only on the representations of each modality, without restrictions from model structures or fusion methods, giving it great application potential in various scenarios.
AB - Multimodal learning (MML) aims to jointly exploit the common priors of different modalities to compensate for their inherent limitations. However, existing MML methods often optimize a uniform objective for different modalities, leading to the notorious “modality imbalance” problem and counterproductive MML performance. To address the problem, some existing methods modulate the learning pace based on the fused modality, which is dominated by the better modality and eventually results in only limited improvement on the worse modality. To better exploit the features of multimodal data, we propose Prototypical Modal Rebalance (PMR) to stimulate the particular slow-learning modality without interference from the other modalities. Specifically, we introduce prototypes, which represent the general features of each class, to build non-parametric classifiers for uni-modal performance evaluation. Then, we accelerate the slow-learning modality by enhancing its clustering toward the prototypes. Furthermore, to alleviate suppression from the dominant modality, we introduce a prototype-based entropy regularization term during the early training stage to prevent premature convergence. Besides, our method relies only on the representations of each modality, without restrictions from model structures or fusion methods, giving it great application potential in various scenarios.
UR - http://hdl.handle.net/10754/693779
UR - https://ieeexplore.ieee.org/document/10204495/
U2 - 10.1109/cvpr52729.2023.01918
DO - 10.1109/cvpr52729.2023.01918
M3 - Conference contribution
BT - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
PB - IEEE
ER -