TY - GEN
T1 - Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer
AU - Chen, Junya
AU - Xiu, Zidi
AU - Goldstein, Benjamin
AU - Henao, Ricardo
AU - Carin, Lawrence
AU - Tao, Chenyang
N1 - KAUST Repository Item: Exported on 2022-06-20
Acknowledgements: This research was supported in part by NIH/NIDDK R01-DK123062, NIH/NIBIB R01-EB025020, NIH/NINDS 1R61NS120246, DARPA, DOE, ONR and NSF. J. Chen was partially supported by Shanghai Municipal Science and Technology Major Project (No. 2018SHZDZX01) and National Key R&D Program of China (No. 2018AAA0100303). This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562 [85], specifically the PSC Bridges-2 and SDSC Expanse resources through allocations TG-ELE200002 and TG-CIS210044.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - Dealing with severe class imbalance poses a major challenge for many real-world applications, especially when the accurate classification and generalization of minority classes are of primary interest. In computer vision and NLP, learning from datasets with long-tail behavior is a recurring theme, especially for naturally occurring labels. Existing solutions mostly appeal to sampling or weighting adjustments to alleviate the extreme imbalance, or impose inductive bias to prioritize generalizable associations. Here we take a novel perspective to promote sample efficiency and model generalization based on the invariance principles of causality. Our contribution posits a meta-distributional scenario, in which the causal generating mechanism for label-conditional features is invariant across different labels. Such a causal assumption enables efficient knowledge transfer from the dominant classes to their under-represented counterparts, even if their feature distributions show apparent disparities. This allows us to leverage a causal data augmentation procedure to enlarge the representation of minority classes. Our development is orthogonal to existing imbalanced data learning techniques and thus can be seamlessly integrated with them. The proposed approach is validated on an extensive set of synthetic and real-world tasks against state-of-the-art solutions.
AB - Dealing with severe class imbalance poses a major challenge for many real-world applications, especially when the accurate classification and generalization of minority classes are of primary interest. In computer vision and NLP, learning from datasets with long-tail behavior is a recurring theme, especially for naturally occurring labels. Existing solutions mostly appeal to sampling or weighting adjustments to alleviate the extreme imbalance, or impose inductive bias to prioritize generalizable associations. Here we take a novel perspective to promote sample efficiency and model generalization based on the invariance principles of causality. Our contribution posits a meta-distributional scenario, in which the causal generating mechanism for label-conditional features is invariant across different labels. Such a causal assumption enables efficient knowledge transfer from the dominant classes to their under-represented counterparts, even if their feature distributions show apparent disparities. This allows us to leverage a causal data augmentation procedure to enlarge the representation of minority classes. Our development is orthogonal to existing imbalanced data learning techniques and thus can be seamlessly integrated with them. The proposed approach is validated on an extensive set of synthetic and real-world tasks against state-of-the-art solutions.
UR - http://hdl.handle.net/10754/679145
UR - http://www.scopus.com/inward/record.url?scp=85131784336&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781713845393
SP - 21229
EP - 21243
BT - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
PB - Neural Information Processing Systems Foundation
ER -