TY - JOUR
T1 - Towards artificial general intelligence via a multimodal foundation model
AU - Fei, Nanyi
AU - Lu, Zhiwu
AU - Gao, Yizhao
AU - Yang, Guoxing
AU - Huo, Yuqi
AU - Wen, Jingyuan
AU - Lu, Haoyu
AU - Song, Ruihua
AU - Gao, Xin
AU - Xiang, Tao
AU - Sun, Hao
AU - Wen, Ji-Rong
N1 - KAUST Repository Item: Exported on 2022-06-06
Acknowledgements: Z.L. acknowledges National Natural Science Foundation of China (61976220). J.R.W. acknowledges National Natural Science Foundation of China (61832017), Beijing Outstanding Young Scientist Program (BJJWZYJH012019100020098), and Large-Scale Pre-Training Program 468 of Beijing Academy of Artificial Intelligence (BAAI). N.F. acknowledges the Outstanding Innovative Talents Cultivation Funded Programs 2021 of Renmin University of China. We acknowledge the WenLan Data Group for helping us collect the pre-training dataset.
PY - 2022/6/2
Y1 - 2022/6/2
N2 - The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only a single cognitive ability. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream cognitive tasks. To achieve this goal, we propose to pre-train our foundation model by self-supervised learning with weak semantic correlation data crawled from the Internet and show that promising results can be obtained on a wide range of downstream tasks. Particularly, with the developed model-interpretability tools, we demonstrate that strong imagination ability is now possessed by our foundation model. We believe that our work makes a transformative stride towards AGI, from our common practice of “weak or narrow AI” to that of “strong or generalized AI”.
AB - The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only a single cognitive ability. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream cognitive tasks. To achieve this goal, we propose to pre-train our foundation model by self-supervised learning with weak semantic correlation data crawled from the Internet and show that promising results can be obtained on a wide range of downstream tasks. Particularly, with the developed model-interpretability tools, we demonstrate that strong imagination ability is now possessed by our foundation model. We believe that our work makes a transformative stride towards AGI, from our common practice of “weak or narrow AI” to that of “strong or generalized AI”.
UR - http://hdl.handle.net/10754/678565
UR - https://www.nature.com/articles/s41467-022-30761-2
U2 - 10.1038/s41467-022-30761-2
DO - 10.1038/s41467-022-30761-2
M3 - Article
C2 - 35655064
SN - 2041-1723
VL - 13
JO - Nature Communications
JF - Nature Communications
IS - 1
ER -