TY - JOUR
T1 - Mindstorms in natural language-based societies of mind
AU - Zhuge, Mingchen
AU - Liu, Haozhe
AU - Faccio, Francesco
AU - Ashley, Dylan R.
AU - Csordás, Robert
AU - Gopalakrishnan, Anand
AU - Hamdi, Abdullah
AU - Hammoud, Hasan Abed Al Kader
AU - Herrmann, Vincent
AU - Irie, Kazuki
AU - Kirsch, Louis
AU - Li, Bing
AU - Li, Guohao
AU - Liu, Shuming
AU - Mai, Jinjie
AU - Piȩkos, Piotr
AU - Ramesh, Aditya A.
AU - Schlag, Imanol
AU - Shi, Weimin
AU - Stanić, Aleksandar
AU - Wang, Wenyi
AU - Wang, Yuhui
AU - Xu, Mengmeng
AU - Fan, Deng Ping
AU - Ghanem, Bernard
AU - Schmidhuber, Jürgen
N1 - Publisher Copyright: © 2024 Tsinghua University Press.
PY - 2025
Y1 - 2025
AB - Inspired by Minsky's Society of Mind, Schmidhuber's Learning to Think, and other more recent works, this paper proposes and advocates for the concept of natural language-based societies of mind (NLSOMs). We imagine these societies as consisting of a collection of multimodal neural networks, including large language models, which engage in a 'mindstorm' to solve problems using a shared natural language interface. Here, we work to identify and discuss key questions about the social structure, governance, and economic principles for NLSOMs, emphasizing their impact on the future of AI. Our demonstrations with NLSOMs - which feature up to 129 agents - show their effectiveness in various tasks, including visual question answering, image captioning, and prompt generation for text-to-image synthesis.
KW - large language models (LLMs)
KW - learning to think
KW - mindstorm
KW - multimodal learning
KW - society of mind (SOM)
UR - http://www.scopus.com/inward/record.url?scp=105002705285&partnerID=8YFLogxK
U2 - 10.26599/CVM.2025.9450460
DO - 10.26599/CVM.2025.9450460
M3 - Article
AN - SCOPUS:105002705285
SN - 2096-0433
VL - 11
SP - 29
EP - 81
JO - Computational Visual Media
JF - Computational Visual Media
IS - 1
ER -