TY - GEN
T1 - Multimodal Representation and Retrieval [MRR 2024]
AU - Zhu, Xinliang
AU - Dhua, Arnab
AU - Gray, Douglas
AU - Yalniz, I. Zeki
AU - Yu, Tan
AU - Elhoseiny, Mohamed
AU - Plummer, Bryan
N1 - Publisher Copyright: © 2024 Owner/Author.
PY - 2024/7/10
Y1 - 2024/7/10
AB - Multimodal data is available in many applications, such as e-commerce product listings, social media posts, and short videos. However, existing algorithms for these types of data still focus on unimodal representation learning through vision-language alignment and on cross-modal retrieval. In this workshop, we aim to introduce a new retrieval problem in which both queries and documents are multimodal. With the popularity of vision-language modeling, large language models (LLMs), retrieval-augmented generation (RAG), and multimodal LLMs, we see many new opportunities for multimodal representation and retrieval tasks. This event will be a comprehensive half-day workshop on multimodal representation and retrieval. The agenda includes keynote speeches, oral presentations, and an interactive panel discussion.
KW - large language model
KW - multimodal large language model
KW - multimodal representation
KW - multimodal retrieval
KW - vision language modeling
UR - http://www.scopus.com/inward/record.url?scp=85200589158&partnerID=8YFLogxK
U2 - 10.1145/3626772.3657987
DO - 10.1145/3626772.3657987
M3 - Conference contribution
AN - SCOPUS:85200589158
T3 - SIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 3047
EP - 3050
BT - SIGIR 2024 - Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024
Y2 - 14 July 2024 through 18 July 2024
ER -