Related papers: Foundation Model is Efficient Multimodal Multitask…

Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning

Intermediate task transfer learning can greatly improve model performance. If, for example, one has little training data for emotion detection, first fine-tuning a language model on a sentiment classification dataset may improve performance…

Computation and Language · Computer Science 2024-10-22 David Schulte , Felix Hamborg , Alan Akbik

EMS: Efficient and Effective Massively Multilingual Sentence Embedding Learning

Massively multilingual sentence representation models, e.g., LASER, SBERT-distill, and LaBSE, help significantly improve cross-lingual downstream tasks. However, the use of a large amount of data or inefficient model architectures results…

Computation and Language · Computer Science 2024-05-31 Zhuoyuan Mao , Chenhui Chu , Sadao Kurohashi

Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning

Foundation models have emerged as a powerful tool for many AI problems. Despite the tremendous success of foundation models, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks…

Machine Learning · Computer Science 2024-02-26 Zhuoyan Xu , Zhenmei Shi , Junyi Wei , Fangzhou Mu , Yin Li , Yingyu Liang

Label Efficient Learning of Transferable Representations across Domains and Tasks

We propose a framework that learns a representation transferable across different domains and tasks in a label efficient manner. Our approach battles domain shift with a domain adversarial loss, and generalizes the embedding to novel task…

Machine Learning · Statistics 2017-12-04 Zelun Luo , Yuliang Zou , Judy Hoffman , Li Fei-Fei

Efficient Transferability Assessment for Selection of Pre-trained Detectors

Large-scale pre-training followed by downstream fine-tuning is an effective solution for transferring deep-learning-based models. Since fine-tuning all possible pre-trained models is computationally costly, we aim to predict the…

Computer Vision and Pattern Recognition · Computer Science 2024-03-15 Zhao Wang , Aoxue Li , Zhenguo Li , Qi Dou

Multi-Task Label Embedding for Text Classification

Multi-task learning in text classification leverages implicit correlations among related tasks to extract common features and yield performance gains. However, most previous works treat labels of each task as independent and meaningless…

Computation and Language · Computer Science 2017-10-20 Honglun Zhang , Liqiang Xiao , Wenqing Chen , Yongkun Wang , Yaohui Jin

A Mixture of Experts Foundation Model for Scanning Electron Microscopy Image Analysis

Scanning Electron Microscopy (SEM) is indispensable in modern materials science, enabling high-resolution imaging across a wide range of structural, chemical, and functional investigations. However, SEM imaging remains constrained by…

Machine Learning · Computer Science 2026-04-08 Sk Miraj Ahmed , Yuewei Lin , Chuntian Cao , Shinjae Yoo , Xinpei Wu , Won-Il Lee , Nikhil Tiwale , Dan N. Le , Thi Thu Huong Chu , Jiyoung Kim , Kevin G. Yager , Chang-Yong Nam

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

Tabular Foundation Models have recently established the state of the art in supervised tabular learning, by leveraging pretraining to learn generalizable representations of numerical and categorical structured data. However, they lack…

Machine Learning · Computer Science 2026-05-12 Alan Arazi , Eilam Shapira , Shoham Grunblat , Mor Ventura , Elad Hoffer , Gioia Blayer , David Holzmüller , Lennart Purucker , Gaël Varoquaux , Frank Hutter , Roi Reichart

TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings

Despite the exceptional reasoning capabilities of Multimodal Large Language Models (MLLMs), their adaptation into universal embedding models is significantly impeded by task conflict. To address this, we propose TSEmbed, a universal…

Computation and Language · Computer Science 2026-03-06 Yebo Wu , Feng Liu , Ziwei Xie , Zhiyuan Liu , Changwang Zhang , Jun Wang , Li Li

Model Predictive Task Sampling for Efficient and Robust Adaptation

Foundation models have revolutionized general-purpose problem-solving, offering rapid task adaptation through pretraining, meta-training, and finetuning. Recent crucial advances in these paradigms reveal the importance of challenging task…

Machine Learning · Computer Science 2025-10-21 Qi Wang , Zehao Xiao , Yixiu Mao , Yun Qu , Jiayi Shen , Yiqin Lv , Xiangyang Ji

Efficient Domain Adaptation of Multimodal Embeddings using Constrastive Learning

Recent advancements in machine learning (ML), natural language processing (NLP), and foundational models have shown promise for real-life applications in critical, albeit compute-constrained fields like healthcare. In such areas, combining…

Machine Learning · Computer Science 2025-02-05 Georgios Margaritis , Periklis Petridis , Dimitris J. Bertsimas

How Good are Foundation Models in Step-by-Step Embodied Reasoning?

Embodied agents operating in the physical world must make decisions that are not only effective but also safe, spatially coherent, and grounded in context. While recent advances in large multimodal models (LMMs) have shown promising…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Dinura Dissanayake , Ahmed Heakl , Omkar Thawakar , Noor Ahsan , Ritesh Thawkar , Ketan More , Jean Lahoud , Rao Anwer , Hisham Cholakkal , Ivan Laptev , Fahad Shahbaz Khan , Salman Khan

MFE-ETP: A Comprehensive Evaluation Benchmark for Multi-modal Foundation Models on Embodied Task Planning

In recent years, Multi-modal Foundation Models (MFMs) and Embodied Artificial Intelligence (EAI) have been advancing side by side at an unprecedented pace. The integration of the two has garnered significant attention from the AI research…

Artificial Intelligence · Computer Science 2024-10-08 Min Zhang , Xian Fu , Jianye Hao , Peilong Han , Hao Zhang , Lei Shi , Hongyao Tang , Yan Zheng

Emu: Generative Pretraining in Multimodality

We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context. This omnivore model can take in any single-modality or multimodal data input indiscriminately (e.g.,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-09 Quan Sun , Qiying Yu , Yufeng Cui , Fan Zhang , Xiaosong Zhang , Yueze Wang , Hongcheng Gao , Jingjing Liu , Tiejun Huang , Xinlong Wang

Pick the Best Pre-trained Model: Towards Transferability Estimation for Medical Image Segmentation

Transfer learning is a critical technique in training deep neural networks for the challenging medical image segmentation task that requires enormous resources. With the abundance of medical image data, many research institutions release…

Computer Vision and Pattern Recognition · Computer Science 2023-07-25 Yuncheng Yang , Meng Wei , Junjun He , Jie Yang , Jin Ye , Yun Gu

MMTEB: Massive Multilingual Text Embedding Benchmark

Text embeddings are typically evaluated on a limited set of tasks, which are constrained by language, domain, and task diversity. To address these limitations and provide a more comprehensive evaluation, we introduce the Massive…

Computation and Language · Computer Science 2025-11-14 Kenneth Enevoldsen , Isaac Chung , Imene Kerboua , Márton Kardos , Ashwin Mathur , David Stap , Jay Gala , Wissam Siblini , Dominik Krzemiński , Genta Indra Winata , Saba Sturua , Saiteja Utpala , Mathieu Ciancone , Marion Schaeffer , Gabriel Sequeira , Diganta Misra , Shreeya Dhakal , Jonathan Rystrøm , Roman Solomatin , Ömer Çağatan , Akash Kundu , Martin Bernstorff , Shitao Xiao , Akshita Sukhlecha , Bhavish Pahwa , Rafał Poświata , Kranthi Kiran GV , Shawon Ashraf , Daniel Auras , Björn Plüster , Jan Philipp Harries , Loïc Magne , Isabelle Mohr , Mariya Hendriksen , Dawei Zhu , Hippolyte Gisserot-Boukhlef , Tom Aarsen , Jan Kostkan , Konrad Wojtasik , Taemin Lee , Marek Šuppa , Crystina Zhang , Roberta Rocca , Mohammed Hamdy , Andrianos Michail , John Yang , Manuel Faysse , Aleksei Vatolin , Nandan Thakur , Manan Dey , Dipam Vasani , Pranjal Chitale , Simone Tedeschi , Nguyen Tai , Artem Snegirev , Michael Günther , Mengzhou Xia , Weijia Shi , Xing Han Lù , Jordan Clive , Gayatri Krishnakumar , Anna Maksimova , Silvan Wehrli , Maria Tikhonova , Henil Panchal , Aleksandr Abramov , Malte Ostendorff , Zheng Liu , Simon Clematide , Lester James Miranda , Alena Fenogenova , Guangyu Song , Ruqiya Bin Safi , Wen-Ding Li , Alessia Borghini , Federico Cassano , Hongjin Su , Jimmy Lin , Howard Yen , Lasse Hansen , Sara Hooker , Chenghao Xiao , Vaibhav Adlakha , Orion Weller , Siva Reddy , Niklas Muennighoff

AMES: A Differentiable Embedding Space Selection Framework for Latent Graph Inference

In real-world scenarios, although data entities may possess inherent relationships, the specific graph illustrating their connections might not be directly accessible. Latent graph inference addresses this issue by enabling Graph Neural…

Machine Learning · Computer Science 2023-11-21 Yuan Lu , Haitz Sáez de Ocáriz Borde , Pietro Liò

Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models

Training models on low-resource named entity recognition tasks has been shown to be a challenge, especially in industrial applications where deploying updated models is a continuous effort and crucial for business operations. In such cases…

Computation and Language · Computer Science 2019-10-18 Peter Izsak , Shira Guskin , Moshe Wasserblat

CEMTM: Contextual Embedding-based Multimodal Topic Modeling

We introduce CEMTM, a context-enhanced multimodal topic model designed to infer coherent and interpretable topic structures from both short and long documents containing text and images. CEMTM builds on fine-tuned large vision language…

Computation and Language · Computer Science 2025-10-07 Amirhossein Abaskohi , Raymond Li , Chuyuan Li , Shafiq Joty , Giuseppe Carenini

Application of frozen large-scale models to multimodal task-oriented dialogue

In this study, we use the existing Large Language Models ENhanced to See Framework (LENS Framework) to test the feasibility of multimodal task-oriented dialogues. The LENS Framework has been proposed as a method to solve computer vision…

Computation and Language · Computer Science 2023-10-03 Tatsuki Kawamoto , Takuma Suzuki , Ko Miyama , Takumi Meguro , Tomohiro Takagi