Related papers: Foundation Model is Efficient Multimodal Multitask…

200 papers

Intermediate task transfer learning can greatly improve model performance. If, for example, one has little training data for emotion detection, first fine-tuning a language model on a sentiment classification dataset may improve performance…

Computation and Language · Computer Science · 2024-10-22 · David Schulte, Felix Hamborg, Alan Akbik

Massively multilingual sentence representation models, e.g., LASER, SBERT-distill, and LaBSE, help significantly improve cross-lingual downstream tasks. However, the use of a large amount of data or inefficient model architectures results…

Computation and Language · Computer Science · 2024-05-31 · Zhuoyuan Mao, Chenhui Chu, Sadao Kurohashi

Foundation models have emerged as a powerful tool for many AI problems. Despite the tremendous success of foundation models, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks…

Machine Learning · Computer Science · 2024-02-26 · Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang

We propose a framework that learns a representation transferable across different domains and tasks in a label efficient manner. Our approach battles domain shift with a domain adversarial loss, and generalizes the embedding to novel task…

Machine Learning · Statistics · 2017-12-04 · Zelun Luo, Yuliang Zou, Judy Hoffman, Li Fei-Fei

Large-scale pre-training followed by downstream fine-tuning is an effective solution for transferring deep-learning-based models. Since fine-tuning all possible pre-trained models is computationally costly, we aim to predict the…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-15 · Zhao Wang, Aoxue Li, Zhenguo Li, Qi Dou

Multi-task learning in text classification leverages implicit correlations among related tasks to extract common features and yield performance gains. However, most previous works treat labels of each task as independent and meaningless…

Computation and Language · Computer Science · 2017-10-20 · Honglun Zhang, Liqiang Xiao, Wenqing Chen, Yongkun Wang, Yaohui Jin

Scanning Electron Microscopy (SEM) is indispensable in modern materials science, enabling high-resolution imaging across a wide range of structural, chemical, and functional investigations. However, SEM imaging remains constrained by…

Tabular Foundation Models have recently established the state of the art in supervised tabular learning, by leveraging pretraining to learn generalizable representations of numerical and categorical structured data. However, they lack…

Despite the exceptional reasoning capabilities of Multimodal Large Language Models (MLLMs), their adaptation into universal embedding models is significantly impeded by task conflict. To address this, we propose TSEmbed, a universal…

Computation and Language · Computer Science · 2026-03-06 · Yebo Wu, Feng Liu, Ziwei Xie, Zhiyuan Liu, Changwang Zhang, Jun Wang, Li Li

Foundation models have revolutionized general-purpose problem-solving, offering rapid task adaptation through pretraining, meta-training, and finetuning. Recent crucial advances in these paradigms reveal the importance of challenging task…

Machine Learning · Computer Science · 2025-10-21 · Qi Wang, Zehao Xiao, Yixiu Mao, Yun Qu, Jiayi Shen, Yiqin Lv, Xiangyang Ji

Recent advancements in machine learning (ML), natural language processing (NLP), and foundational models have shown promise for real-life applications in critical, albeit compute-constrained fields like healthcare. In such areas, combining…

Machine Learning · Computer Science · 2025-02-05 · Georgios Margaritis, Periklis Petridis, Dimitris J. Bertsimas

Embodied agents operating in the physical world must make decisions that are not only effective but also safe, spatially coherent, and grounded in context. While recent advances in large multimodal models (LMMs) have shown promising…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-23 · Dinura Dissanayake, Ahmed Heakl, Omkar Thawakar, Noor Ahsan, Ritesh Thawkar, Ketan More, Jean Lahoud, Rao Anwer, Hisham Cholakkal, Ivan Laptev, Fahad Shahbaz Khan, Salman Khan

In recent years, Multi-modal Foundation Models (MFMs) and Embodied Artificial Intelligence (EAI) have been advancing side by side at an unprecedented pace. The integration of the two has garnered significant attention from the AI research…

Artificial Intelligence · Computer Science · 2024-10-08 · Min Zhang, Xian Fu, Jianye Hao, Peilong Han, Hao Zhang, Lei Shi, Hongyao Tang, Yan Zheng

We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context. This omnivore model can take in any single-modality or multimodal data input indiscriminately (e.g.,…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-09 · Quan Sun, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang

Transfer learning is a critical technique in training deep neural networks for the challenging medical image segmentation task that requires enormous resources. With the abundance of medical image data, many research institutions release…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-25 · Yuncheng Yang, Meng Wei, Junjun He, Jie Yang, Jin Ye, Yun Gu

Text embeddings are typically evaluated on a limited set of tasks, which are constrained by language, domain, and task diversity. To address these limitations and provide a more comprehensive evaluation, we introduce the Massive…

In real-world scenarios, although data entities may possess inherent relationships, the specific graph illustrating their connections might not be directly accessible. Latent graph inference addresses this issue by enabling Graph Neural…

Machine Learning · Computer Science · 2023-11-21 · Yuan Lu, Haitz Sáez de Ocáriz Borde, Pietro Liò

Training models on low-resource named entity recognition tasks has been shown to be a challenge, especially in industrial applications where deploying updated models is a continuous effort and crucial for business operations. In such cases…

Computation and Language · Computer Science · 2019-10-18 · Peter Izsak, Shira Guskin, Moshe Wasserblat

We introduce CEMTM, a context-enhanced multimodal topic model designed to infer coherent and interpretable topic structures from both short and long documents containing text and images. CEMTM builds on fine-tuned large vision language…

Computation and Language · Computer Science · 2025-10-07 · Amirhossein Abaskohi, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini

In this study, we use the existing Large Language Models ENnhanced to See Framework (LENS Framework) to test the feasibility of multimodal task-oriented dialogues. The LENS Framework has been proposed as a method to solve computer vision…

Computation and Language · Computer Science · 2023-10-03 · Tatsuki Kawamoto, Takuma Suzuki, Ko Miyama, Takumi Meguro, Tomohiro Takagi