Related papers: Language Models are General-Purpose Interfaces

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

Large pretrained Transformer language models have been shown to exhibit zero-shot generalization, i.e. they can perform a wide variety of tasks that they were not explicitly trained on. However, the architectures and pretraining objectives…

Computation and Language · Computer Science · 2022-04-13 · Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel

Pre-Trained Language Models for Interactive Decision-Making

Language model (LM) pre-training is useful in many language processing tasks. But can pre-trained LMs be further leveraged for more general machine learning problems? We propose an approach for using LMs to scaffold learning and…

Machine Learning · Computer Science · 2022-11-01 · Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Foundational Models Defining a New Era in Vision: A Survey and Outlook

Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world. The complex relations between objects and their locations, ambiguities, and variations in the real-world…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-27 · Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

Pretrained Generative Language Models as General Learning Frameworks for Sequence-Based Tasks

We propose that small pretrained foundational generative language models with millions of parameters can be utilized as a general learning framework for sequence-based tasks. Our proposal overcomes the computational resource, skill set, and…

Computation and Language · Computer Science · 2024-02-09 · Ben Fauber

Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks

Foundation models or pre-trained models have substantially improved the performance of various language, vision, and vision-language understanding tasks. However, existing foundation models can only perform the best in one type of tasks,…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-18 · Xinsong Zhang, Yan Zeng, Jipeng Zhang, Hang Li

Foundation Models for Decision Making: Problems, Methods, and Opportunities

Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks. When such models are deployed in real world environments, they inevitably interface with other…

Artificial Intelligence · Computer Science · 2023-03-08 · Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

Foundation Models in Robotics: Applications, Challenges, and the Future

We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which limits their adaptability across diverse applications. In…

Robotics · Computer Science · 2023-12-14 · Roya Firoozi, Johnathan Tucker, Stephen Tian, Anirudha Majumdar, Jiankai Sun, Weiyu Liu, Yuke Zhu, Shuran Song, Ashish Kapoor, Karol Hausman, Brian Ichter, Danny Driess, Jiajun Wu, Cewu Lu, Mac Schwager

Grounding Language Models to Images for Multimodal Inputs and Outputs

We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images. Our method…

Computation and Language · Computer Science · 2023-06-16 · Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

Does language help generalization in vision models?

Vision models trained on multimodal datasets can benefit from the wide availability of large image-caption datasets. A recent model (CLIP) was found to generalize well in zero-shot and transfer learning settings. This could imply that…

Artificial Intelligence · Computer Science · 2021-09-16 · Benjamin Devillers, Bhavin Choksi, Romain Bielawski, Rufin VanRullen

Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language…

Computation and Language · Computer Science · 2025-02-25 · Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Quanquan Gu

Universal computation is intrinsic to language model decoding

Language models now provide an interface to express and often solve general problems in natural language, yet their ultimate computational capabilities remain a major topic of scientific debate. Unlike a formal computer, a language model is…

Computation and Language · Computer Science · 2026-02-11 · Alex Lewandowski, Marlos C. Machado, Dale Schuurmans

An Overview on Language Models: Recent Developments and Outlook

Language modeling studies the probability distributions over strings of texts. It is one of the most fundamental tasks in natural language processing (NLP). It has been widely used in text generation, speech recognition, machine…

Computation and Language · Computer Science · 2024-07-18 · Chengwei Wei, Yun-Cheng Wang, Bin Wang, C.-C. Jay Kuo

Latent Diffusion for Language Generation

Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have…

Computation and Language · Computer Science · 2023-11-08 · Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, Kilian Q. Weinberger

Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks

The ratio of outlier parameters in language pre-training models and vision pre-training models differs significantly, making cross-modality (language and vision) inherently more challenging than cross-domain adaptation. As a result, many…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-06 · Yaxin Luo, Zhiqiang Shen

FLAVA: A Foundational Language And Vision Alignment Model

State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining for obtaining good performance on a variety of downstream tasks. Generally, such models are often either cross-modal (contrastive) or…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-31 · Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela

Language Models are Universal Embedders

In the large language model (LLM) revolution, embedding is a key component of various systems, such as retrieving knowledge or memories for LLMs or building content moderation filters. As such cases span from English to other natural or…

Computation and Language · Computer Science · 2025-05-23 · Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

Constrained Language Models Yield Few-Shot Semantic Parsers

We explore the use of large pretrained language models as few-shot semantic parsers. The goal in semantic parsing is to generate a structured meaning representation given a natural language input. However, language models are trained to…

Computation and Language · Computer Science · 2021-11-18 · Richard Shin, Christopher H. Lin, Sam Thomson, Charles Chen, Subhro Roy, Emmanouil Antonios Platanios, Adam Pauls, Dan Klein, Jason Eisner, Benjamin Van Durme

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-01 · Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

Fine-Tune Language Models as Multi-Modal Differential Equation Solvers

In the growing domain of scientific machine learning, in-context operator learning has shown notable potential in building foundation models, as in this framework the model is trained to learn operators and solve differential equations…

Machine Learning · Computer Science · 2024-02-02 · Liu Yang, Siting Liu, Stanley J. Osher

Remodeling Semantic Relationships in Vision-Language Fine-Tuning

Vision-language fine-tuning has emerged as an efficient paradigm for constructing multimodal foundation models. While textual context often highlights semantic relationships within an image, existing fine-tuning methods typically overlook…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-14 · Xiangyang Wu, Liu Liu, Baosheng Yu, Jiayan Qiu, Zhenwei Shi