Related papers: Language Models are General-Purpose Interfaces

200 papers

Large pretrained Transformer language models have been shown to exhibit zero-shot generalization, i.e. they can perform a wide variety of tasks that they were not explicitly trained on. However, the architectures and pretraining objectives…

Computation and Language · Computer Science 2022-04-13 Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel

Language model (LM) pre-training is useful in many language processing tasks. But can pre-trained LMs be further leveraged for more general machine learning problems? We propose an approach for using LMs to scaffold learning and…

Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world. The complex relations between objects and their locations, ambiguities, and variations in the real-world…

Computer Vision and Pattern Recognition · Computer Science 2023-07-27 Muhammad Awais, Muzammal Naseer, Salman Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Fahad Shahbaz Khan

We propose that small pretrained foundational generative language models with millions of parameters can be utilized as a general learning framework for sequence-based tasks. Our proposal overcomes the computational resource, skill set, and…

Computation and Language · Computer Science 2024-02-09 Ben Fauber

Foundation models or pre-trained models have substantially improved the performance of various language, vision, and vision-language understanding tasks. However, existing foundation models can only perform the best in one type of tasks,…

Computer Vision and Pattern Recognition · Computer Science 2023-10-18 Xinsong Zhang, Yan Zeng, Jipeng Zhang, Hang Li

Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks. When such models are deployed in real world environments, they inevitably interface with other…

Artificial Intelligence · Computer Science 2023-03-08 Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which limits their adaptability across diverse applications. In…

We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images. Our method…

Computation and Language · Computer Science 2023-06-16 Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried

Vision models trained on multimodal datasets can benefit from the wide availability of large image-caption datasets. A recent model (CLIP) was found to generalize well in zero-shot and transfer learning settings. This could imply that…

Artificial Intelligence · Computer Science 2021-09-16 Benjamin Devillers, Bhavin Choksi, Romain Bielawski, Rufin VanRullen

The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language…

Computation and Language · Computer Science 2025-02-25 Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Quanquan Gu

Language models now provide an interface to express and often solve general problems in natural language, yet their ultimate computational capabilities remain a major topic of scientific debate. Unlike a formal computer, a language model is…

Computation and Language · Computer Science 2026-02-11 Alex Lewandowski, Marlos C. Machado, Dale Schuurmans

Language modeling studies the probability distributions over strings of texts. It is one of the most fundamental tasks in natural language processing (NLP). It has been widely used in text generation, speech recognition, machine…

Computation and Language · Computer Science 2024-07-18 Chengwei Wei, Yun-Cheng Wang, Bin Wang, C.-C. Jay Kuo

Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have…

Computation and Language · Computer Science 2023-11-08 Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, Kilian Q. Weinberger

The ratio of outlier parameters in language pre-training models and vision pre-training models differs significantly, making cross-modality (language and vision) inherently more challenging than cross-domain adaptation. As a result, many…

Computer Vision and Pattern Recognition · Computer Science 2026-04-06 Yaxin Luo, Zhiqiang Shen

State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining for obtaining good performance on a variety of downstream tasks. Generally, such models are often either cross-modal (contrastive) or…

Computer Vision and Pattern Recognition · Computer Science 2022-03-31 Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, Douwe Kiela

In the large language model (LLM) revolution, embedding is a key component of various systems, such as retrieving knowledge or memories for LLMs or building content moderation filters. As such cases span from English to other natural or…

Computation and Language · Computer Science 2025-05-23 Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

We explore the use of large pretrained language models as few-shot semantic parsers. The goal in semantic parsing is to generate a structured meaning representation given a natural language input. However, language models are trained to…

A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and…

Computer Vision and Pattern Recognition · Computer Science 2022-09-01 Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei

In the growing domain of scientific machine learning, in-context operator learning has shown notable potential in building foundation models, as in this framework the model is trained to learn operators and solve differential equations…

Machine Learning · Computer Science 2024-02-02 Liu Yang, Siting Liu, Stanley J. Osher

Vision-language fine-tuning has emerged as an efficient paradigm for constructing multimodal foundation models. While textual context often highlights semantic relationships within an image, existing fine-tuning methods typically overlook…

Computer Vision and Pattern Recognition · Computer Science 2025-11-14 Xiangyang Wu, Liu Liu, Baosheng Yu, Jiayan Qiu, Zhenwei Shi