English
Related papers

Related papers: Foundation Transformers

200 papers

A big convergence of language, vision, and multimodal pretraining is emerging. In this work, we introduce a general-purpose multimodal foundation model BEiT-3, which achieves state-of-the-art transfer performance on both vision and…

Computer Vision and Pattern Recognition · Computer Science 2022-09-01 Wenhui Wang , Hangbo Bao , Li Dong , Johan Bjorck , Zhiliang Peng , Qiang Liu , Kriti Aggarwal , Owais Khan Mohammed , Saksham Singhal , Subhojit Som , Furu Wei

Over the past few decades, Artificial Intelligence(AI) has progressed from the initial machine learning stage to the deep learning stage, and now to the stage of foundational models. Foundational models have the characteristics of…

Computation and Language · Computer Science 2024-11-28 Lewen Yang , Xuanyu Zhou , Juao Fan , Xinyi Xie , Shengxin Zhu

Large-scale Transformer models have significantly promoted the recent development of natural language processing applications. However, little effort has been made to unify the effective models. In this paper, driven by providing a new set…

Computation and Language · Computer Science 2022-04-12 Dezhou Shen

Parameter-efficient fine-tuning approaches have recently garnered a lot of attention. Having considerably lower number of trainable weights, these methods can bring about scalability and computational effectiveness. In this paper, we look…

Computation and Language · Computer Science 2023-02-23 Mohammad Akbar-Tajari , Sara Rajaee , Mohammad Taher Pilehvar

Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is trained on large-scale data which provides a reasonable parameter…

Large language models such as BERT and the GPT series started a paradigm shift that calls for building general-purpose models via pre-training on large datasets, followed by fine-tuning on task-specific datasets. There is now a plethora of…

Computation and Language · Computer Science 2023-06-13 Jeremy Gwinnup , Kevin Duh

Large pre-trained models, or foundation models, have shown impressive performance when adapted to a variety of downstream tasks, often out-performing specialized models. Hypernetworks, neural networks that generate some or all of the…

Machine Learning · Computer Science 2025-03-04 Jeffrey Gu , Serena Yeung-Levy

In this paper, we explore the possibility of building a unified foundation model that can be adapted to both vision-only and text-only tasks. Starting from BERT and ViT, we design a unified transformer consisting of modality-specific…

Computer Vision and Pattern Recognition · Computer Science 2021-12-15 Qing Li , Boqing Gong , Yin Cui , Dan Kondratyuk , Xianzhi Du , Ming-Hsuan Yang , Matthew Brown

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example,…

Machine Learning · Computer Science 2022-03-15 Yi Tay , Mostafa Dehghani , Dara Bahri , Donald Metzler

Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers. Could ordering the sublayers in a different pattern lead to better performance? We generate randomly ordered transformers and train them with…

Computation and Language · Computer Science 2020-04-24 Ofir Press , Noah A. Smith , Omer Levy

Transformer is the backbone of modern NLP models. In this paper, we propose RealFormer, a simple and generic technique to create Residual Attention Layer Transformer networks that significantly outperform the canonical Transformer and its…

Machine Learning · Computer Science 2021-09-14 Ruining He , Anirudh Ravula , Bhargav Kanagal , Joshua Ainslie

More transformer blocks with residual connections have recently achieved impressive results on various tasks. To achieve better performance with fewer trainable parameters, recent methods are proposed to go shallower by parameter sharing or…

Machine Learning · Computer Science 2021-09-08 Fuzhao Xue , Ziji Shi , Futao Wei , Yuxuan Lou , Yong Liu , Yang You

Inspired by the success of Transformer-based models in natural language processing, this paper investigates their potential as foundation models for network traffic analysis. We propose a unified pre-training and fine-tuning pipeline for…

Networking and Internet Architecture · Computer Science 2026-02-09 Samara Mayhoub , Chuan Heng Foh , Mahdi Boloursaz Mashhadi , Mohammad Shojafar , Rahim Tafazolli

Transfer learning with large pretrained transformer-based language models like BERT has become a dominating approach for most NLP tasks. Simply fine-tuning those large language models on downstream tasks or combining it with task-specific…

Computation and Language · Computer Science 2021-08-06 Wenjuan Han , Bo Pang , Yingnian Wu

Generative pretraining (the "GPT" in ChatGPT) enables language models to learn from vast amounts of internet text without human supervision. This approach has driven breakthroughs across AI by allowing deep neural networks to learn from…

Neurons and Cognition · Quantitative Biology 2025-09-23 Thomas Serre , Ellie Pavlick

Attention based language models have become a critical component in state-of-the-art natural language processing systems. However, these models have significant computational requirements, due to long training times, dense operations and…

Computation and Language · Computer Science 2021-06-11 Ivan Chelombiev , Daniel Justus , Douglas Orr , Anastasia Dietrich , Frithjof Gressmann , Alexandros Koliousis , Carlo Luschi

Recent advances in neural architectures, such as the Transformer, coupled with the emergence of large-scale pre-trained models such as BERT, have revolutionized the field of Natural Language Processing (NLP), pushing the state of the art…

Computation and Language · Computer Science 2021-09-24 Anton Chernyavskiy , Dmitry Ilvovsky , Preslav Nakov

Natural Language Processing (NLP) has witnessed a transformative leap with the advent of transformer-based architectures, which have significantly enhanced the ability of machines to understand and generate human-like text. This paper…

Computation and Language · Computer Science 2025-03-27 Tianhao Wu , Yu Wang , Ngoc Quach

Large Transformer-based language models are pre-trained on corpora of varying sizes, for a different number of steps and with different batch sizes. At the same time, more fundamental components, such as the pre-training objective or…

Computation and Language · Computer Science 2021-05-12 M. Aßenmacher , P. Schulze , C. Heumann

Fine-tuning pre-trained transformers is a powerful technique for enhancing the performance of base models on specific tasks. From early applications in models like BERT to fine-tuning Large Language Models (LLMs), this approach has been…

Computation and Language · Computer Science 2025-02-25 Suneel Nadipalli
‹ Prev 1 2 3 10 Next ›