Related papers: Adaptive Transformers for Learning Multimodal Repr…

200 papers

Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word…

Computation and Language · Computer Science 2019-09-09 Gonçalo M. Correia, Vlad Niculae, André F. T. Martins

The Transformer is a ubiquitous model for natural language processing and has attracted wide attention in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However,…

Machine Learning · Computer Science 2021-02-26 Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

The state of the art in learning meaningful semantic representations of words is the Transformer model and its attention mechanisms. Simply put, the attention mechanisms learn to attend to specific parts of the input dispensing recurrence…

Computation and Language · Computer Science 2020-12-24 Dongsheng Wang, Casper Hansen, Lucas Chaves Lima, Christian Hansen, Maria Maistro, Jakob Grue Simonsen, Christina Lioma

Attention mechanisms represent a fundamental paradigm shift in neural network architectures, enabling models to selectively focus on relevant portions of input sequences through learned weighting functions. This monograph provides a…

Machine Learning · Computer Science 2026-01-08 Hasi Hays

Machine learning methods are emerging as a universal paradigm for constructing correlative structure-property relationships in materials science based on multimodal characterization. However, this necessitates development of methods for…

We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the…

Machine Learning · Computer Science 2024-10-31 Mingze Wang, Weinan E

Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model…

Machine Learning · Computer Science 2023-03-28 Quentin Fournier, Gaétan Marceau Caron, Daniel Aloise

A growing intuition in machine learning suggests a link between sparsity and interpretability. We introduce a novel self-ablation mechanism to investigate this connection ante-hoc in the context of language transformers. Our approach…

Machine Learning · Computer Science 2025-05-02 Jeremias Ferrao, Luhan Mikaelson, Keenan Pepper, Natalia Perez-Campanero Antolin

This document provides a brief introduction to the attention mechanism used in modern language models based on the Transformer architecture. We first illustrate how text is encoded as vectors and how the attention mechanism processes these…

Numerical Analysis · Mathematics 2026-04-02 Michel Fabrice Serret

Language and vision-language models have shown impressive performance across a wide range of tasks, but their internal mechanisms remain only partly understood. In this work, we study how individual attention heads in text-generative models…

Computer Vision and Pattern Recognition · Computer Science 2026-01-15 Lorenzo Basile, Valentino Maiorca, Diego Doimo, Francesco Locatello, Alberto Cazzaniga

Recently, multimodal transformer models have gained popularity because their performance on language and vision tasks suggests they learn rich visual-linguistic representations. Focusing on zero-shot image retrieval tasks, we study three…

Computation and Language · Computer Science 2021-02-02 Lisa Anne Hendricks, John Mellor, Rosalia Schneider, Jean-Baptiste Alayrac, Aida Nematzadeh

Transformers are widely used in natural language processing, where they consistently achieve state-of-the-art performance. This is mainly due to their attention-based architecture, which allows them to model rich linguistic relations…

Computation and Language · Computer Science 2022-11-29 Nikolaos Mylonas, Ioannis Mollas, Grigorios Tsoumakas

The multi-head self-attention mechanism of the transformer model has been thoroughly investigated recently. In one vein of study, researchers are interested in understanding why and how transformers work. In another vein, researchers…

Computation and Language · Computer Science 2022-10-28 Raymond Li, Wen Xiao, Linzi Xing, Lanjun Wang, Gabriel Murray, Giuseppe Carenini

Sparse neural networks are often hypothesized to be more interpretable than dense models, motivated by findings that weight sparsity can produce compact circuits in language models. However, it remains unclear whether structural sparsity…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Siyu Zhang

Initially developed for natural language processing (NLP), the Transformer model is now widely used for speech processing tasks such as speaker recognition, due to its powerful sequence modeling capabilities. However, conventional…

Audio and Speech Processing · Electrical Eng. & Systems 2022-01-28 Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang

Recurrent Neural Networks were, until recently, one of the best ways to capture temporal dependencies in sequences. However, with the introduction of the Transformer, it has been proven that an architecture with only attention mechanisms…

Machine Learning · Computer Science 2021-08-19 Radostin Cholakov, Todor Kolev

Transformers are built upon multi-head scaled dot-product attention and positional encoding, which aim to learn the feature representations and token dependencies. In this work, we focus on enhancing the distinctive representation by…

Computer Vision and Pattern Recognition · Computer Science 2022-07-12 Litao Yu, Jian Zhang

Neural networks are growing more capable on their own, but we do not understand their neural mechanisms. Understanding these mechanisms' decision-making processes, or mechanistic interpretability, enables (1) accountability and control in…

Computation and Language · Computer Science 2026-03-02 Mason Kadem, Rong Zheng

Transformer-based models have been achieving state-of-the-art results in several fields of Natural Language Processing. However, their direct application to speech tasks is not trivial. The nature of these sequences carries problems such as…

Computation and Language · Computer Science 2022-05-17 Gerard Sant, Gerard I. Gállego, Belen Alastruey, Marta R. Costa-Jussà

Although researchers' attention is mostly focused on the performance of Transformer models, the interpretation of Transformers cannot be ignored. Gradients are widely used in Transformer interpretation. From the perspective of attention…

Artificial Intelligence · Computer Science 2026-05-13 Yongjin Cui, Xiaohui Fan, Huajun Chen