Related papers: In-Context Compositional Learning via Sparse Codin…

Structural Deep Encoding for Table Question Answering

Although Transformer-based architectures excel at processing textual information, their naive adaptation for tabular data often involves flattening the table structure. This simplification can lead to the loss of essential…

Computation and Language · Computer Science · 2025-03-04 · Raphaël Mouravieff, Benjamin Piwowarski, Sylvain Lamprier

Sparse Fine-Tuning of Transformers for Generative Tasks

Large pre-trained transformers have revolutionized artificial intelligence across various domains, and fine-tuning remains the dominant approach for adapting these models to downstream tasks due to the cost of training from scratch.…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-16 · Wei Chen, Jingxi Yu, Zichen Miao, Qiang Qiu

Attention as a Hypernetwork

Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training, but whose compositions have not. What mechanisms underlie this ability for compositional…

Machine Learning · Computer Science · 2025-02-18 · Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu

Sparse coding for multitask and transfer learning

We investigate the use of sparse coding and dictionary learning in the context of multitask and transfer learning. The central assumption of our learning method is that the tasks' parameters are well approximated by sparse linear…

Machine Learning · Computer Science · 2014-06-17 · Andreas Maurer, Massimiliano Pontil, Bernardino Romera-Paredes

Learning Dictionaries with Bounded Self-Coherence

Sparse coding in learned dictionaries has been established as a successful approach for signal denoising, source separation and solving inverse problems in general. A dictionary learning method adapts an initial dictionary to a particular…

Machine Learning · Statistics · 2012-10-18 · Christian D. Sigg, Tomas Dikk, Joachim M. Buhmann

Extra Global Attention Designation Using Keyword Detection in Sparse Transformer Architectures

In this paper, we propose an extension to Longformer Encoder-Decoder, a popular sparse transformer architecture. One common challenge with sparse transformers is that they can struggle with encoding long-range context, such as…

Computation and Language · Computer Science · 2024-10-14 · Evan Lucas, Dylan Kangas, Timothy C Havens

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection

The self-attention-based Transformer has demonstrated state-of-the-art performance in a number of natural language processing tasks. Self-attention is able to model long-term dependencies, but it may suffer from the extraction of…

Computation and Language · Computer Science · 2019-12-30 · Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun

Contextualized word senses: from attention to compositionality

The neural architectures of language models are becoming increasingly complex, especially that of Transformers, based on the attention mechanism. Although their application to numerous natural language processing tasks has proven to be very…

Computation and Language · Computer Science · 2023-12-04 · Pablo Gamallo

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts.…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-17 · Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville

Sparse Coding on Cascaded Residuals

This paper seeks to combine dictionary learning and hierarchical image representation in a principled way. To make dictionary atoms capture additional information from extended receptive fields and attain improved descriptive capacity, we…

Computer Vision and Pattern Recognition · Computer Science · 2019-11-11 · Tong Zhang, Fatih Porikli

Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs

To apply neural sequence models such as Transformers to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite, pre-defined vocabulary. Such a vocabulary usually involves tokens…

Sound · Computer Science · 2021-01-08 · Wen-Yi Hsiao, Jen-Yu Liu, Yin-Cheng Yeh, Yi-Hsuan Yang

Music Transformer

Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to the reuse of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani…

Machine Learning · Computer Science · 2018-12-13 · Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Ian Simon, Curtis Hawthorne, Andrew M. Dai, Matthew D. Hoffman, Monica Dinculescu, Douglas Eck

Supervised learning of sparse context reconstruction coefficients for data representation and classification

The context of a data point, usually defined as the other data points in a data set, has been found to play an important role in data representation and classification. In this paper, we study the problem of using context of a data point…

Machine Learning · Computer Science · 2015-08-19 · Xuejie Liu, Jingbin Wang, Ming Yin, Benjamin Edwards, Peijuan Xu

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks

Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances. Since solutions to such problems do not depend on the order of elements of the set,…

Machine Learning · Computer Science · 2019-05-28 · Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh

Folded Context Condensation in Path Integral Formalism for Infinite Context Transformers

In this work, we present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of Path Integral formalism. In this perspective, the attention mechanism is recast as a process that…

High Energy Physics - Phenomenology · Physics · 2025-05-02 · Won-Gi Paeng, Daesuk Kwon, Kyungwon Jeong, Honggyo Suh

Longer Attention Span: Increasing Transformer Context Length with Sparse Graph Processing Techniques

Transformers have demonstrated great success in numerous domains including natural language processing and bioinformatics. This success stems from the use of the attention mechanism by these models in order to represent and propagate…

Machine Learning · Computer Science · 2025-02-10 · Nathaniel Tomczak, Sanmukh Kuppannagari

Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing…

Sound · Computer Science · 2024-02-09 · Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

Shift-Invariance Sparse Coding for Audio Classification

Sparse coding is an unsupervised learning algorithm that learns a succinct high-level representation of the inputs given only unlabeled data; it represents each input as a sparse linear combination of a set of basis functions. Originally…

Machine Learning · Computer Science · 2012-06-26 · Roger Grosse, Rajat Raina, Helen Kwong, Andrew Y. Ng

Sparse Coding Representation of 2-way Data

Sparse dictionary coding represents signals as linear combinations of a few dictionary atoms. It has been applied to images, time series, graph signals and multi-way spatio-temporal data by jointly employing temporal and spatial…

Machine Learning · Computer Science · 2025-09-15 · Boya Ma, Abram Magner, Maxwell McNeil, Petko Bogdanov

Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems

Transformer models have achieved state-of-the-art results in a wide range of NLP tasks including summarization. Training and inference using large transformer models can be computationally expensive. Previous work has focused on one…

Computation and Language · Computer Science · 2021-09-10 · Potsawee Manakul, Mark J. F. Gales