Related papers: In-Context Compositional Learning via Sparse Codin…

200 papers

Although Transformer-based architectures excel at processing textual information, their naive adaptation for tabular data often involves flattening the table structure. This simplification can lead to the loss of essential…

Computation and Language · Computer Science 2025-03-04 Raphaël Mouravieff, Benjamin Piwowarski, Sylvain Lamprier

Large pre-trained transformers have revolutionized artificial intelligence across various domains, and fine-tuning remains the dominant approach for adapting these models to downstream tasks due to the cost of training from scratch.…

Computer Vision and Pattern Recognition · Computer Science 2025-07-16 Wei Chen, Jingxi Yu, Zichen Miao, Qiang Qiu

Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training, but whose compositions have not. What mechanisms underlie this ability for compositional…

Machine Learning · Computer Science 2025-02-18 Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu

We investigate the use of sparse coding and dictionary learning in the context of multitask and transfer learning. The central assumption of our learning method is that the task parameters are well approximated by sparse linear…

Machine Learning · Computer Science 2014-06-17 Andreas Maurer, Massimiliano Pontil, Bernardino Romera-Paredes

Sparse coding in learned dictionaries has been established as a successful approach for signal denoising, source separation and solving inverse problems in general. A dictionary learning method adapts an initial dictionary to a particular…

Machine Learning · Statistics 2012-10-18 Christian D. Sigg, Tomas Dikk, Joachim M. Buhmann

In this paper, we propose an extension to Longformer Encoder-Decoder, a popular sparse transformer architecture. One common challenge with sparse transformers is that they can struggle with encoding of long range context, such as…

Computation and Language · Computer Science 2024-10-14 Evan Lucas, Dylan Kangas, Timothy C Havens

Self-attention based Transformer has demonstrated the state-of-the-art performances in a number of natural language processing tasks. Self-attention is able to model long-term dependencies, but it may suffer from the extraction of…

Computation and Language · Computer Science 2019-12-30 Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun

The neural architectures of language models are becoming increasingly complex, especially that of Transformers, based on the attention mechanism. Although their application to numerous natural language processing tasks has proven to be very…

Computation and Language · Computer Science 2023-12-04 Pablo Gamallo

Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts.…

Computer Vision and Pattern Recognition · Computer Science 2024-09-17 Ankit Vani, Bac Nguyen, Samuel Lavoie, Ranjay Krishna, Aaron Courville

This paper seeks to combine dictionary learning and hierarchical image representation in a principled way. To make dictionary atoms capturing additional information from extended receptive fields and attain improved descriptive capacity, we…

Computer Vision and Pattern Recognition · Computer Science 2019-11-11 Tong Zhang, Fatih Porikli

To apply neural sequence models such as the Transformers to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite set of pre-defined vocabulary. Such a vocabulary usually involves tokens…

Sound · Computer Science 2021-01-08 Wen-Yi Hsiao, Jen-Yu Liu, Yin-Cheng Yeh, Yi-Hsuan Yang

Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to reusing of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani…

Context of data points, which is usually defined as the other data points in a data set, has been found to play important roles in data representation and classification. In this paper, we study the problem of using context of a data point…

Machine Learning · Computer Science 2015-08-19 Xuejie Liu, Jingbin Wang, Ming Yin, Benjamin Edwards, Peijuan Xu

Many machine learning tasks such as multiple instance learning, 3D shape recognition, and few-shot image classification are defined on sets of instances. Since solutions to such problems do not depend on the order of elements of the set,…

Machine Learning · Computer Science 2019-05-28 Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh

In this work, we present a generalized formulation of the Transformer algorithm by reinterpreting its core mechanisms within the framework of Path Integral formalism. In this perspective, the attention mechanism is recast as a process that…

High Energy Physics - Phenomenology · Physics 2025-05-02 Won-Gi Paeng, Daesuk Kwon, Kyungwon Jeong, Honggyo Suh

Transformers have demonstrated great success in numerous domains including natural language processing and bioinformatics. This success stems from the use of the attention mechanism by these models in order to represent and propagate…

Machine Learning · Computer Science 2025-02-10 Nathaniel Tomczak, Sanmukh Kuppannagari

In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing…

Sound · Computer Science 2024-02-09 Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

Sparse coding is an unsupervised learning algorithm that learns a succinct high-level representation of the inputs given only unlabeled data; it represents each input as a sparse linear combination of a set of basis functions. Originally…

Machine Learning · Computer Science 2012-06-26 Roger Grosse, Rajat Raina, Helen Kwong, Andrew Y. Ng

Sparse dictionary coding represents signals as linear combinations of a few dictionary atoms. It has been applied to images, time series, graph signals and multi-way spatio-temporal data by jointly employing temporal and spatial…

Machine Learning · Computer Science 2025-09-15 Boya Ma, Abram Magner, Maxwell McNeil, Petko Bogdanov

Transformer models have achieved state-of-the-art results in a wide range of NLP tasks including summarization. Training and inference using large transformer models can be computationally expensive. Previous work has focused on one…

Computation and Language · Computer Science 2021-09-10 Potsawee Manakul, Mark J. F. Gales