English
Related papers

Related papers: IOT: Instance-wise Layer Reordering for Transforme…

200 papers

Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers. Could ordering the sublayers in a different pattern lead to better performance? We generate randomly ordered transformers and train them with…

Computation and Language · Computer Science 2020-04-24 Ofir Press , Noah A. Smith , Omer Levy

Transformer models have established new benchmarks in natural language processing; however, their increasing depth results in substantial growth in parameter counts. While existing recurrent transformer methods address this issue by…

Computation and Language · Computer Science 2025-05-27 Anthony Nguyen , Wenjun Lin

Transformer-based models have demonstrated remarkable in-context learning capabilities, prompting extensive research into its underlying mechanisms. Recent studies have suggested that Transformers can implement first-order optimization…

Machine Learning · Computer Science 2024-03-06 Angeliki Giannou , Liu Yang , Tianhao Wang , Dimitris Papailiopoulos , Jason D. Lee

Transformers excel at in-context learning (ICL) -- learning from demonstrations without parameter updates -- but how they do so remains a mystery. Recent work suggests that Transformers may internally run Gradient Descent (GD), a…

Machine Learning · Computer Science 2024-11-19 Deqing Fu , Tian-Qi Chen , Robin Jia , Vatsal Sharan

In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency, which makes the Transformer-based NMT achieve…

Computation and Language · Computer Science 2020-04-09 Kehai Chen , Rui Wang , Masao Utiyama , Eiichiro Sumita

Transformer is a ubiquitous model for natural language processing and has attracted wide attentions in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However,…

Machine Learning · Computer Science 2021-02-26 Yujing Wang , Yaming Yang , Jiangang Bai , Mingliang Zhang , Jing Bai , Jing Yu , Ce Zhang , Gao Huang , Yunhai Tong

Due to their architecture and how they are trained, artificial neural networks are typically not robust toward pruning or shuffling layers at test time. However, such properties would be desirable for different applications, such as…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Matthias Freiberger , Peter Kun , Anders Sundnes Løvlie , Sebastian Risi

Recent research has demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in-context during their forward inference step. However, their capability in…

Machine Learning · Computer Science 2024-10-31 Max Vladymyrov , Johannes von Oswald , Mark Sandler , Rong Ge

Transformers have become the dominant architecture for sequence modeling tasks such as natural language processing or audio processing, and they are now even considered for tasks that are not naturally sequential such as image…

Machine Learning · Computer Science 2024-03-05 Jorg Bornschein , Yazhe Li , Amal Rannen-Triki

Transformers are arguably the main workhorse in recent Natural Language Processing research. By definition a Transformer is invariant with respect to reordering of the input. However, language is inherently sequential and word order is…

Computation and Language · Computer Science 2021-09-10 Philipp Dufter , Martin Schmitt , Hinrich Schütze

The Transformer is an extremely powerful and prominent deep learning architecture. In this work, we challenge the commonly held belief in deep learning that going deeper is better, and show an alternative design approach that is building…

Machine Learning · Computer Science 2022-11-10 Jason Ross Brown , Yiren Zhao , Ilia Shumailov , Robert D Mullins

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works…

Computer Vision and Pattern Recognition · Computer Science 2021-12-24 Zizheng Pan , Bohan Zhuang , Haoyu He , Jing Liu , Jianfei Cai

Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections…

Machine Learning · Computer Science 2022-12-13 Yaofeng Desmond Zhong , Tongtao Zhang , Amit Chakraborty , Biswadip Dey

Sequence models such as transformers require inputs to be represented as one-dimensional sequences. In vision, this typically involves flattening images using a fixed row-major (raster-scan) order. While full self-attention is…

Machine Learning · Computer Science 2025-10-24 Declan Kutscher , David M. Chan , Yutong Bai , Trevor Darrell , Ritwik Gupta

Recurrent Neural Networks were, until recently, one of the best ways to capture the timely dependencies in sequences. However, with the introduction of the Transformer, it has been proven that an architecture with only attention-mechanisms…

Machine Learning · Computer Science 2021-08-19 Radostin Cholakov , Todor Kolev

We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, to effectively integrate global- and instance-level information. By considering extracted content features from an…

Computer Vision and Pattern Recognition · Computer Science 2022-03-31 Soohyun Kim , Jongbeom Baek , Jihye Park , Gyeongnyeon Kim , Seungryong Kim

We present a new training methodology for transformers using a multilevel, layer-parallel approach. Through a neural ODE formulation of transformers, our application of a multilevel parallel-in-time algorithm for the forward and…

Machine Learning · Computer Science 2026-01-27 Shuai Jiang , Marc Salvadó-Benasco , Eric C. Cyr , Alena Kopaničáková , Rolf Krause , Jacob B. Schroder

Visual object tracking performance has been dramatically improved in recent years, but some severe challenges remain open, like distractors and occlusions. We suspect the reason is that the feature representations of the tracking targets…

Computer Vision and Pattern Recognition · Computer Science 2021-10-29 Mengmeng Wang , Xiaoqian Yang , Yong Liu

High-order numerical methods enhance Transformer performance in tasks like NLP and CV, but introduce a performance-efficiency trade-off due to increased computational overhead. Our analysis reveals that conventional efficiency techniques,…

Machine Learning · Computer Science 2025-10-14 Xinyu Liu , Bei Li , Jiahao Liu , Junhao Ruan , Kechen Jiao , Hongyin Tang , Jingang Wang , Xiao Tong , Jingbo Zhu

Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to quadratic increase in compute effort with the input length. In practice, however, the input text of many NLP tasks can be…

Computation and Language · Computer Science 2023-06-01 Jeremiah Milbauer , Annie Louis , Mohammad Javad Hosseini , Alex Fabrikant , Donald Metzler , Tal Schuster
‹ Prev 1 2 3 10 Next ›