Related papers: Efficient Representation Learning via Adaptive Con…

Context-Aware Self-Attention Networks

Self-attention models have shown their flexibility in parallel computation and their effectiveness in modeling both long- and short-term dependencies. However, they calculate the dependencies between representations without considering the…

Computation and Language · Computer Science 2019-02-18 Baosong Yang, Jian Li, Derek Wong, Lidia S. Chao, Xing Wang, Zhaopeng Tu

Pool Me Wisely: On the Effect of Pooling in Transformer-Based Models

Transformer models have become the dominant backbone for sequence modeling, leveraging self-attention to produce contextualized token representations. These are typically aggregated into fixed-size vectors via pooling operations for…

Machine Learning · Computer Science 2025-10-07 Sofiane Ennadir, Levente Zólyomi, Oleg Smirnov, Tianze Wang, John Pertoft, Filip Cornell, Lele Cao

PoNet: Pooling Network for Efficient Token Mixing in Long Sequences

Transformer-based models have achieved great success in various NLP, vision, and speech tasks. However, the core of Transformer, the self-attention mechanism, has a quadratic time and memory complexity with respect to the sequence length,…

Computation and Language · Computer Science 2023-05-23 Chao-Hong Tan, Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Zhen-Hua Ling

AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling

Pooling layers are essential building blocks of convolutional neural networks (CNNs), used to reduce computational overhead and increase the receptive fields of subsequent convolutional operations. Their goal is to produce downsampled volumes…

Computer Vision and Pattern Recognition · Computer Science 2022-12-05 Alexandros Stergiou, Ronald Poppe

Supervised attention for speaker recognition

The recently proposed self-attentive pooling (SAP) has shown good performance in several speaker recognition systems. In SAP systems, the context vector is trained end-to-end together with the feature extractor, where the role of context…

Sound · Computer Science 2020-12-04 Seong Min Kye, Joon Son Chung, Hoirin Kim

Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification

Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition. For fine-grained recognition, context-aware rich feature representation of object/scene…

Computer Vision and Pattern Recognition · Computer Science 2021-01-19 Ardhendu Behera, Zachary Wharton, Pradeep Hewage, Asish Bera

Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms

In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly implemented as attentive pooling…

Computation and Language · Computer Science 2018-11-14 Wenpeng Yin, Hinrich Schütze

Self-Attentive Pooling for Efficient Deep Learning

Efficient custom pooling techniques that can aggressively trim the dimensions of a feature map and thereby reduce inference compute and memory footprint for resource-constrained computer vision applications have recently gained significant…

Computer Vision and Pattern Recognition · Computer Science 2023-01-02 Fang Chen, Gourav Datta, Souvik Kundu, Peter Beerel

Enhancing compact convolutional transformers with super attention

In this paper, we propose a vision model that adopts token mixing, sequence-pooling, and convolutional tokenizers to achieve state-of-the-art performance and efficient inference in fixed context-length tasks. On the CIFAR100 benchmark, our…

Computer Vision and Pattern Recognition · Computer Science 2025-08-27 Simpenzwe Honore Leandre, Natenaile Asmamaw Shiferaw, Dillip Rout

Core Context Aware Transformers for Long Context Language Modeling

Transformer-based Large Language Models (LLMs) have exhibited remarkable success in extensive tasks primarily attributed to self-attention mechanism, which requires a token to consider all preceding tokens as its context to compute…

Computation and Language · Computer Science 2025-08-05 Yaofo Chen, Zeng You, Shuhai Zhang, Haokun Li, Yirui Li, Yaowei Wang, Mingkui Tan

Efficient Transformers with Dynamic Token Pooling

Transformers achieve unrivalled performance in modelling language, but remain inefficient in terms of memory and time complexity. A possible remedy is to reduce the sequence length in the intermediate layers by pooling fixed-length segments…

Computation and Language · Computer Science 2023-10-25 Piotr Nawrot, Jan Chorowski, Adrian Łańcucki, Edoardo M. Ponti

Attentive Pooling Networks

In this work, we propose Attentive Pooling (AP), a two-way attention mechanism for discriminative model training. In the context of pair-wise ranking or classification with neural networks, AP enables the pooling layer to be aware of the…

Computation and Language · Computer Science 2016-02-12 Cicero dos Santos, Ming Tan, Bing Xiang, Bowen Zhou

Spatio-Temporal Attention Pooling for Audio Scene Classification

Acoustic scenes are rich and redundant in their content. In this work, we present a spatio-temporal attention pooling layer coupled with a convolutional recurrent neural network to learn from patterns that are discriminative while…

Sound · Computer Science 2019-07-01 Huy Phan, Oliver Y. Chén, Lam Pham, Philipp Koch, Maarten De Vos, Ian McLoughlin, Alfred Mertins

Inceptive Transformers: Enhancing Contextual Representations through Multi-Scale Feature Learning Across Domains and Languages

Encoder transformer models compress information from all tokens in a sequence into a single [CLS] token to represent global context. This approach risks diluting fine-grained or hierarchical features, leading to information loss in…

Computation and Language · Computer Science 2025-09-23 Asif Shahriar, Rifat Shahriyar, M Saifur Rahman

Looking around you: external information enhances representations for event sequences

Representation learning produces models in different domains, such as store purchases, client transactions, and people's general behavior. However, such models for event sequences usually process each sequence in isolation, ignoring context…

Machine Learning · Computer Science 2026-05-29 Petr Sokerin, Maria Kovaleva, Ekaterina Boyarina, Pavel Tikhomirov, Denis Vorobiyov, Alexey Zaytsev

Why and when should you pool? Analyzing Pooling in Recurrent Architectures

Pooling-based recurrent neural architectures consistently outperform their counterparts without pooling. However, the reasons for their enhanced performance are largely unexamined. In this work, we examine three commonly used pooling…

Computation and Language · Computer Science 2020-10-29 Pratyush Maini, Keshav Kolluru, Danish Pruthi, Mausam

Self Multi-Head Attention for Speaker Recognition

Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance level. Given the speech signal, these algorithms extract a sequence of speaker embeddings from short segments and those are averaged to…

Sound · Computer Science 2019-07-03 Miquel India, Pooyan Safari, Javier Hernando

LAP: An Attention-Based Module for Concept Based Self-Interpretation and Knowledge Injection in Convolutional Neural Networks

Despite the state-of-the-art performance of deep convolutional neural networks, they are susceptible to bias and malfunction in unseen situations. Moreover, the complex computation behind their reasoning is not human-understandable to…

Computer Vision and Pattern Recognition · Computer Science 2023-10-25 Rassa Ghavami Modegh, Ahmad Salimi, Alireza Dizaji, Hamid R. Rabiee

Self-attention encoding and pooling for speaker recognition

The computing power of mobile devices limits end-user applications in terms of storage size, processing, memory and energy consumption. These limitations motivate researchers to design more efficient deep models. On the other…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-05 Pooyan Safari, Miquel India, Javier Hernando

Keep It SimPool: Who Said Supervised Transformers Suffer from Attention Deficit?

Convolutional networks and vision transformers have different forms of pairwise interactions, pooling across layers and pooling at the end of the network. Does the latter really need to be different? As a by-product of pooling, vision…

Computer Vision and Pattern Recognition · Computer Science 2023-09-14 Bill Psomas, Ioannis Kakogeorgiou, Konstantinos Karantzalos, Yannis Avrithis