Related papers: DropKey

200 papers

Various dropout methods have been designed for the fully-connected layer, convolutional layer and recurrent layer in neural networks, and shown to be effective in avoiding overfitting. As an appealing alternative to recurrent and…

Computation and Language · Computer Science 2019-07-29 Lin Zehui, Pengfei Liu, Luyao Huang, Junkun Chen, Xipeng Qiu, Xuanjing Huang

In this paper, we introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, which is a key component of transformer, a state-of-the-art model for various NLP tasks. In…

Computation and Language · Computer Science 2020-11-03 Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

This work aims to improve the efficiency of vision transformers (ViT). While ViTs use computationally expensive self-attention operations in every layer, we identify that these operations are highly correlated across layers -- a key…

Computer Vision and Pattern Recognition · Computer Science 2023-01-18 Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian

Dropout is a widely used regularization technique which improves the generalization ability of a model by randomly dropping neurons. In light of this, we propose Dropout Prompt Learning, which aims for applying dropout to improve the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Biao Chen, Lin Zuo, Mengmeng Jing, Kunbin He, Yuchen Wang

We introduce DropDim, a structured dropout method designed for regularizing the self-attention mechanism, which is a key component of the transformer. In contrast to the general dropout method, which randomly drops neurons, DropDim drops…

Computation and Language · Computer Science 2023-04-21 Hao Zhang, Dan Qu, Keji Shao, Xukui Yang

Dropout and DropConnect are well-known techniques that apply a consistent drop rate to randomly deactivate neurons or edges in a neural network layer during training. This paper introduces a novel methodology that assigns dynamic drop rates…

Machine Learning · Computer Science 2025-02-28 Yuan-Chih Yang, Hung-Hsuan Chen

Despite dropout's ubiquity in machine learning, its effectiveness as a form of data augmentation remains under-explored. We address two key questions: (i) When is dropout effective as an augmentation strategy? (ii) Is dropout uniquely…

Machine Learning · Computer Science 2025-06-02 Rickard Brüel-Gabrielsson, Tongzhou Wang, Manel Baradad, Justin Solomon

As it is empirically observed that Vision Transformers (ViTs) are quite insensitive to the order of input tokens, the need for an appropriate self-supervised pretext task that enhances the location awareness of ViTs is becoming evident. To…

Computer Vision and Pattern Recognition · Computer Science 2023-09-25 Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tong Wang, Zhaoxiang Zhang

Predicting altered acoustic frames is an effective approach to self-supervised learning for speech representation. However, it is challenging to prevent the pretrained model from overfitting. In this paper, we propose to introduce two…

Audio and Speech Processing · Electrical Eng. & Systems 2021-07-12 Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao

Vision Transformer (ViT) has recently demonstrated promise in computer vision problems. However, unlike Convolutional Neural Networks (CNN), it is known that the performance of ViT saturates quickly as depth increases, due to the…

Computer Vision and Pattern Recognition · Computer Science 2022-03-14 Peihao Wang, Wenqing Zheng, Tianlong Chen, Zhangyang Wang

Vision Transformer (ViT) self-attention mechanism is characterized by feature collapse in deeper layers, resulting in the vanishing of low-level visual features. However, such features can be helpful to accurately represent and identify…

Computer Vision and Pattern Recognition · Computer Science 2024-08-06 Anxhelo Diko, Danilo Avola, Marco Cascio, Luigi Cinque

Deep neural networks often work well when they are over-parameterized and trained with a massive amount of noise and regularization, such as weight decay and dropout. Although dropout is widely used as a regularization technique for fully…

Computer Vision and Pattern Recognition · Computer Science 2018-10-31 Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

In convolutional neural networks (CNNs), dropout does not work well because dropped information is not entirely obscured in convolutional layers, where features are spatially correlated. Besides randomly discarding regions or channels, many…

Computer Vision and Pattern Recognition · Computer Science 2021-03-30 Tianshu Xie, Minghui Liu, Jiali Deng, Xuan Cheng, Xiaomin Wang, Ming Liu

Fine-tuning large pre-trained language models on downstream tasks is apt to suffer from overfitting when limited training data is available. While dropout proves to be an effective antidote by randomly dropping a proportion of units,…

Computation and Language · Computer Science 2022-10-13 Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang, Shaoliang Nie

We introduce Dynamic Dropout, a novel regularization technique designed to enhance the training efficiency of Transformer models by dynamically adjusting the dropout rate based on training epochs or validation loss improvements. This…

Machine Learning · Computer Science 2024-11-06 Hanrui Yan, Dan Shao

Beyond the success story of pre-trained language models (PrLMs) in recent natural language processing, they are susceptible to over-fitting due to unusually large model size. To this end, dropout serves as a therapy. However, existing methods…

Computation and Language · Computer Science 2021-06-02 Hongqiu Wu, Hai Zhao, Min Zhang

Dropout is a simple but efficient regularization technique for achieving better generalization of deep neural networks (DNNs); hence it is widely used in tasks based on DNNs. During training, dropout randomly discards a portion of the…

Neural and Evolutionary Computing · Computer Science 2020-10-22 Hiroshi Inoue

The self-attention mechanism is the key to the Transformer but is often criticized for its computation demands. Previous token pruning works motivate their methods from the view of computation redundancy but still need to load the full network and…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang

Intrigued by the inherent ability of the human visual system to identify salient regions in complex scenes, attention mechanisms have been seamlessly integrated into various Computer Vision (CV) tasks. Building upon this paradigm, Vision…

Vision transformers have demonstrated the potential to outperform CNNs in a variety of vision tasks. But the computational and memory requirements of these models prohibit their use in many applications, especially those that depend on…

Computer Vision and Pattern Recognition · Computer Science 2022-10-06 Yue Liu, Christos Matsoukas, Fredrik Strand, Hossein Azizpour, Kevin Smith