Related papers: Optimizing Deeper Transformers on Small Datasets

200 papers

Fine-tuning pre-trained transformers is a powerful technique for enhancing the performance of base models on specific tasks. From early applications in models like BERT to fine-tuning Large Language Models (LLMs), this approach has been…

Computation and Language · Computer Science 2025-02-25 Suneel Nadipalli

This paper provides a starting point for Software Engineering (SE) researchers and practitioners faced with the problem of training machine learning models on small datasets. Due to the high costs associated with labeling data, in Software…

Software Engineering · Computer Science 2021-06-30 Julian Aron Prenner, Romain Robbes

Deep encoders have been proven to be effective in improving neural machine translation (NMT) systems, but training an extremely deep encoder is time consuming. Moreover, why deep models help NMT is an open question. In this paper, we…

Computation and Language · Computer Science 2020-10-09 Bei Li, Ziyang Wang, Hui Liu, Yufan Jiang, Quan Du, Tong Xiao, Huizhen Wang, Jingbo Zhu

Transformers have been recently adapted for large scale image classification, achieving high scores shaking up the long supremacy of convolutional neural networks. However the optimization of image transformers has been little studied so…

Computer Vision and Pattern Recognition · Computer Science 2021-04-08 Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou

The Transformer translation model employs residual connection and layer normalization to ease the optimization difficulties caused by its multi-layer encoder/decoder structure. Previous research shows that even with residual connection and…

Computation and Language · Computer Science 2020-05-06 Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong, Jingyi Zhang

The impressive performance of deep learning architectures is associated with a massive increase in model complexity. Millions of parameters need to be tuned, with training and inference time scaling accordingly, together with energy…

Machine Learning · Computer Science 2023-11-10 Paolo Didier Alfano, Vito Paolo Pastore, Lorenzo Rosasco, Francesca Odone

State-of-the-art performance on language understanding tasks is now achieved with increasingly large networks; the current record holder has billions of parameters. Given a language model pre-trained on massive unlabeled text corpora, only…

Computation and Language · Computer Science 2020-04-30 Evani Radiya-Dixit, Xin Wang

Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide networks (a.k.a. Transformer-Big) and has been the de facto…

Computation and Language · Computer Science 2019-06-06 Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao

Training deep neural networks may be challenging in real world data. Using models as black-boxes, even with transfer learning, can result in poor generalization or inconclusive results when it comes to small datasets or specific…

Model depth is a double-edged sword in deep learning: deeper models achieve higher accuracy but require higher computational cost. To efficiently train models at scale, an effective strategy is the progressive training, which scales up…

Machine Learning · Computer Science 2025-11-10 Zhiqi Bu

State of the art sequence-to-sequence models for large scale tasks perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process. In this paper, we train Transformer models which can make…

Computation and Language · Computer Science 2020-02-18 Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli

Understanding whether deep neural networks are effectively optimized remains challenging, as training occurs in highly nonconvex landscapes and standard metrics provide limited visibility into layer-wise learning quality. This challenge is…

Machine Learning · Computer Science 2026-05-05 Arian Eamaz, Farhang Yeganegi, Mojtaba Soltanalian

Text-to-SQL translation enables non-expert users to query relational databases using natural language, with applications in education and business intelligence. This study evaluates three lightweight transformer models - T5-Small,…

Computation and Language · Computer Science 2025-08-07 Chirag Seth, Utkarsh Singh

Transformers have reshaped machine learning by utilizing attention mechanisms to capture complex patterns in large datasets, leading to significant improvements in performance. This success has contributed to the belief that "bigger means…

Machine Learning · Computer Science 2025-05-28 Hemanth Saratchandran, Damien Teney, Simon Lucey

There is currently a significant gap between the performance of fine-tuned models and prompting approaches using Large Language Models (LLMs) on the challenging task of text-to-SQL, as evaluated on datasets such as Spider. To improve the…

Computation and Language · Computer Science 2023-11-06 Mohammadreza Pourreza, Davood Rafiei

For most natural language processing tasks, the dominant practice is to finetune large pretrained transformer models (e.g., BERT) using smaller downstream datasets. Despite the success of this approach, it remains unclear to what extent…

Computation and Language · Computer Science 2023-05-29 Kundan Krishna, Saurabh Garg, Jeffrey P. Bigham, Zachary C. Lipton

In this paper, we assess the viability of transformer models in end-to-end InfoSec settings, in which no intermediate feature representations or processing steps occur outside the model. We implement transformer models for two distinct…

Machine Learning · Computer Science 2022-12-07 Ethan M. Rudd, Mohammad Saidur Rahman, Philip Tully

The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks. However, how to leverage model capacity with large or variable depths is still an open challenge. We present a probabilistic framework to…

Computation and Language · Computer Science 2020-10-19 Xian Li, Asa Cooper Stickland, Yuqing Tang, Xiang Kong

Current transformer-based change detection (CD) approaches either employ a pre-trained model trained on large-scale image classification ImageNet dataset or rely on first pre-training on another CD dataset and then fine-tuning on the target…

Computer Vision and Pattern Recognition · Computer Science 2023-04-14 Mubashir Noman, Mustansar Fiaz, Hisham Cholakkal, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

It is notoriously difficult to train Transformers on small datasets; typically, large pre-trained models are instead used as the starting point. We explore the weights of such pre-trained Transformers (particularly for vision) to attempt to…

Computer Vision and Pattern Recognition · Computer Science 2023-05-18 Asher Trockman, J. Zico Kolter