Related papers: Perplexed by Perplexity: Perplexity-Based Data Pru…

200 papers

Large volumes of text data have contributed significantly to the development of large language models (LLMs) in recent years. This data is typically acquired by scraping the internet, leading to pretraining datasets comprised of noisy web…

Computation and Language · Computer Science 2023-09-12 Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker

Transformer-based language models have shown state-of-the-art performance on a variety of natural language understanding tasks. To achieve this performance, these models are first pre-trained on general corpus and then fine-tuned on…

Computation and Language · Computer Science 2024-07-15 Mohammadreza Tayaranian, Seyyed Hasan Mozafari, Brett H. Meyer, James J. Clark, Warren J. Gross

Large language models (LLMs) containing tens of billions of parameters (or even more) have demonstrated impressive capabilities in various NLP tasks. However, substantial model size poses challenges to training, inference, and deployment so…

Artificial Intelligence · Computer Science 2023-10-11 Yupeng Ji, Yibo Cao, Jiucai Liu

Pre-trained models have become indispensable for efficiently building models across a broad spectrum of downstream tasks. The advantages of pre-trained models have been highlighted by empirical studies on scaling laws, which demonstrate…

Machine Learning · Statistics 2026-02-05 Kazuto Fukuchi, Ryuichiro Hataya, Kota Matsui

Quality pretraining data is often seen as the key to high-performance language models. However, progress in understanding pretraining data has been slow due to the costly pretraining runs required for data selection experiments. We present…

Computation and Language · Computer Science 2025-03-11 Tristan Thrush, Christopher Potts, Tatsunori Hashimoto

For most natural language processing tasks, the dominant practice is to finetune large pretrained transformer models (e.g., BERT) using smaller downstream datasets. Despite the success of this approach, it remains unclear to what extent…

Computation and Language · Computer Science 2023-05-29 Kundan Krishna, Saurabh Garg, Jeffrey P. Bigham, Zachary C. Lipton

We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their…

Machine Learning · Computer Science 2023-10-06 Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil

Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational…

Machine Learning · Computer Science 2023-10-10 Sia Gholami, Marwan Omar

Training data compositions for Large Language Models (LLMs) can significantly affect their downstream performance. However, a thorough data ablation study exploring large sets of candidate data mixtures is typically prohibitively expensive…

Computation and Language · Computer Science 2024-12-10 Clara Na, Ian Magnusson, Ananya Harsh Jha, Tom Sherborne, Emma Strubell, Jesse Dodge, Pradeep Dasigi

Language models can be prompted to perform a wide variety of zero- and few-shot learning problems. However, performance varies significantly with the choice of prompt, and we do not yet understand why this happens or how to pick the best…

Computation and Language · Computer Science 2024-09-16 Hila Gonen, Srini Iyer, Terra Blevins, Noah A. Smith, Luke Zettlemoyer

Large Language Models (LLMs) demonstrate exceptional reasoning abilities, enabling strong generalization across diverse tasks such as commonsense reasoning and instruction following. However, as LLMs scale, inference costs become…

Computation and Language · Computer Science 2025-02-06 Rhea Sanjay Sukthanker, Benedikt Staffler, Frank Hutter, Aaron Klein

We surely enjoy the larger the better models for their superior performance in the last couple of years when both the hardware and software support the birth of such extremely huge models. The applied fields include text mining and others.…

Computation and Language · Computer Science 2024-06-04 Hanjuan Huang, Hao-Jia Song, Hsing-Kuo Pao

Deep networks are typically trained with many more parameters than the size of the training dataset. Recent empirical evidence indicates that the practice of overparameterization not only benefits training large models, but also assists -…

Machine Learning · Computer Science 2020-12-17 Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis

Large language models (LLMs) have revolutionized natural language processing, yet their substantial model sizes often require substantial computational resources. To preserve computing resources and accelerate inference speed, it is crucial…

Computation and Language · Computer Science 2025-06-04 Yirao Zhao, Guizhen Chen, Kenji Kawaguchi, Lidong Bing, Wenxuan Zhang

Transformer-based NLP models are trained using hundreds of millions or even billions of parameters, limiting their applicability in computationally constrained environments. While the number of parameters generally correlates with…

Computation and Language · Computer Science 2022-08-16 Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov

As language models increase in size by the day, methods for efficient inference are critical to leveraging their capabilities for various applications. Prior work has investigated techniques like model pruning, knowledge distillation, and…

Machine Learning · Computer Science 2023-08-25 Yushan Su, Vishvak Murahari, Karthik Narasimhan, Kai Li

While metrics available during pre-training, such as perplexity, correlate well with model performance at scaling-laws studies, their predictive capacities at a fixed model size remain unclear, hindering effective model selection and…

Computation and Language · Computer Science 2025-10-17 Hansi Zeng, Kai Hui, Honglei Zhuang, Zhen Qin, Zhenrui Yue, Hamed Zamani, Dana Alon

As large language models (LLMs) are pretrained on massive web corpora, careful selection of data becomes essential to ensure effective and efficient learning. While perplexity (PPL)-based filtering has shown strong performance, it suffers…

Computation and Language · Computer Science 2026-03-04 Yeongbin Seo, Gayoung Kim, Jaehyung Kim, Jinyoung Yeo

The excellent performance of deep neural networks is usually accompanied by a large number of parameters and computations, which have limited their usage on the resource-limited edge devices. To address this issue, abundant methods such as…

Computer Vision and Pattern Recognition · Computer Science 2023-05-23 Muzhou Yu, Linfeng Zhang, Kaisheng Ma

Fine-tuning large language models (LLMs) typically relies on producing large sets of input-output pairs. Yet for a given question, there can be many valid outputs. In practice, these outputs are often derived by distilling knowledge from…

Computation and Language · Computer Science 2025-08-28 Xuan Ren, Qi Chen, Lingqiao Liu