
Related papers: Stable Anisotropic Regularization

200 papers

Fine-tuning pre-trained language models (PTLMs), such as BERT and its better variant RoBERTa, has been a common practice for advancing performance in natural language understanding (NLU) tasks. Recent advances in representation learning…

Computation and Language · Computer Science 2021-02-05 Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren

The recent success of distributed word representations has led to an increased interest in analyzing the properties of their spatial distribution. Several studies have suggested that contextualized word embedding models do not isotropically…

Computation and Language · Computer Science 2023-02-23 William Rudman, Nate Gillman, Taylor Rayne, Carsten Eickhoff

The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing. Many state-of-the-art language models are first trained on a large text corpus and then fine-tuned on…

Computation and Language · Computer Science 2023-11-13 Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo

Fine-tuning pre-trained language models such as BERT has become a common practice dominating leaderboards across various NLP tasks. Despite its recent success and wide adoption, this process is unstable when there are only a small number of…

Computation and Language · Computer Science 2021-07-13 Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo

As Large Language Models (LLMs) expand in capability and application scope, their trustworthiness becomes critical. A vital risk is intrinsic deception, wherein models strategically mislead users to achieve their own objectives. Existing…

Machine Learning · Computer Science 2026-03-31 Guoxi Zhang, Jiawei Chen, Tianzhuo Yang, Lang Qin, Juntao Dai, Yaodong Yang, Jingwei Yi

As large language models (LLMs) are increasingly deployed in high-stakes and operational settings, evaluation strategies based solely on aggregate accuracy are often insufficient to characterize system reliability. This study proposes a…

Artificial Intelligence · Computer Science 2026-05-06 Hikmat Karimov, Rahid Zahid Alekberli

Looped Language Models (LoopLMs) enable efficient latent reasoning through depth recurrence, yet exhibit unreliable test-time scaling behavior: performance often peaks at a certain iteration depth and then collapses with further recurrence.…

Machine Learning · Computer Science 2026-05-27 Xiao-Wen Yang, Ziyu Han, Xi-Hua Zhang, Wen-Da Wei, Jie-Jing Shao, Lan-Zhe Guo, Yu-Feng Li

We study generalization in an overparameterized continual linear regression setting, where a model is trained with L2 (isotropic) regularization across a sequence of tasks. We derive a closed-form expression for the expected generalization…

Machine Learning · Computer Science 2026-04-14 Gilad Karpel, Edward Moroshko, Ran Levinstein, Ron Meir, Daniel Soudry, Itay Evron

Text data augmentation is a widely used strategy for mitigating data sparsity in natural language processing (NLP), particularly in low-resource settings where limited samples hinder effective semantic modeling. While augmentation can…

Computation and Language · Computer Science 2025-07-17 Payal Bhattad, Sai Manoj Pudukotai Dinakarrao, Anju Gupta

Large pretrained language models have transformed natural language processing, and their adaptation to protein sequences -- viewed as strings of amino acid characters -- has advanced protein analysis. However, the distinct properties of…

Other Quantitative Biology · Quantitative Biology 2025-10-14 Sheikh Azizul Hakim, Kowshic Roy, M Saifur Rahman

Recent advances in the LLM-as-Extractor paradigm leverage large language models (LLMs) to transfer semantically rich item embeddings into sequential recommendation (SR) backbones. However, LLM-generated embeddings often suffer from strong…

Information Retrieval · Computer Science 2026-05-29 Dongcheol Lee, Hye-young Kim, Jongwuk Lee

We develop a flexible framework for low-rank matrix estimation that allows us to transform noise models into regularization schemes via a simple bootstrap algorithm. Effectively, our procedure seeks an autoencoding basis for the observed…

Methodology · Statistics 2016-06-29 Julie Josse, Stefan Wager

Centralized training is the standard paradigm in deep learning, enabling models to learn from a unified dataset in a single location. In such a setup, isotropic feature distributions naturally arise as a means to support well-structured and…

Machine Learning · Computer Science 2026-02-09 Chiara Lanza, Roberto Pereira, Marco Miozzo, Eduard Angelats, Paolo Dini

Deep reinforcement learning systems often suffer from unstable training dynamics due to non-stationarity, where learning objectives and data distributions evolve over time. We show that under non-stationary targets, isotropic Gaussian…

Machine Learning · Computer Science 2026-03-20 Ali Saheb Pasand, Johan Obando-Ceron, Aaron Courville, Pouya Bashivan, Pablo Samuel Castro

Artificial and biological agents cannot learn given completely random and unstructured data. The structure of data is encoded in the metric relationships between data points. In the context of neural networks, neuronal activity within a…

Machine Learning · Computer Science 2022-11-03 Kosio Beshkov, Jonas Verhellen, Mikkel Elle Lepperød

Consistency regularization is a commonly used practice to encourage the model to generate consistent representations from distorted input features and improve model generalization. It shows significant improvement on various speech…

Computation and Language · Computer Science 2024-11-12 Cindy Tseng, Yun Tang, Vijendra Raj Apsingekar

Recent advances show that large language models (LLMs) generalize strong performance across different natural language benchmarks. However, the large size of LLMs makes training and inference expensive and impractical to run in…

Computation and Language · Computer Science 2024-10-22 Laurence Liang

Pre-trained language models such as BERT have become a more common choice for natural language processing (NLP) tasks. Research in word representation shows that isotropic embeddings can significantly improve performance on downstream tasks.…

Computation and Language · Computer Science 2021-08-30 Yuxin Liang, Rui Cao, Jie Zheng, Jie Ren, Ling Gao

Evaluations of large language models (LLMs) suffer from instability, where small changes in random factors such as few-shot examples can lead to drastic fluctuations in scores and even model rankings. Moreover, different LLMs can have…

Machine Learning · Computer Science 2025-09-17 Yiyang Li, Yonghuang Wu, Ying Luo, Liangtai Sun, Zishu Qin, Lin Qiu, Xuezhi Cao, Xunliang Cai

Previous work has shown that the representations output by contextual language models are more anisotropic than static type embeddings, and typically display outlier dimensions. This seems to be true for both monolingual and multilingual…

Computation and Language · Computer Science 2023-06-08 Katharina Hämmerl, Alina Fastowski, Jindřich Libovický, Alexander Fraser