Related papers: Stable Language Model Pre-training by Reducing Emb…

STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens

Reinforcement Learning (RL) has significantly improved large language model reasoning, but existing RL fine-tuning methods rely heavily on heuristic techniques such as entropy regularization and reweighting to maintain stability. In…

Computation and Language · Computer Science 2026-05-26 Shiqi Liu, Zeyu He, Guojian Zhan, Letian Tao, Zhilong Zheng, Jiang Wu, Yinuo Wang, Yang Guan, Kehua Sheng, Bo Zhang, Keqiang Li, Jingliang Duan, Shengbo Eben Li

Word Embeddings: Stability and Semantic Change

Word embeddings are computed by a class of techniques within natural language processing (NLP) that create continuous vector representations of words in a language from a large text corpus. The stochastic nature of the training process of…

Computation and Language · Computer Science 2020-08-03 Lucas Rettenmeier

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model by replacing masked language modeling (MLM) with replaced token detection (RTD), a more sample-efficient pre-training task. Our…

Computation and Language · Computer Science 2023-03-27 Pengcheng He, Jianfeng Gao, Weizhu Chen

Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to…

Computation and Language · Computer Science 2024-01-22 Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen, Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya Rastow, Jia Xu, Ivan Bulyko, Andreas Stolcke

Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning

While transferring a pretrained language model, common approaches conventionally attach their task-specific classifiers to the top layer and adapt all the pretrained layers. We investigate whether one could make a task-specific selection on…

Computation and Language · Computer Science 2022-10-20 Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei

Stable-LoRA: Stabilizing Feature Learning of Low-Rank Adaptation

Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient method for fine-tuning Large Language Models. It updates the weight matrix as $W=W_0+sBA$, where $W_0$ is the original frozen weight, $s$ is a scaling factor and $A$, $B$ are…

Machine Learning · Computer Science 2026-03-06 Yize Wu, Ke Gao, Ling Li, Yanjun Wu

TEVR: Improving Speech Recognition by Token Entropy Variance Reduction

This paper presents TEVR, a speech recognition model designed to minimize the variation in token entropy w.r.t. the language model. This takes advantage of the fact that if the language model will reliably and accurately predict a token…

Computation and Language · Computer Science 2022-06-28 Hajo Nils Krabbenhöft, Erhardt Barth

Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models

Recent works have shown that powerful pre-trained language models (PLMs) can be fooled by small perturbations or intentional attacks. To solve this issue, various data augmentation techniques are proposed to improve the robustness of PLMs.…

Computation and Language · Computer Science 2021-09-14 Kun Zhou, Wayne Xin Zhao, Sirui Wang, Fuzheng Zhang, Wei Wu, Ji-Rong Wen

Flexible Realignment of Language Models

Realignment becomes necessary when a language model (LM) fails to meet expected performance. We propose a flexible realignment framework that supports quantitative control of alignment degree during training and inference. This framework…

Computation and Language · Computer Science 2026-01-13 Wenhong Zhu, Ruobing Xie, Weinan Zhang, Rui Wang

Improving Stability of Fine-Tuning Pretrained Language Models via Component-Wise Gradient Norm Clipping

Fine-tuning over large pretrained language models (PLMs) has established many state-of-the-art results. Despite its superior performance, such fine-tuning can be unstable, resulting in significant variance in performance and potential risks…

Computation and Language · Computer Science 2022-10-20 Chenghao Yang, Xuezhe Ma

Stability as a Liability: Systematic Breakdown of Linguistic Structure in LLMs

Training stability is typically regarded as a prerequisite for reliable optimization in large language models. In this work, we analyze how stabilizing training dynamics affects the induced generation distribution. We show that under…

Artificial Intelligence · Computer Science 2026-02-10 Xianzhe Meng, Qiangsheng Zeng, Ling Luo, Qinghan Yang, Jiarui Hao, Wenbo Wu, Qinyu Wang, Rui Yin, Lin Qi, Renzhi Lu

Multi-Head Low-Rank Attention

Long-context inference in large language models is bottlenecked by Key-Value (KV) cache loading during the decoding stage, where the sequential nature of generation requires repeatedly transferring the KV cache from off-chip High-Bandwidth…

Machine Learning · Computer Science 2026-03-03 Songtao Liu, Hongwu Peng, Zhiwei Zhang, Zhengyu Chen, Yue Guo

Efficient Pre-Training of LLMs through Truncated SVD Layers

The massive scaling of Large Language Models (LLMs) has made pretraining increasingly cost-prohibitive. While low-rank representation and orthonormal weight matrices could in principle reduce parameter counts and computational overhead,…

Machine Learning · Computer Science 2026-05-28 Kaivan Kamali, Kajetan Schweighofer, Hormoz Shahrzad, Olivier Francon, Babak Hodjat, Risto Miikkulainen

Probing RLVR training instability through the lens of objective-level hacking

Prolonged reinforcement learning with verifiable rewards (RLVR) has been shown to drive continuous improvements in the reasoning capabilities of large language models, but the training is often prone to instabilities, especially in…

Artificial Intelligence · Computer Science 2026-05-13 Yiming Dong, Kun Fu, Haoyu Li, Xinyuan Zhu, Yurou Liu, Lijing Shao, Jieping Ye, Zheng Wang

Are Word Embedding Methods Stable and Should We Care About It?

A representation learning method is considered stable if it consistently generates similar representation of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate…

Computation and Language · Computer Science 2024-06-13 Angana Borah, Manash Pratim Barman, Amit Awekar

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

This paper proposes a novel formulation for reinforcement learning (RL) with large language models, explaining why and under what conditions the true sequence-level reward can be optimized via a surrogate token-level objective in policy…

Machine Learning · Computer Science 2025-12-04 Chujie Zheng, Kai Dang, Bowen Yu, Mingze Li, Huiqiang Jiang, Junrong Lin, Yuqiong Liu, Hao Lin, Chencan Wu, Feng Hu, An Yang, Jingren Zhou, Junyang Lin

Same Answer, Different Representations: Hidden instability in VLMs

The robustness of Vision Language Models (VLMs) is commonly assessed through output-level invariance, implicitly assuming that stable predictions reflect stable multimodal processing. In this work, we argue that this assumption is…

Artificial Intelligence · Computer Science 2026-02-09 Farooq Ahmad Wani, Alessandro Suglia, Rohit Saxena, Aryo Pradipta Gema, Wai-Chung Kwan, Fazl Barez, Maria Sofia Bucarelli, Fabrizio Silvestri, Pasquale Minervini

Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models

Large Language Models (LLMs) have demonstrated impressive performance on multiple-choice question answering (MCQA) benchmarks, yet they remain highly vulnerable to minor input perturbations. In this paper, we introduce and evaluate Token…

Computation and Language · Computer Science 2025-06-12 Jui-Ming Yao, Hao-Yuan Chen, Zi-Xian Tang, Bing-Jia Tan, Sheng-Wei Peng, Bing-Cheng Xie, Shun-Feng Su

VDLM: Variable Diffusion LMs via Robust Latent-to-Text Rendering

Autoregressive language models decode left-to-right with irreversible commitments, limiting revision during multi-step reasoning. We propose VDLM, a modular variable diffusion language model that separates semantic planning from…

Computation and Language · Computer Science 2026-02-19 Shuhui Qu

TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding

Gradient-based adversarial training is widely used in improving the robustness of neural networks, while it cannot be easily adapted to natural language processing tasks since the embedding space is discrete. In natural language processing…

Computation and Language · Computer Science 2020-12-07 Linyang Li, Xipeng Qiu