Related papers: Stable Language Model Pre-training by Reducing Emb…

200 papers

Reinforcement Learning (RL) has significantly improved large language model reasoning, but existing RL fine-tuning methods rely heavily on heuristic techniques such as entropy regularization and reweighting to maintain stability. In…

Computation and Language · Computer Science 2026-05-26 Shiqi Liu, Zeyu He, Guojian Zhan, Letian Tao, Zhilong Zheng, Jiang Wu, Yinuo Wang, Yang Guan, Kehua Sheng, Bo Zhang, Keqiang Li, Jingliang Duan, Shengbo Eben Li

Word embeddings are computed by a class of techniques within natural language processing (NLP) that create continuous vector representations of words in a language from a large text corpus. The stochastic nature of the training process of…

Computation and Language · Computer Science 2020-08-03 Lucas Rettenmeier

This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model by replacing mask language modeling (MLM) with replaced token detection (RTD), a more sample-efficient pre-training task. Our…

Computation and Language · Computer Science 2023-03-27 Pengcheng He, Jianfeng Gao, Weizhu Chen

The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to…

While transferring a pretrained language model, common approaches conventionally attach their task-specific classifiers to the top layer and adapt all the pretrained layers. We investigate whether one could make a task-specific selection on…

Computation and Language · Computer Science 2022-10-20 Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei

Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient method for fine-tuning Large Language Models. It updates the weight matrix as $W=W_0+sBA$, where $W_0$ is the original frozen weight, $s$ is a scaling factor and $A$, $B$ are…

Machine Learning · Computer Science 2026-03-06 Yize Wu, Ke Gao, Ling Li, Yanjun Wu

This paper presents TEVR, a speech recognition model designed to minimize the variation in token entropy w.r.t. the language model. This takes advantage of the fact that if the language model will reliably and accurately predict a token…

Computation and Language · Computer Science 2022-06-28 Hajo Nils Krabbenhöft, Erhardt Barth

Recent works have shown that powerful pre-trained language models (PLM) can be fooled by small perturbations or intentional attacks. To solve this issue, various data augmentation techniques are proposed to improve the robustness of PLMs.…

Computation and Language · Computer Science 2021-09-14 Kun Zhou, Wayne Xin Zhao, Sirui Wang, Fuzheng Zhang, Wei Wu, Ji-Rong Wen

Realignment becomes necessary when a language model (LM) fails to meet expected performance. We propose a flexible realignment framework that supports quantitative control of alignment degree during training and inference. This framework…

Computation and Language · Computer Science 2026-01-13 Wenhong Zhu, Ruobing Xie, Weinan Zhang, Rui Wang

Fine-tuning over large pretrained language models (PLMs) has established many state-of-the-art results. Despite its superior performance, such fine-tuning can be unstable, resulting in significant variance in performance and potential risks…

Computation and Language · Computer Science 2022-10-20 Chenghao Yang, Xuezhe Ma

Training stability is typically regarded as a prerequisite for reliable optimization in large language models. In this work, we analyze how stabilizing training dynamics affects the induced generation distribution. We show that under…

Artificial Intelligence · Computer Science 2026-02-10 Xianzhe Meng, Qiangsheng Zeng, Ling Luo, Qinghan Yang, Jiarui Hao, Wenbo Wu, Qinyu Wang, Rui Yin, Lin Qi, Renzhi Lu

Long-context inference in large language models is bottlenecked by Key-Value (KV) cache loading during the decoding stage, where the sequential nature of generation requires repeatedly transferring the KV cache from off-chip High-Bandwidth…

Machine Learning · Computer Science 2026-03-03 Songtao Liu, Hongwu Peng, Zhiwei Zhang, Zhengyu Chen, Yue Guo

The massive scaling of Large Language Models (LLMs) has made pretraining increasingly cost-prohibitive. While low-rank representation and orthonormal weight matrices could in principle reduce parameter counts and computational overhead,…

Machine Learning · Computer Science 2026-05-28 Kaivan Kamali, Kajetan Schweighofer, Hormoz Shahrzad, Olivier Francon, Babak Hodjat, Risto Miikkulainen

Prolonged reinforcement learning with verifiable rewards (RLVR) has been shown to drive continuous improvements in the reasoning capabilities of large language models, but the training is often prone to instabilities, especially in…

Artificial Intelligence · Computer Science 2026-05-13 Yiming Dong, Kun Fu, Haoyu Li, Xinyuan Zhu, Yurou Liu, Lijing Shao, Jieping Ye, Zheng Wang

A representation learning method is considered stable if it consistently generates similar representations of the given data across multiple runs. Word Embedding Methods (WEMs) are a class of representation learning methods that generate…

Computation and Language · Computer Science 2024-06-13 Angana Borah, Manash Pratim Barman, Amit Awekar

This paper proposes a novel formulation for reinforcement learning (RL) with large language models, explaining why and under what conditions the true sequence-level reward can be optimized via a surrogate token-level objective in policy…

Machine Learning · Computer Science 2025-12-04 Chujie Zheng, Kai Dang, Bowen Yu, Mingze Li, Huiqiang Jiang, Junrong Lin, Yuqiong Liu, Hao Lin, Chencan Wu, Feng Hu, An Yang, Jingren Zhou, Junyang Lin

The robustness of Vision Language Models (VLMs) is commonly assessed through output-level invariance, implicitly assuming that stable predictions reflect stable multimodal processing. In this work, we argue that this assumption is…

Large Language Models (LLMs) have demonstrated impressive performance on multiple-choice question answering (MCQA) benchmarks, yet they remain highly vulnerable to minor input perturbations. In this paper, we introduce and evaluate Token…

Computation and Language · Computer Science 2025-06-12 Jui-Ming Yao, Hao-Yuan Chen, Zi-Xian Tang, Bing-Jia Tan, Sheng-Wei Peng, Bing-Cheng Xie, Shun-Feng Su

Autoregressive language models decode left-to-right with irreversible commitments, limiting revision during multi-step reasoning. We propose VDLM, a modular variable diffusion language model that separates semantic planning from…

Computation and Language · Computer Science 2026-02-19 Shuhui Qu

Gradient-based adversarial training is widely used in improving the robustness of neural networks, while it cannot be easily adapted to natural language processing tasks since the embedding space is discrete. In natural language processing…

Computation and Language · Computer Science 2020-12-07 Linyang Li, Xipeng Qiu