Related papers: Large Margin Neural Language Model

Investigation of Large-Margin Softmax in Neural Language Modeling

To encourage intra-class compactness and inter-class separability among trainable feature vectors, large-margin softmax methods are developed and widely applied in the face recognition community. The introduction of the large-margin concept…

Audio and Speech Processing · Electrical Eng. & Systems 2021-04-22 Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney

Pruning Large Language Models via Accuracy Predictor

Large language models (LLMs) containing tens of billions of parameters (or even more) have demonstrated impressive capabilities in various NLP tasks. However, substantial model size poses challenges to training, inference, and deployment so…

Artificial Intelligence · Computer Science 2023-10-11 Yupeng Ji, Yibo Cao, Jiucai Liu

Exploring the Limits of Language Modeling

In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and…

Computation and Language · Computer Science 2016-02-15 Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu

Larger or Smaller Reward Margins to Select Preferences for Alignment?

Preference learning is critical for aligning large language models (LLMs) with human values, with the quality of preference datasets playing a crucial role in this process. While existing metrics primarily assess data quality based on…

Machine Learning · Computer Science 2025-03-05 Kexin Huang, Junkang Wu, Ziqian Chen, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xiang Wang

Efficient Response Generation Strategy Selection for Fine-Tuning Large Language Models Through Self-Aligned Perplexity

Fine-tuning large language models (LLMs) typically relies on producing large sets of input-output pairs. Yet for a given question, there can be many valid outputs. In practice, these outputs are often derived by distilling knowledge from…

Computation and Language · Computer Science 2025-08-28 Xuan Ren, Qi Chen, Lingqiao Liu

Large Language Model Pruning

We surely enjoy the larger the better models for their superior performance in the last couple of years when both the hardware and software support the birth of such extremely huge models. The applied fields include text mining and others.…

Computation and Language · Computer Science 2024-06-04 Hanjuan Huang, Hao-Jia Song, Hsing-Kuo Pao

Improving the Language Understanding Capabilities of Large Language Models Using Reinforcement Learning

Instruction-fine-tuned large language models (LLMs) under 14B parameters continue to underperform on natural language understanding (NLU) tasks, often trailing smaller models like BERT-base on benchmarks such as GLUE and SuperGLUE.…

Computation and Language · Computer Science 2025-09-29 Bokai Hu, Sai Ashish Somayajula, Xin Pan, Pengtao Xie

Discriminative training of RNNLMs with the average word error criterion

In automatic speech recognition (ASR), recurrent neural language models (RNNLM) are typically used to refine hypotheses in the form of lattices or n-best lists, which are generated by a beam search decoder with a weaker language model. The…

Computation and Language · Computer Science 2018-11-09 Rémi Francis, Tom Ash, Will Williams

Scaling Recurrent Neural Network Language Models

This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set…

Computation and Language · Computer Science 2015-02-03 Will Williams, Niranjani Prasad, David Mrva, Tom Ash, Tony Robinson

Improving Neural Language Modeling via Adversarial Training

Recently, substantial progress has been made in language modeling by using deep neural networks. However, in practice, large scale neural language models have been shown to be prone to overfitting. In this paper, we present a simple yet…

Machine Learning · Computer Science 2019-09-10 Dilin Wang, Chengyue Gong, Qiang Liu

Achieving Peak Performance for Large Language Models: A Systematic Review

In recent years, large language models (LLMs) have achieved remarkable success in natural language processing (NLP). LLMs require an extreme amount of parameters to attain high performance. As models grow into the trillion-parameter range,…

Computation and Language · Computer Science 2024-09-10 Zhyar Rzgar K Rostam, Sándor Szénási, Gábor Kertész

Multi-stage Large Language Model Correction for Speech Recognition

In this paper, we investigate the usage of large language models (LLMs) to improve the performance of competitive speech recognition systems. Different from previous LLM-based ASR error correction methods, we propose a novel multi-stage…

Computation and Language · Computer Science 2024-06-18 Jie Pu, Thai-Son Nguyen, Sebastian Stüker

Large Margin Deep Networks for Classification

We present a formulation of deep learning that aims at producing a large margin classifier. The notion of margin, minimum distance to a decision boundary, has served as the foundation of several theoretically profound and empirically…

Machine Learning · Statistics 2018-12-05 Gamaleldin F. Elsayed, Dilip Krishnan, Hossein Mobahi, Kevin Regan, Samy Bengio

MarginSel: Max-Margin Demonstration Selection for LLMs

Large Language Models (LLMs) excel at few-shot learning via in-context learning (ICL). However, the effectiveness of ICL is often sensitive to the selection and ordering of demonstration examples. To address this, we present MarginSel:…

Machine Learning · Computer Science 2025-06-10 Rajeev Bhatt Ambati, James Lester, Shashank Srivastava, Snigdha Chaturvedi

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization

This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). The way it is done is by divide and conquer: split the model into submodels, sequentially prune them, and reconstruct predictions of…

Computation and Language · Computer Science 2024-10-14 Sungbin Shin, Wonpyo Park, Jaeho Lee, Namhoon Lee

Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models

Automated evaluation of open domain natural language generation (NLG) models remains a challenge and widely used metrics such as BLEU and Perplexity can be misleading in some cases. In our paper, we propose to evaluate natural language…

Computation and Language · Computer Science 2020-02-13 Wangchunshu Zhou, Ke Xu

Large Language Models as Annotators: Enhancing Generalization of NLP Models at Minimal Cost

State-of-the-art supervised NLP models achieve high accuracy but are also susceptible to failures on inputs from low-data regimes, such as domains that are not represented in training data. As an approximation to collecting ground-truth…

Computation and Language · Computer Science 2023-06-29 Parikshit Bansal, Amit Sharma

Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models

Large pre-trained language models (PLMs) have demonstrated strong performance on natural language understanding (NLU) tasks through fine-tuning. However, fine-tuned models still suffer from overconfident predictions, especially in…

Computation and Language · Computer Science 2023-05-31 Guande He, Jianfei Chen, Jun Zhu

Optimal Embedding Learning Rate in LLMs: The Effect of Vocabulary Size

Pretraining large language models is a costly process. To make this process more efficient, several methods have been proposed to optimize model architecture/parametrization and hardware use. On the parametrization side, $\mu P$ (Maximal…

Machine Learning · Computer Science 2025-06-19 Soufiane Hayou, Liyuan Liu

On the N-gram Approximation of Pre-trained Language Models

Large pre-trained language models (PLMs) have shown remarkable performance across various natural language understanding (NLU) tasks, particularly in low-resource settings. Nevertheless, their potential in Automatic Speech Recognition (ASR)…

Computation and Language · Computer Science 2023-06-13 Aravind Krishnan, Jesujoba Alabi, Dietrich Klakow