Related papers: Adding Alignment Control to Language Models

Steering into New Embedding Spaces: Analyzing Cross-Lingual Alignment Induced by Model Interventions in Multilingual Language Models

Aligned representations across languages are a desired property in multilingual large language models (mLLMs), as alignment can improve performance in cross-lingual tasks. Typically alignment requires fine-tuning a model, which is…

Computation and Language · Computer Science 2025-07-22 Anirudh Sundar, Sinead Williamson, Katherine Metcalf, Barry-John Theobald, Skyler Seto, Masha Fedzechkina

Cross-model Control: Improving Multiple Large Language Models in One-time Training

The number of large language models (LLMs) with varying parameter scales and vocabularies is increasing. While they deliver powerful performance, they also face a set of common optimization needs to meet specific requirements or standards,…

Computation and Language · Computer Science 2024-10-24 Jiayi Wu, Hao Sun, Hengyi Cai, Lixin Su, Shuaiqiang Wang, Dawei Yin, Xiang Li, Ming Gao

Aligning Large Language Models with Representation Editing: A Control Perspective

Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time…

Artificial Intelligence · Computer Science 2024-11-05 Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs

While large language models demonstrate remarkable capabilities at task-specific applications through fine-tuning, extending these benefits across diverse languages is essential for broad accessibility. However, effective cross-lingual…

Computation and Language · Computer Science 2025-06-03 Danni Liu, Jan Niehues

Pref-CTRL: Preference Driven LLM Alignment using Representation Editing

Test-time alignment methods offer a promising alternative to fine-tuning by steering the outputs of large language models (LLMs) at inference time with lightweight interventions on their internal representations. Recently, a prominent and…

Computation and Language · Computer Science 2026-04-28 Imranul Ashrafi, Inigo Jauregi Unanue, Massimo Piccardi

Word Alignment by Fine-tuning Embeddings on Parallel Corpora

Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs. The great…

Computation and Language · Computer Science 2021-08-13 Zi-Yi Dou, Graham Neubig

Aligning Large Language Models for Controllable Recommendations

Inspired by the exceptional general intelligence of Large Language Models (LLMs), researchers have begun to explore their application in pioneering the next generation of recommender systems - systems that are conversational, explainable,…

Information Retrieval · Computer Science 2024-08-06 Wensheng Lu, Jianxun Lian, Wei Zhang, Guanghua Li, Mingyang Zhou, Hao Liao, Xing Xie

Bag of Tricks for In-Distribution Calibration of Pretrained Transformers

While pre-trained language models (PLMs) have become a de-facto standard promoting the accuracy of text classification tasks, recent studies find that PLMs often predict over-confidently. Although various calibration methods have been…

Computation and Language · Computer Science 2023-02-15 Jaeyoung Kim, Dongbin Na, Sungchul Choi, Sungbin Lim

Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach

One of the key technologies for the success of Large Language Models (LLMs) is preference alignment. However, a notable side effect of preference alignment is poor calibration: while the pre-trained models are typically well-calibrated,…

Machine Learning · Computer Science 2025-10-17 Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Qi Long, Weijie J. Su, Li Shen

The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context

Alignment tuning has enabled large language models to excel in reasoning, instruction-following, and minimizing harmful generations. However, despite their widespread deployment, these models exhibit a monolingual bias, raising concerns…

Computation and Language · Computer Science 2025-04-04 Nikhil Verma, Manasa Bharadwaj

Language Surgery in Multilingual Large Language Models

Large Language Models (LLMs) have demonstrated remarkable generalization capabilities across tasks and languages, revolutionizing natural language processing. This paper investigates the naturally emerging representation alignment in LLMs,…

Computation and Language · Computer Science 2025-10-14 Joanito Agili Lopo, Muhammad Ravi Shulthan Habibi, Tack Hwa Wong, Muhammad Ilham Ghozali, Fajri Koto, Genta Indra Winata, Peerat Limkonchotiwat, Alham Fikri Aji, Samuel Cahyawijaya

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to…

Computation and Language · Computer Science 2024-11-01 Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Shanghaoran Quan, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang

Continuous Language Model Interpolation for Dynamic and Controllable Text Generation

As large language models (LLMs) have gained popularity for a variety of use cases, making them adaptable and controllable has become increasingly important, especially for user-facing applications. While the existing literature on LLM…

Computation and Language · Computer Science 2025-09-30 Sara Kangaslahti, David Alvarez-Melis

Align-then-Unlearn: Embedding Alignment for LLM Unlearning

As large language models (LLMs) are trained on massive datasets, they have raised significant privacy and ethical concerns due to their potential to inadvertently retain sensitive information. Unlearning seeks to selectively remove specific…

Computation and Language · Computer Science 2025-06-17 Philipp Spohn, Leander Girrbach, Jessica Bader, Zeynep Akata

Supervised Fine-Tuning as Inverse Reinforcement Learning

The prevailing approach to aligning Large Language Models (LLMs) typically relies on human or AI feedback and assumes access to specific types of preference datasets. In our work, we question the efficacy of such datasets and explore…

Machine Learning · Computer Science 2024-03-19 Hao Sun

Data Selection for LLM Alignment Using Fine-Grained Preferences

Large language models (LLMs) alignment aims to ensure that the behavior of LLMs meets human preferences. While collecting data from multiple fine-grained, aspect-specific preferences becomes more and more feasible, existing alignment…

Machine Learning · Computer Science 2026-03-03 Jia Zhang, Yao Liu, Chen-Xi Zhang, Yi Liu, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li

Control LLM: Controlled Evolution for Intelligence Retention in LLM

Large Language Models (LLMs) demand significant computational resources, making it essential to enhance their capabilities without retraining from scratch. A key challenge in this domain is catastrophic forgetting (CF), which…

Machine Learning · Computer Science 2025-01-31 Haichao Wei, Yunxiang Ren, Zhoutong Fu, Aman Lunia, Yi-Lin Chen, Alice Leung, Ya Xu

ALIGN-MLM: Word Embedding Alignment is Crucial for Multilingual Pre-training

Multilingual pre-trained models exhibit zero-shot cross-lingual transfer, where a model fine-tuned on a source language achieves surprisingly good performance on a target language. While studies have attempted to understand transfer, they…

Computation and Language · Computer Science 2022-11-17 Henry Tang, Ameet Deshpande, Karthik Narasimhan

MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time

Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora, making them powerful tools for various applications. To make LLMs more usable, aligning them with human preferences is essential.…

Computation and Language · Computer Science 2024-10-21 Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu

ChipAlign: Instruction Alignment in Large Language Models for Chip Design via Geodesic Interpolation

Recent advancements in large language models (LLMs) have expanded their application across various domains, including chip design, where domain-adapted chip models like ChipNeMo have emerged. However, these models often struggle with…

Hardware Architecture · Computer Science 2025-07-17 Chenhui Deng, Yunsheng Bai, Haoxing Ren