Related papers: Correcting Large Language Model Behavior via Influ…

Understanding the Learning Dynamics of Alignment with Human Feedback

Aligning large language models (LLMs) with human intentions has become a critical task for safely deploying models in real-world systems. While existing alignment approaches have seen empirical success, theoretically understanding how these…

Machine Learning · Computer Science 2024-08-08 Shawn Im, Yixuan Li

MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time

Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora, making them powerful tools for various applications. To make LLMs more usable, aligning them with human preferences is essential.…

Computation and Language · Computer Science 2024-10-21 Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu

RAIN: Your Language Models Can Align Themselves without Finetuning

Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research typically gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning,…

Computation and Language · Computer Science 2023-10-10 Yuhui Li, Fangyun Wei, Jinjing Zhao, Chao Zhang, Hongyang Zhang

Parameter-Efficient Tuning Helps Language Model Alignment

Aligning large language models (LLMs) with human preferences is essential for safe and useful LLMs. Previous works mainly adopt reinforcement learning (RLHF) and direct preference optimization (DPO) with human feedback for alignment.…

Computation and Language · Computer Science 2023-10-03 Tianci Xue, Ziqi Wang, Heng Ji

Towards Efficient and Effective Alignment of Large Language Models

Large language models (LLMs) exhibit remarkable capabilities across diverse tasks, yet aligning them efficiently and effectively with human expectations remains a critical challenge. This thesis advances LLM alignment by introducing novel…

Computation and Language · Computer Science 2025-06-12 Yuxin Jiang

The Real, the Better: Aligning Large Language Models with Online Human Behaviors

Large language model alignment is widely used and studied to keep LLMs from producing unhelpful and harmful responses. However, the lengthy training process and predefined preference bias hinder adaptation to online diverse human preferences. To…

Computation and Language · Computer Science 2024-05-02 Guanying Jiang, Lingyong Yan, Haibo Shi, Dawei Yin

Combining LLM decision and RL action selection to improve RL policy for adaptive interventions

Reinforcement learning (RL) is increasingly being used in the healthcare domain, particularly for the development of personalized health adaptive interventions. Inspired by the success of Large Language Models (LLMs), we are interested in…

Machine Learning · Computer Science 2025-01-14 Karine Karine, Benjamin M. Marlin

Aligning Large Language Models for Controllable Recommendations

Inspired by the exceptional general intelligence of Large Language Models (LLMs), researchers have begun to explore their application in pioneering the next generation of recommender systems - systems that are conversational, explainable,…

Information Retrieval · Computer Science 2024-08-06 Wensheng Lu, Jianxun Lian, Wei Zhang, Guanghua Li, Mingyang Zhou, Hao Liao, Xing Xie

Stayin' Aligned Over Time: Towards Longitudinal Human-LLM Alignment via Contextual Reflection and Privacy-Preserving Behavioral Data

Current human-AI alignment and evaluation methods for large language models (LLMs) often rely on preference signals collected immediately after an interaction. This practice implicitly treats preference as static, even though many…

Human-Computer Interaction · Computer Science 2026-05-06 Simret Araya Gebreegziabher, Allison E Sproul, Yinuo Yang, Chaoran Chen, Diego Gómez-Zará, Toby Jia-Jun Li

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by…

Computation and Language · Computer Science 2024-07-03 Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

Supervised Fine-Tuning as Inverse Reinforcement Learning

The prevailing approach to aligning Large Language Models (LLMs) typically relies on human or AI feedback and assumes access to specific types of preference datasets. In our work, we question the efficacy of such datasets and explore…

Machine Learning · Computer Science 2024-03-19 Hao Sun

Predicting and Understanding Human Action Decisions: Insights from Large Language Models and Cognitive Instance-Based Learning

Large Language Models (LLMs) have demonstrated their capabilities across various tasks, from language translation to complex reasoning. Understanding and predicting human behavior and biases are crucial for artificial intelligence (AI)…

Artificial Intelligence · Computer Science 2024-08-06 Thuy Ngoc Nguyen, Kasturi Jamale, Cleotilde Gonzalez

Aligning Large Language Models via Fine-grained Supervision

Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations. Current approaches focus on using reinforcement learning with human…

Computation and Language · Computer Science 2024-06-06 Dehong Xu, Liang Qiu, Minseok Kim, Faisal Ladhak, Jaeyoung Do

A Survey on Human Preference Learning for Large Language Models

The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a…

Computation and Language · Computer Science 2024-06-19 Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang

Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and…

Computation and Language · Computer Science 2024-06-18 Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo Wang, Qi Zhang, Liang Ding, Dacheng Tao

In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models

Despite advancements, fine-tuning Large Language Models (LLMs) remains costly due to the extensive parameter count and substantial data requirements for model generalization. Accessibility to computing resources remains a barrier for the…

Machine Learning · Computer Science 2024-10-04 Ayrton San Joaquin, Bin Wang, Zhengyuan Liu, Nicholas Asher, Brian Lim, Philippe Muller, Nancy F. Chen

Aligning LLMs with Individual Preferences via Interaction

As large language models (LLMs) demonstrate increasingly advanced capabilities, aligning their behaviors with human values and preferences becomes crucial for their wide adoption. While previous research focuses on general alignment to…

Computation and Language · Computer Science 2024-12-17 Shujin Wu, May Fung, Cheng Qian, Jeonghwan Kim, Dilek Hakkani-Tur, Heng Ji

Studying Large Language Model Generalization with Influence Functions

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior?…

Machine Learning · Computer Science 2023-08-08 Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

Hybrid Alignment Training for Large Language Models

Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed based on two stages with different objectives: instruction-following alignment and…

Computation and Language · Computer Science 2024-06-24 Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu

Investigating on RLHF methodology

In this article, we investigate the alignment of Large Language Models according to human preferences. We discuss the features of training a Preference Model, which simulates human preferences, and the methods and details we found essential…

Machine Learning · Computer Science 2024-10-03 Alexey Kutalev, Sergei Markoff