Related papers: Correcting Large Language Model Behavior via Influ…

200 papers

Aligning large language models (LLMs) with human intentions has become a critical task for safely deploying models in real-world systems. While existing alignment approaches have seen empirical success, theoretically understanding how these…

Machine Learning · Computer Science 2024-08-08 Shawn Im, Yixuan Li

Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora, making them powerful tools for various applications. To make LLMs more usable, aligning them with human preferences is essential.…

Computation and Language · Computer Science 2024-10-21 Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu

Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research typically gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning,…

Computation and Language · Computer Science 2023-10-10 Yuhui Li, Fangyun Wei, Jinjing Zhao, Chao Zhang, Hongyang Zhang

Aligning large language models (LLMs) with human preferences is essential for safe and useful LLMs. Previous works mainly adopt reinforcement learning (RLHF) and direct preference optimization (DPO) with human feedback for alignment.…

Computation and Language · Computer Science 2023-10-03 Tianci Xue, Ziqi Wang, Heng Ji

Large language models (LLMs) exhibit remarkable capabilities across diverse tasks, yet aligning them efficiently and effectively with human expectations remains a critical challenge. This thesis advances LLM alignment by introducing novel…

Computation and Language · Computer Science 2025-06-12 Yuxin Jiang

Large language model alignment is widely used and studied to prevent LLMs from producing unhelpful and harmful responses. However, the lengthy training process and predefined preference bias hinder adaptation to online diverse human preferences. To…

Computation and Language · Computer Science 2024-05-02 Guanying Jiang, Lingyong Yan, Haibo Shi, Dawei Yin

Reinforcement learning (RL) is increasingly being used in the healthcare domain, particularly for the development of personalized health adaptive interventions. Inspired by the success of Large Language Models (LLMs), we are interested in…

Machine Learning · Computer Science 2025-01-14 Karine Karine, Benjamin M. Marlin

Inspired by the exceptional general intelligence of Large Language Models (LLMs), researchers have begun to explore their application in pioneering the next generation of recommender systems - systems that are conversational, explainable,…

Information Retrieval · Computer Science 2024-08-06 Wensheng Lu, Jianxun Lian, Wei Zhang, Guanghua Li, Mingyang Zhou, Hao Liao, Xing Xie

Current human-AI alignment and evaluation methods for large language models (LLMs) often rely on preference signals collected immediately after an interaction. This practice implicitly treats preference as static, even though many…

Human-Computer Interaction · Computer Science 2026-05-06 Simret Araya Gebreegziabher, Allison E Sproul, Yinuo Yang, Chaoran Chen, Diego Gómez-Zará, Toby Jia-Jun Li

The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by…

Computation and Language · Computer Science 2024-07-03 Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

The prevailing approach to aligning Large Language Models (LLMs) typically relies on human or AI feedback and assumes access to specific types of preference datasets. In our work, we question the efficacy of such datasets and explore…

Machine Learning · Computer Science 2024-03-19 Hao Sun

Large Language Models (LLMs) have demonstrated their capabilities across various tasks, from language translation to complex reasoning. Understanding and predicting human behavior and biases are crucial for artificial intelligence (AI)…

Artificial Intelligence · Computer Science 2024-08-06 Thuy Ngoc Nguyen, Kasturi Jamale, Cleotilde Gonzalez

Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations. Current approaches focus on using reinforcement learning with human…

Computation and Language · Computer Science 2024-06-06 Dehong Xu, Liang Qiu, Minseok Kim, Faisal Ladhak, Jaeyoung Do

The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a…

Computation and Language · Computer Science 2024-06-19 Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang

In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and…

Computation and Language · Computer Science 2024-06-18 Rong Bao, Rui Zheng, Shihan Dou, Xiao Wang, Enyu Zhou, Bo Wang, Qi Zhang, Liang Ding, Dacheng Tao

Despite advancements, fine-tuning Large Language Models (LLMs) remains costly due to the extensive parameter count and substantial data requirements for model generalization. Accessibility to computing resources remains a barrier for the…

Machine Learning · Computer Science 2024-10-04 Ayrton San Joaquin, Bin Wang, Zhengyuan Liu, Nicholas Asher, Brian Lim, Philippe Muller, Nancy F. Chen

As large language models (LLMs) demonstrate increasingly advanced capabilities, aligning their behaviors with human values and preferences becomes crucial for their wide adoption. While previous research focuses on general alignment to…

Computation and Language · Computer Science 2024-12-17 Shujin Wu, May Fung, Cheng Qian, Jeonghwan Kim, Dilek Hakkani-Tur, Heng Ji

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior?…

Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed based on two stages with different objectives: instruction-following alignment and…

Computation and Language · Computer Science 2024-06-24 Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu

In this article, we investigate the alignment of Large Language Models according to human preferences. We discuss the features of training a Preference Model, which simulates human preferences, and the methods and details we found essential…

Machine Learning · Computer Science 2024-10-03 Alexey Kutalev, Sergei Markoff