Related papers: Contextual Moral Value Alignment Through Context-B…

Conversational Alignment with Artificial Intelligence in Context

The development of sophisticated artificial intelligence (AI) conversational agents based on large language models raises important questions about the relationship between human norms, values, and practices and AI design and performance.…

Computers and Society · Computer Science · 2025-05-30 · Rachel Katharine Sterken, James Ravi Kirkpatrick

Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models

Large Language Models (LLMs) have shown impressive moral reasoning abilities. Yet they often diverge when confronted with complex, multi-factor moral dilemmas. To address these discrepancies, we propose a framework that synthesizes multiple…

Computation and Language · Computer Science · 2026-02-09 · Chenchen Yuan, Zheyu Zhang, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs

As AI systems become more advanced, ensuring their alignment with a diverse range of individuals and societal values becomes increasingly critical. But how can we capture fundamental human values and assess the degree to which AI systems…

Human-Computer Interaction · Computer Science · 2025-11-05 · Hua Shen, Tiffany Knearem, Reshmi Ghosh, Yu-Ju Yang, Nicholas Clark, Tanushree Mitra, Yun Huang

Diverse Human Value Alignment for Large Language Models via Ethical Reasoning

Ensuring that Large Language Models (LLMs) align with the diverse and evolving human values across different regions and cultures remains a critical challenge in AI ethics. Current alignment approaches often yield superficial conformity…

Artificial Intelligence · Computer Science · 2025-11-04 · Jiahao Wang, Songkai Xue, Jinghui Li, Xiaozhen Wang

Training Socially Aligned Language Models on Simulated Social Interactions

Social alignment in AI systems aims to ensure that these models behave according to established societal values. However, unlike humans, who derive consensus on value judgments through social interaction, current language models (LMs) are…

Computation and Language · Computer Science · 2023-10-31 · Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi

Beyond Single-Sentence Prompts: Upgrading Value Alignment Benchmarks with Dialogues and Stories

Evaluating the value alignment of large language models (LLMs) has traditionally relied on single-sentence adversarial prompts, which directly probe models with ethically sensitive or controversial questions. However, with the rapid…

Computation and Language · Computer Science · 2025-03-31 · Yazhou Zhang, Qimeng Liu, Qiuchi Li, Peng Zhang, Jing Qin

Cultural Bias in Large Language Models: Evaluating AI Agents through Moral Questionnaires

Are AI systems truly representing human values, or merely averaging across them? Our study suggests a concerning reality: Large Language Models (LLMs) fail to represent diverse cultural moral frameworks despite their linguistic…

Computation and Language · Computer Science · 2025-08-01 · Simon Münker

Methodological reflections for AI alignment research using human feedback

The field of artificial intelligence (AI) alignment aims to investigate whether AI technologies align with human interests and values and function in a safe and ethical manner. AI alignment is particularly relevant for large language models…

Human-Computer Interaction · Computer Science · 2023-01-18 · Thilo Hagendorff, Sarah Fabi

A Moral Imperative: The Need for Continual Superalignment of Large Language Models

This paper examines the challenges associated with achieving life-long superalignment in AI systems, particularly large language models (LLMs). Superalignment is a theoretical framework that aspires to ensure that superintelligent AI…

Computers and Society · Computer Science · 2024-03-25 · Gokul Puthumanaillam, Manav Vora, Pranay Thangeda, Melkior Ornik

Multi-level Value Alignment in Agentic AI Systems: Survey and Perspectives

The ongoing evolution of AI paradigms has propelled AI research into the agentic AI stage. Consequently, the focus of research has shifted from single agents and simple applications towards multi-agent autonomous decision-making and task…

Artificial Intelligence · Computer Science · 2025-08-08 · Wei Zeng, Hengshu Zhu, Chuan Qin, Han Wu, Yihang Cheng, Sirui Zhang, Xiaowei Jin, Yinuo Shen, Zhenxing Wang, Feimin Zhong, Hui Xiong

Value Lens: Using Large Language Models to Understand Human Values

The autonomous decision-making process, which is increasingly applied to computer systems, requires that the choices made by these systems align with human values. In this context, systems must assess how well their decisions reflect human…

Computers and Society · Computer Science · 2025-12-19 · Eduardo de la Cruz Fernández, Marcelo Karanik, Sascha Ossowski

Evaluating and Improving Value Judgments in AI: A Scenario-Based Study on Large Language Models' Depiction of Social Conventions

The adoption of generative AI technologies is swiftly expanding. Services employing both linguistic and multimodal models are evolving, offering users increasingly precise responses. Consequently, human reliance on these technologies is…

Computers and Society · Computer Science · 2023-11-17 · Jaeyoun You, Bongwon Suh

Self-Alignment: Improving Alignment of Cultural Values in LLMs via In-Context Learning

Improving the alignment of Large Language Models (LLMs) with respect to the cultural values that they encode has become an increasingly important topic. In this work, we study whether we can exploit existing knowledge about cultural values…

Computation and Language · Computer Science · 2025-09-09 · Rochelle Choenni, Ekaterina Shutova

Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto

Increasing interest in ensuring the safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents. This goal differs qualitatively from traditional task-specific AI…

Artificial Intelligence · Computer Science · 2025-01-17 · Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

Improving Large Language Model (LLM) fidelity through context-aware grounding: A systematic approach to reliability and veracity

As Large Language Models (LLMs) become increasingly sophisticated and ubiquitous in natural language processing (NLP) applications, ensuring their robustness, trustworthiness, and alignment with human values has become a critical challenge.…

Computation and Language · Computer Science · 2024-08-09 · Wrick Talukdar, Anjanava Biswas

Towards Dialogues for Joint Human-AI Reasoning and Value Alignment

We argue that enabling human-AI dialogue, purposed to support joint reasoning (i.e., 'inquiry'), is important for ensuring that AI decision making is aligned with human values and preferences. In particular, we point to logic-based models…

Artificial Intelligence · Computer Science · 2024-05-29 · Elfia Bezou-Vrakatseli, Oana Cocarascu, Sanjay Modgil

Strong and weak alignment of large language models with human values

Minimizing negative impacts of Artificial Intelligence (AI) systems on human societies without human supervision requires them to be able to align with human values. However, most current work only addresses this issue from a technical point…

Computation and Language · Computer Science · 2024-08-13 · Mehdi Khamassi, Marceau Nahon, Raja Chatila

Enhancing Value Alignment of LLMs with Multi-agent system and Combinatorial Fusion

Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved…

Multiagent Systems · Computer Science · 2026-03-13 · Yuanhong Wu, Djallel Bouneffouf, D. Frank Hsu

Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs

Value alignment is central to the development of safe and socially compatible artificial intelligence. However, how Large Language Models (LLMs) represent and enact human values in real-world decision contexts remains under-explored. We…

Computation and Language · Computer Science · 2026-01-14 · Jen-tse Huang, Jiantong Qin, Xueli Qiu, Sharon Levy, Michelle R. Kaufman, Mark Dredze

Learning to Negotiate: Multi-Agent Deliberation for Collective Value Alignment in LLMs

LLM alignment has progressed in single-agent settings through paradigms such as RL with human feedback (RLHF), while recent work explores scalable alternatives such as RL with AI feedback (RLAIF) and dynamic alignment objectives. However,…

Computation and Language · Computer Science · 2026-04-10 · Panatchakorn Anantaprayoon, Nataliia Babina, Nima Asgharbeygi, Jad Tarifi