Related papers: Contextual Moral Value Alignment Through Context-B…

200 papers

The development of sophisticated artificial intelligence (AI) conversational agents based on large language models raises important questions about the relationship between human norms, values, and practices and AI design and performance.…

Computers and Society · Computer Science 2025-05-30 Rachel Katharine Sterken, James Ravi Kirkpatrick

Large Language Models (LLMs) have shown impressive moral reasoning abilities. Yet they often diverge when confronted with complex, multi-factor moral dilemmas. To address these discrepancies, we propose a framework that synthesizes multiple…

Computation and Language · Computer Science 2026-02-09 Chenchen Yuan, Zheyu Zhang, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

As AI systems become more advanced, ensuring their alignment with a diverse range of individuals and societal values becomes increasingly critical. But how can we capture fundamental human values and assess the degree to which AI systems…

Human-Computer Interaction · Computer Science 2025-11-05 Hua Shen, Tiffany Knearem, Reshmi Ghosh, Yu-Ju Yang, Nicholas Clark, Tanushree Mitra, Yun Huang

Ensuring that Large Language Models (LLMs) align with the diverse and evolving human values across different regions and cultures remains a critical challenge in AI ethics. Current alignment approaches often yield superficial conformity…

Artificial Intelligence · Computer Science 2025-11-04 Jiahao Wang, Songkai Xue, Jinghui Li, Xiaozhen Wang

Social alignment in AI systems aims to ensure that these models behave according to established societal values. However, unlike humans, who derive consensus on value judgments through social interaction, current language models (LMs) are…

Computation and Language · Computer Science 2023-10-31 Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi

Evaluating the value alignment of large language models (LLMs) has traditionally relied on single-sentence adversarial prompts, which directly probe models with ethically sensitive or controversial questions. However, with the rapid…

Computation and Language · Computer Science 2025-03-31 Yazhou Zhang, Qimeng Liu, Qiuchi Li, Peng Zhang, Jing Qin

Are AI systems truly representing human values, or merely averaging across them? Our study suggests a concerning reality: Large Language Models (LLMs) fail to represent diverse cultural moral frameworks despite their linguistic…

Computation and Language · Computer Science 2025-08-01 Simon Münker

The field of artificial intelligence (AI) alignment aims to investigate whether AI technologies align with human interests and values and function in a safe and ethical manner. AI alignment is particularly relevant for large language models…

Human-Computer Interaction · Computer Science 2023-01-18 Thilo Hagendorff, Sarah Fabi

This paper examines the challenges associated with achieving life-long superalignment in AI systems, particularly large language models (LLMs). Superalignment is a theoretical framework that aspires to ensure that superintelligent AI…

Computers and Society · Computer Science 2024-03-25 Gokul Puthumanaillam, Manav Vora, Pranay Thangeda, Melkior Ornik

The ongoing evolution of AI paradigms has propelled AI research into the agentic AI stage. Consequently, the focus of research has shifted from single agents and simple applications towards multi-agent autonomous decision-making and task…

Artificial Intelligence · Computer Science 2025-08-08 Wei Zeng, Hengshu Zhu, Chuan Qin, Han Wu, Yihang Cheng, Sirui Zhang, Xiaowei Jin, Yinuo Shen, Zhenxing Wang, Feimin Zhong, Hui Xiong

The autonomous decision-making process, which is increasingly applied to computer systems, requires that the choices made by these systems align with human values. In this context, systems must assess how well their decisions reflect human…

Computers and Society · Computer Science 2025-12-19 Eduardo de la Cruz Fernández, Marcelo Karanik, Sascha Ossowski

The adoption of generative AI technologies is swiftly expanding. Services employing both linguistic and multimodal models are evolving, offering users increasingly precise responses. Consequently, human reliance on these technologies is…

Computers and Society · Computer Science 2023-11-17 Jaeyoun You, Bongwon Suh

Improving the alignment of Large Language Models (LLMs) with respect to the cultural values that they encode has become an increasingly important topic. In this work, we study whether we can exploit existing knowledge about cultural values…

Computation and Language · Computer Science 2025-09-09 Rochelle Choenni, Ekaterina Shutova

Increasing interest in ensuring the safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents. This goal differs qualitatively from traditional task-specific AI…

Artificial Intelligence · Computer Science 2025-01-17 Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

As Large Language Models (LLMs) become increasingly sophisticated and ubiquitous in natural language processing (NLP) applications, ensuring their robustness, trustworthiness, and alignment with human values has become a critical challenge.…

Computation and Language · Computer Science 2024-08-09 Wrick Talukdar, Anjanava Biswas

We argue that enabling human-AI dialogue, purposed to support joint reasoning (i.e., 'inquiry'), is important for ensuring that AI decision making is aligned with human values and preferences. In particular, we point to logic-based models…

Artificial Intelligence · Computer Science 2024-05-29 Elfia Bezou-Vrakatseli, Oana Cocarascu, Sanjay Modgil

Minimizing negative impacts of Artificial Intelligent (AI) systems on human societies without human supervision requires them to be able to align with human values. However, most current work only addresses this issue from a technical point…

Computation and Language · Computer Science 2024-08-13 Mehdi Khamassi, Marceau Nahon, Raja Chatila

Aligning large language models (LLMs) with human values is a central challenge for ensuring trustworthy and safe deployment. While existing methods such as Reinforcement Learning from Human Feedback (RLHF) and its variants have improved…

Multiagent Systems · Computer Science 2026-03-13 Yuanhong Wu, Djallel Bouneffouf, D. Frank Hsu

Value alignment is central to the development of safe and socially compatible artificial intelligence. However, how Large Language Models (LLMs) represent and enact human values in real-world decision contexts remains under-explored. We…

Computation and Language · Computer Science 2026-01-14 Jen-tse Huang, Jiantong Qin, Xueli Qiu, Sharon Levy, Michelle R. Kaufman, Mark Dredze

LLM alignment has progressed in single-agent settings through paradigms such as RL with human feedback (RLHF), while recent work explores scalable alternatives such as RL with AI feedback (RLAIF) and dynamic alignment objectives. However,…

Computation and Language · Computer Science 2026-04-10 Panatchakorn Anantaprayoon, Nataliia Babina, Nima Asgharbeygi, Jad Tarifi