Related papers: Interactive AI Alignment: Specification, Process, …
AI alignment is about ensuring AI systems only pursue goals and activities that are beneficial to humans. Most of the current approach to AI alignment is to learn what humans value from their behavioural data. This paper proposes a…
Aligning AI systems with human values fundamentally relies on effective human feedback. While significant research has addressed training algorithms, the role of user interface is often overlooked and only treated as an implementation…
AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey,…
Recent advances in general-purpose AI underscore the urgent need to align AI systems with human goals and values. Yet, the lack of a clear, shared understanding of what constitutes "alignment" limits meaningful progress and…
Background: Value alignment in computer science research is often used to refer to the process of aligning artificial intelligence with humans, but the way the phrase is used often lacks precision. Objectives: In this paper, we conduct a…
This paper looks at philosophical questions that arise in the context of AI alignment. It defends three propositions. First, normative and technical aspects of the AI alignment problem are interrelated, creating space for productive…
Modern AI assistants are trained to follow instructions, implicitly assuming that users can clearly articulate their goals and the kind of assistance they need. Decades of behavioral research, however, show that people often engage with AI…
Empirical human-AI alignment aims to make AI systems act in line with observed human behavior. While noble in its goals, we argue that empirical alignment can inadvertently introduce statistical biases that warrant caution. This position…
In real-world collaboration, alignment, process structure, and outcome quality do not exhibit a simple linear or one-to-one correspondence: similar alignment may accompany either rapid convergence or extensive multi-branch exploration, and…
Discussion of AI alignment (alignment between humans and AI systems) has focused on value alignment, broadly referring to creating AI systems that share human values. We argue that before we can even attempt to align values, it is…
Human computer interaction is shifting from screen-based systems to multimodal interfaces where artificial intelligence powered systems increasingly interpret user intent through speech, gesture, and gaze. Yet users rarely understand how…
Design is a non-linear, reflective process in which practitioners engage with visual, semantic, and other expressive materials to explore, iterate, and refine ideas. As Generative AI (GenAI) becomes integrated into professional design…
Large language model-based AI companions are increasingly viewed by users as friends or romantic partners, leading to deep emotional bonds. However, they can generate biased, discriminatory, and harmful outputs. Recently, users are taking…
As AI systems become embedded in everyday practice, value misalignment has emerged as a pressing concern. Yet, dominant alignment approaches remain model centric, treating users as passive recipients of prespecified values rather than as…
The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This…
AI intent alignment, ensuring that AI produces outcomes as intended by users, is a critical challenge in human-AI interaction. The emergence of generative AI, including LLMs, has intensified the significance of this problem, as interactions…
While we do not always use words, communicating what we want to an AI is a conversation -- with ourselves as well as with it, a recurring loop with optional steps depending on the complexity of the situation and our request. Any given…
Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete.…
Much of the research focus on AI alignment seeks to align large language models and other foundation models to the context-less and generic values of helpfulness, harmlessness, and honesty. Frontier model providers also strive to align…
With AI systems becoming more powerful and pervasive, there is increasing debate about keeping their actions aligned with the broader goals and needs of humanity. This multi-disciplinary and multi-stakeholder debate must resolve many…