Related papers: Towards Integrated Alignment

200 papers

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This…

Artificial Intelligence · Computer Science 2024-11-26 Robert West, Roland Aydin

Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete.…

Recent advances in general-purpose AI underscore the urgent need to align AI systems with human goals and values. Yet, the lack of a clear, shared understanding of what constitutes "alignment" limits meaningful progress and…

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey,…

AI alignment research is the field of study dedicated to ensuring that artificial intelligence (AI) benefits humans. As machine intelligence gets more advanced, this research is becoming increasingly important. Researchers in the field…

Computers and Society · Computer Science 2022-06-08 Jan H. Kirchner, Logan Smith, Jacques Thibodeau, Kyle McDonell, Laria Reynolds

The AI-alignment problem arises when there is a discrepancy between the goals that a human designer specifies to an AI learner and a potential catastrophic outcome that does not reflect what the human designer really wants. We argue that a…

Machine Learning · Computer Science 2020-04-10 Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

AI alignment research aims to develop techniques to ensure that AI systems do not cause harm. However, every alignment technique has failure modes, which are conditions in which there is a non-negligible chance that the technique fails to…

Artificial Intelligence · Computer Science 2025-10-14 Leonard Dung, Florian Mai

Biological and artificial information processing systems form representations of the world that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the similarity between the representations formed by…

This paper introduces a novel visual mapping methodology for assessing strategic alignment in national artificial intelligence policies. The proliferation of AI strategies across countries has created an urgent need for analytical…

Computers and Society · Computer Science 2025-07-10 Mohammad Hossein Azin, Hessam Zandhessami

This position paper states that AI Alignment in Multi-Agent Systems (MAS) should be considered a dynamic and interaction-dependent process that heavily depends on the social environment where agents are deployed, either collaborative,…

Artificial Intelligence · Computer Science 2025-06-09 Florian Carichon, Aditi Khandelwal, Marylou Fauchard, Golnoosh Farnadi

Complex networks are frequently employed to model physical or virtual complex systems. When certain entities exist across multiple systems simultaneously, unveiling their corresponding relationships across the networks becomes crucial. This…

Physics and Society · Physics 2025-04-16 Rui Tang, Ziyun Yong, Shuyu Jiang, Xingshu Chen, Yaofang Liu, Yi-Cheng Zhang, Gui-Quan Sun, Wei Wang

Alignment of artificial intelligence (AI) encompasses the normative problem of specifying how AI systems should act and the technical problem of ensuring AI systems comply with those specifications. To date, AI alignment has generally…

A core challenge in the development of increasingly capable AI systems is to make them safe and reliable by ensuring their behaviour is consistent with human values. This challenge, known as the alignment problem, does not merely apply to…

Machine Learning · Computer Science 2023-11-07 Raphaël Millière

The AI alignment problem, which focusses on ensuring that artificial intelligence (AI), including AGI and ASI, systems act according to human values, presents profound challenges. With the progression from narrow AI to Artificial General…

Artificial Intelligence · Computer Science 2025-07-25 Alberto Hernández-Espinosa, Felipe S. Abrahão, Olaf Witkowski, Hector Zenil

Recent advances in AI research make it increasingly plausible that artificial agents with consequential real-world impact will soon operate beyond tightly controlled environments. Ensuring that these agents are not only safe but that they…

Computers and Society · Computer Science 2025-06-10 Kevin Baum

The concepts of "human-centered AI" and "value-based decision" have gained significant attention in both research and industry. However, many critical aspects remain underexplored and require further investigation. In particular, there…

Artificial Intelligence · Computer Science 2025-08-26 Sz-Ting Tzeng, Frank Dignum

As AI systems become increasingly capable and influential, ensuring their alignment with human values, preferences, and goals has become a critical research focus. Current alignment methods primarily focus on designing algorithms and loss…

Computation and Language · Computer Science 2025-05-02 Min-Hsuan Yeh, Jeffrey Wang, Xuefeng Du, Seongheon Park, Leitian Tao, Shawn Im, Yixuan Li

While much research in artificial intelligence (AI) has focused on scaling capabilities, the accelerating pace of development makes countervailing work on producing harmless, "aligned" systems increasingly urgent. Yet research on alignment…

Artificial Intelligence · Computer Science 2025-12-12 Dani Roytburg, Beck Miller

We consider that existing approaches to AI "safety" and "alignment" may not be using the most effective tools, teams, or approaches. We suggest that an alternative and better approach to the problem may be to treat alignment as a social…

Computers and Society · Computer Science 2024-07-16 David Rostcheck, Lara Scheibling

As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a…

Machine Learning · Computer Science 2023-07-27 Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo