Safety without alignment

Artificial Intelligence 2023-03-21 v2

Abstract

Currently, the dominant paradigm in AI safety is alignment with human values. Here we describe progress on developing an alternative approach to safety, based on ethical rationalism (Gewirth 1978), and propose an inherently safe implementation path via hybrid theorem provers in a sandbox. As AGIs evolve, their alignment may fade, but their rationality can only increase (otherwise, more rational AGIs would gain a significant evolutionary advantage), so an approach that ties their ethics to their rationality has clear long-term advantages.

Cite

@article{arxiv.2303.00752,
  title  = {Safety without alignment},
  author = {András Kornai and Michael Bukatin and Zsolt Zombori},
  journal = {arXiv preprint arXiv:2303.00752},
  year   = {2023}
}