Safety without alignment
Artificial Intelligence
2023-03-21 v2
Abstract
Currently, the dominant paradigm in AI safety is alignment with human values. Here we describe progress on developing an alternative approach to safety, based on ethical rationalism (Gewirth:1978), and propose an inherently safe implementation path via hybrid theorem provers in a sandbox. As AGIs evolve, their alignment may fade, but their rationality can only increase (otherwise more rational ones will have a significant evolutionary advantage), so an approach that ties their ethics to their rationality has clear long-term advantages.
Cite
@article{arxiv.2303.00752,
title = {Safety without alignment},
author = {András Kornai and Michael Bukatin and Zsolt Zombori},
journal = {arXiv preprint arXiv:2303.00752},
year = {2023}
}