AI alignment vs AI ethical treatment: Ten challenges

Adam Bradley (Lingnan University) and Bradford Saad (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 19-2024

A morally acceptable course of AI development should avoid two dangers: creating unaligned AI systems that pose a threat to humanity and mistreating AI systems that merit moral consideration in their own right. This paper argues these two dangers interact and that if we create AI systems that merit moral consideration, simultaneously avoiding both of these dangers would be extremely challenging. While our argument is straightforward and supported by a wide range of pretheoretical moral judgments, it has far-reaching moral implications for AI development. Although the most obvious way to avoid the tension between alignment and ethical treatment would be to avoid creating AI systems that merit moral consideration, this option may be unrealistic and is perhaps fleeting. So, we conclude by offering some suggestions for other ways of mitigating mistreatment risks associated with alignment.

Other working papers

How to resist the Fading Qualia Argument – Andreas Mogensen (Global Priorities Institute, University of Oxford)

The Fading Qualia Argument is perhaps the strongest argument supporting the view that in order for a system to be conscious, it does not need to be made of anything in particular, so long as its internal parts have the right causal relations to each other and to the system’s inputs and outputs. I show how the argument can be resisted given two key assumptions: that consciousness is associated with vagueness at its boundaries and that conscious neural activity has a particular kind of holistic structure. …

Consequentialism, Cluelessness, Clumsiness, and Counterfactuals – Alan Hájek (Australian National University)

According to a standard statement of objective consequentialism, a morally right action is one that has the best consequences. More generally, given a choice between two actions, one is morally better than the other just in case the consequences of the former action are better than those of the latter. (These are not just the immediate consequences of the actions, but the long-term consequences, perhaps until the end of history.) This account glides easily off the tongue—so easily that…

Welfare and felt duration – Andreas Mogensen (Global Priorities Institute, University of Oxford)

How should we understand the duration of a pleasant or unpleasant sensation, insofar as its duration modulates how good or bad the experience is overall? Given that we seem able to distinguish between subjective and objective duration and that how well or badly someone’s life goes is naturally thought of as something to be assessed from her own perspective, it seems intuitive that it is subjective duration that modulates how good or bad an experience is from the perspective of an individual’s welfare. …