AI alignment vs AI ethical treatment: Ten challenges

Adam Bradley (Lingnan University) and Bradford Saad (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 19-2024

A morally acceptable course of AI development should avoid two dangers: creating unaligned AI systems that pose a threat to humanity, and mistreating AI systems that merit moral consideration in their own right. This paper argues that these two dangers interact and that, if we create AI systems that merit moral consideration, simultaneously avoiding both of them would be extremely challenging. While our argument is straightforward and supported by a wide range of pretheoretical moral judgments, it has far-reaching moral implications for AI development. The most obvious way to avoid the tension between alignment and ethical treatment would be to avoid creating AI systems that merit moral consideration, but this option may be unrealistic and is perhaps fleeting. So, we conclude by offering some suggestions for other ways of mitigating the mistreatment risks associated with alignment.