AI alignment vs AI ethical treatment: Ten challenges
Adam Bradley (Lingnan University) and Bradford Saad (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 19-2024
A morally acceptable course of AI development should avoid two dangers: creating unaligned AI systems that pose a threat to humanity and mistreating AI systems that merit moral consideration in their own right. This paper argues that these two dangers interact and that if we create AI systems that merit moral consideration, simultaneously avoiding both of these dangers would be extremely challenging. While our argument is straightforward and supported by a wide range of pretheoretical moral judgments, it has far-reaching moral implications for AI development. Although the most obvious way to avoid the tension between alignment and ethical treatment would be to avoid creating AI systems that merit moral consideration, this option may be unrealistic and is perhaps fleeting. So, we conclude by offering some suggestions for other ways of mitigating mistreatment risks associated with alignment.