Evolutionary debunking and value alignment
Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 11-2024
This paper examines the bearing of evolutionary debunking arguments, which use the evolutionary origins of values to challenge their epistemic credentials, on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.
Other working papers
Is In-kind Kinder than Cash? The Impact of Money vs Food Aid on Social Emotions and Aid Take-up – Samantha Kassirer, Ata Jami, & Maryam Kouchaki (Northwestern University)
There has been widespread endorsement from the academic and philanthropic communities of the new model of giving cash to those in need. Yet the recipient’s perspective has mostly been ignored. The present research explores how food-insecure individuals feel and respond when offered either monetary or food aid from a charity. Our results reveal that individuals are less likely to accept money than food aid from charity because receiving money feels relatively more shameful and relatively less socially positive. Since many…
The unexpected value of the future – Hayden Wilkinson (Global Priorities Institute, University of Oxford)
Various philosophers accept moral views that are impartial, additive, and risk-neutral with respect to betterness. But, if that risk neutrality is spelt out according to expected value theory alone, such views face a dire reductio ad absurdum. If the expected sum of value in humanity’s future is undefined—if, e.g., the probability distribution over possible values of the future resembles the Pasadena game, or a Cauchy distribution—then those views say that no real-world option is ever better than any other. And, as I argue…
AI alignment vs AI ethical treatment: Ten challenges – Adam Bradley (Lingnan University) and Bradford Saad (Global Priorities Institute, University of Oxford)
A morally acceptable course of AI development should avoid two dangers: creating unaligned AI systems that pose a threat to humanity and mistreating AI systems that merit moral consideration in their own right. This paper argues these two dangers interact and that if we create AI systems that merit moral consideration, simultaneously avoiding both of these dangers would be extremely challenging. While our argument is straightforward and supported by a wide range of pretheoretical moral judgments, it has far-reaching…