Evolutionary debunking and value alignment

Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 11-2024

This paper examines the bearing of evolutionary debunking arguments—which use the evolutionary origins of values to challenge their epistemic credentials—on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically-motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.

Other working papers

A paradox for tiny probabilities and enormous values – Nick Beckstead (Open Philanthropy Project) and Teruji Thomas (Global Priorities Institute, Oxford University)

We show that every theory of the value of uncertain prospects must have one of three unpalatable properties. Reckless theories recommend risking arbitrarily great gains at arbitrarily long odds for the sake of enormous potential; timid theories recommend passing up arbitrarily great gains to prevent a tiny increase in risk; nontransitive theories deny the principle that, if A is better than B and B is better than C, then A must be better than C.

How to resist the Fading Qualia Argument – Andreas Mogensen (Global Priorities Institute, University of Oxford)

The Fading Qualia Argument is perhaps the strongest argument supporting the view that in order for a system to be conscious, it does not need to be made of anything in particular, so long as its internal parts have the right causal relations to each other and to the system’s inputs and outputs. I show how the argument can be resisted given two key assumptions: that consciousness is associated with vagueness at its boundaries and that conscious neural activity has a particular kind of holistic structure. …

The case for strong longtermism – Hilary Greaves and William MacAskill (Global Priorities Institute, University of Oxford)

A striking fact about the history of civilisation is just how early we are in it. There are 5000 years of recorded history behind us, but how many years are still to come? If we merely last as long as the typical mammalian species…