Evolutionary debunking and value alignment

Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 11-2024

This paper examines the bearing of evolutionary debunking arguments (which use the evolutionary origins of values to challenge their epistemic credentials) on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.
