Evolutionary debunking and value alignment

Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 11-2024

This paper examines the bearing of evolutionary debunking arguments—which use the evolutionary origins of values to challenge their epistemic credentials—on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically-motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.

Other working papers

Simulation expectation – Teruji Thomas (Global Priorities Institute, University of Oxford)

I present a new argument for the claim that I’m much more likely to be a person living in a computer simulation than a person living in the ground-level of reality. I consider whether this argument can be blocked by an externalist view of what my evidence supports, and I urge caution against the easy assumption that actually finding lots of simulations would increase the odds that I myself am in one.

Intergenerational experimentation and catastrophic risk – Fikri Pitsuwan (Center of Economic Research, ETH Zurich)

I study an intergenerational game in which each generation experiments on a risky technology that provides private benefits, but may also cause a temporary catastrophe. I find a folk-theorem-type result on which there is a continuum of equilibria. Compared to the socially optimal level, some equilibria exhibit too much, while others too little, experimentation. The reason is that the payoff externality causes preemptive experimentation, while the informational externality leads to more caution…

Meaning, medicine and merit – Andreas Mogensen (Global Priorities Institute, Oxford University)

Given the inevitability of scarcity, should public institutions ration healthcare resources so as to prioritize those who contribute more to society? Intuitively, we may feel that this would be somehow inegalitarian. I argue that the egalitarian objection to prioritizing treatment on the basis of patients’ usefulness to others is best thought…