Evolutionary debunking and value alignment
Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 11-2024
This paper examines the bearing of evolutionary debunking arguments (which use the evolutionary origins of values to challenge their epistemic credentials) on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.