Evolutionary debunking and value alignment
Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 11-2024
This paper examines the bearing of evolutionary debunking arguments—which use the evolutionary origins of values to challenge their epistemic credentials—on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically-motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments bear on alignment is a neglected issue. This paper sheds light on that issue by showing how evolutionary debunking arguments: (1) raise foundational challenges to posing the alignment problem, (2) yield normative constraints on solving it, and (3) generate stumbling blocks for implementing solutions. After mapping some general features of this philosophical terrain, we illustrate how evolutionary debunking arguments interact with some of the main technical approaches to alignment. To conclude, we motivate a parliamentary approach to alignment and suggest some ways of developing and testing it.
Other working papers
Existential risks from a Thomist Christian perspective – Stefan Riedener (University of Zurich)
Let’s say with Nick Bostrom that an ‘existential risk’ (or ‘x-risk’) is a risk that ‘threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development’ (2013, 15). There are a number of such risks: nuclear wars, developments in biotechnology or artificial intelligence, climate change, pandemics, supervolcanos, asteroids, and so on (see e.g. Bostrom and Ćirković 2008). …
Quadratic Funding with Incomplete Information – Luis M. V. Freitas (Global Priorities Institute, University of Oxford) and Wilfredo L. Maldonado (University of Sao Paulo)
Quadratic funding is a public good provision mechanism that satisfies desirable theoretical properties, such as efficiency under complete information, and has been gaining popularity in practical applications. We evaluate this mechanism in a setting of incomplete information regarding individual preferences, and show that this result only holds under knife-edge conditions. We also estimate the inefficiency of the mechanism in a variety of settings and show, in particular, that inefficiency increases…
Dynamic public good provision under time preference heterogeneity – Philip Trammell (Global Priorities Institute and Department of Economics, University of Oxford)
I explore the implications of time preference heterogeneity for the private funding of public goods. The assumption that players use a common discount rate is knife-edge: relaxing it yields substantially different equilibria, for two reasons. First, time preference heterogeneity motivates intertemporal polarization, analogous to the polarization seen in a static public good game. In the simplest settings, more patient players spend nothing early in time and less patient players spending nothing later. Second…