Will AI Avoid Exploitation?
Adam Bales (Global Priorities Institute, University of Oxford)
GPI Working Paper No. 16-2023, published in Philosophical Studies
A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expected advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive: in exploring the argument, we gain insight into how to model advanced AI systems.
Other working papers
Moral uncertainty and public justification – Jacob Barrett (Global Priorities Institute, University of Oxford) and Andreas T Schmidt (University of Groningen)
Moral uncertainty and disagreement pervade our lives. Yet we still need to make decisions and act, both in individual and political contexts. So, what should we do? The moral uncertainty approach provides a theory of what individuals morally ought to do when they are uncertain about morality…
Tiny probabilities and the value of the far future – Petra Kosonen (Population Wellbeing Initiative, University of Texas at Austin)
Morally speaking, what matters the most is the far future – at least according to Longtermism. The reason why the far future is of utmost importance is that our acts’ expected influence on the value of the world is mainly determined by their consequences in the far future. The case for Longtermism is straightforward: Given the enormous number of people who might exist in the far future, even a tiny probability of affecting how the far future goes outweighs the importance of our acts’ consequences…
Non-additive axiologies in large worlds – Christian Tarsney and Teruji Thomas (Global Priorities Institute, Oxford University)
Is the overall value of a world just the sum of values contributed by each value-bearing entity in that world? Additively separable axiologies (like total utilitarianism, prioritarianism, and critical level views) say ‘yes’, but non-additive axiologies (like average utilitarianism, rank-discounted utilitarianism, and variable value views) say ‘no’…