Will AI Avoid Exploitation?

Adam Bales (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 16-2023, published in Philosophical Studies

A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expect advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive: in exploring the argument, we gain insight into how to model advanced AI systems.

Other working papers

Population ethical intuitions – Lucius Caviola (Harvard University) et al.

Is humanity’s existence worthwhile? If so, where should the human species be headed in the future? In part, the answers to these questions require us to morally evaluate the (potential) human population in terms of its size and aggregate welfare. This assessment lies at the heart of population ethics. Our investigation across nine experiments (N = 5776) aimed to answer three questions about how people aggregate welfare across individuals: (1) Do they weigh happiness and suffering symmetrically…

Can an evidentialist be risk-averse? – Hayden Wilkinson (Global Priorities Institute, University of Oxford)

Two key questions of normative decision theory are: 1) whether the probabilities relevant to decision theory are evidential or causal; and 2) whether agents should be risk-neutral, and so maximise the expected value of the outcome, or instead risk-averse (or otherwise sensitive to risk). These questions are typically thought to be independent – that our answer to one bears little on our answer to the other. …

Estimating long-term treatment effects without long-term outcome data – David Rhys Bernard (Paris School of Economics)

Estimating the long-term impacts of actions is important in many areas, but the key difficulty is that long-term outcomes are only observed with a long delay. One alternative approach is to measure the effect on an intermediate outcome or a statistical surrogate and then use this to estimate the long-term effect. …