Will AI Avoid Exploitation?

Adam Bales (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 16-2023, published in Philosophical Studies

A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expect advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive: in exploring the argument, we gain insight into how to model advanced AI systems.

Other working papers

Can an evidentialist be risk-averse? – Hayden Wilkinson (Global Priorities Institute, University of Oxford)

Two key questions of normative decision theory are: 1) whether the probabilities relevant to decision theory are evidential or causal; and 2) whether agents should be risk-neutral, and so maximise the expected value of the outcome, or instead risk-averse (or otherwise sensitive to risk). These questions are typically thought to be independent – that our answer to one bears little on our answer to the other. …

The long-run relationship between per capita incomes and population size – Maya Eden (University of Zurich) and Kevin Kuruc (Population Wellbeing Initiative, University of Texas at Austin)

The relationship between the human population size and per capita incomes has long been debated. Two competing forces feature prominently in these discussions. On the one hand, a larger population means that limited natural resources must be shared among more people. On the other hand, more people means more innovation and faster technological progress, other things equal. We study a model that features both of these channels. A calibration suggests that, in the long run, (marginal) increases in population would…

In defence of fanaticism – Hayden Wilkinson (Australian National University)

Consider a decision between: 1) a certainty of a moderately good outcome, such as one additional life saved; 2) a lottery which probably gives a worse outcome, but has a tiny probability of a far better outcome (perhaps trillions of blissful lives created). Which is morally better? Expected value theory (with a plausible axiology) judges (2) as better, no matter how tiny its probability of success. But this seems fanatical. So we may be tempted to abandon expected value theory…