Will AI Avoid Exploitation?

Adam Bales (Global Priorities Institute, University of Oxford)

GPI Working Paper No. 16-2023, published in Philosophical Studies

A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expect advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive: in exploring the argument, we gain insight into how to model advanced AI systems.

Other working papers

The long-run relationship between per capita incomes and population size – Maya Eden (University of Zurich) and Kevin Kuruc (Population Wellbeing Initiative, University of Texas at Austin)

The relationship between the human population size and per capita incomes has long been debated. Two competing forces feature prominently in these discussions. On the one hand, a larger population means that limited natural resources must be shared among more people. On the other hand, more people means more innovation and faster technological progress, other things equal. We study a model that features both of these channels. A calibration suggests that, in the long run, (marginal) increases in population would…

It Only Takes One: The Psychology of Unilateral Decisions – Joshua Lewis (New York University) et al.

Sometimes, one decision can guarantee that a risky event will happen. For instance, it only took one team of researchers to synthesize and publish the horsepox genome, thus imposing its publication even though other researchers might have refrained for biosecurity reasons. We examine cases where everybody who can impose a given event has the same goal but different information about whether the event furthers that goal. …

Longtermism in an Infinite World – Christian J. Tarsney (Population Wellbeing Initiative, University of Texas at Austin) and Hayden Wilkinson (Global Priorities Institute, University of Oxford)

The case for longtermism depends on the vast potential scale of the future. But that same vastness also threatens to undermine the case for longtermism: If the future contains infinite value, then many theories of value that support longtermism (e.g., risk-neutral total utilitarianism) seem to imply that no available action is better than any other. And some strategies for avoiding this conclusion (e.g., exponential time discounting) yield views that…