On two arguments for Fanaticism

Jeffrey Sanford Russell (University of Southern California)

GPI Working Paper No. 17-2021, published in Noûs

Should we make significant sacrifices to ever-so-slightly lower the chance of extremely bad outcomes, or to ever-so-slightly raise the chance of extremely good outcomes? Fanaticism says yes: for every bad outcome, there is a tiny chance of extreme disaster that is even worse, and for every good outcome, there is a tiny chance of an enormous good that is even better. I consider two related recent arguments for Fanaticism: Beckstead and Thomas’s argument from strange dependence on space and time, and Wilkinson’s Indology argument. While both arguments are instructive, neither is persuasive. In fact, the general principles that underwrite the arguments (a separability principle in the first case, and a reflection principle in the second) are inconsistent with Fanaticism. In both cases, though, it is possible to rehabilitate arguments for Fanaticism based on restricted versions of those principles. The situation is unstable: plausible general principles tell against Fanaticism, but restrictions of those same principles (with strengthened auxiliary assumptions) support Fanaticism. All of the consistent views that emerge are very strange.

Other working papers

Evolutionary debunking and value alignment – Michael T. Dale (Hampden-Sydney College) and Bradford Saad (Global Priorities Institute, University of Oxford)

This paper examines the bearing of evolutionary debunking arguments—which use the evolutionary origins of values to challenge their epistemic credentials—on the alignment problem, i.e. the problem of ensuring that highly capable AI systems are properly aligned with values. Since evolutionary debunking arguments are among the best empirically-motivated arguments that recommend changes in values, it is unsurprising that they are relevant to the alignment problem. However, how evolutionary debunking arguments…

Will AI Avoid Exploitation? – Adam Bales (Global Priorities Institute, University of Oxford)

A simple argument suggests that we can fruitfully model advanced AI systems using expected utility theory. According to this argument, an agent will need to act as if maximising expected utility if they’re to avoid exploitation. Insofar as we should expect advanced AI to avoid exploitation, it follows that we should expected advanced AI to act as if maximising expected utility. I spell out this argument more carefully and demonstrate that it fails, but show that the manner of its failure is instructive…

In search of a biological crux for AI consciousness – Bradford Saad (Global Priorities Institute, University of Oxford)

Whether AI systems could be conscious is often thought to turn on whether consciousness is closely linked to biology. The rough thought is that if consciousness is closely linked to biology, then AI consciousness is impossible, and if consciousness is not closely linked to biology, then AI consciousness is possible—or, at any rate, it’s more likely to be possible. A clearer specification of the kind of link between consciousness and biology that is crucial for the possibility of AI consciousness would help organize inquiry into…