On two arguments for Fanaticism
Jeffrey Sanford Russell (University of Southern California)
GPI Working Paper No. 17-2021, published in Noûs
Should we make significant sacrifices to ever-so-slightly lower the chance of extremely bad outcomes, or to ever-so-slightly raise the chance of extremely good outcomes? Fanaticism says yes: for every bad outcome, there is a tiny chance of extreme disaster that is even worse, and for every good outcome, there is a tiny chance of an enormous good that is even better. I consider two related recent arguments for Fanaticism: Beckstead and Thomas’s argument from strange dependence on space and time, and Wilkinson’s Indology argument. While both arguments are instructive, neither is persuasive. In fact, the general principles that underwrite the arguments (a separability principle in the first case, and a reflection principle in the second) are inconsistent with Fanaticism. In both cases, though, it is possible to rehabilitate arguments for Fanaticism based on restricted versions of those principles. The situation is unstable: plausible general principles tell against Fanaticism, but restrictions of those same principles (with strengthened auxiliary assumptions) support Fanaticism. All of the consistent views that emerge are very strange.
Other working papers
Intergenerational experimentation and catastrophic risk – Fikri Pitsuwan (Center of Economic Research, ETH Zurich)
I study an intergenerational game in which each generation experiments on a risky technology that provides private benefits, but may also cause a temporary catastrophe. I find a folk-theorem-type result on which there is a continuum of equilibria. Compared to the socially optimal level, some equilibria exhibit too much, while others too little, experimentation. The reason is that the payoff externality causes preemptive experimentation, while the informational externality leads to more caution…
Towards shutdownable agents via stochastic choice – Elliott Thornley (Global Priorities Institute, University of Oxford), Alexander Roman (New College of Florida), Christos Ziakas (Independent), Leyton Ho (Brown University), and Louis Thomson (University of Oxford)
Some worry that advanced artificial agents may resist being shut down. The Incomplete Preferences Proposal (IPP) is an idea for ensuring that doesn’t happen. A key part of the IPP is using a novel ‘Discounted REward for Same-Length Trajectories (DREST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics…
Simulation expectation – Teruji Thomas (Global Priorities Institute, University of Oxford)
I present a new argument for the claim that I’m much more likely to be a person living in a computer simulation than a person living in the ground-level of reality. I consider whether this argument can be blocked by an externalist view of what my evidence supports, and I urge caution against the easy assumption that actually finding lots of simulations would increase the odds that I myself am in one.