Christian Tarsney | Non-additive axiologies in large worlds
Global Priorities Seminar
CHRISTIAN TARSNEY: (00:09) All right. So as Andreas said, the paper is called Non-Additive Axiologies in Large Worlds, and it's joint work with Teru Thomas, also at GPI. And, as I think Andreas has mentioned, at GPI the central focus of our research over the last couple of years has been on this idea that people have come to call longtermism, which is, roughly, the idea that what we ought to do in most choice situations, or most of the most important choice situations, is primarily determined by the potential effects that our actions have on the far future, hundreds or thousands or even millions of years from now.
(00:47) And the central argument for longtermism, if you step back and squint, is what you might call an argument from astronomical scale, where those arguments schematically have the following form: because there are far more value-bearing entities, for instance welfare subjects, who are affected by A than who are affected by B, we can make a much bigger difference to the overall value of the world by focusing on A rather than focusing on B. So in the case of longtermism, A is something like the long-term trajectory of humanity, or maybe more narrowly the survival of human-originating civilization, and B is various things that we can do to make the world a better place in the short term, for instance improving the welfare of farmed animals, say. So to get a sense of the scales involved: Nick Bostrom, in his 2013 paper on existential risks, estimates that if we managed to settle a significant portion of the accessible universe, our civilization could eventually support more than 10^32 human lives, or actually something like 10^52 if we use simulations rather than biological human beings. And with numbers that large, the thought is, even very tiny changes either to the welfare of all those future people or to the probability that they exist in the first place could be vastly more important than anything that we can do in the near term.
(02:23) So these kinds of arguments are straightforward, if we accept what we'll call an additively separable axiology (and we'll say precisely what that means in a little bit), but roughly for now, an axiology is additively separable if it treats the value of a world as the sum of values that are contributed by each value-bearing entity in that world. And if you accept an axiology like that, where you can think of the value of the world as such a sum, then it's natural that things that affect a much larger number of individuals or welfare subjects are more important, because the impact that you have on the world just scales linearly, all else being equal, with the number of value-bearing entities or welfare subjects or whatever that you can affect.
(03:09) But if you accept an axiology that isn't additively separable, then things can be a lot less straightforward. So to start with a simple stylized example, suppose that there are 10^10, or 10 billion, existing people, all of them have welfare 1, and we have three options. We can either leave things unchanged, or we can improve all of the existing people from welfare 1 to welfare 2, or we can create 10^32 new people with welfare 1.5 while leaving the existing people all at welfare 1. Now the distinction here isn't necessarily that additive axiologies will tell you to choose O3 and non-additive axiologies will tell you to choose O2. So for instance, if you accepted a critical-level view (which we'll introduce in a little bit), that's additively separable, but it would say that O3 is actually really, really bad. But the crucial difference here, and the reason, or at least one reason, why additive separability marks this important dividing line in axiology, is that an additive axiology, like total utilitarianism, for instance, will say that if O3 is an improvement over the status quo, O1, then it's a really big improvement, or almost certainly a really big improvement; it makes a much bigger difference to the world, for better or worse, than O2 does.
(04:32) Whereas non-additive axiologies, like average utilitarianism, for instance, might agree that O3 is an improvement over the status quo (in this case, choosing O3 raises the average welfare of everybody in the universe from roughly 1 to roughly 1.5), but see it as a smaller improvement than O2, because if we did O2, we could improve the average welfare to 2. So the point is that a non-additive axiology, like average utilitarianism in this case, needn't accept the claim that actions that affect a larger number of welfare subjects have a bigger impact on the overall value of the world, for better or worse.
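Just to make the arithmetic in that stylized example explicit, here is a minimal sketch (my own illustration, representing each option by its groups of people as (welfare, count) pairs):

```python
existing = (1.0, 10**10)          # 10^10 existing people at welfare 1

def total(groups):
    # total welfare: sum of welfare times count over all groups
    return sum(w * n for (w, n) in groups)

def average(groups):
    # average welfare: total welfare divided by number of people
    return total(groups) / sum(n for (_, n) in groups)

O1 = [existing]                   # leave things unchanged
O2 = [(2.0, 10**10)]              # raise all existing people to welfare 2
O3 = [existing, (1.5, 10**32)]    # add 10^32 new people at welfare 1.5

# Total utilitarianism: O3 (about 1.5e32) dwarfs O2 (2e10).
print(total(O3) > total(O2))      # True

# Average utilitarianism: O2 (average 2) beats O3 (average just under 1.5).
print(average(O2) > average(O3))  # True
```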
(05:16) Okay. So the central idea in our paper is that under certain circumstances, non-additive axiologies come to agree pretty closely with some counterpart additive axiology, and therefore, among other things, that these arguments from astronomical scale that go through on additive axiologies can be made to go through on non-additive axiologies as well, under those circumstances. So the circumstances we have in mind are the existence of a large, what we'll call, 'background population' of welfare subjects who are unaffected by our choices. So we're imagining a situation where there is some part of the world that we can affect, and that's effectively a choice between two possible subpopulations X and Y, but then there's some pre-existing population Z that we can't affect. And so our choice isn't just between X and Y; it's between the combination of X and Z versus the combination of Y and Z. So to give just an illustration of what we have in mind when we talk about a background population, what the Z is: it can include things like, for instance, past human beings, or past non-human welfare subjects, distant aliens in faraway galaxies, or also just present and future welfare subjects who are unaffected by our present choice. So there's some part of the world that we affect, but what we're interested in is the value of the world as a whole, and the world as a whole contains all of these individuals that we might not affect in a given choice situation.
(06:50) So just to give an initial flavor of what these convergence results are going to look like (and we'll generalize quite a bit in the next couple of sections): if you have a very large background population with average welfare of zero, say 10^100 individuals who all have welfare zero, and then you're choosing between two populations X and Y that are much smaller than that, it's going to turn out that average utilitarianism is going to tell you to choose the subpopulation, either X or Y, that has greater total welfare, because that will do the most to increase the average welfare of the population as a whole, given that you have this very large pre-existing background population. So we're going to present some results like this. And then we're going to argue that the real-world background population, the population that we should treat as background for purposes of our actual choices, is large enough that these limit results can at least be practically significant. That may have a number of practical implications, but the one we'll focus on is that the case for longtermism, and maybe in particular for things like existential risk mitigation based on the astronomical scale of the far future, can succeed even if you reject additive separability as an assumption about axiology.
(08:18) Okay. So just to give a quick roadmap of where we're going: we'll do some formal setup, introduce some concepts and notation, then I'll present the results and try to move relatively quickly through that. We'll see there are two categories of non-additive axiologies that we want to cover. One is average utilitarianism and these variable value views that are asymptotically averageist in large-population contexts. And then we want to talk about egalitarianism, or egalitarian axiologies. Then I'll talk about the size of the real-world background population and basically argue it's large enough that it looks at least pretty plausible that these limit results are practically significant. I'll consider two objections, which are basically objections to my arguments about the size of the background population. And then we'll look at some practical and theoretical implications, including, but not limited to, this point about arguments from astronomical scale.
(09:16) Okay. So setup first. A population, or world as I think I'll sometimes say, is just a bunch of individuals who each have some level of welfare. And we'll assume, to simplify things for purposes of the talk, that welfare levels can be represented as real numbers. So we have a set W, which is a subset of the reals, representing the possible levels of welfare that individuals can have. In the paper we make this a little bit more abstract, but it's a useful simplification to just assume real-valued welfare levels. A population, then, is some non-zero, finitely supported function from the set W of possible welfare levels to the non-negative integers, which just tells you, in that population, how many individuals there are at each welfare level. So in population X, if X(2) = 10, that means that there are 10 individuals in population X who have welfare 2. It'll be important for our purposes to be able to talk about the combination of two populations, particularly combining some foreground population that you have the ability to bring about with a background population. So we can define that as follows: X + Y is the combination of populations X and Y, and for every welfare level w, the number of people in X + Y at welfare w is equal to the sum of the number of people at w in X and in Y. A couple of other bits of notation that will be useful: the size of a population, which we denote |X|, is defined in the intuitive way as the sum, over all welfare levels, of the number of people at that welfare level. The total welfare, which we denote Tot(X), is the sum, over all welfare levels, of the number of people at that welfare level times the real number that represents that welfare level. And then average welfare, which we denote with an overline, X̄, is total welfare divided by population size.
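Here is a minimal sketch, assuming real-valued welfare levels as above, of how one might represent these populations and the operations just defined (this is just my illustration, not code from the paper):

```python
from collections import Counter

# A population: a finitely supported map from welfare levels to counts of individuals.
X = Counter({2.0: 10, 1.0: 5})     # 10 people at welfare 2, 5 people at welfare 1
Z = Counter({0.0: 1_000_000})      # a background population: a million people at welfare 0

def combine(X, Y):
    """X + Y: add up the counts at each welfare level."""
    return X + Y

def size(X):
    return sum(X.values())

def total(X):
    return sum(n * w for w, n in X.items())

def average(X):
    return total(X) / size(X)

print(size(X), total(X), average(X))   # 15, 25.0, about 1.67
print(average(combine(X, Z)))          # close to 0: dominated by the background population
```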
(11:24) Okay. So that's some formalism for talking about populations. We also want to talk about axiologies, which are evaluative theories of how good one population is relative to another. So we'll understand an axiology A as a strict partial order, in other words a transitive and irreflexive relation, on the set of all possible populations, where X ≻_A Y means that population X is better than population Y according to the axiology A. And all the axiologies that we consider in this paper can be represented by value functions, where for a value function V_A to represent the axiology A means that X is better than Y according to A if and only if the real number that V_A assigns to X is greater than the one it assigns to Y. Okay. So as I said, the crucial dividing line in axiology that we're interested in is between axiologies that are additively separable and ones that aren't. But actually, there are two closely related concepts here, and we're sometimes going to want to focus on one, sometimes the other. So we'll start with the ordinal separability notion, and then we'll define the stronger property of additive separability.
So an axiology A is separable if and only if, for any populations X, Y, and Z, X + Z is better than Y + Z according to that axiology if and only if X is better than Y according to that axiology. So the crucial implication of separability in this sense is that if you're given a choice between populations X and Y in the presence of a background population Z, a separable axiology allows you to ignore the background population. You don't need to know anything about it other than that it's going to be the same whatever you choose. Whereas with a non-separable axiology, the features of the background population can matter: whether you ought to choose X + Z or Y + Z can depend on the features of Z. Okay. So as I say, there's the stronger property that we're going to be somewhat more focused on, which is additive separability. Say that axiology A is additively separable, or just additive for short, if and only if there is some function f that maps the possible welfare levels to the real numbers, such that A can be represented by a value function of the following form, where we just sum, over all the welfare levels, the number of individuals at that welfare level in population X, X(w), times f(w), where remember the welfare levels are real numbers and f is just some function that maps, in this case, real numbers to real numbers. So that means we can treat the overall value of a population as a sum, but not necessarily just the sum of welfare levels; we apply some transformation to the welfare levels.
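Written out, the two conditions on the slide come to this (my reconstruction of the notation from the definitions just given):

```latex
\text{Separability: } \quad X + Z \succ_A Y + Z \iff X \succ_A Y \quad \text{for all populations } X, Y, Z.

\text{Additive separability: there is some } f : W \to \mathbb{R} \text{ such that } \quad
V_A(X) = \sum_{w \in W} X(w)\, f(w).
```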
(14:38) So separability and additive separability, or additivity, are pretty closely related. And all the axiologies we're going to consider here satisfy one if and only if they satisfy the other. But they are distinct properties. Additivity is a stronger requirement, so it's worth bearing in mind that there is a gap there. But most axiologies that have been explored in the literature do satisfy one if and only if they satisfy the other. Okay. So to illustrate what an additive axiology looks like, and also to set up some of the additive axiologies to which we're going to see various non-additive axiologies converging in the presence of large background populations, I will introduce three kinds of additive axiology in increasing order of generality. So the simplest, which we've already mentioned, is Total Utilitarianism, which just says that the value of population X is the total welfare in X, which we can also express as average welfare times population size [V_TU(X) = Tot(X) = X̄ |X|].
(15:44) Generalizing a bit from that, we have Critical-Level Utilitarianism, which says, basically, that there is some critical level c, usually but not necessarily taken to be positive, i.e. greater than zero, such that adding an additional person to the population is only a good thing if they are above welfare level c. So in other words, for every person we add to the population, rather than just adding their welfare to the overall value of the world, we add their welfare but then subtract this critical-level threshold c as a penalty for each additional person. And then finally, the next broader category. It's worth noting that it's an idiosyncrasy of the way we define things that Critical-Level Utilitarianism comes out as a special case of Prioritarianism. But Prioritarianism, in the sense that we understand it, will be any axiology that can be expressed by the following value function, which, you'll notice, looks exactly like our definition of additive separability in general, except that we add the stipulation that this function f, which tells us how good it is to have an individual at a particular welfare level, is strictly increasing, so more welfare is always better, and weakly concave, so if someone's better off, then the marginal value of an additional improvement to their welfare decreases, or at least doesn't increase. Okay. So finally, remember, the claim that we're going to establish in a number of cases is that some non-additive axiology, in the presence of a large enough background population, is going to tell you to add subpopulation X rather than subpopulation Y if and only if X is better than Y according to some counterpart additive axiology. And that means we can decide what to do, even according to our non-additive axiology, by applying an additive axiology. So we want to define that carefully, so you know exactly what these results are saying.
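For reference, here are those three value functions written out, reconstructing the symbols from the descriptions just given:

```latex
V_{TU}(X) = \mathrm{Tot}(X) = \sum_{w \in W} X(w)\,w = \bar{X}\,|X|
\qquad
V_{CL}(X) = \sum_{w \in W} X(w)\,(w - c)
\qquad
V_{PR}(X) = \sum_{w \in W} X(w)\,f(w), \ \ f \text{ strictly increasing and weakly concave.}
```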
(17:58) So say that axiology A converges to another axiology A′, relative to background populations of type T, if and only if, for any populations X and Y, if Z is a sufficiently large population of type T, then if X + Z is better than Y + Z according to the axiology A′ that we're converging to, that means that X + Z is better than Y + Z according to axiology A. Now note that we have to include this restriction to background populations of type T, and we'll see what that means going forward. But for instance, we may need to, as we consider larger and larger background populations, hold fixed the average welfare level in the background population, or the distribution of welfare levels in the background population, to get a well-defined limit.
(18:53) Now, note that if this axiology A′ that we're converging to is additive, which in all the cases we're interested in it will be, the thing we say here is equivalent to the simpler claim that if X is better than Y according to the additive axiology A′ that we're converging to, then X + Z is better than Y + Z according to axiology A. So that means that if we want to figure out what to do according to axiology A, we can just look at the two kinds of foreground populations, X and Y, and evaluate them according to A′. Okay. Yeah. Adding the caveat that we have to know that Z is a large enough background population and we have to know that it's a background population of the right type.
(19:44) Okay. So let's now present some results, starting with Average Utilitarianism, the simplest case. Average Utilitarianism says that the value of a population is just the average welfare level in that population. And, informally, the limit result that we find here is that as the size of the background population goes to infinity, average utilitarianism converges with, or asymptotically agrees with, a critical-level theory where the critical level is the average welfare level in the background population. More specifically, we can in fact say, in the case of average utilitarianism, that average utilitarianism is going to agree with critical-level utilitarianism whenever the size of the background population exceeds the threshold here. So, interestingly, in this case we can say not just that this happens in the limit; we can say when we've reached that limit case with respect to a particular choice situation. So note that this threshold depends on properties of X and Y. Now, we won't be able to say that for some of the axiologies that we run into later, for instance the variable value axiologies we have on the next slide. But in this particular case, we can give a threshold. So again, the interesting implication here is: if you're an average utilitarian, and you know that there is a large background population that has average welfare level c, then, to a good approximation, you can decide what to do by just applying a critical-level axiology with critical level c to the two subpopulations that you're choosing between.
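To see the flavor of this numerically, here is a minimal sketch, not from the paper, with a hypothetical background of a billion individuals at average welfare 0, so that the counterpart critical-level theory has critical level 0 and just ranks by total welfare:

```python
def average_utility(*pops):
    """Average welfare of the combined population, given as lists of (welfare, count) pairs."""
    total = sum(w * n for pop in pops for (w, n) in pop)
    size = sum(n for pop in pops for (_, n) in pop)
    return total / size

def critical_level_value(pop, c):
    """Critical-level utilitarian value of a population on its own."""
    return sum((w - c) * n for (w, n) in pop)

# Foreground options: X adds one person at welfare 2, Y adds three people at welfare 1.
X = [(2.0, 1)]
Y = [(1.0, 3)]

# On their own, averageism prefers X (average 2 vs 1)...
print(average_utility(X) > average_utility(Y))         # True

# ...but with a large background population Z at average welfare c = 0,
# the ranking of X + Z vs Y + Z follows the critical-level comparison of X vs Y.
Z = [(0.0, 10**9)]
print(average_utility(X, Z) < average_utility(Y, Z))    # True: Y + Z is now preferred
print(critical_level_value(X, c=0.0) < critical_level_value(Y, c=0.0))  # True: agrees
```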
(21:27) Okay. So closely related to Average Utilitarianism are these views called Variable Value, introduced by Thomas Hurka in the 80s, and these come in two flavors, in roughly increasing order of generality. So Variable Value I says: where Total Utilitarianism evaluates a population by average welfare times population size, a variable value axiology of this first kind applies a transformation to population size. This transformation g is strictly increasing, strictly concave, and bounded above. So what that means is that if we have, for instance, a population with positive average welfare, then making that population bigger is good, it makes things better, but the marginal value of adding more happy people to the population decreases, and decreases towards some upper limit. So Variable Value I basically behaves like Total Utilitarianism for small populations, where we can treat g as effectively linear, and then behaves like Average Utilitarianism for large populations, where we're close to that horizontal asymptote.
(22:51) And then there's Variable Value II, where we also apply another transformation f to average welfare, with the stipulation that f is differentiable and also strictly increasing, so greater average welfare is better. And because both Variable Value I and Variable Value II are averageist in the large-population limit, which is supposed to be part of their attraction, it's a little bit unsurprising that in the large background population limit, which is a kind of large population limit, they do just the same thing that Average Utilitarianism does. Namely, they converge with a critical-level theory where the critical level is the average welfare level in the background population. Now note: as I said on the last slide, in the case of averageism we could say where that convergence happens, that is, at what point it's guaranteed that we can just evaluate populations X and Y by applying the critical-level theory. Here, we can't do that. The easy way to see that is that these variable value theories can be arbitrarily close to Totalism for an arbitrarily long time. So there's no threshold beyond which they have to start acting at all like averageism, or hence at all like the critical-level theory.
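Here is a minimal numerical sketch of that behaviour, not from the paper, using a hypothetical size transformation g(n) = n/(n + 1000), which is strictly increasing, strictly concave, and bounded above by 1:

```python
def g(n, n0=1_000.0):
    # hypothetical size transformation: strictly increasing, strictly concave, bounded above by 1
    return n / (n + n0)

def vv1(total_welfare, size):
    # Variable Value I: average welfare times the transformed population size
    return (total_welfare / size) * g(size)

# Foreground options: X adds one person at welfare 2, Y adds three people at welfare 1.
tot_X, size_X = 2.0, 1
tot_Y, size_Y = 3.0, 3

# With no background the populations are small, g is roughly linear,
# and VV1 behaves like totalism: it prefers Y (total 3 > 2).
print(vv1(tot_X, size_X) < vv1(tot_Y, size_Y))   # True

# With a large background Z of 10^7 people at average welfare c = 3,
# the ranking of X + Z vs Y + Z flips to match critical-level utilitarianism
# with c = 3, which prefers X (2 - 3 = -1 beats 3 * (1 - 3) = -6).
size_Z, c = 10**7, 3.0
print(vv1(tot_X + size_Z * c, size_X + size_Z) >
      vv1(tot_Y + size_Z * c, size_Y + size_Z))  # True
```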
(24:16) Okay. So that covers averageism and variable value views. The second category of non-additive axiology we want to talk about is egalitarianism. So egalitarians, of course, care about equality; they care about how equally welfare is distributed in the population as a whole. And this naturally motivates violations of separability, and as a result violations of additivity, because the impact that some subpopulation has on equality in the population as a whole depends on what the welfare distribution is in the rest of the population. Now, unfortunately, while a lot has been said about egalitarianism in various contexts, it's mostly been in fixed-population contexts. And so we're a little bit in the dark as to what the most plausible variable-population egalitarian axiologies are, but there seem to be two major choice points. One is between 'totalist', or more broadly what one might call 'total-ish', versions of egalitarianism and averageist or average-ish versions of egalitarianism, where roughly the thought is: a total-ish theory is going to say that, if the existing population is good, then adding more and more copies of that existing population makes things better without limit, and vice versa if the existing population is bad; and average-ish theories will deny that. And then there's another choice between what we might call 'two-factor' or 'pluralistic' egalitarian theories, which basically have some measure of inequality and treat that as a penalty term, so you look at the total or the average welfare in the population and then subtract a penalty for how unequal the distribution of welfare is in that population, versus rank-discounting views, where what they do is rank everybody in the population from worst off to best off and just give less weight to your welfare the higher your rank is. It turns out these two ways of representing egalitarian concerns are extensionally equivalent in a fixed-population context: anything that you can represent as a two-factor egalitarian view, you can also represent as a rank-discounting view. But they come apart; they're no longer equivalent in that way in a variable-population context.
(26:40) Okay. So we're going to look at all four cells in this two-by-two matrix of possible egalitarian views, although we're not going to be able to cover every possible view even within these categories. As we'll see, the average-ish rank-discounting view provides an interesting exception to our general results. But starting first with the two-factor views. So, to simplify, let's consider just these two simple categories of views, which we can call Totalist Two-Factor Egalitarianism and Averageist Two-Factor Egalitarianism. A totalist two-factor egalitarian view says that the value of a population is given by the total welfare in that population plus an inequality term, which is some negative penalty for how unequal the distribution of welfare in X is, times the population size of X. So the badness of inequality scales with the size of the population. And I, as I've said, is an inequality measure. And then we can define an averageist two-factor egalitarian view, which measures the value of a population by average welfare plus the inequality penalty term, not multiplied by population size. Okay. So to give a convergence result for these kinds of views: remember, when we were talking about averageism and variable value views, we had to hold fixed the average welfare in the background population as we considered larger and larger background populations. Here, because the egalitarian views are sensitive not just to the average welfare in a population but potentially to the whole distribution, we have to hold the distribution fixed. So let's say that the distribution of a population X is just the function that, for any welfare level w, gives the proportion of the population that has at most welfare w. So it's like a cumulative distribution function for welfare in population X.
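Schematically, reconstructing from the spoken description (so treat the exact sign convention as illustrative: here I(X) is a non-positive penalty for inequality, though one could equivalently subtract a non-negative inequality measure), the two schemas and the distribution function are:

```latex
V_{TE}(X) = \mathrm{Tot}(X) + I(X)\,|X|, \qquad V_{AE}(X) = \bar{X} + I(X),
\quad \text{with } I(X) \le 0 \text{ penalizing inequality in } X;

D_X(w) = \frac{1}{|X|} \sum_{v \le w} X(v) \quad \text{(the proportion of } X \text{ at welfare at most } w\text{).}
```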
(29:00) Okay. So then we can get this general result, our Theorem 3, which says: consider a value function of either of these forms, either the totalist or the averageist two-factor egalitarian value function, and just assume that this inequality function I is a differentiable function of the distribution of X. So the substantive assumption there is that we can tell how unequal a population is just by looking at the distribution of welfare levels. Then an axiology A that's represented by any such value function V will converge to an additive axiology with respect to background populations with any given distribution. So what that means is: we specify the distribution of welfare in the background population, and then we consider larger and larger background populations that have that same distribution. And in that limit, any totalist or averageist two-factor egalitarian view will converge with some additively separable axiology. But on top of that, suppose that the two-factor egalitarian view we're considering satisfies the following two principles. First, it satisfies Pareto, so improving the welfare of a given individual makes things better, all else being equal. And second, Pigou-Dalton transfers are weak improvements, where a Pigou-Dalton transfer means taking some welfare from a better-off person and giving it to a less well-off person, but only so much that the first person is still better off than the second person after the transfer. So we also assume that those are at least weak improvements, meaning that they don't make things worse, and that seems like a characteristic commitment of egalitarianism. So if our egalitarian axiology satisfies those two conditions, then the additive axiology that it converges to will be a form of prioritarianism. And you can see that that makes sense. Prioritarianism adds two extra constraints to additive separability in general: one, that the priority weighting function has to be increasing, so greater welfare is always better, and that corresponds to the Pareto principle here; and two, that the function is concave, so the marginal value of additional improvements to welfare decreases as an individual's absolute welfare level increases, and that corresponds to the desirability of Pigou-Dalton transfers.
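As a quick sanity check on that correspondence, here is a minimal sketch with a hypothetical priority weighting f(w) = 1 - e^(-w), which is strictly increasing and concave, showing that a prioritarian value function endorses Pareto improvements and weakly prefers Pigou-Dalton transfers:

```python
import math

def f(w):
    # hypothetical priority weighting: strictly increasing and concave in w
    return 1.0 - math.exp(-w)

def prioritarian_value(welfares):
    # additive value: sum of the priority-weighted welfare of each individual
    return sum(f(w) for w in welfares)

before = [1.0, 5.0]   # one worse-off person at welfare 1, one better-off person at welfare 5

# Pareto: raising anyone's welfare raises the value.
print(prioritarian_value([1.0, 6.0]) > prioritarian_value(before))   # True

# Pigou-Dalton: transfer 1.5 units from the better-off to the worse-off person,
# leaving their ranking intact (2.5 < 3.5); the value does not go down.
after = [2.5, 3.5]
print(prioritarian_value(after) >= prioritarian_value(before))       # True
```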
(31:33) Okay. So let's illustrate that with two examples of views of this kind, just to see that, interestingly, when we specify a particular version of egalitarianism, we can say not just that convergence happens in general; we can say what the priority weighting function is for the prioritarian theory that we converge to in the limit. So here's a totalist version of two-factor egalitarianism that uses, as its measure of inequality, the mean absolute difference of welfare levels, so basically the average difference between the welfares of any two individuals that you randomly sample from the population, and then weights that by this term α between 0 and 1/2, which is what makes this axiology Paretian; if α were greater than 1/2, we would sometimes prefer to level down, reducing an individual's welfare in order to increase equality. And we can say that, if we have a background population with some fixed welfare distribution D, then in the limit, as that background population gets bigger and bigger, this MDT axiology will converge with a version of prioritarianism whose priority weighting function is the one here, which I won't bother to read out. The interesting thing is just that we can say what the priority weighting function is.
(33:04) And then, to look at an example of an averageist egalitarian view, consider what we'll call QAM, which just measures the value of a population by the quasi-arithmetic mean of its welfare levels, defined as follows. And here the measure of inequality that we're implicitly using is just average welfare minus the quasi-arithmetic mean. And you can think of this as roughly an averageist version of prioritarianism. So yeah, here again, unsurprisingly, given that we've seen averageism converge and we've seen egalitarian views converge, this QAM view is going to converge with prioritarianism given a large enough background population with a fixed distribution D. And again, we can say what the priority weighting function will be.
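For concreteness, the standard definition of a quasi-arithmetic mean, which I take to be what the slide has in mind, uses some strictly increasing (and, on the egalitarian reading, concave) transformation g:

```latex
V_{QAM}(X) = g^{-1}\!\left( \frac{1}{|X|} \sum_{w \in W} X(w)\, g(w) \right)
```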
(34:03) Okay. So that leaves the second big category of egalitarian, or at least allegedly egalitarian, views (there's some dispute over whether we should really think of this as a kind of egalitarianism): rank-discounting. As we've said, the way this works is that you order all the individuals in the population from worst off to best off, so X_k denotes the welfare of the kth worst-off welfare subject in population X, and then we have some function f that tells us how much we care about the welfare of the kth worst-off individual, where f is positive, so we always care to some extent, but decreasing, so the better off you are, the less we care about exactly how well off you are. And here again, we have to add a little bit of additional restriction to describe the convergence result that we do get here. So say that an axiology A converges to A′ on a set of populations S, relative to background populations of type T, if and only if, for any populations X and Y in S, the same conditions we gave before hold. The point here is just that we restrict ourselves to choosing between foreground populations X and Y that are in some specified set, and we'll see on the next slide why we need to do that.
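In symbols, writing X_1 ≤ X_2 ≤ … ≤ X_|X| for the welfare levels of X listed from worst off to best off, the rank-discounted value function just described takes the form:

```latex
V_{RD}(X) = \sum_{k=1}^{|X|} f(k)\, X_k, \qquad f(k) > 0, \ f \text{ decreasing in } k.
```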
(35:35) Okay. So the choice point here that corresponds to whether rank-discounted utilitarianism is in the totalist spirit or the averageist spirit is whether this rank-discounting function f, or rank-weighting function f, decreases to 0 or is bounded above 0. So for a bounded rank-discounted utilitarian view, we take this definition of rank-discounting in general, but then we stipulate that the function f, which tells you how much you care about the wellbeing of the kth worst-off individual, is eventually convex and has an asymptote L > 0. So in other words, the weight that you give to better-off individuals decreases, but it never decreases below some positive threshold L. And now we need to stipulate the class of populations X and Y within which we get a convergence result. So say that a population X is moderate with respect to a distribution D if the lowest welfare level in X is still higher than some welfare level in D. So for our purposes this means: you're choosing between populations X and Y in the presence of a background population Z, and the situation where our result is going to apply is where there are some individuals in that background distribution D who are worse off than anybody in the populations X and Y you're choosing between. So under that condition, we can say that bounded rank-discounted utilitarianism will converge with total utilitarianism given a large enough background population with a fixed distribution D.
(37:21) Now, okay. So the one case that we have left is, unfortunately, the trickiest, which is why I've left it for last: the average-ish version of rank-discounting. There are various ways this could go, but one version of this view that exists in the literature is what we'll call Geometric Rank-Discounted Utilitarianism, which is defended by Asheim and Zuber in 2014. And their rank-weighting function is just this: you have some constant β in the open interval (0,1) and you just raise it to the power of k, where k is the welfare rank. So as you consider better and better off individuals, you care exponentially, or geometrically, less and less about them.
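Plugging that weighting into the general rank-discounted form above gives:

```latex
V_{GRD}(X) = \sum_{k=1}^{|X|} \beta^{k}\, X_k, \qquad \beta \in (0,1).
```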
(38:10) Now, so broadly, what happens here is that under the same circumstances we described on the last slide, where you have a background population with distribution D and you're choosing between two populations X and Y that are moderate with respect to D, meaning that the worst-off individual in X or Y is still better off than the worst-off individual in the background population Z, GRD is eventually going to converge to what we might call a critical-level leximin theory. That means it cares lexically more about the welfare of less well-off people, and also that there's some critical level such that adding someone to the world below that critical level makes things worse, where the critical level is either 0 or the maximum welfare in the background population, whichever is greater.
(39:00) So I haven't given this as a formal result because it would require us to introduce some new notation; happy to talk about this more in Q & A. The interesting thing here is that this critical-level leximin view is not additive in our sense, because it can't be represented by a real-valued additive value function, but it is additive in a broader sense, if we allow axiologies to be represented by vector-valued rather than real-valued value functions. So in a sense, it's a sort of limiting case of this phenomenon that non-additive axiologies converge to additive axiologies in the large background population limit. Okay. So we've shown that various non-additive axiologies will converge with some counterpart additive axiology given a large enough background population. So the natural next question is: how large is the actual background population in real-world choice situations? And it's a little bit unclear what we want to say about this question, or what would count as a satisfactory answer, because, as I said, in the case of average utilitarianism, for instance, we can say something pretty concrete about how big the background population has to be for the limit results to kick in, while in the case of other axiologies we can't say as much. But what I'm going to say anyway, or argue, is that the background population is at least significantly larger than the present human population, and that this at least suggests that our limit results are relevant for many practical purposes.
(40:41) So to start by just giving some numbers: as we know, the number of present humans is roughly 7 billion, i.e. 7 x 10^9. The best, or at least the most widely cited, estimate of the number of human beings who have ever lived is about 10^11, so a little bit more than one order of magnitude larger. But of course, humans aren't the only welfare subjects. So we might ask, for instance, about the number of mammals in the world. Now, what we know about this is really hazy and uncertain, to at least a couple of orders of magnitude. Brian Tomasik has a nice write-up, which I've cited here, that summarizes what we do know and gives a good sense of the limits of our knowledge. But it seems reasonably safe to say that the number of presently existing mammals is at least on the order of 10^11. And if we care about the world as a whole, and not just the people who presently exist, then we also want to know about past individuals. We talked about past humans; for past mammals, doing some pretty conservative back-of-the-envelope math, just counting since the K–Pg extinction event 66 million years ago and making conservative assumptions about birth and death rates, we get that there have been at least 6.6 x 10^18 mammals in the timeless population. And then for vertebrates more generally, so including reptiles and fish, for instance, it looks like there are at least 10^13 of those alive now, and there have probably been at least 5 x 10^22 since the Cambrian explosion. And the point to notice here is just that all of these are significantly larger, by one up to (I don't know) 10 or 13 orders of magnitude, than the present human population.
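One way to reproduce the order of magnitude of that mammal figure, as a purely hypothetical reconstruction (not necessarily the paper's exact assumptions), is to take a standing population of 10^11 mammals turning over roughly once a year since the K–Pg boundary:

```python
standing_mammal_population = 10**11   # assumed standing population at any given time (conservative)
years_since_k_pg = 66 * 10**6         # 66 million years
generations_per_year = 1              # assumed roughly one-year average lifespan / turnover

timeless_mammals = standing_mammal_population * years_since_k_pg * generations_per_year
print(f"{timeless_mammals:.1e}")      # 6.6e+18, matching the order of magnitude quoted above
```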
(42:32) Now, as we saw, it's not just the size of the background population that matters. The practical upshots of these limit results, if and when they do kick in, depend on the average welfare in the background population and also on the distribution of welfare. And unfortunately, it's just hard to say very much here. There are two hypotheses, I think, that naturally suggest themselves. One is that the background population consists mainly of small animals and has an average welfare that's close to zero. Another hypothesis is that the background population mainly consists of members of advanced alien civilizations. So if you think, for instance, that a biosphere that doesn't produce a spacefaring civilization is going to produce, using the numbers here, maybe 10^23 or 10^24 individuals in its lifetime, but a spacefaring civilization can produce 10^30 or 10^40 or 10^50 individuals, then it doesn't take a lot of spacefaring civilizations for them to dominate the overall population. And if that's right, then given the limits of our present knowledge, all bets are off with respect to average welfare in the background population, because who knows what those advanced alien civilizations are like. And I think, more generally, on either hypothesis it's pretty hard to say much about the distribution of welfare in the background population, except that there do seem to be non-trivial inequalities. So pet dogs, hopefully, have pretty high welfare; factory-farmed layer hens seem to have pretty poor welfare. So there is some breadth of distribution. But if anybody thinks we can fruitfully say more here, I'd be very interested to hear about it.
(44:13) Okay. So I'm going to just very quickly talk about two objections, kind of breezing through these in the interest of time; we can return to them in Q & A. So one objection is: look, for practical or ethical purposes, why do we even care about the parts of the population we can't affect, which I've been calling the background population? Shouldn't we just focus on the part of the population we can affect? So one way to make this suggestion precise is what Nick Bostrom calls a causal domain restriction, where you evaluate outcomes by applying your preferred axiology A not to the world as a whole, but only to the part of the world that you can, in principle, affect, which presumably means your future light cone. So if we do that, then obviously a lot of this background population goes away, namely all the past individuals, and presumably also aliens in sufficiently faraway galaxies. So, three reasons why we're skeptical of this reply. First, it seems to give up a pretty core motivation for consequentialism, namely that we have some non-derivative reason to make the world a better place, where that means make the world as a whole a better place. It seems much harder to explain why we would have non-derivative reason to make our future light cones better places when that comes apart from making the world as a whole a better place, or when it involves making the world as a whole a worse place according to your axiology.
(45:44) Secondly, when you combine a non-separable axiology with a causal domain restriction, you get these weird effects, like diachronic inconsistency, where I know that at time t I'm going to be faced with a choice between adding population X or population Y to the overall population, and given what will be my future light cone at that point, X will be preferable to Y; but from where I am now, Y looks like a better addition than X. And so I would want to constrain my choices; I would pay some price to prevent myself from choosing X in the future. And similarly, you can get conflicts between agents who share the same axiology. Hurka gives one of these diachronic cases, by the way, in his 1982 paper, 'More Average Utilitarianisms'. And so all of this stuff just seems pretty weird, although maybe it's a matter of taste how bad you think those implications are.
(46:37) And then finally, maybe the most important thing is that it seems like, even if we adopt a causal domain restriction, much of the population that's inside our future light cone can pretty plausibly be treated as background for purposes of most decisions. So most of the choices we make, even most of the ethically important choices, like where to donate money, for instance, only affect a small part of even the present human population, or the present population of welfare subjects. And so it seems like the part of the population that we don't affect, even inside our future light cone, is usually going to be several orders of magnitude bigger than the part that we do affect, or at least that we can affect by these short-termist interventions. Another response is: look, these large numbers that I gave for the size of the background population come from thinking about, for instance, mammals or vertebrates, and those populations are probably largely made up of, say, small mammals like rodents, or small fish. But even if you think that they count as welfare subjects, you might think that they in some sense count for less, and more specifically, that they should make a smaller contribution to the size of a population as we measure it for axiological purposes. So we could apply axiological weights, where, for instance, if we're average utilitarians, the way we calculate the size of the population, the thing that we divide total welfare by to get the average, is not just the number of individuals but the sum of their axiological weights.
(48:14) So a couple of things to say here. Number one, in the paper we actually try to do this in what we think is a fairly conservative way, where we only count mammals as welfare subjects, we take mice as a proxy for all non-human mammals, and we discount by both lifespan and cortical neuron count, and we still get a discounted population of past and present non-human animals that's at least 2.3 x 10^13, so still significantly larger than the present human population. Also, we already saw that the past human population is fairly large, and again, there's this point that much of the present and near-future human population can plausibly be treated as background for a lot of decision-making purposes. Okay. So all that is to say that there's at least a large enough population that it makes sense to treat as background that it's plausible to think these limit results are practically significant for agents who accept non-additive axiologies. Now, we've been struggling with how to make this point in an interestingly general way, and this is something where we could really use input, if you can see a good way of doing that. But just very simply, for now, to give a stylized example of these limit results in practice, let's return to the case we started off with, where we imagined there were 10^10 existing people, all with welfare 1, and we can either (O1) leave things unchanged, or (O2) improve all the existing people to welfare 2, or (O3) create 10^32 new people with welfare 1.5. And remember, total utilitarianism in this case prefers O3, whereas average utilitarianism prefers O2. But now suppose that there's a background population of 10^13, using here one of the more conservative numbers for what the size of the real-world background population might be. Then all we have to assume is that the average welfare of welfare subjects within that background population is anything non-trivially less than 1.5, and it's going to turn out that AU now agrees with total utilitarianism and prefers O3. In fact, in this case we can say, by extension, that so do VV1, VV2, and various averageist-flavored egalitarian theories, which wouldn't have before.
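Here is the arithmetic for that stylized case as a minimal sketch, assuming, purely for illustration, that the 10^13 background individuals have average welfare 1:

```python
def average_welfare(groups):
    """Average welfare over a list of (welfare, count) groups."""
    total = sum(w * n for (w, n) in groups)
    size = sum(n for (_, n) in groups)
    return total / size

background = (1.0, 10**13)    # assumed: 10^13 background individuals at average welfare 1

O1 = [(1.0, 10**10), background]                   # leave things unchanged
O2 = [(2.0, 10**10), background]                   # raise existing people to welfare 2
O3 = [(1.0, 10**10), (1.5, 10**32), background]    # add 10^32 people at welfare 1.5

print(average_welfare(O2))                          # about 1.001
print(average_welfare(O3))                          # about 1.5
print(average_welfare(O3) > average_welfare(O2))    # True: AU now prefers O3, like totalism
```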
(50:34) So we do get, at least in this narrow stylized case, that a pretty conservative assumption about the size of the background population is enough to make our limit results practically impactful, and to rescue the conclusion that the option that affects the astronomically larger number of people is the more important one, the one that makes a larger difference to the overall value of the world. And yeah, just to reinforce that point: here I made the assumption that average welfare is non-trivially less than 1.5. If we just assume it's non-trivially different from 1.5, either greater or less, the more general conclusion we reach is that average utilitarianism and related theories will now agree that O3 makes a bigger difference to the overall value of the world, either for better or worse, than O2. So we're at least on the road to recovering these arguments from astronomical scale.
(51:30) Okay. Let me very quickly mention three other conclusions that we talk about in the paper; again, these would be good fodder for Q & A if people want to get into them more. One, and this is a conclusion that's already been noted in independent work by Mark Budolfson and Dean Spears, is that the presence of background populations makes it harder for various axiologies to avoid a generalization of Parfit's repugnant conclusion, where there is some pre-existing background population and the question is whether it's better to add a moderate number of people with very high welfare or an enormous number of people with lives barely worth living. In this repugnant addition problem, in the presence of large background populations, a very wide range of axiologies that can avoid the original repugnant conclusion will endorse repugnant addition.
(52:27) Second point: we think that these results have some implications for infinite ethics. So infinite ethics is the task of evaluating worlds and saying what agents should do in the context of infinite worlds, with infinitely many welfare subjects. And there's a pretty large literature on this, but that literature gives a lot of recipes for generalizing additive axiologies from finite to infinite populations and has very little to say so far about non-additive axiologies. And our results at least suggest a partial way of generalizing non-additive axiologies to infinite worlds, namely: when you're in an infinite world but can only affect a finite subpopulation, then you're effectively not just approaching but actually in the limit that our results describe; there's an infinitely large background population. And so it seems like we can just replace non-additive axiologies with their additive limit theories, and we then have these recipes from infinite ethics for generalizing those additive theories to the infinite context. Now, there are some problems here. We have to be able to talk about, for instance, the average welfare in an infinite population and the distribution of welfare levels, and it's tricky to know how to do that, but there's at least a pathway here that seems worth exploring.
(53:42) And then the final thing is that these results suggest the fun possibility of manipulation. So suppose you have some average utilitarians over here and some total utilitarians over there. The total utilitarians would like the average utilitarians to act like total utilitarians. Well, one way they can make that happen is just by producing a very large number of welfare subjects with near-zero welfare, e.g. breeding trillions and trillions of Etruscan shrews, which I gather are the smallest mammals on Earth. By creating that very large population, which is background from the point of view of the averageists, they force the averageists to act more and more like totalists. How practically significant that is, we don't know. Maybe not very, but it seems fun and interesting.
(54:30) Okay. So to recap, we found that when there's a large enough background population of welfare subjects that are unaffected by our choices, a range of non-additive axiologies can converge in practice with some additive axiology. I at least made the argument that the real world background population is large enough that it's plausible that these results have some practical significance, and if they do, maybe the most important implication is that arguments from astronomical scale and hence the case for longtermism and maybe more particularly the case for existential risk mitigation is less dependent on assumptions of additive separability than we might have thought.
(55:11) Now, there are a lot of questions that we haven't addressed in this area that I'd be very interested to see other people get into, or that maybe we'll get into at some point. One, of course, is giving a better characterization of what the real-world background population is like, which I've only said a very little bit about. The other big obvious thing is that we haven't yet considered decisions under risk or uncertainty, and in particular, choices made in the presence of a very large background population when you're uncertain about the size or the welfare distribution of that background population. It's not at all clear how that would go. So that seems like another interesting topic for research, if people want to take it up. So I'll leave it at that and look forward to your questions.