Owen Cotton-Barratt and Daniel Dewey
There’s been some discussion lately about whether we can estimate how likely efforts to mitigate existential risk from AI are to succeed, and about what reasonable estimates of that probability might be. In a recent conversation between the two of us, Daniel mentioned that he didn’t have a good way to estimate the probability that joining the AI safety research community would actually avert existential catastrophe. It would be hard to be certain about this probability, but it would be nice to have a principled back-of-the-envelope method for approximating it. Owen has a rough method, based on the one he used in his article Allocating risk mitigation across time, but he had never spelled it out.
It goes like this.
First, estimate the total existential risk associated with developing highly capable AI systems, bearing in mind all of the work on safety that will be done.
Now estimate the size of the research community working on safety by the time we develop those potentially risky AI systems. This number should include researchers who are not directly focused on AI safety, but who nevertheless make some fractional contribution relative to a full-time safety researcher; for example, if 10 AI capability researchers will each contribute the equivalent of 10% of a full-time AI safety researcher, they would collectively add 1 “member” to the research community.
Now estimate the effect of adding a researcher to the community now (and over the course of their career), in terms of the total number of researchers that will be added to the eventual community. This could be less than one if you think they would displace people who would otherwise join later, or more than one if you think they would add momentum to the field. Again, if adding a researcher now causes some AI capability researchers to spend more or less of their time on safety-relevant research, the fractional researchers added or subtracted should be counted as well.
Now suppose that in a heroic effort we managed to double the total amount of work that would be done on AI safety. What percentage of the bad scenarios should we expect this to avert?
Then the pieces combine as follows. Write R for the total existential risk, N for the eventual size of the safety field, k for the number of researchers that adding a career (of typical quality for the area) now adds to the field total, and D for the fraction of bad scenarios that doubling the field would avert. Adding a career adds k people, which is a fraction k/N of the total field, and hence the same fraction of the amount of work that would be needed to double the field. So we should expect it to avert a fraction D × k/N of the bad outcomes, which is a total chance of R × D × k/N of averting existential catastrophe.
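As a minimal sketch, the arithmetic above can be written as a short function. The input numbers in the example are purely illustrative placeholders, not estimates we endorse:

```python
def chance_of_averting_catastrophe(total_risk, eventual_field_size,
                                   researchers_added, doubling_effect):
    """Back-of-the-envelope chance that one added career averts
    existential catastrophe from AI.

    total_risk          -- total existential risk from AI (0..1)
    eventual_field_size -- expected full-time-equivalent safety researchers
                           by the time potentially risky AI is developed
    researchers_added   -- net researchers added by joining now, including
                           momentum and displacement effects
    doubling_effect     -- fraction of bad scenarios averted if the total
                           amount of safety work were doubled (0..1)
    """
    fraction_of_field = researchers_added / eventual_field_size
    # Adding that fraction of the field buys the same fraction of the
    # benefit that doubling the field would buy.
    fraction_of_bad_outcomes_averted = doubling_effect * fraction_of_field
    return total_risk * fraction_of_bad_outcomes_averted

# Purely illustrative inputs (not estimates from this post):
p = chance_of_averting_catastrophe(total_risk=0.1,
                                   eventual_field_size=500,
                                   researchers_added=2,
                                   doubling_effect=0.5)
print(f"Chance of averting catastrophe: about 1 in {round(1 / p):,}")
```

With these placeholder inputs the method gives a chance of about 1 in 5,000; the point of the sketch is the structure of the calculation, not the particular numbers.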
So, what does this mean? Obviously this is quite a crude method, and some of the variables you have to estimate are themselves quite tricky to get a handle on, but we think they’re more approachable than trying to estimate the whole thing directly, and expect the answer to be within a few orders of magnitude of correct.
How big does this number have to be to imply that a career in AI safety research is one of the best things to do? One natural answer is to multiply it out by the number of lives we expect the future could hold. We think this is understandable, and worth doing as a check to see whether the whole thing is dominated by focusing on the present, but it’s not the end of the story. Other ways of influencing the future may look better. These might be other ways of reducing existential risk (for example, a career of work in asteroid detection may have averted something on the order of a 1 in 100 million chance of existential catastrophe), or quite different methods.
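The multiply-out check is a one-line calculation. Both numbers below are deliberately loud placeholders (neither the chance nor the number of future lives is an estimate from this post):

```python
# Hypothetical check: expected future lives saved by one career, given a
# chance p of averting catastrophe and an assumed count of future lives.
p_avert = 1e-7       # placeholder: a 1-in-10-million chance
future_lives = 1e15  # placeholder: assumed lives the future could hold
expected_lives_saved = p_avert * future_lives
print(f"Expected future lives saved: {expected_lives_saved:,.0f}")
```

Even with a seemingly tiny probability, a large enough figure for future lives can make the expected value substantial, which is why this check alone can't settle the comparison with other options.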
A general argument is that there are fewer than 10 billion people alive today, and collectively it seems like we may have a large amount of influence over the future. Therefore if your estimate for the likelihood of a career in AI safety looks much worse than a 1 in 10 billion chance, it seems likely that there are other more promising ways — perhaps much more promising — to productively use your share of that influence. We cannot give a bound above which AI safety is certainly the best thing to do, as this question is sensitive to the value of the other specific options available. The method we’ve used here could also be modified to estimate the value of joining communities working on other existential risks, or perhaps other interventions that change the eventual size or productivity of the AI research community, for example through outreach, funding, or field-steering work.
Thanks to Nick Beckstead and Carl Shulman for comments that led to clarifications in this post.