How much does work in AI safety help the world?

Owen Cotton-Barratt and Daniel Dewey

There has been some discussion lately about whether we can estimate how likely efforts to mitigate existential risk from AI are to succeed, and about what reasonable estimates of that probability might be. In a recent conversation between the two of us, Daniel mentioned that he didn’t have a good way to estimate the probability that joining the AI safety research community would actually avert existential catastrophe. It would be hard to be certain about this probability, but it would be nice to have a principled back-of-the-envelope method for approximating it. Owen has a rough method, based on the one he used in his article Allocating risk mitigation across time, but he had never spelled it out.

It goes like this.

First, estimate the total existential risk associated with developing highly capable AI systems, bearing in mind all of the work on safety that will be done.

Now estimate how large the research community working on safety will be by the time we develop those potentially risky AI systems. This number should include researchers who are not directly focused on AI safety, but who nevertheless make some fractional contribution relative to a full-time safety researcher; for example, if 10 AI capability researchers will each contribute the equivalent of 10% of a full-time AI safety researcher, they would collectively add 1 “member” to the research community.

Now estimate the effect of adding a researcher to the community now (and for the rest of their career), in terms of the total number of researchers that will be added to the eventual community. This could be less than one if you think they will displace people who would otherwise have joined later, or more than one if you think they will add momentum to the field. Again, if adding a researcher to the community now results in some AI capability researchers focusing more or less of their time on safety-relevant research, this number should count these fractional researchers added or subtracted as well.

Now suppose that in a heroic effort we managed to double the total amount of work that would be done on AI safety. What percentage of the bad scenarios should we expect this to avert?
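
One way the four estimates might be combined is sketched below in Python. The combination rule is an assumption made for illustration, not a formula stated in this post: it supposes that returns to safety work are logarithmic, so that each doubling of the community averts the same fraction of bad scenarios, and that a small addition to the community counts as the corresponding fraction of a doubling. The function names and the example inputs are hypothetical.

```python
import math


def effective_community_size(full_time, part_time_count=0, part_time_fraction=0.0):
    """Count part-time contributors as fractions of a full-time safety researcher."""
    return full_time + part_time_count * part_time_fraction


def chance_of_averting_catastrophe(total_risk, community_size,
                                   researchers_added, averted_by_doubling):
    """Rough chance that one extra career averts existential catastrophe.

    ASSUMPTION (illustrative, not taken from the post): returns are
    logarithmic, so the share of bad scenarios averted scales with the
    number of doublings of the safety community.
    """
    if community_size <= 0:
        raise ValueError("community_size must be positive")
    # How many doublings of the community the extra researchers represent.
    doublings = math.log2(1 + researchers_added / community_size)
    return total_risk * averted_by_doubling * doublings


# Hypothetical inputs, chosen only to show the mechanics:
# 999 full-time researchers plus 10 capability researchers at 10% each.
size = effective_community_size(full_time=999, part_time_count=10,
                                part_time_fraction=0.1)
estimate = chance_of_averting_catastrophe(
    total_risk=0.1,            # total existential risk from AI
    community_size=size,       # eventual size of the safety community
    researchers_added=1.0,     # net researchers added by one new career
    averted_by_doubling=0.1,   # share of bad scenarios averted by doubling
)
print(f"{estimate:.2e}")
```

Under this rule, a full doubling of the community recovers exactly the product of the total risk and the fraction averted by doubling, which matches the way the last question is posed. Other rules for interpolating between doublings would give somewhat different numbers.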




  1. “Therefore if your estimate for the likelihood of a career in AI safety looks much worse than a 1 in 10 billion chance, it seems likely that there are other more promising ways to productively use your share of that influence.”
    Maybe I just haven’t had enough caffeine today, but I don’t find that completely clear.

    • It’s quite a rough argument, not a logical entailment.

      I actually think the threshold of “better options exist” comes significantly earlier than 1-in-10 billion. I’d be interested in arguments against!
