This is a guide to research on the problem of preventing significant accidental harm from superintelligent AI systems, designed to make it easier to get started on work in this area and to understand how different kinds of work could help mitigate risk. I’ll be updating this guide with a longer reading list and more detailed information about the three areas – keep an eye on this site and @danieldewey to find out when updates are posted!
What is the superintelligence control problem?
Though there are fundamental limits imposed on the capabilities of intelligent systems by the laws of physics and computational complexity, human brains and societies of human brains are probably far from these limits. It is reasonable to think that ongoing research in AI, machine learning, and computing infrastructure will eventually make it possible to build AI systems that not only equal, but far exceed human capabilities in most domains. Current research on AI and machine learning is at least a few decades from this degree of capability and generality, but it would be surprising if it were not eventually achieved.
Superintelligent systems would be extremely effective at achieving tasks they are set – for example, they would be much more efficient than humans are at interpreting data of all kinds, refining scientific theory, improving technologies, and understanding and predicting complex systems like the global economy and the environment (insofar as this is possible). Recent machine learning progress in natural language, visual understanding, and from-scratch reinforcement learning highlights the potential for AI systems to excel at tasks that have traditionally been difficult to automate. If we use these systems well, they will bring enormous benefits – even human-like performance on many tasks would transform the economy completely, and superhuman performance would extend our capabilities greatly.
However, superintelligent AI systems could also pose risks if they are not designed and used carefully. In pursuing a task, such a system could find plans with side-effects that go against our interests; for example, many tasks could be better achieved by taking control of physical resources that we would prefer to be used in other ways, and superintelligent systems could be very effective at acquiring these resources. If these systems come to wield much more power than we do, we could be left with almost no resources. If a superintelligent AI system is not purposefully built to respect our values, then its actions could lead to global catastrophe or even human extinction, as it neglects our needs in pursuit of its task. The superintelligence control problem is the problem of understanding and managing these risks. Though superintelligent systems are quite unlikely to be possible in the next few decades, further study of the superintelligence control problem seems worthwhile.
There are other sources of risk from superintelligent systems; for example, oppressive governments could use these systems to do violence on a large scale, the transition to a superintelligent economy could be difficult to navigate, and some advanced AI systems themselves could turn out to be moral patients. These risks are also worth studying, but seem superficially to be more like the risks caused by artificial intelligence broadly speaking (e.g. risks from autonomous weapons or unemployment), and seem fairly separate from the superintelligence control problem.
Structure of this guide
In order to understand, plan, and communicate about research on the superintelligence control problem, I’ve found it helpful to split the field into three main areas, corresponding to the kinds of questions that each piece of work is ultimately trying to answer and the ways that those answers will help to mitigate risk. Thinking about these areas also helps me to keep my bearings and to understand how specific ideas fit into the broader picture. For the moment, the three areas I find usefully distinct are technical foresight, strategy, and system design.
- Technical foresight: by understanding the potential properties of superintelligent machines as precisely and methodically as possible, this work helps us to understand and communicate about the risks we could face. This understanding can be used to inform requirements and options in strategy and system design; can be used to gain support for further research investment and implementation of strategies; and can attract people with high rigor requirements to the field, especially if good methodologies and standards are used for the work.
- Strategy: it is not yet clear how we as a society can best mitigate the risk of significant accidental harm from superintelligent AI systems; more strategic work is needed on this question. This work can also give us perspective to help steer technical foresight and system design work, by clarifying what the most relevant questions in those areas are likely to be. It can also attract resources and people who currently don’t see any way to effectively mitigate risks.
- System design: system design work’s direct impact comes through enabling us to implement strategies that require particular technical capabilities, such as construction of reliable superintelligent machines or containment of potentially superintelligent machines. This object-level knowledge could be especially impactful if machine superintelligence arrives surprisingly early. System design and engineering research can also inform policy requirements by showing what kinds of options are more or less likely to be feasible, and has been successful in attracting people and resources for further research.
Two considerations raised by Ord and Cotton-Barratt suggest that research done early can have its largest impact (i) through indirect effects, by increasing the resources available in the future and “steering” those future resources into more valuable lines of inquiry (as opposed to solving object-level problems), and (ii) by addressing risks that would arise if superintelligent machines were developed surprisingly early. Generally speaking, I think that research in each area has the potential to steer and provide resources and people for future research in that area simply by being conducted and communicated well.
Most work on significant accidental harm from superintelligent AI systems has been done by existential risk researchers, a group of philosophers, mathematicians, and computer scientists concerned with disasters that could greatly reduce humanity’s long-term potential; this is why most of the work I will cite in this research guide is from this group instead of from the broader computer science community. However, there is increasing interest in this problem in mainstream AI and machine learning, and much mainstream work on reliability and transparency of AI and ML methods is relevant to control of superintelligent systems.
Since this area is not very mature, it’s not easy at the moment to study it exclusively at a graduate level. For students interested in the superintelligence control problem, I would recommend a Ph.D. in machine learning, artificial intelligence, or computer science, during which you can keep up-to-date on the literature and get connected with others interested in long-term AI risks; see also this guide from 80,000 Hours on careers in AI risk research.
What are the likely properties of various types of AI systems?
Technical foresight work can tell us what kinds of behaviors and abilities we should expect from different kinds of AI systems, and therefore what risks we might face from these systems. It can also tell us which properties of those machines we might be able to exploit in order to mitigate risk. Its most important impact may be in its tendency to steer future work and resources in the directions most likely to address the most important problems. A few of the most significant pieces of work in this area are:
Existential risk from AI – Yudkowsky’s and Bostrom’s arguments that superintelligent machines would pose an existential risk (a threat to humanity’s long-term potential);
- Superintelligence: Paths, Dangers, Strategies, Bostrom 2014
- Reducing Long-Term Catastrophic Risks from AI, Yudkowsky et al. 2010
- Some Moral and Technical Consequences of Automation, Wiener 1960
The orthogonality thesis – Yudkowsky’s and Bostrom’s arguments that the intelligence of a machine and its goals are mostly independent, i.e. that superintelligent machines could be built to pursue almost any goal;
- AI as a Positive and Negative Factor in Global Risk (Section 5), Yudkowsky 2008
- The Superintelligent Will (Section 1), Bostrom 2012
Convergent instrumental goals – Omohundro’s and Bostrom’s arguments that superintelligent agents will have certain predictable subgoals (control of resources, self-improvement, etc.) regardless of their final goal, since these subgoals are useful for a broad range of goals;
- The Nature of Self-Improving Artificial Intelligence, Omohundro 2007
- The Superintelligent Will (Section 2), Bostrom 2012
Intelligence explosion – Good’s and Chalmers’ arguments that superintelligent machines should be capable of repeatedly improving their own intelligence, possibly resulting in a rapid increase in machine intelligence;
- Speculations Concerning the First Ultraintelligent Machine (Section 2), Good 1965
- The Singularity: A Philosophical Analysis, Chalmers 2010
- Intelligence Explosion Microeconomics, Yudkowsky 2013
- Superintelligence: Paths, Dangers, Strategies (Chapter 4), Bostrom 2014
- How to Create An Intelligence Explosion – And How to Avoid One, Dietterich 2015
- The slowdown hypothesis, Plebe and Perconti 2012
Further technical foresight work could find new properties of different types of AI systems, or could use new methods to debunk or better understand properties that have already been argued for. Understanding the behaviors and abilities of future AI systems should help us to anticipate and communicate about potential risks, understand requirements and options in strategy and system design, and gain support for further research and implementation of strategies.
What should we do in response to the possibility of significant accidental harm from superintelligent AI systems?
Strategy work is a mostly non-technical area of research: how could we, as a society, best mitigate risks of significant accidental harm from superintelligent AI? Work that falls in this area is very diverse; a rough subdivision of some existing work is useful, though most strategic work cuts across more than one of these areas.
Clarification of the strategic situation – What are the important details of the strategic situation we find ourselves in? There is significant overlap here with technical foresight (for example, whether and how fast intelligence explosion could proceed is strategically important); other questions include how soon we should expect different kinds of AI capabilities, what history can tell us about similar past situations, and what the game-theoretic structure of the situations we could encounter is. A few examples:
- Predicting AGI, Fallenstein and Mennen 2013
- When will AI be created?, Muehlhauser 2013
- AI timelines (collection of posts), Grace and Christiano 2014–2015
- Racing to the Precipice: a Model of AI Development, Bostrom and Armstrong 2013
- Superintelligence: Paths, Dangers, Strategies (Chapters 11, 14, 15), Bostrom 2014
Developing response strategies – Some work on strategy seeks to develop or critique high-level plans to mitigate risk. Past work in this area has asked how we should prioritize and act now in order to best mitigate future risks, as well as what our longer-term strategies and desired outcomes might be.
- The timing of labour aimed at reducing existential risk, Ord 2014
- Allocating risk-mitigation across time, Cotton-Barratt 2015
- Long-term strategies for ending existential risk from fast takeoff, Dewey 2015
- Superintelligence: Paths, Dangers, Strategies (Chapter 14), Bostrom 2014
Strategy in specific domains – In order to develop high-level strategies into achievable ones, it’s helpful to consider certain key domains in more detail. Three very important domains are policy and governance (how are governments likely to react to superintelligence, and how could they best mitigate risk?), scientific community-building (how can we build a good field and culture of risk research?), and technical research agenda-setting. There is considerable overlap between technical research agenda-setting and system design, but insofar as a technical agenda implies a particular plan, it should be subject to strategic considerations.
- The Asilomar Conference: A Case Study in Risk Mitigation, Grace 2015
- Regulating Artificial Intelligence Systems: Risks, Challenges, Competencies, and Strategies, Scherer 2015 (though this paper seems more aimed at issues in AI other than the superintelligence control problem)
- Research priorities for robust and beneficial AI, Russell et al. 2015
- Long-Term and Short-Term Challenges to Ensuring the Safety of AI Systems, Steinhardt 2015
- Aligning Superintelligence with Human Interests: A Technical Research Agenda, Soares and Fallenstein 2014 (Most strategic considerations are in sections 1 and 5, with the rest of the paper fleshing out proposed technical problems)
Strategy is very broad, and calls for theoretical and empirical work across many areas. There are probably large parts of strategy that haven’t been explored at all yet, and the sub-areas I’ve given are probably not exhaustive (and certainly not mutually exclusive!).
How can we design and build reliable superintelligent systems?
In many plausible situations, we could decrease risk of significant accidental harm by making AI systems more predictable, more error-tolerant, more transparent, and more robustly aligned with our own goals and values (“value alignment”). Much of this work is currently focused on making these informal properties precise enough to study in a rigorous way, rather than on implementing them on today’s systems. Most of the current work at the Machine Intelligence Research Institute (MIRI) is in this style, as is Christiano’s work on AI systems that choose actions that a human would approve of, and work by Bostrom and others on designing ways to precisely specify desirable goals or ways to learn those goals from humans. Some examples of work in this area:
- Algorithms for Inverse Reinforcement Learning, Ng and Russell 2000
- Approval-directed agents, Christiano 2015; The Steering Problem, Christiano 2015; Ambitious vs. narrow value learning, Christiano 2015; AI Control article series
- The Value Learning Problem, Soares 2015; Corrigibility, Soares et al. 2015; Aligning Superintelligence with Human Interests: A Technical Research Agenda, Soares and Fallenstein 2014; other papers in MIRI’s research agenda
- Superintelligence: Paths, Dangers, Strategies (Chapters 9, 10, 13), Bostrom 2014
Some recently begun research focuses on present-day AI and machine learning techniques, aiming to extend them in ways that will make them more amenable to reliable understanding and alignment as they become more powerful; the Future of Life Institute’s grant program, funded by Elon Musk, supports some projects in this area.
Like strategy, system design work probably contains many families of questions that haven’t even been considered yet, corresponding to alternate approaches to reliable AI systems or to technical systems needed for different strategies. Some examples of work on topics that have not yet been explored much:
- MDL Intelligence Distillation: Exploring strategies for safe access to superintelligent problem-solving capabilities, Drexler 2015
- Thinking inside the box: controlling and using an Oracle AI, Armstrong et al. 2012
- Leakproofing the Singularity, Yampolskiy 2012
This article has reviewed three areas of research on the superintelligence control problem: technical foresight, strategy, and system design. Research in all of these areas is relatively early, and further progress may show that our current ideas about risks from superintelligent AI systems are incorrect; however, given our limited understanding of the potential impacts of future developments of AI, additional research seems well worthwhile.
We plan to release further installments in this series here on the Global Priorities Project blog, giving more details about each area of research, career prospects in this area, and a more complete reading list. Please feel free to contact me at firstname.lastname@example.org. If you have feedback on this post, I would love to hear it!