A non-technical introduction to AI risk

In the summer of 2008, experts attending the Global Catastrophic Risk Conference assigned a 5% probability to the human species’ going extinct due to “superintelligent AI” by the year 2100. New organizations, like the Centre for the Study of Existential Risk and the Machine Intelligence Research Institute, are springing up to face the challenge of an AI apocalypse. But what is artificial intelligence, and why do people think it’s dangerous?

As it turns out, studying AI risk is useful for gaining a deeper understanding of philosophy of mind and ethics, and a lot of the general theses are accessible to non-experts. So I’ve gathered here a list of short, accessible, informal articles, mostly written by Eliezer Yudkowsky, to serve as a philosophical crash course on the topic. The first half will focus on what makes something intelligent, and what an Artificial General Intelligence is. The second half will focus on what makes such an intelligence ‘friendly’ — that is, safe and useful — and why this matters.


Part I. Building intelligence.

An artificial intelligence is any program or machine that can autonomously and efficiently complete a complex task, like Google Maps or a Xerox machine. One of the largest obstacles to assessing AI risk is overcoming anthropomorphism, the tendency to treat non-humans as though they were quite human-like. Because AIs have complex goals and behaviors, it’s especially difficult not to think of them as people. Having a better understanding of where human intelligence comes from, and how it differs from other complex processes, is an important first step in approaching this challenge with fresh eyes.

1. Power of Intelligence. Why is intelligence important?

2. Ghosts in the Machine. Is building an intelligence from scratch like talking to a person?

3. Artificial Addition. What can we conclude about the nature of intelligence from the fact that we don’t yet understand it?

4. Adaptation-Executers, not Fitness-Maximizers. How do human goals relate to the ‘goals’ of evolution?

5. The Blue-Minimizing Robot. What are the shortcomings of thinking of things as ‘agents’, ‘intelligences’, or ‘optimizers’ with defined values/goals/preferences?

Part II. Intelligence explosion.

Forecasters are worried about Artificial General Intelligence (AGI), an AI that, like a human, can achieve a wide variety of different complex aims. An AGI could think faster than a human, making it better at building new and improved AGI — which would be better still at designing AGI. As this snowballed, AGI would improve itself faster and faster, becoming increasingly unpredictable and powerful as its design changed. The worry is that we’ll figure out how to make self-improving AGI before we figure out how to safety-proof every link in this chain of AGI-built AGIs.

6. Optimization and the Singularity. What is optimization? As optimization processes, how do evolution, humans, and self-modifying AGI differ?

7. Efficient Cross-Domain Optimization. What is intelligence?

8. The Design Space of Minds-In-General. What else is universally true of intelligences?

9. Plenty of Room Above Us. Why should we expect self-improving AGI to quickly become superintelligent?

Part III. AI risk.

In the Prisoner’s Dilemma, it’s better for both players to cooperate than for both to defect; and we have a natural disdain for human defectors. But an AGI is not a human; it’s just a process that increases its own ability to produce complex, low-probability situations. It doesn’t necessarily experience joy or suffering, doesn’t necessarily possess consciousness or personhood. When we treat it like a human, we not only unduly weight its own artificial ‘preferences’ over real human preferences, but also mistakenly assume that an AGI is motivated by human-like thoughts and emotions. This makes us reliably underestimate the risk involved in engineering an intelligence explosion.

10. The True Prisoner’s Dilemma. What kind of jerk would Defect even knowing the other side Cooperated?

11. Basic AI drives. Why are AGIs dangerous even when they’re indifferent to us?

12. Anthropomorphic Optimism. Why do we expect the outcomes we hope for to be likelier?

13. The Hidden Complexity of Wishes. How hard is it to directly program an alien intelligence to enact my values?

14. Magical Categories. How hard is it to program an alien intelligence to reconstruct my values from observed patterns?

15. The AI Problem, with Solutions. How hard is it to give AGI predictable values of any sort? More generally, why does AGI risk matter so much?

Part IV. Ends.

A superintelligence has the potential not only to do great harm, but also to greatly benefit humanity. If we want to make sure that whatever AGIs people make respect human values, then we need a better understanding of what those values actually are. Keeping our goals in mind will also make it less likely that we’ll despair of solving the Friendliness problem. The task looks difficult, but we have no way of knowing how hard it will end up being until we’ve invested more resources into safety research. Keeping in mind how much we have to gain, and to lose, advises against both cynicism and complacency.

16. Could Anything Be Right? What do we mean by ‘good’, or ‘valuable’, or ‘moral’?

17. Morality as Fixed Computation. Is it enough to have an AGI improve the fit between my preferences and the world?

18. Serious Stories. What would a true utopia be like?

19. Value is Fragile. If we just sit back and let the universe do its thing, will it still produce value? If we don’t take charge of our future, won’t it still turn out interesting and beautiful on some deeper level?

20. The Gift We Give To Tomorrow. In explaining value, are we explaining it away? Are we making our goals less important?

In conclusion, a summary of the core argument: Five theses, two lemmas, and a couple of strategic implications.


If you’re convinced, MIRI has put together a list of ways you can get involved in promoting AI safety research. You can also share this post and start conversations about it, to put the issue on more people’s radars. If you want to read on, check out the more in-depth articles below.


Further reading

How to be a god

What are gods?

In some ways, the question is more important for atheists than for theists. If I’m a theist, after all, I don’t need to understand what it takes to be a ‘god’ in general; I just need to know that my pick of the litter is a bona fide god. It’s the atheists who must speak in broad strokes of all gods, in order for their chosen self-label to even be contentful. These six criteria do the job of pinpointing gods quite well:

1. They are people, capable of thought and action.

2. As a rule, they are (or normally appear) roughly human-shaped and human-sized.

3. They can sometimes be weakened or killed, but by nature they are significantly more powerful and long-lived than a human.

4. As a rule, they are associated with some domain they have dominion or strong influence over, be it a place, a natural phenomenon, or an abstract category.

5. They are natural objects of worship.

6. They are in some way ‘magical’ or ‘supernatural’.

Within a religion, the gods often form a natural grouping. So, as an added complication, otherwise similar beings may be construed as non-divine if they lack a certain ability or lineage that unifies a tradition’s paradigmatic deities.

Borrowing from E. B. Tylor’s 1871 Primitive Culture, many anthropologists have distinguished ‘god’ worship (theolatry) from ‘spirit’ worship (animism). However, this division, predicated on the theory that cultures naturally ‘evolve’ from animism to polytheism to monotheism, has increasingly fallen out of favor. Cultures have often been labeled ‘animistic’ more because they were seen as ‘primitive’ than because they unambiguously saw everything as intelligent or alive. And even world-views with myriad all-pervading minds can easily blend into world-views with a single all-pervading super-mind. (Compare brahman in India.)

If we do wish to distinguish lesser spirits and monsters from gods, it will need to be because the former are weaker, or less authoritative, or lack human form, or just have a different lineage and name. However, none of these is sufficient on its own. The classification we use will always be arbitrary to some extent, and will depend on the structure of the overall belief system.


Gods interact with humans, and with each other, leading to myths and communicative or transactional rituals. Gods can become human, and humans can become gods, so the human/god distinction can become just as fraught as the spirit/god one.

The requirement that gods be ‘natural objects of worship’ distinguishes them from magical beings that lack the power, authority, or virtue true reverence demands. For instance, powerful daemonic spirits may fail to be gods if they cannot or ought not be worshiped, but exist instead to be opposed, manipulated, or befriended. However, some traditions recognize evil gods, and most traditions fail to clearly distinguish religious worship from magical manipulation, so the line between the divine and the demonic is again fuzzy.

Linguistic and conceptual divides mean that lots of interpretive work is required to find a common classification for legendary beings across religious traditions. To see how this works in practice, I’ll present four examples, which I encourage you to treat with some skepticism.

Abrahamism: Early Judaism was henotheistic, believing in many gods but worshiping only one, Yahweh. It shifted from polytheism to monotheism as it came to see rival magical beings as increasingly unworthy even of foreign worship, identifying gods with evil spirits. At the same time, Yahweh’s heavenly court of gods became identified with Yahweh or with his angelic messengers. And the angels themselves, initially treated as manifestations or incarnations of Yahweh, lost their divine status and became lesser spirits. In Christianity, this process worked both ways, as Jesus acquired a status similar to the original angels’.

Buddhism: Buddhas — particularly in their ‘truth body’ (dharmakāya) — are often ascribed attributes similar to the modern Abrahamist God, whereas devas resemble the gods of Greek mythology. (Following the Hindu Purāṇas, a distinct group of gods, the asuras, were seen as lesser wicked spirits.) However, the word ‘god’ is restricted to the devas so as to highlight buddhas’ unusual features. Upper-case ‘God’ is generally reserved for a creator deity like Brahmā or Īshvara. Since Buddhists believe the world has no beginning, it can be said that they deny ‘God’ (issara) but accept ‘gods’ (devas).

Zoroastrianism: Here, the Indian progression is reversed. The supreme god is an asura (Ahura Mazda), while the daēvas are wicked lesser gods, eventually downgraded to demons and ogres. Such developments are often unpredictable or historically contingent; in Germanic religions, asuras again became the ruling gods, the æsir.

Raëlism: This UFO religion, founded in 1974, has no creator gods, spirits, magic, or souls. It is physicalist and atheistic, emphasizing science over the supernatural. The existence of such religions suggests that skeptical and antireligious movements shouldn’t narrowly focus on ‘atheism’. However, Raëlians do believe that we were intelligently designed by a powerful, benevolent alien race, the ‘Elohim’.

Simulation hypotheses posit even more extraordinary creators than Raëlism does. They suggest that we are in a Matrix-like virtual reality, meaning that our entire universe is the product of a transcendent designer. Yet we do not ordinarily think of powerful aliens or computer programmers as ‘gods’. What sets them apart ultimately rests on the most important criterion I haven’t discussed here: the ‘magical’ or ‘supernatural’ element. Defining the ‘natural’ is a very difficult and messy task, far more problematic than any of the issues I’ve raised above. That story will have to wait for another post.