A non-technical introduction to AI risk

In the summer of 2008, experts attending the Global Catastrophic Risk Conference assigned a 5% probability to the human species’ going extinct due to “superintelligent AI” by the year 2100. New organizations, like the Centre for the Study of Existential Risk and the Machine Intelligence Research Institute, are springing up to face the challenge of an AI apocalypse. But what is artificial intelligence, and why do people think it’s dangerous?

As it turns out, studying AI risk is useful for gaining a deeper understanding of philosophy of mind and ethics, and a lot of the general theses are accessible to non-experts. So I’ve gathered here a list of short, accessible, informal articles, mostly written by Eliezer Yudkowsky, to serve as a philosophical crash course on the topic. The first half will focus on what makes something intelligent, and what an Artificial General Intelligence is. The second half will focus on what makes such an intelligence ‘friendly’ — that is, safe and useful — and why this matters.


Part I. Building intelligence.

An artificial intelligence is any program or machine that can autonomously and efficiently complete a complex task, like Google Maps or a Xerox machine. One of the largest obstacles to assessing AI risk is overcoming anthropomorphism, the tendency to treat non-humans as though they were quite human-like. Because AIs have complex goals and behaviors, it’s especially difficult not to think of them as people. Having a better understanding of where human intelligence comes from, and how it differs from other complex processes, is an important first step in approaching this challenge with fresh eyes.

1. Power of Intelligence. Why is intelligence important?

2. Ghosts in the Machine. Is building an intelligence from scratch like talking to a person?

3. Artificial Addition. What can we conclude about the nature of intelligence from the fact that we don’t yet understand it?

4. Adaptation-Executers, not Fitness-Maximizers. How do human goals relate to the ‘goals’ of evolution?

5. The Blue-Minimizing Robot. What are the shortcomings of thinking of things as ‘agents’, ‘intelligences’, or ‘optimizers’ with defined values/goals/preferences?

Part II. Intelligence explosion.

Forecasters are worried about Artificial General Intelligence (AGI), an AI that, like a human, can achieve a wide variety of different complex aims. An AGI could think faster than a human, making it better at building new and improved AGI — which would be better still at designing AGI. As this snowballed, AGI would improve itself faster and faster, becoming increasingly unpredictable and powerful as its design changed. The worry is that we’ll figure out how to make self-improving AGI before we figure out how to safety-proof every link in this chain of AGI-built AGIs.

6. Optimization and the Singularity. What is optimization? As optimization processes, how do evolution, humans, and self-modifying AGI differ?

7. Efficient Cross-Domain Optimization. What is intelligence?

8. The Design Space of Minds-In-General. What else is universally true of intelligences?

9. Plenty of Room Above Us. Why should we expect self-improving AGI to quickly become superintelligent?

Part III. AI risk.

In the Prisoner’s Dilemma, it’s better for both players to cooperate than for both to defect; and we have a natural disdain for human defectors. But an AGI is not a human; it’s just a process that increases its own ability to produce complex, low-probability situations. It doesn’t necessarily experience joy or suffering, doesn’t necessarily possess consciousness or personhood. When we treat it like a human, we not only unduly weight its own artificial ‘preferences’ over real human preferences, but also mistakenly assume that an AGI is motivated by human-like thoughts and emotions. This makes us reliably underestimate the risk involved in engineering an intelligence explosion.

10. The True Prisoner’s Dilemma. What kind of jerk would Defect even knowing the other side Cooperated?

11. Basic AI drives. Why are AGIs dangerous even when they’re indifferent to us?

12. Anthropomorphic Optimism. Why do we think things we hope happen are likelier?

13. The Hidden Complexity of Wishes. How hard is it to directly program an alien intelligence to enact my values?

14. Magical Categories. How hard is it to program an alien intelligence to reconstruct my values from observed patterns?

15. The AI Problem, with Solutions. How hard is it to give AGI predictable values of any sort? More generally, why does AGI risk matter so much?

Part IV. Ends.

A superintelligence has the potential not only to do great harm, but also to greatly benefit humanity. If we want to make sure that whatever AGIs people make respect human values, then we need a better understanding of what those values actually are. Keeping our goals in mind will also make it less likely that we’ll despair of solving the Friendliness problem. The task looks difficult, but we have no way of knowing how hard it will end up being until we’ve invested more resources into safety research. Keeping in mind how much we have to gain, and to lose, advises against both cynicism and complacency.

16. Could Anything Be Right? What do we mean by ‘good’, or ‘valuable’, or ‘moral’?

17. Morality as Fixed Computation. Is it enough to have an AGI improve the fit between my preferences and the world?

18. Serious Stories. What would a true utopia be like?

19. Value is Fragile. If we just sit back and let the universe do its thing, will it still produce value? If we don’t take charge of our future, won’t it still turn out interesting and beautiful on some deeper level?

20. The Gift We Give To Tomorrow. In explaining value, are we explaining it away? Are we making our goals less important?

In conclusion, a summary of the core argument: Five theses, two lemmas, and a couple of strategic implications.


If you’re convinced, MIRI has put together a list of ways you can get involved in promoting AI safety research. You can also share this post and start conversations about it, to put the issue on more people’s radars. If you want to read on, check out the more in-depth articles below.


Further reading



Poets say science takes away from the beauty of the stars — mere globs of gas atoms.

Nothing is “mere.”

I too can see the stars on a desert night, and feel them. But do I see less or more? The vastness of the heavens stretches my imagination — stuck on this carousel my little eye can catch one-million-year-old light. A vast pattern — of which I am a part — perhaps my stuff was belched from some forgotten star, as one is belching there. Or see them with the greater eye of Palomar, rushing all apart from some common starting point when they were perhaps all together.

What is the pattern, or the meaning, or the why? It does not do harm to the mystery to know a little about it. For far more marvelous is the truth than any artists of the past imagined! Why do the poets of the present not speak of it? What men are poets who can speak of Jupiter if he were like a man, but if he is an immense spinning sphere of methane and ammonia must be silent?


Nothing is mere?

Nothing? That can’t be right. One might as well proclaim that nothing is big. Or that nothing is undelicious.

What could that even mean? It sounds… arbitrary. Frivolous. An insult to the extraordinary.

But there’s a whisper of a lesson here. Value is arbitrary. It’s just what moves us. And the stars are lawless. And they nowhere decree what we ought to weep for, fight for, rejoice in. Love and terror, nausea and grace — these are born in us, not in the lovely or the terrible. ‘Arbitrary’ itself first meant ‘according to one’s will’. And by that standard nothing could be more arbitrary than the will itself.

Richard Feynman saw that mereness comes from our attitudes, our perspectives on things. And those can change. (With effort, and with time.) Sometimes the key to appreciating the world is to remake it in our image, draw out of it an architecture deserving our reverence and joy. But sometimes the key is to reshape ourselves. Sometimes the things we should prize are already hidden in the world, and we have only to unblind ourselves to some latent dimension of merit.

Our task of tasks is to create a correspondence between our values and our world. But to do that, we must bring our values into harmony with themselves. And to do that, we must come to know ourselves.

Through Nothing Is Mere, I want to come to better understand the relationship between the things we care about and the things we believe. The topics I cover will vary wildly, but should all fall under four humanistic umbrellas.

  • Epistemology: What is it reasonable for us to believe? How do we make our beliefs more true, and why does truth matter?
  • Philosophy of Mind: What are we? Can we rediscover our most cherished and familiar concepts of ourselves in the great unseeing cosmos?
  • Value Theory: What is the nature of our moral, prudential, aesthetic, and epistemic norms? Which of our values run deepest?
  • Applied Philosophy: What now? How do we bring all of the above to bear on our personal development, our relationships, our discourse, our political and humanitarian goals?

Saying a little about my background in existential philosophy should go a long way toward explaining why I’m so interested in the project of humanizing Nature, and of naturalizing our humanity.

Two hundred years ago yesterday, the Danish theologian Søren Kierkegaard was born. SK was a reactionary romantic, a navel-gazing amoralist, an anti-scientific irrationalist, a gadfly, a child. But, for all that, he came to wisdom in a way very few do.

It sounds strange, but the words his hands penned taught me how to take my own life seriously. He forced me to see that my life’s value, at each moment, had to come from itself. And that it did. I really do care for myself, and I care for this world, and I need no one’s permission, no authority’s approval, to render my values legitimate.

SK feared the furious apathy of the naturalists, the Hegelians, the listless Christian throngs. He saw with virtuosic clarity the subjectivity of value, saw the value of subjectivity, saw the value of value itself. He saw that it is a species of madness to refuse in any way to privilege your own perspective, to value scientific objectivity so completely that the human preferences that make that objectivity worthwhile get lost in a fog, objectivity becoming an end in itself rather than a tool for realizing the things we cherish.

The path of objective reflection makes the subject accidental, and existence thereby into something indifferent, vanishing. Away from the subject, the path of reflection leads to the objective truth, and while the subject and his subjectivity become indifferent, the truth becomes that too, and just this is its objective validity; because interest, just like decision, is rooted in subjectivity. The path of objective reflection now leads to abstract thinking, to mathematics, to historical knowledge of various kinds, and always leads away from the subject, whose existence or non-existence becomes, and from the objective point of view quite rightly, infinitely indifferent[…. I]n so far as the subject fails to become wholly indifferent to himself, this only shows that his objective striving is not sufficiently objective.

But SK’s corrective was to endorse a rival lunacy. Fearing the world’s scientific mereness, its alien indifference, he fled from the world.

If there were no eternal consciousness in a man, if at the bottom of everything there were only a wild ferment, a power that twisting in dark passions produced everything great or inconsequential; if an unfathomable, insatiable emptiness lay hid beneath everything, what then would life be but despair? If it were thus, if there were no sacred bond uniting mankind, if one generation rose up after another like the leaves of the forest, if one generation succeeded the other as the songs of birds in the woods, if the human race passed through the world as a ship through the sea or the wind through the desert, a thoughtless and fruitless whim, if an eternal oblivion always lurked hungrily for its prey and there were no power strong enough to wrest it from its clutches — how empty and devoid of comfort would life be! But for that reason it is not so[.]

SK shared Feynman’s worry about the poet who cannot bring himself to embrace the merely real. He wanted to transform himself into the sort of person who could love himself, and love the world, purely and completely. But he simply couldn’t do it. So he cast himself before a God that would be for him the perfect lover, the perfect beloved, everything he wished he were.

[H]e sees in secret and recognizes distress and counts the tears and forgets nothing.

But everything moves you, and in infinite love. Even what we human beings call a trifle and unmoved pass by, the sparrow’s need, that moves you; what we so often scarcely pay attention to, a human sigh, that moves you, Infinite Love.

To SK’s God, it all matters. But SK’s God is a God of solitude and self-deception. Striving for perfect Subjectivity leads to confusion and despair, just as surely as does striving for perfect, impersonal Objectivity. SK saw that we are the basis for the poetry of the world. What he sought in fantasy, we have now to discover — to create — in our shared world, our home.

Five years have passed, and I still return to Kierkegaard’s secret. He reminds me of what this is all for. We’re doing this for us, and it is we, at last, who must define our ends. I remain in his debt for that revelation. Asleep, I did not notice myself. Within a dream, I feel him shaking me awake with a terrifying urgency — and I wake, and it is night, and I am alone again with the light of the stars.