The reason why people on tumblr over-use the concept of “trigger” rather than just “thing I don’t like” or “thing that makes me angry” or “thing that makes me sad” is that, literally, in the political/fandom part of tumblr culture are required to establish your right not to read a thing, and you only have rights if you can establish that you’re on the bad end of an axis of oppression. Hence, co-opting the language of mental illness: trigger.
i.e. trigger warning culture is a rational response to an environment in which media consumption is mandatory. It’s not hypersensitivity so much as the only way to function.
There is a secondary thing, which is, here we are all oppressed, which ties into the feeling that you only have rights if you can establish that you’re at the bad end of an axis of oppression, but I’m not sure I can totally articulate that thing.
The idea that oppression confers legitimacy does seem to be ascendant, and not just on tumblr. Hostile political debates these days often turn into arguments about which side is the injured party, with both claiming to be unfairly caricatured or oppressed. This is pretty bad if it displaces a substantive exchange of ideas, though it may be hard to fix in a society that’s correcting for bias against oppressed groups. The cure isn’t necessarily worse than the disease, though that’s a question worth looking into, as is the question of whether people can learn to see through false claims of grievance.
On the other hand, I don’t think ‘I will (mostly) disregard your non-triggering aversions’ implies ‘you only have rights to the extent you’re oppressed’. I think the deeper problem is that social interaction between strangers and acquaintances is increasingly taking place in massive common spaces, on public websites.
If we’re trapped in the same common space (e.g., because we have a lot of overlapping interests or friends), an increase in your right to freely say what you want to say inevitably means a decrease in my right to avoid hearing things I don’t want to hear. Increasing my right to only hear what I want to will likewise decrease your right to speak freely; at the very least, you’ll need to add content warnings to the things you write, which puts an increasing workload on writers’ plates as the list of reader aversions they need to keep track of grows longer. (Blogging and social media platforms also make things much more difficult, by forcing trigger warnings and content to compete for space at the start of posts.)
I don’t know of any easy, principled way to solve this problem. Readers can download software that blocks or highlights posts/websites using specific words, such as Tumblr Savior and FB Purity. Writers can adopt content warnings for the most common and most harmful trigger and aversions out there, or the ones that are too vague to be caught by word/phrase blockers.
But vague rules are hard to follow. So it’s understandable that people would gravitate toward a black-and-white ‘trigger’ v. ‘non-trigger’ dichotomy in the hope that the scientific authority and naturalness of a medical category would simplify the problem of deciding when the reader’s right-to-not-hear outweighs the writer’s right-to-speak-freely. And it’s equally understandable that people who don’t have ‘triggers’ in the strictest sense, but are still being harmed in a big way by certain things people say (or ways people say them), will want to piggyback off that heuristic once it exists.
‘Only include content warnings for triggers’ doesn’t work, because ‘trigger’ isn’t a natural kind and people mean different things by it. Give some groups an incentive to broaden the term and others an incentive to narrow it, and language will diverge even more. ‘I’ll only factor medical information into my decisions about how to be nice to people’ is rarely the right approach.
‘Always include content warnings for triggers’ doesn’t work either. There are simply too many things people are triggered by.
If we want rules that are easy to follow in extreme cases while remaining context-sensitive in mild cases, we’ll probably need some combination of
‘Here are the canonical content warnings that everyone should use in public spaces: [A], [B], [C]…’
‘If you have specific reason to think other information will harm part of your audience, the nice thing to do is to have a private conversation with some of those audience members and consider adding more content warnings. If it’s causing a lot of harm to a lot of your audience, adding content warnings transitions from “morally praiseworthy” to “morally obligatory”.’
The ambiguity and context-sensitivity of the second rule is made up for by the very clear and easy-to-follow first rule. Of course, I only provided a schema. The whole point of the first rule is to actually give concrete advice (especially for cases where you don’t know much about your audience). That project requires, if you’re going to do it right, that we collect base rate information on different aversions and triggers, find a not-terrible way of ranking them by ‘suffering caused’, and find a consensus threshold for ‘how much suffering it’s OK for a random content generator to cause in public spaces’.
That wouldn’t obviate the need for safe spaces where the content is more carefully controlled, but it would hopefully make movies, books, social media, etc. safe and enjoyable for nearly everyone, without requiring people to just stop talking about painful topics.
Impulse buying is a thing. We have ready-made clichés for picking it out. Analogously, ‘impulse giving’ is a thing, where you’re spontaneously moved by compassion to help someone out without any advance planning. The problem with most impulse giving is that it gives you the same warm glow and sense of moral license as high-impact giving, without making as much of a difference. Peter Singer puts it best:
My experience with the effective altruism community is that they don’t do much to encourage impulse giving of any kind. If you can give to low-impact charities in the heat of the moment, you should be able to do the same for high-impact charities; yet I think of ‘giving effectively’ as affectively cold, carefully budgeted.
This is probably mostly a good thing. We want people to think carefully about their big decisions, if it improves decision quality. However, the stereotype has its disadvantages. If people think they need to go through a long process of deliberation before they can give, they can end up procrastinating indefinitely. Borrowing Haidt’s analogy, we’re discouraging the elephant (our system-1 emotions and intuitions) from getting passionate and worked up about the most important things we do, while encouraging the elephant’s rider (our system-2 reasoning and deliberation) to overanalyze and agonize over decisions.
Effective altruism as it exists today is aligned with the legions and principalities of Order. I’d bet we can change that in some respects, if we so wish, without giving up our allegiance to Goodness.
Eliezer Yudkowsky suggests that we “purchase fuzzies and utilons separately“. Better to spend some of your time on feel-good do-gooding and some on optimal high-impact do-gooding, rather than pursuing them simultaneously and doing a terrible job at both. In “Harry Potter and the Fuzzies of Altruism“, I noted that there are different kinds of fuzzies people can get for doing good.
One of these varieties is particularly valuable, because it doesn’t need to be purchased separately. I speak of the slytherfuzzy, that warm glow you get from being especially efficient and effective. Do-gooders who find cool, calculated pragmatism strongly motivating in its own right have an obvious leg up. I myself am more motivated by narrative, novelty, and love-of-neighbor than by Winning, but I’d love to find a way to steal that trick and bind my own reward center more tightly to humanitarian accomplishment.
If you’re trying to make yourself (or others) more enthusiastic about purchasing utilons, it may be helpful to make the way you buy utilons as fuzzy-producing as possible. This needn’t dilute the outcome. Select a charity based on a sober cost-benefit analysis, but give chaotically, if chaos happens to gel with your psychology. Impulse giving and effective altruism don’t have to be placed in separate mental boxes forever; we can invent new categories of behavior that wed Chaos Altruism’s giddy spontaneity to Order Altruism’s focus and rigor.
I’d expect mixed approaches to work best. E.g., you can settle on a fixed percentage of your income to give to a high-impact cause every year, but build a habit of giving bonus donations to that cause when the mood strikes you. I’m a big fan of using specific benevolence triggers. For example: ‘When someone on the street asks me for money, and I feel an urge to give them $X, give $X to a high-impact charity (whether or not I also give money to the individual who asked).’ Leah Libresco and Michael Blume make good use of this kind of ‘nudged giving’.
But I think we should also normalize whimsical, untriggered high-impact giving. If we start thinking of evidence-based humanitarianism as the kind of thing you can splurge on, I suspect we’ll come to see do-gooding as more of a fun opportunity and less of a burden.
Some people think of their philanthropy as a personal passion that drives them to excel, as in Holden Karnofsky’s “excited altruism“. Others think of their philanthropy as a universal moral obligation they’re striving to meet, as in Eliezer’s “one life against the world“. Try to fit all philanthropists into the ‘passion’ box, and you’ll get a contingent that feels cut off from what makes this work important; try to fit them all into the ‘obligation’ box, and you’ll get a contingent that feels burdened with a dour or guilt-inducing chore.
Likewise, there are important points of divergence between do-gooders who are motivated by different kinds of warm fuzzy (or hot blazing, or cool gliding, or wiggly sparkling…) feelings. I’m more of an obligation-based altruist, but I still find the ‘excited altruist’ framing useful. That I think in moralistic terms doesn’t say much about the specific feelings that drive me to do good in the moment.
My moralism also leaves open what feelings I should emphasize if I want to transfer my enthusiasm to others.
The ice bucket challenge is an example of memetically successful Chaos Altruism. Ditto today’s Giving Tuesday event, though an annual event is relatively compatible with the reign of Order. Will McAskill and Timothy Ogden have criticized these memes as possibly counterproductive, but it’s not obvious to me that the ineffectiveness of these events stems from their viral or ad-hoc character. Instead, those same attributes could be very valuable if they were targeted at more urgent causes.
McAskill and Ogden draw attention to the fact that charitable donations have been stuck at 2% of U.S. GDP for 40 years now. People (on average) seem to change where they donate, but not how much they donate. One approach to doing better, than, would be to redirect that 2% to worthier interventions.
At the same time, the success of the giving pledge shows that some people can be inspired to increase their donations. Perhaps we haven’t been able to rise above 2% because charities are too busy competing with each other to focus their advertising ingenuity on growing the pie. Perhaps some deep change in people’s mindset is needed; I’ll note that households giving to religious nonprofits donate twice as much. Relatedly, the key may be to shift entire (small) communities to giving more, so giving more is the norm among everyone you know. Then expand those supergiver tribes into neighboring social networks.
Experimenting with playful, unorthodox, and personalized modes of altruism seems like it could be useful for finding ways to make inroads in new communities. Over the next few years, I think we should place more focus on self-experimentation and object-level research than on outreach; but we should still keep in mind that we need a better handle on human motivation if we’re going to completely restructure the way charity is done. For that reason, I’m eager to hear whether any aspiring effective altruists find Chaos approaches attractive.
Vegans: If the meat eaters believed what you did about animal sentience, most of them would be vegans, and they would be horrified by their many previous murders. Your heart-wrenching videos aren’t convincing to them because they aren’t already convinced that animals can feel.
Meat-eaters: Vegans think there are billions of times more people on this planet than you do, they believe you’re eating a lot of those people, and they care about every one of them the way you care about every human. […]
Finally, let me tell you about what happens when you post a heart-wrenching video of apparent animal suffering: It works, if the thing you’re trying to do is make me feel terrible. My brain anthropomorphizes everything at the slightest provocation. Pigs, cows, chickens, mollusks, worms, bacteria, frozen vegetables, and even rocks. And since I know that it’s quite easy to get me to deeply empathize with a pet rock, I know better than to take those feelings as evidence that the apparently suffering thing is in fact suffering. If you posted videos of carrots in factory farms and used the same phrases to describe their miserable lives and how it’s all my fault for making the world this terrible place where oodles of carrots are murdered constantly, I’d feel the same way. So these arguments do not tend to be revelatory of truth.
I’ve argued before that non-human animals’ abilities to self-monitor, learn, collaborate, play, etc. aren’t clear evidence that they have a subjective, valenced point of view on the world. Until we’re confident we know what specific physical behaviors ‘having a subjective point of view’ evolved to produce — what cognitive problem phenomenal consciousness solves — we can’t confidently infer consciousness from the overt behaviors of infants, non-human animals, advanced AI, anesthetized humans, etc.
[I]f you work on AI, and have an intuition that a huge variety of systems can act ‘intelligently’, you may doubt that the linkage between human-style consciousness and intelligence is all that strong. If you think it’s easy to build a robot that passes various Turing tests without having full-fledged first-person experience, you’ll also probably (for much the same reason) expect a lot of non-human species to arrive at strategies for intelligently planning, generalizing, exploring, etc. without invoking consciousness. (Especially if [you think consciousness is very complex]. Evolution won’t put in the effort to make a brain conscious unless it’s extremely necessary for some reproductive advantage.)
That said, I don’t think any of this is even superficially an adequate justification for torturing, killing, and eating human infants, intelligent aliens, or cattle.
The intellectual case against meat-eating is pretty air-tight
To argue from ‘we don’t understand the cognitive basis for consciousness’ to ‘it’s OK to eat non-humans’ is acting as though our ignorance were positive knowledge we could confidently set down our weight on. Even if you have a specific cognitive model that predicts ‘there’s an 80% chance cattle can’t suffer,’ you have to be just as cautious as you’d be about torturing a 20%-likely-to-be-conscious person in a non-vegetative coma, or a 20%-likely-to-be-conscious alien. And that’s before factoring in your uncertainty about the arguments for your model.
The argument for not eating cattle, chickens, etc. is very simple:
1. An uncertainty-about-animals premise, e.g.: We don’t know enough about how cattle cognize, and about what kinds of cognition make things moral patients, to assign a less-than-1-in-20 subjective probability to ‘factory-farmed cattle undergo large quantities of something-morally-equivalent-to-suffering’.
2. An altruism-in-the-face-of-uncertainty premise, e.g.: You shouldn’t do things that have a 1-in-20 (or greater) chance of contributing to large amounts of suffering, unless the corresponding gain is huge. E.g., you shouldn’t accept $100 to flip a switch that 95% of the time does nothing and 5% of the time nonconsensually tortures an adult human for 20 minutes.
3. An eating-animals-doesn’t-have-enormous-benefits premise.
4. An eating-animals-is-causally-linked-to-factory-farming premise.
5. So don’t eat the animals in question.
This doesn’t require us to indulge in anthropomorphism or philosophical speculation. And Brienne’s updates to her post suggest she now agrees a lot of meat-eaters we know assign a non-negligible probability to ‘cattle can suffer’. (Also, kudos to Brienne on not only changing her mind about an emotionally fraught issue extremely rapidly, but also changing the original post. A lot of rationalists who are surprisingly excellent at updating their beliefs don’t seem to fully appreciate the value of updating the easy-to-Google public record of their beliefs to cut off the spread of falsehoods.)
This places intellectually honest meat-eating effective altruists in a position similar to Richard Dawkins':
[I’m] in a very difficult moral position. I think you have a very, very strong point when you say that anybody who eats meat has a very, very strong obligation to think seriously about it. And I don’t find any very good defense. I find myself in exactly the same position as you or I would have been — well, probably you wouldn’t have been, but I might have been — 200 years ago, talking about slavery. […T]here was a time when it was simply the norm. Everybody did it. Some people did it with gusto and relish; other people, like Jefferson, did it reluctantly. I would have probably done it reluctantly. I would have sort of just gone along with what society does. It was hard to defend then, yet everybody did it. And that’s the sort of position I find myself in now. […] I live in a society which is still massively speciesist. Intellectually I recognize that, but I go along with it the same way I go along with celebrating Christmas and singing Christmas carols.
Until I see solid counter-arguments — not just counter-arguments to ‘animals are very likely conscious,’ but to the much weaker formulation needed to justify veg(etari)anism — I’ll assume people are mostly eating meat because it’s tasty and convenient and accepted-in-polite-society, not because they’re morally indifferent to torturing puppies behind closed doors.
Why isn’t LessWrong extremely veg(etari)an?
On the face of it, LessWrong ought to be leading the pack in veg(etari)anism. A lot of LessWrong’s interests and values look like they should directly cash out in a concern for animal welfare:
transhumanism and science fiction: If you think aliens and robots and heavily modified posthumans can be moral patients, you should be more open to including other nonhumans in your circle of concern.
superrationality: Veg(etari)anism benefits from an ability to bind my future self to my commitments, and from a Kantian desire to act as I’d want other philosophically inclined people in my community to act.
utilitarianism: Animals causes are admirably egalitarian and scope-sensitive.
taking ideas seriously: If you’re willing to accept inconvenient conclusions even when they’re based in abstract philosophy, that gives more power to theoretical arguments for worrying about animal cognition even if you can’t detect or imagine that cognition yourself.
distrusting the status quo: Veg(etari)anism remains fairly unpopular, and societal inertia is an obvious reason why.
distrusting ad-hoc intuitions: It may not feel desperately urgent to stop buying hot dogs, but you shouldn’t trust that intuition, because it’s self-serving and vulnerable to e.g. status quo bias. This is a lot of how LessWrong goes about ‘taking ideas seriously'; one should ‘shut up and multiply’ even when a conclusion is counter-intuitive.
Yet only about 15% of LessWrong is vegetarian (compared to 4-13% of the Anglophone world, depending on the survey). By comparison, the average ‘effective altruist’ LessWronger donated $2503 to charity in 2013; 9% of LessWrongers have been to a CFAR class; and 4% of LessWrongers are signed up for cryonics (and another 24% would like to be signed up). These are much larger changes relative to the general population, where maybe 1 in 150,000 people are signed up for cryonics.
I can think of a few reasons for the discrepancy:
(a) Cryonics, existential risk, and other LessWrong-associated ideas have techy, high-IQ associations, in terms of their content and in terms of the communities that primarily endorse them. They’re tribal markers, not just attempts to maximize expected utility; and veg(etari)ans are seen as belonging to other tribes, like progressive political activists and people who just want to hug every cat.
(b) Those popular topics have been strongly endorsed and argued for by multiple community leaders appealing to emotional language and vivid prose. It’s one thing to accept cryonics and vegetarianism as abstract arguments, and another thing to actually change your lifestyle based on the argument; the latter took a lot of active pushing and promotion. (The abstract argument is important; but it’s a necessary condition for action, not a sufficient one. You can’t just say ‘I’m someone who takes ideas seriously’ and magically stop reasoning motivatedly in all contexts.)
(c) Veg(etari)anism isn’t weird and obscure enough. If you successfully sign up for cryonics, LessWrong will treat you like an intellectual and rational elite, a rare person who actually thinks clearly and acts accordingly. If you successfully donate 10% of your income to GiveWell, ditto; even though distributing deworming pills isn’t sexy and futuristic, it’s obscure enough (and supported by enough community leaders, per (b)) that it allows you to successfully signal that you’re special. If 10% of the English-speaking world donated to GiveWell or were signed up for cryonics, my guess is that LessWrongers would be too bored by those topics to rush to sign up even if the cryonics and deworming organizations had scaled up in ways that made marginal dollars more effective. Maybe you’d get 20% to sign up for cryonics, but you wouldn’t get 50% or 90%.
(d) Changing your diet is harder than spending lots of money. Where LessWrongers excel, it’s generally via once-off or sporadic spending decisions that don’t have a big impact on your daily life. (‘Successfully employing CFAR techniques’ may be an exception to this rule, if it involves reinvesting effort every single day or permanently skipping out on things you enjoy; but I don’t know how many LessWrongers do that.)
If those hypotheses are right, it might be possible to shift LessWrong types more toward veganism by improving its status in the community and making the transition to veganism easier and less daunting.
What would make a transhumanist excited about this?
I’ll conclude with various ideas for bridging the motivation gap. Note that it doesn’t follow from ‘the gap is motivational’ that posting a bunch of videos of animal torture to LessWrong or the Effective Altruism Forum is the best way to stir people’s hearts. When intellectual achievement is what you trust and prize, you’re more likely to be moved to action by things that jibe with that part of your identity.
Write stunningly beautiful, rigorous, philosophically sophisticated things that are amazing and great
I’m not primarily thinking of writing really good arguments for veg(etari)anism; as I noted above, the argument is almost too clear-cut. It leaves very little to talk about in any detail, especially if we want something that hasn’t been discussed to death on LessWrong before. However, there are still topics in the vicinity to address, such as ‘What is the current state of the evidence about the nutrition of veg(etari)an diets?’ Use Slate Star Codex as a model, and do your very best to actually portray the state of the evidence, including devoting plenty of attention to any ways veg(etari)an diets might turn out to be unhealthy. (EDIT: Soylent is popular with this demographic and is switching to a vegan recipe, so it might be especially useful to evaluate its nutritional completeness and promote a supplemented Soylent diet.)
In the long run you’ll score more points by demonstrating how epistemically rational and even-handed you are than by making any object-level argument for veg(etari)anism. Not only will you thereby find out more about whether you’re wrong, but you’ll convince rationalists to take these ideas more seriously than if you gave a more one-sided argument in favor of a policy.
Fiction, done right, can serve a similar function. I could imagine someone writing a sci-fi story set in a future where humans have evolved into wildly different species with different perceived rights, thus translating animal welfare questions into a transhumanist idiom.
Just as the biggest risk with a blog post is of being too one-sided, the biggest risk with a story is of being too didactic and persuasion-focused. The goal is not to construct heavy-handed allegories; the goal is to make an actually good story, with moral conflicts you’re genuinely unsure about. Make things that would be worth reading even if you were completely wrong about animal ethics, and as a side-effect you’ll get people interested in the science, the philosophy, and the pragmatics of related causes.
Be positive and concrete
Frame animal welfare activism as an astonishingly promising, efficient, and uncrowded opportunity to do good. Scale back moral condemnation and guilt. LessWrong types can be powerful allies, but the way to get them on board is to give them opportunities to feel like munchkins with rare secret insights, not like latecomers to a not-particularly-fun party who have to play catch-up to avoid getting yelled at. It’s fine to frame helping animals as challenging, but the challenge should be to excel and do something astonishing, not to meet a bare standard for decency.
This doesn’t necessarily mean lowering your standards; if you actually demand more of LessWrongers and effective altruists than you do of ordinary people, you’ll probably do better than if you shot for parity. If you want to change minds in a big way, think like Berwick in this anecdote from Switch:
In 2004, Donald Berwick, a doctor and the CEO of the Institute for Healthcare Improvement (IHI), had some ideas about how to save lives—massive numbers of lives. Researchers at the IHI had analyzed patient care with the kinds of analytical tools used to assess the quality of cars coming off a production line. They discovered that the ‘defect’ rate in health care was as high as 1 in 10—meaning, for example, that 10 percent of patients did not receive their antibiotics in the speciﬁed time. This was a shockingly high defect rate—many other industries had managed to achieve performance at levels of 1 error in 1,000 cases (and often far better). Berwick knew that the high medical defect rate meant that tens of thousands of patients were dying every year, unnecessarily.
Berwick’s insight was that hospitals could beneﬁt from the same kinds of rigorous process improvements that had worked in other industries. Couldn’t a transplant operation be ‘produced’ as consistently and ﬂawlessly as a Toyota Camry?
Berwick’s ideas were so well supported by research that they were essentially indisputable, yet little was happening. He certainly had no ability to force any changes on the industry. IHI had only seventy-ﬁve employees. But Berwick wasn’t deterred.
On December 14, 2004, he gave a speech to a room full of hospital administrators at a large industry convention. He said, ‘Here is what I think we should do. I think we should save 100,000 lives. And I think we should do that by June 14, 2006—18 months from today. Some is not a number; soon is not a time. Here’s the number: 100,000. Here’s the time: June 14, 2006—9 a.m.’
The crowd was astonished. The goal was daunting. But Berwick was quite serious about his intentions. He and his tiny team set out to do the impossible.
IHI proposed six very speciﬁc interventions to save lives. For instance, one asked hospitals to adopt a set of proven procedures for managing patients on ventilators, to prevent them from getting pneumonia, a common cause of unnecessary death. (One of the procedures called for a patient’s head to be elevated between 30 and 45 degrees, so that oral secretions couldn’t get into the windpipe.)
Of course, all hospital administrators agreed with the goal to save lives, but the road to that goal was ﬁlled with obstacles. For one thing, for a hospital to reduce its ‘defect rate,’ it had to acknowledge having a defect rate. In other words, it had to admit that some patients were dying needless deaths. Hospital lawyers were not keen to put this admission on record.
Berwick knew he had to address the hospitals’ squeamishness about admitting error. At his December 14 speech, he was joined by the mother of a girl who’d been killed by a medical error. She said, ‘I’m a little speechless, and I’m a little sad, because I know that if this campaign had been in place four or ﬁve years ago, that Josie would be ﬁne…. But, I’m happy, I’m thrilled to be part of this, because I know you can do it, because you have to do it.’ Another guest on stage, the chair of the North Carolina State Hospital Association, said: ‘An awful lot of people for a long time have had their heads in the sand on this issue, and it’s time to do the right thing. It’s as simple as that.’
IHI made joining the campaign easy: It required only a one-page form signed by a hospital CEO. By two months after Berwick’s speech, over a thousand hospitals had enrolled. Once a hospital enrolled, the IHI team helped the hospital embrace the new interventions. Team members provided research, step-by-step instruction guides, and training. They arranged conference calls for hospital leaders to share their victories and struggles with one another. They encouraged hospitals with early successes to become ‘mentors’ to hospitals just joining the campaign.
The friction in the system was substantial. Adopting the IHI interventions required hospitals to overcome decades’ worth of habits and routines. Many doctors were irritated by the new procedures, which they perceived as constricting. But the adopting hospitals were seeing dramatic results, and their visible successes attracted more hospitals to join the campaign.
Eighteen months later, at the exact moment he’d promised to return—June 14, 2006, at 9 a.m.—Berwick took the stage again to announce the results: ‘Hospitals enrolled in the 100,000 Lives Campaign have collectively prevented an estimated 122,300 avoidable deaths and, as importantly, have begun to institutionalize new standards of care that will continue to save lives and improve health outcomes into the future.’
The crowd was euphoric. Don Berwick, with his 75-person team at IHI, had convinced thousands of hospitals to change their behavior, and collectively, they’d saved 122,300 lives—the equivalent of throwing a life preserver to every man, woman, and child in Ann Arbor, Michigan.
This outcome was the fulfillment of the vision Berwick had articulated as he closed his speech eighteen months earlier, about how the world would look when hospitals achieved the 100,000 lives goal:
‘And, we will celebrate. Starting with pizza, and ending with champagne. We will celebrate the importance of what we have undertaken to do, the courage of honesty, the joy of companionship, the cleverness of a field operation, and the results we will achieve. We will celebrate ourselves, because the patients whose lives we save cannot join us, because their names can never be known. Our contribution will be what did not happen to them. And, though they are unknown, we will know that mothers and fathers are at graduations and weddings they would have missed, and that grandchildren will know grandparents they might never have known, and holidays will be taken, and work completed, and books read, and symphonies heard, and gardens tended that, without our work, would have been only beds of weeds.’
As an added bonus, emphasizing excellence and achievement over guilt and wickedness can decrease the odds that you’ll make people feel hounded or ostracized for not immediately going vegan. I expressed this worry in Virtue, Public and Private, e.g., for people with eating disorders that restrict their dietary choices. This is also an area where ‘just be nice to people’ is surprisingly effective.
If you want to propagate a modest benchmark, consider: “After every meal where you eat an animal, donate $1 to the Humane League.” Seems like a useful way to bootstrap toward veg(etari)anism, and it fits the mix of economic mindfulness and virtue cultivation that a lot of rationalists find appealing. This sort of benchmark is forgiving without being shapeless or toothless. If you want to propagate an audacious vision for the future, consider: “There were 1200 meat-eaters on LessWrong in the 2013 survey; if we could get them to consume 30% less meat from land animals over the next 10 years, we could prevent 100,000 deaths (mostly chickens). Let’s shoot for that.” Combining an audacious vision with a simple, actionable policy should get the best results.
Embrace weird philosophies
Here’s an example of the special flavor LessWrong-style animal activism could develop:
Are there any animal welfare groups that emphasize the abyssal otherness of the nonhuman mind? That talk about the impossible dance, the catastrophe of shapeless silence that lies behind a cute puppy dog’s eyes? As opposed to talking about how ‘sad’ or ‘loving’ the puppies are?
I think I’d have a much, much easier time talking about the moral urgency of animal suffering without my Anthropomorphism Alarms going off if I were part of a community like ‘Lovecraftians for the Ethical Treatment of Animals’.
This is philosophically sound and very relevant, since our uncertainty about animal cognition is our best reason to worry about their welfare. (This is especially true when we consider the possibility that non-humans might suffer more than any human can.) And, contrary to popular misconceptions, the Lovecraftian perspective is more about profound otherness than about nightmarish evil. Rejecting anthropomorphism makes the case for veg(etari)anism stronger; and adopting that sort of emotional distance, paradoxically, is the only way to get LessWrong types interested and the only way to build trust.
Yet when I expressed an interest in this nonstandard perspective on animal well-being, I got responses from effective animal altruists like (paraphrasing):
- ‘Your endorsement of Lovecraftian animal rights sounds like an attack on animal rights; so here’s my defense of the importance of animal rights…’
- ‘No, viewing animal psychology as alien and unknown is scientifically absurd. We know for a fact that dogs and chickens experience human-style suffering. (David Pearce adds: Also lampreys!)’
- ‘That’s speciesist!’
Confidence about animal psychology (in the direction of ‘it’s relevantly human-like’) and extreme uncertainty about animal psychology can both justify prioritizing animal welfare; but when you’re primarily accustomed to seeing uncertainty about animal psychology used as a rationalization for neglecting animals, it will take increasing amounts of effort to keep the policy proposal and the question-of-fact mentally distinct. Encourage more conceptual diversity and pursue more lines of questioning for their own sake, and you end up with a community that’s able to benefit more from cross-pollination with transhumanists and mainline effective altruists and, further, one that’s epistemically healthier.
I’ve been going to Val’s rationality dojo for CFAR workshop alumni, and I found a kind-of-similar-to-this exercise useful:
- List a bunch of mental motions — situational responses, habits, personality traits — you wish you could possess or access at will. Visualize small things you imagine would be different about you if you were making more progress toward your goals.
- Make these skills things you could in principle just start doing right now, like ‘when my piano teacher shuts the door at the end of our weekly lessons, I’ll suddenly find it easy to install specific if-then triggers for what times I’ll practice piano that week.’ Or ‘I’ll become a superpowered If-Then Robot, the kind of person who always thinks to use if-then triggers when she needs to keep up with a specific task.’ Not so much ‘I suddenly become a piano virtuoso’ or ‘I am impervious to projectile weapons’.
- Optionally, think about a name or visualization that would make you personally excited and happy to think and talk about the virtuous disposition you desire. For example, when I think about the feeling of investing in a long-term goal in a manageable, realistic way, one association that spring to mind for me is the word healthy. I also visualize a solid forward motion, with my friends and life-as-a-whole relaxedly keeping pace. If I want to frame this habit as a Powerful Technique, maybe I’ll call it ‘Healthiness-jutsu’.
Here’s a grab bag of other things I’d like to start being better at:
1. casual responsibility – Freely and easily noticing and attending to my errors, faults, and obligations, without melodrama. Keeping my responsibility in view without making a big deal about it, beating myself up, or seeking a Grand Resolution. Just, ‘Yup, those are some of the things on the List. They matter. Next question?’
2. rigorous physical gentleness – My lower back is recovering from surgery. I need to consistently work to incrementally strengthen it, while being very careful not to overdo it. Often this means avoiding fun strenuous exercise, which can cause me to start telling frailty narratives to myself and psych myself out of relatively boring-but-sustainable exercise. So I’m mentally combining the idea of a boot camp with the idea of a luxurious spa: I need to be militaristic and zealous about always pampering and caring for and moderately-enhancing myself, without fail, dammit. It takes grit to be that patient and precise and non-self-destructive.
3. tsuyoku naritai – I am the naive but tenacious-and-hard-working protagonist-with-an-aura-of-destiny in a serial. I’ll face foes beyond my power — cinematic obstacles, yielding interesting, surprising failures — and I’ll learn, and grow. My journey is just beginning. I will become stronger.
4. trust – Disposing of one of my biggest practical obstacles to tsuyoku naritai. Feeling comfortable losing; feeling safe and luminous about vulnerability. Building five-second habits and social ties that make growth-mindset weakness-showing normal.
5. outcome pumping – “What you actually end up doing screens off the clever reason why you’re doing it.” Past a certain point, it just doesn’t matter exactly why or exactly how; it matters what. If I somehow find myself studying mathematics for 25 minutes a day over four months. and that is hugely rewarding, it’s almost beside the point what cognitive process I used to get there. I don’t need to have a big cause or justification for doing the awesome thing; I can just do it. Right now, in fact.
6. do the thing – Where outcome pumping is about ‘get it done and who cares about method’, I associate thing-doing with ‘once I have a plan/method/rule, do that. Follow though.’ You did the thing yesterday? Good. Do the thing today. Thing waits for no man. — — — You’re too [predicate]ish or [adjective]some to do the thing? That’s perfectly fine. Go do the thing.
When I try to visualize a shiny badass hybrid Competence Monster with all of these superpowers, I get something that looks like this. Your memetico-motivational mileage may vary.
7. sword of clear sight – Inner bullshit detector, motivated stopping piercer, etc. A thin blade cleanly divorces my person from unhealthy or not-reflectively-endorsed inner monologues. Martial arts metaphors don’t always work for me, but here they definitely feel right.
8. ferocity – STRIKE right through the obstacle. Roar. Spit fire, smash things, surge ahead. A whipping motion — a sudden SPIKE in focused agency — YES. — YES, IT MUST BE THAT TIME AGAIN. CAPS LOCK FEELS UNBELIEVABLY APPROPRIATE. … LET’S DO THIS.
9. easy response – With a sense of lightness and fluid motion-right-to-completion, immediately execute each small task as it arises. Breathe as normal. No need for a to-do list or burdensome juggling act; with no particular fuss or exertion, it is already done.
10. revisit the mountain – Take a break to look at the big picture. Ponder your vision for the future. Write blog posts like this. I’m the kind of person who benefits a lot from periodically looking back over how I’m doing and coming up with handy new narratives.
These particular examples probably won’t match your own mental associations and goals. I’d like to see your ideas; and feel free to steal from and ruthlessly alter entries on my own or others’ lists!
Effective altruists have been discussing animal welfare rather a lot lately, on a few different levels:
1. object-level: How likely is it that conventional food animals suffer?
2. philanthropic: Compared to other causes, how important is non-human animal welfare? How effective are existing organizations and programs in this area? Should effective altruists concentrate attention and resources here?
3. personal-norm: Is it morally acceptable for an individual to use animal products? How important is it to become a vegetarian or vegan?
4. group-norm: Should effective altruist meetings and conventions serve non-vegan food? Should the effective altruist movement rally to laud vegans and/or try to make all effective altruists go vegan?
These questions are all linked, but I’ll mostly focus on 4. For catered EA events, I think it makes sense to default to vegan food whenever feasible, and order other dishes only if particular individuals request them. I’m not a vegan myself, but I think this sends a positive message — that we respect the strength of vegans’ arguments, and the large stakes if they’re right, more than we care about non-vegans’ mild aesthetic preferences.
My views about trying to make as many EAs as possible go vegan are more complicated. As a demonstration of personal virtue, I’d put ‘become a vegan’ in the same (very rough) category as:
- have no carbon footprint.
- buy no product whose construction involved serious exploitation of labor.
- give 10+% of your income to a worthy cause.
- avoid lifestyle choices that have an unsustainable impact on marine life.
- only use antibiotics as a last (or almost-last) resort, so as not to contribute to antibiotic resistance.
- do your best to start a career in effective altruism.
Arguments could be made that many of these are morally obligatory for nearly all people. And most people dismiss these policies too hastily, overestimating the action’s difficulty and underestimating its urgency. Yet, all the same, I’m not confident any of these is universally obligatory — and I’m confident that it’s not a good idea to issue blanket condemnations of everyone who fails to live up to some or all of the above standards, nor to make these actions minimal conditions for respectable involvement in EA.
People with eating disorders can have good grounds for not immediately going vegan. Immunocompromised people can have good grounds for erring on the side of overusing medicine. People trying to dig their way out of debt while paying for a loved one’s medical bills can have good grounds not to give to charity every year.
The deeper problem with treating these as universal Standards of Basic Decency in our community isn’t that we’d be imposing an unreasonable demand on people. It’s that we’d be forcing lots of people to disclose very sensitive details about their personal lives to a bunch of strangers or to the public Internet — physical disabilities, mental disabilities, personal tragedies, intense aversions…. Putting people into a tight spot is a terrible way to get them on board with any of the above proposals, and it’s a great way to make people feel hounded and unsafe in their social circles.
No one’s suggested casting all non-vegans out of our midst. I have, however, heard recent complaints from people who have disabilities that make it unusually difficult to meet some of the above Standards, and who have become less enthusiastic about EA as a result of feeling socially pressured or harangued by EAs to immediately restructure their personal lives. So I think this is something to be aware of and nip in the bud.
In principle, there’s no crisp distinction between ‘personal life’ and ‘EA activities’. There may be lots of private details about a person’s life that would constitute valuable Bayesian evidence about their character, and there may be lots of private activities whose humanitarian impact over a lifetime adds up to be quite large.
Even taking that into account, we should adopt (quasi-)deontic heuristics like ‘don’t pressure people into disclosing a lot about their spending, eating, etc. habits.’ Ends don’t justify means among humans. For the sake of maximizing expected utility, lean toward not jabbing too much at people’s boundaries, and not making it hard for them to have separate private and public lives — even for the sake of maximizing expected utility.
Edit (9/1): Mason Hartman gave the following criticism of this post:
I think putting people into a tight spot is not only not a terrible way to get people on board with veganism, but basically the only way to make a vegan of anyone who hasn’t already become one on their own by 18. Most people like eating meat and would prefer not to be persuaded to stop doing it. Many more people are aware of the factory-like reality of agriculture in 2014 than are vegans. Quietly making the information available to those who seek it out is the polite strategy, but I don’t think it’s anywhere near the most effective one. I’m not necessarily saying we should trade social comfort for greater efficacy re: animal activism, but this article disappoints in that it doesn’t even acknowledge that there is a tradeoff.
Also, all of our Standards of Basic Decency put an “unreasonable demand” (as defined in Robby’s post) on some people. All of them. That doesn’t necessarily mean we’ve made the wrong decision by having them.
In reply: The strategy that works best for public outreach won’t always be best for friends and collaborators, and it’s the latter I’m talking about. I find it a lot more plausible that open condemnation and aggressive uses of social pressure work well for strangers on the street than that they work well for coworkers, romantic partners, etc. (And I’m pretty optimistic that there are more reliable ways to change the behavior of the latter sorts of people, even when they’re past age 18.)
It’s appropriate to have a different set of norms for people you regularly interact with, assuming it’s a good idea to preserve those relationships. This is especially true when groups and relationships involve complicated personal and professional dynamics. I focused on effective altruism because it’s the sort of community that could be valuable, from an animal-welfare perspective, even if a significant portion of the community makes bad consumer decisions. That makes it likelier that we could agree on some shared group norms even if we don’t yet agree on the same set of philanthropic or individual norms.
I’m not arguing that you shouldn’t try to make all EAs vegans, or get all EAs to give 10+% of their income to charity, or make EAs’ purchasing decisions more labor- or environment-friendly in other respects. At this point I’m just raising a worry that should constrain how we pursue those goals, and hopefully lead to new ideas about how we should promote ‘private’ virtue. I’d expect strategies that are very sensitive to EAs’ privacy and boundaries to work better, in that I’d expect them to make it easier for a diverse community of researchers and philanthropists to grow in size, to grow in trust, to reason together, to progressively alter habits and beliefs, and to get some important work done even when there are serious lingering disagreements within the community.
Richard Loosemore recently wrote an essay criticizing worries about AI safety, “The Maverick Nanny with a Dopamine Drip“. (Subtitle: “Debunking Fallacies in the Theory of AI Motivation”.) His argument has two parts. First:
1. Any AI system that’s smart enough to pose a large risk will be smart enough to understand human intentions, and smart enough to rewrite itself to conform to those intentions.
2. Any such AI will be motivated to edit itself and remove ‘errors’ from its own code. (‘Errors’ is a large category, one that includes all mismatches with programmer intentions.)
3. So any AI system that’s smart enough to pose a large risk will be motivated to spontaneously overwrite its utility function to value whatever humans value.
4. Therefore any powerful AGI will be fully safe / friendly, no matter how it’s designed.
5. Logical AI is brittle and inefficient.
6. Neural-network-inspired AI works better, and we know it’s possible, because it works for humans.
7. Therefore, if we want a domain-general problem-solving machine, we should move forward on Loosemore’s proposal, called ‘swarm relaxation intelligence.’
Combining these two conclusions, we get:
8. Since AI is completely safe — any mistakes we make will be fixed automatically by the AI itself — there’s no reason to devote resources to safety engineering. Instead, we should work as quickly as possible to train smarter and smarter neural networks. As they get smarter, they’ll get better at self-regulation and make fewer mistakes, with the result that accidents and moral errors will become decreasingly likely.
I’m not persuaded by Loosemore’s case for point 2, and this makes me doubt claims 3, 4, and 8. I’ll also talk a little about the plausibility and relevance of his other suggestions.
Does intelligence entail docility?
Loosemore’s claim (also made in an older essay, “The Fallacy of Dumb Superintelligence“) is that an AGI can’t simultaneously be intelligent enough to pose a serious risk, but “unsophisticated” enough to disregard its programmers’ intentions. I replied last year in two blog posts (crossposted to Less Wrong).
In “The AI Knows, But Doesn’t Care” I noted that while Loosemore posits an AGI smart enough to correctly interpret natural language and model human motivation, this doesn’t bridge the gap between the ability to perform a task and the motivation, the agent’s decision criteria. In “The Seed is Not the Superintelligence,” I argued, concerning recursively self-improving AI (seed AI):
When you write the seed’s utility function, you, the programmer, don’t understand everything about the nature of human value or meaning. That imperfect understanding remains the causal basis of the fully-grown superintelligence’s actions, long after it’s become smart enough to fully understand our values.
Why is the superintelligence, if it’s so clever, stuck with whatever meta-ethically dumb-as-dirt utility function we gave it at the outset? Why can’t we just pass the fully-grown superintelligence the buck by instilling in the seed the instruction: ‘When you’re smart enough to understand Friendliness Theory, ditch the values you started with and just self-modify to become Friendly.’?
Because that sentence has to actually be coded in to the AI, and when we do so, there’s no ghost in the machine to know exactly what we mean by ‘frend-lee-ness thee-ree’. Instead, we have to give it criteria we think are good indicators of Friendliness, so it’ll know what to self-modify toward.
My claim is that if we mess up on those indicators of friendliness — the criteria the AI-in-progress uses to care about (i.e., factor into its decisions) self-modification toward safety — then it won’t edit itself to care about those factors later, even if it’s figured out that that’s what we would have wanted (and that doing what we want is part of this ‘friendliness’ thing we failed to program it to value).
Loosemore discussed this with me on Less Wrong and on this blog, then went on to explain his view in more detail in the new essay. His new argument is that MIRI and other AGI theorists and forecasters think “AI is supposed to be hardwired with a Doctrine of Logical Infallibility,” meaning “it is incapable of considering the hypothesis that its own reasoning engine may not have taken it to a sensible place”.
Loosemore thinks that if we reject this doctrine, the AI will “understand that many of its more abstract logical atoms have a less than clear denotation or extension in the world”. In addition to recognizing that its reasoning process is fallible, it will recognize that its understanding of terms is fallible and revisable. This includes terms in its representation of its own goals; so the AI will improve its understanding of what it values over time. Since its programmers’ intention was for the AI to have a positive impact on the world, the AI will increasingly come to understand this fact about its values, and will revise its policies to match its (improved interpretation of its) values.
The main problem with this argument occurs at the phrase “understand this fact about its values”. The sentence starts by talking about the programmers’ values, yet it ends by calling this a fact about the AI’s values.
Consider a human trying to understand her parents’ food preferences. As she develops a better model of what her parents mean by ‘delicious,’ of their taste receptors and their behaviors, she doesn’t necessarily replace her own food preferences with her parents’. If her food choices do change as a result, there will need to be some added mechanism that’s responsible — e.g., she will need a specific goal like ‘modify myself to like what others do’.
We can make the point even stronger by considering minds that are alien to each other. If a human studies the preferences of a nautilus, she probably won’t acquire them. Likewise, a human who studies the ‘preferences’ (selection criteria) of an optimization process like natural selection needn’t suddenly abandon her own. It’s not an impossibility, but it depends on the human’s having a very specific set of prior values (e.g., an obsession with emulating animals or natural processes). For the same reason, most decision criteria a recursively self-improving AI could possess wouldn’t cause it to ditch its own values in favor of ours.
If no amount of insight into biology would make you want to steer clear of contraceptives and optimize purely for reproduction, why expect any amount of insight into human values to compel an AGI to abandon all its hopes and dreams and become a humanist? ‘We created you to help humanity!’ we might protest. Yet if evolution could cry out ‘I created you to reproduce!’, we would be neither rationally obliged nor psychologically impelled to comply. There isn’t any theorem of decision theory or probability theory saying ‘rational agents must promote the same sorts of outcomes as the processes that created them, else fail in formally defined tasks’.
Epistemic and instrumental fallibility v. moral fallibility
I don’t know of any actual AGI researcher who endorses Loosemore’s “Doctrine of Logical Infallibility”. (He equates Muehlhauser and Helm’s “Literalness” doctrine with Infallibility in passing, but the link isn’t clear to me, and I don’t see any argument for the identification. The Doctrine is otherwise uncited.) One of the main organizations he critiques, MIRI, actually specializes in researching formal agents that can’t trust their own reasoning, or can’t trust the reasoning of future versions of themselves. This includes work on logical uncertainty (briefly introduced here, at length here) and ’tiling’ self-modifying agents (here).
Loosemore imagines a programmer chiding an AI for the “design error” of pursuing human-harming goals. The human tells the AI that it should fix this error, since it fixed other errors in its code. But Loosemore is conflating programming errors the human makes with errors of reasoning the AI makes. He’s assuming unargued that flaws in an agent’s epistemic and instrumental rationality are of a kind with defects in its moral character or docility.
Any efficient goal-oriented system has convergent instrumental reasons to fix ‘errors of reasoning’ of the kind that are provably obstacles to its own goals. Bostrom discusses this in “The Superintelligent Will,” and Omohundro discusses it in “Rational Artificial Intelligence for the Greater Good,” under the name ‘Basic AI Drives’.
‘Errors of reasoning,’ in the relevant sense, aren’t just things humans think are bad. They’re general obstacles to achieving any real-world goal, and ‘correct reasoning’ is an attractor for systems (e.g., self-improving humans, institutions, or AIs) that can alter their own ability to achieve such goals. If a moderately intelligent self-modifying program lacks the goal ‘generally avoid confirmation bias’ or ‘generally avoid acquiring new knowledge when it would put my life at risk,’ it will add that goal (or something tantamount to it) to its goal set, because it’s instrumental to almost any other goal it might have started with.
On the other hand, if a moderately intelligent self-modifying AI lacks the goal ‘always and forever do exactly what my programmer would ideally wish,’ the number of goals for which it’s instrumental to add that goal to the set is very small, relative to the space of all possible goals. This is why MIRI is worried about AGI; ‘defer to my programmer’ doesn’t appear to be an attractor goal in the way ‘improve my processor speed’ and ‘avoid jumping off cliffs’ are attractor goals. A system that appears amazingly ‘well-designed’ (because it keeps hitting goal after goal of the latter sort) may be poorly-designed to achieve any complicated outcome that isn’t an instrumental attractor, including safety protocols. This is the basis for disaster scenarios like Bostrom on AI deception.
That doesn’t mean that ‘defer to my programmer’ is an impossible goal. It’s just something we have to do the hard work of figuring out ourselves; we can’t delegate the entire task to the AI. It’s a mathematical open problem to define a way for adaptive autonomous AI with otherwise imperfect motivations to defer to programmer oversight and not look for loopholes in its restrictions. People at MIRI and FHI have been thinking about this issue for the past few years; there’s not much published about the topic, though I notice Yudkowsky mentions issues in this neighborhood off-hand in a 2008 blog post about morality.
Do what I mean by ‘do what I mean’!
Loosemore doesn’t discuss in any technical detail how an AI could come to improve its goals over time, but one candidate formalism is Daniel Dewey’s value learning. Following Dewey’s work, Bostrom notes that this general approach (‘outsource some of the problem to the AI’s problem-solving ability’) is promising, but needs much more fleshing out. Bostrom discusses some potential obstacles to value learning in his new book Superintelligence (pp. 192-201):
[T]he difficulty is not so much how to ensure that the AI can understand human intentions. A superintelligence should easily develop such understanding. Rather, the difficulty is ensuring that the AI will be motivated to pursue the described values in the way we intended. This is not guaranteed by the AI’s ability to understand our intentions: an AI could know exactly what we meant and yet be indifferent to that interpretation of our words (being motivated instead by some other interpretation of the words or being indifferent to our words altogether).
The difficulty is compounded by the desideratum that, for reasons of safety, the correct motivation should ideally be installed in the seed AI before it becomes capable of fully representing human concepts or understanding human intentions.
We do not know how to build a general intelligence whose goals are a stable function of human brain states, or patterns of ink on paper, or any other encoding of our preferences. Moreover, merely making the AGI’s goals a function of brain states or ink marks doesn’t help if we make it the wrong function. If the AGI starts off with the wrong function, there’s no reason to expect it to self-correct in the direction of the right one, because (a) having the right function is a prerequisite for caring about self-modifying toward the relevant kind of ‘rightness,’ and (b) having goals that are an ersatz function of human brain-states or ink marks seems consistent with being superintelligent (e.g., with having veridical world-models).
When Loosemore’s hypothetical programmer attempts to argue her AI into friendliness, the AI replies, “I don’t care, because I have come to a conclusion, and my conclusions are correct because of the Doctrine of Logical Infallibility.” MIRI and FHI’s view is that the AI’s actual reply (assuming it had some reason to reply, and to be honest) would invoke something more like “the Doctrine of Not-All-Children-Assigning-Infinite-Value-To-Obeying-Their-Parents.” The task ‘across arbitrary domains, get an AI-in-progress to defer to its programmers when its programmers dislike what it’s doing’ is poorly understood, and looks extremely difficult. Getting a corrigible AI of that sort to ‘learn’ the right values is a second large problem. Loosemore seems to treat corrigibility as trivial, and to equate corrigibility with all other AGI goal content problems.
A random AGI self-modifying to improve its own efficiency wouldn’t automatically self-modify to acquire the values of its creators. We have to actually do the work of coding the AI to have a safe decision-making subsystem. Loosemore is right that it’s desirable for the AI to incrementally learn over time what its values are, so we can make some use of its intelligence to solve the problem; but raw intelligence on its own isn’t the solution, since we need to do the work of actually coding the AI to value executing the desired interpretation of our instructions.
“Correct interpretation” and “instructions” are both monstrously difficult to turn into lines of code. And, crucially, we can’t pass the buck to the superintelligence here. If you can teach an AI to “do what I mean,” you can proceed to teach it anything else; but if you can’t teach it to “do what I mean,” you can’t get the bootstrapping started. In particular, it’s a pretty sure bet you also can’t teach it “do what I mean by ‘do what I mean'”.
Unless you can teach it to do what you mean, teaching it to understand what you mean won’t help. Even teaching an AI to “do what you believe I mean” assumes that we can turn the complex concept “mean” into code.
I’ll run more quickly through some other points Loosemore makes:
a. He criticizes Legg and Hutter’s definition of ‘intelligence,’ arguing that it trivially applies to an unfriendly AI that self-destructs. However, Legg and Hutter’s definition seems to (correctly) exclude agents that self-destruct. On the face of it, Loosemore should be criticizing MIRI for positing an unintelligent AGI, not for positing a trivially intelligent AGI. For a fuller discussion, see Legg and Hutter’s “A Collection of Definitions of Intelligence“.
b. He argues that safe AGI would be “swarm-like,” with elements that are “unpredictably dependent” on non-representational “internal machinery,” because “logic-based AI” is “brittle”. This seems to contradict the views of many specialists in present-day high-assurance AI systems. As Gerwin Klein writes, “everything that makes it easier for humans to think about a system, will help to verify it.” Indiscriminately adding uncertainty or randomness or complexity to a system makes it harder to model the system and check that it has required properties. It may be less “brittle” in some respects, but we have no particular reason to expect safety to be one of those respects. For a fuller discussion, see Muehlhauser’s “Transparency in Safety-Critical Systems“.
c. MIRI thinks we should try to understand safety-critical general reasoning systems as far in advance as possible, and mathematical logic and rational agent models happen to be useful tools on that front. However, MIRI isn’t invested in “logical AI” in the manner of Good Old-Fashioned AI. Yudkowsky and other MIRI researchers are happy to use neural networks when they’re useful for solving a given problem, and equally happy to use other tools for problems neural networks aren’t well-suited to. For a fuller discussion, see Yudkowsky’s “The Nature of Logic” and “Logical or Connectionist AI?“
d. One undercurrent of Loosemore’s article is that we should model AI after humans. MIRI and FHI worry that this would be very unsafe if it led to neuromorphic AI. On the other hand, modeling AI very closely after human brains (approaching the fidelity of whole-brain emulation) might well be a safer option than de novo AI. For a fuller discussion, see Bostrom’s Superintelligence.
On the whole, Loosemore’s article doesn’t engage much with the arguments of other AI theorists regarding risks from AGI.
Assigning less than 5% probability to ‘cows are moral patients’ strikes me as really overconfident. Ditto, assigning greater than 95% probability. (A moral patient is something that can be harmed or benefited in morally important ways, though it may not be accountable for its actions in the way a moral agent is.)
I’m curious how confident others are, and I’m curious about the most extreme confidence levels they’d consider ‘reasonable’.
I also want to hear more about what theories and backgrounds inform people’s views. I’ve seen some relatively extreme views defended recently, and the guiding intuitions seem to have come from two sources:
(1) How complicated is consciousness? In the space of possible minds, how narrow a target is consciousness?
Humans seem to be able to have very diverse experiences — dreams, orgasms, drug-induced states — that they can remember in some detail, and at least appear to be conscious during. That’s some evidence that consciousness is robust to modification and can take many forms. So, perhaps, we can expect a broad spectrum of animals to be conscious.
But what would our experience look like if it were fragile and easily disrupted? There would probably still be edge cases. And, from inside our heads, it would look like we had amazingly varied possibilities for experience — because we couldn’t use anything but our own experience as a baseline. It certainly doesn’t look like a human brain on LSD differs as much from a normal human brain as a turkey brain differs from a human brain.
There’s some risk that we’re overestimating how robust consciousness is, because when we stumble on one of the many ways to make a human brain unconscious, we (for obvious reasons) don’t notice it as much. Drastic changes in unconscious neurochemistry interest us a lot less than minor tweaks to conscious neurochemistry.
And there’s a further risk that we’ll underestimate the complexity of consciousness because we’re overly inclined to trust our introspection and to take our experience at face value. Even if our introspection is reliable in some domains, it has no access to most of the necessary conditions for experience. So long as they lie outside our awareness, we’re likely to underestimate how parochial and contingent our consciousness is.
(2) How quick are you to infer consciousness from ‘intelligent’ behavior?
People are pretty quick to anthropomorphize superficially human behaviors, and our use of mental / intentional language doesn’t clearly distinguish between phenomenal consciousness and behavioral intelligence. But if you work on AI, and have an intuition that a huge variety of systems can act ‘intelligently’, you may doubt that the linkage between human-style consciousness and intelligence is all that strong. If you think it’s easy to build a robot that passes various Turing tests without having full-fledged first-person experience, you’ll also probably (for much the same reason) expect a lot of non-human species to arrive at strategies for intelligently planning, generalizing, exploring, etc. without invoking consciousness. (Especially if your answer to question 1 is ‘consciousness is very complex’. Evolution won’t put in the effort to make a brain conscious unless it’s extremely necessary for some reproductive advantage.)
… But presumably there’s some intelligent behavior that was easier for a more-conscious brain than for a less-conscious one — at least in our evolutionary lineage, if not in all possible lineages that reproduce our level of intelligence. We don’t know what cognitive tasks forced our ancestors to evolve-toward-consciousness-or-perish. At the outset, there’s no special reason to expect that task to be one that only arose for proto-humans in the last few million years.
Even if we accept that the machinery underlying human consciousness is very complex, that complex machinery could just as easily have evolved hundreds of millions of years ago, rather than tens of millions. We’d then expect it to be preserved in many nonhuman lineages, not just in humans. Since consciousness-of-pain is mostly what matters for animal welfare (not, e.g., consciousness-of-complicated-social-abstractions), we should look into hypotheses like:
first-person consciousness is an adaptation that allowed early brains to represent simple policies/strategies and visualize plan-contingent sensory experiences.
Do we have a specific cognitive reason to think that something about ‘having a point of view’ is much more evolutionarily necessary for human-style language or theory of mind than for mentally comparing action sequences or anticipating/hypothesizing future pain? If not, the data of ethology plus ‘consciousness is complicated’ gives us little reason to favor the one view over the other.
We have relatively direct positive data showing we’re conscious, but we have no negative data showing that, e.g., salmon aren’t conscious. It’s not as though we’d expect them to start talking or building skyscrapers if they were capable of experiencing suffering — at least, any theory that predicts as much has some work to do to explain the connection. At present, it’s far from obvious that the world would look any different than it does even if all vertebrates were conscious.
So… the arguments are a mess, and I honestly have no idea whether cows can suffer. The probability seems large enough to justify ‘don’t torture cows (including via factory farms)’, but that’s a pretty low bar, and doesn’t narrow the probability down much.
To the extent I currently have a favorite position, it’s something like: ‘I’m pretty sure cows are unconscious on any simple, strict, nondisjunctive definition of “consciousness;” but what humans care about is complicated, and I wouldn’t be surprised if a lot of ‘unconscious’ information-processing systems end up being counted as ‘moral patients’ by a more enlightened age. … But that’s a pretty weird view of mine, and perhaps deserves a separate discussion.
I could conclude with some crazy video of a corvid solving a rubik’s cube or an octopus breaking into a bank vault or something, but I somehow find this example of dog problem-solving more compelling:
Oxford philosopher Nick Bostrom has argued, in “The Superintelligent Will,” that advanced AIs are likely to diverge in their terminal goals (i.e., their ultimate decision-making criteria), but converge in some of their instrumental goals (i.e., the policies and plans they expect to indirectly further their terminal goals). An arbitrary superintelligent AI would be mostly unpredictable, except to the extent that nearly all plans call for similar resources or similar strategies. The latter exception may make it possible for us to do some long-term planning for future artificial agents.
Bostrom calls the idea that AIs can have virtually any goal the orthogonality thesis, and he calls the idea that there are attractor strategies shared by almost any goal-driven system (e.g., self-preservation, knowledge acquisition) the instrumental convergence thesis.
Bostrom fleshes out his worries about smarter-than-human AI in the book Superintelligence: Paths, Dangers, Strategies, which came out in the US a few days ago. He says much more there about the special technical and strategic challenges involved in general AI. Here’s one of the many scenarios he discusses, excerpted:
[T]he orthogonality thesis suggests that we cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans — scientific curiosity, benevolent concern for others, spiritual enlightenment and contemplation, renunciation of material acquisitiveness, a taste for refined culture or for the simple pleasures in life, humility and selflessness, and so forth. We will consider later whether it might be possible through deliberate effort to construct a superintelligence that values such things, or to build one that values human welfare, moral goodness, or any other complex purpose its designers might want it to serve. But it is no less possible — and in fact technically a lot easier — to build a superintelligence that places final value on nothing but calculating the decimal expansion of pi. This suggests that — absent a specific effort — the first superintelligence may have some such random or reductionistic final goal.
[… T]he instrumental convergence thesis entails that we cannot blithely assume that a superintelligence with the final goal of calculating the decimals of pi (or making paperclips, or counting grains of sand) would limit its activities in such a way as not to infringe on human interests. An agent with such a final goal would have a convergent instrumental reason, in many situations, to acquire an unlimited amount of physical resources and, if possible, to eliminate potential threats to itself and its goal system. Human beings might constitute potential threats; they certainly constitute physical resources. […]
It might seem incredible that a project would build or release an AI into the world without having strong grounds for trusting that the system will not cause an existential catastrophe. It might also seem incredible, even if one project were so reckless, that wider society would not shut it down before it (or the AI it was building) attains a decisive strategic advantage. But as we shall see, this is a road with many hazards. […]
With the help of the concept of convergent instrumental value, we can see the flaw in one idea for how to ensure superintelligence safety. The idea is that we validate the safety of a superintelligent AI empirically by observing its behavior while it is in a controlled, limited environment (a “sandbox”) and that we only let the AI out of the box if we see it behaving in a friendly, cooperative, responsible manner.
The flaw in this idea is that behaving nicely while in the box is a convergent instrumental goal for friendly and unfriendly AIs alike. An unfriendly AI of sufficient intelligence realizes that its unfriendly final goals will be best realized if it behaves in a friendly manner initially, so that it will be let out of the box. It will only start behaving in a way that reveals its unfriendly nature when it no longer matters whether we find out; that is, when the AI is strong enough that human opposition is ineffectual.
Consider also a related set of approaches that rely on regulating the rate of intelligence gain in a seed AI by subjecting it to various kinds of intelligence tests or by having the AI report to its programmers on its rate of progress. At some point, an unfriendly AI may become smart enough to realize that it is better off concealing some of its capability gains. It may underreport on its progress and deliberately flunk some of the harder tests, in order to avoid causing alarm before it has grown strong enough to attain a decisive strategic advantage. The programmers may try to guard against this possibility by secretly monitoring the AI’s source code and the internal workings of its mind; but a smart-enough AI would realize that it might be under surveillance and adjust its thinking accordingly. The AI might find subtle ways of concealing its true capabilities and its incriminating intent. (Devising clever escape plans might, incidentally, also be a convergent strategy for many types of friendly AI, especially as they mature and gain confidence in their own judgments and capabilities. A system motivated to promote our interests might be making a mistake if it allowed us to shut it down or to construct another, potentially unfriendly AI.)
We can thus perceive a general failure mode, wherein the good behavioral track record of a system in its juvenile stages fails utterly to predict its behavior at a more mature stage. Now, one might think that the reasoning described above is so obvious that no credible project to develop artificial general intelligence could possibly overlook it. But one should not be too overconfident that this is so.
Consider the following scenario. Over the coming years and decades, AI systems become gradually more capable and as a consequence find increasing real-world application: they might be used to operate trains, cars, industrial and household robots, and autonomous military vehicles. We may suppose that this automation for the most part has the desired effects, but that the success is punctuated by occasional mishaps — a driverless truck crashes into oncoming traffic, a military drone fires at innocent civilians. Investigations reveal the incidents to have been caused by judgment errors by the controlling AIs. Public debate ensues. Some call for tighter oversight and regulation, others emphasize the need for research and better-engineered systems — systems that are smarter and have more common sense, and that are less likely to make tragic mistakes. Amidst the din can perhaps also be heard the shrill voices of doomsayers predicting many kinds of ill and impending catastrophe. Yet the momentum is very much with the growing AI and robotics industries. So development continues, and progress is made. As the automated navigation systems of cars become smarter, they suffer fewer accidents; and as military robots achieve more precise targeting, they cause less collateral damage. A broad lesson is inferred from these observations of real-world outcomes: the smarter the AI, the safer it is. It is a lesson based on science, data, and statistics, not armchair philosophizing. Against this backdrop, some group of researchers is beginning to achieve promising results in their work on developing general machine intelligence. The researchers are carefully testing their seed AI in a sandbox environment, and the signs are all good. The AI’s behavior inspires confidence — increasingly so, as its intelligence is gradually increased.
At this point, any remaining Cassandra would have several strikes against her:
i A history of alarmists predicting intolerable harm from the growing capabilities of robotic systems and being repeatedly proven wrong. Automation has brought many benefits and has, on the whole, turned out safer than human operation.
ii A clear empirical trend: the smarter the AI, the safer and more reliable it has been. Surely this bodes well for a project aiming at creating machine intelligence more generally smart than any ever built before — what is more, machine intelligence that can improve itself so that it will become even more reliable.
iii Large and growing industries with vested interests in robotics and machine intelligence. These fields are widely seen as key to national economic competitiveness and military security. Many prestigious scientists have built their careers laying the groundwork for the present applications and the more advanced systems being planned.
iv A promising new technique in artificial intelligence, which is tremendously exciting to those who have participated in or followed the research. Although safety issues and ethics are debated, the outcome is preordained. Too much has been invested to pull back now. AI researchers have been working to get to human-level artificial intelligence for the better part of a century: of course there is no real prospect that they will now suddenly stop and throw away all this effort just when it finally is about to bear fruit.
v The enactment of some safety rituals, whatever helps demonstrate that the participants are ethical and responsible (but nothing that significantly impedes the forward charge).
vi A careful evaluation of seed AI in a sandbox environment, showing that it is behaving cooperatively and showing good judgment. After some further adjustments, the test results are as good as they could be. It is a green light for the final step . . .
And so we boldly go — into the whirling knives.
We observe here how it could be the case that when dumb, smarter is safe; yet when smart, smarter is more dangerous. There is a kind of pivot point, at which a strategy that has previously worked excellently suddenly starts to backfire.
For more on terminal goal orthogonality, see Stuart Armstrong’s “General Purpose Intelligence“. For more on instrumental goal convergence, see Steve Omohundro’s “Rational Artificial Intelligence for the Greater Good“.
Eliezer Yudkowsky has written a delightful series of posts (originally on the economics blog Overcoming Bias) about why partisan debates are so frequently hostile and unproductive. Particularly incisive is A Fable of Science and Politics.
One of the broader points Eliezer makes is that, while political issues are important, political discussion isn’t the best place to train one’s ability to look at issues objectively and update on new evidence. The way I’d put it is that politics is hard mode; it takes an extraordinary amount of discipline and skill to communicate effectively in partisan clashe.
This jibes with my own experience; I’m much worse at arguing politics than at arguing other things. And psychological studies indicate that politics is hard mode even (or especially!) for political veterans; see Taber & Lodge (2006).
Eliezer’s way of putting the same point is (riffing off of Dune): ‘Politics is the Mind-Killer.’ An excerpt from that blog post:
Politics is an extension of war by other means. Arguments are soldiers. Once you know which side you’re on, you must support all arguments of that side, and attack all arguments that appear to favor the enemy side; otherwise it’s like stabbing your soldiers in the back — providing aid and comfort to the enemy. […]
I’m not saying that I think Overcoming Bias should be apolitical, or even that we should adopt Wikipedia’s ideal of the Neutral Point of View. But try to resist getting in those good, solid digs if you can possibly avoid it. If your topic legitimately relates to attempts to ban evolution in school curricula, then go ahead and talk about it — but don’t blame it explicitly on the whole Republican Party; some of your readers may be Republicans, and they may feel that the problem is a few rogues, not the entire party. As with Wikipedia’s NPOV, it doesn’t matter whether (you think) the Republican Party really is at fault. It’s just better for the spiritual growth of the community to discuss the issue without invoking color politics.
Scott Alexander fleshes out why it can be dialogue-killing to attack big groups (even when the attack is accurate) in another blog post, Weak Men Are Superweapons. And Eliezer expands on his view of partisanship in follow-up posts like The Robbers Cave Experiment and Hug the Query.
Some people involved in political advocacy and activism have objected to the “mind-killer” framing. Miri Mogilevsky of Brute Reason explained on Facebook:
My usual first objection is that it seems odd to single politics out as a “mind-killer” when there’s plenty of evidence that tribalism happens everywhere. Recently, there has been a whole kerfuffle within the field of psychology about replication of studies. Of course, some key studies have failed to replicate, leading to accusations of “bullying” and “witch-hunts” and what have you. Some of the people involved have since walked their language back, but it was still a rather concerning demonstration of mind-killing in action. People took “sides,” people became upset at people based on their “sides” rather than their actual opinions or behavior, and so on.
Unless this article refers specifically to electoral politics and Democrats and Republicans and things (not clear from the wording), “politics” is such a frightfully broad category of human experience that writing it off entirely as a mind-killer that cannot be discussed or else all rationality flies out the window effectively prohibits a large number of important issues from being discussed, by the very people who can, in theory, be counted upon to discuss them better than most. Is it “politics” for me to talk about my experience as a woman in gatherings that are predominantly composed of men? Many would say it is. But I’m sure that these groups of men stand to gain from hearing about my experiences, since some of them are concerned that so few women attend their events.
In this article, Eliezer notes, “Politics is an important domain to which we should individually apply our rationality — but it’s a terrible domain in which to learn rationality, or discuss rationality, unless all the discussants are already rational.” But that means that we all have to individually, privately apply rationality to politics without consulting anyone who can help us do this well. After all, there is no such thing as a discussant who is “rational”; there is a reason the website is called “Less Wrong” rather than “Not At All Wrong” or “Always 100% Right.” Assuming that we are all trying to be more rational, there is nobody better to discuss politics with than each other.
The rest of my objection to this meme has little to do with this article, which I think raises lots of great points, and more to do with the response that I’ve seen to it — an eye-rolling, condescending dismissal of politics itself and of anyone who cares about it. Of course, I’m totally fine if a given person isn’t interested in politics and doesn’t want to discuss it, but then they should say, “I’m not interested in this and would rather not discuss it,” or “I don’t think I can be rational in this discussion so I’d rather avoid it,” rather than sneeringly reminding me “You know, politics is the mind-killer,” as though I am an errant child. I’m well-aware of the dangers of politics to good thinking. I am also aware of the benefits of good thinking to politics. So I’ve decided to accept the risk and to try to apply good thinking there. […]
I’m sure there are also people who disagree with the article itself, but I don’t think I know those people personally. And to add a political dimension (heh), it’s relevant that most non-LW people (like me) initially encounter “politics is the mind-killer” being thrown out in comment threads, not through reading the original article. My opinion of the concept improved a lot once I read the article.
In the same thread, Andrew Mahone added, “Using it in that sneering way, Miri, seems just like a faux-rationalist version of ‘Oh, I don’t bother with politics.’ It’s just another way of looking down on any concerns larger than oneself as somehow dirty, only now, you know, rationalist dirty.” To which Miri replied: “Yeah, and what’s weird is that that really doesn’t seem to be Eliezer’s intent, judging by the eponymous article.”
Eliezer clarified that by “politics” he doesn’t generally mean ‘problems that can be directly addressed in local groups but happen to be politically charged':
Hanson’s “Tug the Rope Sideways” principle, combined with the fact that large communities are hard to personally influence, explains a lot in practice about what I find suspicious about someone who claims that conventional national politics are the top priority to discuss. Obviously local community matters are exempt from that critique! I think if I’d substituted ‘national politics as seen on TV’ in a lot of the cases where I said ‘politics’ it would have more precisely conveyed what I was trying to say.
Even if polarized local politics is more instrumentally tractable, though, the worry remains that it’s a poor epistemic training ground. A subtler problem with banning “political” discussions on a blog or at a meet-up is that it’s hard to do fairly, because our snap judgments about what counts as “political” may themselves be affected by partisan divides. In many cases the status quo is thought of as apolitical, even though objections to the status quo are ‘political.’ (Shades of Pretending to be Wise.)
Because politics gets personal fast, it’s hard to talk about it successfully. But if you’re trying to build a community, build friendships, or build a movement, you can’t outlaw everything ‘personal.’ And selectively outlawing personal stuff gets even messier. Last year, daenerys shared anonymized stories from women, including several that discussed past experiences where the writer had been attacked or made to feel unsafe. If those discussions are made off-limits because they’re ‘political,’ people may take away the message that they aren’t allowed to talk about, e.g., some harmful or alienating norm they see at meet-ups. I haven’t seen enough discussions of this failure mode to feel super confident people know how to avoid it.
Since this is one of the LessWrong memes that’s most likely to pop up in discussions between different online communities (along with the even more ripe-for-misinterpretation “policy debates should not appear one-sided“…), as a first (very small) step, I suggest obsoleting the ‘mind-killer’ framing. It’s cute, but ‘politics is hard mode’ works better as a meme to interject into random conversations. ∵:
1. ‘Politics is hard mode’ emphasizes that ‘mind-killing’ (= epistemic difficulty) is quantitative, not qualitative. Some things might instead fall under Very Hard Mode, or under Middlingly Hard Mode…
2. ‘Hard’ invites the question ‘hard for whom?’, more so than ‘mind-killer’ does. We’re all familiar with the fact that some people and some contexts change what’s ‘hard’, so it’s a little less likely we’ll universally generalize about what’s ‘hard.’
3. ‘Mindkill’ connotes contamination, sickness, failure, weakness. ‘Hard Mode’ doesn’t imply that a thing is low-status or unworthy, so it’s less likely to create the impression (or reality) that LessWrongers or Effective Altruists dismiss out-of-hand the idea of hypothetical-political-intervention-that-isn’t-a-terrible-idea. Maybe some people do want to argue for the thesis that politics is always useless or icky, but if so it should be done in those terms, explicitly — not snuck in as a connotation.
4. ‘Hard Mode’ can’t readily be perceived as a personal attack. If you accuse someone of being ‘mindkilled’, with no context provided, that clearly smacks of insult — you appear to be calling them stupid, irrational, deluded, or similar. If you tell someone they’re playing on ‘Hard Mode,’ that’s very nearly a compliment, which makes your advice that they change behaviors a lot likelier to go over well.
5. ‘Hard Mode’ doesn’t carry any risk of evoking (e.g., gendered) stereotypes about political activists being dumb or irrational or overemotional.
6. ‘Hard Mode’ encourages a growth mindset. Maybe some topics are too hard to ever be discussed. Even so, ranking topics by difficulty still encourages an approach where you try to do better, rather than merely withdrawing. It may be wise to eschew politics, but we should not fear it. (Fear is the mind-killer.)
If you and your co-conversationalists haven’t yet built up a lot of trust and rapport, or if tempers are already flaring, conveying the message ‘I’m too rational to discuss politics’ or ‘You’re too irrational to discuss politics’ can make things worse. ‘Politics is the mind-killer’ is the mind-killer. At least, it’s a relatively mind-killing way of warning people about epistemic hazards.
‘Hard Mode’ lets you communicate in the style of the Humble Aspirant rather than the Aloof Superior. Try something in the spirit of: ‘I’m worried I’m too low-level to participate in this discussion; could you have it somewhere else?’ Or: ‘Could we talk about something closer to Easy Mode, so we can level up together?’ If you’re worried that what you talk about will impact group epistemology, I think you should be even more worried about how you talk about it.