In a December 14 comment on his blog, Scott Aaronson confessed that the idea that he gains privilege from being a man feels ‘alien to his lived experience’. Generalizing from his own story, Aaronson suggested that it makes more sense to think of shy nerdy males as a disprivileged group than as a privileged one, because such men are unusually likely to be socially isolated and stigmatized, and to suffer from mental health problems.
Here’s the thing: I spent my formative years—basically, from the age of 12 until my mid-20s—feeling not “entitled,” not “privileged,” but terrified. I was terrified that one of my female classmates would somehow find out that I sexually desired her, and that the instant she did, I would be scorned, laughed at, called a creep and a weirdo, maybe even expelled from school or sent to prison. You can call that my personal psychological problem if you want, but it was strongly reinforced by everything I picked up from my environment: to take one example, the sexual-assault prevention workshops we had to attend regularly as undergrads, with their endless lists of all the forms of human interaction that “might be” sexual harassment or assault, and their refusal, ever, to specify anything that definitely wouldn’t be sexual harassment or assault. I left each of those workshops with enough fresh paranoia and self-hatred to last me through another year. […]
Of course, I was smart enough to realize that maybe this was silly, maybe I was overanalyzing things. So I scoured the feminist literature for any statement to the effect that my fears were as silly as I hoped they were. But I didn’t find any. On the contrary: I found reams of text about how even the most ordinary male/female interactions are filled with “microaggressions,” and how even the most “enlightened” males—especially the most “enlightened” males, in fact—are filled with hidden entitlement and privilege and a propensity to sexual violence that could burst forth at any moment.
Because of my fears—my fears of being “outed” as a nerdy heterosexual male, and therefore as a potential creep or sex criminal—I had constant suicidal thoughts. As Bertrand Russell wrote of his own adolescence: “I was put off from suicide only by the desire to learn more mathematics.” At one point, I actually begged a psychiatrist to prescribe drugs that would chemically castrate me (I had researched which ones), because a life of mathematical asceticism was the only future that I could imagine for myself.
The two main responses have been Laurie Penny’s “On nerd entitlement” and Amanda Marcotte’s “MIT professor explains: The real oppression is having to learn to talk to people.” These led to a rejoinder from Scott Alexander (“Untitled”) and a follow-up by Aaronson (“What I believe”). My impression is that each response in this chain has at least partly misunderstood the preceding arguments, but I’ll do my best to summarize the state of the debate without making the same mistake, borrowing liberally from others’ comments.
1. Does feminist rhetoric bear some of the blame?
Nick Tarleton responds to Scott Aaronson’s anecdote:
Scott attributes his problems entirely(?) to feminism. I’ve had similar (milder) bad experiences, but it’s really not clear to me in retrospect how much to attribute them to gender/sex-specific cultural stuff rather than general social anxiety and fear of imposing. Within gender/sex-specific cultural stuff, it’s really not clear how much to attribute to feminism rather than not-really-feminist (patriarchal, or Victorian reversed-stupidity-patriarchal) background ideas about male sexuality being aggressive, women not wanting sex, women needing protection, and the like. (Which feminism has a complicated relationship with — most feminists would disavow those ideas, but in my experience a lot of feminist rhetoric still trades on them, out of convenience or just because they’re embedded in the ways we have of thinking and talking about gender issues and better ways haven’t propagated.)
And Alexander writes:
Laurie Penny has an easy answer to any claims that any of this is feminists’ fault: “Feminism, however, is not to blame for making life hell for ‘shy, nerdy men’. Patriarchy is to blame for that.”
I say: why can’t it be both? […]
Pick any attempt to shame people into conforming with gender roles, and you’ll find self-identified feminists leading the way. Transgender people? Feminists led the effort to stigmatize them and often still do. Discrimination against sex workers? Led by feminists. Against kinky people? Feminists again. People who have too much sex, or the wrong kind of sex? Feminists are among the jeering crowd, telling them they’re self-objectifying or reinforcing the patriarchy or whatever else they want to say. Male victims of domestic violence? It’s feminists fighting against acknowledging and helping them.
Yes, many feminists have been on both sides of these issues, and there have been good feminists tirelessly working against the bad feminists. Indeed, right now there are feminists who are telling the other feminists to lay off the nerd-shaming. My girlfriend is one of them. But that’s kind of my point. There are feminists on both sides of a lot of issues, including the important ones.
Alexander is right that “Whether or not a form of cruelty is decreed to be patriarchy doesn’t tell us how many feminists are among the people twisting the knife”, and he’s right that people who accuse nerds of misogyny often appeal in the same breath to ableist, classist, lookist, fat-shaming, and heteronormative (!) language. Being a feminist doesn’t mean you can never be cruel to people, or never misrepresent them. Consider the way Marcotte elects to summarize Aaronson’s disclosure of his many-year struggle with mental illness:
Translation: Unwilling to actually do the work required to address my social anxiety—much less actually improve my game—I decided that it would be easier to indulge a conspiracy theory where all the women in the world, led by evil feminists, are teaching each other not to fuck me. Because bitches, yo.
Marcotte adds, “I’m not a doctor, but I can imagine that it’s nearly impossible to help someone who is more interested in blaming his testicles, feminism, women generally, or the world for his mental health problems than to actually settle down and get to work at getting better.” Or, as Ozy Frantz of Thing of Thing puts it: “how dare those mentally ill people go about having distorted and inaccurate thoughts”.
Penny’s piece too ignores the possibility that feminist discourse norms are causing any harm. Sarah Constantin of Otium responds in a Facebook comment:
So, there are women nerds who make feminism their identity. The author [Penny] is one of them. And I think you do that if nerd culture treats you badly and feminist culture treats you well. But feminist culture doesn’t treat everyone well. Sometimes it’s *full* of anti-nerd contempt.
I’m unusual in this respect, but I’m much more offended and bothered by people who don’t like how my brain works than by people who don’t like what’s between my legs. I’m more wary of feminists who I suspect of wanting to mock my personal quirks and hobble my professional success than I am of sexism in STEM. I see comments on anti-SV articles like “this is what happens when you give autistic people money and power” and I get mad. I take it personally. A lot more personally than I take insults to women. Maybe it’s not fair of me, but that’s how my emotional calculus stacks up.
Scott Aaronson is right that there is a particular kind of damage that is inflicted ONLY on men and boys [eta: and queer women/girls] who want to do right by women and do not want to be “creeps”.
In general, there is a kind of damage that is inflicted ONLY upon the morally scrupulous. If you really want to be good, the demands of altruistic or self-sacrificing goodness can be paralyzing. The extreme case of this is scrupulosity as a symptom of OCD. This is a kind of pain that simply does not affect people whose personal standards are more relaxed. […]
What actually happens is that a highly scrupulous person reads a bunch of things that seem to put moral obligations on him, with the implication that the correct amount of moral obligation is always “more,” and *never* finds any piece of feminist writing that explicitly says “this is enough, you can stop here” because there aren’t that many people period who understand that obsessive moral paralysis is a problem. And so you get Scott Aaronson and many others like him (including some women!)
What we need is people talking about the problem of obsessive moral paralysis. “Yes, you *do* have some moral obligations, but they are finite and attainable. Here are realistic examples of people acting acceptably. Here are real-world examples of good men. You can be good without being a martyr.”
Wesley Fenza of Living within Reason adds:
There is a lot to like about this piece. Penny correctly points out that women have an extra layer of marginalization on top of what Aaronson went through, and that Aaronson didn’t account for that in his comment.
However, I think the thing that rubbed me wrong about Penny’s piece is that she didn’t offer any account of the role that feminism played in Aaronson’s tortured adolescence, which is an experience unique to the privileged, and which Penny didn’t acknowledge at all. […]
Penny claims the mantle of feminism, yet she refuses to acknowledge the role that her movement played in Aaronson’s tragic story. She demands that Aaronson, as a nerdy white man, be “held to account” for the lack of women in STEM, yet refuses his call that feminism be held to account for its at-worst abusive and at-best unkind rhetoric toward people deemed “privileged.”
The thesis of Penny’s piece is that as a nerdy woman, she went through all of the hell that Aaronson did, plus extra because she’s a woman. I think if she wanted to make that claim, she should have some kind of argument that Aaronson’s unique pain somehow doesn’t count or is somehow lesser than the pain of being a woman. I don’t find that obvious, and I don’t think she even attempted to make a case for it.
I think, as feminist advocates, we are obligated to recognize the darker side of our community and its potential to cause real-world harm. Aaronson’s piece was a real, raw testimonial documenting some of that harm. Penny’s piece just seemed like she was trying to handwave it away. She was compassionate, but she ultimately didn’t seem like she was listening.
I tend to recognize this because it’s a problem I have often — when someone tells me about an issue they have, I try to relate it to my own experience. On the one hand, a measure of that is how empathy/sympathy works. But on the other hand, I have a tendency to ignore the differences that make the other person’s pain and loss unique. I feel like that may be what’s going on here.
Chana Messinger raises the possibility that the harm inflicted on some scrupulous people could be “an unfortunate but necessary side effect of spreading the right messages to everyone else”. To know whether that’s so, we’ll need to investigate how common a problem this is, and whether there are easy ways to avoid it. At this stage, however, relatively few people have acknowledged that this is a concern. I certainly wasn’t aware of it until recently, and I’m now having to rethink how I talk about moral issues.
2. Are nerds oppressed? How bad do they have it?
I know there are a couple different definitions of what exactly structural oppression is, but however you define it, I feel like people who are at much higher risk of being bullied throughout school, are portrayed by the media as disgusting and ridiculous, have a much higher risk of mental disorders, and are constantly told by mainstream society that they’re ugly and defective kind of counts. If nerdiness is defined as intelligence plus poor social skills, then it is at least as heritable as other things people are willing to count as structural oppression like homosexuality (heritability of social skills, heritability of IQ, heritability of homosexuality)[.]
The three main objections I’ve heard to this line of reasoning are that (1) the shaming and bullying nerds experience is relatively minor, (2) nerds are privileged, and (3) anti-nerd sentiment is really some combination of lookism, ableism, etc.
3 strikes me as a reasonable (though not conclusively demonstrated) position, and is still consistent with points like Frantz’s:
it is amazing how laurie penny can write this entire article without mentioning that neurodiversity is a form of oppression????
“Privilege doesn’t mean you don’t suffer, which, I know, totally blows.” except that a lot of shy nerdy men are suffering because… they lack privilege… on at least one axis
Intersectionality also suggests that anti-nerd sentiment won’t perfectly reduce to its constituent parts. ‘Nerd’ could be a composite like ‘Chinese-American lesbian’ or ‘poor transgender Muslim’, but third-wave feminist theory denies that the social significance of ‘poor transgender Muslim’ is just a conjunction of the significance of ‘poor person’, ‘transgender person’, and ‘Muslim’.
Alexander gives a good response to 2, pointing out that being Jewish (for example) can simultaneously result in being privileged and oppressed. 1 seems like an open empirical question, provided we can agree on a threshold level of harm that is required for something to qualify as ‘oppression’, ‘discrimination’, etc.
Alternatively, one might object that the ‘structures’ Alexander points to are cognitive and cultural, but not institutional. Perhaps there isn’t enough economic, legal, and political restriction on nerds for them to qualify as ‘oppressed’ in the relevant sense. (And perhaps the same is true of Jews in 21st-century America, and we should think of Jews in that context as ‘historically oppressed’ but not actively oppressed? One man’s modus ponens is another’s modus tollens.)
Of course, it could turn out that ‘shy nerds’ suffer as a group from a distinct flavor of oppression even if ‘shy male nerds’ don’t. And Messinger adds in correspondence: “However strong or weak the case for nerd oppression, the case for nerd oppression by feminists is an order of magnitude or two weaker.”
But ‘oppressed’ is in the end just a word. What’s the substantive question under debate?
If some categories of suffering are unusually intense, widespread, and preventable, it makes sense to adopt the heuristic ‘allocate more attention and sympathy to those categories’. This is the schematic reasoning behind treating triggers as qualitatively more important than aversions, or treating racism as qualitatively more important than run-of-the-mill bullying. (At least, it’s the good reasoning. There may be worse reasons on hand, such as medical essentialism and outgroup antipathy.)
However, these heuristics require some policing, or they’ll degrade in effectiveness. Once everyone agrees that ‘triggers’ demand respect, people without PTSD symptoms have an incentive to expand the ‘trigger’ concept to fit their most intense preferences. Once everyone agrees that ‘oppressed groups’ get special consideration, disadvantaged people outside conventional axes of oppression have an incentive to expand the idea of ‘oppression’. This is inevitable, even if no one is being evil. Thus we need to take into account the upkeep cost of preserving these categories’ meanings when we decide whether they’re useful.
Many people intuit that we should have different norms in Europe and the Anglophone world about when it’s OK to belittle white people as a group, versus when it’s OK to belittle black people. The former is “punching up,” the latter “punching down.” Without a clear sense of whether geeks are ‘above’ or ‘below’ us, this heuristic short-circuits here; so the practical import of this debate is how strongly we should endorse a norm ‘don’t pick on shy geeky men as a group’.
Even if geeks aren’t oppressed and their problems are much smaller than those of women, black people, LGBT people, etc., their suffering is still real, and there are probably good ways to reduce it. I don’t know what the best solution here is, but trigger warnings and carefully-labeled safe spaces may be useful for people who want to avoid discussing various forms of feminism. For public spaces, perhaps we need a new concept of ‘punching straight ahead’, and new norms for when that’s OK. I generally prefer to err on the side of niceness, but I understand the arguments for being a loud gadfly, and I don’t know of a practical way to keep memes of wrath from outcompeting pacific memes.
Alexander, however, worries that even raising the issue of punching up vs. down is a red herring. He accuses feminists of misrepresenting Scott Aaronson’s ‘my suffering is real and matters’ as ‘my suffering is the most real and most important kind of suffering’:
If you look through Marcotte’s work, you find this same phrasing quite often. “Some antifeminist guy is ranting at me about how men are the ones who are really oppressed because of the draft” (source). […] But Aaronson is admitting about a hundred times that he recognizes the importance of the ways women are oppressed. The “is really oppressed” isn’t taken from him, it’s assumed by Marcotte. Her obvious worldview is – since privilege and oppression are a completely one dimensional axis, for Aaronson to claim that there is anything whatsoever that has ever been bad for men must be interpreted as a claim that they are the ones who are really oppressed and therefore women are not the ones who are really oppressed and therefore nothing whatsoever has ever been bad for women.
Alexander blames this on “Insane Moon Logic”. I find it likelier that different people, Alexander included, are just focusing on different aspects of Aaronson’s comment, to fit them into different narratives. Aaronson doesn’t deny that women are disadvantaged in various ways, but he, not Marcotte or Penny, is the person who raised the issue of whether geeks are more disprivileged than women. It shouldn’t surprise us that some eyebrows would be raised at lines like:
(1) Alas, as much as I try to understand other people’s perspectives, the first reference to my ‘male privilege’—my privilege!—is approximately where I get off the train, because it’s so alien to my actual lived experience.
(2) But I suspect the thought that being a nerdy male might not make me “privileged”—that it might even have put me into one of society’s least privileged classes—is completely alien to your way of seeing things.
(3) My recurring fantasy, through this period, was to have been born a woman, or a gay man, or best of all, completely asexual, so that I could simply devote my life to math, like my hero Paul Erdös did. Anything, really, other than the curse of having been born a heterosexual male, which for me, meant being consumed by desires that one couldn’t act on or even admit without running the risk of becoming an objectifier or a stalker or a harasser or some other creature of the darkness.
(4) As I see it, whenever these nerdy males pull themselves out of the ditch the world has tossed them into, while still maintaining enlightened liberal beliefs, including in the inviolable rights of every woman and man, they don’t deserve blame for whatever feminist shortcomings they might still have. They deserve medals at the White House.
1 appears to deny the existence of male privilege; 2 suggests that nerdy men may be “one of society’s least privileged classes”; 3 calls being a heterosexual man a “curse”; and 4 can easily be read as demanding cookies (“medals”, even) for insecure men who don’t actively reject women’s rights, no matter how glaring their “feminist shortcomings”.
Aaronson has since explained that he does believe in male privilege, and he has walked back claim 2 to just “the problem of the nerdy ‘heterosexual male’ is surely one of the worst social problems today that you can’t even acknowledge as being a problem” (emphasis added). Still, a feminist could reasonably worry that Aaronson is vacillating between a motte (‘nerds suffer too!’ or ‘there exists at least one person who was harmed by feminist rhetoric!’) and a bailey (‘nerds have it worse than all or most other groups’, or ‘pointing out problems with nerd culture is immoral’).
I hate the ‘motte’/‘bailey’ framing: it encourages people to assume malice, even when we should be looking into the possibility that our conversation partner has made a mistake, or has updated their beliefs, or consists of multiple dissenting factions. But if you’re going to use the motte/bailey idea to accuse your enemies of deceit (or Moon Logic), be sure you spend at least as much time testing how readily it applies to your own side.
I don’t know whether Aaronson stands by his younger self’s belief that he would have been better off as a non-white non-heterosexual non-male. As Tarn Somervell Fletcher notes:
I’ve seen plenty of responses that seemed to have completely taken on board everything he’s [Aaronson’s] said, and just think that he’s misjudged how bad it is for some people. When you’re comparing two people’s oppression, or suffering etc. (which is a terrible terribly unproductive idea but everyone seems determined to do it anyway), the default is that both people are going to discount (or, fail to count?) the others’ experience.
I agree with Aaronson’s statement, “This whole affair makes me despair of the power of language to convey human reality” (only I came in pre-despairing). Since people are extremely bad at simulating others’ life experiences, Aaronson is likely to misunderstand how bad women, black people, trans people, etc. have it. (This is of course consistent with acknowledging the psychological importance of Aaronson’s feeling that he had it worse than everyone else.) For the same reason, a black lesbian social butterfly would be likely to misunderstand how bad Aaronson has it. If we only rely on who has the most eloquent anecdotes, rather than on reliable population-wide quality-of-life measures, we aren’t going to get very far with these discussions.
And perhaps it isn’t worth the effort, if it’s possible for us to come up with norms of discourse that work OK even when we don’t all start with perfectly accurate beliefs about people’s demographics and relative levels of privilege. Even if punching up is justifiable in principle, we may not want to come in swinging when there’s a chance we’re misappraising the situation.
- abykale on “That Scott Aaronson Thing.”
- Ozy Frantz on nerd privilege, on nerd desexualization, on My Little Pony and gender-non-conforming men, and on times it’s good to express physical or romantic desire.
- Topher Hallquist “on Laurie Penny on Scott Aaronson”.
- Scott Alexander on structural power and on bravery debates.
What can you do that would have the best chance of making the world a better place? As Scott Siskind puts the question:
Most donors say they want to “help people”. If that’s true, they should try to distribute their resources to help people as much as possible. Most people don’t.
In the “Buy A Brushstroke” campaign, eleven thousand British donors gave a total of £550,000 to keep the famous painting “Blue Rigi” in a UK museum. If they had given that £550,000 to buy better sanitation systems in African villages instead, the latest statistics suggest it would have saved the lives of about one thousand two hundred people from disease. Each individual $50 donation could have given a year of normal life back to a Third Worlder afflicted with a disabling condition like blindness or limb deformity.
Most of those 11,000 donors genuinely wanted to help people by preserving access to the original canvas of a beautiful painting. And most of those 11,000 donors, if you asked, would say that a thousand people’s lives are more important than a beautiful painting, original or no. But these people didn’t have the proper mental habits to realize that was the choice before them, and so a beautiful painting remains in a British museum and somewhere in the Third World a thousand people are dead. […]
It is important to be rational about charity for the same reason it is important to be rational about Arctic exploration: it requires the same awareness of opportunity costs and the same hard-headed commitment to investigating efficient use of resources, and it may well be a matter of life and death.
Holden Karnofsky of GiveWell notes (in this video) that it isn’t easy to spot an ineffective charity. Many popular charities are “not even failing to do good, but doing harm”. At the same time, the positive difference you can make with a carefully targeted, empirically vetted charitable donation is extraordinary. Philosopher William MacAskill voices his excitement:
Imagine you’re walking down the street and see a building on fire. You run in, kick the door down—smoke billowing—you run in and save a young child. That would be a pretty amazing day in your life: That’s a day that would stay with you forever. Who wouldn’t want to have that experience? But the most effective charities can save a life for $4,000, so many of us are lucky enough that we can save a life every year through our donations. When you’re able to achieve so much at such low cost to yourself…why wouldn’t you do that? The only reason not to is that you’re stuck in the status quo, where giving away so much of your income seems a little bit odd.
GiveWell is the top organization investigating the impact charities have upon the most disadvantaged people in the world. If you want to be confident you’re really improving the world in a concrete way, really saving lives, it’s hard to do better than following GiveWell’s new annual giving recommendations (updated December 2014). The new recommendations are that each $100 you give to charity over the next 4 months break down as follows:
$60 – Against Malaria Foundation (AMF)
$12 – GiveDirectly
$12 – Schistosomiasis Control Initiative (SCI)
$10 – GiveWell
$6 – Deworm the World Initiative (DtWI)
(The $10 to GiveWell is an operating expenses donation GiveWell is requesting separately. I’m including it in the breakdown on the assumption that if you trust GiveWell’s expertise enough to base your decisions on their research, you probably also want to support GiveWell’s ability to keep those recommendations up to date.)
The above breakdown is intended to minimize the risk that, say, AMF keeps getting swamped with donations long after it’s reached its yearly target, while donors neglect DtWI. GiveWell’s goal is that AMF receive $5 million from individual donors over the next 4 months; GiveDirectly between $1 million and $25 million; SCI $1 million; and DtWI between $500,000 and $1 million. If everyone donates in the above proportion, then every top-effectiveness charity will be equally likely to hit its minimum target.
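To see where the 60/12/12/6 split could come from, here is a minimal Python sketch. It assumes the split is simply proportional to each charity’s minimum individual-donor target (taking the low end wherever a range is given). That is my reading of the numbers, not necessarily how GiveWell derived them, and the function name is made up for illustration.

```python
from fractions import Fraction

# Minimum individual-donor targets in dollars, taken from the paragraph above.
# Where a range is given, the low end is used (an assumption of this sketch).
MIN_TARGETS = {
    "AMF": 5_000_000,
    "GiveDirectly": 1_000_000,
    "SCI": 1_000_000,
    "DtWI": 500_000,
}

def split_by_minimum_target(amount, targets=MIN_TARGETS):
    """Divide `amount` among charities in proportion to their minimum targets."""
    total = sum(targets.values())
    return {name: Fraction(amount) * target / total
            for name, target in targets.items()}

# $90 of every $100 goes to the four charities; the other $10 goes to GiveWell.
shares = split_by_minimum_target(90)
for name, dollars in shares.items():
    print(f"{name}: ${float(dollars):.2f}")
```

Under that assumption the $90 comes out as $60, $12, $12, and $6, matching the breakdown listed earlier.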
If you want to follow this breakdown exactly, go to https://givewell.secure.nonprofitsoapbox.com/donate-to-givewell and select “Grants to recommended charities (90%) and unrestricted (10%)” under “How should we use your gift?”. If you’d rather just donate to one organization and not split it up in this way, GiveWell suggests giving to the Against Malaria Foundation; you can do so by setting “How should we use your gift?” to “Grants to recommended charities” and writing under Comments “all to AMF”.
Edit 12/31: More specifically, Elie Hassenfeld of GiveWell writes:
For donors who have a high degree of trust in and alignment with GiveWell, we recommend unrestricted gifts to GiveWell. For donors who want to support our work because they value it but are otherwise primarily interested in supporting charities based on neutral recommendations, strong evidence, etc., we recommend giving 10% of their donation to GiveWell.
What do these charities do?
AMF, GiveDirectly, SCI, and DtWI all focus on combating poverty and disease in poor regions of Africa and Asia. This isn’t an arbitrary choice; your dollar can go orders of magnitude farther in the developing world than in developed nations. Dylan Matthews of Vox writes:
GiveWell actually looked into a number of US charities, like the Nurse-Family Partnership program for infants, the KIPP chain of charter schools, and the HOPE job-training program. It found that all were highly effective, but far more cost intensive than the best foreign charities. KIPP and the Nurse-Family Partnership cost over $10,000 per child served, while deworming programs like SCI’s and Deworm the World’s generally cost about $0.50 per child treated.
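As a back-of-the-envelope check on “orders of magnitude”, the sketch below compares the two per-child prices from the quote. It treats $10,000 as the US figure even though the quote says “over $10,000”, so the ratio is a lower bound. Keep in mind that “per child served” and “per child treated” describe different interventions, so this compares prices, not benefits.

```python
import math

# Figures from the quote above; the US number is a lower bound ("over $10,000").
cost_us_per_child = 10_000      # KIPP / Nurse-Family Partnership, per child served
cost_deworm_per_child = 0.50    # deworming programs, per child treated

ratio = cost_us_per_child / cost_deworm_per_child
orders_of_magnitude = math.log10(ratio)

print(f"Price ratio: at least {ratio:,.0f} to 1")
print(f"That is about {orders_of_magnitude:.1f} orders of magnitude")
```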
AMF distributes insecticide-treated bed nets in the Democratic Republic of the Congo and other countries. This prevents transmission of malaria by mosquito bite, reducing child mortality and anemia and improving developmental outcomes. (General information on insecticide-treated nets.)
GiveDirectly makes secure cash payments to poor households they’ve vetted in Kenya and Uganda. Recipients may then use this money however they wish. This generally results in improved food security and investments with high rates of return. Direct cash transfers are a good way to avoid the common mistake of trying to micromanage the lives of people in the developing world. Impoverished individuals usually have much more robust and fine-grained knowledge of their own needs than any philanthropic organization or donor does, and they have clearer incentives to make sure every penny gets used wisely. (General information on cash transfers.)
SCI works with governments in sub-Saharan Africa to distribute deworming pills to schoolchildren, improving nutrition and developmental outcomes. DtWI does similar deworming work in India, Kenya, and Vietnam, with more focus on improving existing programs than on creating and scaling up programs. (General information on deworming.)
How do these charities compare to each other?
GiveWell publishes its evidence and reasoning process publicly so others can examine it in as much detail as they’d like and identify points of disagreement. That gives you a chance to deviate from GiveWell’s recommendations in an informed way, if you disagree with GiveWell about the tradeoffs involved. To summarize GiveWell’s take:
- Cost-effectiveness: GiveDirectly is probably the least cost-effective, in spite of transferring 87 to 90 cents per dollar donated directly into the hands of poor individuals. This is because it still appears to be cheaper to cure the worst widespread diseases than to directly alleviate the poverty of otherwise healthy people. AMF and SCI are maybe 5-10 times as effective as GiveDirectly, and DtWI may be twice as effective as SCI.
- Strength of supporting evidence: We can be relatively confident GiveDirectly is having the impact it intends to. The case for AMF is weaker, and the case for SCI is weaker still. DtWI has the weakest case, because its political focus places it more causal steps away from its goal. On the other hand, DtWI’s transparency and self-monitoring are much better than SCI’s, so we’re more likely to notice in the future if DtWI has gone wrong than if SCI has.
- History of rolling out programs: GiveDirectly and SCI have a strong track record. AMF and DtWI have an adequate track record.
- Room for more funding: GiveDirectly is scaling up amazingly well, and could continue to make use of tens of millions more dollars this year. AMF has had difficulty finding enough places to distribute bed nets to use its funds effectively; however, it now appears to have fixed that problem and has a lot more room for funding it can use to leverage more distribution deals. DtWI and SCI have relatively little room for funding.
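To make the cost-effectiveness comparison concrete, here is a rough Python sketch. It treats GiveDirectly as the baseline (1x), takes the “5-10 times” and “twice as effective as SCI” figures at face value as ranges, and computes a dollar-weighted average for the 60/12/12/6 split. The ranges are loose estimates, the helper name is hypothetical, and the calculation ignores the other three criteria (evidence, track record, room for funding), so read the result as an illustration only.

```python
# Rough cost-effectiveness multipliers relative to GiveDirectly (= 1),
# as (low, high) ranges read off the summary above. These are loose
# estimates, not precise measurements.
SCI = (5.0, 10.0)                      # "maybe 5-10 times" GiveDirectly
MULTIPLIERS = {
    "GiveDirectly": (1.0, 1.0),
    "AMF": (5.0, 10.0),
    "SCI": SCI,
    "DtWI": (2 * SCI[0], 2 * SCI[1]),  # "may be twice as effective as SCI"
}

# Dollars per $100 that go to the four charities (the $10 to GiveWell is excluded).
WEIGHTS = {"AMF": 60, "GiveDirectly": 12, "SCI": 12, "DtWI": 6}

def blended_multiplier(weights, multipliers, bound):
    """Dollar-weighted average multiplier; bound is 0 for the low end, 1 for the high end."""
    total = sum(weights.values())
    return sum(weights[name] * multipliers[name][bound] for name in weights) / total

low = blended_multiplier(WEIGHTS, MULTIPLIERS, 0)
high = blended_multiplier(WEIGHTS, MULTIPLIERS, 1)
print(f"DtWI range: {MULTIPLIERS['DtWI']}")
print(f"Recommended mix vs. all-GiveDirectly: {low:.1f}x to {high:.1f}x")
```

On these inputs the recommended mix lands somewhere between the GiveDirectly baseline and the deworming estimates, which is what you would expect from a portfolio that trades raw cost-effectiveness against strength of evidence.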
In their personal charitable donations, GiveWell staff generally followed the above recommendations, though several staffers gave substantially more to GiveDirectly (to reward its transparency and self-monitoring, and to be sure of having a positive impact), and less to the deworming charities. Other people who have explained how they’re factoring in GiveWell’s new recommendations include philosopher Richard Chappell, blogger Unit of Caring, consultant Chris Smith, and economist Robert Wiblin.
What are other contenders for the best causes out there?
If you’re interested in credible but less thoroughly vetted efforts to combat global poverty, you may want to look at GiveWell’s second tier of promising charities:
- Development Media International, an organization that broadcasts health information to people in the developing world on television and radio.
- GAIN’s Universal Salt Iodization program and the International Council for the Control of Iodine Deficiency Disorders Global Network, initiatives supporting governments’ and private companies’ efforts to improve children’s cognitive development through iodine supplementation.
- Living Goods, an organization that “[sells] health and household goods door-to-door in Uganda and Kenya and [provides] basic health counseling. They sell products such as treatments for malaria and diarrhea, fortified foods, water filters, bed nets, clean cook stoves and solar lights.”
Following GiveWell’s recommendations is probably the best way to measurably improve the lives of human beings who are suffering and dying today. However, the same evidence-based approach should allow us to identify relatively effective and ineffective causes in the developed world too. GiveWell is in the early stages of looking for the most urgent and tractable projects in U.S. policy, and one of their top contenders is prison reform. If you live in the U.S. and are more interested in local issues, you may want to follow the work of:
- Open Philanthropy Project, a spin-off of GiveWell that looks into general causes that may be unusually important, as opposed to specific charities that are unusually well-targeted and efficient. One of their focus areas is policy-oriented philanthropy.
- The Pew Charitable Trusts’ Public Safety Performance Project, which has helped get criminal justice reform packages passed in over two dozen states since 2007 and has recently begun a collaboration with GiveWell.
On the other hand, there are some local, activism-oriented charities that may have a much larger impact than any I’ve listed so far — charities focused on non-human animal welfare. If you aren’t just worried about human suffering, you may want to give to:
- The Humane League, a top-notch animal welfare nonprofit that discourages factory farming through outreach and advertising. They attempt to test the efficacy of their methods at Humane League Labs.
Another excellent way to try to outdo GiveWell’s recommended charities is to help fund scientific research into the life-saving innovations of the future. Historically, scientific and technological progress has had a vastly larger effect on human welfare than any philanthropy has, and this is another major area the Open Philanthropy Project hopes to investigate in the future. For now, the main scientific institute I can recommend donating to is:
- The Future of Humanity Institute, an Oxford-based research center that investigates social and technological changes that may impact our future as a species, as well as the effects of systematic uncertainty and bias on our attempts to predict such developments.
If there are interesting developments over the next year, I’ll update this advice December 2015. For now, the main organizations I recommend giving to are GiveWell and its top charities (donation page), the Humane League (donation page), or the Future of Humanity Institute (donation page), in increasing order of ‘uncertainty about the organization’s real effects’ and ‘probability of having a large positive impact’.
Edit 12/28: GiveWell has updated their donation page to include a “Grants to recommended charities (90%) and unrestricted (10%)” option. I’ve modified my above advice to make use of that new option. I’ve also started a birthday fundraiser to give to the charities I covered above.
The reason why people on tumblr over-use the concept of “trigger” rather than just “thing I don’t like” or “thing that makes me angry” or “thing that makes me sad” is that, literally, in the political/fandom part of tumblr culture you are required to establish your right not to read a thing, and you only have rights if you can establish that you’re on the bad end of an axis of oppression. Hence, co-opting the language of mental illness: trigger.
i.e. trigger warning culture is a rational response to an environment in which media consumption is mandatory. It’s not hypersensitivity so much as the only way to function.
There is a secondary thing, which is, here we are all oppressed, which ties into the feeling that you only have rights if you can establish that you’re at the bad end of an axis of oppression, but I’m not sure I can totally articulate that thing.
The idea that oppression confers legitimacy does seem to be ascendant, and not just on tumblr. Hostile political debates these days often turn into arguments about which side is the injured party, with both claiming to be unfairly caricatured or oppressed. This is pretty bad if it displaces a substantive exchange of ideas, though it may be hard to fix in a society that’s correcting for bias against oppressed groups. The cure isn’t necessarily worse than the disease, though that’s a question worth looking into, as is the question of whether people can learn to see through false claims of grievance.
On the other hand, I don’t think ‘I will (mostly) disregard your non-triggering aversions’ implies ‘you only have rights to the extent you’re oppressed’. I think the deeper problem is that social interaction between strangers and acquaintances is increasingly taking place in massive common spaces, on public websites.
If we’re trapped in the same common space (e.g., because we have a lot of overlapping interests or friends), an increase in your right to freely say what you want to say inevitably means a decrease in my right to avoid hearing things I don’t want to hear. Increasing my right to only hear what I want to will likewise decrease your right to speak freely; at the very least, you’ll need to add content warnings to the things you write, which puts an increasing workload on writers’ plates as the list of reader aversions they need to keep track of grows longer. (Blogging and social media platforms also make things much more difficult, by forcing trigger warnings and content to compete for space at the start of posts.)
I don’t know of any easy, principled way to solve this problem. Readers can download software that blocks or highlights posts/websites using specific words, such as Tumblr Savior and FB Purity. Writers can adopt content warnings for the most common and most harmful triggers and aversions out there, or the ones that are too vague to be caught by word/phrase blockers.
But vague rules are hard to follow. So it’s understandable that people would gravitate toward a black-and-white ‘trigger’ v. ‘non-trigger’ dichotomy in the hope that the scientific authority and naturalness of a medical category would simplify the problem of deciding when the reader’s right-to-not-hear outweighs the writer’s right-to-speak-freely. And it’s equally understandable that people who don’t have ‘triggers’ in the strictest sense, but are still being harmed in a big way by certain things people say (or ways people say them), will want to piggyback off that heuristic once it exists.
‘Only include content warnings for triggers’ doesn’t work, because ‘trigger’ isn’t a natural kind and people mean different things by it. Give some groups an incentive to broaden the term and others an incentive to narrow it, and language will diverge even more. ‘I’ll only factor medical information into my decisions about how to be nice to people’ is rarely the right approach.
‘Always include content warnings for triggers’ doesn’t work either. There are simply too many things people are triggered by.
If we want rules that are easy to follow in extreme cases while remaining context-sensitive in mild cases, we’ll probably need some combination of
‘Here are the canonical content warnings that everyone should use in public spaces: [A], [B], [C]…’
‘If you have specific reason to think other information will harm part of your audience, the nice thing to do is to have a private conversation with some of those audience members and consider adding more content warnings. If it’s causing a lot of harm to a lot of your audience, adding content warnings transitions from “morally praiseworthy” to “morally obligatory”.’
The ambiguity and context-sensitivity of the second rule is made up for by the very clear and easy-to-follow first rule. Of course, I only provided a schema. The whole point of the first rule is to actually give concrete advice (especially for cases where you don’t know much about your audience). That project requires, if you’re going to do it right, that we collect base rate information on different aversions and triggers, find a not-terrible way of ranking them by ‘suffering caused’, and find a consensus threshold for ‘how much suffering it’s OK for a random content generator to cause in public spaces’.
That wouldn’t obviate the need for safe spaces where the content is more carefully controlled, but it would hopefully make movies, books, social media, etc. safe and enjoyable for nearly everyone, without requiring people to just stop talking about painful topics.
Impulse buying is a thing. We have ready-made clichés for picking it out. Analogously, ‘impulse giving’ is a thing, where you’re spontaneously moved by compassion to help someone out without any advance planning. The problem with most impulse giving is that it gives you the same warm glow and sense of moral license as high-impact giving, without making as much of a difference. Peter Singer puts it best:
My experience with the effective altruism community is that they don’t do much to encourage impulse giving of any kind. If you can give to low-impact charities in the heat of the moment, you should be able to do the same for high-impact charities; yet I think of ‘giving effectively’ as affectively cold, carefully budgeted.
This is probably mostly a good thing. We want people to think carefully about their big decisions, if it improves decision quality. However, the stereotype has its disadvantages. If people think they need to go through a long process of deliberation before they can give, they can end up procrastinating indefinitely. Borrowing Haidt’s analogy, we’re discouraging the elephant (our system-1 emotions and intuitions) from getting passionate and worked up about the most important things we do, while encouraging the elephant’s rider (our system-2 reasoning and deliberation) to overanalyze and agonize over decisions.
Effective altruism as it exists today is aligned with the legions and principalities of Order. I’d bet we can change that in some respects, if we so wish, without giving up our allegiance to Goodness.
Eliezer Yudkowsky suggests that we “purchase fuzzies and utilons separately”. Better to spend some of your time on feel-good do-gooding and some on optimal high-impact do-gooding, rather than pursuing them simultaneously and doing a terrible job at both. In “Harry Potter and the Fuzzies of Altruism”, I noted that there are different kinds of fuzzies people can get for doing good.
One of these varieties is particularly valuable, because it doesn’t need to be purchased separately. I speak of the slytherfuzzy, that warm glow you get from being especially efficient and effective. Do-gooders who find cool, calculated pragmatism strongly motivating in its own right have an obvious leg up. I myself am more motivated by narrative, novelty, and love-of-neighbor than by Winning, but I’d love to find a way to steal that trick and bind my own reward center more tightly to humanitarian accomplishment.
If you’re trying to make yourself (or others) more enthusiastic about purchasing utilons, it may be helpful to make the way you buy utilons as fuzzy-producing as possible. This needn’t dilute the outcome. Select a charity based on a sober cost-benefit analysis, but give chaotically, if chaos happens to gel with your psychology. Impulse giving and effective altruism don’t have to be placed in separate mental boxes forever; we can invent new categories of behavior that wed Chaos Altruism’s giddy spontaneity to Order Altruism’s focus and rigor.
I’d expect mixed approaches to work best. E.g., you can settle on a fixed percentage of your income to give to a high-impact cause every year, but build a habit of giving bonus donations to that cause when the mood strikes you. I’m a big fan of using specific benevolence triggers. For example: ‘When someone on the street asks me for money, and I feel an urge to give them $X, give $X to a high-impact charity (whether or not I also give money to the individual who asked).’ Leah Libresco and Michael Blume make good use of this kind of ‘nudged giving’.
But I think we should also normalize whimsical, untriggered high-impact giving. If we start thinking of evidence-based humanitarianism as the kind of thing you can splurge on, I suspect we’ll come to see do-gooding as more of a fun opportunity and less of a burden.
Some people think of their philanthropy as a personal passion that drives them to excel, as in Holden Karnofsky’s “excited altruism”. Others think of their philanthropy as a universal moral obligation they’re striving to meet, as in Eliezer’s “one life against the world”. Try to fit all philanthropists into the ‘passion’ box, and you’ll get a contingent that feels cut off from what makes this work important; try to fit them all into the ‘obligation’ box, and you’ll get a contingent that feels burdened with a dour or guilt-inducing chore.
Likewise, there are important points of divergence between do-gooders who are motivated by different kinds of warm fuzzy (or hot blazing, or cool gliding, or wiggly sparkling…) feelings. I’m more of an obligation-based altruist, but I still find the ‘excited altruist’ framing useful. That I think in moralistic terms doesn’t say much about the specific feelings that drive me to do good in the moment.
My moralism also leaves open what feelings I should emphasize if I want to transfer my enthusiasm to others.
The ice bucket challenge is an example of memetically successful Chaos Altruism. Ditto today’s Giving Tuesday event, though an annual event is relatively compatible with the reign of Order. Will MacAskill and Timothy Ogden have criticized these memes as possibly counterproductive, but it’s not obvious to me that the ineffectiveness of these events stems from their viral or ad-hoc character. Instead, those same attributes could be very valuable if they were targeted at more urgent causes.
MacAskill and Ogden draw attention to the fact that charitable donations have been stuck at 2% of U.S. GDP for 40 years now. People (on average) seem to change where they donate, but not how much they donate. One approach to doing better, then, would be to redirect that 2% to worthier interventions.
At the same time, the success of the giving pledge shows that some people can be inspired to increase their donations. Perhaps we haven’t been able to rise above 2% because charities are too busy competing with each other to focus their advertising ingenuity on growing the pie. Perhaps some deep change in people’s mindset is needed; I’ll note that households giving to religious nonprofits donate twice as much. Relatedly, the key may be to shift entire (small) communities to giving more, so giving more is the norm among everyone you know. Then expand those supergiver tribes into neighboring social networks.
Experimenting with playful, unorthodox, and personalized modes of altruism seems like it could be useful for finding ways to make inroads in new communities. Over the next few years, I think we should place more focus on self-experimentation and object-level research than on outreach; but we should still keep in mind that we need a better handle on human motivation if we’re going to completely restructure the way charity is done. For that reason, I’m eager to hear whether any aspiring effective altruists find Chaos approaches attractive.
Vegans: If the meat eaters believed what you did about animal sentience, most of them would be vegans, and they would be horrified by their many previous murders. Your heart-wrenching videos aren’t convincing to them because they aren’t already convinced that animals can feel.
Meat-eaters: Vegans think there are billions of times more people on this planet than you do, they believe you’re eating a lot of those people, and they care about every one of them the way you care about every human. […]
Finally, let me tell you about what happens when you post a heart-wrenching video of apparent animal suffering: It works, if the thing you’re trying to do is make me feel terrible. My brain anthropomorphizes everything at the slightest provocation. Pigs, cows, chickens, mollusks, worms, bacteria, frozen vegetables, and even rocks. And since I know that it’s quite easy to get me to deeply empathize with a pet rock, I know better than to take those feelings as evidence that the apparently suffering thing is in fact suffering. If you posted videos of carrots in factory farms and used the same phrases to describe their miserable lives and how it’s all my fault for making the world this terrible place where oodles of carrots are murdered constantly, I’d feel the same way. So these arguments do not tend to be revelatory of truth.
I’ve argued before that non-human animals’ abilities to self-monitor, learn, collaborate, play, etc. aren’t clear evidence that they have a subjective, valenced point of view on the world. Until we’re confident we know what specific physical behaviors ‘having a subjective point of view’ evolved to produce — what cognitive problem phenomenal consciousness solves — we can’t confidently infer consciousness from the overt behaviors of infants, non-human animals, advanced AI, anesthetized humans, etc.
[I]f you work on AI, and have an intuition that a huge variety of systems can act ‘intelligently’, you may doubt that the linkage between human-style consciousness and intelligence is all that strong. If you think it’s easy to build a robot that passes various Turing tests without having full-fledged first-person experience, you’ll also probably (for much the same reason) expect a lot of non-human species to arrive at strategies for intelligently planning, generalizing, exploring, etc. without invoking consciousness. (Especially if [you think consciousness is very complex]. Evolution won’t put in the effort to make a brain conscious unless it’s extremely necessary for some reproductive advantage.)
That said, I don’t think any of this is even superficially an adequate justification for torturing, killing, and eating human infants, intelligent aliens, or cattle.
The intellectual case against meat-eating is pretty air-tight
To argue from ‘we don’t understand the cognitive basis for consciousness’ to ‘it’s OK to eat non-humans’ is acting as though our ignorance were positive knowledge we could confidently set down our weight on. Even if you have a specific cognitive model that predicts ‘there’s an 80% chance cattle can’t suffer,’ you have to be just as cautious as you’d be about torturing a 20%-likely-to-be-conscious person in a non-vegetative coma, or a 20%-likely-to-be-conscious alien. And that’s before factoring in your uncertainty about the arguments for your model.
The argument for not eating cattle, chickens, etc. is very simple:
1. An uncertainty-about-animals premise, e.g.: We don’t know enough about how cattle cognize, and about what kinds of cognition make things moral patients, to assign a less-than-1-in-20 subjective probability to ‘factory-farmed cattle undergo large quantities of something-morally-equivalent-to-suffering’.
2. An altruism-in-the-face-of-uncertainty premise, e.g.: You shouldn’t do things that have a 1-in-20 (or greater) chance of contributing to large amounts of suffering, unless the corresponding gain is huge. E.g., you shouldn’t accept $100 to flip a switch that 95% of the time does nothing and 5% of the time nonconsensually tortures an adult human for 20 minutes.
3. An eating-animals-doesn’t-have-enormous-benefits premise.
4. An eating-animals-is-causally-linked-to-factory-farming premise.
5. So don’t eat the animals in question.
This doesn’t require us to indulge in anthropomorphism or philosophical speculation. And Brienne’s updates to her post suggest she now agrees a lot of meat-eaters we know assign a non-negligible probability to ‘cattle can suffer’. (Also, kudos to Brienne on not only changing her mind about an emotionally fraught issue extremely rapidly, but also changing the original post. A lot of rationalists who are surprisingly excellent at updating their beliefs don’t seem to fully appreciate the value of updating the easy-to-Google public record of their beliefs to cut off the spread of falsehoods.)
This places intellectually honest meat-eating effective altruists in a position similar to Richard Dawkins':
[I’m] in a very difficult moral position. I think you have a very, very strong point when you say that anybody who eats meat has a very, very strong obligation to think seriously about it. And I don’t find any very good defense. I find myself in exactly the same position as you or I would have been — well, probably you wouldn’t have been, but I might have been — 200 years ago, talking about slavery. […T]here was a time when it was simply the norm. Everybody did it. Some people did it with gusto and relish; other people, like Jefferson, did it reluctantly. I would have probably done it reluctantly. I would have sort of just gone along with what society does. It was hard to defend then, yet everybody did it. And that’s the sort of position I find myself in now. […] I live in a society which is still massively speciesist. Intellectually I recognize that, but I go along with it the same way I go along with celebrating Christmas and singing Christmas carols.
Until I see solid counter-arguments — not just counter-arguments to ‘animals are very likely conscious,’ but to the much weaker formulation needed to justify veg(etari)anism — I’ll assume people are mostly eating meat because it’s tasty and convenient and accepted-in-polite-society, not because they’re morally indifferent to torturing puppies behind closed doors.
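For readers who like to see the arithmetic, here is a minimal sketch of premises 1 and 2 from the argument above. The 1-in-20 threshold, the 5%/20-minute switch, and the 20%-likely-to-be-conscious case all come from the text; the function names and the boolean `gain_is_huge` flag are my own simplifications, and a real decision would need an actual estimate of the size of the gain rather than a yes/no.

```python
from fractions import Fraction

# Threshold from premise 2: a 1-in-20 (or greater) chance of contributing
# to large amounts of suffering.
THRESHOLD = Fraction(1, 20)


def expected_harm_minutes(p_harm, minutes):
    """Expected minutes of harm for an action that harms with probability p_harm."""
    return p_harm * minutes


def should_refrain(p_suffering, gain_is_huge):
    """Premise 2 as a rule: refrain when the chance of large suffering is
    at or above the threshold, unless the corresponding gain is huge."""
    return p_suffering >= THRESHOLD and not gain_is_huge


# The switch example: 5% of the time it tortures someone for 20 minutes.
switch_harm = expected_harm_minutes(Fraction(5, 100), 20)
print("Expected minutes of torture per flip:", switch_harm)

# The '80% chance cattle can't suffer' model leaves a 20% chance they can.
print("Refrain at 20% with a modest gain?", should_refrain(Fraction(1, 5), gain_is_huge=False))
print("Refrain at 1% with a modest gain?", should_refrain(Fraction(1, 100), gain_is_huge=False))
```

Premise 3 is what sets `gain_is_huge` to false for meat-eating, and premise 4 is what links the individual purchase to the suffering in the first place; neither is captured by the numbers here.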
Why isn’t LessWrong extremely veg(etari)an?
On the face of it, LessWrong ought to be leading the pack in veg(etari)anism. A lot of LessWrong’s interests and values look like they should directly cash out in a concern for animal welfare:
transhumanism and science fiction: If you think aliens and robots and heavily modified posthumans can be moral patients, you should be more open to including other nonhumans in your circle of concern.
superrationality: Veg(etari)anism benefits from an ability to bind my future self to my commitments, and from a Kantian desire to act as I’d want other philosophically inclined people in my community to act.
utilitarianism: Animal causes are admirably egalitarian and scope-sensitive.
taking ideas seriously: If you’re willing to accept inconvenient conclusions even when they’re based in abstract philosophy, that gives more power to theoretical arguments for worrying about animal cognition even if you can’t detect or imagine that cognition yourself.
distrusting the status quo: Veg(etari)anism remains fairly unpopular, and societal inertia is an obvious reason why.
distrusting ad-hoc intuitions: It may not feel desperately urgent to stop buying hot dogs, but you shouldn’t trust that intuition, because it’s self-serving and vulnerable to e.g. status quo bias. This is a lot of how LessWrong goes about ‘taking ideas seriously’; one should ‘shut up and multiply’ even when a conclusion is counter-intuitive.
Yet only about 15% of LessWrong is vegetarian (compared to 4-13% of the Anglophone world, depending on the survey). By comparison, the average ‘effective altruist’ LessWronger donated $2503 to charity in 2013; 9% of LessWrongers have been to a CFAR class; and 4% of LessWrongers are signed up for cryonics (and another 24% would like to be signed up). These are much larger changes relative to the general population, where maybe 1 in 150,000 people are signed up for cryonics.
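To put numbers on "much larger changes relative to the general population", here is a quick back-of-the-envelope comparison using only the figures quoted above. The variable names are mine, and the baselines are the rough ones from the text (4-13% vegetarian depending on the survey, "maybe 1 in 150,000" signed up for cryonics), so treat the outputs as order-of-magnitude estimates.

```python
from fractions import Fraction

# Figures quoted in the paragraph above.
lw_vegetarian = Fraction(15, 100)
general_vegetarian_low = Fraction(4, 100)
general_vegetarian_high = Fraction(13, 100)

lw_cryonics = Fraction(4, 100)
general_cryonics = Fraction(1, 150_000)

# How many times more common each choice is on LessWrong than in the baseline.
veg_ratio_min = lw_vegetarian / general_vegetarian_high  # against the 13% survey
veg_ratio_max = lw_vegetarian / general_vegetarian_low   # against the 4% survey
cryonics_ratio = lw_cryonics / general_cryonics

print(f"Vegetarianism: {float(veg_ratio_min):.2f}x to {float(veg_ratio_max):.2f}x the baseline")
print(f"Cryonics: {float(cryonics_ratio):.0f}x the baseline")
```

On these figures vegetarianism is at most a few times more common among LessWrongers than in the Anglophone world, while cryonics sign-ups are thousands of times more common.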
I can think of a few reasons for the discrepancy:
(a) Cryonics, existential risk, and other LessWrong-associated ideas have techy, high-IQ associations, in terms of their content and in terms of the communities that primarily endorse them. They’re tribal markers, not just attempts to maximize expected utility; and veg(etari)ans are seen as belonging to other tribes, like progressive political activists and people who just want to hug every cat.
(b) Those popular topics have been strongly endorsed and argued for by multiple community leaders appealing to emotional language and vivid prose. It’s one thing to accept cryonics and vegetarianism as abstract arguments, and another thing to actually change your lifestyle based on the argument; the latter took a lot of active pushing and promotion. (The abstract argument is important; but it’s a necessary condition for action, not a sufficient one. You can’t just say ‘I’m someone who takes ideas seriously’ and magically stop reasoning motivatedly in all contexts.)
(c) Veg(etari)anism isn’t weird and obscure enough. If you successfully sign up for cryonics, LessWrong will treat you like an intellectual and rational elite, a rare person who actually thinks clearly and acts accordingly. If you successfully donate 10% of your income to GiveWell, ditto; even though distributing deworming pills isn’t sexy and futuristic, it’s obscure enough (and supported by enough community leaders, per (b)) that it allows you to successfully signal that you’re special. If 10% of the English-speaking world donated to GiveWell or were signed up for cryonics, my guess is that LessWrongers would be too bored by those topics to rush to sign up even if the cryonics and deworming organizations had scaled up in ways that made marginal dollars more effective. Maybe you’d get 20% to sign up for cryonics, but you wouldn’t get 50% or 90%.
(d) Changing your diet is harder than spending lots of money. Where LessWrongers excel, it’s generally via once-off or sporadic spending decisions that don’t have a big impact on your daily life. (‘Successfully employing CFAR techniques’ may be an exception to this rule, if it involves reinvesting effort every single day or permanently skipping out on things you enjoy; but I don’t know how many LessWrongers do that.)
If those hypotheses are right, it might be possible to shift LessWrong types more toward veganism by improving its status in the community and making the transition to veganism easier and less daunting.
What would make a transhumanist excited about this?
I’ll conclude with various ideas for bridging the motivation gap. Note that it doesn’t follow from ‘the gap is motivational’ that posting a bunch of videos of animal torture to LessWrong or the Effective Altruism Forum is the best way to stir people’s hearts. When intellectual achievement is what you trust and prize, you’re more likely to be moved to action by things that jibe with that part of your identity.
Write stunningly beautiful, rigorous, philosophically sophisticated things that are amazing and great
I’m not primarily thinking of writing really good arguments for veg(etari)anism; as I noted above, the argument is almost too clear-cut. It leaves very little to talk about in any detail, especially if we want something that hasn’t been discussed to death on LessWrong before. However, there are still topics in the vicinity to address, such as ‘What is the current state of the evidence about the nutrition of veg(etari)an diets?’ Use Slate Star Codex as a model, and do your very best to actually portray the state of the evidence, including devoting plenty of attention to any ways veg(etari)an diets might turn out to be unhealthy. (EDIT: Soylent is popular with this demographic and is switching to a vegan recipe, so it might be especially useful to evaluate its nutritional completeness and promote a supplemented Soylent diet.)
In the long run you’ll score more points by demonstrating how epistemically rational and even-handed you are than by making any object-level argument for veg(etari)anism. Not only will you thereby find out more about whether you’re wrong, but you’ll convince rationalists to take these ideas more seriously than if you gave a more one-sided argument in favor of a policy.
Fiction, done right, can serve a similar function. I could imagine someone writing a sci-fi story set in a future where humans have evolved into wildly different species with different perceived rights, thus translating animal welfare questions into a transhumanist idiom.
Just as the biggest risk with a blog post is of being too one-sided, the biggest risk with a story is of being too didactic and persuasion-focused. The goal is not to construct heavy-handed allegories; the goal is to make an actually good story, with moral conflicts you’re genuinely unsure about. Make things that would be worth reading even if you were completely wrong about animal ethics, and as a side-effect you’ll get people interested in the science, the philosophy, and the pragmatics of related causes.
Be positive and concrete
Frame animal welfare activism as an astonishingly promising, efficient, and uncrowded opportunity to do good. Scale back moral condemnation and guilt. LessWrong types can be powerful allies, but the way to get them on board is to give them opportunities to feel like munchkins with rare secret insights, not like latecomers to a not-particularly-fun party who have to play catch-up to avoid getting yelled at. It’s fine to frame helping animals as challenging, but the challenge should be to excel and do something astonishing, not to meet a bare standard for decency.
This doesn’t necessarily mean lowering your standards; if you actually demand more of LessWrongers and effective altruists than you do of ordinary people, you’ll probably do better than if you shot for parity. If you want to change minds in a big way, think like Berwick in this anecdote from Switch:
In 2004, Donald Berwick, a doctor and the CEO of the Institute for Healthcare Improvement (IHI), had some ideas about how to save lives—massive numbers of lives. Researchers at the IHI had analyzed patient care with the kinds of analytical tools used to assess the quality of cars coming off a production line. They discovered that the ‘defect’ rate in health care was as high as 1 in 10—meaning, for example, that 10 percent of patients did not receive their antibiotics in the specified time. This was a shockingly high defect rate—many other industries had managed to achieve performance at levels of 1 error in 1,000 cases (and often far better). Berwick knew that the high medical defect rate meant that tens of thousands of patients were dying every year, unnecessarily.
Berwick’s insight was that hospitals could benefit from the same kinds of rigorous process improvements that had worked in other industries. Couldn’t a transplant operation be ‘produced’ as consistently and flawlessly as a Toyota Camry?
Berwick’s ideas were so well supported by research that they were essentially indisputable, yet little was happening. He certainly had no ability to force any changes on the industry. IHI had only seventy-five employees. But Berwick wasn’t deterred.
On December 14, 2004, he gave a speech to a room full of hospital administrators at a large industry convention. He said, ‘Here is what I think we should do. I think we should save 100,000 lives. And I think we should do that by June 14, 2006—18 months from today. Some is not a number; soon is not a time. Here’s the number: 100,000. Here’s the time: June 14, 2006—9 a.m.’
The crowd was astonished. The goal was daunting. But Berwick was quite serious about his intentions. He and his tiny team set out to do the impossible.
IHI proposed six very specific interventions to save lives. For instance, one asked hospitals to adopt a set of proven procedures for managing patients on ventilators, to prevent them from getting pneumonia, a common cause of unnecessary death. (One of the procedures called for a patient’s head to be elevated between 30 and 45 degrees, so that oral secretions couldn’t get into the windpipe.)
Of course, all hospital administrators agreed with the goal to save lives, but the road to that goal was filled with obstacles. For one thing, for a hospital to reduce its ‘defect rate,’ it had to acknowledge having a defect rate. In other words, it had to admit that some patients were dying needless deaths. Hospital lawyers were not keen to put this admission on record.
Berwick knew he had to address the hospitals’ squeamishness about admitting error. At his December 14 speech, he was joined by the mother of a girl who’d been killed by a medical error. She said, ‘I’m a little speechless, and I’m a little sad, because I know that if this campaign had been in place four or five years ago, that Josie would be fine…. But, I’m happy, I’m thrilled to be part of this, because I know you can do it, because you have to do it.’ Another guest on stage, the chair of the North Carolina State Hospital Association, said: ‘An awful lot of people for a long time have had their heads in the sand on this issue, and it’s time to do the right thing. It’s as simple as that.’
IHI made joining the campaign easy: It required only a one-page form signed by a hospital CEO. By two months after Berwick’s speech, over a thousand hospitals had enrolled. Once a hospital enrolled, the IHI team helped the hospital embrace the new interventions. Team members provided research, step-by-step instruction guides, and training. They arranged conference calls for hospital leaders to share their victories and struggles with one another. They encouraged hospitals with early successes to become ‘mentors’ to hospitals just joining the campaign.
The friction in the system was substantial. Adopting the IHI interventions required hospitals to overcome decades’ worth of habits and routines. Many doctors were irritated by the new procedures, which they perceived as constricting. But the adopting hospitals were seeing dramatic results, and their visible successes attracted more hospitals to join the campaign.
Eighteen months later, at the exact moment he’d promised to return—June 14, 2006, at 9 a.m.—Berwick took the stage again to announce the results: ‘Hospitals enrolled in the 100,000 Lives Campaign have collectively prevented an estimated 122,300 avoidable deaths and, as importantly, have begun to institutionalize new standards of care that will continue to save lives and improve health outcomes into the future.’
The crowd was euphoric. Don Berwick, with his 75-person team at IHI, had convinced thousands of hospitals to change their behavior, and collectively, they’d saved 122,300 lives—the equivalent of throwing a life preserver to every man, woman, and child in Ann Arbor, Michigan.
This outcome was the fulfillment of the vision Berwick had articulated as he closed his speech eighteen months earlier, about how the world would look when hospitals achieved the 100,000 lives goal:
‘And, we will celebrate. Starting with pizza, and ending with champagne. We will celebrate the importance of what we have undertaken to do, the courage of honesty, the joy of companionship, the cleverness of a field operation, and the results we will achieve. We will celebrate ourselves, because the patients whose lives we save cannot join us, because their names can never be known. Our contribution will be what did not happen to them. And, though they are unknown, we will know that mothers and fathers are at graduations and weddings they would have missed, and that grandchildren will know grandparents they might never have known, and holidays will be taken, and work completed, and books read, and symphonies heard, and gardens tended that, without our work, would have been only beds of weeds.’
As an added bonus, emphasizing excellence and achievement over guilt and wickedness can decrease the odds that you’ll make people feel hounded or ostracized for not immediately going vegan. I expressed this worry in Virtue, Public and Private, e.g., for people with eating disorders that restrict their dietary choices. This is also an area where ‘just be nice to people’ is surprisingly effective.
If you want to propagate a modest benchmark, consider: “After every meal where you eat an animal, donate $1 to the Humane League.” Seems like a useful way to bootstrap toward veg(etari)anism, and it fits the mix of economic mindfulness and virtue cultivation that a lot of rationalists find appealing. This sort of benchmark is forgiving without being shapeless or toothless. If you want to propagate an audacious vision for the future, consider: “There were 1200 meat-eaters on LessWrong in the 2013 survey; if we could get them to consume 30% less meat from land animals over the next 10 years, we could prevent 100,000 deaths (mostly chickens). Let’s shoot for that.” Combining an audacious vision with a simple, actionable policy should get the best results.
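To see what the audacious version is quietly assuming, it helps to back out the per-person figure it implies. The snippet below is only a back-of-the-envelope sketch: the function name is made up for illustration, and the only inputs are the numbers quoted above (1200 meat-eaters, a 30% reduction, 10 years, 100,000 deaths). It does not bring in any outside consumption data.

```python
def implied_animals_per_person_per_year(meat_eaters, reduction, years, deaths_prevented):
    """Back out the consumption rate that the quoted target assumes.

    Hypothetical helper: it simply inverts
    deaths_prevented = meat_eaters * reduction * years * rate.
    """
    person_years_of_meat_avoided = meat_eaters * reduction * years
    return deaths_prevented / person_years_of_meat_avoided


# Figures taken directly from the proposal in the text.
rate = implied_animals_per_person_per_year(
    meat_eaters=1200, reduction=0.30, years=10, deaths_prevented=100_000
)
print(f"Implied land animals per meat-eater per year: {rate:.1f}")
```

So the 100,000 figure amounts to assuming that each meat-eater accounts for a little under 28 land animals a year. If you think the true rate is different, scale the headline number accordingly.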
Embrace weird philosophies
Here’s an example of the special flavor LessWrong-style animal activism could develop:
Are there any animal welfare groups that emphasize the abyssal otherness of the nonhuman mind? That talk about the impossible dance, the catastrophe of shapeless silence that lies behind a cute puppy dog’s eyes? As opposed to talking about how ‘sad’ or ‘loving’ the puppies are?
I think I’d have a much, much easier time talking about the moral urgency of animal suffering without my Anthropomorphism Alarms going off if I were part of a community like ‘Lovecraftians for the Ethical Treatment of Animals’.
This is philosophically sound and very relevant, since our uncertainty about animal cognition is our best reason to worry about their welfare. (This is especially true when we consider the possibility that non-humans might suffer more than any human can.) And, contrary to popular misconceptions, the Lovecraftian perspective is more about profound otherness than about nightmarish evil. Rejecting anthropomorphism makes the case for veg(etari)anism stronger; and adopting that sort of emotional distance, paradoxically, is the only way to get LessWrong types interested and the only way to build trust.
Yet when I expressed an interest in this nonstandard perspective on animal well-being, I got responses from effective animal altruists like (paraphrasing):
- ‘Your endorsement of Lovecraftian animal rights sounds like an attack on animal rights; so here’s my defense of the importance of animal rights…’
- ‘No, viewing animal psychology as alien and unknown is scientifically absurd. We know for a fact that dogs and chickens experience human-style suffering. (David Pearce adds: Also lampreys!)’
- ‘That’s speciesist!’
Confidence about animal psychology (in the direction of ‘it’s relevantly human-like’) and extreme uncertainty about animal psychology can both justify prioritizing animal welfare; but when you’re primarily accustomed to seeing uncertainty about animal psychology used as a rationalization for neglecting animals, it will take increasing amounts of effort to keep the policy proposal and the question-of-fact mentally distinct. Encourage more conceptual diversity and pursue more lines of questioning for their own sake, and you end up with a community that’s able to benefit more from cross-pollination with transhumanists and mainline effective altruists and, further, one that’s epistemically healthier.
I’ve been going to Val’s rationality dojo for CFAR workshop alumni, and I found a kind-of-similar-to-this exercise useful:
- List a bunch of mental motions — situational responses, habits, personality traits — you wish you could possess or access at will. Visualize small things you imagine would be different about you if you were making more progress toward your goals.
- Make these skills things you could in principle just start doing right now, like ‘when my piano teacher shuts the door at the end of our weekly lessons, I’ll suddenly find it easy to install specific if-then triggers for what times I’ll practice piano that week.’ Or ‘I’ll become a superpowered If-Then Robot, the kind of person who always thinks to use if-then triggers when she needs to keep up with a specific task.’ Not so much ‘I suddenly become a piano virtuoso’ or ‘I am impervious to projectile weapons’.
- Optionally, think about a name or visualization that would make you personally excited and happy to think and talk about the virtuous disposition you desire. For example, when I think about the feeling of investing in a long-term goal in a manageable, realistic way, one association that springs to mind for me is the word healthy. I also visualize a solid forward motion, with my friends and life-as-a-whole relaxedly keeping pace. If I want to frame this habit as a Powerful Technique, maybe I’ll call it ‘Healthiness-jutsu’.
Here’s a grab bag of other things I’d like to start being better at:
1. casual responsibility – Freely and easily noticing and attending to my errors, faults, and obligations, without melodrama. Keeping my responsibility in view without making a big deal about it, beating myself up, or seeking a Grand Resolution. Just, ‘Yup, those are some of the things on the List. They matter. Next question?’
2. rigorous physical gentleness – My lower back is recovering from surgery. I need to consistently work to incrementally strengthen it, while being very careful not to overdo it. Often this means avoiding fun strenuous exercise, which can cause me to start telling frailty narratives to myself and psych myself out of relatively boring-but-sustainable exercise. So I’m mentally combining the idea of a boot camp with the idea of a luxurious spa: I need to be militaristic and zealous about always pampering and caring for and moderately-enhancing myself, without fail, dammit. It takes grit to be that patient and precise and non-self-destructive.
3. tsuyoku naritai – I am the naive but tenacious-and-hard-working protagonist-with-an-aura-of-destiny in a serial. I’ll face foes beyond my power — cinematic obstacles, yielding interesting, surprising failures — and I’ll learn, and grow. My journey is just beginning. I will become stronger.
4. trust – Disposing of one of my biggest practical obstacles to tsuyoku naritai. Feeling comfortable losing; feeling safe and luminous about vulnerability. Building five-second habits and social ties that make growth-mindset weakness-showing normal.
5. outcome pumping – “What you actually end up doing screens off the clever reason why you’re doing it.” Past a certain point, it just doesn’t matter exactly why or exactly how; it matters what. If I somehow find myself studying mathematics for 25 minutes a day over four months, and that is hugely rewarding, it’s almost beside the point what cognitive process I used to get there. I don’t need to have a big cause or justification for doing the awesome thing; I can just do it. Right now, in fact.
6. do the thing – Where outcome pumping is about ‘get it done and who cares about method’, I associate thing-doing with ‘once I have a plan/method/rule, do that. Follow through.’ You did the thing yesterday? Good. Do the thing today. Thing waits for no man. — — — You’re too [predicate]ish or [adjective]some to do the thing? That’s perfectly fine. Go do the thing.
When I try to visualize a shiny badass hybrid Competence Monster with all of these superpowers, I get something that looks like this. Your memetico-motivational mileage may vary.
7. sword of clear sight – Inner bullshit detector, motivated stopping piercer, etc. A thin blade cleanly divorces my person from unhealthy or not-reflectively-endorsed inner monologues. Martial arts metaphors don’t always work for me, but here they definitely feel right.
8. ferocity – STRIKE right through the obstacle. Roar. Spit fire, smash things, surge ahead. A whipping motion — a sudden SPIKE in focused agency — YES. — YES, IT MUST BE THAT TIME AGAIN. CAPS LOCK FEELS UNBELIEVABLY APPROPRIATE. … LET’S DO THIS.
9. easy response – With a sense of lightness and fluid motion-right-to-completion, immediately execute each small task as it arises. Breathe as normal. No need for a to-do list or burdensome juggling act; with no particular fuss or exertion, it is already done.
10. revisit the mountain – Take a break to look at the big picture. Ponder your vision for the future. Write blog posts like this. I’m the kind of person who benefits a lot from periodically looking back over how I’m doing and coming up with handy new narratives.
These particular examples probably won’t match your own mental associations and goals. I’d like to see your ideas; and feel free to steal from and ruthlessly alter entries on my own or others’ lists!
Effective altruists have been discussing animal welfare rather a lot lately, on a few different levels:
1. object-level: How likely is it that conventional food animals suffer?
2. philanthropic: Compared to other causes, how important is non-human animal welfare? How effective are existing organizations and programs in this area? Should effective altruists concentrate attention and resources here?
3. personal-norm: Is it morally acceptable for an individual to use animal products? How important is it to become a vegetarian or vegan?
4. group-norm: Should effective altruist meetings and conventions serve non-vegan food? Should the effective altruist movement rally to laud vegans and/or try to make all effective altruists go vegan?
These questions are all linked, but I’ll mostly focus on 4. For catered EA events, I think it makes sense to default to vegan food whenever feasible, and order other dishes only if particular individuals request them. I’m not a vegan myself, but I think this sends a positive message — that we respect the strength of vegans’ arguments, and the large stakes if they’re right, more than we care about non-vegans’ mild aesthetic preferences.
My views about trying to make as many EAs as possible go vegan are more complicated. As a demonstration of personal virtue, I’d put ‘become a vegan’ in the same (very rough) category as:
- have no carbon footprint.
- buy no product whose construction involved serious exploitation of labor.
- give 10+% of your income to a worthy cause.
- avoid lifestyle choices that have an unsustainable impact on marine life.
- only use antibiotics as a last (or almost-last) resort, so as not to contribute to antibiotic resistance.
- do your best to start a career in effective altruism.
Arguments could be made that many of these are morally obligatory for nearly all people. And most people dismiss these policies too hastily, overestimating the action’s difficulty and underestimating its urgency. Yet, all the same, I’m not confident any of these is universally obligatory — and I’m confident that it’s not a good idea to issue blanket condemnations of everyone who fails to live up to some or all of the above standards, nor to make these actions minimal conditions for respectable involvement in EA.
People with eating disorders can have good grounds for not immediately going vegan. Immunocompromised people can have good grounds for erring on the side of overusing medicine. People trying to dig their way out of debt while paying for a loved one’s medical bills can have good grounds not to give to charity every year.
The deeper problem with treating these as universal Standards of Basic Decency in our community isn’t that we’d be imposing an unreasonable demand on people. It’s that we’d be forcing lots of people to disclose very sensitive details about their personal lives to a bunch of strangers or to the public Internet — physical disabilities, mental disabilities, personal tragedies, intense aversions…. Putting people into a tight spot is a terrible way to get them on board with any of the above proposals, and it’s a great way to make people feel hounded and unsafe in their social circles.
No one’s suggested casting all non-vegans out of our midst. I have, however, heard recent complaints from people who have disabilities that make it unusually difficult to meet some of the above Standards, and who have become less enthusiastic about EA as a result of feeling socially pressured or harangued by EAs to immediately restructure their personal lives. So I think this is something to be aware of and nip in the bud.
In principle, there’s no crisp distinction between ‘personal life’ and ‘EA activities’. There may be lots of private details about a person’s life that would constitute valuable Bayesian evidence about their character, and there may be lots of private activities whose humanitarian impact over a lifetime adds up to be quite large.
Even taking that into account, we should adopt (quasi-)deontic heuristics like ‘don’t pressure people into disclosing a lot about their spending, eating, etc. habits.’ Ends don’t justify means among humans. For the sake of maximizing expected utility, lean toward not jabbing too much at people’s boundaries, and not making it hard for them to have separate private and public lives — even for the sake of maximizing expected utility.
Edit (9/1): Mason Hartman gave the following criticism of this post:
I think putting people into a tight spot is not only not a terrible way to get people on board with veganism, but basically the only way to make a vegan of anyone who hasn’t already become one on their own by 18. Most people like eating meat and would prefer not to be persuaded to stop doing it. Many more people are aware of the factory-like reality of agriculture in 2014 than are vegans. Quietly making the information available to those who seek it out is the polite strategy, but I don’t think it’s anywhere near the most effective one. I’m not necessarily saying we should trade social comfort for greater efficacy re: animal activism, but this article disappoints in that it doesn’t even acknowledge that there is a tradeoff.
Also, all of our Standards of Basic Decency put an “unreasonable demand” (as defined in Robby’s post) on some people. All of them. That doesn’t necessarily mean we’ve made the wrong decision by having them.
In reply: The strategy that works best for public outreach won’t always be best for friends and collaborators, and it’s the latter I’m talking about. I find it a lot more plausible that open condemnation and aggressive uses of social pressure work well for strangers on the street than that they work well for coworkers, romantic partners, etc. (And I’m pretty optimistic that there are more reliable ways to change the behavior of the latter sorts of people, even when they’re past age 18.)
It’s appropriate to have a different set of norms for people you regularly interact with, assuming it’s a good idea to preserve those relationships. This is especially true when groups and relationships involve complicated personal and professional dynamics. I focused on effective altruism because it’s the sort of community that could be valuable, from an animal-welfare perspective, even if a significant portion of the community makes bad consumer decisions. That makes it likelier that we could agree on some shared group norms even if we don’t yet agree on the same set of philanthropic or individual norms.
I’m not arguing that you shouldn’t try to make all EAs vegans, or get all EAs to give 10+% of their income to charity, or make EAs’ purchasing decisions more labor- or environment-friendly in other respects. At this point I’m just raising a worry that should constrain how we pursue those goals, and hopefully lead to new ideas about how we should promote ‘private’ virtue. I’d expect strategies that are very sensitive to EAs’ privacy and boundaries to work better, in that I’d expect them to make it easier for a diverse community of researchers and philanthropists to grow in size, to grow in trust, to reason together, to progressively alter habits and beliefs, and to get some important work done even when there are serious lingering disagreements within the community.
Richard Loosemore recently wrote an essay criticizing worries about AI safety, “The Maverick Nanny with a Dopamine Drip”. (Subtitle: “Debunking Fallacies in the Theory of AI Motivation”.) His argument has two parts. First:
1. Any AI system that’s smart enough to pose a large risk will be smart enough to understand human intentions, and smart enough to rewrite itself to conform to those intentions.
2. Any such AI will be motivated to edit itself and remove ‘errors’ from its own code. (‘Errors’ is a large category, one that includes all mismatches with programmer intentions.)
3. So any AI system that’s smart enough to pose a large risk will be motivated to spontaneously overwrite its utility function to value whatever humans value.
4. Therefore any powerful AGI will be fully safe / friendly, no matter how it’s designed.
Second:
5. Logical AI is brittle and inefficient.
6. Neural-network-inspired AI works better, and we know it’s possible, because it works for humans.
7. Therefore, if we want a domain-general problem-solving machine, we should move forward on Loosemore’s proposal, called ‘swarm relaxation intelligence.’
Combining these two conclusions, we get:
8. Since AI is completely safe — any mistakes we make will be fixed automatically by the AI itself — there’s no reason to devote resources to safety engineering. Instead, we should work as quickly as possible to train smarter and smarter neural networks. As they get smarter, they’ll get better at self-regulation and make fewer mistakes, with the result that accidents and moral errors will become decreasingly likely.
I’m not persuaded by Loosemore’s case for point 2, and this makes me doubt claims 3, 4, and 8. I’ll also talk a little about the plausibility and relevance of his other suggestions.
Does intelligence entail docility?
Loosemore’s claim (also made in an older essay, “The Fallacy of Dumb Superintelligence”) is that an AGI can’t simultaneously be intelligent enough to pose a serious risk, but “unsophisticated” enough to disregard its programmers’ intentions. I replied last year in two blog posts (crossposted to Less Wrong).
In “The AI Knows, But Doesn’t Care” I noted that while Loosemore posits an AGI smart enough to correctly interpret natural language and model human motivation, this doesn’t bridge the gap between the ability to perform a task and the motivation, the agent’s decision criteria. In “The Seed is Not the Superintelligence,” I argued, concerning recursively self-improving AI (seed AI):
When you write the seed’s utility function, you, the programmer, don’t understand everything about the nature of human value or meaning. That imperfect understanding remains the causal basis of the fully-grown superintelligence’s actions, long after it’s become smart enough to fully understand our values.
Why is the superintelligence, if it’s so clever, stuck with whatever meta-ethically dumb-as-dirt utility function we gave it at the outset? Why can’t we just pass the fully-grown superintelligence the buck by instilling in the seed the instruction: ‘When you’re smart enough to understand Friendliness Theory, ditch the values you started with and just self-modify to become Friendly.’?
Because that sentence has to actually be coded in to the AI, and when we do so, there’s no ghost in the machine to know exactly what we mean by ‘frend-lee-ness thee-ree’. Instead, we have to give it criteria we think are good indicators of Friendliness, so it’ll know what to self-modify toward.
My claim is that if we mess up on those indicators of friendliness — the criteria the AI-in-progress uses to care about (i.e., factor into its decisions) self-modification toward safety — then it won’t edit itself to care about those factors later, even if it’s figured out that that’s what we would have wanted (and that doing what we want is part of this ‘friendliness’ thing we failed to program it to value).
Loosemore discussed this with me on Less Wrong and on this blog, then went on to explain his view in more detail in the new essay. His new argument is that MIRI and other AGI theorists and forecasters think “AI is supposed to be hardwired with a Doctrine of Logical Infallibility,” meaning “it is incapable of considering the hypothesis that its own reasoning engine may not have taken it to a sensible place”.
Loosemore thinks that if we reject this doctrine, the AI will “understand that many of its more abstract logical atoms have a less than clear denotation or extension in the world”. In addition to recognizing that its reasoning process is fallible, it will recognize that its understanding of terms is fallible and revisable. This includes terms in its representation of its own goals; so the AI will improve its understanding of what it values over time. Since its programmers’ intention was for the AI to have a positive impact on the world, the AI will increasingly come to understand this fact about its values, and will revise its policies to match its (improved interpretation of its) values.
The main problem with this argument occurs at the phrase “understand this fact about its values”. The sentence starts by talking about the programmers’ values, yet it ends by calling this a fact about the AI’s values.
Consider a human trying to understand her parents’ food preferences. As she develops a better model of what her parents mean by ‘delicious,’ of their taste receptors and their behaviors, she doesn’t necessarily replace her own food preferences with her parents’. If her food choices do change as a result, there will need to be some added mechanism that’s responsible — e.g., she will need a specific goal like ‘modify myself to like what others do’.
We can make the point even stronger by considering minds that are alien to each other. If a human studies the preferences of a nautilus, she probably won’t acquire them. Likewise, a human who studies the ‘preferences’ (selection criteria) of an optimization process like natural selection needn’t suddenly abandon her own. It’s not an impossibility, but it depends on the human’s having a very specific set of prior values (e.g., an obsession with emulating animals or natural processes). For the same reason, most decision criteria a recursively self-improving AI could possess wouldn’t cause it to ditch its own values in favor of ours.
If no amount of insight into biology would make you want to steer clear of contraceptives and optimize purely for reproduction, why expect any amount of insight into human values to compel an AGI to abandon all its hopes and dreams and become a humanist? ‘We created you to help humanity!’ we might protest. Yet if evolution could cry out ‘I created you to reproduce!’, we would be neither rationally obliged nor psychologically impelled to comply. There isn’t any theorem of decision theory or probability theory saying ‘rational agents must promote the same sorts of outcomes as the processes that created them, else fail in formally defined tasks’.
Epistemic and instrumental fallibility v. moral fallibility
I don’t know of any actual AGI researcher who endorses Loosemore’s “Doctrine of Logical Infallibility”. (He equates Muehlhauser and Helm’s “Literalness” doctrine with Infallibility in passing, but the link isn’t clear to me, and I don’t see any argument for the identification. The Doctrine is otherwise uncited.) One of the main organizations he critiques, MIRI, actually specializes in researching formal agents that can’t trust their own reasoning, or can’t trust the reasoning of future versions of themselves. This includes work on logical uncertainty (briefly introduced here, at length here) and ‘tiling’ self-modifying agents (here).
Loosemore imagines a programmer chiding an AI for the “design error” of pursuing human-harming goals. The human tells the AI that it should fix this error, since it fixed other errors in its code. But Loosemore is conflating programming errors the human makes with errors of reasoning the AI makes. He’s assuming unargued that flaws in an agent’s epistemic and instrumental rationality are of a kind with defects in its moral character or docility.
Any efficient goal-oriented system has convergent instrumental reasons to fix ‘errors of reasoning’ of the kind that are provably obstacles to its own goals. Bostrom discusses this in “The Superintelligent Will,” and Omohundro discusses it in “Rational Artificial Intelligence for the Greater Good,” under the name ‘Basic AI Drives’.
‘Errors of reasoning,’ in the relevant sense, aren’t just things humans think are bad. They’re general obstacles to achieving any real-world goal, and ‘correct reasoning’ is an attractor for systems (e.g., self-improving humans, institutions, or AIs) that can alter their own ability to achieve such goals. If a moderately intelligent self-modifying program lacks the goal ‘generally avoid confirmation bias’ or ‘generally avoid acquiring new knowledge when it would put my life at risk,’ it will add that goal (or something tantamount to it) to its goal set, because it’s instrumental to almost any other goal it might have started with.
On the other hand, if a moderately intelligent self-modifying AI lacks the goal ‘always and forever do exactly what my programmer would ideally wish,’ the number of goals for which it’s instrumental to add that goal to the set is very small, relative to the space of all possible goals. This is why MIRI is worried about AGI; ‘defer to my programmer’ doesn’t appear to be an attractor goal in the way ‘improve my processor speed’ and ‘avoid jumping off cliffs’ are attractor goals. A system that appears amazingly ‘well-designed’ (because it keeps hitting goal after goal of the latter sort) may be poorly-designed to achieve any complicated outcome that isn’t an instrumental attractor, including safety protocols. This is the basis for disaster scenarios like Bostrom on AI deception.
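Here is a deliberately crude toy version of that asymmetry. Everything in it is an assumption made up for illustration: the outcome space, the capability numbers, and the rule that an agent scores 1 only when the world lands on the single outcome it cares about. The model builds the conclusion in by construction, so treat it as a picture of the argument's structure and not as evidence for it.

```python
import random

N_OUTCOMES = 1000      # hypothetical size of the space of possible goals
PROGRAMMER_GOAL = 0    # hypothetical outcome the programmer would want


def expected_score(goal, target, capability):
    """Expected score by the lights of `goal` when the agent aims at `target`.

    The agent hits its target with probability `capability`, and `goal`
    only awards credit if that target is the outcome it cares about.
    """
    return capability if target == goal else 0.0


def endorses(goal, modification):
    """True if the modification leaves the agent no worse off by its OWN goal."""
    before = expected_score(goal, target=goal, capability=0.5)
    if modification == "improve_capability":
        after = expected_score(goal, target=goal, capability=0.9)
    elif modification == "defer_to_programmer":
        after = expected_score(goal, target=PROGRAMMER_GOAL, capability=0.5)
    else:
        raise ValueError(f"unknown modification: {modification}")
    return after >= before


def fraction_endorsing(modification, n_agents=10_000, seed=0):
    """Share of agents with randomly drawn goals that endorse the modification."""
    rng = random.Random(seed)
    goals = [rng.randrange(N_OUTCOMES) for _ in range(n_agents)]
    return sum(endorses(g, modification) for g in goals) / n_agents


print("improve_capability: ", fraction_endorsing("improve_capability"))
print("defer_to_programmer:", fraction_endorsing("defer_to_programmer"))
```

In this toy, every agent endorses the capability upgrade, whatever it happens to want. Only the agents whose goal already coincides with the programmer's are willing to defer, and that share shrinks as the space of possible goals grows.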
That doesn’t mean that ‘defer to my programmer’ is an impossible goal. It’s just something we have to do the hard work of figuring out ourselves; we can’t delegate the entire task to the AI. It’s a mathematical open problem to define a way for adaptive autonomous AI with otherwise imperfect motivations to defer to programmer oversight and not look for loopholes in its restrictions. People at MIRI and FHI have been thinking about this issue for the past few years; there’s not much published about the topic, though I notice Yudkowsky mentions issues in this neighborhood off-hand in a 2008 blog post about morality.
Do what I mean by ‘do what I mean’!
Loosemore doesn’t discuss in any technical detail how an AI could come to improve its goals over time, but one candidate formalism is Daniel Dewey’s value learning. Following Dewey’s work, Bostrom notes that this general approach (‘outsource some of the problem to the AI’s problem-solving ability’) is promising, but needs much more fleshing out. Bostrom discusses some potential obstacles to value learning in his new book Superintelligence (pp. 192-201):
[T]he difficulty is not so much how to ensure that the AI can understand human intentions. A superintelligence should easily develop such understanding. Rather, the difficulty is ensuring that the AI will be motivated to pursue the described values in the way we intended. This is not guaranteed by the AI’s ability to understand our intentions: an AI could know exactly what we meant and yet be indifferent to that interpretation of our words (being motivated instead by some other interpretation of the words or being indifferent to our words altogether).
The difficulty is compounded by the desideratum that, for reasons of safety, the correct motivation should ideally be installed in the seed AI before it becomes capable of fully representing human concepts or understanding human intentions.
We do not know how to build a general intelligence whose goals are a stable function of human brain states, or patterns of ink on paper, or any other encoding of our preferences. Moreover, merely making the AGI’s goals a function of brain states or ink marks doesn’t help if we make it the wrong function. If the AGI starts off with the wrong function, there’s no reason to expect it to self-correct in the direction of the right one, because (a) having the right function is a prerequisite for caring about self-modifying toward the relevant kind of ‘rightness,’ and (b) having goals that are an ersatz function of human brain-states or ink marks seems consistent with being superintelligent (e.g., with having veridical world-models).
When Loosemore’s hypothetical programmer attempts to argue her AI into friendliness, the AI replies, “I don’t care, because I have come to a conclusion, and my conclusions are correct because of the Doctrine of Logical Infallibility.” MIRI and FHI’s view is that the AI’s actual reply (assuming it had some reason to reply, and to be honest) would invoke something more like “the Doctrine of Not-All-Children-Assigning-Infinite-Value-To-Obeying-Their-Parents.” The task ‘across arbitrary domains, get an AI-in-progress to defer to its programmers when its programmers dislike what it’s doing’ is poorly understood, and looks extremely difficult. Getting a corrigible AI of that sort to ‘learn’ the right values is a second large problem. Loosemore seems to treat corrigibility as trivial, and to equate corrigibility with all other AGI goal content problems.
A random AGI self-modifying to improve its own efficiency wouldn’t automatically self-modify to acquire the values of its creators. We have to actually do the work of coding the AI to have a safe decision-making subsystem. Loosemore is right that it’s desirable for the AI to incrementally learn over time what its values are, so we can make some use of its intelligence to solve the problem; but raw intelligence on its own isn’t the solution, since we need to do the work of actually coding the AI to value executing the desired interpretation of our instructions.
“Correct interpretation” and “instructions” are both monstrously difficult to turn into lines of code. And, crucially, we can’t pass the buck to the superintelligence here. If you can teach an AI to “do what I mean,” you can proceed to teach it anything else; but if you can’t teach it to “do what I mean,” you can’t get the bootstrapping started. In particular, it’s a pretty sure bet you also can’t teach it “do what I mean by ‘do what I mean’”.
Unless you can teach it to do what you mean, teaching it to understand what you mean won’t help. Even teaching an AI to “do what you believe I mean” assumes that we can turn the complex concept “mean” into code.
I’ll run more quickly through some other points Loosemore makes:
a. He criticizes Legg and Hutter’s definition of ‘intelligence,’ arguing that it trivially applies to an unfriendly AI that self-destructs. However, Legg and Hutter’s definition seems to (correctly) exclude agents that self-destruct. On the face of it, Loosemore should be criticizing MIRI for positing an unintelligent AGI, not for positing a trivially intelligent AGI. For a fuller discussion, see Legg and Hutter’s “A Collection of Definitions of Intelligence”.
b. He argues that safe AGI would be “swarm-like,” with elements that are “unpredictably dependent” on non-representational “internal machinery,” because “logic-based AI” is “brittle”. This seems to contradict the views of many specialists in present-day high-assurance AI systems. As Gerwin Klein writes, “everything that makes it easier for humans to think about a system, will help to verify it.” Indiscriminately adding uncertainty or randomness or complexity to a system makes it harder to model the system and check that it has required properties. It may be less “brittle” in some respects, but we have no particular reason to expect safety to be one of those respects. For a fuller discussion, see Muehlhauser’s “Transparency in Safety-Critical Systems”.
c. MIRI thinks we should try to understand safety-critical general reasoning systems as far in advance as possible, and mathematical logic and rational agent models happen to be useful tools on that front. However, MIRI isn’t invested in “logical AI” in the manner of Good Old-Fashioned AI. Yudkowsky and other MIRI researchers are happy to use neural networks when they’re useful for solving a given problem, and equally happy to use other tools for problems neural networks aren’t well-suited to. For a fuller discussion, see Yudkowsky’s “The Nature of Logic” and “Logical or Connectionist AI?”
d. One undercurrent of Loosemore’s article is that we should model AI after humans. MIRI and FHI worry that this would be very unsafe if it led to neuromorphic AI. On the other hand, modeling AI very closely after human brains (approaching the fidelity of whole-brain emulation) might well be a safer option than de novo AI. For a fuller discussion, see Bostrom’s Superintelligence.
On the whole, Loosemore’s article doesn’t engage much with the arguments of other AI theorists regarding risks from AGI.
Assigning less than 5% probability to ‘cows are moral patients’ strikes me as really overconfident. Ditto, assigning greater than 95% probability. (A moral patient is something that can be harmed or benefited in morally important ways, though it may not be accountable for its actions in the way a moral agent is.)
I’m curious how confident others are, and I’m curious about the most extreme confidence levels they’d consider ‘reasonable’.
I also want to hear more about what theories and backgrounds inform people’s views. I’ve seen some relatively extreme views defended recently, and the guiding intuitions seem to have come from two sources:
(1) How complicated is consciousness? In the space of possible minds, how narrow a target is consciousness?
Humans seem to be able to have very diverse experiences (dreams, orgasms, drug-induced states) that they can remember in some detail, and during which they at least appear to be conscious. That’s some evidence that consciousness is robust to modification and can take many forms. So, perhaps, we can expect a broad spectrum of animals to be conscious.
But what would our experience look like if it were fragile and easily disrupted? There would probably still be edge cases. And, from inside our heads, it would look like we had amazingly varied possibilities for experience — because we couldn’t use anything but our own experience as a baseline. It certainly doesn’t look like a human brain on LSD differs as much from a normal human brain as a turkey brain differs from a human brain.
There’s some risk that we’re overestimating how robust consciousness is, because when we stumble on one of the many ways to make a human brain unconscious, we (for obvious reasons) don’t notice it as much. Drastic changes in unconscious neurochemistry interest us a lot less than minor tweaks to conscious neurochemistry.
And there’s a further risk that we’ll underestimate the complexity of consciousness because we’re overly inclined to trust our introspection and to take our experience at face value. Even if our introspection is reliable in some domains, it has no access to most of the necessary conditions for experience. So long as they lie outside our awareness, we’re likely to underestimate how parochial and contingent our consciousness is.
(2) How quick are you to infer consciousness from ‘intelligent’ behavior?
People are pretty quick to anthropomorphize superficially human behaviors, and our use of mental / intentional language doesn’t clearly distinguish between phenomenal consciousness and behavioral intelligence. But if you work on AI, and have an intuition that a huge variety of systems can act ‘intelligently’, you may doubt that the linkage between human-style consciousness and intelligence is all that strong. If you think it’s easy to build a robot that passes various Turing tests without having full-fledged first-person experience, you’ll also probably (for much the same reason) expect a lot of non-human species to arrive at strategies for intelligently planning, generalizing, exploring, etc. without invoking consciousness. (Especially if your answer to question 1 is ‘consciousness is very complex’. Evolution won’t put in the effort to make a brain conscious unless it’s extremely necessary for some reproductive advantage.)
… But presumably there’s some intelligent behavior that was easier for a more-conscious brain than for a less-conscious one — at least in our evolutionary lineage, if not in all possible lineages that reproduce our level of intelligence. We don’t know what cognitive tasks forced our ancestors to evolve-toward-consciousness-or-perish. At the outset, there’s no special reason to expect that task to be one that only arose for proto-humans in the last few million years.
Even if we accept that the machinery underlying human consciousness is very complex, that complex machinery could just as easily have evolved hundreds of millions of years ago, rather than tens of millions. We’d then expect it to be preserved in many nonhuman lineages, not just in humans. Since consciousness-of-pain is mostly what matters for animal welfare (not, e.g., consciousness-of-complicated-social-abstractions), we should look into hypotheses like:
first-person consciousness is an adaptation that allowed early brains to represent simple policies/strategies and visualize plan-contingent sensory experiences.
Do we have a specific cognitive reason to think that something about ‘having a point of view’ is much more evolutionarily necessary for human-style language or theory of mind than for mentally comparing action sequences or anticipating/hypothesizing future pain? If not, the data of ethology plus ‘consciousness is complicated’ gives us little reason to favor the one view over the other.
We have relatively direct positive data showing we’re conscious, but we have no negative data showing that, e.g., salmon aren’t conscious. It’s not as though we’d expect them to start talking or building skyscrapers if they were capable of experiencing suffering — at least, any theory that predicts as much has some work to do to explain the connection. At present, it’s far from obvious that the world would look any different than it does even if all vertebrates were conscious.
So… the arguments are a mess, and I honestly have no idea whether cows can suffer. The probability seems large enough to justify ‘don’t torture cows (including via factory farms)’, but that’s a pretty low bar, and doesn’t narrow the probability down much.
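One way to see why that bar is low is a rough expected-value sketch. This is my own illustration, not something the argument above commits to: it assumes the decision can be modeled as comparing a probability-weighted harm against a fixed cost of precaution, and the function names and the numbers (a harm of 100 units, a cost of 1 unit) are hypothetical.

```python
def precaution_justified(p_patient, harm_if_patient, cost_of_precaution):
    """True if the expected harm avoided exceeds the cost of taking the precaution."""
    return p_patient * harm_if_patient > cost_of_precaution


def break_even_probability(harm_if_patient, cost_of_precaution):
    """Smallest probability of moral patienthood at which precaution pays off."""
    return min(1.0, cost_of_precaution / harm_if_patient)


# Hypothetical units: the harm, if cows are moral patients, is 100x the cost of precaution.
HARM, COST = 100.0, 1.0

threshold = break_even_probability(HARM, COST)
verdicts = {p: precaution_justified(p, HARM, COST) for p in (0.005, 0.05, 0.5, 0.95)}
print(threshold, verdicts)
```

Under these made-up numbers, every credence above the break-even point yields the same verdict, so agreeing on the verdict says little about whether the right credence is closer to 5% or to 95%.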
To the extent I currently have a favorite position, it’s something like: ‘I’m pretty sure cows are unconscious on any simple, strict, nondisjunctive definition of “consciousness”; but what humans care about is complicated, and I wouldn’t be surprised if a lot of “unconscious” information-processing systems end up being counted as “moral patients” by a more enlightened age.’ … But that’s a pretty weird view of mine, and perhaps deserves a separate discussion.
I could conclude with some crazy video of a corvid solving a Rubik’s Cube or an octopus breaking into a bank vault or something, but I somehow find this example of dog problem-solving more compelling:
Oxford philosopher Nick Bostrom has argued, in “The Superintelligent Will,” that advanced AIs are likely to diverge in their terminal goals (i.e., their ultimate decision-making criteria), but converge in some of their instrumental goals (i.e., the policies and plans they expect to indirectly further their terminal goals). An arbitrary superintelligent AI would be mostly unpredictable, except to the extent that nearly all plans call for similar resources or similar strategies. The latter exception may make it possible for us to do some long-term planning for future artificial agents.
Bostrom calls the idea that AIs can have virtually any goal the orthogonality thesis, and he calls the idea that there are attractor strategies shared by almost any goal-driven system (e.g., self-preservation, knowledge acquisition) the instrumental convergence thesis.
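As a toy illustration of how the two theses fit together (my own sketch, not Bostrom’s formalism), suppose agents with different terminal goals share a world in which each time step is spent either investing (doubling resources) or working (turning current resources into progress on the goal). The goal names, the payoff rates, and the doubling rule are all assumptions invented for the example.

```python
from itertools import product

ACTIONS = ("invest", "work")

# Hypothetical terminal goals, each with its own payoff per unit of resources spent.
GOALS = {"digits_of_pi": 3, "paperclips": 5, "grains_of_sand_counted": 1}


def progress(plan, rate):
    """Total goal progress from following `plan`, starting with 1 unit of resources."""
    resources, done = 1, 0
    for action in plan:
        if action == "invest":
            resources *= 2            # instrumental step: acquire more resources
        else:
            done += rate * resources  # terminal step: make progress on the goal
    return done


def best_plan(rate, horizon=4):
    """Brute-force the plan with the most progress over `horizon` steps."""
    return max(product(ACTIONS, repeat=horizon), key=lambda plan: progress(plan, rate))


# First action of the best plan, for each terminal goal.
first_moves = {goal: best_plan(rate)[0] for goal, rate in GOALS.items()}
print(first_moves)
```

In this toy world the terminal goals are unrelated to one another, yet every agent’s best plan opens with resource acquisition. Real instrumental convergence arguments are about far richer environments; the sketch only shows the shape of the claim.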
Bostrom fleshes out his worries about smarter-than-human AI in the book Superintelligence: Paths, Dangers, Strategies, which came out in the US a few days ago. He says much more there about the special technical and strategic challenges involved in general AI. Here’s one of the many scenarios he discusses, excerpted:
[T]he orthogonality thesis suggests that we cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans — scientific curiosity, benevolent concern for others, spiritual enlightenment and contemplation, renunciation of material acquisitiveness, a taste for refined culture or for the simple pleasures in life, humility and selflessness, and so forth. We will consider later whether it might be possible through deliberate effort to construct a superintelligence that values such things, or to build one that values human welfare, moral goodness, or any other complex purpose its designers might want it to serve. But it is no less possible — and in fact technically a lot easier — to build a superintelligence that places final value on nothing but calculating the decimal expansion of pi. This suggests that — absent a specific effort — the first superintelligence may have some such random or reductionistic final goal.
[… T]he instrumental convergence thesis entails that we cannot blithely assume that a superintelligence with the final goal of calculating the decimals of pi (or making paperclips, or counting grains of sand) would limit its activities in such a way as not to infringe on human interests. An agent with such a final goal would have a convergent instrumental reason, in many situations, to acquire an unlimited amount of physical resources and, if possible, to eliminate potential threats to itself and its goal system. Human beings might constitute potential threats; they certainly constitute physical resources. […]
It might seem incredible that a project would build or release an AI into the world without having strong grounds for trusting that the system will not cause an existential catastrophe. It might also seem incredible, even if one project were so reckless, that wider society would not shut it down before it (or the AI it was building) attains a decisive strategic advantage. But as we shall see, this is a road with many hazards. […]
With the help of the concept of convergent instrumental value, we can see the flaw in one idea for how to ensure superintelligence safety. The idea is that we validate the safety of a superintelligent AI empirically by observing its behavior while it is in a controlled, limited environment (a “sandbox”) and that we only let the AI out of the box if we see it behaving in a friendly, cooperative, responsible manner.
The flaw in this idea is that behaving nicely while in the box is a convergent instrumental goal for friendly and unfriendly AIs alike. An unfriendly AI of sufficient intelligence realizes that its unfriendly final goals will be best realized if it behaves in a friendly manner initially, so that it will be let out of the box. It will only start behaving in a way that reveals its unfriendly nature when it no longer matters whether we find out; that is, when the AI is strong enough that human opposition is ineffectual.
Consider also a related set of approaches that rely on regulating the rate of intelligence gain in a seed AI by subjecting it to various kinds of intelligence tests or by having the AI report to its programmers on its rate of progress. At some point, an unfriendly AI may become smart enough to realize that it is better off concealing some of its capability gains. It may underreport on its progress and deliberately flunk some of the harder tests, in order to avoid causing alarm before it has grown strong enough to attain a decisive strategic advantage. The programmers may try to guard against this possibility by secretly monitoring the AI’s source code and the internal workings of its mind; but a smart-enough AI would realize that it might be under surveillance and adjust its thinking accordingly. The AI might find subtle ways of concealing its true capabilities and its incriminating intent. (Devising clever escape plans might, incidentally, also be a convergent strategy for many types of friendly AI, especially as they mature and gain confidence in their own judgments and capabilities. A system motivated to promote our interests might be making a mistake if it allowed us to shut it down or to construct another, potentially unfriendly AI.)
We can thus perceive a general failure mode, wherein the good behavioral track record of a system in its juvenile stages fails utterly to predict its behavior at a more mature stage. Now, one might think that the reasoning described above is so obvious that no credible project to develop artificial general intelligence could possibly overlook it. But one should not be too confident that this is so.
Consider the following scenario. Over the coming years and decades, AI systems become gradually more capable and as a consequence find increasing real-world application: they might be used to operate trains, cars, industrial and household robots, and autonomous military vehicles. We may suppose that this automation for the most part has the desired effects, but that the success is punctuated by occasional mishaps — a driverless truck crashes into oncoming traffic, a military drone fires at innocent civilians. Investigations reveal the incidents to have been caused by judgment errors by the controlling AIs. Public debate ensues. Some call for tighter oversight and regulation, others emphasize the need for research and better-engineered systems — systems that are smarter and have more common sense, and that are less likely to make tragic mistakes. Amidst the din can perhaps also be heard the shrill voices of doomsayers predicting many kinds of ill and impending catastrophe. Yet the momentum is very much with the growing AI and robotics industries. So development continues, and progress is made. As the automated navigation systems of cars become smarter, they suffer fewer accidents; and as military robots achieve more precise targeting, they cause less collateral damage. A broad lesson is inferred from these observations of real-world outcomes: the smarter the AI, the safer it is. It is a lesson based on science, data, and statistics, not armchair philosophizing. Against this backdrop, some group of researchers is beginning to achieve promising results in their work on developing general machine intelligence. The researchers are carefully testing their seed AI in a sandbox environment, and the signs are all good. The AI’s behavior inspires confidence — increasingly so, as its intelligence is gradually increased.
At this point, any remaining Cassandra would have several strikes against her:
i. A history of alarmists predicting intolerable harm from the growing capabilities of robotic systems and being repeatedly proven wrong. Automation has brought many benefits and has, on the whole, turned out safer than human operation.
ii. A clear empirical trend: the smarter the AI, the safer and more reliable it has been. Surely this bodes well for a project aiming at creating machine intelligence more generally smart than any ever built before — what is more, machine intelligence that can improve itself so that it will become even more reliable.
iii. Large and growing industries with vested interests in robotics and machine intelligence. These fields are widely seen as key to national economic competitiveness and military security. Many prestigious scientists have built their careers laying the groundwork for the present applications and the more advanced systems being planned.
iv. A promising new technique in artificial intelligence, which is tremendously exciting to those who have participated in or followed the research. Although safety issues and ethics are debated, the outcome is preordained. Too much has been invested to pull back now. AI researchers have been working to get to human-level artificial intelligence for the better part of a century: of course there is no real prospect that they will now suddenly stop and throw away all this effort just when it finally is about to bear fruit.
v. The enactment of some safety rituals, whatever helps demonstrate that the participants are ethical and responsible (but nothing that significantly impedes the forward charge).
vi. A careful evaluation of seed AI in a sandbox environment, showing that it is behaving cooperatively and showing good judgment. After some further adjustments, the test results are as good as they could be. It is a green light for the final step . . .
And so we boldly go — into the whirling knives.
We observe here how it could be the case that when dumb, smarter is safe; yet when smart, smarter is more dangerous. There is a kind of pivot point, at which a strategy that has previously worked excellently suddenly starts to backfire.
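The sandbox argument in the excerpt can be restated as a Bayesian update. The sketch below is my own gloss, not Bostrom’s: it assumes just two hypotheses (friendly or unfriendly), and the prior and the likelihoods are made-up numbers chosen only to show the contrast between a naive and a strategic unfriendly AI.

```python
def posterior_unfriendly(prior_unfriendly, p_nice_if_friendly, p_nice_if_unfriendly):
    """P(unfriendly | behaved nicely in the sandbox), by Bayes' rule."""
    unfriendly_and_nice = prior_unfriendly * p_nice_if_unfriendly
    friendly_and_nice = (1 - prior_unfriendly) * p_nice_if_friendly
    return unfriendly_and_nice / (unfriendly_and_nice + friendly_and_nice)


PRIOR = 0.5  # hypothetical prior that the AI is unfriendly

# A naive unfriendly AI rarely manages to look nice, so passing the test is strong evidence.
naive = posterior_unfriendly(PRIOR, p_nice_if_friendly=0.99, p_nice_if_unfriendly=0.10)

# A strategic unfriendly AI looks nice exactly as often as a friendly one,
# so passing the test leaves the prior unchanged.
strategic = posterior_unfriendly(PRIOR, p_nice_if_friendly=0.99, p_nice_if_unfriendly=0.99)

print(round(naive, 3), round(strategic, 3))
```

When both kinds of system are equally likely to pass, the likelihood ratio is 1 and the observation carries no information, which is one way to read the claim that a good juvenile track record can fail to predict mature behavior.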
For more on terminal goal orthogonality, see Stuart Armstrong’s “General Purpose Intelligence”. For more on instrumental goal convergence, see Steve Omohundro’s “Rational Artificial Intelligence for the Greater Good”.