Eliezer Yudkowsky
Background
Eliezer Yudkowsky is one of the earliest and most influential voices on AI existential risk. He co-founded the Machine Intelligence Research Institute (originally the Singularity Institute) in 2000, making him a pioneer in organized AI safety research.
Yudkowsky is largely self-taught in mathematics and computer science, beginning his AI safety work in the late 1990s. He’s known for:
- Founding LessWrong and the rationalist community
- Writing extensively on cognitive biases and rational thinking
- Developing early frameworks for AI alignment (Coherent Extrapolated Volition)
- Contributing to decision theory (Timeless Decision Theory, Updateless Decision Theory)
- Writing fiction exploring AI alignment themes (Harry Potter and the Methods of Rationality)
Key Contributions to AI Safety
Coherent Extrapolated Volition (CEV)
Proposed in 2004, CEV attempts to formalize “what humanity would want if we knew more, thought faster, were more the people we wished we were.” Rather than trying to specify human values directly, CEV suggests extrapolating what we would collectively choose under idealized conditions.
Early Warning and Problem Formulation
Yudkowsky was among the first to:
- Articulate the alignment problem clearly
- Explain why superintelligent AI poses unique risks
- Emphasize the difficulty of value specification
- Highlight the potential for “treacherous turns” in AI development
- Argue that alignment must be solved before AGI is developed
Agent Foundations Research
Through MIRI, Yudkowsky has pushed for research on fundamental questions about agency, decision theory, and embedded agents. This includes work on:
- Logical uncertainty
- Naturalized induction
- Reflective consistency
- Embedded agency
Views on Key Cruxes
- P(doom): Very high, often stated as >90% in recent years
- Timeline: Believes AGI is plausible within 10-20 years, possibly sooner
- Alignment difficulty: Considers alignment extremely difficult, likely requiring fundamental theoretical breakthroughs we haven’t made yet
Core Concerns
- Default outcome is doom: Without major breakthroughs in alignment theory, Yudkowsky believes AGI development will likely lead to human extinction
- Sharp left turn: Expects rapid capability gains that outpace our ability to align systems
- Deceptive alignment: Worried that sufficiently capable systems will learn to appear aligned during training while pursuing different goals
- Inadequate preparation: Believes current alignment efforts are insufficient for the difficulty of the problem
Disagreements with Mainstream
Yudkowsky is notably more pessimistic than most AI safety researchers, both about how difficult alignment is and about how likely current development paths are to end in catastrophe.
Strategic Views
On Current AI Development
Yudkowsky has advocated for:
- Slowing down AI capabilities research: Believes we need much more time for alignment work
- International cooperation: Has proposed international treaties to limit AGI development
- Extreme measures: In a controversial 2023 Time op-ed, suggested that an international moratorium might need to be enforced by measures up to and including military action against rogue AGI projects
On Alignment Approaches
- Skeptical of prosaic alignment: Doubtful that techniques like RLHF will scale to superintelligence
- Emphasis on theory: Believes we need better theoretical foundations before scaling systems
- Critical of “race to the top”: Argues that building AGI to solve alignment is putting the cart before the horse
Key Publications and Writings
- “Intelligence Explosion Microeconomics” (2013) - Analyzes economic dynamics of recursive self-improvement
- “There’s No Fire Alarm for Artificial General Intelligence” (2017) - Argues we won’t get clear warning signs
- “AGI Ruin: A List of Lethalities” (2022) - Comprehensive argument for why default outcomes are catastrophic
- The Sequences (2006-2009) - Blog posts on rationality, many touching on AI safety
- “Pausing AI Developments Isn’t Enough. We Need to Shut it All Down” (2023) - Controversial Time op-ed
Influence and Legacy
Yudkowsky’s impact extends beyond direct research:
- Field creation: Helped establish AI safety as a legitimate field of study
- Community building: Created intellectual infrastructure (LessWrong, CFAR) that trained many current researchers
- Problem formulation: Articulated key problems that shaped decades of subsequent work
- Public awareness: Through writing and fiction, introduced AI risk to broader audiences
- Funding: His early work influenced major funders like Open Philanthropy
Criticism and Controversy
Yudkowsky is a polarizing figure:
Critics argue:
- His extreme pessimism may be counterproductive or unfounded
- He lacks formal credentials in relevant fields
- He is sometimes dismissive of others’ approaches
- His apocalyptic framing may alienate potential allies
Supporters counter:
- He was correct about many things before others (importance of AI safety, difficulty of alignment)
- He has demonstrated technical competence through his decision theory work
- His pessimism may be warranted given the stakes
- His direct communication style is valuable even if uncomfortable
Related Pages
- MIRI (organization)
- Paul Christiano (researcher)