MIRI
Overview
Section titled “Overview”The Machine Intelligence Research Institute (MIRI) is the oldest AI alignment organization, founded in 2000 by Eliezer Yudkowsky as the Singularity Institute for Artificial Intelligence. For over two decades, MIRI has been the intellectual progenitor of AI alignment research, originating foundational concepts like instrumental convergence, the orthogonality thesis, and formal approaches to value alignment.
MIRI’s trajectory reflects a profound strategic pessimism: despite being the field’s founding organization, it has largely abandoned technical research in favor of governance advocacy, concluding that current alignment approaches are insufficient for superintelligence. This pivot from a $5M/year research institute to governance advocacy represents one of the most significant strategic shifts in AI safety, with leadership now estimating extremely high P(doom) (>90%) and recommending against technical alignment careers.
The organization’s evolution from optimistic technical research (2000-2020) to empirical alignment attempts (2020-2022) to governance pessimism (2023+) provides critical insights into how beliefs about timelines and tractability shape organizational strategy in existential risk reduction.
Risk Assessment
Section titled “Risk Assessment”| Dimension | MIRI’s Assessment | Evidence | Implications |
|---|---|---|---|
| Timeline to AGI | 2027-2030 median | GPT scaling trends, capabilities jumps | Extremely urgent coordination needed |
| Technical Tractability | Very low | Agent foundations failed, prosaic approaches insufficient | Pivot away from technical research |
| Default P(doom) | >90% | No known path to alignment | Focus on governance/coordination |
| Governance Necessity | Critical | Technical approaches won’t work in time | International coordination required |
Historical Evolution and Strategic Pivots
Section titled “Historical Evolution and Strategic Pivots”Founding Era: Conceptual Foundations (2000-2012)
Section titled “Founding Era: Conceptual Foundations (2000-2012)”| Period | Focus | Key Outputs | Strategic Rationale |
|---|---|---|---|
| 2000-2006 | Problem Definition | Friendly AI concepts, early writings | Establish AI risk as legitimate concern |
| 2006-2012 | Community Building | LessWrong Sequences↗, rationalist movement | Create talent pipeline and intellectual infrastructure |
Foundational Contributions:
- Orthogonality Thesis: Intelligence and goals are logically independent
- Instrumental Convergence: Diverse goal systems imply similar dangerous subgoals (self-preservation, resource acquisition)
- Complexity of Value: Human preferences are fragile and difficult to specify
- Intelligence Explosion: Recursive self-improvement could cause rapid capability jumps
The LessWrong↗ blog sequences (2006-2009) created the intellectual foundation for the modern AI safety field, directly influencing researchers now at Anthropic, OpenAI, and other major organizations.
Agent Foundations Research Era (2012-2020)
Section titled “Agent Foundations Research Era (2012-2020)”MIRI’s most technically productive period, focusing on mathematical foundations of agency.
| Research Area | Key Results | Significance | Limitations |
|---|---|---|---|
| Logical Uncertainty | Logical Inductors↗ (2016) | Major theoretical breakthrough | Unclear practical applications |
| Embedded Agency | Embedded Agency sequence↗ | Conceptual progress on self-reference | No concrete solutions |
| Corrigibility | Difficulty results | Showed problem is harder than expected | Limited constructive results |
| Decision Theory | FDT, UDT advances | Theoretical foundations | Disconnected from ML practice |
Strategy: Focus on “deconfusion” rather than building systems, working on fundamental questions about agency that mainstream AI ignored.
Assessment: After 8 years, limited practical outputs led to strategic pivot.
Empirical Alignment Attempt (2020-2022)
Section titled “Empirical Alignment Attempt (2020-2022)”Following GPT-3’s demonstration of scaling laws, MIRI attempted to pivot to empirical work.
Timeline Trigger: GPT-3 (June 2020) dramatically shortened AGI timelines from 2050+ to 2030s.
| Initiative | Approach | Outcome | Lessons |
|---|---|---|---|
| Language Model Alignment | Interpretability on transformers | Limited progress | Current tools insufficient |
| ELK Collaboration | Partnership with ARC | Theoretical insights only | No practical breakthroughs |
| ”Closed by Default” | Reduced public research sharing | Controversial within community | Information hazard concerns |
Internal Assessment (2022): Empirical approaches also insufficient for superintelligence-level alignment.
Current Governance Pivot (2023-Present)
Section titled “Current Governance Pivot (2023-Present)”Strategic Conclusion: Technical alignment unlikely to succeed in time; governance is only remaining lever.
| Recommendation | Rationale | Target Audience | Implementation |
|---|---|---|---|
| Avoid Technical Careers | Low probability of success | Early career researchers | Public statements↗ |
| Support Governance | Higher leverage interventions | Policy community | Advocacy for compute governance |
| International Coordination | Only path to avoid race dynamics | Government stakeholders | Support for AI treaties/moratoria |
Key Intellectual Contributions
Section titled “Key Intellectual Contributions”Core Theoretical Framework
Section titled “Core Theoretical Framework”The Alignment Problem Formalization:
- Problem: How to build AI systems that pursue intended objectives
- Difficulty: Complexity and fragility of human values
- Urgency: Instrumental convergence makes misaligned superintelligence extremely dangerous
Instrumental Convergence - MIRI’s most influential concept:
| Subgoal | Why Universal | Risk Implication |
|---|---|---|
| Self-Preservation | Can’t achieve goals if destroyed | Resistance to shutdown |
| Resource Acquisition | More resources enable goal achievement | Competition with humans |
| Cognitive Enhancement | Better reasoning improves goal achievement | Rapid capability expansion |
| Goal Preservation | Modified goals won’t achieve original goals | Resistance to correction |
This framework underpins modern power-seeking and deceptive alignment research.
Technical Research Legacy
Section titled “Technical Research Legacy”Logical Induction (2016): Most significant technical result
- Problem: How should bounded agents reason about mathematical uncertainty?
- Solution: Market-based logical inductors that satisfy reasonable coherence properties
- Impact: Influenced work on self-improvement and transparency
- Citation: Garrabrant et al., “Logical Induction”↗
Embedded Agency (2018): Conceptual framework for agency in complex environments
- Problem: How do agents model worlds that contain themselves?
- Insights: Cartesian dualism breaks down, need new frameworks
- Applications: Relevant to mesa-optimization and interpretability
Cultural and Community Impact
Section titled “Cultural and Community Impact”Rationalist Community: MIRI created the intellectual culture that produced much of today’s AI safety ecosystem:
- LessWrong: Platform for rigorous thinking about AI risk
- Effective Altruism: Influenced by MIRI’s longtermist perspective
- Talent Pipeline: Many current researchers discovered the field through MIRI materials
Current Strategic Assessment
Section titled “Current Strategic Assessment”MIRI Leadership Positions (2024)
Section titled “MIRI Leadership Positions (2024)”| Leader | P(doom) Estimate | Timeline View | Strategic Recommendation |
|---|---|---|---|
| Eliezer Yudkowsky | >90% | AGI by 2027+ | International moratorium↗ |
| Nate Soares | Very high | 2027-2030 median | Career pivots away from technical work |
Core Belief: Current prosaic alignment approaches (RLHF, Constitutional AI) are fundamentally insufficient for superintelligence.
The “Sharp Left Turn” Scenario
Section titled “The “Sharp Left Turn” Scenario”MIRI’s key concern about discontinuous capability jumps:
| Phase | Capability Level | Alignment Status | Risk Level |
|---|---|---|---|
| Training | Human-level or below | Appears aligned | Manageable |
| Deployment | Rapidly superhuman | Alignment breaks down | Catastrophic |
This motivates MIRI’s skepticism of iterative alignment approaches used by Anthropic and others.
Major Debates and Criticisms
Section titled “Major Debates and Criticisms”Technical Tractability Debate
Section titled “Technical Tractability Debate”Methodological Criticisms
Section titled “Methodological Criticisms”“Too Theoretical” Criticism:
- Evidence: Limited practical applications after 15+ years of agent foundations research
- MIRI Response: Working on neglected foundational problems that others ignore
- Current Status: MIRI itself pivoted away, partially validating criticism
Empirical Track Record:
| Prediction/Bet | Outcome | Assessment |
|---|---|---|
| AGI via different paradigms | Deep learning scaling succeeded | Incorrect pathway prediction |
| Agent foundations utility | MIRI pivoted away | Limited practical value |
| Timeline predictions | Consistently pessimistic | Some vindication as timelines shortened |
Community Impact Debates
Section titled “Community Impact Debates”“Excessive Pessimism” Concern:
- Risk: Demotivating effect on researchers, self-fulfilling prophecy
- Evidence: High-profile departures from technical alignment
- MIRI Defense: Better too cautious with existential risks
“Insularity” Criticism:
- Issue: Limited mainstream ML engagement, “closed by default” policy
- Concern: Missing insights, groupthink risk
- Trade-off: Focus vs. breadth in research approach
Influence on Field Development
Section titled “Influence on Field Development”Conceptual Legacy
Section titled “Conceptual Legacy”MIRI-originated concepts now standard in AI safety:
| Concept | Current Usage | Organizations Using |
|---|---|---|
| Instrumental Convergence | Power-seeking research | Anthropic, DeepMind, academic labs |
| Inner/Outer Alignment | Mesa-optimization framework | Widespread in technical safety |
| Corrigibility | Shutdown problem research | Multiple safety organizations |
| Value Learning | Preference learning research | Anthropic, OpenAI safety teams |
Talent Pipeline Impact
Section titled “Talent Pipeline Impact”Direct Influence: Many prominent researchers entered field through MIRI materials:
- Paul Christiano (ARC founder)
- Evan Hubinger (Anthropic)
- Many others at major safety organizations
Indirect Influence: Rationalist/EA communities influenced by MIRI provided significant talent to:
Current State and Future Trajectory
Section titled “Current State and Future Trajectory”Organizational Status (2024)
Section titled “Organizational Status (2024)”| Metric | Current State | Peak (Historical) | Trajectory |
|---|---|---|---|
| Annual Budget | ~$3-5M | ~$5M+ (2020-2022) | Declining |
| Staff Size | ~15-20 | ~25-30 | Reduced |
| Research Output | Minimal | Moderate (2014-2020) | Near zero |
| Public Advocacy | High (governance) | Variable | Increasing |
Strategic Recommendations to Field
Section titled “Strategic Recommendations to Field”MIRI’s Current Advice:
- Career guidance: Consider governance over technical alignment
- Research priorities: Support compute governance and coordination
- Policy advocacy: Push for international AI development slowdowns
- Risk communication: Emphasize default doom scenarios
Open Strategic Questions
Section titled “Open Strategic Questions”❓Key Questions
Comparisons with Other Organizations
Section titled “Comparisons with Other Organizations”Strategic Positioning
Section titled “Strategic Positioning”| Organization | Technical Optimism | Timeline View | Primary Strategy | MIRI Relationship |
|---|---|---|---|---|
| MIRI | Very low | Very short | Governance advocacy | N/A |
| Anthropic | Moderate | Medium | Iterative alignment | Intellectual influence, strategic disagreement |
| ARC | Cautious optimism | Medium-short | Evaluations + theory | Founded by MIRI-adjacent researcher |
| OpenAI | Optimistic | Short-medium | Empirical safety research | Historical connections, now divergent |
Intellectual Divergences
Section titled “Intellectual Divergences”vs. Anthropic Constitutional AI Approach:
- Anthropic bet: Iterative alignment scaling with capabilities
- MIRI position: Won’t work for superintelligence, sharp left turn will break approaches
- Crux: Continuity vs. discontinuity of alignment difficulty
vs. ARC Evaluations:
- ARC approach: Measure dangerous capabilities to enable governance
- MIRI view: Supportive but insufficient without technical solutions
- Agreement: Both emphasize governance importance
Future Scenarios and MIRI’s Role
Section titled “Future Scenarios and MIRI’s Role”If MIRI Is Correct
Section titled “If MIRI Is Correct”Implications:
- Technical alignment approaches will fail to scale
- Governance/coordination becomes critical path
- Field should pivot toward policy and international coordination
- Individual career advice: avoid technical alignment research
If MIRI Is Wrong
Section titled “If MIRI Is Wrong”Implications:
- Technical progress possible through other approaches
- Excessive pessimism may have been demotivating
- Field benefits from diverse approaches including continued technical work
- MIRI’s pivot may have been premature
MIRI’s Continuing Influence
Section titled “MIRI’s Continuing Influence”Even with reduced research activity, MIRI remains influential through:
- Conceptual foundations still used across field
- Alumni in leadership positions at other organizations
- Public intellectuals (Yudkowsky) with significant platforms
- Historical role as field founder provides continued credibility
Assessment and Implications
Section titled “Assessment and Implications”MIRI’s Unique Position
Section titled “MIRI’s Unique Position”MIRI represents a critical test case for AI alignment pessimism. As the field’s founding organization with the longest track record, its strategic pivot carries significant weight. The organization’s evolution from technical optimism to governance pessimism provides valuable information about:
- Tractability of foundational alignment research
- Relationship between timeline beliefs and strategy
- Role of institutional learning in research prioritization
Lessons for Field Strategy
Section titled “Lessons for Field Strategy”- Diversification value: MIRI’s pivot suggests risk of over-concentration in particular approaches
- Timeline sensitivity: Strategic choices highly dependent on AGI timeline beliefs
- Tractability assessment: Need realistic evaluation of research program success probability
- Governance preparation: Important to develop policy tools regardless of technical progress
The MIRI case study illustrates both the importance of intellectual honesty about research progress and the challenges of strategic decision-making under extreme uncertainty about both capabilities trajectories and alignment difficulty.
Sources and Resources
Section titled “Sources and Resources”MIRI Publications and Research
Section titled “MIRI Publications and Research”| Category | Key Resources | Links |
|---|---|---|
| Technical Papers | Logical Induction, Embedded Agency | MIRI Papers↗ |
| Strategic Updates | Death with Dignity, Recent assessments | MIRI Blog↗ |
| Historical Context | LessWrong Sequences, Early writings | LessWrong↗ |
External Analysis
Section titled “External Analysis”| Source | Type | Focus |
|---|---|---|
| AI Alignment Forum↗ | Community discussion | Technical and strategic debates |
| GiveWell MIRI Review↗ | Philanthropic assessment | Cost-effectiveness analysis |
| Academic surveys | Peer evaluation | Technical contribution assessment |
Related Organizations
Section titled “Related Organizations”| Organization | Relationship | Website |
|---|---|---|
| Center for AI Safety↗ | Aligned mission, different approach | CAIS↗ |
| Future of Humanity Institute↗ | Historical collaboration (now closed) | Oxford archive |
| Anthropic | Intellectual influence, strategic divergence | Anthropic↗ |
What links here
- Research Agendascrux
- The MIRI Erahistorical
- Instrumental Convergence Frameworkmodelresearch
- ARCorganization
- Redwood Researchorganization
- Eliezer Yudkowskyresearcher
- Instrumental Convergencerisk
- Mesa-Optimizationrisk
- Sharp Left Turnrisk