Intervention Portfolio
Overview
This page provides a strategic view of the AI safety intervention landscape, analyzing how different interventions address different risk categories and improve key parameters in the AI Transition Model. Rather than examining interventions individually, this portfolio view helps identify coverage gaps, complementarities, and allocation priorities.
The intervention landscape can be divided into several categories: technical approaches (alignment, interpretability, control), governance mechanisms (legislation, compute governance, international coordination), field building (talent, funding, community), and resilience measures (epistemic security, economic adaptation). Each category has different tractability profiles, timelines, and risk coverage—understanding these tradeoffs is essential for strategic resource allocation.
An effective safety portfolio requires both breadth (covering diverse failure modes) and depth (sufficient investment in each area to achieve impact). The current portfolio shows significant concentration in certain areas (RLHF, capability evaluations) while other areas remain relatively neglected (epistemic resilience, international coordination).
Intervention Categories and Risk Coverage
Intervention by Risk Matrix
This matrix shows how strongly each major intervention addresses each risk category. Ratings are based on current evidence and expert assessments. A sketch for querying this coverage pattern programmatically follows the table.
| Intervention | Accident Risks | Misuse Risks | Structural Risks | Epistemic Risks | Primary Mechanism |
|---|---|---|---|---|---|
| Interpretability | High | Low | Low | — | Detect deception and misalignment in model internals |
| AI Control | High | Medium | — | — | External constraints regardless of AI intentions |
| Evaluations | High | Medium | Low | — | Pre-deployment testing for dangerous capabilities |
| RLHF/Constitutional AI | Medium | Medium | — | — | Train models to follow human preferences |
| Scalable Oversight | Medium | Low | — | — | Human supervision of superhuman systems |
| Compute Governance | Low | High | Medium | — | Hardware chokepoints limit access |
| Export Controls | Low | High | Medium | — | Restrict adversary access to training compute |
| Responsible Scaling | Medium | Medium | Low | — | Capability thresholds trigger safety requirements |
| International Coordination | Low | Medium | High | — | Reduce racing dynamics through agreements |
| AI Safety Institutes | Medium | Medium | Medium | — | Government capacity for evaluation and oversight |
| Field Building | Medium | Low | Medium | Low | Grow talent pipeline and research capacity |
| Epistemic Security | — | Low | Low | High | Protect collective truth-finding capacity |
| Content Authentication | — | Medium | — | High | Verify authentic content in synthetic era |
Legend: High = primary focus, addresses directly; Medium = secondary impact; Low = indirect or limited; — = minimal relevance
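One way to make the coverage pattern easier to interrogate is to transcribe the matrix into a small data structure and query it. The sketch below is illustrative only: the dictionary and function names are invented for this example, and the ratings are simply copied from the table above ("—" entries are omitted).

```python
from collections import defaultdict

RISKS = ["accident", "misuse", "structural", "epistemic"]

# Ratings transcribed from the intervention-by-risk matrix above.
coverage = {
    "Interpretability":           {"accident": "High",   "misuse": "Low",    "structural": "Low"},
    "AI Control":                 {"accident": "High",   "misuse": "Medium"},
    "Evaluations":                {"accident": "High",   "misuse": "Medium", "structural": "Low"},
    "RLHF/Constitutional AI":     {"accident": "Medium", "misuse": "Medium"},
    "Scalable Oversight":         {"accident": "Medium", "misuse": "Low"},
    "Compute Governance":         {"accident": "Low",    "misuse": "High",   "structural": "Medium"},
    "Export Controls":            {"accident": "Low",    "misuse": "High",   "structural": "Medium"},
    "Responsible Scaling":        {"accident": "Medium", "misuse": "Medium", "structural": "Low"},
    "International Coordination": {"accident": "Low",    "misuse": "Medium", "structural": "High"},
    "AI Safety Institutes":       {"accident": "Medium", "misuse": "Medium", "structural": "Medium"},
    "Field Building":             {"accident": "Medium", "misuse": "Low",    "structural": "Medium", "epistemic": "Low"},
    "Epistemic Security":         {"misuse": "Low",      "structural": "Low", "epistemic": "High"},
    "Content Authentication":     {"misuse": "Medium",   "epistemic": "High"},
}

def strong_coverage(matrix, level="High"):
    """List interventions rated `level` for each risk category."""
    by_risk = defaultdict(list)
    for intervention, ratings in matrix.items():
        for risk, rating in ratings.items():
            if rating == level:
                by_risk[risk].append(intervention)
    return {risk: by_risk.get(risk, []) for risk in RISKS}

for risk, interventions in strong_coverage(coverage).items():
    print(f"{risk:>10}: {interventions or 'NO HIGH-RATED INTERVENTION'}")
```

Running this makes the concentration visible at a glance: accident risks have three High-rated interventions, while structural and epistemic risks each rest on only one or two.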
Prioritization Framework
This framework evaluates interventions across the standard Importance-Tractability-Neglectedness (ITN) dimensions, with additional consideration for timeline fit and portfolio complementarity. A simple scoring sketch follows the table.
| Intervention | Tractability | Impact Potential | Neglectedness | Timeline Fit | Overall Priority |
|---|---|---|---|---|---|
| Interpretability | Medium | High | Low | Long | High |
| AI Control | High | Medium-High | Medium | Near | Very High |
| Evaluations | High | Medium | Low | Near | High |
| Compute Governance | High | High | Low | Near | Very High |
| International Coordination | Low | Very High | High | Long | High |
| Field Building | High | Medium | Medium | Ongoing | Medium-High |
| Epistemic Resilience | Medium | Medium | High | Near-Long | Medium-High |
| Scalable Oversight | Medium-Low | High | Medium | Long | Medium |
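The qualitative ratings above can be combined into a rough composite score, for example by mapping each label onto an ordinal scale and taking a weighted sum. The sketch below is a toy illustration; the numeric scale and the equal weights are assumptions introduced for demonstration, not part of the ITN framework itself.

```python
# Assumed ordinal scale for the qualitative labels used in the table above.
SCALE = {"Low": 1, "Medium-Low": 1.5, "Medium": 2, "Medium-High": 2.5,
         "High": 3, "Very High": 4}

def itn_score(tractability, impact, neglectedness, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three ITN dimensions on a rough ordinal scale."""
    w_t, w_i, w_n = weights
    return (w_t * SCALE[tractability]
            + w_i * SCALE[impact]
            + w_n * SCALE[neglectedness])

# Example row from the table above: AI Control.
print(itn_score("High", "Medium-High", "Medium"))  # 3 + 2.5 + 2 = 7.5
```

Varying the weights is one way to stress-test the "Overall Priority" column: a portfolio that only looks good under one weighting is less robust than one that ranks highly across several.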
Prioritization Rationale
Section titled “Prioritization Rationale”Very High Priority:
- AI Control scores highly because it provides near-term safety benefits (70-85% tractability for human-level systems) regardless of whether alignment succeeds. It represents a practical bridge during the transition period.
- Compute Governance is one of few levers creating physical constraints on AI development. Hardware chokepoints exist, some measures are already implemented, and impact potential is substantial.
High Priority:
- Interpretability is potentially essential if alignment proves difficult (only reliable way to detect sophisticated deception), though scaling challenges create uncertainty.
- Evaluations provide measurable near-term impact and are already standard practice at major labs, though effectiveness against deceptive AI remains uncertain.
- International Coordination has very high impact potential for addressing structural risks like racing dynamics, but low tractability given current geopolitical tensions.
Medium-High Priority:
- Field Building and Epistemic Resilience are relatively neglected meta-level interventions that multiply the effectiveness of direct technical and governance work.
AI Transition Model Integration
Each intervention affects different parameters in the AI Transition Model. This mapping helps identify which interventions address which aspects of the transition. A reverse-lookup sketch follows the tables below.
Technical Approaches
| Intervention | Primary Parameter | Secondary Parameters | Mechanism |
|---|---|---|---|
| Interpretability | Interpretability Coverage | Alignment Robustness, Safety-Capability Gap | Direct visibility into model internals |
| AI Control | Human Oversight Quality | Alignment Robustness | External constraints maintain oversight |
| Evaluations | Safety-Capability Gap | Safety Culture Strength, Human Oversight Quality | Pre-deployment testing identifies risks |
| Scalable Oversight | Human Oversight Quality | Alignment Robustness | Human supervision despite capability gaps |
Governance Approaches
| Intervention | Primary Parameter | Secondary Parameters | Mechanism |
|---|---|---|---|
| Compute Governance | Racing Intensity | Coordination Capacity, AI Control Concentration | Hardware chokepoints slow development |
| Responsible Scaling | Safety Culture Strength | Safety-Capability Gap | Capability thresholds trigger requirements |
| International Coordination | Coordination Capacity | Racing Intensity | Agreements reduce competitive pressure |
| Legislation | Regulatory Capacity | Safety Culture Strength | Binding requirements with enforcement |
Meta-Level Interventions
| Intervention | Primary Parameter | Secondary Parameters | Mechanism |
|---|---|---|---|
| Field Building | Safety Research | Alignment Progress | Grow talent pipeline and capacity |
| Epistemic Security | Epistemic Health | Societal Trust, Reality Coherence | Protect collective knowledge |
| AI Safety Institutes | Institutional Quality | Regulatory Capacity | Government capacity for oversight |
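Treating the mappings above as data also enables the reverse question: given a Transition Model parameter, which interventions affect it? The sketch below is a minimal illustration; only a subset of rows is transcribed, and the structure and function name are invented for this example.

```python
# Subset of the intervention-to-parameter mappings from the tables above.
PARAMETER_MAP = {
    "Interpretability":           {"primary": "Interpretability Coverage",
                                   "secondary": ["Alignment Robustness", "Safety-Capability Gap"]},
    "AI Control":                 {"primary": "Human Oversight Quality",
                                   "secondary": ["Alignment Robustness"]},
    "Evaluations":                {"primary": "Safety-Capability Gap",
                                   "secondary": ["Safety Culture Strength", "Human Oversight Quality"]},
    "Compute Governance":         {"primary": "Racing Intensity",
                                   "secondary": ["Coordination Capacity", "AI Control Concentration"]},
    "International Coordination": {"primary": "Coordination Capacity",
                                   "secondary": ["Racing Intensity"]},
    "Epistemic Security":         {"primary": "Epistemic Health",
                                   "secondary": ["Societal Trust", "Reality Coherence"]},
}

def interventions_for(parameter: str) -> list[str]:
    """Return interventions whose primary or secondary parameters include `parameter`."""
    return [name for name, params in PARAMETER_MAP.items()
            if parameter == params["primary"] or parameter in params["secondary"]]

print(interventions_for("Racing Intensity"))
# ['Compute Governance', 'International Coordination']
```

A parameter with an empty result is a candidate coverage gap, which connects this mapping directly to the gap analysis in the next section.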
Portfolio Gaps and Complementarities
Coverage Gaps
Analysis of the current intervention portfolio reveals several areas where coverage is thin:
| Gap Area | Current Status | Risk Exposure | Recommended Action |
|---|---|---|---|
| Epistemic Risks | Few interventions address them directly | Epistemic Collapse, Reality Fragmentation | Increase investment in content authentication and epistemic infrastructure |
| Long-term Structural Risks | International coordination has low tractability | Lock-in, Concentration of Power | Develop alternative coordination mechanisms; invest in governance research |
| Post-Incident Recovery | Minimal current work | All risk categories | Develop recovery protocols and resilience measures |
| Misuse by State Actors | Export controls are primary lever | AI Authoritarian Tools, AI Mass Surveillance | Research additional governance mechanisms |
Key Complementarities
Certain interventions work better together than in isolation:
Technical + Governance:
- AI Evaluations inform Responsible Scaling Policy (RSP) thresholds
- Interpretability enables verification for International Coordination
- AI Control provides safety margin while governance matures
Near-term + Long-term:
- Compute Governance buys time for Interpretability research
- AI Evaluations identify near-term risks while Scalable Oversight develops
- Field Building and community growth ensure capacity for future technical work
Prevention + Resilience:
- Technical safety research aims to prevent failures
- Epistemic Security and economic resilience limit damage if prevention fails
- Both are needed for robust defense-in-depth
Resource Allocation Assessment
Current vs. Recommended Allocation
| Area | Current Allocation | Recommended | Rationale |
|---|---|---|---|
| RLHF/Training | Very High | High | Deployed at scale, yet models still show roughly 70% misalignment on agentic tasks |
| Interpretability | High | High | Rapid progress; potential for fundamental breakthroughs |
| Evaluations | High | Very High | Critical for identifying dangerous capabilities pre-deployment |
| AI Control | Medium | High | Near-term tractable; provides safety regardless of alignment |
| Compute Governance | Medium | High | One of few physical levers; already showing policy impact |
| International Coordination | Low | Medium | Low tractability but very high stakes |
| Epistemic Resilience | Very Low | Medium | Highly neglected; addresses underserved risk category |
| Field Building | Medium | Medium | Maintain current investment; returns are well-established |
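The allocation gaps can be made explicit by comparing the two columns on an ordinal scale, as in the sketch below. The numeric levels are an assumption introduced for illustration; the allocation labels are copied from the table.

```python
# Assumed ordinal scale for the allocation labels used above.
LEVELS = {"Very Low": 0, "Low": 1, "Medium": 2, "High": 3, "Very High": 4}

# (current, recommended) allocation pairs transcribed from the table.
allocation = {
    "RLHF/Training":              ("Very High", "High"),
    "Interpretability":           ("High", "High"),
    "Evaluations":                ("High", "Very High"),
    "AI Control":                 ("Medium", "High"),
    "Compute Governance":         ("Medium", "High"),
    "International Coordination": ("Low", "Medium"),
    "Epistemic Resilience":       ("Very Low", "Medium"),
    "Field Building":             ("Medium", "Medium"),
}

gaps = sorted(
    ((area, LEVELS[rec] - LEVELS[cur]) for area, (cur, rec) in allocation.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for area, delta in gaps:
    if delta > 0:
        print(f"Under-invested: {area} (+{delta})")
# Epistemic Resilience shows the largest gap (+2) between current and recommended.
```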
Investment Concentration Risks
The current portfolio shows:
- Frontier lab concentration: Most technical safety work happens at Anthropic, OpenAI, and DeepMind. Independent safety organizations (MIRI, ARC, Redwood) have significant funding gaps.
- Technical over governance: Technical approaches receive substantially more investment than governance research, despite governance mechanisms being potentially high-leverage.
- Prevention over resilience: Nearly all resources go to preventing AI harm; very little goes to limiting damage or enabling recovery if prevention fails.
- Near-term bias: Tractable near-term interventions receive more attention than long-term work on international coordination and fundamental alignment.
Strategic Considerations
Worldview Dependencies
Different beliefs about AI risk lead to different portfolio recommendations:
| Worldview | Prioritize | Deprioritize |
|---|---|---|
| Alignment is very hard | Interpretability, Control, International coordination | RLHF, Voluntary commitments |
| Misuse is the main risk | Compute governance, Content authentication, Legislation | Interpretability, Agent foundations |
| Short timelines | AI Control, Evaluations, Responsible scaling | Long-term governance research |
| Racing dynamics dominate | International coordination, Compute governance | Unilateral safety research |
| Epistemic collapse is likely | Epistemic security, Content authentication | Technical alignment |
Portfolio Robustness
A robust portfolio should:
- Cover multiple failure modes: Don’t assume one risk category dominates
- Include both prevention and resilience: Defense in depth against prediction failure
- Balance near-term and long-term: Near-term work buys time; long-term work addresses root causes
- Maintain independent capacity: Don’t rely solely on frontier labs for safety research
- Support multiple worldviews: Invest in interventions valuable across different scenarios (a rough coverage check is sketched below)
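The last point can be made concrete with a check against the worldview table above: does a candidate portfolio include at least one intervention that each worldview would prioritize? The sketch below is illustrative; the priority sets are transcribed from the table and the example portfolio is hypothetical.

```python
# "Prioritize" columns transcribed from the worldview table above.
WORLDVIEW_PRIORITIES = {
    "Alignment is very hard":       {"Interpretability", "AI Control", "International Coordination"},
    "Misuse is the main risk":      {"Compute Governance", "Content Authentication", "Legislation"},
    "Short timelines":              {"AI Control", "Evaluations", "Responsible Scaling"},
    "Racing dynamics dominate":     {"International Coordination", "Compute Governance"},
    "Epistemic collapse is likely": {"Epistemic Security", "Content Authentication"},
}

# Hypothetical portfolio used only to demonstrate the check.
portfolio = {"Interpretability", "AI Control", "Compute Governance", "Evaluations"}

for worldview, priorities in WORLDVIEW_PRIORITIES.items():
    covered = portfolio & priorities
    status = ", ".join(sorted(covered)) if covered else "UNCOVERED"
    print(f"{worldview}: {status}")
# The example portfolio leaves "Epistemic collapse is likely" uncovered,
# which is exactly the kind of gap this checklist is meant to surface.
```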
Related Pages
- Responses Overview - Full list of interventions
- Technical Approaches - Alignment, interpretability, control
- Governance Approaches - Legislation, compute governance, international
- Risks Overview - Risk categories addressed by interventions
- AI Transition Model - Framework for understanding AI transition dynamics
AI Transition Model Context
The intervention portfolio collectively affects the AI Transition Model across all major factors:
| Factor | Key Interventions | Coverage |
|---|---|---|
| Misalignment Potential | Alignment research, interpretability, control | Technical safety |
| Civilizational Competence | Governance, institutions, epistemic tools | Coordination capacity |
| Transition Turbulence | Compute governance, international coordination | Racing dynamics |
| Misuse Potential | Resilience, authentication, detection | Harm reduction |
Portfolio balance matters: over-investment in any single intervention type creates vulnerability if that approach fails.