
Intervention Portfolio

This page provides a strategic view of the AI safety intervention landscape, analyzing how different interventions address different risk categories and improve key parameters in the AI Transition Model. Rather than examining interventions individually, this portfolio view helps identify coverage gaps, complementarities, and allocation priorities.

The intervention landscape can be divided into several categories: technical approaches (alignment, interpretability, control), governance mechanisms (legislation, compute governance, international coordination), field building (talent, funding, community), and resilience measures (epistemic security, economic adaptation). Each category has different tractability profiles, timelines, and risk coverage—understanding these tradeoffs is essential for strategic resource allocation.

An effective safety portfolio requires both breadth (covering diverse failure modes) and depth (sufficient investment in each area to achieve impact). The current portfolio shows significant concentration in certain areas (RLHF, capability evaluations) while other areas remain relatively neglected (epistemic resilience, international coordination).



This matrix shows how strongly each major intervention addresses each risk category. Ratings are based on current evidence and expert assessments.

| Intervention | Accident Risks | Misuse Risks | Structural Risks | Epistemic Risks | Primary Mechanism |
| --- | --- | --- | --- | --- | --- |
| Interpretability | High | Low | Low | — | Detect deception and misalignment in model internals |
| AI Control | High | Medium | — | — | External constraints regardless of AI intentions |
| Evaluations | High | Medium | Low | — | Pre-deployment testing for dangerous capabilities |
| RLHF/Constitutional AI | Medium | Medium | — | — | Train models to follow human preferences |
| Scalable Oversight | Medium | Low | — | — | Human supervision of superhuman systems |
| Compute Governance | Low | High | Medium | — | Hardware chokepoints limit access |
| Export Controls | Low | High | Medium | — | Restrict adversary access to training compute |
| Responsible Scaling | Medium | Medium | Low | — | Capability thresholds trigger safety requirements |
| International Coordination | Low | Medium | High | — | Reduce racing dynamics through agreements |
| AI Safety Institutes | Medium | Medium | Medium | — | Government capacity for evaluation and oversight |
| Field Building | Medium | Low | Medium | Low | Grow talent pipeline and research capacity |
| Epistemic Security | — | Low | Low | High | Protect collective truth-finding capacity |
| Content Authentication | — | Medium | — | High | Verify authentic content in synthetic era |

Legend: High = primary focus, addresses directly; Medium = secondary impact; Low = indirect or limited; — = minimal relevance
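
The matrix can also be treated as data. Below is a minimal sketch, assuming a simple adequacy rule ("every risk category needs at least one High-rated intervention") that is an illustrative assumption rather than part of the framework above; it shows how coverage gaps could be flagged programmatically from a subset of the matrix.

```python
# Minimal sketch (not from the source): encode a subset of the matrix above and
# flag risk categories that lack any intervention rated "High". The adequacy
# rule ("at least one High per category") is an assumption for illustration.

RISKS = ["accident", "misuse", "structural", "epistemic"]

COVERAGE = {
    # intervention: {risk: rating}; "-" means minimal relevance
    "Interpretability":   {"accident": "High", "misuse": "Low",    "structural": "Low",    "epistemic": "-"},
    "AI Control":         {"accident": "High", "misuse": "Medium", "structural": "-",      "epistemic": "-"},
    "Compute Governance": {"accident": "Low",  "misuse": "High",   "structural": "Medium", "epistemic": "-"},
    "Epistemic Security": {"accident": "-",    "misuse": "Low",    "structural": "Low",    "epistemic": "High"},
}

def thin_categories(coverage, required="High"):
    """Return risk categories with no intervention rated at the required level."""
    return [
        risk for risk in RISKS
        if not any(ratings.get(risk) == required for ratings in coverage.values())
    ]

print(thin_categories(COVERAGE))  # ['structural'] for this subset
```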


This framework evaluates interventions across the standard Importance-Tractability-Neglectedness (ITN) dimensions, with additional consideration for timeline fit and portfolio complementarity.

| Intervention | Tractability | Impact Potential | Neglectedness | Timeline Fit | Overall Priority |
| --- | --- | --- | --- | --- | --- |
| Interpretability | Medium | High | Low | Long | High |
| AI Control | High | Medium-High | Medium | Near | Very High |
| Evaluations | High | Medium | Low | Near | High |
| Compute Governance | High | High | Low | Near | Very High |
| International Coordination | Low | Very High | High | Long | High |
| Field Building | High | Medium | Medium | Ongoing | Medium-High |
| Epistemic Resilience | Medium | Medium | High | Near-Long | Medium-High |
| Scalable Oversight | Medium-Low | High | Medium | Long | Medium |
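
The qualitative ratings above can be turned into a rough score. The sketch below assumes a multiplicative ITN-style aggregation and an arbitrary numeric mapping of the rating labels; both are illustrative choices, not the method used to produce the table.

```python
# Illustrative ITN-style scoring. The numeric scale and the multiplicative
# aggregation are assumptions for this sketch; the table reflects qualitative judgment.

SCALE = {"Low": 1, "Medium-Low": 1.5, "Medium": 2, "Medium-High": 2.5,
         "High": 3, "Very High": 4}

def itn_score(tractability, impact, neglectedness):
    """Multiplicative aggregation: one weak factor drags the whole score down."""
    return SCALE[tractability] * SCALE[impact] * SCALE[neglectedness]

# (tractability, impact potential, neglectedness) transcribed from the table above
interventions = {
    "AI Control":                 ("High", "Medium-High", "Medium"),
    "Compute Governance":         ("High", "High", "Low"),
    "International Coordination": ("Low", "Very High", "High"),
    "Epistemic Resilience":       ("Medium", "Medium", "High"),
}

for name, factors in sorted(interventions.items(),
                            key=lambda kv: itn_score(*kv[1]), reverse=True):
    print(f"{name:28s} {itn_score(*factors):5.1f}")
```

Note that a purely multiplicative score ranks Compute Governance lower than the table does; the table's overall priorities also weigh timeline fit and portfolio complementarity, which this sketch omits.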

Very High Priority:

  • AI Control scores highly because it provides near-term safety benefits (70-85% tractability for human-level systems) regardless of whether alignment succeeds. It represents a practical bridge during the transition period.
  • Compute Governance is one of few levers creating physical constraints on AI development. Hardware chokepoints exist, some measures are already implemented, and impact potential is substantial.

High Priority:

  • Interpretability is potentially essential if alignment proves difficult (only reliable way to detect sophisticated deception), though scaling challenges create uncertainty.
  • Evaluations provide measurable near-term impact and are already standard practice at major labs, though effectiveness against deceptive AI remains uncertain.
  • International Coordination has very high impact potential for addressing structural risks like racing dynamics, but low tractability given current geopolitical tensions.

Medium-High Priority:

  • Field Building and Epistemic Resilience are relatively neglected meta-level interventions that multiply the effectiveness of direct technical and governance work.

Each intervention affects different parameters in the AI Transition Model. This mapping helps identify which interventions address which aspects of the transition.

| Intervention | Primary Parameter | Secondary Parameters | Mechanism |
| --- | --- | --- | --- |
| Interpretability | Interpretability Coverage | Alignment Robustness, Safety-Capability Gap | Direct visibility into model internals |
| AI Control | Human Oversight Quality | Alignment Robustness | External constraints maintain oversight |
| Evaluations | Safety-Capability Gap | Safety Culture Strength, Human Oversight Quality | Pre-deployment testing identifies risks |
| Scalable Oversight | Human Oversight Quality | Alignment Robustness | Human supervision despite capability gaps |
| Compute Governance | Racing Intensity | Coordination Capacity, AI Control Concentration | Hardware chokepoints slow development |
| Responsible Scaling | Safety Culture Strength | Safety-Capability Gap | Capability thresholds trigger requirements |
| International Coordination | Coordination Capacity | Racing Intensity | Agreements reduce competitive pressure |
| Legislation | Regulatory Capacity | Safety Culture Strength | Binding requirements with enforcement |
| Field Building | Safety Research | Alignment Progress | Grow talent pipeline and capacity |
| Epistemic Security | Epistemic Health | Societal Trust, Reality Coherence | Protect collective knowledge |
| AI Safety Institutes | Institutional Quality | Regulatory Capacity | Government capacity for oversight |
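
For modeling purposes, the same mapping can be stored as intervention → parameters and then inverted to ask which AI Transition Model parameters a given portfolio touches. The sketch below uses a few rows from the table; the dictionary layout is an illustrative choice, not an API defined by the model.

```python
# Sketch: invert the intervention -> parameter mapping to see which
# AI Transition Model parameters a chosen portfolio affects.
from collections import defaultdict

PARAMETER_MAP = {
    # intervention: (primary parameter, [secondary parameters]) -- rows from the table above
    "Interpretability":   ("Interpretability Coverage", ["Alignment Robustness", "Safety-Capability Gap"]),
    "AI Control":         ("Human Oversight Quality",   ["Alignment Robustness"]),
    "Compute Governance": ("Racing Intensity",          ["Coordination Capacity", "AI Control Concentration"]),
    "Epistemic Security": ("Epistemic Health",          ["Societal Trust", "Reality Coherence"]),
}

def parameters_touched(portfolio):
    """Map each model parameter to the portfolio interventions that affect it."""
    touched = defaultdict(list)
    for intervention in portfolio:
        primary, secondary = PARAMETER_MAP[intervention]
        for param in [primary, *secondary]:
            touched[param].append(intervention)
    return dict(touched)

print(parameters_touched(["AI Control", "Compute Governance"]))
# Parameters absent from the result (e.g. Epistemic Health here) are untouched by the portfolio.
```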

Analysis of the current intervention portfolio reveals several areas where coverage is thin:

| Gap Area | Current Status | Risk Exposure | Recommended Action |
| --- | --- | --- | --- |
| Epistemic Risks | Few interventions address these directly | Epistemic Collapse, Reality Fragmentation | Increase investment in content authentication and epistemic infrastructure |
| Long-term Structural Risks | International coordination has low tractability | Lock-in, Concentration of Power | Develop alternative coordination mechanisms; invest in governance research |
| Post-Incident Recovery | Minimal current work | All risk categories | Develop recovery protocols and resilience measures |
| Misuse by State Actors | Export controls are the primary lever | AI Authoritarian Tools, AI Mass Surveillance | Research additional governance mechanisms |

Certain interventions work better together than in isolation:

Technical + Governance:

  • Evaluations and interpretability give governance mechanisms something concrete to enforce: capability thresholds in responsible scaling policies and testing capacity at AI safety institutes

Near-term + Long-term:

  • AI Control and evaluations buy time during the transition, while interpretability and international coordination address root causes over longer horizons

Prevention + Resilience:

  • Technical safety research aims to prevent failures
  • Epistemic Security and economic resilience limit damage if prevention fails
  • Both are needed for robust defense-in-depth

| Area | Current Allocation | Recommended | Rationale |
| --- | --- | --- | --- |
| RLHF/Training | Very High | High | Deployed at scale but retains 70% misalignment on agentic tasks |
| Interpretability | High | High | Rapid progress; potential for fundamental breakthroughs |
| Evaluations | High | Very High | Critical for identifying dangerous capabilities pre-deployment |
| AI Control | Medium | High | Near-term tractable; provides safety regardless of alignment |
| Compute Governance | Medium | High | One of few physical levers; already showing policy impact |
| International Coordination | Low | Medium | Low tractability but very high stakes |
| Epistemic Resilience | Very Low | Medium | Highly neglected; addresses underserved risk category |
| Field Building | Medium | Medium | Maintain current investment; returns are well-established |
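
These shifts can be sanity-checked mechanically by encoding current and recommended levels on an ordinal scale and ranking the under-allocations. The five-point scale below is an assumption for illustration; the table itself is qualitative.

```python
# Sketch: rank the gaps between current and recommended allocation.
# The ordinal scale is an assumption; ratings are transcribed from the table above.

LEVELS = {"Very Low": 0, "Low": 1, "Medium": 2, "High": 3, "Very High": 4}

ALLOCATION = {
    # area: (current, recommended)
    "RLHF/Training":              ("Very High", "High"),
    "Evaluations":                ("High", "Very High"),
    "AI Control":                 ("Medium", "High"),
    "Compute Governance":         ("Medium", "High"),
    "International Coordination": ("Low", "Medium"),
    "Epistemic Resilience":       ("Very Low", "Medium"),
}

def allocation_gaps(allocation):
    """Positive gap = under-allocated relative to the recommendation."""
    gaps = {area: LEVELS[rec] - LEVELS[cur] for area, (cur, rec) in allocation.items()}
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)

for area, gap in allocation_gaps(ALLOCATION):
    print(f"{area:28s} {gap:+d}")   # Epistemic Resilience shows the largest shortfall (+2)
```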

The current portfolio shows:

  1. Frontier lab concentration: Most technical safety work happens at Anthropic, OpenAI, and DeepMind. Independent safety organizations (MIRI, ARC, Redwood) have significant funding gaps.

  2. Technical over governance: Technical approaches receive substantially more investment than governance research, despite governance mechanisms being potentially high-leverage.

  3. Prevention over resilience: Nearly all resources go to preventing AI harm; very little goes to limiting damage or enabling recovery if prevention fails.

  4. Near-term bias: Tractable near-term interventions receive more attention than long-term work on international coordination and fundamental alignment.


Different beliefs about AI risk lead to different portfolio recommendations:

| Worldview | Prioritize | Deprioritize |
| --- | --- | --- |
| Alignment is very hard | Interpretability, Control, International coordination | RLHF, Voluntary commitments |
| Misuse is the main risk | Compute governance, Content authentication, Legislation | Interpretability, Agent foundations |
| Short timelines | AI Control, Evaluations, Responsible scaling | Long-term governance research |
| Racing dynamics dominate | International coordination, Compute governance | Unilateral safety research |
| Epistemic collapse is likely | Epistemic security, Content authentication | Technical alignment |

A robust portfolio should:

  1. Cover multiple failure modes: Don’t assume one risk category dominates
  2. Include both prevention and resilience: Defense in depth against prediction failure
  3. Balance near-term and long-term: Near-term work buys time; long-term work addresses root causes
  4. Maintain independent capacity: Don’t rely solely on frontier labs for safety research
  5. Support multiple worldviews: Invest in interventions valuable across different scenarios
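
Criterion 5 can be made concrete with a worst-case check: score a candidate allocation under each worldview and compare minima rather than averages. The worldview value scores and example portfolios below are placeholders to show the structure, not estimates from this page.

```python
# Sketch: worst-case (max-min) robustness check across worldviews.
# All numbers are placeholders; only the structure is the point.

# VALUE[worldview][intervention]: usefulness of the intervention if that worldview is right
VALUE = {
    "alignment_is_very_hard": {"interpretability": 3, "ai_control": 3, "compute_gov": 1, "epistemic_sec": 1},
    "misuse_is_main_risk":    {"interpretability": 1, "ai_control": 2, "compute_gov": 3, "epistemic_sec": 1},
    "epistemic_collapse":     {"interpretability": 0, "ai_control": 1, "compute_gov": 1, "epistemic_sec": 3},
}

def worst_case_value(weights):
    """Minimum total value across worldviews for a given allocation of effort."""
    return min(
        sum(share * values.get(name, 0) for name, share in weights.items())
        for values in VALUE.values()
    )

concentrated = {"interpretability": 1.0}
diversified  = {"interpretability": 0.4, "ai_control": 0.3, "compute_gov": 0.2, "epistemic_sec": 0.1}

print(f"{worst_case_value(concentrated):.2f}")  # 0.00 -- delivers nothing in one scenario
print(f"{worst_case_value(diversified):.2f}")   # 0.80 -- retains value in every scenario
```

The diversified allocation scores lower in its best-case worldview but never collapses to zero, which is the sense in which a portfolio valuable across scenarios is more robust.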


The intervention portfolio collectively affects the AI Transition Model across all major factors:

| Factor | Key Interventions | Coverage |
| --- | --- | --- |
| Misalignment Potential | Alignment research, interpretability, control | Technical safety |
| Civilizational Competence | Governance, institutions, epistemic tools | Coordination capacity |
| Transition Turbulence | Compute governance, international coordination | Racing dynamics |
| Misuse Potential | Resilience, authentication, detection | Harm reduction |

Portfolio balance matters: over-investment in any single intervention type creates vulnerability if that approach fails.