Surprise Threat Exposure

Surprise Threat Exposure captures the risk from novel attack vectors that have not yet been anticipated—cases where AI enables entirely new categories of harm falling outside existing threat models. By definition, we cannot enumerate these threats precisely, making this parameter inherently difficult to assess but critically important.

The Warning Signs Model identifies 32 critical indicators and finds that most are 18-48 months from crossing their thresholds, with detection probabilities of 45-90%. However, systematic tracking exists for fewer than 30% of these warning signs, and pre-committed response protocols exist for fewer than 15%, which leaves dangerous gaps in both monitoring and response. General resilience building emerges as the primary response strategy.
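To make those percentages concrete, the following is a rough sketch that converts them into indicator counts. It assumes the percentages apply to all 32 indicators and that "fewer than" is a strict bound; the helper name `max_count_below` is hypothetical and not part of the Warning Signs Model.

```python
TOTAL_INDICATORS = 32      # from the text
TRACKED_PCT_BOUND = 30     # "fewer than 30%" have systematic tracking
PROTOCOL_PCT_BOUND = 15    # "fewer than 15%" have pre-committed protocols


def max_count_below(pct_bound: int, total: int) -> int:
    """Largest whole number of indicators strictly below pct_bound% of total.

    Integer arithmetic is used to avoid floating-point rounding.
    """
    # Ceiling of (pct_bound * total / 100), then step down by one
    # so the result is strictly below the bound.
    ceiling = -(-(pct_bound * total) // 100)
    return ceiling - 1


tracked_max = max_count_below(TRACKED_PCT_BOUND, TOTAL_INDICATORS)
protocol_max = max_count_below(PROTOCOL_PCT_BOUND, TOTAL_INDICATORS)

# Lower bounds on the gaps (assumption: every indicator is either covered or not).
untracked_min = TOTAL_INDICATORS - tracked_max
no_protocol_min = TOTAL_INDICATORS - protocol_max

print(f"At most {tracked_max} of {TOTAL_INDICATORS} indicators are tracked; "
      f"at least {untracked_min} are not.")
print(f"At most {protocol_max} of {TOTAL_INDICATORS} indicators have a response protocol; "
      f"at least {no_protocol_min} do not.")
```

Under these assumptions, at most 9 of the 32 indicators are systematically tracked and at most 4 have a pre-committed response protocol.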

Metric             Score  Notes
Changeability      20     Very difficult: cannot anticipate and prevent unknown threats
X-risk Impact      70     High: novel threats could bypass all existing defenses
Trajectory Impact  55     Moderate-high: could fundamentally alter AI development trajectory
Uncertainty        85     Very high: the fundamental nature makes assessment extremely difficult

Key Debates:

  • How should we reason about risks we cannot specify?
  • Is general resilience the right approach, or should we try to anticipate specific novel threats?
  • Can we detect novel AI-enabled threats early enough to respond?

Ratings

Metric             Score   Interpretation
Changeability      20/100  Hard to prevent or redirect
X-risk Impact      70/100  Substantial extinction risk
Trajectory Impact  55/100  Significant effect on long-term welfare
Uncertainty        85/100  High uncertainty; estimates speculative