Accident Risks

Accident risks arise when AI systems fail to do what we intend, even when no one is deliberately misusing them. These are the core concerns of technical AI safety research.

Some of these are failure modes (things that actually go wrong), while others are contributing factors (conditions that enable or increase risk). The diagram below shows how they connect.

Gray nodes are contributing factors; blue nodes are failure modes.


These aren’t failures themselves, but conditions and dynamics that enable or increase accident risks:

Failures in how AI systems learn and pursue objectives:

Risks involving AI systems that strategically deceive humans:

Behavioral patterns that emerge from optimization and pose risks:

These risks become more dangerous as AI systems gain certain capabilities:

| Risk | Key Enabling Capabilities |
| --- | --- |
| Deceptive Alignment | Situational Awareness, Persuasion |
| Scheming | Situational Awareness, Reasoning, Long-Horizon Tasks |
| Treacherous Turn | Reasoning, Long-Horizon Tasks |
| Power-Seeking | Agentic AI, Tool Use, Reasoning |
| Corrigibility Failure | Agentic AI, Tool Use |
| Sharp Left Turn | Self-Improvement |

A model without situational awareness cannot strategically game its training process. A model without agentic capabilities cannot seek power in the real world. Understanding capability prerequisites helps prioritize safety research.
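To illustrate, here is a minimal sketch that encodes the table above as data and checks a model's capability profile against each risk's prerequisites. The names `RISK_PREREQUISITES` and `risks_in_scope` are hypothetical, introduced only for this example; the capability labels are lower-cased from the table.

```python
# A sketch of the prerequisite table as data: a risk is "in scope"
# only when all of its enabling capabilities are present.
RISK_PREREQUISITES: dict[str, set[str]] = {
    "Deceptive Alignment": {"situational awareness", "persuasion"},
    "Scheming": {"situational awareness", "reasoning", "long-horizon tasks"},
    "Treacherous Turn": {"reasoning", "long-horizon tasks"},
    "Power-Seeking": {"agentic AI", "tool use", "reasoning"},
    "Corrigibility Failure": {"agentic AI", "tool use"},
    "Sharp Left Turn": {"self-improvement"},
}

def risks_in_scope(capabilities: set[str]) -> list[str]:
    # Keep only risks whose full set of enabling capabilities is present.
    return [risk for risk, needed in RISK_PREREQUISITES.items()
            if needed <= capabilities]

# A model with agentic tool use and reasoning, but no situational
# awareness: it cannot yet game its training process, but
# power-seeking and corrigibility failure are already in scope.
print(risks_in_scope({"agentic AI", "tool use", "reasoning"}))
# -> ['Power-Seeking', 'Corrigibility Failure']
```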


These risks don’t require malicious intent from developers or users. They arise from the difficulty of:

  1. Specifying objectives - Precisely defining what we want
  2. Robust learning - Ensuring learned behaviors generalize correctly
  3. Maintaining control - Keeping AI systems correctable
  4. Predicting capabilities - Knowing what systems can do before they do it

The common thread: AI systems optimizing for something subtly different from what we actually want.
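To make that common thread concrete, here is a toy sketch of objective misspecification. Every function in it is a hypothetical example: the proxy reward omits a penalty term that the true objective includes, and the harder the optimizer pushes on the proxy, the further the outcome drifts from what we actually wanted.

```python
# A toy illustration of optimizing a proxy that subtly differs
# from the true objective (Goodhart's law). All names hypothetical.

def true_objective(x: float) -> float:
    # What we actually want: more x is good, overshooting is penalized.
    return x - 0.1 * x ** 2  # maximized at x = 5, value 2.5

def proxy_reward(x: float) -> float:
    # What we wrote down: the penalty term was left out of the spec.
    return x

def hill_climb(reward, steps: int = 100, lr: float = 0.5) -> float:
    # Naive gradient ascent on whatever reward signal it is given.
    x = 0.0
    for _ in range(steps):
        grad = (reward(x + 1e-4) - reward(x - 1e-4)) / 2e-4
        x += lr * grad
    return x

x_opt = hill_climb(proxy_reward)
print(f"proxy-optimal x: {x_opt:.1f}")                   # 50.0
print(f"true objective:  {true_objective(x_opt):.1f}")   # -200.0, vs. 2.5 at x = 5
```

The proxy keeps improving even as the true objective collapses; that divergence under optimization pressure is the signature of a misspecified objective.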


These accident risks are amplified by factors from other risk categories:

| Factor | How It Contributes |
| --- | --- |
| Racing Dynamics | Less time for safety research, rushed deployment |
| Flash Dynamics | AI operates too fast for human oversight to catch errors |