AI Risks

This section documents potential risks from advanced AI systems, organized into four major categories by the source and nature of the risk.

Unintended failures from AI systems pursuing misaligned goals:

  • Scheming - AI strategically concealing misaligned goals
  • Deceptive Alignment - Models appearing aligned during training
  • Mesa-Optimization - Learned optimizers with misaligned objectives
  • Goal Misgeneralization - Objectives that fail in deployment
  • Power-Seeking - Instrumental convergence toward acquiring resources

Deliberate harmful applications of AI capabilities:

Systemic issues from how AI development is organized:

Threats to society’s ability to know and reason:

Many risks interact and compound. For example:

  • Racing dynamics → reduced safety testing → higher accident risk
  • Disinformation → trust decline → reduced coordination capacity
  • Power concentration → lock-in potential → governance failures

See the Risk Interaction Matrix for detailed analysis.