History of AI Safety

The field of AI safety—the study of how to build advanced AI systems that are safe, beneficial, and aligned with human values—has evolved from scattered warnings by visionaries in the 1950s to a mainstream research area backed by billions in funding and government attention.

This section traces that evolution through five distinct eras:

  1. Early Warnings (1950s-2000): Prescient concerns from computing pioneers
  2. The MIRI Era (2000-2015): Formation of the first dedicated AI safety organization
  3. Deep Learning Revolution (2012-2020): AI capabilities accelerate; safety concerns grow
  4. Mainstream Era (2020-Present): AI safety becomes a public and policy priority
  5. Key Publications: The books and papers that shaped the field

1950s-1960s: Early Warnings

Year | Event | Significance
1950 | Turing’s “Computing Machinery and Intelligence” | First philosophical treatment of machine intelligence
1956 | Dartmouth Conference | Birth of AI as a field
1960 | Wiener’s “Some Moral and Technical Consequences of Automation” | Early warning about autonomous systems
1965 | I.J. Good’s “Speculations Concerning the First Ultraintelligent Machine” | First formal treatment of recursive self-improvement (the intelligence explosion)
1968 | “2001: A Space Odyssey” released | HAL 9000 enters popular consciousness

1970s-1990s: AI Winter and Science Fiction

Year | Event | Significance
1942-1950 | Asimov’s Robot stories | Three Laws of Robotics shape popular thinking
1973-1980s | First AI Winter | Funding dries up; existential concerns seem premature
1993 | Vernor Vinge’s “The Coming Technological Singularity” | Popularizes the singularity concept

2000s: The MIRI Era

Year | Event | Significance
2000 | Singularity Institute founded (later MIRI) | First organization dedicated to AI safety
2001 | Yudkowsky’s “Creating Friendly AI” | Early technical treatment of alignment
2005 | Ray Kurzweil’s “The Singularity Is Near” | Brings AI advancement to a mainstream audience
2006-2009 | LessWrong and The Sequences | Community forms around AI safety
2008 | Bostrom’s “Global Catastrophic Risks” | Academic treatment of existential risks

2010-2020: Deep Learning Revolution

Year | Event | Significance
2010 | DeepMind founded | Elite AI lab with explicit safety mission
2012 | AlexNet wins ImageNet | Deep learning revolution begins
2014 | Bostrom’s “Superintelligence” | First comprehensive book on AI risk
2015 | Elon Musk and others donate to FLI | High-profile funding enters the field
2015 | OpenAI founded | Major lab with safety in its charter
2016 | “Concrete Problems in AI Safety” | First mainstream technical research agenda
2016 | AlphaGo beats Lee Sedol | AI capabilities shock the world
2018 | GPT released | Large language models emerge
2019 | OpenAI shifts to “capped profit” | Governance questions intensify
2020 | Amodei siblings leave OpenAI | Begin planning Anthropic

2020s: Mainstream Era

Year | Event | Significance
2020 | GPT-3 released | Scaling laws demonstrate continued progress
2020 | Toby Ord’s “The Precipice” | Existential risk, including from AI, reaches a broad audience
2021 | Anthropic founded | Safety-focused lab with major funding
2022 | ChatGPT released | AI becomes a household topic
2022 | Constitutional AI paper | New alignment approach demonstrated
2023 | Geoffrey Hinton leaves Google, warns of AI risk | “Godfather of AI” joins safety advocates
2023 | OpenAI leadership crisis | Governance failures on display
2023 | EU AI Act political agreement reached | First major AI regulation (formally adopted in 2024)
2023 | UK AI Safety Summit | Government-level coordination begins
2023 | Frontier Model Forum | Industry self-regulation effort
2024 | Multiple AI safety institutes founded | Government investment in safety research

Phase 1: Philosophical Speculation (1950s-2000)

  • Key figures: Turing, Wiener, Good, Asimov
  • Nature: Mostly thought experiments and warnings
  • Audience: Academics and science fiction readers
  • Response: Largely dismissed as science fiction

Phase 2: Rationalist Community Formation (2000-2012)

  • Key figures: Yudkowsky, Bostrom, Hanson
  • Nature: Philosophical arguments with some technical work
  • Audience: Small online community (LessWrong)
  • Response: Considered fringe, even within AI research

Phase 3: Academic Legitimization (2012-2020)

  • Key figures: Bostrom, Russell, Tegmark, Amodei
  • Nature: Academic papers, university courses, research agendas
  • Audience: AI researchers, academics, tech leaders
  • Response: Growing acceptance, but still controversial

Phase 4: Mainstream Urgency (2020-Present)

  • Key figures: Hinton, Bengio, Russell, Amodei, Anthropic/OpenAI leadership
  • Nature: Technical research, governance proposals, government testimony
  • Audience: Policymakers, public, industry
  • Response: Major topic in AI development and regulation

The field can be divided into distinct eras, each with its own character, key figures, and defining moments:

Early Warnings (1950s-2000)

The foundational period when computing pioneers first raised concerns about machine intelligence exceeding human control.

Key themes:

  • Philosophical foundations
  • Science fiction influence
  • Academic speculation
  • Prescient warnings largely ignored

Major figures: Alan Turing, Norbert Wiener, I.J. Good, Isaac Asimov, Vernor Vinge

The MIRI Era (2000-2015)

The formation of the first dedicated AI safety organization and online community.

Key themes:

  • Friendly AI research
  • Rationalist community building
  • Philosophical groundwork
  • Academic respectability battles

Major figures: Eliezer Yudkowsky, Nick Bostrom, Robin Hanson, Shane Legg

Deep Learning Revolution (2012-2020)

Rapid AI progress makes safety concerns more urgent and attracts mainstream attention.

Key themes:

  • Capabilities acceleration
  • Safety research professionalization
  • Major lab founding
  • Growing public awareness

Major figures: Demis Hassabis, Dario Amodei, Paul Christiano, Sam Altman, Stuart Russell

Mainstream Era (2020-Present)

AI safety moves from fringe concern to central topic in technology policy.

Key themes:

  • ChatGPT moment
  • Government engagement
  • Corporate safety commitments
  • Scaling continues

Major figures: Geoffrey Hinton, Yoshua Bengio, Dario Amodei, Helen Toner, governments worldwide

Key Publications

The seminal papers and books that defined the field’s intellectual development.

Timelines have shortened:

  • 2000s: AGI in 50-100 years
  • 2010s: AGI in 20-50 years
  • 2020s: AGI in 5-20 years (many estimates)

Concerns have evolved:

  • Early: Philosophical problems (value loading, goal stability)
  • Middle: Technical problems (reward hacking, mesa-optimization)
  • Recent: Governance problems (race dynamics, deployment safety)

Community has grown:

  • 2000: ~10 people thinking seriously about AI safety
  • 2010: ~100 researchers
  • 2020: ~500-1,000 researchers
  • 2024: Several thousand across academia, industry, government

Funding has exploded:

  • 2000-2010: <$1M/year
  • 2010-2020: ~$10-50M/year
  • 2020-2024: $100-500M/year

Core concerns:

  • Alignment difficulty
  • Capability-alignment gap
  • Competitive pressures
  • Difficulty of getting it right the first time

Key uncertainties:

  • When will transformative AI arrive?
  • How fast will takeoff be?
  • Can alignment research keep pace?
  • Will coordination be possible?

Fundamental questions:

  • Can we specify human values?
  • Will AI systems be goal-directed?
  • Is alignment possible in principle?
  • How do we prevent catastrophic outcomes?

AlexNet (2012): Proved deep learning worked at scale. Shifted AI from symbolic systems to neural networks.

AlphaGo (2016): Demonstrated AI could master intuition-based tasks thought to require human-like understanding.

GPT-2 (2019): First language model considered “too dangerous” to release. Sparked debate about responsible disclosure.

GPT-3 (2020): Demonstrated scaling in practice. Made clear that larger models bring broader capabilities.

ChatGPT (2022): Brought AI to mainstream public consciousness. Transformed AI safety from niche to urgent.

Hinton’s departure (2023): Signaled that even AI pioneers are deeply concerned.

United States: Dominated early development (MIRI, OpenAI, Anthropic based in Bay Area)

United Kingdom: Major policy center (UK AI Safety Institute, Oxford’s FHI, Cambridge CSER)

European Union: Regulatory leader (EU AI Act, emphasis on governance)

China: Separate development path with different safety concerns

Global South: Underrepresented but growing engagement

If deep learning hadn’t worked: AI safety might still be a fringe concern, and safety research aimed at symbolic systems would look very different.

If AlphaGo had lost: Might have delayed timeline concerns by years.

If ChatGPT had been slower to deploy: Public awareness might still be limited.

If Anthropic hadn’t been founded: Safety-focused lab competition might be weaker.

If Bostrom’s book had come out in 2010 instead of 2014: Might have shaped the deep learning era differently.

The history of AI safety reveals several patterns:

  1. Warnings precede action by decades: Early voices were correct but premature
  2. Progress happens faster than expected: Each generation underestimates acceleration
  3. Capabilities lead, safety lags: Technical safety work always plays catch-up
  4. Mainstream adoption is sudden: Years of groundwork followed by rapid acceptance
  5. Time to act is limited: From “fringe” to “urgent” happened in ~5 years (2018-2023)

The question now: Will the 2020s be remembered as the decade we solved AI safety—or the decade we ran out of time?
