Key Researchers in AI Safety

This section profiles key researchers, thought leaders, and practitioners who have made significant contributions to AI safety, alignment, and governance. Their work spans technical research, strategy, policy, and institution-building.

The AI safety field encompasses a wide range of views on critical questions. Timeline estimates for transformative AI range from 5 to 50+ years. Views on alignment difficulty span from “likely solvable with current approaches” to “extremely difficult, may require fundamental breakthroughs.” P(doom) estimates—the probability of existential catastrophe from AI—range from under 1% to over 50%. And strategic approaches differ on whether we should race to build safe AI first, slow down development, focus on governance, or pursue fundamental research.

Understanding who believes what and why is crucial for navigating the field’s key disagreements.

Frontier Labs (AGI-focused)

Dario Amodei: CEO of Anthropic, advocate for Constitutional AI and responsible scaling

Jan Leike: Head of Alignment at Anthropic, previously led OpenAI's superalignment team

Chris Olah: Co-founder of Anthropic, pioneer in neural network interpretability

Ilya Sutskever: Co-founder of Safe Superintelligence Inc., formerly Chief Scientist at OpenAI

Safety Research Organizations

Paul Christiano: Founder of the Alignment Research Center (ARC), creator of iterated amplification and co-developer of AI safety via debate

Eliezer Yudkowsky: Co-founder of the Machine Intelligence Research Institute (MIRI), advocate for strong AI safety precautions

Neel Nanda: DeepMind alignment researcher, mechanistic interpretability expert

Connor Leahy: CEO of Conjecture, focuses on interpretability and prosaic AGI safety

Academic Researchers

Stuart Russell: UC Berkeley professor, founder of the Center for Human-Compatible AI (CHAI), author of 'Human Compatible'

Yoshua Bengio: Turing Award winner, AI pioneer now focused on AI safety

Geoffrey Hinton: Turing Award winner, 'Godfather of AI', vocal about AI risks

Dan Hendrycks: Director of the Center for AI Safety (CAIS), focuses on catastrophic AI risk reduction

Strategy and Governance

Nick Bostrom: Philosopher at the Future of Humanity Institute (FHI), author of 'Superintelligence'

Toby Ord: Philosopher at FHI, author of 'The Precipice'

Holden Karnofsky: Co-CEO of Open Philanthropy, influential AI risk grantmaker

Different researchers hold varying positions on crucial questions:

Shorter timeline views (AGI by 2030s) are held by Dario Amodei, Ilya Sutskever, and many at frontier labs. Medium timeline views (AGI by 2040s-2050s) are represented by Paul Christiano and Holden Karnofsky. Longer or more uncertain timelines are more common among academics and governance-focused researchers.

More optimistic researchers like Dario Amodei, Jan Leike, and Paul Christiano believe alignment is solvable with iteration on current approaches. Moderately pessimistic views from Stuart Russell and Yoshua Bengio hold that major breakthroughs are needed. Very pessimistic positions, held by Eliezer Yudkowsky and some MIRI researchers, consider alignment extremely difficult and possibly not solvable in time.

Researchers diverge on strategy. Anthropic and some OpenAI researchers favor racing to build safe AGI first. Geoffrey Hinton and some policy advocates push to slow down development. MIRI and some academics focus on fundamental research. FHI researchers and policy experts prioritize governance and coordination.

These researchers represent various paths to contributing to AI safety. Technical research includes interpretability, alignment methods, and adversarial robustness. Conceptual work involves identifying new threat models and framing key problems. Empirical research tests alignment techniques on current systems. Institution building creates labs, research organizations, and governance bodies. Communication involves writing for different audiences and building field awareness. Funding directs resources to high-priority work.

Understanding their different approaches helps newcomers identify where they might contribute most effectively.