
AI Safety Training Programs

LLM Summary:Comprehensive analysis of AI safety training programs (MATS, Anthropic Fellows, SPAR, academic PhDs) showing the field produces 100-200 new safety researchers annually against growing need. Documents program structures, selection criteria, and strategic bottlenecks including limited mentor bandwidth and retention challenges.

The AI safety field faces a critical talent bottleneck. While funding has increased substantially—with Open Philanthropy alone granting over $10 million annually—the supply of researchers capable of doing high-quality technical safety work remains constrained. Training programs represent the primary pipeline for addressing this gap, offering structured pathways from general ML expertise to safety-specific research skills.

The landscape has evolved rapidly since 2020. MATS (ML Alignment Theory Scholars) has become the premier research mentorship program, with 80% of alumni now working in AI alignment. Anthropic launched a Fellows Program specifically for mid-career transitions. Academic options include the SAINTS CDT at York, CHAI at Berkeley, and CHIA at Cambridge. Independent research programs like SPAR and LASR Labs provide part-time pathways. Together, these programs produce perhaps 100-200 new safety researchers annually—a number that may be insufficient given the pace of AI capabilities advancement.

The strategic importance of training extends beyond individual researcher production. Programs shape research culture, determine which problems receive attention, and create networks that influence the field’s direction. How training programs select participants, what methodologies they emphasize, and which mentors they feature all have downstream effects on AI safety’s trajectory.

MATS is the most established and influential AI safety research program, operating as an intensive mentorship program that pairs promising early-career researchers with experienced mentors at leading safety organizations.

| Attribute | Details |
|---|---|
| Duration | 10 weeks intensive + 4 weeks extension |
| Format | In-person (Berkeley, London) |
| Focus | Technical alignment research |
| Mentors | Researchers from Anthropic, DeepMind, Redwood, FAR.AI |
| Compensation | Living stipend provided |
| Selectivity | ~5-10% acceptance rate |
| Alumni outcomes | 80% now working in AI alignment |

Research Areas:

  • Interpretability and mechanistic understanding
  • AI control and containment
  • Scalable oversight
  • Evaluations and red-teaming
  • Robustness and security

Notable Alumni Contributions: MATS fellows have contributed to sparse autoencoders for interpretability, activation engineering research, developmental interpretability, and externalized reasoning oversight. Alumni have published at ICML and NeurIPS on safety-relevant topics.
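
To make the interpretability work mentioned above concrete, the sketch below shows a minimal sparse autoencoder in PyTorch. It is not drawn from any MATS project: the dimensions, the ReLU encoder, and the L1 sparsity coefficient are illustrative assumptions. The structure—reconstructing model activations through an over-complete, sparsely firing feature layer—is the basic idea behind the sparse-autoencoder approach to interpretability.

```python
# Illustrative sketch only (not any program's actual code): a minimal sparse
# autoencoder that decomposes model activations into sparsely firing features.
import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # activations -> feature space
        self.decoder = nn.Linear(d_features, d_model)  # feature space -> reconstruction

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # non-negative feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features


def sae_loss(reconstruction, activations, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that encourages sparse features.
    mse = torch.mean((reconstruction - activations) ** 2)
    sparsity = l1_coeff * features.abs().mean()
    return mse + sparsity


if __name__ == "__main__":
    # Toy usage: random data stands in for a model's residual-stream activations.
    torch.manual_seed(0)
    sae = SparseAutoencoder(d_model=64, d_features=256)
    optimizer = torch.optim.Adam(sae.parameters(), lr=1e-3)
    data = torch.randn(1024, 64)
    for step in range(200):
        recon, feats = sae(data)
        loss = sae_loss(recon, data, feats)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```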

Launched in 2024, the Anthropic Fellows Program targets mid-career technical professionals transitioning into AI safety research.

| Attribute | Details |
|---|---|
| Duration | 6 months full-time |
| Format | In-person (San Francisco) |
| Focus | Transition to safety research |
| Compensation | $1,100/week stipend + benefits |
| Target | Mid-career technical professionals |
| First cohort | March 2025 |

The program addresses a specific gap: talented ML engineers and researchers who want to transition to safety work but lack the mentorship and runway to do so. By providing substantial compensation and direct collaboration with Anthropic researchers, it removes financial barriers to career change.

SPAR offers a part-time, remote research fellowship enabling broader participation in safety research without requiring full-time commitment.

| Attribute | Details |
|---|---|
| Duration | Semester-length |
| Format | Remote, part-time |
| Focus | AI safety and governance research |
| Target | Students and professionals |
| Output | Research projects, some published |
| Selectivity | Moderate |

SPAR research has been accepted at top venues including ICML and NeurIPS. The program works well for:

  • Graduate students exploring safety research
  • Professionals testing interest before career change
  • Researchers in adjacent fields wanting to contribute

LASR Labs provides cohort-based technical AI safety research, preparing participants for roles at safety organizations.

| Attribute | Details |
|---|---|
| Duration | Research cohort |
| Format | Remote |
| Focus | Technical safety research |
| Outcomes | Alumni at UK AISI, Apollo Research, Leap Labs, Open Philanthropy |

Impact Academy’s Global AI Safety Fellowship is a fully funded program (up to 6 months) connecting exceptional STEM talent with leading safety organizations.

| Attribute | Details |
|---|---|
| Duration | Up to 6 months |
| Format | In-person collaboration |
| Partners | CHAI (Berkeley), Conjecture, FAR.AI, UK AISI |
| Funding | Fully funded |

Several academic programs also offer structured routes into AI safety research:

| Program | Institution | Focus | Status |
|---|---|---|---|
| SAINTS CDT | University of York (UK) | Safe Autonomy | Accepting applications |
| CHAI | UC Berkeley | Human-Compatible AI | Established |
| CHIA | Cambridge | Human-Inspired AI | Active |
| Steinhardt Lab | UC Berkeley | ML Safety | Active |
| Other ML programs | Various | General ML with safety focus | Many options |

University of York - SAINTS CDT: The UK’s first Centre for Doctoral Training specifically focused on AI safety, funded by UKRI and based at the Institute for Safe Autonomy. It brings together computer science, philosophy, law, sociology, and economics to train the next generation of experts in safe AI.

Key Academic Researchers: Prospective PhD students should consider advisors who work on safety-relevant topics:

  • Stuart Russell (Berkeley/CHAI) - Human-compatible AI
  • Jacob Steinhardt (Berkeley) - ML safety and robustness
  • Vincent Conitzer (CMU) - AI alignment theory
  • David Duvenaud (Toronto) - Interpretability
  • Roger Grosse (Toronto) - Training dynamics
  • Victor Veitch (Chicago) - Causal ML, safety

Choosing between an academic PhD and an industry path involves clear tradeoffs:

| Dimension | Academic Path | Industry Path |
|---|---|---|
| Timeline | 4-6 years | 0-2 years to entry |
| Research freedom | High | Varies |
| Resources | Limited | Often substantial |
| Publication | Expected | Sometimes restricted |
| Salary during training | PhD stipend (~$10-50K) | Full salary or fellowship |
| Ultimate outcome | Research career | Research career |
| Best for | Deep expertise, theory | Immediate impact, applied |

For those not yet ready for formal programs, or who prefer self-directed learning:

| Resource | Provider | Coverage | Time Investment |
|---|---|---|---|
| AI Safety Syllabus | 80,000 Hours | Comprehensive reading list | 40-100+ hours |
| AI Alignment Course | BlueDot Impact | Structured curriculum | 8 weeks |
| ML Safety Course | Dan Hendrycks | Technical foundations | Semester |
| ARENA | ARENA | Technical implementations | 4-8 weeks |

Which path makes sense depends on your starting point:

| Your Situation | Recommended Path |
|---|---|
| Strong ML background, want safety focus | MATS or Anthropic Fellows |
| Exploring interest, employed | SPAR (part-time) |
| Student, want research experience | LASR Labs, SPAR |
| Early career, want PhD | Academic programs |
| Mid-career, want full transition | Anthropic Fellows |
| Strong background, want independence | Self-study + independent research |

Based on program outcomes, successful applicants typically have:

| Factor | Importance | How to Develop |
|---|---|---|
| ML technical skills | Critical | Courses, projects, publications |
| Research experience | High | Academic or industry research |
| Safety knowledge | Medium-High | Reading, courses, writing |
| Communication | Medium | Writing, presentations |
| Clear research interests | Medium | Reading, reflection, pilot projects |

Common application failure modes and their mitigations:

| Failure Mode | Description | Mitigation |
|---|---|---|
| Premature application | Applying without sufficient ML skills | Build fundamentals first |
| No research output | Nothing demonstrating research capability | Complete a pilot project |
| Vague interests | Unable to articulate what you want to work on | Read extensively, form views |
| Poor fit | Mismatch between interests and program | Research programs carefully |
| Giving up early | Rejection discouragement | Multiple applications, iterate |

The field’s talent pipeline narrows sharply at each stage; a rough numerical sketch of the funnel follows the table:

| Stage | Annual Output | Bottleneck |
|---|---|---|
| Interested individuals | Thousands | Conversion |
| Program applicants | 500-1000 | Selectivity |
| Program participants | 150-300 | Capacity |
| Research-productive alumni | 100-200 | Mentorship |
| Long-term field contributors | 50-100 | Retention |
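
As a minimal sketch of the funnel arithmetic, the snippet below uses assumed midpoints of the ranges in the table (the stage sizes are illustrative, not measured figures):

```python
# Back-of-the-envelope model of the talent funnel above. Stage sizes are
# assumed midpoints of the table's ranges, not measured quantities.
stages = {
    "Interested individuals": 3000,      # "thousands" -- assumed midpoint
    "Program applicants": 750,
    "Program participants": 225,
    "Research-productive alumni": 150,
    "Long-term field contributors": 75,
}

names = list(stages)
for prev, curr in zip(names, names[1:]):
    rate = stages[curr] / stages[prev]
    print(f"{prev} -> {curr}: {rate:.0%} conversion")

# Doubling long-term contributors requires either widening the top of the
# funnel or improving a later conversion rate (capacity, mentorship,
# retention) -- the bottlenecks named in the table.
```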

Scaling this pipeline faces several challenges; a back-of-the-envelope mentor-capacity calculation follows the table:

| Challenge | Description | Potential Solutions |
|---|---|---|
| Mentor bandwidth | Limited senior researchers available | Peer mentorship, async formats |
| Quality maintenance | Scaling may dilute intensity | Tiered programs |
| Funding | Programs need sustainable funding | Philanthropic, industry, government |
| Coordination | Many programs with unclear differentiation | Better information, specialization |
| Retention | Many trained researchers leave safety | Better career paths, culture |
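
Every number in the calculation below is a hypothetical assumption chosen to illustrate why mentor bandwidth binds, not a statistic from any program:

```python
# Hypothetical mentor-capacity arithmetic: annual participant capacity is
# roughly mentors x concurrent scholars per mentor x cohorts per year.
mentors = 60              # assumed pool of senior researchers willing to mentor
scholars_per_mentor = 2   # assumed concurrent scholars each can supervise well
cohorts_per_year = 2      # assumed cohorts run annually

capacity = mentors * scholars_per_mentor * cohorts_per_year
print(f"Approximate annual participant capacity: {capacity}")  # 240

# Under these assumptions, capacity lands near the 150-300 participant range
# in the pipeline table, which is why adding applicants without adding
# mentors does not raise output.
```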

As a field-building intervention, training programs score as follows:

| Dimension | Assessment | Notes |
|---|---|---|
| Tractability | High | Known how to train researchers |
| If AI risk high | High | Need many more researchers |
| If AI risk low | Medium | Still valuable for responsible development |
| Neglectedness | Medium | Significant investment but scaling gaps |
| Timeline to impact | 1-5 years | Trained researchers take time to contribute |
| Grade | B+ | Important but faces scaling limits |

Training also bears on specific risk pathways:

| Risk | Mechanism | Effectiveness |
|---|---|---|
| Inadequate safety research | More researchers doing safety work | High |
| Racing dynamics | Safety talent at labs can advocate | Medium |
| Field capture | Diverse training reduces groupthink | Medium |

Further Resources:

  • MATS: matsprogram.org - Official program information
  • Anthropic Fellows: alignment.anthropic.com/2024/anthropic-fellows-program
  • SPAR: sparai.org - Supervised Program for Alignment Research
  • LASR Labs: lasrlabs.org
  • Global AI Safety Fellowship: globalaisafetyfellowship.com
  • 80,000 Hours: “AI Safety Syllabus” and career guide
  • Alignment Forum: Career advice threads
  • EA Forum: “Rank Best Universities for AI Safety”
  • University of York SAINTS CDT: york.ac.uk/study/postgraduate-research/centres-doctoral-training/safe-ai-training
  • Stanford Center for AI Safety: aisafety.stanford.edu
  • CHAI (Berkeley): humancompatible.ai

AI safety training programs improve the AI Transition Model through multiple factors:

| Factor | Parameter | Impact |
|---|---|---|
| Misalignment Potential | Safety-Capability Gap | Produces 100-200 new safety researchers annually to address the research talent bottleneck |
| Misalignment Potential | Alignment Robustness | Mentored researchers produce higher-quality alignment work |
| Civilizational Competence | Institutional Quality | Trained researchers staff AI Safety Institutes and governance organizations |

Training programs are critical infrastructure for the field; their effectiveness is bottlenecked by limited mentor bandwidth and retention challenges.