AI Safety Training Programs
Overview
The AI safety field faces a critical talent bottleneck. While funding has increased substantially, with Open Philanthropy alone granting over $10 million annually, the supply of researchers capable of doing high-quality technical safety work remains constrained. Training programs represent the primary pipeline for addressing this gap, offering structured pathways from general ML expertise to safety-specific research skills.
The landscape has evolved rapidly since 2020. MATS (ML Alignment Theory Scholars) has become the premier research mentorship program, with 80% of alumni now working in AI alignment. Anthropic launched a Fellows Program specifically for mid-career transitions. Academic programs are emerging at York (SAINTS CDT), Berkeley (CHAI), and Cambridge (CHIA). Independent research programs like SPAR and LASR Labs provide part-time pathways. Together, these programs produce perhaps 100-200 new safety researchers annually, a number that may be insufficient given the pace of AI capabilities advancement.
The strategic importance of training extends beyond individual researcher production. Programs shape research culture, determine which problems receive attention, and create networks that influence the field's direction. How training programs select participants, what methodologies they emphasize, and which mentors they feature all have downstream effects on AI safety's trajectory.
Major Training Programs
MATS (ML Alignment Theory Scholars)
MATS is the most established and influential AI safety research program: an intensive mentorship that pairs promising researchers with leading safety researchers.
| Attribute | Details |
|---|---|
| Duration | 10 weeks intensive + 4 weeks extension |
| Format | In-person (Berkeley, London) |
| Focus | Technical alignment research |
| Mentors | Researchers from Anthropic, DeepMind, Redwood, FAR.AI |
| Compensation | Living stipend provided |
| Selectivity | ~5-10% acceptance rate |
| Alumni outcomes | 80% now working in AI alignment |
Research Areas:
- Interpretability and mechanistic understanding
- AI control and containment
- Scalable oversight
- Evaluations and red-teaming
- Robustness and security
Notable Alumni Contributions: MATS fellows have contributed to sparse autoencoders for interpretability, activation engineering research, developmental interpretability, and externalized reasoning oversight. Alumni have published at ICML and NeurIPS on safety-relevant topics.
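To give a flavor of one research area named above, the sketch below shows a toy sparse autoencoder of the kind used in interpretability work. It is an illustrative sketch only, not code from any MATS project; the class, dimensions, and training step are invented for the example.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy sparse autoencoder: reconstructs model activations through an
    overcomplete hidden layer, with an L1 penalty that encourages each
    learned feature to fire sparsely."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

def training_step(sae, activations, l1_coeff=1e-3):
    # Reconstruction loss keeps features faithful; the L1 term keeps them sparse.
    reconstruction, features = sae(activations)
    recon_loss = (reconstruction - activations).pow(2).mean()
    sparsity_loss = features.abs().mean()
    return recon_loss + l1_coeff * sparsity_loss

sae = SparseAutoencoder(d_model=512, d_hidden=4096)
batch = torch.randn(64, 512)  # stand-in for activations captured from a language model
loss = training_step(sae, batch)
loss.backward()
```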
Anthropic Fellows Program
Launched in 2024, the Anthropic Fellows Program targets mid-career technical professionals transitioning into AI safety research.
| Attribute | Details |
|---|---|
| Duration | 6 months full-time |
| Format | In-person (San Francisco) |
| Focus | Transition to safety research |
| Compensation | $1,100/week stipend + benefits |
| Target | Mid-career technical professionals |
| First cohort | March 2025 |
The program addresses a specific gap: talented ML engineers and researchers who want to transition to safety work but lack the mentorship and runway to do so. By providing substantial compensation and direct collaboration with Anthropic researchers, it removes financial barriers to career change.
SPAR (Supervised Program for Alignment Research)
SPAR offers a part-time, remote research fellowship enabling broader participation in safety research without requiring full-time commitment.
| Attribute | Details |
|---|---|
| Duration | Semester-length |
| Format | Remote, part-time |
| Focus | AI safety and governance research |
| Target | Students and professionals |
| Output | Research projects, some published |
| Selectivity | Moderate |
SPAR research has been accepted at top venues including ICML and NeurIPS. The program works well for:
- Graduate students exploring safety research
- Professionals testing interest before career change
- Researchers in adjacent fields wanting to contribute
LASR Labs
LASR Labs provides cohort-based technical AI safety research, preparing participants for roles at safety organizations.
| Attribute | Details |
|---|---|
| Structure | Research cohort |
| Format | Remote |
| Focus | Technical safety research |
| Outcomes | Alumni at UK AISI, Apollo Research, Leap Labs, Open Philanthropy |
Global AI Safety Fellowship
Impact Academy's Global AI Safety Fellowship is a fully funded program (up to 6 months) connecting exceptional STEM talent with leading safety organizations.
| Attribute | Details |
|---|---|
| Duration | Up to 6 months |
| Format | In-person collaboration |
| Partners | CHAI (Berkeley), Conjecture, FAR.AI, UK AISI |
| Funding | Fully funded |
Academic Pathways
PhD Programs
| Program | Institution | Focus | Status |
|---|---|---|---|
| SAINTS CDT | University of York (UK) | Safe Autonomy | Accepting applications |
| CHAI | UC Berkeley | Human-Compatible AI | Established |
| CHIA | Cambridge | Human-Inspired AI | Active |
| Steinhardt Lab | UC Berkeley | ML Safety | Active |
| Other ML programs | Various | General ML with safety focus | Many options |
University of York - SAINTS CDT: The UK's first Centre for Doctoral Training specifically focused on AI safety, funded by UKRI. Brings together computer science, philosophy, law, sociology, and economics to train the next generation of safe AI experts. Based at the Institute for Safe Autonomy.
Key Academic Researchers: Prospective PhD students should consider advisors who work on safety-relevant topics:
- Stuart Russell (Berkeley/CHAI) - Human-compatible AI
- Jacob Steinhardt (Berkeley) - ML safety and robustness
- Vincent Conitzer (CMU) - AI alignment theory
- David Duvenaud (Toronto) - Interpretability
- Roger Grosse (Toronto) - Training dynamics
- Victor Veitch (Chicago) - Causal ML, safety
Academic vs. Industry Research
| Dimension | Academic Path | Industry Path |
|---|---|---|
| Timeline | 4-6 years | 0-2 years to entry |
| Research freedom | High | Varies |
| Resources | Limited | Often substantial |
| Publication | Expected | Sometimes restricted |
| Salary during training | PhD stipend (~$10-50K/year) | Full salary or fellowship |
| Ultimate outcome | Research career | Research career |
| Best for | Deep expertise, theory | Immediate impact, applied |
Upskilling Resources
For those not yet ready for formal programs or preferring self-directed learning:
Structured Curricula
| Resource | Provider | Coverage | Time Investment |
|---|---|---|---|
| AI Safety Syllabus | 80,000 Hours | Comprehensive reading list | 40-100+ hours |
| AI Alignment Course | BlueDot Impact | Structured curriculum | 8 weeks |
| ML Safety Course | Dan Hendrycks | Technical foundations | Semester |
| ARENA | Alignment Research Engineer Accelerator | Technical implementations | 4-8 weeks |
Self-Study Path
A reasonable self-study path combines the resources above: build ML fundamentals, work through a structured alignment curriculum such as BlueDot Impact's course, complete hands-on implementations via ARENA, and finish with a small pilot research project that demonstrates research capability.
Career Transition Considerations
When to Apply to Programs
| Your Situation | Recommended Path |
|---|---|
| Strong ML background, want safety focus | MATS or Anthropic Fellows |
| Exploring interest, employed | SPAR (part-time) |
| Student, want research experience | LASR Labs, SPAR |
| Early career, want PhD | Academic programs |
| Mid-career, want full transition | Anthropic Fellows |
| Strong background, want independence | Self-study + independent research |
Success Factors
Based on program outcomes, successful applicants typically have:
| Factor | Importance | How to Develop |
|---|---|---|
| ML technical skills | Critical | Courses, projects, publications |
| Research experience | High | Academic or industry research |
| Safety knowledge | Medium-High | Reading, courses, writing |
| Communication | Medium | Writing, presentations |
| Clear research interests | Medium | Reading, reflection, pilot projects |
Common Failure Modes
| Failure Mode | Description | Mitigation |
|---|---|---|
| Premature application | Applying without sufficient ML skills | Build fundamentals first |
| No research output | Nothing demonstrating research capability | Complete pilot project |
| Vague interests | Unable to articulate what you want to work on | Read extensively, form views |
| Poor fit | Mismatch between interests and program | Research programs carefully |
| Giving up early | Becoming discouraged after rejection | Apply to multiple programs, iterate |
Talent Pipeline Analysis
Current Capacity
| Stage | Annual Output | Bottleneck |
|---|---|---|
| Interested individuals | Thousands | Conversion |
| Program applicants | 500-1000 | Selectivity |
| Program participants | 150-300 | Capacity |
| Research-productive alumni | 100-200 | Mentorship |
| Long-term field contributors | 50-100 | Retention |
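One way to read the table above is as a funnel; the sketch below computes the stage-to-stage conversion rates implied by the midpoints of the ranges. The midpoint figures, and therefore the percentages, are illustrative estimates rather than measured data.

```python
# Back-of-the-envelope funnel model using midpoints of the ranges in the table above.
stages = {
    "program applicants": 750,           # midpoint of 500-1000
    "program participants": 225,         # midpoint of 150-300
    "research-productive alumni": 150,   # midpoint of 100-200
    "long-term field contributors": 75,  # midpoint of 50-100
}

names = list(stages)
for upstream, downstream in zip(names, names[1:]):
    rate = stages[downstream] / stages[upstream]
    print(f"{upstream} -> {downstream}: ~{rate:.0%}")

# Growing the pool of long-term contributors requires either widening the top of
# the funnel or improving one of these conversion rates (e.g. via better retention).
```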
Scaling Challenges
| Challenge | Description | Potential Solutions |
|---|---|---|
| Mentor bandwidth | Limited senior researchers available | Peer mentorship, async formats |
| Quality maintenance | Scaling may dilute intensity | Tiered programs |
| Funding | Programs need sustainable funding | Philanthropic, industry, government |
| Coordination | Many programs with unclear differentiation | Better information, specialization |
| Retention | Many trained researchers leave safety | Better career paths, culture |
Strategic Assessment
| Dimension | Assessment | Notes |
|---|---|---|
| Tractability | High | Known how to train researchers |
| If AI risk high | High | Need many more researchers |
| If AI risk low | Medium | Still valuable for responsible development |
| Neglectedness | Medium | Significant investment but scaling gaps |
| Timeline to impact | 1-5 years | Trained researchers take time to contribute |
| Grade | B+ | Important but faces scaling limits |
Risks Addressed
| Risk | Mechanism | Effectiveness |
|---|---|---|
| Inadequate safety research | More researchers doing safety work | High |
| Racing dynamics | Safety talent at labs can advocate | Medium |
| Field capture | Diverse training reduces groupthink | Medium |
Complementary Interventions
- Field Building - Broader ecosystem development
- Corporate Influence - Placing trained researchers at labs
- AI Safety Institutes - Employers for trained researchers
Sources
Program Information
- MATS: matsprogram.org - Official program information
- Anthropic Fellows: alignment.anthropic.com/2024/anthropic-fellows-program
- SPAR: sparai.org - Supervised Program for Alignment Research
- LASR Labs: lasrlabs.org
- Global AI Safety Fellowship: globalaisafetyfellowship.com
Career Guidance
- 80,000 Hours: "AI Safety Syllabus" and career guide
- Alignment Forum: Career advice threads
- EA Forum: "Rank Best Universities for AI Safety"
Academic Programs
- University of York SAINTS CDT: york.ac.uk/study/postgraduate-research/centres-doctoral-training/safe-ai-training
- Stanford Center for AI Safety: aisafety.stanford.edu
- CHAI (Berkeley): humancompatible.ai
AI Transition Model Context
AI safety training programs improve outcomes in the AI Transition Model through multiple factors:
| Factor | Parameter | Impact |
|---|---|---|
| Misalignment Potential | Safety-Capability Gap | Produces 100-200 new safety researchers annually to address research talent bottleneck |
| Misalignment Potential | Alignment Robustness | Mentored researchers produce higher-quality alignment work |
| Civilizational Competence | Institutional Quality | Trained researchers staff AI Safety Institutes and governance organizations |
Training programs are critical infrastructure for the field; their effectiveness is bottlenecked by limited mentor bandwidth and retention challenges.