Skip to content

CAIS (Center for AI Safety)

📋Page Status
Quality:78 (Good)
Importance:35.5 (Reference)
Last edited:2025-12-24 (14 days ago)
Words:847
Backlinks:2
Structure:
📊 7📈 0🔗 37📚 020%Score: 10/15
LLM Summary:CAIS is a nonprofit AI safety organization that combines technical research (representation engineering, safety benchmarks like MACHIAVELLI) with field-building (200+ researchers supported, $2M+ in compute grants) and policy communication (organized the 2023 AI extinction risk statement signed by 350+ AI leaders including Sam Altman and Geoffrey Hinton). The organization has 15+ full-time staff, ~$5M annual budget, and focuses on 2-5 year research horizons with measurable impact through benchmark adoption by Anthropic and OpenAI.
Research Lab

CAIS

Importance35
Websitesafe.ai

The Center for AI Safety (CAIS) is a nonprofit research organization that works to reduce societal-scale risks from artificial intelligence through technical research, field-building initiatives, and public communication efforts. Founded by Dan Hendrycks, CAIS gained widespread recognition for organizing the landmark “Statement on AI Risk” in May 2023, which received signatures from over 350 AI researchers and industry leaders.

CAIS’s multi-pronged approach combines cutting-edge technical research on AI alignment and robustness with strategic field-building efforts that have supported over 200 researchers through grants and fellowships. The organization’s work spans from fundamental research on representation engineering to developing critical safety benchmarks like the MACHIAVELLI dataset for evaluating deceptive AI behavior.

Risk CategoryAssessmentEvidenceMitigation Focus
Technical Research ImpactHigh50+ safety publications, novel benchmarksRepresentation engineering, adversarial robustness
Field-Building InfluenceVery High200+ researchers supported, $1M+ distributedCompute grants, fellowship programs
Policy CommunicationHighStatement signed by major AI leadersPublic awareness, expert consensus building
Timeline RelevanceMedium-HighResearch targets near-term safety challenges2-5 year research horizon
Research DomainKey ContributionsImpact Metrics
Representation EngineeringMethods for reading/steering model internals15+ citations within 6 months
Safety BenchmarksMACHIAVELLI, power-seeking evaluationsAdopted by Anthropic, OpenAI
Adversarial RobustnessNovel defense mechanisms, evaluation protocols100+ citations on key papers
Alignment FoundationsConceptual frameworks for AI safetyInfluenced alignment research directions
ProgramScaleImpactTimeline
Compute Grants$2M+ distributed100+ researchers supported2022-present
ML Safety Scholars50+ participants annuallyEarly-career pipeline development2021-present
Research Fellowships$500K+ annually20+ fellows placed at top institutions2022-present
AI Safety Camp200+ participants totalInternational collaboration network2020-present
  • Academic Collaborations: UC Berkeley, MIT, Stanford, Oxford
  • Industry Engagement: Research partnerships with Anthropic, Google DeepMind
  • Policy Connections: Briefings for US Congress, UK Parliament, EU regulators

The May 2023 Statement on AI Risk represented a watershed moment in AI safety advocacy, consisting of a single sentence:

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

CategoryNotable SignatoriesSignificance
Turing Award WinnersGeoffrey Hinton, Yoshua Bengio, Stuart RussellAcademic legitimacy
Industry LeadersSam Altman (OpenAI), Dario Amodei (Anthropic), Demis Hassabis (DeepMind)Industry acknowledgment
Policy ExpertsHelen Toner, Allan Dafoe, Gillian HadfieldGovernance credibility
Technical Researchers300+ ML/AI researchersScientific consensus

The statement’s impact included immediate media coverage across major outlets and influenced subsequent policy discussions, including mentions in UK and US government AI strategies.

Priority Area2024 Goals2025-2026 Projections
Representation EngineeringScale to frontier modelsIndustry adoption for safety checks
Evaluation FrameworksComprehensive benchmark suiteStandard evaluation protocols
Alignment MethodsProof-of-concept demonstrationsPractical implementation
Policy ResearchTechnical governance recommendationsRegulatory framework development
  • Current Budget: ~$5M annually (estimated)
  • Researcher Count: 15+ full-time staff, 50+ affiliates
  • Projected Growth: 2x expansion by 2025 based on field growth
  • Representation Engineering Scalability: Whether current methods work on frontier models remains unclear
  • Benchmark Validity: Unknown if current evaluations capture real safety risks
  • Alignment Verification: No consensus on how to verify successful alignment
  • Research vs. Policy Balance: Optimal allocation between technical work and governance efforts
  • Open vs. Closed Research: Tension between transparency and information hazards
  • Timeline Assumptions: Disagreement on AGI timelines affects research priorities
Key People
DH
Dan Hendrycks
Executive Director
MM
Mantas Mazeika
Research Director
TW
Thomas Woodside
Policy Director
AZ
Andy Zou
Research Scientist
TypeResourceDescription
Websitesafe.aiMain organization hub
ResearchCAIS PublicationsTechnical papers and reports
BlogCAIS BlogResearch updates and commentary
CoursesML Safety CourseEducational materials
PaperYearCitationsImpact
Unsolved Problems in ML Safety2022200+Research agenda setting
MACHIAVELLI Benchmark202350+Industry evaluation adoption
Representation Engineering202330+New research direction