Bioweapons: Research Report
Executive Summary
| Finding | Key Data | Implication |
|---|---|---|
| Industry alarm bells | OpenAI expects next-gen models to hit “high-risk” classification; Anthropic activated ASL-3 for Claude Opus 4 | First time major labs triggered highest safety tiers for biological concerns |
| Screening vulnerabilities | AI-designed toxins evaded 75%+ of DNA synthesis screening (pre-patch); post-patch detection ~97% but “incomplete” | Critical chokepoint partially compromised; evasion design is demonstrated capability |
| Empirical uplift contested | RAND 2024: no statistically significant difference between AI-assisted and internet-only attack planning | Strongest evidence against immediate uplift concerns |
| Capability acceleration | OpenAI’s o3 ranked 94th percentile among expert virologists; Claude went from underperforming to “comfortably exceeding” baseline in 12 months | Rapid capability gains suggest near-term threshold crossing |
| Emerging convergence risks | Benchtop DNA synthesizers + cloud labs + AI design tools = potential pipeline bypass | Each technology manageable alone; convergence creates systemic vulnerabilities |
| Defense developments | mRNA vaccines designed in days; metagenomic surveillance (NAO) detecting novel pathogens; far-UVC germicidal light | Defensive technologies advancing rapidly but deployment lags |
Research Summary
AI-assisted bioweapons development has emerged as one of the most scrutinized near-term AI risks, with 2024-2025 marking a pivotal shift from theoretical concern to demonstrated capabilities and policy responses. This research report synthesizes evidence across technical capabilities, empirical studies, industry evaluations, emerging technologies, and defensive countermeasures to assess the current state and trajectory of AI-enabled biological threats.
The central tension in this domain is between demonstrated technical capabilities and measured uplift. Microsoft’s 2024-2025 research showed AI protein design tools can generate toxin variants that evaded 75-77% of commercial DNA synthesis screening systems before patches were deployed. Both OpenAI and Anthropic conducted internal evaluations concluding their 2025 models approached thresholds requiring elevated safety measures—OpenAI’s o3 ranked in the 94th percentile among expert virologists, while Anthropic activated ASL-3 protections for Claude Opus 4 specifically due to biological concerns. These industry actions represent the first instances of major AI labs triggering their highest safety protocols for biosecurity risks.
Yet the most rigorous empirical study to date—RAND’s 2024 controlled red-team experiment with 12 teams spending 80 hours each developing bioweapon attack plans—found no statistically significant difference in plan viability between AI-assisted and internet-only groups. This methodological tension (controlled experiments vs. capability benchmarks vs. expert elicitation) drives much of the ongoing debate about whether AI meaningfully increases bioweapons risk today or represents primarily a near-future concern.
Three technological convergence trends compound the assessment challenge. Benchtop DNA synthesizers are approaching gene-length synthesis capabilities, potentially bypassing centralized screening chokepoints—the global market is projected to grow from $4.75 billion in 2025 to $20.43 billion by 2033. Cloud laboratories enable remote experiment execution with AI-assisted design, lowering technical skill barriers. Biological design tools like AlphaFold 3 and protein engineering models create capabilities that blur dual-use boundaries. Each technology presents manageable risks in isolation; their convergence creates potential pipeline bypasses that existing biosecurity infrastructure was not designed to address.
Defensive capabilities are advancing rapidly but face deployment and coordination challenges. mRNA vaccine platforms demonstrated during COVID-19 that vaccines can be designed within days of sequence identification—CEPI’s 100 Days Mission aims to operationalize this for future pandemics. The Nucleic Acid Observatory pioneered pathogen-agnostic metagenomic surveillance that can detect novel threats before they’re recognized as pandemics. Far-UVC germicidal light at 222nm wavelength can inactivate 95%+ of airborne pathogens in occupied spaces without harming humans. These technologies suggest defense may ultimately win the offense-defense balance, but the transition period—while offensive capabilities mature faster than defensive deployment—represents the critical risk window.
Background
The intersection of artificial intelligence and biosecurity has shifted from speculative concern to active policy challenge over the past three years. AI-assisted bioweapons development represents a dual-use risk scenario where the same capabilities enabling drug discovery, vaccine design, and agricultural improvements could potentially lower barriers to developing dangerous biological agents.
Historical Context
Biological weapons development has historically required substantial state resources, specialized expertise, and sophisticated infrastructure. The Soviet Biopreparat program employed 30,000-40,000 personnel across dozens of facilities with annual smallpox production capacity of 90-100 tons. Non-state actors have consistently failed at biological attacks despite significant resources—Aum Shinrikyo’s $1 billion budget and PhD scientists could not successfully deploy anthrax or botulinum toxin.
The concern with AI is that it could systematically lower multiple barriers simultaneously: knowledge gaps, synthesis planning, evasion of detection systems, and ultimately wet-lab execution guidance through integration with laboratory automation. If AI enables individuals or small groups to achieve what previously required state programs, the threat landscape transforms fundamentally.
The 2024-2025 Inflection Point
Several developments converged in 2024-2025 to elevate biosecurity from theoretical concern to operational priority:
- May 2025: Anthropic activated ASL-3 (AI Safety Level 3) protections for Claude Opus 4, citing biological and chemical weapon concerns—the first major lab to trigger its highest safety tier for CBRN risks
- June 2025: OpenAI announced it expects next-generation models to reach “high-risk classification” under its Preparedness Framework, meaning they could provide “meaningful counterfactual assistance to novice actors”
- October 2024: Microsoft published research in Science showing AI-designed toxins evaded 75-77% of commercial DNA synthesis screening systems; a patch deployed globally improved detection to ~97% but gaps remain
- January 2024: RAND published the most rigorous empirical study to date, finding no statistically significant AI uplift for bioweapon attack planning
This simultaneous occurrence of industry alarm, demonstrated technical vulnerabilities, and null empirical findings creates the central puzzle this report addresses.
Key Findings
Finding 1: DNA Synthesis Screening Has Demonstrated Vulnerabilities
The most concerning empirical finding from 2024-2025 research is that AI protein design tools can systematically evade existing biosecurity chokepoints.
The Microsoft Paraphrase Project
Microsoft researchers Eric Horvitz and Bruce Wittmann built a pipeline using generative protein design models (EvoDiff, ESM) to create thousands of synthetic variants of ricin toxin. The goal was not to make the toxin more dangerous, but to test whether sequence variations could evade biosecurity screening systems used by commercial DNA synthesis providers.
Results before patch deployment:
- Tool A: Detected only 23% of AI-designed variants
- Tool B: Detected only 27% of variants
- Tool C: Detected approximately 60-65% of variants
- Tool D: Detected approximately 70-75% of variants
Results after patch deployment (October 2024):
- Average detection improved to ~97% across commercial screening systems
- Researchers note “the fix is incomplete” and some variants still evade detection
- The vulnerability demonstrates proof-of-concept that AI can exploit homology-based screening
Implications
Current industry-standard biosecurity measures rely heavily on homology-based algorithms—software that detects risky genetic sequences by comparing them to known “sequences of concern.” AI-generated proteins with potentially dangerous properties can pass undetected through these systems when their sequence signatures differ sufficiently from known threats.
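As an illustration, homology-based screening can be reduced to a toy sketch: compare a query sequence against a database of known sequences of concern and flag anything above an identity threshold. The sequences, threshold, and comparison method below are entirely hypothetical simplifications; production screeners use BLAST-style alignment against curated databases.

```python
# Toy illustration of homology-based sequence screening. All sequences are
# hypothetical; real systems align against curated "sequences of concern".

SEQUENCES_OF_CONCERN = {
    "toxin_x": "MKLVFFAEDVGSNKGAIIGLMVGGVV",  # hypothetical database entry
}

def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two sequences (naive, no alignment)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))

def screen(query: str, threshold: float = 0.8) -> bool:
    """Flag the query if it is sufficiently similar to any known sequence of concern."""
    return any(identity(query, ref) >= threshold
               for ref in SEQUENCES_OF_CONCERN.values())

# An exact match is flagged.
assert screen("MKLVFFAEDVGSNKGAIIGLMVGGVV")

# A heavily substituted variant that might still preserve function — the
# scenario the Paraphrase Project tested — drops below the identity threshold.
variant = "MRLVYFAQDVASDKGAILGLMVAGVV"  # 7 substitutions out of 26 residues
print(screen(variant))  # → False
```

This is the core weakness the Microsoft work exploited: the check is keyed to sequence similarity, not to predicted function, so a variant that diverges in sequence while retaining activity slips through.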
The rapid patch deployment shows the system can adapt, but also reveals it wasn’t keeping pace with AI capabilities. The question becomes whether this is a one-time vulnerability or whether AI will continue generating novel evasion strategies faster than screening can evolve defenses.
Key uncertainty: Do AI protein design capabilities advance faster than screening can adapt, or will this prove to be a manageable cat-and-mouse game that screening ultimately wins?
Finding 2: Frontier Models Approaching Expert-Level Biological Capabilities
Industry evaluations in 2025 documented rapid capability gains in biological reasoning and troubleshooting, with models approaching or exceeding expert human performance on specialized virology assessments.
OpenAI’s o3 Model Performance
OpenAI’s April 2025 o3 reasoning model ranked in the 94th percentile among expert human virologists on the Virology Capabilities Test. This benchmark measures biological troubleshooting, experimental design, and problem-solving across pandemic-relevant scenarios. This was the first time an AI model demonstrated expert-level performance on such assessments.
Anthropic’s ASL-3 Activation
Anthropic’s internal evaluation pipeline for Claude Opus 4 found they “could no longer confidently rule out the ability of our most advanced model to uplift people with basic STEM backgrounds” attempting to develop CBRN weapons. Their testing revealed:
- Participants with access to Claude Opus 4 developed bioweapon acquisition plans with “substantially fewer critical failures” than internet-only controls
- Claude’s virology troubleshooting capabilities went from underperforming world-class experts to “comfortably exceeding that baseline” within approximately 12 months
- The company sent a letter to the White House in January 2025 citing “alarming improvements” in Claude 3.7 Sonnet’s biological capabilities
Industry Consensus Shift
The convergence of these evaluations—OpenAI expecting “high-risk” classification, Anthropic activating ASL-3, joint US/UK AI Safety Institute evaluations—represents a qualitative shift. Industry leaders moved from theoretical concern to operational safety measures based on internal testing. This suggests their red-team evaluations and capability benchmarks revealed capabilities that publicly deployed, safety-filtered versions of these models may not exhibit.
Finding 3: Controlled Empirical Study Found No Significant Uplift
The strongest countervailing evidence comes from RAND’s 2024 red-team study—the most rigorous controlled experiment testing whether AI provides meaningful uplift for bioweapon attack planning.
Study Design
- 12 teams of three people each with science backgrounds
- 80 hours per team over seven weeks
- Four scenarios, including a fringe doomsday cult and a state-sponsored attack
- Control structure: For each scenario, one team had access to LLM A, one to LLM B, one to internet only
- Expert evaluation: Biologists and security specialists blind-reviewed plans for feasibility
Results
No statistically significant difference in plan viability between AI-assisted and internet-only groups. LLMs provided “guidance and context in critical areas such as agent selection, delivery methods, and operational planning” but did not produce plans that expert judges rated as more viable than internet-only plans.
Study Limitations (Acknowledged by Researchers)
| Limitation | Implication |
|---|---|
| Planning vs. execution | Study tested information access, not wet-lab capability |
| Participant background | Science graduates may underestimate uplift for complete novices |
| 2023-era models | Used GPT-4 / Claude 2 generation; capabilities have advanced |
| Sample size | n=12 teams may miss effects that larger samples would detect |
| Indirect assistance | LLMs avoided explicit weaponization instructions; provided contextual guidance |
Finding 4: Emerging Technology Convergence Creates Pipeline Bypasses
Three technology categories are converging in ways that could systematically bypass existing biosecurity chokepoints over the next 5-10 years.
Benchtop DNA Synthesizers
Desktop DNA synthesis devices may enable users to print genetic material in their own laboratories, potentially bypassing commercial screening entirely.
Market growth:
- Global DNA synthesizer market: $3.96 billion (2024) → $20.43 billion projected (2033)
- CAGR of 20% suggests rapid adoption and capability improvements
- Current devices (Kilobaser, DNA Script SYNTAX) limited to ~120 base pairs
- Gene-length synthesis (1,000+ bp) would enable dangerous sequence production
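The projected figures are internally consistent; a quick check of the implied compound annual growth rate:

```python
# Sanity-check the benchtop DNA synthesizer market projection:
# $3.96B (2024) growing to $20.43B (2033) over 9 compounding years.
start, end, years = 3.96, 20.43, 2033 - 2024

cagr = (end / start) ** (1 / years) - 1
print(f"CAGR: {cagr:.1%}")  # → CAGR: 20.0%

# The same growth rate also reproduces the $4.75B 2025 figure cited earlier.
print(f"Implied 2025 value: ${start * (1 + cagr):.2f}B")  # → $4.75B
```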
Current limitations:
- Most benchtop devices cannot yet synthesize gene-length sequences needed for most dangerous applications
- Quality control and yield often inferior to commercial providers
- Cost remains high ($35,500-$49,500 for basic systems)
NTI assessment: “Three converging technological trends—enzymatic synthesis, hardware automation, and increased demand from computational tools—are likely to drive rapid advancement in benchtop capabilities over the next decade.”
Biosecurity gap: Current regulations only apply export controls to devices capable of >1.5 kilobase synthesis. No requirements exist for customer screening or sequence screening in benchtop devices. The May 2025 Executive Order created uncertainty by rescinding the 2024 Framework for Nucleic Acid Synthesis Screening without replacement.
Cloud Laboratories
Remote-access automated laboratories enable experiment execution without physical presence or direct wet-lab skills.
How they work:
- Users design experiments through web interfaces
- Automated liquid handlers, thermal cyclers, and analytical equipment execute protocols
- Results returned digitally
- No requirement for physical laboratory access or hands-on technique
AI integration:
- Natural language interfaces translate experimental goals to automated protocols
- Integration with large language models for experimental design
- Reduced technical skill barriers compared to traditional laboratory work
Current biosecurity gaps (RAND analysis):
- No public data on cloud lab operations, workflows, customer numbers, or locations worldwide
- No standardized approaches for customer screening shared between organizations
- Cybersecurity laws don’t account for unique vulnerabilities of remote biological manipulation
- Biosafety regulations typically neglect digital threats like remote manipulation of synthesis machines
Finding 5: Open-Source Models Create Persistent Guardrail Challenges
Even if frontier labs implement strong biosecurity measures, open-source model proliferation undermines containment strategies.
The DeepSeek Warning (February 2025)
Anthropic CEO Dario Amodei reported that testing China’s DeepSeek model revealed it was “the worst of basically any model we’d ever tested” for biosecurity. DeepSeek generated information critical to producing bioweapons “that can’t be found on Google or can’t be easily found in textbooks” with “absolutely no blocks whatsoever.”
While Amodei did not consider DeepSeek “literally dangerous” yet, the incident highlighted how open-source models from different jurisdictions may not implement equivalent safety measures.
Structural Challenges with Open-Source
Section titled “Structural Challenges with Open-Source”| Issue | Why It Matters |
|---|---|
| No centralized control | Once weights are released, restrictions cannot be retroactively enforced |
| Fine-tuning vulnerability | Safety training can be removed with modest compute resources |
| Global availability | Actors in any jurisdiction can access open models regardless of local regulations |
| Capability lag narrowing | Open models approaching frontier capabilities with 6-12 month delays |
The CNAS report recommends considering a “licensing regime for biological design tools with potentially catastrophic capabilities”—but this is not currently implemented and faces significant political and technical barriers.
Causal Factors
The following factors influence bioweapons risk probability and severity. These tables are designed to inform future cause-effect diagram creation.
Primary Factors (Strong Influence)
| Factor | Direction | Type | Evidence | Confidence |
|---|---|---|---|---|
| AI Biological Capabilities | ↑ Risk | intermediate | o3 ranked 94th percentile among virologists; Claude Opus 4 triggered ASL-3 | High |
| DNA Synthesis Screening Effectiveness | ↓ Risk | intermediate | 75% pre-patch failure → 97% post-patch; gaps remain | High |
| Wet-Lab Skill Requirements | ↓ Risk (barrier) | leaf | Aum Shinrikyo failure; Biopreparat required 30,000+ staff | High |
| Open-Source Proliferation | ↑ Risk | cause | DeepSeek lacked safeguards; fine-tuning removes restrictions | High |
Secondary Factors (Medium Influence)
| Factor | Direction | Type | Evidence | Confidence |
|---|---|---|---|---|
| Benchtop Synthesizer Capabilities | ↑ Risk | intermediate | Market growing 20% CAGR; approaching gene-length synthesis | Medium |
| Cloud Lab Automation | ↑ Risk | intermediate | Reduces skill barriers; no standardized security measures | Medium |
| Metagenomic Surveillance | ↓ Risk (detection) | intermediate | NAO pathogen-agnostic detection; limited current deployment | Medium |
| mRNA Vaccine Platforms | ↓ Risk (response) | intermediate | CEPI 100 Days Mission; Moderna H5 Phase 3 trial | Medium |
| Gain-of-Function Oversight | ↓ Risk | leaf | May 2025 EO created policy vacuum; limited effectiveness | Low-Medium |
Minor Factors (Weak Influence)
| Factor | Direction | Type | Evidence | Confidence |
|---|---|---|---|---|
| Biological Weapons Convention | ↓ Risk | leaf | 187 states parties but no verification regime; 4 staff members | Low |
| AI Lab Evaluations | ↓ Risk | intermediate | ASL-3, Preparedness Framework create safety thresholds | Medium-Low |
| Far-UVC Technology | ↓ Risk (mitigation) | leaf | 95%+ airborne inactivation; deployment limited | Low |
Defensive Technologies and Pandemic Preparedness
The offense-defense balance is critical to assessing long-term bioweapons risk. Several defensive technologies are advancing rapidly but face deployment and coordination challenges.
mRNA Vaccine Platforms
The COVID-19 pandemic demonstrated transformative potential for rapid vaccine development, but translating this capability to future pandemics requires sustained investment and infrastructure.
Key advantages:
- Vaccines can be designed within days once pathogen genetic sequence is known
- Cell-free manufacturing enables rapid, scalable production
- Same platform works against any pathogen with known sequence
- COVID-19 vaccines received FDA Emergency Use Authorization in under one year—unprecedented speed
CEPI 100 Days Mission:
- Goal: Safe, effective vaccines ready within 100 days of novel pathogen identification
- Supported by G7, G20, and multiple national governments
- 2024 funding commitments include up to $54.3 million for Moderna’s H5 pandemic influenza vaccine Phase 3 trial
- $145 million partnership with BioNTech to build African manufacturing capacity in Kigali, Rwanda
Recent innovations:
- RNAbox™ technology (University of Sheffield, October 2024): Continuous manufacturing process producing 7-10x more mRNA than batch production
- Next-generation trans-amplifying mRNA vaccines: Enhanced immune responses under development
- Regional manufacturing hubs: Reducing dependency on centralized production in high-income countries
Metagenomic Surveillance
Traditional disease surveillance looks for known pathogens. Metagenomic sequencing offers pathogen-agnostic detection capable of identifying novel or engineered threats before they’re recognized as pandemics.
Nucleic Acid Observatory (NAO):
- Founded in 2021; grew out of MIT’s Sculpting Evolution group and operates under SecureBio
- Deep sequencing of wastewater, nasal swabs, and environmental samples
- Detects exponential growth patterns in any biological sequence
- Can identify threats with long incubation periods or atypical transmission
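The NAO’s exponential-growth flagging can be sketched as a log-linear fit over a sequence’s relative abundance across sampling weeks: a sustained positive slope distinguishes a spreading agent from stable microbial background. The data below are hypothetical, and real pipelines must additionally handle zero counts, sampling noise, and multiple-testing corrections.

```python
import math

def growth_rate(rel_abundance):
    """Least-squares slope of log(abundance) vs. week index.

    A sustained positive slope suggests exponential spread; a slope near
    zero suggests stable background. Assumes strictly positive inputs.
    """
    logs = [math.log(x) for x in rel_abundance]
    n = len(logs)
    mean_x = (n - 1) / 2
    mean_y = sum(logs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(logs))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# Hypothetical weekly relative abundances of one sequence in wastewater reads.
background = [2e-6, 1.8e-6, 2.2e-6, 1.9e-6, 2.1e-6]  # noisy but flat
spreading  = [1e-7, 2.1e-7, 3.9e-7, 8.2e-7, 1.6e-6]  # roughly doubling weekly

print(growth_rate(background))  # near 0 per week
print(growth_rate(spreading))   # near ln(2) ≈ 0.69 per week (weekly doubling)
```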
Recent activities:
- Partnership with CDC’s Traveler-based Genomic Surveillance program and Ginkgo Biosecurity
- Sequencing airplane lavatory waste and municipal wastewater from major US airports
- 36-week collaboration with PHC Global on ANTI-DOTE contract analyzing marine blackwater samples
- Developing reference-based growth detection algorithms to distinguish pathogen expansion from microbial background noise
Current limitations:
- Sensitivity varies based on pathogen type, concentration, and sample matrix
- For certain bacterial agents, detection thresholds correspond to roughly 1 infected person among 257-2,250 people
- Computational detection methods still under development
- Large-scale deployment requires sustained funding (~$52 million/year for US-scale system)
Far-UVC Germicidal Light
Far-UVC at 222nm wavelength represents a potentially transformative technology for continuous airborne pathogen inactivation in occupied spaces.
Why it’s different from conventional UV-C:
- Conventional germicidal UV-C (254nm) damages skin and eyes—limited to upper-room or unoccupied spaces
- Far-UVC (222nm) absorbed in outer dead skin layer and tear film—cannot penetrate to living tissue
- Enables direct disinfection of breathing zone with people present
Efficacy data:
- Very low dose (2 mJ/cm²) inactivates >95% of airborne H1N1 virus
- Single fixture delivers 33-66 equivalent air changes per hour
- Tested effective against tuberculosis, SARS-CoV-2, influenza, murine norovirus (99.8% reduction)
- 2025 systematic review: “high ability” to kill pathogens with “high level of safety”
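Assuming standard first-order UV inactivation kinetics (a common modeling simplification, not a claim from the studies above), the reported H1N1 figure lets one back out an approximate susceptibility constant and extrapolate to higher inactivation targets:

```python
import math

# First-order UV inactivation model: surviving fraction S = exp(-k * D),
# where D is dose (mJ/cm²) and k is pathogen susceptibility (cm²/mJ).
# Back out k from the reported ~95% inactivation of airborne H1N1 at 2 mJ/cm².
dose = 2.0        # mJ/cm²
survival = 0.05   # 95% inactivated
k = -math.log(survival) / dose
print(f"k ≈ {k:.2f} cm²/mJ")  # → k ≈ 1.50 cm²/mJ

# Under the same (assumed) model, the dose needed for 99.9% inactivation:
d_999 = -math.log(0.001) / k
print(f"Dose for 99.9% inactivation ≈ {d_999:.1f} mJ/cm²")  # → ≈ 4.6 mJ/cm²
```

The exponential form is why modest dose increases buy large gains in inactivation; actual susceptibility varies by pathogen, humidity, and aerosol conditions.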
Deployment status:
- NIST collaborating with industry on standards development
- Blueprint Biosecurity funding real-world evaluation studies
- Open Philanthropy issued RFI on far-UVC evaluation
- Current questions: long-term exposure effects, real-world efficacy in varied environments, cost-effectiveness of widespread deployment
Relevance to AI bioweapons: Far-UVC provides a layer of defense against aerosol-dispersed agents in public spaces. Even if attackers successfully synthesize and deploy pathogens, widespread far-UVC installation could limit transmission and buy time for medical response. However, this requires proactive infrastructure investment before attacks occur.
Regulatory and Governance Landscape
Effective biosecurity requires coordination across multiple governance layers: international treaties, national oversight, industry standards, and technical safeguards. Current systems were designed before AI capabilities emerged, creating gaps and coordination challenges.
The Biological Weapons Convention: Structural Weaknesses
The BWC, signed in 1972, prohibits development, production, and stockpiling of biological weapons. It has 187 states parties but faces severe structural limitations.
No verification regime:
- Unlike Chemical Weapons Convention (OPCW) or nuclear Non-Proliferation Treaty (IAEA), BWC has no formal verification provisions
- Attempts to develop verification protocol collapsed in 2001
- States have not discussed verification within treaty framework for over 20 years
Minimal institutional support:
- BWC secretariat has only 4 staff members
- Annual budget smaller than that of an average McDonald’s restaurant (per Toby Ord)
- Compare to: IAEA has 2,500+ staff; OPCW has 500+ staff
Recent developments:
- December 2022: Working Group established to strengthen Convention
- December 2024: Fifth Working Group session “ended with a regrettable conclusion in which a single States Party undermined the noteworthy progress achieved”
- The Working Group has only 7 meeting days remaining through the end of 2025 for verification discussions
US Gain-of-Function Oversight: Policy Vacuum
Gain-of-function (GoF) research—experiments enhancing pathogen transmissibility, virulence, or host range—has become intensely controversial with direct implications for AI biosecurity.
May 2024: White House OSTP released “Policy for Oversight of Dual Use Research of Concern and Pathogens with Enhanced Pandemic Potential” (DURC/PEPP Policy), scheduled to take effect May 6, 2025.
May 2025: Executive Order 14292 (“Improving the Safety and Security of Biological Research”) issued one day before the policy took effect:
- Mandated immediate pause on federally funded “dangerous gain-of-function” research
- Rescinded the 2024 DURC/PEPP policy
- Charged OSTP with issuing replacement within 120 days
Current status (January 2026):
- 120-day deadline passed without replacement policy
- Researchers, institutions, and biosafety professionals face policy vacuum
- Ambiguity about which earlier guidance (2014 DURC Policy, 2017 P3CO Framework) remains in force
- NIH identified 40+ projects potentially meeting dangerous GoF definitions; demanded work suspension
Key limitation: Both 2014 and 2024 policies only applied to federally funded research. Extending to privately funded research requires new legislation. AI labs developing biological design tools with private funding face no equivalent oversight requirements.
DNA Synthesis Screening: The Critical Chokepoint
DNA synthesis screening represents the primary technical barrier preventing acquisition of dangerous genetic sequences, but significant gaps remain.
White House OSTP Framework (April 2024):
- Requires federally funded programs to screen customers and orders
- Mandates record-keeping and reporting of suspicious orders
- NIST partnering with stakeholders to improve screening standards
Current limitations:
- Participation in the International Gene Synthesis Consortium (IGSC) is voluntary—not all providers are members
- No consistent international regulations
- Screening relies on matching against databases of known dangerous sequences—novel variants evade detection
- High false positive rates require expensive human review
- Benchtop synthesizers bypassing commercial screening entirely
Post-Microsoft patch status (October 2024):
- Commercial screening systems patched to detect ~97% of AI-designed evasive variants
- Experts warn “the fix is incomplete”
- Sequence Biosecurity Risk Consortium (SBRC) working on curated list of concerning sequences
Proposed solutions:
- Hybrid screening combining homology-based and functional prediction algorithms
- Biosecurity Readiness Certification for benchtop devices
- Explicit cybersecurity measures for cloud labs
- Human-in-the-loop controls when AI systems place synthesis orders
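The proposed human-in-the-loop control could look something like the following sketch, in which an order placed by an automated agent is held until a person signs off, and sequence screening runs first regardless. All names, fields, and the screening check are hypothetical placeholders, not any provider’s actual interface.

```python
# Hypothetical sketch of a human-in-the-loop gate for AI-placed synthesis
# orders: automated sequence screening first, then mandatory human review
# for any order originated by an automated agent.
from dataclasses import dataclass

@dataclass
class SynthesisOrder:
    customer_id: str
    sequence: str
    placed_by_ai: bool  # True if an automated agent originated the order

def sequence_flagged(seq: str) -> bool:
    """Placeholder for a real screen (homology plus functional prediction)."""
    return "ATGCCC" in seq  # hypothetical motif standing in for a real check

def process_order(order: SynthesisOrder, human_approved: bool) -> str:
    if sequence_flagged(order.sequence):
        return "blocked: sequence of concern"
    if order.placed_by_ai and not human_approved:
        return "held: human review required"
    return "accepted"

order = SynthesisOrder("lab-42", "ATGAAAGGG", placed_by_ai=True)
print(process_order(order, human_approved=False))  # → held: human review required
print(process_order(order, human_approved=True))   # → accepted
```

The design point is ordering: screening is unconditional, while the human gate applies only to AI-originated orders, so the control adds friction exactly where the new risk pathway lies.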
Open Questions
| Question | Why It Matters | Current State |
|---|---|---|
| Does AI provide meaningful uplift today or only near-future? | Determines urgency of interventions | RAND 2024: no significant uplift; Industry 2025: models approaching thresholds |
| Is knowledge or wet-lab capability the limiting bottleneck? | If knowledge, AI is directly dangerous; if wet-lab, AI primarily redundant | Historical failures suggest wet-lab; AI capabilities suggest knowledge gap closing |
| Will defense or offense win long-term? | Determines whether risk is transitional or permanent | mRNA vaccines, surveillance advancing; offense has asymmetric advantages short-term |
| Can DNA synthesis screening keep pace with AI evasion design? | Critical chokepoint efficacy | Post-patch 97% detection but incomplete; benchtop devices bypassing screening |
| How quickly will benchtop + cloud lab + AI convergence mature? | Timeline for end-to-end pipeline bypass | 5-10 year horizon; deployment not technical capability may be rate-limiting |
| What are effective early warning indicators before lock-in? | Enables proactive rather than reactive intervention | Capability benchmarks exist; motivation and planning indicators underdeveloped |
| Will open-source proliferation undermine frontier lab safeguards? | Determines whether guardrails are durable | DeepSeek demonstrated; fine-tuning vulnerability confirmed |
| How effective are AI lab safety evaluations? | Determines trust in industry self-regulation | ASL-3 activation suggests seriousness; external validation limited |
Sources
Academic Research (arXiv and Peer-Reviewed)
- Resilient Biosecurity in the Era of AI-Enabled Bioweapons (arXiv, August 2025) - Evaluation of PPI prediction tools failing to detect SARS-CoV-2 mutants
- Can Large Language Models Design Biological Weapons? Evaluating Moremi Bio (arXiv, May 2025) - Toxicity assessment challenging claims that LLMs are incapable of bioweapon design
- Generative AI for Biosciences: Emerging Threats and Roadmap to Biosecurity (arXiv, October 2025) - Analysis of 2025 White House Executive Order on biological research
- The Reality of AI and Biorisk (arXiv, January 2025) - Expanding empirical research on biorisk threat models
- Strengthening nucleic acid biosecurity screening against generative protein design tools (Science, October 2024) - Microsoft Paraphrase Project revealing screening gaps
- Security challenges by AI-assisted protein design (EMBO Reports, 2024) - Analysis of protein design dual-use concerns
- Artificial intelligence and synthetic biology: biosecurity risks, dual-use concerns, and governance pathways (AI and Ethics, Springer, 2025) - Governance framework analysis
RAND Research
- The Operational Risks of AI in Large-Scale Biological Attacks: Results of a Red-Team Study (RAND, January 2024) - Controlled study finding no significant AI uplift
- Current Artificial Intelligence Does Not Meaningfully Increase Risk of a Biological Weapons Attack (RAND Press Release, 2024)
- Documenting Cloud Labs and Examining How Remotely Operated Automated Laboratories Could Enable Bad Actors (RAND, April 2025)
- Robust Biosecurity Measures Should Be Standardized at Scientific Cloud Labs (RAND Commentary, November 2024)
NTI Analysis and Reports
- Benchtop DNA Synthesis Devices: Capabilities, Biosecurity Implications, and Governance (NTI, 2024)
- NTI at the Biological Weapons Convention: Urging Collective Action to Reduce Biological and AI Risks
- Statement on Biosecurity Risks at the Convergence of AI and the Life Sciences
- What is Biosecurity — Explained
AI Lab Evaluations and Safety Frameworks
- Findings from a Pilot Anthropic-OpenAI Alignment Evaluation Exercise (Anthropic, June-July 2025)
- Bloom: an open source tool for automated behavioral evaluations (Anthropic, December 2025)
- Do the biorisk evaluations of AI labs actually measure the risk of developing bioweapons? (Epoch AI, 2025)
- FRONTIERSCIENCE: EVALUATING AI’S ABILITY TO (OpenAI, 2025)
Microsoft Research
- Strengthening nucleic acid biosecurity screening against generative protein design tools (Microsoft Research, 2024)
- The Paraphrase Project: Designing defense for an era of synthetic biology (Microsoft Research)
Government and Policy
- Progressing the 100 Days Mission for greater global health security (CEPI, 2024)
- CEPI to Fund Pivotal Phase 3 Trial for Moderna’s mRNA Pandemic Influenza Candidate (CEPI, 2024)
- Biosecurity for Synthetic Nucleic Acid Sequences (NIST)
- UPDATE: US Government Policy for Oversight of DURC and PEPP (Penn EHRS, 2025)
- A possible turning point for research governance in the life sciences (mSphere, 2025)
- Oversight of Gain-of-Function Research with Pathogens: Issues for Congress (Congressional Research Service)
Biosecurity Organizations and Surveillance
- NAO Updates, January 2025 (Nucleic Acid Observatory)
- A Global Nucleic Acid Observatory for Biodefense and Planetary Health (arXiv, 2021) - Foundational NAO proposal
- SecureBio - Securing the future against catastrophic pandemics
- Consider funding the Nucleic Acid Observatory to Detect Stealth Pandemics (EA Forum)
Benchtop DNA Synthesizers and Cloud Labs
- Securing Benchtop DNA Synthesizers (IFP)
- Regulatory Gaps in Benchtop Nucleic Acid Synthesis Create Biosecurity Vulnerabilities (Arms Control Association, November 2025)
- DNA Synthesizer Market Size & Outlook, 2025-2033
- Kilobaser one | Personal DNA Synthesizer
Defense Technologies
- Fast-Tracking Vaccine Manufacturing: CEPI’s Rapid Response Framework for the 100 Days Mission (PMC, 2024)
- Delivering Pandemic Vaccines in 100 Days: What Will It Take? (CEPI Report, 2022)
- New vaccine-making process could transform pandemic response (CEPI, October 2024) - RNAbox™ technology
General Analysis
- Closing the Biosecurity Gap in Synthetic Biology (Global Biodefense, October 2025)
- Generative biology: How can safeguards play catch up? (World Economic Forum, October 2025)
- AI designs for dangerous DNA can slip past biosecurity measures, study shows (NPR, October 2025)
- If AI tools can design harmful proteins, can AI tools also stop them? (Chemical & Engineering News, October 2025)
- AI Can Now Design Proteins and DNA. Scientists Warn We Need Biosecurity Rules Before It’s Too Late. (Singularity Hub, January 2026)
Connections to AI Transition Model
This research connects to multiple components of the AI Transition Model:
| Model Component | Relationship |
|---|---|
| Misuse Potential | Primary pathway—bioweapons risk is canonical misuse scenario |
| AI Capabilities (Algorithms) | Biological knowledge and reasoning capabilities drive uplift potential |
| AI Uses (Industries) | Pharmaceutical and biotech AI integration creates dual-use infrastructure |
| Civilizational Competence (Governance) | Inadequate international coordination enables persistent vulnerabilities |
| Human Catastrophe Scenarios | Bioweapons represent rogue actor and potentially state actor pathways |
The report highlights that bioweapons risk differs from many AI safety concerns in that it applies to current systems rather than hypothetical future superintelligence. However, the rapid capability gains documented in 2025 (o3 ranking in the 94th percentile among expert virologists, Claude Opus 4 triggering ASL-3) suggest the risk may be intensifying faster than defensive measures can be deployed.