Bioweapons Risk
Overview
AI systems could accelerate biological weapons development by helping with pathogen design, synthesis planning, or acquisition of dangerous knowledge. The concern isn’t that AI creates entirely new risks, but that it lowers barriers—making capabilities previously requiring rare expertise more accessible to bad actors.
This is considered one of the most severe near-term AI risks because biological weapons can cause mass casualties, and AI assistance could allow far smaller groups to attempt what previously required state-scale programs. Unlike many other AI risks that depend on future, more capable systems, this risk applies to models available today.
The key debate centers on whether AI provides meaningful “uplift”—whether it genuinely helps beyond what’s already accessible through scientific literature and internet searches, or whether wet-lab skills remain the true bottleneck. Current evidence is mixed: the RAND Corporation’s 2024 study found no statistically significant AI uplift for attack planning, while Microsoft research showed AI-designed toxins evading more than 75% of DNA synthesis screening tools.
However, 2025 has marked a significant shift in official assessments. OpenAI now expects its next-generation models to reach “high-risk classification” for biological capabilities—meaning they could provide “meaningful counterfactual assistance to novice actors.” Anthropic activated ASL-3 (AI Safety Level 3) protections for Claude Opus 4 specifically due to biological and chemical weapon concerns. The National Academies’ March 2025 report “The Age of AI in the Life Sciences” found that while current biological design tools cannot yet design self-replicating pathogens, monitoring and mitigation are urgently needed.
Risk Assessment
| Dimension | Assessment | Notes |
|---|---|---|
| Severity | High to Catastrophic | Biological weapons can cause mass casualties; worst-case scenarios involve engineered pandemics |
| Likelihood | Uncertain | Current evidence is mixed on AI uplift; capabilities are rapidly improving |
| Timeline | Near-term | Unlike many AI risks, this concern applies to current systems |
| Trend | Increasing | Each model generation shows more biological knowledge; screening gaps persist |
| Window | Temporary | AI may eventually favor defense (surveillance, vaccines, countermeasures); risk elevated during transition period |
Responses That Address This Risk
| Response | Mechanism | Effectiveness |
|---|---|---|
| Responsible Scaling Policies (RSPs) | Internal biosecurity evaluations before deployment | Medium |
| Compute Governance | Limits access to training resources for dangerous models | Medium |
| US AI Chip Export Controls | Restricts AI chip exports to adversary nations | Low-Medium |
| AI Safety Institutes (AISIs) | Government evaluation of biosecurity risks | Medium |
| Voluntary AI Safety Commitments | Lab pledges on dangerous capability evaluation | Low |
The Total Risk Debate
How dangerous is AI-assisted bioweapons development? Expert assessments vary substantially, from those who consider it an imminent catastrophic threat to those who view it as overhyped. Understanding both sides of this debate—and the key uncertainties that drive disagreement—is essential for calibrating policy responses.
Estimating Overall Risk
Attempting to quantify the total risk from AI-assisted bioweapons requires estimating both the probability of an attack and its potential consequences. Estimates vary widely:
| Estimate Type | Range | Source/Basis | Key Assumptions |
|---|---|---|---|
| Annual probability of catastrophic AI-assisted bio attack | 0.01% - 0.5% | Expert elicitation, attack chain analysis | “Catastrophic” = 10,000+ casualties |
| Cumulative probability through 2040 | 0.1% - 8% | Timeline projections | Depends heavily on AI capability trajectory |
| Expected casualties if attack occurs | 10,000 - 10M+ | Historical/scenario analysis | Varies by pathogen, deployment method, response |
| Expected value of harm per year | $1B - $500B | Probability × consequence estimates | Extremely uncertain |
The Bioweapons Attack Chain Model estimates compound attack probability at 0.02% - 3.6% depending on assumptions, with substantial uncertainty at each step. The wide range reflects genuine disagreement about key parameters.
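To make the arithmetic behind such compound estimates concrete, the sketch below multiplies hypothetical per-stage probabilities into an overall annual attack probability and pairs it with a hypothetical casualty figure. Every number is an illustrative placeholder, not an output of the Bioweapons Attack Chain Model.

```python
# Illustrative compound-probability sketch. All numbers are hypothetical
# placeholders, not estimates from the Bioweapons Attack Chain Model.
from math import prod

# Annual probability an attack is attempted, then per-stage success
# probabilities conditional on reaching that stage.
stages = {
    "attempt_initiated": 0.05,
    "acquires_agent_or_sequence": 0.20,
    "succeeds_at_synthesis": 0.10,
    "weaponizes_and_deploys": 0.30,
    "evades_countermeasures": 0.50,
}

p_catastrophic_attack = prod(stages.values())   # compound annual probability
casualties_if_success = 50_000                  # hypothetical midpoint

print(f"Compound annual probability: {p_catastrophic_attack:.4%}")
print(f"Expected casualties per year: {p_catastrophic_attack * casualties_if_success:.1f}")
# Because the stages multiply, a modest change in any one estimate shifts the
# compound figure by a large factor, which is one reason published ranges span
# orders of magnitude (e.g., 0.02% - 3.6% above).
```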
Existential risk context: In The Precipice, Oxford philosopher Toby Ord estimates the chance of existential catastrophe from engineered pandemics at 1 in 30 by 2100—second only to AI among anthropogenic risks. While not all engineered pandemics would be AI-assisted, this frames the potential severity. Ord notes that it “now seems within the reach of near-term biological advances to create pandemics that would kill greater than 50% of the population—not just in a particular area, but globally.”
Industry concerns: In July 2023, Anthropic CEO Dario Amodei stated that within two to three years, there was a “substantial risk” that AI tools would “greatly widen the range of actors with the technical capability to conduct a large-scale biological attack.” The CNAS report notes this could “expose the United States to catastrophic threats far exceeding the impact of COVID-19.”
Arguments for High Risk
Those who consider AI-bioweapons a severe threat emphasize several points:
1. Democratization of Dangerous Knowledge
AI makes dangerous biological knowledge more accessible to those who couldn’t previously obtain it. While scientific literature contains detailed protocols, navigating it requires expertise. AI systems can synthesize, explain, and contextualize this information for non-experts, potentially expanding the pool of capable actors.
The equalizer effect: The most concerning scenario isn’t AI helping expert virologists (who already have the knowledge), but AI helping moderately skilled individuals bridge knowledge gaps that previously required years of training or team collaboration.
2. Asymmetric Evasion Capabilities
Microsoft’s 2024 research revealed that AI-designed toxins evaded over 75% of commercial DNA synthesis screening tools. This is qualitatively different from knowledge provision—it represents AI helping attackers circumvent existing defenses.
DNA synthesis screening is a cornerstone of current biosecurity. If AI can reliably design functional variants that evade detection, the entire screening paradigm may become obsolete faster than new defenses can be developed. This creates an asymmetric threat where even modest AI capabilities could undermine years of defensive investment.
3. Rapid Capability Improvement
AI capabilities are improving rapidly. Even if current models provide limited uplift, the trend is concerning:
| Capability | GPT-4 (2023) | Claude 3.5/GPT-4o (2024) | Claude Opus 4/o3 (2025) | Trend |
|---|---|---|---|---|
| Biology knowledge | High | Very High | Expert-level | Rapidly increasing |
| Synthesis planning | Moderate | Moderate-High | High | Increasing |
| Evading guardrails | Moderate | Low-Moderate | Low (frontier models) | Variable by model |
| Integration with tools | Limited | Growing | Substantial | Accelerating |
2025 milestone: OpenAI’s April 2025 o3 model ranked in the 94th percentile among expert human virologists on the Virology Capabilities Test. This is the first time an AI model has demonstrated expert-level performance on biological troubleshooting scenarios.
The argument is that we should prepare for future capabilities, not just current ones. By the time AI demonstrably provides high uplift, it may be too late to establish governance.
4. Combination with Other Technologies
AI alone may provide limited uplift, but the combination of multiple technologies could be transformative:
- LLMs + protein design tools: AlphaFold and similar tools enable novel protein engineering; LLMs help identify targets and plan applications
- AI + lab automation: Automated systems could eventually execute protocols with minimal human intervention
- AI + decreasing synthesis costs: DNA synthesis costs continue falling; AI could help design sequences optimized for cheap synthesis
Each technology alone may be manageable, but their combination could create emergent risks that exceed any individual contribution.
5. Tail Risk Considerations
Even if the median expectation is manageable, the worst-case scenarios are severe enough to warrant serious attention:
- Engineered pandemic: A pathogen designed for transmissibility, lethality, and immune evasion could potentially cause millions of deaths
- Multiple simultaneous attacks: AI could enable coordination of attacks across multiple locations
- Degradation of trust in biology: Widespread bioterrorism could undermine beneficial biological research and public health
From a risk management perspective, low-probability/high-consequence events may deserve more weight than their expected value alone suggests.
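A minimal worked example, using entirely invented numbers, shows why this is more than a rhetorical point: two risks with identical expected casualties can look very different once worst cases are weighted more heavily (here via a simple super-linear exponent).

```python
# Hypothetical comparison: two risks with identical expected casualties
# but very different tails. Numbers are invented for illustration.
scenarios = {
    "frequent_small_attacks": (0.10, 1_000),         # (annual probability, casualties)
    "rare_engineered_pandemic": (0.0001, 1_000_000),
}

RISK_AVERSION = 1.5  # exponent > 1 weights large-casualty outcomes super-linearly

for name, (p, casualties) in scenarios.items():
    expected = p * casualties
    risk_weighted = p * casualties ** RISK_AVERSION
    print(f"{name}: expected={expected:.0f}/yr, risk-weighted score={risk_weighted:,.0f}")

# Both scenarios have an expected value of 100 casualties/year, but the
# pandemic scenario dominates once outcomes are weighted super-linearly.
```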
6. Historical Underestimation
History suggests we systematically underestimate technology-enabled threats:
- Nuclear weapons were developed faster than many expected
- COVID-19 demonstrated how disruptive novel pathogens can be
- AI capabilities have repeatedly exceeded forecasts
Skepticism about AI-bioweapons risk may itself be the risky position.
7. The “De-skilling” Trajectory
Multiple emerging technologies are simultaneously reducing the skill requirements for biological research:
- Cloud laboratories automate complex procedures and allow remote execution
- Benchtop DNA synthesizers are approaching gene-length capabilities
- AI assistants bridge knowledge gaps and provide troubleshooting guidance
- Protocol automation reduces the need for tacit laboratory knowledge
Each of these alone might be manageable, but together they suggest a trajectory toward dramatically lowered barriers. The RAND study may capture a snapshot where these technologies haven’t yet converged—but convergence appears likely within the decade.
8. Offense Has Asymmetric Advantages
Biological attacks have inherent asymmetric characteristics that favor attackers:
- Attribution lag: Days to weeks may pass before an attack is recognized as intentional
- Preparation asymmetry: Attackers can prepare countermeasures for themselves; defenders must protect everyone
- Innovation asymmetry: Attackers need to succeed once; defenders must anticipate all possible attack vectors
- Psychological impact: Even unsuccessful or small-scale attacks could cause massive economic and social disruption
AI amplifies these asymmetries by potentially enabling novel attack vectors that existing defenses haven’t anticipated.
9. Open-Source Model Proliferation
Even if frontier labs implement strong biosecurity measures, the proliferation of open-source models undermines containment:
- No centralized control: Once weights are released, restrictions cannot be enforced
- Fine-tuning vulnerability: Safety training can be removed with relatively modest compute
- Capability improvements: Open models are approaching frontier capabilities with 6-12 month lags
- Global availability: Actors in any jurisdiction can access open models
The CNAS report↗ recommends considering a “licensing regime for biological design tools with potentially catastrophic capabilities”—but this is not currently implemented.
The DeepSeek warning: In February 2025, Anthropic CEO Dario Amodei reported that testing of China’s DeepSeek model revealed it was “the worst of basically any model we’d ever tested” for biosecurity—generating information critical to producing bioweapons “that can’t be found on Google or can’t be easily found in textbooks” with “absolutely no blocks whatsoever.” While Amodei did not consider DeepSeek “literally dangerous” yet, the incident highlighted how open-source models from different jurisdictions may not implement equivalent safety measures.
Arguments for Lower Risk
Those who consider AI-bioweapons risk overstated emphasize different considerations:
1. The RAND Study: No Significant Uplift
The RAND Corporation’s 2024 study is the most rigorous empirical assessment of AI uplift to date. Twelve teams of three people each spent 80 hours developing bioweapon attack plans—some teams had access to an LLM in addition to the internet, while control teams used the internet alone. Expert evaluators found no statistically significant difference in plan viability.
This finding directly challenges claims that AI meaningfully assists biological attacks. If AI-assisted and non-AI teams perform equally, the AI “threat” may be largely illusory.
| Group | Information Quality | Plan Viability | Novelty | Statistical Significance |
|---|---|---|---|---|
| AI-assisted | High | Moderate | Low | n/a |
| Internet-only | High | Moderate | Low | n/a |
| Difference | Minimal | Minimal | None | Not significant |
Implications: Dangerous biological information is already widely accessible through legitimate scientific literature. AI may be redundant with existing sources rather than providing novel dangerous capabilities.
2. Wet Lab Bottleneck
Knowledge is not capability. Even with complete theoretical understanding, executing biological synthesis requires:
- Tacit knowledge that transfers poorly through text (how to handle contamination, optimize growth conditions, troubleshoot failures)
- Specialized equipment that is expensive, regulated, and hard to obtain
- Months of practice to develop reliable technique
- Physical safety procedures that untrained individuals typically violate
The Soviet Biopreparat program employed thousands of scientists for decades to develop reliable bioweapons. Aum Shinrikyo, despite substantial resources and scientific personnel, failed in their bioweapons attempts. The knowledge bottleneck may be much less important than the capability bottleneck.
AI cannot transfer tacit knowledge. Reading about sterile technique is different from maintaining it. AI can explain protocols but cannot teach hands-on skills.
3. Guardrails and Filtering Work
Frontier AI models include safety measures that reduce dangerous information provision:
- Refusals for explicitly harmful requests
- Content filtering
- Constitutional AI and RLHF training
- Continuous red-teaming and patching
While not perfect, these measures raise barriers. Jailbreaking techniques exist but require effort, sophistication, and often produce degraded responses. The marginal attacker may be more likely to use open internet resources than to navigate AI guardrails.
4. Existing Information Abundance
Scientific literature already contains dangerous information. Textbooks explain pathogen biology. The internet hosts synthesis protocols. Dark web forums discuss dangerous techniques.
The marginal information contribution of AI may be minimal when the baseline is “everything is already out there.” AI’s value proposition is synthesis and accessibility, but motivated individuals were already able to find this information through traditional means.
5. Defense Advantages
AI capabilities benefit defense as much as offense, and defensive applications are more scalable:
| Application | Offense Contribution | Defense Contribution | Net Balance |
|---|---|---|---|
| Pathogen detection | Marginal | Substantial | Defense |
| Vaccine development | Marginal | Transformative | Strong defense |
| Synthesis planning | Moderate | Minimal | Offense |
| Countermeasure design | Marginal | Substantial | Defense |
| Surveillance | None | Substantial | Strong defense |
| Treatment optimization | None | Substantial | Strong defense |
Metagenomic surveillance, mRNA vaccine platforms, and AI-assisted drug discovery are advancing rapidly. These defensive technologies may ultimately make biological attacks less effective rather than more dangerous.
The transition period concern: Even those who believe defense wins long-term often worry about a near-term window where offense temporarily gains advantages before defenses mature.
6. Deterrence and Attribution
Biological attacks, especially sophisticated ones, leave traces that can enable attribution:
- Genomic sequencing of pathogens
- Epidemiological tracking
- Intelligence on precursor purchases
- Surveillance of likely actors
State actors face retaliation risks. Non-state actors face intense investigative focus. The certainty of attribution for significant attacks provides deterrent effect that pure capability analysis misses.
7. Historical Non-Occurrence
Despite decades of concern, catastrophic bioterrorism has not occurred:
- The 2001 anthrax attacks killed 5 people—tragic, but not catastrophic
- No terrorist group has successfully deployed a mass-casualty biological weapon
- State bioweapons programs have not been used since WWII
This could reflect genuine difficulty rather than mere luck. The absence of catastrophic bioterrorism despite motivation and attempts suggests the barriers are higher than often assumed.
8. Most Actors Lack the Right Motivation
Catastrophic biological attacks require a specific combination of capability and motivation that is rare:
Who would want to cause a pandemic?
- State actors: Have capabilities but face deterrence (attribution, retaliation risk, blowback to own population)
- Terrorist groups: Most seek specific political goals; indiscriminate mass casualties rarely serve those objectives
- Lone actors: May have motivation but face significant capability barriers
- Apocalyptic cults: Rare and typically incompetent (Aum Shinrikyo failed despite resources)
The overlap between “capable” and “wants maximum casualties” may be smaller than feared. Most capable actors (states, organized groups) have reasons not to deploy biological weapons; most actors who lack such reasons (doomsday cults, nihilistic lone actors) lack capability.
AI changes this calculus only if it enables actors who previously lacked capability while retaining dangerous motivation—the “uplift for the unhinged” scenario. The RAND study suggests this isn’t happening yet.
9. Biology Favors Defense Long-Term
Fundamental biological facts may favor defense over the long run:
- Pathogens are detectable: All biological agents produce detectable signals (RNA, proteins, metabolic products)
- Immune systems adapt: Evolution has produced robust immune defenses; vaccines enhance these
- Countermeasures are general: mRNA platforms, broad-spectrum antivirals, and environmental controls work against many agents
- Medical capacity scales: Unlike nuclear attacks, biological attacks unfold over time, allowing response
The “defense wins” scenario: Robust metagenomic surveillance detects outbreaks early; mRNA vaccines are developed in weeks; far-UVC limits airborne transmission; medical countermeasures limit casualties. In this world, even a successful synthesis and deployment might cause localized harm but not catastrophe.
Skeptics of this view note: Defense advantages assume functional institutions and may take years to fully deploy. The transition period—before defenses mature—may be the danger zone.
10. Overhyped Capabilities May Backfire
Exaggerating AI-bioweapons risk has potential costs:
- Resource misallocation: Focusing on AI-specific interventions may divert resources from more effective biosecurity investments
- Dual-use research chill: Overreaction could harm legitimate biological research
- AI development restrictions: Excessive caution about biological capabilities could impede beneficial AI applications in medicine
- Crying wolf: If claims of imminent AI-enabled bioweapons prove false, future warnings may be dismissed
Some critics argue the biosecurity community has incentives to emphasize threats to justify funding, and that healthy skepticism is appropriate.
The Key Cruxes
Much of the disagreement about AI-bioweapons risk reduces to a small number of factual questions where reasonable people disagree:
Crux 1: Does AI Provide Meaningful Uplift?
If uplift is low (less than 1.5x): Focus resources on traditional biosecurity rather than AI-specific interventions. The threat is real but not qualitatively changed by AI.
If uplift is high (greater than 2x): Urgent need for AI-specific guardrails, compute governance, and model restrictions. The threat landscape has fundamentally shifted.
| Evidence | Favors Low Uplift | Favors High Uplift |
|---|---|---|
| RAND study | Strong | — |
| Screening evasion research | — | Strong |
| Model capability trends | — | Moderate |
| Expert elicitation | Mixed | Mixed |
| Current assessment | Favored (65%) | 35% |
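For intuition about what these multipliers mean, the toy calculation below treats uplift as a multiplier on a single bottleneck stage of the attack chain. The baseline probabilities are invented and serve only to show how the multiplier propagates (or fails to, if the uplifted stage is not the real bottleneck).

```python
# Toy sensitivity check: how an uplift multiplier on one stage changes the
# compound attack probability. All baseline numbers are invented.
baseline_bottleneck = 0.02    # hypothetical: novice clears the knowledge/
                              # planning bottleneck without AI assistance
other_stages = 0.005          # hypothetical: product of all remaining stages

for uplift in (1.0, 1.5, 2.0, 5.0):
    p = min(baseline_bottleneck * uplift, 1.0) * other_stages
    print(f"uplift {uplift}x -> compound probability {p:.4%}")

# A 2x uplift doubles compound risk only if the uplifted stage is the true
# bottleneck; if wet-lab execution dominates, the same multiplier matters less.
```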
Crux 2: Is the Knowledge Bottleneck or Capability Bottleneck More Important?
If knowledge is the bottleneck: AI providing information is directly dangerous.
If capability is the bottleneck: AI providing information is mostly redundant with existing sources; wet lab skills remain rate-limiting.
| Evidence | Favors Knowledge Bottleneck | Favors Capability Bottleneck |
|---|---|---|
| Historical bioterrorism failures | — | Strong |
| State program difficulty | — | Strong |
| Information abundance online | — | Moderate |
| AI capability trends | Moderate | — |
| Current assessment | 35% | Favored (65%) |
Crux 3: Will Defense or Offense Win Long-Term?
If defense wins: AI-bioweapons is a transitional problem that self-corrects as defensive applications mature.
If offense wins: AI permanently shifts the advantage to attackers, requiring sustained containment efforts.
If it’s a window: The near-term favors offense, but defense catches up—the question is whether catastrophic attacks occur during the transition.
| Scenario | Probability | Implications |
|---|---|---|
| Permanent offense advantage | 15% | Maximum concern; sustained containment needed |
| Permanent defense advantage | 40% | Eventually self-correcting; manage transition |
| Temporary window (5-10 years) | 35% | Near-term urgency, medium-term resolution |
| Unclear/context-dependent | 10% | Need robust strategies for multiple scenarios |
Crux 4: How Quickly Are Capabilities Advancing?
If capabilities are saturating: Current systems represent near-peak dangerous capabilities; governance can catch up.
If capabilities continue scaling: Future systems will be substantially more dangerous; governance is racing against time.
The AI-Bioweapons Timeline Model projects capability thresholds, with synthesis assistance potentially arriving 2027-2032 and novel agent design 2030-2040.
Crux 5: How Effective Are Guardrails and Countermeasures?
If guardrails work well: The marginal risk from AI models is small; responsible development practices suffice.
If guardrails fail: Open-source proliferation and jailbreaking make model-level interventions largely ineffective.
| Factor | Favors Guardrails | Favors Guardrail Failure |
|---|---|---|
| Frontier model safety measures | Moderate | — |
| Open-source model proliferation | — | Strong |
| Jailbreaking research | — | Moderate |
| Fine-tuning vulnerability | — | Moderate |
| Current assessment | Partially effective (40%) | Limited effectiveness (60%) |
The open-source challenge: Even if frontier labs implement strong safeguards, open-source models may not. As capable open models proliferate, guardrails become optional, fine-tuning can remove remaining restrictions, and dangerous capabilities become permanently accessible.
Crux 6: Can DNA Synthesis Screening Keep Pace?
DNA synthesis screening is the primary defense against engineered pathogens, but Microsoft’s research revealed significant gaps.
If screening adapts: AI-designed evasion is a temporary problem; screening improvements restore the chokepoint.
If screening falls behind: The primary technical barrier erodes; other defenses must compensate.
Key questions:
- Can screening adapt to AI-designed evasive sequences?
- What happens as benchtop synthesis equipment becomes cheaper and more accessible?
- Can screening extend to cover novel synthesis methods and cloud laboratories?
The Framework for Nucleic Acid Synthesis Screening↗ (April 2024) represents a policy response, but only applies to federally funded programs.
Current Evidence
Studies have shown language models can provide information relevant to bioweapon development, though the significance is contested.
RAND Red-Team Study (2024)
The RAND Corporation study↗ (“The Operational Risks of AI in Large-Scale Biological Attacks”) is the most rigorous empirical assessment of AI uplift to date. Researchers Christopher Mouton, Caleb Lucas, and Ella Guest recruited 15 groups of three people to act as red team “bad guys.”
Twelve teams were given 80 hours each over seven weeks to develop bioweapon attack plans based on one of four scenarios—including a “fringe doomsday cult intent on global catastrophe” and a “private military company seeking to aid an adversary’s conventional military operation.” For each scenario, one team had access to an LLM chatbot, another had a different chatbot, and control teams used only internet resources.
Expert judges (biologists and security specialists) evaluated the resulting plans for biological and operational feasibility. The result: no statistically significant difference in plan viability between AI-assisted and non-AI groups.
Key methodology details:
- Participants had some technical background (science graduates)
- Testing focused on planning, not actual synthesis
- Used 2023-era models; capabilities have advanced since
- Sample size was relatively small (n=12 teams completing the study)
- LLMs did not generate explicit weaponization instructions, but provided “guidance and context in critical areas such as agent selection, delivery methods, and operational planning”
Limitations acknowledged by researchers: The study tested planning capability, not execution. It used participants with technical backgrounds, so may underestimate uplift for complete novices. And AI capabilities continue advancing.
Implications: The wet-lab bottleneck may be more significant than the knowledge bottleneck. Knowing how to make something is different from being able to make it.
AI-Designed Toxins Evade Screening (2024)
Microsoft researchers conducted a red-team exercise testing biosecurity in the protein engineering pipeline. They found that DNA screening software—used by synthesis companies to flag dangerous sequences—missed over 75% of AI-designed potential toxins. One tool flagged only 23% of sequences. After the findings were disclosed, screening systems improved to catch 72% of such sequences on average.
Key details:
- Tested multiple commercial screening tools
- AI designed functional variants that differed sufficiently from known threats to evade pattern matching
- Improvement after publication shows screening can adapt—but also shows it wasn’t keeping pace
Implications: Even if current LLMs provide limited knowledge uplift, AI protein design tools may create harder-to-detect threats. The screening ecosystem has significant gaps that AI can exploit.
Gryphon Scientific Evaluation (2023)
Anthropic hired Gryphon Scientific↗ to spend more than 150 hours red-teaming Claude’s ability to provide harmful biological information. They created a rubric of several dozen questions probing critical knowledge gaps along the entire technical pathway to biological weapon development.
The findings were concerning. Rocco Casagrande, Gryphon’s managing director, stated he was “personally surprised and dismayed by how capable current LLMs were at providing critical information related to biological weapons.” He told Semafor: “These things are developing extremely, extremely fast, they’re a lot more capable than I thought they would be when it comes to science.”
Key findings:
- One team member with a postdoctoral fellowship studying a pandemic-capable virus found LLMs could provide “post-doc level knowledge to troubleshoot commonly encountered problems” when working with that virus
- For low-skill users, LLMs could suggest which viruses to acquire
- Although LLMs often hallucinate, they answered almost all questions accurately at least sometimes, and answered some critical questions nearly always accurately
- Gryphon workshops with 20+ biosecurity experts identified concerning misuse scenarios including “how to collapse an ecosystem” and “reconstruct information redacted from sensitive scientific documents”
Despite the concerning findings, Casagrande believes “concerted action could ensure safety is built into the most advanced models.”
Anthropic, OpenAI Evaluations
AI labs have conducted extensive internal evaluations testing whether their models could provide “uplift” to potential bioweapon developers. These evaluations are becoming more sophisticated and more alarming.
Anthropic’s approach: Anthropic’s Responsible Scaling Policy↗ (RSP) defines AI Safety Levels (ASL) modeled after biosafety level (BSL) standards. They conduct at least 10 different biorisk evaluations for each major model release. In early 2025, Anthropic sent a letter to the White House “urging immediate action on AI security after its testing revealed alarming improvements in Claude 3.7 Sonnet’s ability to assist with aspects of bioweapons development.”
OpenAI’s framework: OpenAI’s Preparedness Framework↗ categorizes biological and chemical capabilities as “Tracked Categories” requiring ongoing evaluation. They define two thresholds:
- High capability: Could “provide meaningful counterfactual assistance to ‘novice’ actors (anyone with a basic relevant technical background) that enables them to create known biological or chemical threats”
- Critical capability: Could “introduce unprecedented new pathways to severe harm”
OpenAI states their most advanced models “aren’t yet capable enough to pose severe risks” in biosecurity—but expects upcoming models may reach “high” capability level.
US/UK AI Safety Institute joint evaluation (2024): The first joint government-led model evaluation tested Claude 3.5 Sonnet across biological capabilities, cyber capabilities, software development, and safeguard efficacy. Elizabeth Kelly, AISI director, called it “the most comprehensive government-led safety evaluation of an advanced AI model to date.”
Kevin Esvelt’s Classroom Experiment
MIT researcher Kevin Esvelt conducted an informal but striking demonstration. He asked students to use ChatGPT or other LLMs to explore how one might create a dangerous virus. After only one hour, the class identified:
- Four potential pandemic pathogens
- How to generate them from synthetic DNA
- Names of DNA synthesis companies unlikely to screen orders
- Detailed protocols and troubleshooting guidance
As Esvelt put it regarding AI’s ability to circumvent DNA screening defenses: “We’ve built a Maginot Line of defense, and AI just walked around it.”
This demonstration, while not a rigorous study, illustrates how quickly accessible LLMs can be for malicious purposes—even for those without prior expertise.
CNAS Report: AI and Biological National Security Risks (2024)
The Center for a New American Security report↗ by Bill Drexel and Caleb Withers provides a comprehensive analysis of the evolving AI-biosecurity landscape.
Key concerns identified:
- AI could enable bioterrorism, create unprecedented superviruses, and develop novel targeted bioweapons
- AI’s potential to “optimize bioweapons for targeted effects, such as pathogens tailored to specific genetic groups or geographies, could significantly shift states’ incentives to use biological weapons”
- If realized, such threats could “expose the United States to catastrophic threats far exceeding the impact of COVID-19”
Key recommendations:
- Strengthen screening mechanisms for cloud labs and genetic synthesis providers
- Conduct rigorous assessments of foundation models’ biological capabilities throughout the bioweapons lifecycle
- Invest in technical safety mechanisms to curb threats posed by foundation models
- Consider a licensing regime for biological design tools with potentially catastrophic capabilities
The report emphasizes that while AI-enabled biological catastrophes are “far from inevitable,” current biological safeguards already need significant updates.
2025 Developments: A Pivotal Year
2025 has seen a significant shift in how AI labs and governments assess biological risks. Several developments stand out:
OpenAI’s High-Risk Classification (June 2025)
OpenAI Head of Safety Systems Johannes Heidecke announced that the company expects upcoming models—particularly successors to the o3 reasoning model—to trigger “high-risk classification” under its Preparedness Framework. This means they could provide “meaningful counterfactual assistance to novice actors” in creating known biological threats.
Key points from OpenAI’s approach:
- Classified ChatGPT Agent as having “High capability in the biological domain”
- Discovered that creating bioweapons would require weeks or months of sustained AI interaction, not single conversations
- Implemented a traffic-light system: red-level content (direct bioweapon assistance) is immediately blocked; yellow-level content (dual-use information) requires careful handling
Anthropic’s ASL-3 Activation (May 2025)
Anthropic became the first lab to activate ASL-3, the strictest safety tier it has applied to date, specifically for biological concerns when releasing Claude Opus 4. Their internal evaluations found they “could no longer confidently rule out the ability of our most advanced model to uplift people with basic STEM backgrounds” attempting to develop CBRN weapons.
Anthropic’s testing revealed:
- Participants with access to Claude Opus 4 developed bioweapon acquisition plans with “substantially fewer critical failures” than internet-only controls
- Claude went from underperforming world-class virologists to “comfortably exceeding that baseline” on virology troubleshooting within a year
- The company sent a letter to the White House urging immediate action after observing “alarming improvements” in Claude 3.7 Sonnet’s biological capabilities
National Academies Report (March 2025)
The National Academies of Sciences, Engineering, and Medicine published “The Age of AI in the Life Sciences: Benefits and Biosecurity Considerations,” directed by Executive Order 14110. Key findings:
- AI-enabled biological tools can improve biosecurity through enhanced surveillance and faster countermeasure development
- Current biological design tools can design simpler structures (molecules) but cannot yet design self-replicating pathogens
- A “distinct lack of empirical data” exists for evaluating biosecurity risks of AI-enabled biological tools
- Recommended continued investment alongside monitoring for potential risks
CSIS Policy Analysis (August 2025)
The Center for Strategic and International Studies published “Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism,” warning that current U.S. biosecurity measures are “ill-equipped to meet these challenges.” The report noted that critical safeguards in biological design tools are “already circumventable post-deployment.”
Supplementary Evidence
| Source | Finding | Implications |
|---|---|---|
| National Academies (2025) | BDTs cannot yet design self-replicating pathogens | Current tools limited; monitoring needed |
| CSIS Report (2025) | Current biosecurity measures inadequate | Policy urgently needs updating |
| OpenAI Preparedness (2025) | Next-gen models will hit “high-risk” | Frontier labs anticipate near-term uplift |
| Anthropic ASL-3 (2025) | Cannot rule out CBRN uplift for novices | First activation of highest safety tier |
| DeepSeek testing (2025) | Open-source models lack equivalent safeguards | Proliferation concern validated |
| CNAS Report (2024) | AI-bio integration is emerging risk | Supports compound capability concern |
How AI Could Help Attackers
AI could assist at multiple stages of bioweapon development:
Attack Chain Analysis
A successful biological attack requires success across multiple stages, each with independent failure modes:
| Stage | AI Contribution | Traditional Difficulty | AI Changes What |
|---|---|---|---|
| Motivation | None | Present | — |
| Information access | High | Moderate | Reduces search time |
| Knowledge uplift | Low-Moderate | High | Bridges expertise gaps |
| Lab access | None | High | — |
| Synthesis | None (currently) | Very High | Future: could guide procedures |
| Deployment | Low | High | Could optimize dispersal |
| Evading countermeasures | Moderate | Variable | Could design novel variants |
See Bioweapons Attack Chain Model for detailed probability estimates at each stage.
Specific Assistance Pathways
Target identification — AI might help identify dangerous modifications to known pathogens or find novel biological agents. Large language models trained on scientific literature have extensive knowledge of pathogen biology.
Synthesis planning — AI could help determine how to create dangerous biological materials. Protein design tools can generate novel sequences, and LLMs can explain synthesis routes.
Knowledge bridging — Most concerningly, AI might help bridge knowledge gaps. Historically, bioweapons development required rare combinations of expertise. AI could help a motivated individual or small group compensate for missing knowledge, potentially replacing what previously required teams of specialists.
Evasion optimization — AI could help design pathogens or synthesis routes that evade detection by screening tools, surveillance systems, or medical countermeasures.
History & Current Infrastructure
Biological threats exist on a spectrum. State programs have historically been the main concern, but the barrier to entry may be dropping. The COVID-19 pandemic demonstrated how much damage pathogens can cause and highlighted gaps in biosecurity infrastructure.
Historical Programs
State Bioweapons Programs
Multiple nations have maintained offensive biological weapons programs, several in violation of the Biological Weapons Convention (BWC):
| Program | Era | Scale | Outcome |
|---|---|---|---|
| US | 1943-1969 | Large | Unilaterally terminated by Nixon |
| Soviet Union | 1928-1992 | Massive (30,000-40,000 staff) | Collapsed with USSR; concern about residual capabilities and scientist emigration |
| Japan (Unit 731) | 1937-1945 | Large | Defeated in WWII; perpetrators granted immunity by US in exchange for data |
| Iraq | 1980s-1990s | Moderate | Dismantled after Gulf War; revealed extensive program |
| South Africa | 1981-1993 | Moderate | Dismantled post-apartheid; included ethnic targeting research |
These programs required vast resources, thousands of scientists, and state-level infrastructure. The concern is that AI could reduce these requirements.
Current compliance concerns: The 2024 State Department report raised BWC compliance concerns about China, Russia, Iran, and North Korea. Verification remains impossible because the BWC has no formal verification regime.
The Soviet Biopreparat Program: A Case Study
The Soviet Union operated the world’s largest, longest, and most sophisticated biological weapons program—in direct violation of the BWC it had signed in 1972. Understanding this program illuminates both the scale of resources historically required and the ongoing legacy concerns.
Scale and organization:
- Biopreparat↗ was created in April 1974 as a civilian cover organization
- Employed 30,000-40,000 personnel across 40-50 research facilities
- Included five major military-focused research institutes, numerous design facilities, three pilot plants, and five dual-use production plants
- Annual production capacity for weaponized smallpox alone: 90-100 tons
Agents developed:
- Weaponized smallpox (continued even after WHO declared eradication)
- Anthrax (“Strain 836” created as enhanced “battle strain”)
- Plague, Q fever, tularemia, glanders, Marburg hemorrhagic fever
- All agents designed for aerosol dispersal via ballistic or cruise missiles
The Sverdlovsk incident (1979): Accidental release of anthrax spores from a military facility killed at least 66 people (true number unknown—KGB destroyed records). The Soviet government blamed contaminated meat until Boris Yeltsin admitted the truth in 1992.
Key defectors who revealed the program:
- Vladimir Pasechnik (1989): First high-level defector to the UK; his testimony enabled Thatcher and Bush to pressure Gorbachev
- Ken Alibek (Kanatjan Alibekov, 1992): First deputy director of Biopreparat; created Russia’s first tularemia bomb and enhanced anthrax strains; provided US government with detailed accounting after emigration
Legacy concerns:
- Some facilities and scientists absorbed into public health institutions
- US programs attempted to redirect former weapons scientists to peaceful research
- In late 1997, US expanded efforts after detecting “intensified attempts by Iran and other countries of proliferation concern to acquire biological weapons expertise and materials from former Soviet institutes”
Lesson for AI risk: Even with massive state resources, Biopreparat required decades and thousands of scientists to develop reliable weapons. This suggests the wet-lab barrier is formidable—but also that determined state actors with existing infrastructure could integrate AI assistance more easily than non-state actors starting from scratch.
Non-State Actor Attempts
The historical record of non-state biological attacks reveals consistent technical failures despite significant motivation and resources:
1984 Oregon Salmonella Attack (Rajneeshees)
- Religious commune deliberately contaminated salad bars with Salmonella typhimurium
- 751 cases of food poisoning, 45 hospitalizations, no deaths
- Remains the largest bioterrorist attack in U.S. history
- Used readily available pathogen requiring no sophisticated technology
- Key insight: Demonstrated that biological attacks don’t require advanced technology, but also that impact was limited without sophisticated delivery
Aum Shinrikyo (1990s)
- Japanese cult with $1 billion in assets, hundreds of members, PhD scientists
- Attempted anthrax, botulinum toxin, and other biological agents—all failed
- Anthrax sprayer deployed in Tokyo produced no casualties (used vaccine strain by mistake)
- Eventually succeeded with sarin chemical attack (13 dead, thousands injured)
- Key insight: Even well-funded, technically sophisticated groups with scientific personnel have failed at biological weapons. The wet-lab barrier is real.
2001 Anthrax Letters (Amerithrax)
- Letters containing anthrax spores killed 5 people, infected 17 others
- The perpetrator identified by the FBI (Bruce Ivins) was a senior scientist at USAMRIID with decades of anthrax experience and legitimate access to spores
- Required no acquisition of knowledge—perpetrator was a world expert
- Key insight: Insider threat, not information access, enabled this attack. AI wouldn’t have helped—the perpetrator already knew everything.
Why has catastrophic bioterrorism not occurred?
| Factor | Explanation |
|---|---|
| Technical difficulty | Synthesis, production, and weaponization require tacit knowledge |
| Pathogen handling | Dangerous to the attacker; requires safety infrastructure |
| Delivery challenges | Aerosol dispersion is technically demanding |
| Attribution risk | Genomic analysis increasingly enables source identification |
| Goal mismatch | Most terrorist groups want publicity, not mass extinction |
| Limited access | Dangerous pathogens are controlled; acquisition is difficult |
This historical record could indicate either genuine difficulty (the barriers are high) or luck (we’ve been fortunate). The precautionary argument is that AI could systematically lower multiple barriers simultaneously, changing the calculus even if each individual barrier remains partially intact.
Current Biosecurity Infrastructure
DNA synthesis companies already screen orders for dangerous sequences, but screening isn’t comprehensive:
| Defense Layer | Coverage | Effectiveness | AI Vulnerability |
|---|---|---|---|
| DNA synthesis screening | Major companies | 40-70% (pre-2024); improving | High (evasion design) |
| BSL facility access control | High containment | High | Low |
| Pathogen inventory tracking | Research labs | Moderate | Low |
| Export controls (equipment) | Dual-use items | Moderate | Low |
| Disease surveillance | Advanced countries | Moderate-High | Moderate |
| Medical countermeasures | Known pathogens | Moderate | Moderate (novel agents) |
DNA Synthesis Screening: The Critical Chokepoint
DNA synthesis screening is considered the key “chokepoint” in the AI-assisted bioweapons pipeline—if dangerous sequences can be intercepted before synthesis, attacks become much harder. However, significant gaps remain:
Current limitations:
- Participation in the International Gene Synthesis Consortium (IGSC) is voluntary—not all companies are members
- Regulations are inconsistent between countries
- Screening relies on matching against databases of known dangerous sequences—novel variants can evade detection
- High false positive rates require expensive human review
- Benchtop DNA synthesizers are emerging that could bypass commercial screening entirely
Post-Microsoft patch status: After Microsoft’s research revealed 75%+ evasion rates, a software patch was deployed to synthesis companies worldwide. The fix now catches approximately 97% of threats—but experts warn “the fix is incomplete” and gaps remain.
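A rough sketch of why database-matching screening is vulnerable: the toy screener below flags orders that share exact subsequences (k-mers) with a list of known hazards. Real screening tools use far more sophisticated homology search, and the sequences here are meaningless placeholders, but the failure mode is analogous: a redesigned variant that preserves function while avoiding long exact matches slips past the filter.

```python
# Toy database-matching screener (NOT a real screening tool). The "hazard"
# entry and the redesigned variant are meaningless placeholder sequences.

def kmers(seq: str, k: int = 12) -> set:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

HAZARD_DB = {"ATGGCTAGCTAGGCTTACGATCGATCGGATCCTAGGCTA"}  # placeholder entry

def flagged(order: str, k: int = 12) -> bool:
    """Flag an order if it shares any exact k-mer with a database entry."""
    order_kmers = kmers(order, k)
    return any(order_kmers & kmers(entry, k) for entry in HAZARD_DB)

exact_copy = "ATGGCTAGCTAGGCTTACGATCGATCGGATCCTAGGCTA"
redesigned = "ATGGCAAGCTAGGCATACGTTCGATAGGATCATAGGCTA"  # scattered substitutions

print(flagged(exact_copy))   # True: matches the database directly
print(flagged(redesigned))   # False: no shared 12-mer survives the edits
```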
Policy response: In April 2024, the White House OSTP released a Framework for Nucleic Acid Synthesis Screening↗, requiring federally funded programs to screen customers and orders, keep records, and report suspicious orders. NIST is partnering with stakeholders to improve screening standards and mitigate AI-specific risks.
Emerging Defensive Infrastructure
SecureDNA: A Swiss foundation↗ providing free, privacy-preserving DNA synthesis screening that already exceeds 2026 regulatory requirements. SecureDNA screens sequences down to 50 base pairs in length, using a “random adversarial threshold” algorithm designed to be more robust against AI-designed evasion.
Nucleic Acid Observatory (NAO): A collaboration between SecureBio and MIT↗ pioneering pathogen-agnostic early warning through deep metagenomic sequencing. Unlike traditional surveillance that looks for known pathogens, NAO aims to detect new and unknown pathogens through wastewater and pooled nasal swab sampling.
SecureBio’s “Delay, Detect, Defend” strategy: Kevin Esvelt’s SecureBio organization↗ works on multiple defensive layers:
- Delay: Synthesis screening and access controls
- Detect: Early warning systems like the NAO
- Defend: Societal resilience through germicidal UV light, pandemic-proof PPE stockpiles, and rapid countermeasure development
Emerging Technologies of Concern
Several emerging technologies could compound AI-enabled biosecurity risks by removing barriers that currently limit attack feasibility:
Benchtop DNA Synthesizers
A new generation of desktop DNA synthesis devices may enable users to print DNA in their own laboratories, potentially bypassing commercial screening entirely.
Current products:
- Kilobaser↗: Personal DNA/RNA synthesizer, 27x33x33 cm, produces oligos in 30-50 minutes with 2.5 min/base turnaround
- DNA Script SYNTAX System↗: Enzymatic DNA synthesis (water-based, avoiding harsh chemicals), 96 parallel oligos up to 120 nucleotides
- Evonetix Evaleo↗: Gene-length DNA synthesis on silicon chips, claiming 10x faster than current technologies
- BioXp (Telesis Bio): Commercial benchtop synthetic biology workstation automating pipetting, mixing, thermal cycling, purification, and storage
Current limitations:
- Most benchtop devices limited to sequences under 120 base pairs—insufficient for most dangerous applications
- Not yet viable alternatives to centralized DNA providers for gene-length sequences
- Quality control and yield often inferior to commercial synthesis
Biosecurity implications:
- NTI analysis↗ notes “three converging technological trends—enzymatic synthesis, hardware automation, and increased demand from computational tools—are likely to drive rapid advancement in benchtop capabilities over the next decade”
- Manufacturers should implement rigorous sequence screening for each fragment produced
- Governments should provide clear regulations for manufacturers to incorporate screening
- Once capabilities exceed current limits, benchtop devices could become a significant biosecurity gap
Cloud Laboratories
Cloud laboratories↗ are heavily automated, centralized research facilities where scientists run experiments remotely from computers. They present unique biosecurity challenges:
How cloud labs lower barriers:
- Reduce technical skill requirements by automating complex procedures
- Enable “one-stop-shop” research that could expand the pool of capable actors
- Allow experiments to be performed remotely, potentially bypassing ethical constraints in traditional academic settings
- Researchers retain full control over experimental design without physical presence
Current governance gaps:
- No public data on cloud lab operations, workflows, customer numbers, or locations worldwide
- No standardized approaches for customer screening shared between organizations
- Cybersecurity laws don’t account for unique vulnerabilities of biological data and lab automation systems
- Biosafety regulations typically neglect digital threats like remote manipulation of synthesis machines
Proposed solutions (RAND↗):
- Create a Cloud Lab Security Consortium (CLSC) similar to IGSC for DNA synthesis
- Minimum security standards: customer screening, controlled substance access, experiment screening, secured networks
- Human-in-the-loop controls when AI systems place synthesis orders for sequences of concern
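As an illustration of what the recommended human-in-the-loop control might look like in practice, here is a minimal order-routing sketch; the field names, checks, and routing policy are hypothetical and do not describe any existing cloud-lab system.

```python
# Minimal sketch of a human-in-the-loop gate for remotely placed synthesis
# orders, loosely following the recommendations above. Field names, checks,
# and routing policy are hypothetical, not an existing cloud-lab API.
from dataclasses import dataclass

@dataclass
class SynthesisOrder:
    customer_verified: bool     # passed institutional/identity screening
    sequence_flagged: bool      # hit from a sequence-of-concern screen
    placed_by_ai_agent: bool    # order originated from an automated agent

def route(order: SynthesisOrder) -> str:
    if not order.customer_verified:
        return "reject"
    if order.sequence_flagged or order.placed_by_ai_agent:
        return "hold_for_human_review"   # human-in-the-loop control
    return "auto_approve"

print(route(SynthesisOrder(customer_verified=True,
                           sequence_flagged=False,
                           placed_by_ai_agent=True)))  # hold_for_human_review
```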
Biological Design Tools (BDTs)
Beyond LLMs, specialized biological design tools present distinct risks:
AlphaFold↗ and protein structure prediction:
- Revolutionary tool for predicting a protein’s 3D structure from its amino acid sequence (90%+ accuracy)
- Could enable optimization of existing hazards: increasing toxicity, improving immune evasion, enhancing transmissibility
- Could potentially enable design of completely novel toxins targeting human proteins
- Google DeepMind engaged 50+ domain experts in biosecurity assessment for AlphaFold 3
- Implements experimental refusal mechanisms to block misuse—but biological design often resides in dual-use space
Other BDT concerns:
- Machine learning for prediction of host range, transmissibility, and virulence
- Generative models for novel agent design
- Tools that help design sequences evading DNA screening (as demonstrated in Microsoft research)
Dual-use nature: Unlike LLM guardrails, where harmful requests are often clearly distinguishable, biological design tool queries are frequently dual-use. The same protein optimization that could enhance a therapeutic could theoretically enhance a toxin. This makes technical controls more difficult than for text-based LLMs.
Policy recommendations (UNICRI↗):
- Prerelease evaluation requirements for advanced biological models regardless of funding source
- Prioritize mitigating risks of pathogens capable of causing major epidemics
- Preserve researcher autonomy while implementing targeted controls on highest-risk capabilities
Research Governance & International Law
AI-enabled bioweapons risk exists within a broader context of biosecurity challenges, including ongoing debates about research oversight and international governance gaps.
Gain-of-Function and Enhanced Pandemic Pathogen Research
Gain-of-function (GoF) research—experiments that enhance pathogen transmissibility, virulence, or host range—has become intensely controversial, with implications for AI-biosecurity debates:
Recent policy developments:
- May 2024: White House OSTP released the “Policy for Oversight of Dual Use Research of Concern and Pathogens with Enhanced Pandemic Potential↗” (DURC/PEPP Policy)
- May 2025: Executive order blocked↗ the 2024 policy the day before it took effect
- Ongoing: NIH identified 40+ projects that may meet definitions of dangerous GoF research and demanded scientists suspend work
Congressional activity:
- House approved a ban on federal funding for GoF research modifying risky pathogens
- Scientific groups warn vaguely worded provisions could unintentionally halt flu vaccine development and other beneficial research
- Risky Research Review Act (S. 854, H.R. 1864) would establish a life sciences research security board
Key limitation: Both the 2014 DURC Policy and 2024 PEPP Policy only apply to government-funded research. Extending coverage to privately funded research would require new regulations or legislation. AI labs developing biological design tools with private funding currently face no equivalent oversight requirements.
Relevance to AI risk: The GoF debate previews challenges AI governance will face:
- Distinguishing beneficial from dangerous research is difficult
- Oversight mechanisms are primarily voluntary and apply only to government-funded work
- International coordination is lacking
- Technical definitions (“gain of function,” “enhanced pandemic potential”) are contested
The Biological Weapons Convention: Structural Weaknesses
The Biological Weapons Convention↗ (BWC), signed in 1972, prohibits development, production, and stockpiling of biological weapons. It has 187 states parties—but it suffers from significant structural weaknesses:
No verification regime:
- Unlike chemical and nuclear weapons agreements, the BWC contains no formal verification provisions
- Attempts to develop a verification protocol failed in 2001 after years of negotiation
- Governments have not discussed verification within the treaty framework for over two decades
Minimal institutional support:
- The BWC has only four staff members
- Budget is smaller than an average McDonald’s restaurant (per Toby Ord)
- Compare to: IAEA has 2,500+ staff; OPCW has 500+ staff
Recent developments:
- December 2022: States Parties established a Working Group on strengthening the Convention
- 2024: Fourth and fifth Working Group sessions held (August, December)
- December 2024: Fifth session “ended with a regrettable conclusion in which a single States Party undermined the noteworthy progress achieved”—setback reported↗ by Council on Strategic Risks
- The Working Group has only seven days of meeting time scheduled through the end of 2025 for verification discussions
Practical limitations:
- No politically palatable, technologically feasible, and financially sustainable system can guarantee detection of all biological weapons
- Rapid advances in biotechnology create new verification challenges
- AI capabilities could make verification even more difficult by enabling novel agent design
What’s possible: While perfect verification is unachievable, the Bulletin of the Atomic Scientists argues↗ that “measures in combination could generate considerably greater confidence in compliance by BWC states parties.”
Defensive Technologies and Pandemic Preparedness
Section titled “Defensive Technologies and Pandemic Preparedness”The same technological advances that could enable attacks also offer powerful defensive capabilities. Many experts believe defense will ultimately win the offense-defense balance—the question is whether we’re in a dangerous transition period.
mRNA Vaccine Platforms
Section titled “mRNA Vaccine Platforms”The COVID-19 pandemic demonstrated the transformative potential of mRNA vaccines for rapid response:
Speed advantages:
- Traditional vaccines require time-consuming manufacturing with live pathogens
- mRNA vaccines can be designed in days once genetic sequence is known
- COVID-19 mRNA vaccines received FDA EUA in under one year—unprecedented speed
- CEPI’s “100 Days Mission” aims to develop safe, effective vaccines against new threats in just 100 days
Manufacturing advantages:
- Cell-free manufacture enables accelerated, scalable production
- Standardizable processes require minimal facility adaptations between products
- Smaller manufacturing footprints than traditional vaccines
- Same facility can produce multiple vaccine products
Safety profile:
- mRNA does not enter cell nucleus—cannot integrate into cellular genome
- Can be administered repeatedly (no anti-vector immunity like with viral vectors)
- Avoids live pathogen handling in manufacturing
Pandemic preparedness implications:
- Platform is “pathogen-agnostic”—same technology works against any target with known sequence
- BARDA and CEPI supporting development of 50+ vaccine candidates against high-risk pathogens
- Next-generation “trans-amplifying” mRNA vaccines↗ under development could provide stronger immune responses
For AI-bioweapons specifically: Rapid vaccine development could limit the damage from engineered pathogens if they’re detected early. However, novel agents designed to evade detection or existing countermeasures would still pose severe risks during the response window.
Metagenomic Surveillance
Section titled “Metagenomic Surveillance”Traditional disease surveillance looks for known pathogens. Metagenomic sequencing offers pathogen-agnostic detection:
How it works:
- Deep sequencing of all genetic material in samples (wastewater, nasal swabs, etc.)
- Computational analysis identifies viral, bacterial, and other sequences
- Can detect novel or unexpected pathogens that wouldn’t be caught by targeted testing
Current research:
- Nucleic Acid Observatory↗: Sequencing wastewater from major US airports and treatment plants
- Recent dataset: 13.1 terabases from 20 wastewater samples at the LA Hyperion plant (serving 4 million residents)
- Lancet Microbe publication↗ establishing sensitivity models for W-MGS detection
Sensitivity and cost tradeoffs:
- Untargeted shotgun sequencing is less sensitive than targeted methods for known pathogens
- Hybridization capture panels greatly increase sensitivity for viruses in the panel but may reduce sensitivity to unknown pathogens
- Viral detection varies widely with sewershed hydrology and laboratory protocols
- Estimated sensitivity of roughly one infected person in 257-2,250 for certain bacterial pathogens (the sketch after this list shows how prevalence, relative abundance, and read depth combine)
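To see why detection sensitivity depends so strongly on prevalence, sequencing depth, and the pathogen's relative abundance in wastewater, here is a minimal read-count sketch in Python. It assumes pathogen-derived reads follow a Poisson distribution, and every parameter value (reads per sample, relative abundance at full prevalence, detection threshold) is an illustrative assumption, not a figure from the studies cited above.

```python
import math

def detection_probability(prevalence: float,
                          ra_full: float,
                          total_reads: float,
                          min_reads: int = 2) -> float:
    """P(at least `min_reads` pathogen-derived reads in one sample).

    Assumes pathogen reads are Poisson-distributed with mean
    prevalence * ra_full * total_reads, where `ra_full` is the relative
    abundance the pathogen would reach if everyone in the sewershed were
    infected (a purely illustrative parameter).
    """
    mean_reads = prevalence * ra_full * total_reads
    p_below = sum(math.exp(-mean_reads) * mean_reads**k / math.factorial(k)
                  for k in range(min_reads))
    return 1.0 - p_below

# Illustrative assumptions only: ~5 billion reads per sample batch and a
# relative abundance of 1e-5 at 100% prevalence.
TOTAL_READS = 5e9
RA_FULL = 1e-5
for people_per_case in (1_000, 10_000, 100_000):
    p = detection_probability(1 / people_per_case, RA_FULL, TOTAL_READS)
    print(f"1 infected per {people_per_case:>7,}: P(detect) ~ {p:.2f}")
```

The steep drop-off at low prevalence in this toy model illustrates why untargeted sequencing may only catch an outbreak after it has reached meaningful scale.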
For AI-bioweapons specifically: Metagenomic surveillance could provide early warning for engineered pathogens that evade targeted detection. However, sensitivity limits mean outbreaks may need to reach significant scale before detection occurs.
Far-UVC Germicidal Light
Section titled “Far-UVC Germicidal Light”Far-UVC↗ (200-235 nm wavelength) is emerging as a potentially transformative technology for airborne pathogen inactivation in occupied spaces:
Why it’s different from conventional UV:
- Conventional germicidal UV-C (254 nm) harms human skin and eyes—limited to upper-room use or unoccupied spaces
- Far-UVC (typically 222 nm) is absorbed in the outer dead layer of skin and tear layer of eyes—cannot penetrate to living tissue
- Enables direct disinfection of breathing zone while people are present
Efficacy:
- A very low dose (2 mJ/cm²) of 222-nm light inactivates >95% of airborne H1N1 virus
- A single far-UVC fixture can deliver 33-66 equivalent air changes per hour for pathogen removal (see the worked calculation after this list)
- Tested effective against tuberculosis, SARS-CoV-2, influenza, and murine norovirus (99.8% reduction)
- 2025 review: “high ability” to kill pathogens with “high level of safety”
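The equivalent-air-changes figure follows from a first-order inactivation model: the hourly decay rate of viable airborne pathogen equals the pathogen's UV susceptibility constant times the room-average fluence rate. A rough worked example, assuming a susceptibility constant of about 4.6 cm²/mJ and fluence rates of 2-4 µW/cm² (both values chosen for illustration, not taken from the cited studies):

```python
def equivalent_air_changes(k_cm2_per_mJ: float, fluence_uW_per_cm2: float) -> float:
    """eACH under first-order inactivation: decay rate = k * fluence.

    k is the pathogen's 222 nm susceptibility constant (cm^2/mJ); fluence is
    the room-average fluence rate (uW/cm^2, i.e. uJ per cm^2 per second).
    Converting uJ to mJ (x 1e-3) and seconds to hours (x 3600) yields an
    hourly decay rate directly comparable to air changes per hour.
    """
    return k_cm2_per_mJ * fluence_uW_per_cm2 * 1e-3 * 3600

# Illustrative assumptions: k ~ 4.6 cm^2/mJ, fluence of 2 or 4 uW/cm^2
for fluence in (2.0, 4.0):
    print(f"{fluence} uW/cm^2 -> ~{equivalent_air_changes(4.6, fluence):.0f} eACH")
```

With these assumed inputs the model reproduces the 33-66 eACH range quoted above; more susceptible pathogens or higher fluence (within exposure limits) push the number higher.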
Applications for pandemic preparedness:
- Installation in hospitals, schools, airports, public transit could dramatically reduce airborne transmission
- Blueprint Biosecurity↗ funding research teams to evaluate deployment in real-world scenarios
- Open Philanthropy issued RFI on far-UVC evaluation↗
- NIST collaborating with industry on standards development
Remaining questions:
- Long-term exposure effects require further research
- Real-world efficacy in varied building environments not fully characterized
- Cost and feasibility of widespread deployment
For AI-bioweapons specifically: Far-UVC could provide a layer of defense against aerosol-dispersed biological agents in public spaces. Even if attackers successfully synthesize and deploy pathogens, widespread far-UVC installation could limit transmission and buy time for medical response.
Mitigations
Section titled “Mitigations”Model-Level Interventions
Section titled “Model-Level Interventions”Refusals and filtering — Training models not to help with bioweapon development and filtering dangerous outputs. But these are imperfect—models can be jailbroken, fine-tuned, or open-source models may lack restrictions entirely.
Effectiveness assessment:
- Reduces casual misuse
- Raises barrier for unsophisticated actors
- Does not prevent determined actors with technical skills
- Cannot address open-source model proliferation
Evaluations before deployment — Testing models for biosecurity risks during development, as part of responsible scaling policies. Useful but relies on labs’ good faith and competence.
AI-Specific Governance
Section titled “AI-Specific Governance”Compute governance — Limiting who can train powerful models reduces the availability of capable models to bad actors. Information security around model weights becomes important if models can provide meaningful uplift.
Biological capability thresholds — Anthropic’s RSP and similar frameworks establish biological capability as a key threshold for enhanced safety measures. This creates systematic evaluation requirements.
Open-source restrictions — Limiting the release of model weights for systems with significant biological knowledge. Controversial due to benefits of open research.
Broader Biosecurity Measures
Section titled “Broader Biosecurity Measures”Broader biosecurity measures may matter more than AI-specific interventions:
| Intervention | Cost | Risk Reduction | Priority |
|---|---|---|---|
| DNA synthesis screening | ~$100M/year | 5-15% | High |
| Metagenomic surveillance | ~$500M/year | 15-25% | Very High |
| BSL facility security | ~$200M/year | 5-10% | High |
| Pandemic response stockpiles | ~$2B/year | 10-20% | Medium-High |
| International verification | ~$300M/year | 3-8% | Medium |
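One way to read the table is to compare risk reduction per dollar. The sketch below ranks the interventions by the midpoint of their risk-reduction range divided by annual cost; the numbers are the table's own rough estimates, so the ranking is illustrative rather than a policy conclusion.

```python
# Rough cost-effectiveness ranking using the midpoints of the ranges in the
# table above (illustrative estimates, not measured values).
interventions = {
    "DNA synthesis screening":      {"cost_m_per_yr": 100,  "risk_reduction_pct": (5, 15)},
    "Metagenomic surveillance":     {"cost_m_per_yr": 500,  "risk_reduction_pct": (15, 25)},
    "BSL facility security":        {"cost_m_per_yr": 200,  "risk_reduction_pct": (5, 10)},
    "Pandemic response stockpiles": {"cost_m_per_yr": 2000, "risk_reduction_pct": (10, 20)},
    "International verification":   {"cost_m_per_yr": 300,  "risk_reduction_pct": (3, 8)},
}

def midpoint(lo_hi):
    lo, hi = lo_hi
    return (lo + hi) / 2

ranked = sorted(
    interventions.items(),
    key=lambda kv: midpoint(kv[1]["risk_reduction_pct"]) / kv[1]["cost_m_per_yr"],
    reverse=True,
)
for name, v in ranked:
    ratio = midpoint(v["risk_reduction_pct"]) / v["cost_m_per_yr"]
    print(f"{name:<30} ~{ratio:.3f} pct risk reduction per $1M/yr")
```

Per-dollar efficiency is only one lens: metagenomic surveillance ranks below DNA synthesis screening on this metric yet carries the table's highest priority because its absolute risk reduction is larger.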
DNA synthesis screening — Flagging dangerous sequences before synthesis. The primary defense but has significant gaps that AI can exploit.
Laboratory access controls — Restricting who can work with dangerous pathogens. Effective for legitimate facilities; doesn’t address improvised labs.
Disease surveillance — Early detection of outbreaks. Benefits from AI advances and may be where AI provides greatest defensive value.
Medical countermeasures — Rapid vaccine and treatment development. mRNA platforms demonstrated during COVID-19 show how quickly responses can be developed.
Timeline
Section titled “Timeline”| Date | Event |
|---|---|
| 1972 | Biological Weapons Convention signed (now 187 states parties) |
| 1984 | Rajneeshee salmonella attack sickens 751 people (no deaths), the largest bioterrorist attack in US history |
| 1995 | Aum Shinrikyo attempts bioweapons (anthrax, botulinum), fails; uses sarin instead |
| 2001 | Anthrax letters kill 5, infect 17; perpetrator was an insider with legitimate access |
| 2020 | Toby Ord publishes The Precipice, estimating 1/30 existential risk from engineered pandemics |
| 2020-21 | COVID-19 demonstrates pandemic potential; exposes biosecurity gaps |
| 2022 | Collaborations Pharmaceuticals study shows an AI drug-discovery model can propose ~40,000 novel toxic molecules in six hours |
| 2023 (July) | Dario Amodei warns of “substantial risk” AI will enable bioattacks within 2-3 years |
| 2023 (Oct) | Executive Order 14110 directs National Academies to study AI biosecurity |
| 2023 (Nov) | Gryphon Scientific red-team finds Claude provides “post-doc level” biological knowledge |
| 2024 (Jan) | RAND red-team study finds no significant AI uplift for bioweapon planning |
| 2024 (Apr) | White House OSTP releases Framework for Nucleic Acid Synthesis Screening |
| 2024 (May) | Microsoft research reveals 75%+ of AI-designed toxins evade DNA screening |
| 2024 (Aug) | CNAS publishes report on AI and biological national security risks |
| 2024 (Aug) | US AI Safety Institute signs agreements with Anthropic and OpenAI for biosecurity evaluation |
| 2024 (Nov) | US/UK AI Safety Institutes conduct first joint model evaluation (Claude 3.5 Sonnet) |
| 2024 (Dec) | Anthropic RSP includes 10+ biological capability evaluations per model |
| 2025 (Jan) | Anthropic sends letter to White House citing “alarming improvements” in Claude 3.7 Sonnet |
| 2025 (Feb) | Anthropic CEO reports DeepSeek was “the worst” model tested for biosecurity safeguards |
| 2025 (Mar) | National Academies publishes “The Age of AI in the Life Sciences” report |
| 2025 (Apr) | OpenAI’s o3 model ranks 94th percentile among expert virologists on capability test |
| 2025 (May) | Anthropic activates ASL-3 protections for Claude Opus 4 due to CBRN concerns |
| 2025 (Jun) | OpenAI announces next-gen models will hit “high-risk” biological classification |
| 2025 (Jul) | OpenAI hosts biodefense summit with government researchers and NGOs |
| 2025 (Jul) | Trump administration’s AI Action Plan identifies biosecurity as priority |
| 2025 (Aug) | CSIS publishes “Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism” |
| 2025 (Oct) | Microsoft publishes Science paper; screening patch deployed globally (97% effective) |
Expert Perspectives
Section titled “Expert Perspectives”Expert opinion on AI-bioweapons risk is divided, with prominent voices on both sides:
Those More Concerned
Section titled “Those More Concerned”Kevin Esvelt (MIT): One of the most vocal experts on AI-biosecurity risks. Esvelt emphasizes that if you ask a chatbot how to cause a pandemic, “it will suggest the 1918 influenza virus. It will even tell you where to find the gene sequences online and where to purchase the genetic components.” He co-founded SecureDNA and SecureBio to address these risks.
Dario Amodei (Anthropic CEO): In July 2023, stated there was a “substantial risk” that within 2-3 years, AI would “greatly widen the range of actors with the technical capability to conduct a large-scale biological attack.” In February 2025, reported that DeepSeek was “the worst” model tested for biosecurity, generating information “that can’t be found on Google or easily found in textbooks.”
Johannes Heidecke (OpenAI Head of Safety Systems): In June 2025, announced OpenAI expects upcoming models to hit “high-risk classification” for biological capabilities. Emphasized that “99% or even one in 100,000 performance is [not] sufficient” for testing accuracy.
Rocco Casagrande (Gryphon Scientific): After red-teaming Claude, said he was “personally surprised and dismayed by how capable current LLMs were” and that “these things are developing extremely, extremely fast.”
Toby Ord (Oxford): Estimates engineered pandemic risk at 1 in 30 by 2100—second highest anthropogenic existential risk after AI itself.
Georgia Adamson and Gregory C. Allen (CSIS): Their August 2025 report warns current U.S. biosecurity measures are “ill-equipped” to meet AI-enabled challenges, with BDT safeguards “already circumventable post-deployment.”
Bill Drexel and Caleb Withers (CNAS): Their August 2024 report warns AI could enable “catastrophic threats far exceeding the impact of COVID-19.”
Those More Skeptical
Section titled “Those More Skeptical”RAND researchers (Mouton, Lucas, Guest): Their 2024 study found “no statistically significant difference” between AI-assisted and non-AI groups in bioweapon planning capability. This is the strongest empirical evidence against immediate AI uplift concerns.
Some biosecurity practitioners: Emphasize that the wet lab bottleneck—tacit knowledge, equipment access, technique—remains the primary barrier, and AI cannot transfer hands-on skills.
Information abundance argument: Dangerous information is already accessible through scientific literature and the internet. AI may provide convenience but not fundamentally new capabilities.
The Disagreement Structure
Section titled “The Disagreement Structure”The debate often reduces to different assessments of:
| Question | Higher Concern View | Lower Concern View |
|---|---|---|
| Current uplift | 2025 lab evaluations show expert-level capabilities | RAND 2024 study is most rigorous empirical evidence |
| Future trajectory | OpenAI/Anthropic expect “high-risk” soon | May plateau; defenses improving |
| Key bottleneck | Knowledge gap narrowing fast | Wet lab skills remain rate-limiting |
| Guardrail effectiveness | DeepSeek shows open-source gaps | Frontier labs implementing robust safeguards |
| Risk tolerance | ASL-3 activation signals real concern | Base rates suggest low probability |
2025 shift: The debate has evolved significantly. Both OpenAI and Anthropic now officially acknowledge that their next-generation models pose elevated biological risks. The question is shifting from “does AI provide uplift?” to “how much uplift, and can mitigations keep pace?”
Notably: Even those who downplay current uplift often acknowledge that future models may pose greater risks, and that defensive investments are worthwhile regardless.
Sources & Resources
Section titled “Sources & Resources”Primary Research
Section titled “Primary Research”- RAND Corporation (2024): The Operational Risks of AI in Large-Scale Biological Attacks: Results of a Red-Team Study↗ - The most rigorous empirical study of AI uplift to date
- Microsoft Research (2025): AI-designed toxins evade DNA screening - Published in Science, October 2025
- National Academies (2025): The Age of AI in the Life Sciences: Benefits and Biosecurity Considerations - Comprehensive government-commissioned study on AI biosecurity risks
- Gryphon Scientific (2023): Red-team evaluation of Claude’s biological capabilities - Coverage in Semafor↗
- UNICRI (2021): The Potential for Dual-Use of Protein-Folding Prediction↗ - Early analysis of AlphaFold biosecurity implications
- Council on Strategic Risks (2023): The Cyber-Biosecurity Nexus↗
Policy and Analysis
Section titled “Policy and Analysis”- CSIS (2025): Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism by Georgia Adamson and Gregory C. Allen
- CNAS (2024): AI and the Evolution of Biological National Security Risks↗ by Bill Drexel and Caleb Withers
- White House OSTP (2024): Framework for Nucleic Acid Synthesis Screening↗
- White House OSTP (2024): Policy for Oversight of DURC and PEPP↗
- NIST/AISI (2024): Pre-deployment evaluation of Claude 3.5 Sonnet↗
- Congressional Research Service: Oversight of Gain-of-Function Research with Pathogens: Issues for Congress↗
Industry Frameworks
Section titled “Industry Frameworks”- Anthropic: Responsible Scaling Policy↗
- Anthropic (2025): Biorisk Evaluations - Detailed methodology for Claude Opus 4 safety testing
- OpenAI: Preparedness Framework↗
- OpenAI (2025): Preparing for Future AI Capabilities in Biology - High-risk classification announcement
- OpenAI (2024): Building an early warning system for LLM-aided biological threat creation↗
- Google DeepMind: Our approach to biosecurity for AlphaFold 3↗
Biosecurity Organizations
Section titled “Biosecurity Organizations”- SecureDNA: DNA synthesis screening platform↗
- SecureBio: Pandemic preparedness organization↗
- Nucleic Acid Observatory: Pathogen-agnostic surveillance↗
- Nuclear Threat Initiative (NTI): Biosecurity resources↗
- Blueprint Biosecurity: Far-UVC research↗
Emerging Technologies
Section titled “Emerging Technologies”- NTI (2024): Benchtop DNA Synthesis Devices: Capabilities, Biosecurity Implications, and Governance↗
- RAND (2024): Documenting Cloud Labs and Examining How Remotely Operated Automated Laboratories Could Enable Bad Actors↗
- RAND (2024): Robust Biosecurity Measures Should Be Standardized at Scientific Cloud Labs↗
- EMBO Reports (2024): Security challenges by AI-assisted protein design↗
International Governance
Section titled “International Governance”- Arms Control Association: The Biological Weapons Convention (BWC) At A Glance↗
- Arms Control Association (2024): Strengthening the Biological Weapons Convention↗
- Bulletin of the Atomic Scientists (2024): How the Biological Weapons Convention could verify treaty compliance↗
- Council on Strategic Risks (2025): Derailment of the Fifth Working Group of the BWC↗
Defensive Technologies
Section titled “Defensive Technologies”- Nature (2018): Far-UVC light: A new tool to control the spread of airborne-mediated microbial diseases↗
- Scientific Reports (2024): 222 nm far-UVC light markedly reduces infectious airborne virus in an occupied room↗
- Lancet Microbe (2025): Inferring the sensitivity of wastewater metagenomic sequencing for virus detection↗
- Virology Journal (2025): Revolutionizing immunization: a comprehensive review of mRNA vaccine technology↗
Historical Background
Section titled “Historical Background”- Wikipedia: Soviet biological weapons program↗
- Wikipedia: Biopreparat↗
- PMC (2023): The History of Anthrax Weaponization in the Soviet Union↗
- Toby Ord: The Precipice: Existential Risk and the Future of Humanity (2020)
General Context
Section titled “General Context”- 80,000 Hours: Problem profile: Preventing catastrophic pandemics↗
- Bulletin of the Atomic Scientists (2024): Could AI help bioterrorists unleash a new pandemic?↗
- Undark (2024): The Long, Contentious Battle to Regulate Gain-of-Function Work↗
- Science (2025): NIH suspends dozens of pathogen studies over ‘gain-of-function’ concerns↗
Video & Audio
Section titled “Video & Audio”- 80,000 Hours Podcast: Kevin Esvelt on Biosecurity↗ - MIT researcher on biological risks and pandemic preparedness
- Lex Fridman #431: Roman Yampolskiy↗ - Discusses AI safety including CBRN risks
- Future of Life Institute: Podcast series↗ - Multiple episodes on biosecurity
- RAND: The AI and Biological Weapons Threat↗ - Video briefing on the 2024 study
Analytical Models
Section titled “Analytical Models”The following analytical models provide structured frameworks for understanding this risk:
| Model | Type |
|---|---|
| Bioweapons Attack Chain Model: decomposes bioweapons attacks into seven sequential steps with independent failure modes. DNA synthesis screening offers 5-15% risk reduction for $7-20M, with estimates carrying 2-5x uncertainty at each step. | Probability Decomposition |
| AI Uplift Assessment Model: estimates AI's marginal contribution to bioweapons risk over time. It projects uplift increasing from 1.3-2.5x (2024) to 3-5x by 2030, with biosecurity evasion capabilities posing the greatest concern as they could undermine existing defenses before triggering policy response. | Comparative Analysis |
| AI-Bioweapons Timeline Model: projects when AI crosses capability thresholds for bioweapons. It estimates knowledge democratization is already crossed, synthesis assistance arrives 2027-2032, and novel agent design by 2030-2040. | Timeline Projection |
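A minimal sketch of the probability-decomposition idea behind the attack chain model: multiply per-step success probabilities along the chain, then apply AI uplift multipliers to the knowledge-heavy steps and compare. The step names, probabilities, and multipliers below are placeholders for illustration, not the model's actual parameters.

```python
import math

# Hypothetical per-step success probabilities for a would-be attacker
# (placeholders; the actual model's estimates carry 2-5x uncertainty per step).
baseline_steps = {
    "intent_and_planning":    0.5,
    "agent_selection":        0.3,
    "sequence_acquisition":   0.2,
    "synthesis_and_assembly": 0.05,
    "amplification":          0.1,
    "formulation":            0.1,
    "delivery":               0.3,
}

def chain_success(steps: dict[str, float]) -> float:
    """Overall success probability, treating steps as independent."""
    return math.prod(steps.values())

def with_uplift(steps: dict[str, float], uplift: dict[str, float]) -> dict[str, float]:
    """Apply per-step AI uplift multipliers, capping probabilities at 1."""
    return {k: min(1.0, p * uplift.get(k, 1.0)) for k, p in steps.items()}

# Assume AI mainly helps knowledge-heavy steps (illustrative multipliers).
ai_uplift = {"agent_selection": 2.0, "sequence_acquisition": 1.5,
             "synthesis_and_assembly": 2.0}

p0 = chain_success(baseline_steps)
p1 = chain_success(with_uplift(baseline_steps, ai_uplift))
print(f"baseline: {p0:.2e}")
print(f"with AI:  {p1:.2e}  (~{p1 / p0:.1f}x overall uplift)")
```

Because the steps multiply, even modest per-step uplift compounds into a large change in overall success probability, which is why evaluations focus on the steps AI is most likely to ease.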
AI Transition Model Context
Section titled “AI Transition Model Context”Bioweapons risk affects the Ai Transition Model primarily through Misuse Potential:
| Parameter | Impact |
|---|---|
| Biological Threat Exposure | Direct parameter—AI uplift for bioweapon development |
| AI Control Concentration | Powerful AI in few hands increases misuse risk |
The bioweapons pathway can lead to Human-Caused Catastrophe—catastrophic outcomes from humans misusing AI capabilities, distinct from AI misalignment.
Related Pages
Section titled “Related Pages”What links here
- Biological Threat Exposure (parameter)
- Bioweapons Attack Chain Model (model)
- AI Uplift Assessment Model (model)
- AI-Bioweapons Timeline Model (model)
- Compute Governance (policy)
- AI Evaluations (safety agenda)
- Cyberweapons Risk (risk)
- AI Proliferation (risk)