Provable or “Guaranteed Safe” AI refers to approaches that aim to build AI systems with mathematical safety guarantees rather than empirical safety measures. The core premise is that we should be able to prove that an AI system will behave safely, not merely hope based on testing.
The most prominent current effort is davidad’s Open Agency Architecture, funded through the UK’s Advanced Research and Invention Agency (ARIA) programme. This approach represents a fundamentally different philosophy from mainstream AI development: safety through formal verification rather than alignment through training.
Estimated probability of this approach being dominant at transformative AI: 1-5%. The approach is ambitious and faces significant scalability questions, but it offers uniquely strong safety properties if achievable.
The field of guaranteed safe AI encompasses multiple verification methodologies, each with different tradeoffs between expressiveness, scalability, and the strength of guarantees provided.
Key finding: The best tools can now verify local robustness properties on networks with millions of parameters in reasonable time, but this remains 6 orders of magnitude below frontier model scale.
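To make "verifying local robustness properties" concrete, the sketch below checks a toy ReLU network with interval bound propagation (IBP), one of the simplest sound verification techniques. The network weights, input, and epsilon-ball are illustrative assumptions; production tools (e.g., the VNN-COMP winners) use much tighter relaxations at far larger scale.

```python
# Sketch: local robustness verification of a tiny ReLU network via
# interval bound propagation. Sound but loose: if it says "robust",
# the property holds; if not, the result is inconclusive.
import numpy as np

def ibp_bounds(weights, biases, lo, hi):
    """Propagate an input box [lo, hi] through the network; returns
    guaranteed lower/upper bounds on the output logits."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        center, radius = (lo + hi) / 2, (hi - lo) / 2
        lo = W @ center + b - np.abs(W) @ radius
        hi = W @ center + b + np.abs(W) @ radius
        if i < len(weights) - 1:                  # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)
    return lo, hi

# Illustrative 2-2-2 network and an epsilon-ball around input x.
weights = [np.array([[1.0, -1.0], [0.5, 2.0]]),
           np.array([[1.0, 0.0], [0.0, 1.0]])]
biases = [np.zeros(2), np.zeros(2)]
x, eps = np.array([1.0, 0.0]), 0.1
lo, hi = ibp_bounds(weights, biases, x - eps, x + eps)

# Certificate that class 0 is always predicted inside the ball:
# its worst-case logit must exceed class 1's best-case logit.
print("robust:", lo[0] > hi[1])  # robust: True
```

Real verifiers refine exactly this kind of bound; the scale gap discussed above is about how large a network such bounds can be computed for in practice.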
The UK’s Advanced Research and Invention Agency (ARIA) has committed £59 million over 3-5 years to the Safeguarded AI programme, led by Programme Director David “davidad” Dalrymple. This represents the largest government investment specifically targeting provably safe AI.
The foundational paper “Towards Guaranteed Safe AI” (May 2024) was co-authored by 17 prominent researchers including Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Joe Halpern, Clark Barrett, Jeannette Wing, and Joshua Tenenbaum. It defines the core architecture: a world model, a formal safety specification, and a verifier. The ARIA programme is organised into technical areas (TAs):
| Technical area | Focus | Status | Timeline |
|---|---|---|---|
| TA 1 | — | Winner announced October 2025; operational January 2026 | — |
| TA 2 | World model development | Not yet announced | 2025-2027 |
| TA 3 | Integration and demonstration | Not yet announced | 2027-2028 |
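The paper’s core decomposition pairs a world model, a formal safety specification, and a verifier. A toy sketch of how the three fit together, where everything is a stand-in (one-step integer dynamics, a box invariant, and exhaustive bounded-horizon checking rather than a real proof engine):

```python
# Toy world-model / specification / verifier loop. All three pieces
# are illustrative stand-ins, not the ARIA formalism.

def world_model(state, action):
    return state + action              # toy dynamics

def safety_spec(state):
    return -10 <= state <= 10          # toy invariant

def verify(policy, initial_states, horizon):
    """Exhaustively check the invariant over all states reachable
    within `horizon` steps (tractable only for tiny state spaces)."""
    states = set(initial_states)
    for _ in range(horizon):
        if not all(safety_spec(s) for s in states):
            return False
        states = {world_model(s, policy(s)) for s in states}
    return all(safety_spec(s) for s in states)

safe_policy = lambda s: -1 if s > 0 else 1   # push state toward 0
print(verify(safe_policy, range(-5, 6), horizon=20))  # True
```

A runaway policy such as `lambda s: 1` fails the same check, illustrating that the guarantee is relative to the world model: if the model’s dynamics diverge from reality, the proof says nothing about the deployed system.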
Key quote from davidad: “The frontier model needs to submit a proof certificate, which is something that is written in a formal language that we’re defining in another part of the program… This new language for proofs will hopefully be easy for frontier models to generate and then also easy for a deterministic, human-audited algorithm to check.”
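The asymmetry davidad describes (an expensive, untrusted prover and a small, deterministic, auditable checker) can be illustrated with a deliberately simple stand-in for the proof language: the claim is "n is composite" and the certificate is a nontrivial factor, which is costly to find but trivial to check.

```python
# Sketch of the prover/checker split: trust the certificate, not the
# prover. The factoring example is a stand-in for ARIA's planned
# formal proof language, not part of the programme itself.

def untrusted_prover(n):
    """Expensive search; this is the frontier model's role."""
    return next(d for d in range(2, n) if n % d == 0)

def deterministic_checker(n, certificate):
    """Small, fast, human-auditable validation of the certificate."""
    return 1 < certificate < n and n % certificate == 0

cert = untrusted_prover(91)
print(cert, deterministic_checker(91, cert))  # 7 True
```

The safety argument rests only on the checker: however capable or misaligned the prover, an invalid certificate is rejected.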
| Challenge | Severity | Description | Evidence / status |
|---|---|---|---|
| World model accuracy | — | Core unsolved problem: verifying that the model matches reality is undecidable in general | Physics simulators achieve <1% error in constrained domains; open-world accuracy unknown |
| Specification completeness | HIGH | Formal specs may miss important values/constraints | An estimated 60-80% of human values may be difficult to formalize |
| Computational tractability | HIGH | Verification can be exponentially expensive | SAT solving is NP-complete; neural network verification is co-NP-complete |
| Scalability gap | VERY HIGH | Current tools verify ~10^6 parameters; frontier models have ~10^12 | 6 orders of magnitude gap; closing at ~10x per 3 years |
| Capability competitiveness | HIGH | Unknown whether verified systems can match unverified ones | No competitive benchmarks yet exist |
| Handling uncertainty | MEDIUM | Formal methods traditionally assume certainty | Probabilistic verification emerging but less mature |
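One of the emerging probabilistic approaches mentioned above is statistical model checking: instead of an absolute proof, sample the system and bound its failure probability with a Hoeffding-style (epsilon, delta) guarantee. The simulated "system" and its 0.3% failure rate below are illustrative assumptions.

```python
# Sketch of statistical model checking: a quantitative guarantee
# ("within epsilon of the true failure rate, with probability at
# least 1 - delta") instead of a hard proof.
import math
import random

def required_samples(epsilon, delta):
    """Hoeffding bound on the number of i.i.d. samples needed."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

def estimate_failure_rate(system, epsilon=0.01, delta=1e-6):
    n = required_samples(epsilon, delta)
    failures = sum(system() for _ in range(n))
    return failures / n, n

random.seed(0)
toy_system = lambda: random.random() < 0.003   # true failure rate 0.3%
rate, n = estimate_failure_rate(toy_system)
print(f"estimated failure rate {rate:.4f} from {n} samples")
```

The tradeoff is visible in the sample count: tightening epsilon by 10x multiplies the required samples by 100x, which is why such guarantees stay weaker than the deterministic proofs the rest of this article discusses.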
According to the Stanford Center for AI Safety whitepaper, the complexity and heterogeneity of AI systems mean that formal verification of specifications is undecidable in general; even deciding whether a state of a linear hybrid system is reachable is undecidable.
- **Strong safety incentives** - As AI gets more powerful, demand for guarantees may increase; the International AI Safety Report 2025 emphasizes the need for mathematical guarantees in safety-critical domains
- **Regulatory pressure** - The EU AI Act requires risk assessment for high-risk AI; formal verification may become a compliance pathway
- **ARIA funding** - £59M+ committed with clear milestones, plus an additional £110M in related ARIA funding in 2024
- **Compositionality** - Verified components could be combined with other approaches; modular verification enables scaling
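The compositionality point can be made concrete: if each verified component carries a proved bound on its own error and a Lipschitz-style bound on how much it amplifies upstream error, an end-to-end bound follows by composition, without re-verifying the whole pipeline. The component names and numbers below are illustrative assumptions, not results from the literature.

```python
# Sketch of composing per-component guarantees into an end-to-end
# worst-case bound. All bounds here are made-up toy values.
from dataclasses import dataclass

@dataclass
class VerifiedComponent:
    name: str
    output_error: float   # proved bound on this component's own error
    lipschitz: float      # proved bound on error amplification

def composed_error_bound(pipeline):
    """Each stage amplifies accumulated upstream error by its
    Lipschitz constant, then contributes its own error."""
    bound = 0.0
    for c in pipeline:
        bound = c.lipschitz * bound + c.output_error
    return bound

pipeline = [
    VerifiedComponent("perception", 0.01, 1.0),
    VerifiedComponent("world model", 0.02, 2.0),
    VerifiedComponent("controller", 0.005, 1.5),
]
print(composed_error_bound(pipeline))  # 0.065
```

This is why modular verification can scale where monolithic verification cannot: each component is verified once at its own size, and only the cheap composition step touches the full system.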
Critical observation: At the current rate of ~10x improvement every 3 years, verification tools will not reach frontier model scale until 2039-2042—potentially after transformative AI has already been developed via other paradigms.
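The arithmetic behind that projection is simple to reproduce. The start years below are assumptions chosen to bracket when the ~10^6-parameter milestone was reached; they yield the 2039-2042 range quoted above.

```python
# Extrapolating the scale gap: 6 orders of magnitude closing at
# ~10x every 3 years takes ~18 years. Start years are illustrative.
import math

verified, frontier = 1e6, 1e12
orders_remaining = math.log10(frontier / verified)   # 6.0
years_needed = orders_remaining * 3                  # 3 years per 10x
for start in (2021, 2024):
    print(start, "->", start + round(years_needed))  # 2039, 2042
```

The estimate also assumes frontier scale stays fixed at ~10^12 parameters; if frontier models keep growing, the gap closes even later.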
- **Neuro-Symbolic Hybrids** - Related approach combining neural networks with formal symbolic reasoning
- **AI Governance** - Policy context for formal verification requirements
- **Technical AI Safety** - Comparison with other safety approaches