Automated Proof Generation for Rust Code via Self-Evolution
Ensuring correctness is crucial for code generation. Formal verification offers a definitive assurance of correctness, but demands substantial human effort in proof construction and hence raises a pre...
"we introduce SAFE, a novel framework that overcomes the lack of human-written proof [...] [SAFE achieves] a 70.50% accuracy rate in a benchmark crafted by human experts, [vs] GPT-4o's performance of 24.46%"
arxiv.org/abs/2410.15756
12.04.2025 22:59
Can LLMs Enable Verification in Mainstream Programming?
Although formal methods are capable of producing reliable software, they have seen minimal adoption in everyday programming. Automatic code generation using large language models is becoming increasin...
"... we explore the ability of LLMs to produce verified code in three verification languages (Dafny, Nagini, and Verus) [...] we use manually curated datasets derived from the state-of-the-art Python benchmark, HumanEval"
arxiv.org/abs/2503.14183
12.04.2025 22:51
YouTube video by FAR.AI
Zac Hatfield-Dodds - Formal Verification is Overrated [Alignment Workshop]
Formal Verification is Overrated
"Zac Hatfield-Dodds [argues] that relying solely on verification methods may not provide real AI safety"
youtu.be/bs5snugP1VA?...
18.02.2025 06:51
Proving the Coding Interview: A Benchmark for Formally Verified Code Generation
We introduce the Formally Verified Automated Programming Progress Standards, or FVAPPS, a benchmark of 4715 samples for writing programs and proving their correctness, the largest formal verification ...
"We introduce the Formally Verified Automated Programming Progress Standards, or FVAPPS, a benchmark of 4715 samples [...] including 1083 curated and quality controlled samples"
arxiv.org/abs/2502.05714
12.02.2025 01:44
Super excited: my new @darpa program on AI for pure mathematics!
Exponentiating Mathematics (expMath) aims to accelerate the rate of progress in pure math through the development of an AI collaborator and new professional-level math benchmarks.
sam.gov/opp/4def3c13...
07.02.2025 16:58
LLM-Assisted Static Analysis for Detecting Security Vulnerabilities
Software is prone to security vulnerabilities. Program analysis tools to detect them have limited effectiveness in practice due to their reliance on human labeled specifications. Large language models...
"[We combine] LLMs with static analysis to perform whole-repository reasoning for security vulnerability detection. [...] IRIS leverages LLMs to infer taint specifications and perform contextual analysis"
arxiv.org/abs/2405.17238
01.02.2025 01:34
The VerifAI Workshop
VerifAI: AI Verification in the Wild @ ICLR 2025
"This workshop explores the intersection of scale-driven generative artificial intelligence (AI) and the correctness-focused principles of verification."
verifai-workshop.github.io
09.01.2025 06:18
Laurel: Generating Dafny Assertions Using Large Language Models
Dafny is a popular verification language, which automates proofs by outsourcing them to an SMT solver. This automation is not perfect, however, and the solver often requires guidance in the form of he...
"...we propose Laurel, a tool that uses LLMs to automatically generate helper assertions for Dafny [...] Laurel is able to generate over 50% of the required helper assertions given only a few attempts"
arxiv.org/abs/2405.16792
18.12.2024 17:38
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only acc...
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs - "we introduce [...] a method that maps informal proofs to formal proof sketches, and uses the sketches to guide an automated prover by directing its search to easier sub-problems"
arxiv.org/abs/2210.12283
10.12.2024 01:28
Sounds like the LLM essentially has a set of operations it can perform, and the policy constrains these to the ones that are reasonable. E.g. "user can't buy a ticket for less than <minimum price>"
09.12.2024 18:11
Looks like the process is (1) free text input of developer's policy docs, (2) translation (by an LLM?) into a candidate set of SAT-checkable policy constraints, (3) audit of policies by developers, (4) auto-enforcement of policies on incoming operations generated by LLM from user interaction
09.12.2024 18:07
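A rough sketch of what step (4) in the post above might look like, using the ticket-price example. Everything here is hypothetical: the `Operation` and `Policy` shapes, the `MINIMUM_PRICE` value, and the function names are made up for illustration, and plain Python predicates stand in for the SAT-checkable constraints the post mentions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical value standing in for "<minimum price>" in the post.
MINIMUM_PRICE = 10.0


@dataclass(frozen=True)
class Operation:
    """An operation the LLM proposes on behalf of a user (assumed shape)."""
    name: str
    price: float


@dataclass(frozen=True)
class Policy:
    """A developer-audited constraint: a description plus a checkable predicate."""
    description: str
    allows: Callable[[Operation], bool]


# Assumed output of steps (1)-(3): policies already translated and audited.
POLICIES = [
    Policy(
        description="user can't buy a ticket for less than the minimum price",
        allows=lambda op: op.name != "buy_ticket" or op.price >= MINIMUM_PRICE,
    ),
]


def violated_policies(op: Operation, policies: list[Policy]) -> list[str]:
    """Return the description of every policy the operation violates."""
    return [p.description for p in policies if not p.allows(op)]


def enforce(op: Operation, policies: list[Policy]) -> bool:
    """Allow the operation only if it violates no policy."""
    return not violated_policies(op, policies)
```

The point of the design is that the LLM never decides what is allowed: it only proposes operations, and a separate, auditable check accepts or rejects them.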
Since all my Twitter content is now gone, I will start reposting some of it here. Here are the slides for my talk on the coming wave of ML-accelerated formal methods, given at the Isaac Newton Institute last month. May interest some of you.
drive.google.com/file/d/1ybQx...
29.11.2024 14:37
They target variants of known bugs: "By providing a starting point [...] we remove a lot of ambiguity from vulnerability research, and start from a concrete, well-founded theory: 'This was a previous bug; there is probably another similar one somewhere'"
30.11.2024 21:22
Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming
Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiabil...
"Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming" - LLM-based proof generation for F*, and a 600K LoC dataset of F* programs and proofs, suitable for ML applications. Impressive results synthesizing real-world proofs about programs!
arxiv.org/abs/2405.01787
26.11.2024 22:06
NSF Award Search: Award # 2422214 - FMitF: Track I: Aligning Code-Generating Models with Formal Specifications
Recent NSF award with several of the same authors: "Aligning Code-Generating Models with Formal Specifications" www.nsf.gov/awardsearch/...
25.11.2024 23:46
Grammar-Aligned Decoding
Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches mitigate t...
Grammar-Aligned Decoding - "[We propose] a decoding algorithm that guarantees the output to be grammatical while provably producing outputs that match the conditional probability of the LLM's distribution conditioned on the given grammar constraint" arxiv.org/abs/2405.21047 @lorisdanto.bsky.social
25.11.2024 23:46
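A toy illustration of the target distribution named in the quote above: the LLM's distribution conditioned on the output being grammatical. This is not the paper's algorithm. The two-token "model", its probabilities, and the three-string "grammar" are all invented for the example; it only compares the exact conditional (computed by brute force) with what naive per-token masking and renormalizing would sample.

```python
# Invented two-step "language model" over tokens "a" and "b".
FIRST = {"a": 0.5, "b": 0.5}
SECOND = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}

# Invented "grammar": the set of allowed two-token strings.
GRAMMAR = {"ab", "ba", "bb"}


def lm_prob(seq: str) -> float:
    """Unconstrained model probability of a two-token string."""
    return FIRST[seq[0]] * SECOND[seq[0]][seq[1]]


def conditional() -> dict[str, float]:
    """Exact P(seq | seq is grammatical), by enumerating the grammar."""
    mass = sum(lm_prob(s) for s in GRAMMAR)
    return {s: lm_prob(s) / mass for s in sorted(GRAMMAR)}


def masked() -> dict[str, float]:
    """Distribution from masking invalid tokens and renormalizing at each step."""
    out: dict[str, float] = {}
    first_ok = {t: p for t, p in FIRST.items()
                if any(s.startswith(t) for s in GRAMMAR)}
    z1 = sum(first_ok.values())
    for t1, p1 in first_ok.items():
        second_ok = {t2: p2 for t2, p2 in SECOND[t1].items()
                     if t1 + t2 in GRAMMAR}
        z2 = sum(second_ok.values())
        for t2, p2 in second_ok.items():
            out[t1 + t2] = (p1 / z1) * (p2 / z2)
    return out
```

In this toy setup the two disagree on "ab": the exact conditional gives it 0.05 / 0.55 of the mass, while per-token masking gives it 0.5, because after choosing "a" the only allowed continuation absorbs all the renormalized probability.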