
Geoffrey Irving

@girving.bsky.social

Chief Scientist at the UK AI Security Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc.

3,893 Followers  |  119 Following  |  547 Posts  |  Joined: 12.07.2023

Posts by Geoffrey Irving (@girving.bsky.social)

Your periodic reminder that software engineering is a mixture of easy-to-verify subtasks and hard-to-verify subtasks, and the fact that the machines are getting better at coding assistance should not be explained away as "they're only good on easy-to-verify stuff".

08.03.2026 21:34 — 👍 5    🔁 0    💬 0    📌 0

The constant on that O(log n) depth theorem is 1.41e64, because Margulis–Gabber–Galil is not a very good spectral expander. What we really need is Ramanujan graphs, but this requires harder number theory than is feasible with February 2026 Mathlib + LLMs.

Contributions welcome!

08.03.2026 15:09 — 👍 1    🔁 0    💬 0    📌 0

The mmap routine copies all data to a private temporary file before exposing it to Lean as a String to avoid the chance that someone mutates the data during the check.

github.com/girving/aks/...

08.03.2026 15:09 — 👍 0    🔁 0    💬 1    📌 0
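The copy-before-mmap idea is easy to sketch. Below is a minimal Python version, assuming nothing about the real routine: the actual implementation is a C FFI routine exposed to Lean as a String, and `read_immutable` is a hypothetical name for illustration only.

```python
import mmap
import shutil
import tempfile

def read_immutable(path: str) -> bytes:
    """Snapshot `path` into a private temporary file, then mmap the copy.

    Copying first means that a writer mutating the original file during
    the check cannot affect the bytes we see. A sketch only: the real
    routine is C exposed to Lean via FFI, not Python.
    """
    with tempfile.TemporaryFile() as tmp:
        with open(path, "rb") as src:
            shutil.copyfileobj(src, tmp)  # private snapshot of the data
        tmp.flush()
        size = tmp.seek(0, 2)  # seek to end returns the file size
        if size == 0:
            return b""  # mmap rejects empty files
        with mmap.mmap(tmp.fileno(), size, access=mmap.ACCESS_READ) as m:
            return bytes(m)  # copy out so the mapping can be closed
```

For a 6GB certificate the point of mmap is to avoid holding the whole file in heap memory at once; the `bytes(m)` copy-out here is just to keep the sketch self-contained.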

This required generating a 6GB certificate in Rust in O(n^3) time, then checking it via an optimised, threaded, verified Lean certificate checker in O(n^2) time. I also wrote a small C FFI-based mmap routine to avoid constantly crashing the machines I was working on.

github.com/girving/aks/...

08.03.2026 15:09 — 👍 0    🔁 0    💬 1    📌 0
Random Number

In addition to the MGG expander graph construction, I also formalised the zig-zag construction of expander graphs using a native_decide-checked certificate that a particular "random" base 16-regular graph on 65536 vertices is a good spectral expander, per the discussion here.

xkcd.com/221

08.03.2026 15:09 — 👍 0    🔁 0    💬 1    📌 0
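The property being certified is that the graph's second adjacency eigenvalue is small relative to the degree. Here is a toy Python sketch of that quantity for a random regular multigraph built from random permutations; everything here (the construction, the function name, the parameters) is a hypothetical stand-in, not the certified `native_decide` check from the repo, which uses an exact certificate rather than floating-point iteration.

```python
import random

def second_eigenvalue_estimate(n=256, d=16, iters=200, seed=0):
    """Estimate max |lambda_i| / d over non-principal eigenvalues of a
    random d-regular multigraph built from d/2 random permutations.
    Values well below 1 indicate a good spectral expander."""
    rng = random.Random(seed)
    perms = [rng.sample(range(n), n) for _ in range(d // 2)]
    # Adjacency list: each permutation p contributes edges i <-> p[i].
    adj = [[] for _ in range(n)]
    for p in perms:
        for i in range(n):
            adj[i].append(p[i])
            adj[p[i]].append(i)
    # Power iteration, deflating the all-ones top eigenvector (lambda_1 = d)
    # by re-centering the vector at every step.
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project off the all-ones eigenvector
        w = [sum(v[j] for j in adj[i]) for i in range(n)]  # w = A v
        norm = max(abs(x) for x in w) or 1.0
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v / v^T v, scaled by the degree.
    mean = sum(v) / n
    v = [x - mean for x in v]
    num = sum(vi * sum(v[j] for j in adj[i]) for i, vi in enumerate(v))
    den = sum(x * x for x in v) or 1.0
    return abs(num / den) / d
```

For a genuinely random 16-regular graph this lands near the Ramanujan-style value 2·sqrt(15)/16 ≈ 0.48, which is the "good spectral expander" behaviour the certificate pins down exactly.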

I would not be confident without careful checks: once Opus proved a theorem just for γ = ½ despite me saying I needed general γ. When I complained, it churned for a while and declared victory; when I looked, it had proved a theorem that took an arbitrary γ, but also an (hγ : γ = ½) hypothesis.

08.03.2026 15:09 — 👍 0    🔁 0    💬 1    📌 0
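The failure mode is worth spelling out; a hypothetical Lean sketch (not from the repo) of what "arbitrary γ plus a hidden hypothesis" looks like:

```lean
-- The statement quantifies over an arbitrary γ, but the extra hypothesis
-- `hγ` quietly restricts it to γ = 1/2, so it proves nothing beyond the
-- special case that was complained about.
theorem looks_general (γ : ℝ) (hγ : γ = 1/2) : γ + γ = 1 := by
  rw [hγ]; norm_num
```

Spotting this requires reading the full signature, not just the conclusion, which is why post-hoc review of machine-written Lean still matters even when the kernel accepts the proof.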

I've run the proof through two post-hoc proof checkers via github.com/leanprover/c... (Lean's and github.com/ammkrn/nanod...), so I'm reasonably confident it's correct. See github.com/girving/aks/... for the spec.

08.03.2026 15:09 — 👍 0    🔁 0    💬 1    📌 0

Tons of human handholding throughout: coding agents are more than good enough these days for significant acceleration of formal verification (and we should use them to harden software!), but the AKS proofs are too hard for Opus 4.6 by itself.

08.03.2026 15:09 — 👍 0    🔁 0    💬 1    📌 0
The toplevel theorems that our network has O(log n) depth and O(n log n) size.

As an exercise in learning recent Claude Code + Opus 4.6, I've formalised Seiferas's simplified construction of the Ajtai–Komlós–Szemerédi O(log n)-depth, O(n log n)-size sorting networks in Lean, using Margulis–Gabber–Galil expander graphs.

github.com/girving/aks

08.03.2026 15:09 — 👍 3    🔁 0    💬 1    📌 0
Astronaut gun meme: Wait, it's all proxies and heuristics? Close enough

06.03.2026 20:41 — 👍 276    🔁 39    💬 3    📌 0

"Literally" is a nice touch.

28.02.2026 22:25 — 👍 3    🔁 0    💬 0    📌 1

What's the best current SNARK or computationally sound proof (if zero-knowledge doesn't matter) to use for this kind of arbitrary-length computation purpose?

28.02.2026 11:59 — 👍 0    🔁 0    💬 0    📌 0

We now welcome OpenAI, Microsoft, the Australian Department of Industry, Science and Resources' AI Safety Institute, the AI Safety Tactical Opportunities Fund, Sympatico Ventures, and Renaissance Philanthropy. ❤️

19.02.2026 20:54 — 👍 0    🔁 0    💬 0    📌 0

Thank you to all of our partners! The 2025 launch was backed by an international coalition including the Canadian AI Safety Institute, CIFAR, Schmidt Sciences, AWS, Anthropic, Halcyon Futures, the Safe AI Fund, UK Research and Innovation, and UK ARIA. ❤️

19.02.2026 20:54 — 👍 0    🔁 0    💬 1    📌 0
Funding 60 projects to advance AI alignment research | AISI Work — The Alignment Project welcomes its first cohort of grantees, and new partners join the coalition, bringing total funding to £27m.

I'm excited that AISI is announcing the first 60 Alignment Project grants, bringing more independent experts and ideas into AI alignment and control research! Since the RFP last year, we've grown the total funding to £27M, which means more ideas will be explored! 🧡

www.aisi.gov.uk/blog/funding...

19.02.2026 20:54 — 👍 5    🔁 0    💬 1    📌 0

Unless you make multiple accounts!

18.02.2026 14:12 — 👍 1    🔁 0    💬 1    📌 0

It is unfortunate that arXiv has no mechanism for setting a social media image for a paper. I do not need the enormous ARXIV logo for this kind of link.

17.02.2026 22:07 — 👍 9    🔁 0    💬 0    📌 0
Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs LLM developers have imposed technical interventions to prevent fine-tuning misuse attacks, attacks where adversaries evade safeguards by fine-tuning the model using a public API. Previous work has est...

Oops, here's the real paper link: arxiv.org/abs/2502.14828

17.02.2026 22:06 — 👍 2    🔁 0    💬 1    📌 0

The Scala replacements are truly cursed.

17.02.2026 22:05 — 👍 1    🔁 0    💬 0    📌 0
Fundamental Limitations in Defending LLM Finetuning APIs

1. No, BPJ is fully black box.
2. Yep, by many of the same people. :)

arxiv.org/html/2502.14...

17.02.2026 22:04 — 👍 6    🔁 0    💬 1    📌 0

Work by the AISI Red Team + advisors: Xander Davies, Giorgi Giglemiani, Edmund Lau, Eric Winsor, me, and Yarin Gal. The Red Team is hiring if you like breaking things to motivate stronger defences! 💙

17.02.2026 20:55 — 👍 10    🔁 0    💬 0    📌 0

We believe this kind of attack will be very difficult to defend against with per-point jailbreak defences, but more tractable using defences that notice patterns across many queries as the method generates many failed attempts along the way.

arxiv.org/abs/2602.15001

17.02.2026 20:55 — 👍 10    🔁 0    💬 1    📌 0

New "boundary point jailbreaking" method against LLM safeguards (with prior disclosure to multiple labs) by using noised versions of harmful queries to turn sparse feedback from failed attacks into dense feedback. 🧡

www.aisi.gov.uk/blog/boundar...

17.02.2026 20:55 — 👍 44    🔁 5    💬 2    📌 3
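The sparse-to-dense idea can be illustrated abstractly. The toy below is entirely hypothetical (a 1D threshold "safeguard" and made-up function names), not the paper's method: a single query yields only a refused/not-refused bit, but the refusal *rate* over noised copies varies smoothly, giving an attacker a gradient to follow.

```python
import random

def refusal(x: float) -> bool:
    """Toy stand-in for a safeguard: refuse anything past a threshold."""
    return x > 0.7

def dense_score(x: float, noise: float = 0.3, samples: int = 500,
                seed: int = 0) -> float:
    """Fraction of noised copies of x that get refused.

    One query gives a single bit (sparse feedback); averaging outcomes
    over Gaussian-noised copies gives a score that changes smoothly
    with x, so even failed attempts carry signal about where the
    decision boundary lies.
    """
    rng = random.Random(seed)
    hits = sum(refusal(x + rng.gauss(0, noise)) for _ in range(samples))
    return hits / samples
```

In this toy, points far below the threshold score near 0, points far above score near 1, and the score rises through ½ at the boundary, which is exactly the dense signal that per-point defences never see but cross-query pattern detection could.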

Yeah, alas I need a rigorous, deterministic proof (for Lean purposes), so no randomness allowed. But also we should 100% get a standardised SNARK system into Lean as an optional axiom. There are tricks to formalising that alongside a conventional kernel, but nothing that can't be surmounted.

14.02.2026 20:52 — 👍 1    🔁 0    💬 1    📌 0

Eigenvectors don't work as a certificate: SPD matrices are usually full rank, and checking that you have an eigenvector basis is not quadratic time.

14.02.2026 09:28 — 👍 1    🔁 0    💬 0    📌 0

They don't give you rigorous bounds, though. It's always possible you missed the critical eigenspace.

13.02.2026 23:06 — 👍 1    🔁 0    💬 1    📌 0

Anyone know if there are certificates, checkable in quadratic time, that a sparse, symmetric positive definite matrix really is positive definite? Emphasis on *any* such matrix: no structure or other assumptions allowed other than SPD.

scicomp.stackexchange.com/questions/45...

13.02.2026 07:59 — 👍 4    🔁 0    💬 2    📌 1

The nogil stuff is truly a thing of beauty or of baling wire, depending on one's perspective.

12.02.2026 20:32 — 👍 3    🔁 0    💬 0    📌 0

If he's not careful, soon he will turn into an Alan Kay graduate student.

12.02.2026 10:59 — 👍 2    🔁 0    💬 0    📌 0

This doesn't treat obfuscated arguments: the provers are unbounded, and the verifier's choice of bits to read can be hard to compute. But hopefully a clean description of the unbounded setting helps get more people to think about the bounded case!

www.alignmentforum.org/posts/DGt9mJ...

12.02.2026 10:30 — 👍 0    🔁 0    💬 0    📌 0