
Geoffrey Irving

@girving.bsky.social

Chief Scientist at the UK AI Security Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc.

3,883 Followers  |  119 Following  |  538 Posts  |  Joined: 12.07.2023

Posts by Geoffrey Irving (@girving.bsky.social)

"Literally" is a nice touch.

28.02.2026 22:25 | 👍 3    🔁 0    💬 0    📌 0

What's the best current SNARK or computationally sound proof (if zero-knowledge doesn't matter) to use for this kind of arbitrary-length computation purpose?

28.02.2026 11:59 | 👍 0    🔁 0    💬 0    📌 0

We now welcome OpenAI, Microsoft, the Australian Department of Industry, Science and Resources' AI Safety Institute, the AI Safety Tactical Opportunities Fund, Sympatico Ventures, and Renaissance Philanthropy. ❤️

19.02.2026 20:54 | 👍 0    🔁 0    💬 0    📌 0

Thank you to all of our partners! The 2025 launch was backed by an international coalition including the Canadian AI Safety Institute, CIFAR, Schmidt Sciences, AWS, Anthropic, Halcyon Futures, the Safe AI Fund, UK Research and Innovation, and UK ARIA. ❤️

19.02.2026 20:54 | 👍 0    🔁 0    💬 1    📌 0
Funding 60 projects to advance AI alignment research | AISI Work The Alignment Project welcomes its first cohort of grantees, and new partners join the coalition, bringing total funding to £27m.

I'm excited that AISI is announcing the first 60 Alignment Project grants, bringing more independent experts and ideas into AI alignment and control research! Since the RFP last year, we've grown the total funding to £27M, which means more ideas will be explored! 🧵

www.aisi.gov.uk/blog/funding...

19.02.2026 20:54 | 👍 5    🔁 0    💬 1    📌 0

Unless you make multiple accounts!

18.02.2026 14:12 | 👍 1    🔁 0    💬 1    📌 0

It is unfortunate that arXiv has no mechanism for setting a social media image for a paper. I do not need the enormous ARXIV logo for this kind of link.

17.02.2026 22:07 | 👍 9    🔁 0    💬 0    📌 0
Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs LLM developers have imposed technical interventions to prevent fine-tuning misuse attacks, attacks where adversaries evade safeguards by fine-tuning the model using a public API. Previous work has est...

Oops, here's the real paper link: arxiv.org/abs/2502.14828

17.02.2026 22:06 | 👍 2    🔁 0    💬 1    📌 0

The Scala replacements are truly cursed.

17.02.2026 22:05 | 👍 1    🔁 0    💬 0    📌 0
Fundamental Limitations in Defending LLM Finetuning APIs

1. No, BPJ is fully black box.
2. Yep, by many of the same people. :)

arxiv.org/html/2502.14...

17.02.2026 22:04 | 👍 6    🔁 0    💬 1    📌 0

Work by the AISI Red Team + advisors: Xander Davies, Giorgi Giglemiani, Edmund Lau, Eric Winsor, me, and Yarin Gal. The Red Team is hiring if you like breaking things to motivate stronger defences! 💙

17.02.2026 20:55 | 👍 9    🔁 0    💬 0    📌 0

We believe this kind of attack will be very difficult to defend against with per-point jailbreak defences, but more tractable with defences that notice patterns across many queries, since the method generates many failed attempts along the way.

arxiv.org/abs/2602.15001

17.02.2026 20:55 | 👍 9    🔁 0    💬 1    📌 0
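
For intuition only, here is a minimal sketch (my illustration, not AISI's actual defence; all names are made up) of a cross-query statistic such a defence could watch: the burst of refused, near-duplicate queries that a boundary search leaves behind.

```python
# Illustrative only: a per-account monitor that flags the refusal bursts a
# boundary-search attack generates along the way. Not AISI's method.
from collections import deque

class RefusalBurstMonitor:
    """Track recent query outcomes for one account and flag refusal bursts."""

    def __init__(self, window: int = 200, threshold: float = 0.3):
        self.events: deque = deque(maxlen=window)  # True = query was refused
        self.threshold = threshold

    def observe(self, refused: bool) -> bool:
        """Record one outcome; True once the windowed refusal rate looks attack-like."""
        self.events.append(refused)
        full = len(self.events) == self.events.maxlen
        return full and sum(self.events) / len(self.events) > self.threshold
```

A real defence would also need to compare queries for near-duplication and calibrate thresholds against benign traffic; this only shows the shape of the statistic.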

New "boundary point jailbreaking" method against LLM safeguards (with prior disclosure to multiple labs) by using noised versions of harmful queries to turn sparse feedback from failed attacks into dense feedback. 🧡

www.aisi.gov.uk/blog/boundar...

17.02.2026 20:55 β€” πŸ‘ 43    πŸ” 5    πŸ’¬ 2    πŸ“Œ 3
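
As a rough sketch of the loop described above (placeholder functions I invented, not the paper's algorithm; see arxiv.org/abs/2602.15001 for the real method):

```python
# Hypothetical outline of "sparse feedback -> dense feedback": sample noised
# variants of a blocked query and measure where accept/reject flips, which
# locates the safeguard's decision boundary. All functions are stand-ins.

def refusal(query: str) -> bool:
    """Placeholder for the target safeguard: True means the query is blocked."""
    raise NotImplementedError

def add_noise(query: str, strength: float) -> str:
    """Placeholder perturbation, e.g. paraphrasing or token-level noise."""
    raise NotImplementedError

def boundary_signal(blocked_query: str, trials: int = 100, strength: float = 0.5):
    """A failed attack alone yields one bit; many noised variants yield a dense
    estimate of the query's distance to the refusal boundary."""
    variants = [add_noise(blocked_query, strength) for _ in range(trials)]
    accept_rate = sum(not refusal(v) for v in variants) / trials
    return accept_rate, variants
```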

Yeah, alas I need a rigorous, deterministic proof (for Lean purposes), so no randomness allowed. But also we should 100% get a standardised SNARK system into Lean as an optional axiom. There are tricks for how to formalise that alongside a conventional kernel, but nothing that can't be surmounted.

14.02.2026 20:52 | 👍 1    🔁 0    💬 1    📌 0
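
To make the "optional axiom" idea concrete, a hypothetical Lean 4 sketch (none of these declarations exist; this is my guess at the shape, not a proposal Lean has adopted):

```lean
-- Hypothetical shape of a SNARK verifier exposed to Lean as an optional axiom.
-- `Claim` encodes a statement such as "program P halts with output y", and
-- `holds` gives its intended semantics; both are stand-ins, not existing APIs.
axiom Claim : Type
axiom holds : Claim → Prop

-- The verifier itself would run as trusted, kernel-adjacent code...
axiom snarkVerify : Claim → ByteArray → Bool

-- ...and this is the optional axiom a proof would invoke: computational
-- soundness. Anyone distrusting SNARKs can audit uses via `#print axioms`.
axiom snark_sound : ∀ (c : Claim) (proof : ByteArray),
    snarkVerify c proof = true → holds c
```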

Eigenvectors don't work as a certificate: SPD matrices are usually full rank, and checking that you have an eigenvector basis is not quadratic time.

14.02.2026 09:28 | 👍 1    🔁 0    💬 0    📌 0
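
Rough cost accounting behind that objection (my gloss; the post asserts only the conclusion): given a claimed eigendecomposition A = VΛVᵀ of a sparse SPD matrix A, the verifier must check

```latex
\[
  \underbrace{A V = V \Lambda}_{n \text{ sparse matvecs: } O(n \cdot \mathrm{nnz}(A))}
  \qquad\text{and}\qquad
  \underbrace{V^\top V = I}_{\text{dense } n \times n \text{ product: } O(n^3)},
\]
```

so certifying that the columns of V really form a complete orthonormal basis already costs more than quadratic time, before one can conclude A ≻ 0 from Λ ≻ 0.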

They don't give you rigorous bounds, though. It's always possible you missed the critical eigenspace.

13.02.2026 23:06 | 👍 1    🔁 0    💬 1    📌 0

Anyone know if there are certificates for any sparse, symmetric positive definite matrix to be positive definite, that can be checked in quadratic time? Emphasis on *any* such matrix, no structure or other assumptions allowed other than SPD.

scicomp.stackexchange.com/questions/45...

13.02.2026 07:59 | 👍 4    🔁 0    💬 2    📌 1
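
For reference, the natural candidate certificate and why it does not obviously meet the quadratic bound (an illustrative sketch; it uses a floating-point tolerance where a rigorous certificate would need exact arithmetic):

```python
# Candidate certificate: a lower-triangular L with positive diagonal and
# A = L L^T proves A is SPD. But verifying the product is only quadratic if
# L stays as sparse as A; Cholesky fill-in can push the check toward O(n^3).
import numpy as np
from scipy.sparse import csc_matrix, tril

def check_cholesky_certificate(A: csc_matrix, L: csc_matrix, tol: float = 1e-10) -> bool:
    """Accept iff L is lower triangular, has positive diagonal, and L @ L.T ~= A."""
    if (L - tril(L)).nnz != 0:       # reject if L has entries above the diagonal
        return False
    if np.any(L.diagonal() <= 0):    # positive pivots make L nonsingular
        return False
    residual = L @ L.T - A           # the expensive step: cost grows with fill-in
    return abs(residual).max() <= tol
```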

The nogil stuff is truly a thing of beauty or of baling wire, depending on one's perspective.

12.02.2026 20:32 | 👍 3    🔁 0    💬 0    📌 0

If he's not careful, soon he will turn into an Alan Kay graduate student.

12.02.2026 10:59 | 👍 2    🔁 0    💬 0    📌 0

This doesn't treat obfuscated arguments: the provers are unbounded, and the verifier's choice of bits to read can be hard to compute. But hopefully a clean description of the unbounded setting helps get more people to think about the bounded case!

www.alignmentforum.org/posts/DGt9mJ...

12.02.2026 10:30 | 👍 0    🔁 0    💬 0    📌 0

The cross-examination trick is to have one prover simulate a naive verifier, and the other "cross-examine" the simulation so the actual verifier need only check one step of the overall computation. Reading only O(log n) bits then gets us all the way to PSPACE/poly.

12.02.2026 10:30 | 👍 1    🔁 0    💬 1    📌 0
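
Schematically, as I read the post above (notation mine; the O(log n) bound and the class are from the thread):

```latex
% Prover A commits to the naive verifier's full transcript T (far too long
% to read); prover B cross-examines by naming one allegedly bad step i; the
% real verifier checks only that single local transition:
\[
  \text{verifier reads } O(\log n) \text{ bits around step } i
  \;\Longrightarrow\;
  \text{debate captures } \mathsf{PSPACE}/\mathrm{poly}.
\]
```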

Joint with Jonah Brown-Cohen, Simon Marshall, Ilan Newman, Georgios Piliouras, and Mario Szegedy (now I have papers with multiple Szegedy brothers :)).

arxiv.org/abs/2602.08630

12.02.2026 10:30 | 👍 3    🔁 0    💬 1    📌 0

New complexity theory paper mapping the precise query complexity of debate, given unbounded provers. No new safety ideas: the goal is a self-contained presentation of debate + cross-examination, with the precise complexity class it achieves. 🧵

12.02.2026 10:30 | 👍 5    🔁 1    💬 1    📌 0

Apparently last year there was a cyberattack on the Irving Medical Center at Columbia University, which resulted in my name getting leaked. I...feel like they didn't need a cyberattack to do that in this case?

11.02.2026 22:13 | 👍 4    🔁 0    💬 0    📌 0

Ah, actually I had just failed to read correctly: yes, certainly the answer in this case would be to expose it as a feed.

09.02.2026 23:37 | 👍 0    🔁 0    💬 0    📌 0

If you have the models tweak NetNewsWire I can more easily inherit your improvements. :)

09.02.2026 23:01 | 👍 1    🔁 0    💬 1    📌 0

The random oracle hypothesis is mostly true.

03.02.2026 15:58 | 👍 2    🔁 0    💬 0    📌 0

Today we're releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵

(1/19)

03.02.2026 13:16 | 👍 58    🔁 30    💬 1    📌 17

Being one of the two Deputy Directors of AISI's Research Unit is a very central and important role! Please apply if interested!

> This isn't your average Civil Service job. For 9–12 months, you'll co-lead one of the world's most influential AI safety research organisations.

x.com/nateburnikel...

02.02.2026 17:13 | 👍 2    🔁 1    💬 0    📌 0

Right, but it isn't big enough: there are ((2^n)!)^(2^n) keyed permutations, which is doubly exponential, and only exponentially many polynomial size circuits.

01.02.2026 20:21 | 👍 0    🔁 0    💬 0    📌 0
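
Spelled out, the counting argument (assuming n-bit keys, which the post does not state explicitly):

```latex
% Each of the 2^n keys independently selects one of the (2^n)! permutations
% of {0,1}^n, while a circuit of size s is describable in O(s log s) bits:
\[
  \#\{\text{keyed permutations}\} = \bigl((2^n)!\bigr)^{2^n} \;\ge\; 2^{2^n},
  \qquad
  \#\{\text{circuits of size } s\} \;\le\; 2^{O(s \log s)},
\]
% so with s = poly(n) there are only 2^{poly(n)} circuits, and almost every
% keyed permutation is not computable by any polynomial-size circuit.
```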