"Literally" is a nice touch.
28.02.2026 22:25
What's the best current SNARK or computationally sound proof (if zero-knowledge doesn't matter) to use for this kind of arbitrary-length computation?
28.02.2026 11:59
We now welcome OpenAI, Microsoft, the Australian Department of Industry, Science and Resources' AI Safety Institute, the AI Safety Tactical Opportunities Fund, Sympatico Ventures, and Renaissance Philanthropy. ❤️
19.02.2026 20:54
Thank you to all of our partners! The 2025 launch was backed by an international coalition including the Canadian AI Safety Institute, CIFAR, Schmidt Sciences, AWS, Anthropic, Halcyon Futures, the Safe AI Fund, UK Research and Innovation, and UK ARIA. ❤️
19.02.2026 20:54
I'm excited that AISI is announcing the first 60 Alignment Project grants, bringing more independent experts and ideas into AI alignment and control research! Since the RFP last year, we've grown the total funding to £27M. Which means more ideas will be explored! 🧵
www.aisi.gov.uk/blog/funding...
Unless you make multiple accounts!
18.02.2026 14:12
It is unfortunate that arxiv has no mechanism for setting a social media image for a paper. I do not need the enormous ARXIV logo for this kind of link.
17.02.2026 22:07
Oops, here's the real paper link: arxiv.org/abs/2502.14828
17.02.2026 22:06
The scala replacements are truly cursed.
17.02.2026 22:05
1. No, BPJ is fully black box.
2. Yep, by many of the same people. :)
arxiv.org/html/2502.14...
Work by the AISI Red Team + advisors: Xander Davies, Giorgi Giglemiani, Edmund Lau, Eric Winsor, me, and Yarin Gal. The Red Team is hiring if you like breaking things to motivate stronger defences!
17.02.2026 20:55
We believe this kind of attack will be very difficult to defend against with per-point jailbreak defences, but more tractable with defences that notice patterns across many queries, since the method generates many failed attempts along the way.
arxiv.org/abs/2602.15001
New "boundary point jailbreaking" method against LLM safeguards (with prior disclosure to multiple labs): it uses noised versions of harmful queries to turn sparse feedback from failed attacks into dense feedback. 🧵
www.aisi.gov.uk/blog/boundar...
Yeah, alas I need a rigorous, deterministic proof (for Lean purposes), so no randomness allowed. But also we should 100% get a standardised SNARK system into Lean as an optional axiom. There are tricks for how to formalise that alongside a conventional kernel, but nothing that can't be surmounted.
14.02.2026 20:52
Eigenvectors don't work as a certificate: SPD matrices are usually full rank, and checking that you have an eigenvector basis is not quadratic time.
14.02.2026 09:28
They don't give you rigorous bounds, though. It's always possible you missed the critical eigenspace.
13.02.2026 23:06
Anyone know if there are certificates that a sparse, symmetric positive definite matrix is positive definite, which can be checked in quadratic time? Emphasis on *any* such matrix, no structure or other assumptions allowed other than SPD.
scicomp.stackexchange.com/questions/45...
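To make the eigenvector objection in this thread concrete, here is a minimal floating-point sketch (so not the rigorous, deterministic check the thread asks for). The function name `check_eigen_certificate` and the example matrices are hypothetical, chosen only for illustration. The point it shows: the residual and positivity checks are cheap, but they can pass for an indefinite matrix if the claimed eigenvectors skip an eigenspace, so the verifier also has to check that they form a basis, and the check used here is a dense n-by-n product.

```python
import numpy as np


def check_eigen_certificate(A, V, lam, tol=1e-8):
    """Floating-point check of a claimed eigen-certificate that A is SPD.

    Returns (residual_ok, positive, basis_ok). This is an illustration,
    not a rigorous proof: it uses tolerances rather than exact arithmetic.
    """
    n = A.shape[0]
    # Each column must satisfy A v_j = lam_j v_j. With a sparse A holding
    # O(n) nonzeros this step would be O(n^2) overall.
    residual_ok = bool(np.allclose(A @ V, V * lam, atol=tol))
    # Every claimed eigenvalue must be positive: O(n).
    positive = bool(np.all(lam > 0))
    # The columns must form an orthonormal basis. V.T @ V is a dense
    # n-by-n product, which is the superquadratic part of this check.
    basis_ok = bool(np.allclose(V.T @ V, np.eye(n), atol=tol))
    return residual_ok, positive, basis_ok


# Hypothetical indefinite matrix with eigenvalues -1, 2, 3.
A = np.diag([-1.0, 2.0, 3.0])
# A dishonest certificate that repeats the eigenvector for eigenvalue 2
# and never mentions the eigenspace of -1.
V_bad = np.array([[0.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
lam_bad = np.array([2.0, 2.0, 3.0])

print(check_eigen_certificate(A, V_bad, lam_bad))  # (True, True, False)
```

Only the basis check rejects the dishonest certificate, which is why dropping it to save time is not an option.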
The nogil stuff is truly a thing of beauty or of baling wire, depending on one's perspective.
12.02.2026 20:32
If he's not careful, soon he will turn into an Alan Kay graduate student.
12.02.2026 10:59
This doesn't treat obfuscated arguments: the provers are unbounded, and the verifier's choice of bits to read can be hard to compute. But hopefully a clean description of the unbounded setting helps get more people to think about the bounded case!
www.alignmentforum.org/posts/DGt9mJ...
The cross-examination trick is to have one prover simulate a naive verifier, and the other "cross-examine" the simulation so the actual verifier need only check one step of the overall computation. Reading only O(log n) bits then gets us all the way to PSPACE/poly.
12.02.2026 10:30
Joint with Jonah Brown-Cohen, Simon Marshall, Ilan Newman, Georgios Piliouras, and Mario Szegedy (now I have papers with multiple Szegedy brothers :)).
arxiv.org/abs/2602.08630
New complexity theory paper mapping the precise query complexity of debate, given unbounded provers. No new safety ideas: the goal is a self-contained presentation of debate + cross-examination, with the precise complexity class it achieves. 🧵
12.02.2026 10:30
Apparently last year there was a cyberattack on the Irving Medical Center at Columbia University, which resulted in my name getting leaked. I...feel like they didn't need a cyberattack to do that in this case?
11.02.2026 22:13
Ah, actually I had just failed to read correctly: yes, certainly the answer in this case would be to expose it as a feed.
09.02.2026 23:37
If you have the models tweak NetNewsWire I can more easily inherit your improvements. :)
09.02.2026 23:01
The random oracle hypothesis is mostly true.
03.02.2026 15:58
Today we're releasing the International AI Safety Report 2026: the most comprehensive evidence-based assessment of AI capabilities, emerging risks, and safety measures to date. 🧵
(1/19)
Being one of the two Deputy Directors of AISI's Research Unit is a very central and important role! Please apply if interested!
> This isn't your average Civil Service job. For 9–12 months, you'll co-lead one of the world's most influential AI safety research organisations.
x.com/nateburnikel...
Right, but it isn't big enough: there are ((2^n)!)^(2^n) keyed permutations, which is doubly exponential, and only exponentially many polynomial size circuits.
01.02.2026 20:21
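The counting gap in the keyed-permutation post can be spelled out. This is a rough sketch under assumptions not stated in the post: n-bit keys, n-bit blocks, and circuits of size s = poly(n) over a fixed finite gate basis.

```latex
\begin{align*}
\#\{\text{keyed permutations}\} &= \bigl((2^n)!\bigr)^{2^n}, \\
\log_2 \bigl((2^n)!\bigr)^{2^n} &= 2^n \log_2 (2^n)!
  \;\ge\; 2^n \cdot 2^n \,(n - \log_2 e) \;=\; 4^n (n - \log_2 e), \\
\#\{\text{circuits of size } s\} &\le 2^{O(s \log s)}, \qquad s = \mathrm{poly}(n).
\end{align*}
```

The second line uses the bound m! >= (m/e)^m with m = 2^n. The logarithm of the first count is exponential in n, while the logarithm of the circuit count is polynomial in n, so polynomial size circuits can realise only a vanishing fraction of keyed permutations.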