I am recruiting a PhD student to work with me, Peter Cholak, Anand Pillay, and Andy Yang @pentagonalize.bsky.social on transformers and logic/model theory (or related topics). If you are interested, please email me with "FLaNN" in the subject line!
30.10.2025 19:23
Read the cookbook: arxiv.org/abs/2510.00368
Join us for weekly seminars on formal language theory, ML, NLP, and more: flannseminars.github.io
03.10.2025 16:24
Thanks to all the chefs: @ccwatson.bsky.social, @antonxue.bsky.social, @satwik77.bsky.social, @ll4r3n4.bsky.social, @lambdaviking.bsky.social, Emile Dos Santos Ferreira, @anejsvete.bsky.social, @dchiang.bsky.social
03.10.2025 16:24
There is no better way to understand what transformers can do than to get your hands dirty and construct them, weight by weight. The Transformer Cookbook provides a guide for anyone aiming to understand the expressive power of transformers at this formal level.
03.10.2025 16:24
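To make "weight by weight" concrete, here is a minimal numpy sketch of a hand-set attention head that copies the previous token's embedding forward, one of the two building blocks of an induction head. The dimensions, one-hot position encoding, and logit scale are illustrative choices of mine, not the cookbook's exact recipe.

```python
import numpy as np

n, d = 6, 4                      # sequence length, model width
rng = np.random.default_rng(0)
E = rng.standard_normal((n, d))  # token embeddings (stand-in values)
pos = np.eye(n)                  # one-hot position encoding

# Queries ask "who sits at my index minus one?"; keys answer "I sit at my index."
Q = np.roll(pos, -1, axis=1)     # query at position i matches key at i-1
K = pos
scale = 100.0                    # large scale makes the softmax nearly one-hot

scores = scale * (Q @ K.T)
mask = np.tril(np.ones((n, n)))  # causal mask: position i attends only to j <= i
scores = np.where(mask == 1, scores, -np.inf)
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)

out = A @ E                      # position i now (approximately) holds E[i-1]
print(np.allclose(out[1:], E[:-1], atol=1e-3))  # True: previous token copied
```

Composing this shift-by-one head with a second head that matches on the copied value gives the full induction-head behavior.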
We present The Transformer Cookbook: a collection of recipes for programming algorithms directly into transformers!
Hungry for an induction head? Craving a Dyck language recognizer? We show you step-by-step how to cook up transformers for these algorithms and many more!
03.10.2025 16:24
Andy Yang, Christopher Watson, Anton Xue, Satwik Bhattamishra, Jose Llarena, William Merrill, Emile Dos Santos Ferreira, Anej Svete, David Chiang: The Transformer Cookbook https://arxiv.org/abs/2510.00368 https://arxiv.org/pdf/2510.00368 https://arxiv.org/html/2510.00368
02.10.2025 06:33
Andy Yang @pentagonalize.bsky.social drove the conceptualization, theory, and experiments of this work. I was just the checker and editor!
23.06.2025 18:50
Although there is a lot of wiggle room in defining rounding/precision, our theoretical predictions are confirmed by experiments surprisingly well!
23.06.2025 11:56
The separating languages are very simple: L_k is the language of strings consisting of exactly k blocks, each one or more repetitions of a single symbol; e.g., L_3 contains aba, aabbbbaaaaaa, etc. More blocks require more depth.
23.06.2025 11:56
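A quick membership check for L_k, reading a "block" as a maximal run of a single repeated symbol; the function names here are mine, for illustration only.

```python
def num_blocks(s: str) -> int:
    """Count maximal runs of equal symbols, e.g. "aabbbbaaaaaa" -> 3."""
    if not s:
        return 0
    return 1 + sum(1 for x, y in zip(s, s[1:]) if x != y)

def in_L(s: str, k: int) -> bool:
    """A string is in L_k iff it has exactly k maximal blocks."""
    return num_blocks(s) == k

assert in_L("aba", 3) and in_L("aabbbbaaaaaa", 3)
assert not in_L("abab", 3)   # four blocks, not three
```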
Further, we show that deeper programs/formulas in C-RASP are strictly more expressive than shallower programs/formulas. Together, these results imply that in the above-defined variant, deeper transformers are strictly more expressive than shallower transformers.
23.06.2025 11:56
C-RASP is a programmer-friendly version of "temporal logic with future-masked counting." We show both are exactly equivalent to soft-attention transformers with fixed precision outside attention but no rounding inside attention (to avoid under/overflow when summing over the sequence).
23.06.2025 11:56
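To give a feel for the formalism, here is a rough Python emulation of C-RASP's future-masked prefix count, used to write a MAJORITY-style program (more a's than b's). This is my sketch of the programming style, not a construction from the paper.

```python
def count(bs):
    """C-RASP-style prefix count: at position i, how many j <= i satisfy bs[j]."""
    total, out = 0, []
    for b in bs:
        total += int(b)
        out.append(total)
    return out

def majority(s: str) -> bool:
    qa = [c == "a" for c in s]          # Q_a(i): symbol at i is 'a'
    qb = [c == "b" for c in s]          # Q_b(i): symbol at i is 'b'
    ca, cb = count(qa), count(qb)       # #[Q_a](i), #[Q_b](i)
    accept = [x > y for x, y in zip(ca, cb)]
    return bool(accept) and accept[-1]  # accept iff the test holds at the last position

assert majority("aab") and not majority("abb")
```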
New on arXiv: Knee-Deep in C-RASP, by @pentagonalize.bsky.social, @cadilhac.bsky.social, and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.
23.06.2025 11:56
(Out of the papers that Aarohi @aarsri.bsky.social has published while at Notre Dame, 80% have received an award!)
23.04.2025 13:30
In contrast, on text with variation involving new words or meanings (e.g., "lie" vs. "cap"), far more data is needed, but it leads to a massive breakthrough in performance.
23.04.2025 13:30
On text with character-level variation (e.g., "strategy" vs. "strat"), out-of-the-box performance improves with even a few additional training examples, but soon plateaus, suggesting that more data is not the solution.
23.04.2025 13:30
Congratulations to Aarohi Srivastava @aarsri.bsky.social on winning the Best Paper Award at W-NUT at NAACL 2025! This paper applies interventions that simulate noisy text or dialectal variation, showing that different kinds of variation affect models in distinctly different ways.
23.04.2025 13:30
If you're submitting an abstract to @colmweb.org, might as well submit it to MSLD too! nlp.nd.edu/msld25/
20.03.2025 03:05
Registration at Midwest Speech and Language Days is free, poster printing is free, and we will be able to provide free lodging to a limited number of students. nlp.nd.edu/msld25/
20.03.2025 02:55
The abstract submission deadline for Midwest Speech and Language Days is in two days, on March 20! Please submit an abstract! MSLD is non-archival, and submissions of both work-in-progress and previously published work are encouraged. nlp.nd.edu/msld25/
18.03.2025 15:50
The meeting will feature keynote addresses by @mohitbansal.bsky.social, @davidrmortensen.bsky.social, Karen Livescu, and Heng Ji. Plus all of your great talks and posters! nlp.nd.edu/msld25
08.03.2025 18:35
Midwest Speech and Language Days will be held Apr 15-16 at @NotreDame! Abstract submissions are due Mar 20, and the registration deadline is Mar 27. Financial assistance for students (lodging, poster printing) is available. nlp.nd.edu/msld25
08.03.2025 18:35
Oops, and @sleyna.bsky.social
23.12.2024 22:56
Oops, should be @pentagonalize.bsky.social
23.12.2024 22:56
showing that arbitrary-precision average-hard attention transformers and poly(n)-precision softmax-attention transformers are in DLOGTIME-uniform TC0.
23.12.2024 22:55
(3) "Transformers in TC0" arxiv.org/abs/2409.13629. Previous work by
@lambdaviking.bsky.social, Ashish Sabharwal, and @sleyna.bsky.social has shown that transformers with log(n) precision (where n is the input length) are in the circuit complexity class TC0. This paper improves these results,
23.12.2024 22:55
We study how transformers express formal language *transductions*. For example, unique-hard attention transformers are equivalent to star-free languages. But star-free languages don't have a single transduction analogue; they have at least three! Which one is it?
23.12.2024 22:55
We also show how softmax-attention transformers can simulate many average-hard attention transformers (including Pérez et al.'s well-known average-hard attention transformer simulating a Turing machine). But it's harder than is often assumed!
23.12.2024 22:55
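A tiny numpy illustration of why the simulation is delicate (my example, not the paper's construction): softmax only approaches average-hard attention as the logit scale grows, and the scale needed depends on how close the runner-up score is.

```python
import numpy as np

def average_hard(scores):
    """Average-hard attention: uniform weight over the argmax positions."""
    hard = (scores == scores.max()).astype(float)
    return hard / hard.sum()

def softmax(scores, scale=1.0):
    """Softmax attention with logits multiplied by a scale (inverse temperature)."""
    z = np.exp(scale * (scores - scores.max()))
    return z / z.sum()

scores = np.array([1.0, 1.0, 0.99])  # two tied maxima, one close runner-up
print(average_hard(scores))          # [0.5, 0.5, 0.0] exactly
print(softmax(scores, scale=1.0))    # nearly uniform: far from hard attention
print(softmax(scores, scale=1000.0)) # close to [0.5, 0.5, 0.0], but never exact
```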