
@dchiang.bsky.social

27 Followers  |  40 Following  |  26 Posts  |  Joined: 29.11.2024

Latest posts by dchiang.bsky.social on Bluesky

I am recruiting a PhD student to work with me, Peter Cholak, Anand Pillay, and Andy Yang @pentagonalize.bsky.social on transformers and logic/model theory (or related topics). If you are interested, please email me with "FLaNN" in the subject line!

30.10.2025 19:23 · 👍 0 · 🔁 0 · 💬 0 · 📌 0

Read the cookbook: arxiv.org/abs/2510.00368

Join us for weekly seminars on formal language theory, ML, NLP, and more: flannseminars.github.io

03.10.2025 16:24 · 👍 1 · 🔁 2 · 💬 0 · 📌 0

Thanks to all the chefs: @ccwatson.bsky.social, @antonxue.bsky.social, @satwik77.bsky.social, @ll4r3n4.bsky.social, @lambdaviking.bsky.social, Emile Dos Santos Ferreira, @anejsvete.bsky.social, @dchiang.bsky.social

03.10.2025 16:24 · 👍 2 · 🔁 2 · 💬 1 · 📌 0

There is no better way to understand what transformers can do than to get your hands dirty and construct them, weight by weight. The Transformer Cookbook provides a guide for anyone aiming to understand the expressive power of transformers at this formal level.

03.10.2025 16:24 · 👍 1 · 🔁 2 · 💬 1 · 📌 0
The Transformer Cookbook
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a prob...

We present The Transformer Cookbook: a collection of recipes for programming algorithms directly into transformers!

Hungry for an induction head? Craving a Dyck language recognizer? We show you step-by-step how to cook up transformers for these algorithms and many more!

03.10.2025 16:24 · 👍 5 · 🔁 5 · 💬 1 · 📌 0
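To make the weight-by-weight idea concrete, here is a minimal sketch (my own illustration, not a recipe from the cookbook itself): a single causally masked softmax-attention head whose parameters are set by hand so that each position outputs the fraction of its prefix holding a designated symbol. Zero query/key maps make attention uniform over the prefix, and the value map is a 0/1 indicator; uniform-averaging heads like this are the kind of counting primitive that hand constructions build on.

```python
# Illustrative sketch only: a hand-set softmax-attention head (zero queries/keys,
# indicator values) computing, at each position, the fraction of the prefix equal
# to a chosen symbol. Names and dimensions here are assumptions for the demo.
import numpy as np

def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def prefix_fraction_head(tokens, symbol="a"):
    n = len(tokens)
    # value "embedding": 1.0 if the token is `symbol`, else 0.0   (shape (n, 1))
    v = np.array([[1.0 if t == symbol else 0.0] for t in tokens])
    # zero query/key projections => all attention scores equal before masking
    scores = np.zeros((n, n))
    # causal (future) mask: position i attends only to positions j <= i
    mask = np.triu(np.full((n, n), -np.inf), k=1)
    att = softmax(scores + mask)   # uniform over each prefix
    return att @ v                 # running fraction of `symbol`  (shape (n, 1))

print(prefix_fraction_head(list("abaa")).ravel())  # [1.0, 0.5, 0.667, 0.75]
```

The cookbook's actual recipes (induction heads, Dyck recognizers, and so on) compose primitives of this kind across layers; see the paper for the real constructions.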

Andy Yang, Christopher Watson, Anton Xue, Satwik Bhattamishra, Jose Llarena, William Merrill, Emile Dos Santos Ferreira, Anej Svete, David Chiang: The Transformer Cookbook https://arxiv.org/abs/2510.00368 https://arxiv.org/pdf/2510.00368 https://arxiv.org/html/2510.00368

02.10.2025 06:33 · 👍 0 · 🔁 2 · 💬 0 · 📌 0

Andy Yang @pentagonalize.bsky.social drove the conceptualization, theory, and experiments of this work. I was just the checker and editor!

23.06.2025 18:50 · 👍 0 · 🔁 0 · 💬 0 · 📌 0
Knee-Deep in C-RASP: A Transformer Depth Hierarchy
It has been observed that transformers with greater depth (that is, more layers) have more capabilities, but can we establish formally which capabilities are gained with greater depth? We answer this ...

Paper: arxiv.org/abs/2506.16055
Code: github.com/pentagonaliz...

23.06.2025 11:56 · 👍 0 · 🔁 0 · 💬 1 · 📌 0

Although there is a lot of wiggle room in how rounding/precision is defined, the experiments confirm our theoretical predictions surprisingly well!

23.06.2025 11:56 · 👍 0 · 🔁 0 · 💬 1 · 📌 0

The separating languages are very simple: L_k is the language of strings consisting of exactly k blocks, each a run of one or more copies of a single symbol; e.g., L_3 contains aba, aabbbbaaaaaa, etc. More blocks require more depth.

23.06.2025 11:56 · 👍 0 · 🔁 0 · 💬 1 · 📌 0
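For readers who want to poke at the separating languages, here is a quick membership check, written from the description in the thread (my own illustration, not code from the paper's repository): a string is in L_k iff it consists of exactly k maximal runs of a repeated symbol.

```python
# Quick sketch: membership in L_k = strings made of exactly k maximal blocks,
# each block a run of one or more copies of a single symbol.
# (Unofficial illustration based on the description in the post above.)
from itertools import groupby

def num_blocks(s: str) -> int:
    # number of maximal runs of equal symbols, e.g. "aabbbbaaaaaa" -> 3
    return sum(1 for _ in groupby(s))

def in_L(s: str, k: int) -> bool:
    return len(s) > 0 and num_blocks(s) == k

assert in_L("aba", 3) and in_L("aabbbbaaaaaa", 3)   # the examples from the post
assert not in_L("abab", 3) and not in_L("ab", 3)
```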

Further, we show that deeper programs/formulas in C-RASP are strictly more expressive than shallower programs/formulas. Together, these results imply that in the above-defined variant, deeper transformers are strictly more expressive than shallower transformers.

23.06.2025 11:56 · 👍 0 · 🔁 0 · 💬 1 · 📌 0

C-RASP is a programmer-friendly version of "temporal logic with future-masked counting." We show that both are exactly equivalent to soft-attention transformers with fixed precision outside attention but no rounding inside attention (to avoid under/overflow when summing over the sequence).

23.06.2025 11:56 · 👍 0 · 🔁 0 · 💬 1 · 📌 0
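As a rough illustration of the counting primitive (my sketch, not the paper's exact syntax or semantics): a C-RASP-style program works positionwise, and the only way to look leftward is a future-masked prefix count of a Boolean predicate, which can then be compared with other counts. For instance, "strictly more a's than b's so far":

```python
# Rough, unofficial illustration of C-RASP-style counting (not the paper's syntax):
# all operations are positionwise; the only access to the past is a future-masked
# prefix count of a Boolean sequence, and counts can be compared to yield new Booleans.
import numpy as np

def prefix_count(bools):
    # counting term: at position i, how many positions j <= i satisfy the predicate
    return np.cumsum(np.asarray(bools, dtype=int))

def majority_so_far(tokens):
    is_a = [t == "a" for t in tokens]
    is_b = [t == "b" for t in tokens]
    # comparing two counting terms gives a new positionwise Boolean
    return prefix_count(is_a) > prefix_count(is_b)

print(majority_so_far(list("abaab")))  # [ True False  True  True  True]
```

Roughly speaking, uniform attention over the prefix (as in the earlier sketch) is how a count or fraction like this shows up inside a transformer's attention.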

New on arXiv: Knee-Deep in C-RASP, by @pentagonalize.bsky.social, @cadilhac.bsky.social, and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.

23.06.2025 11:56 · 👍 2 · 🔁 1 · 💬 1 · 📌 0
We're Calling an Intervention: Exploring Fundamental Hurdles in Adapting Language Models to Nonstandard Text
We present a suite of experiments that allow us to understand the underlying challenges of language model adaptation to nonstandard text. We do so by designing interventions that approximate core feat...

arxiv.org/abs/2404.07304

23.04.2025 13:31 · 👍 1 · 🔁 0 · 💬 0 · 📌 0

(Out of the papers that Aarohi @aarsri.bsky.social has published while at Notre Dame, 80% have received an award!)

23.04.2025 13:30 · 👍 1 · 🔁 0 · 💬 1 · 📌 0

In contrast, on text with variation involving new words or meanings (e.g., "lie" vs. "cap"), far more data is needed, but it leads to a massive breakthrough in performance.

23.04.2025 13:30 · 👍 1 · 🔁 0 · 💬 1 · 📌 0

On text with character-level variation (e.g., "strategy" vs. "strat"), out-of-the-box performance improves even with a few additional training examples -- but approaches a plateau, suggesting that more data is not the solution.

23.04.2025 13:30 · 👍 1 · 🔁 0 · 💬 1 · 📌 0

Congratulations to Aarohi Srivastava @aarsri.bsky.social on winning the Best Paper Award at W-NUT at NAACL 2025! This paper applies various interventions simulating noisy text or dialectal variation to discover how different interventions have different effects.

23.04.2025 13:30 · 👍 3 · 🔁 1 · 💬 1 · 📌 0
Midwest Speech and Language Days 2025

If you're submitting an abstract to @colmweb.org, might as well submit it to MSLD too! nlp.nd.edu/msld25/

20.03.2025 03:05 · 👍 0 · 🔁 0 · 💬 0 · 📌 0
Midwest Speech and Language Days 2025

Registration at Midwest Speech and Language Days is free, poster printing is free, and we will be able to provide free lodging to a limited number of students. nlp.nd.edu/msld25/

20.03.2025 02:55 · 👍 0 · 🔁 0 · 💬 0 · 📌 0
Midwest Speech and Language Days 2025

The abstract submission deadline for Midwest Speech and Language Days is in two days, on March 20! Please submit an abstract! MSLD is non-archival, and submissions of both work-in-progress and previously published work are encouraged. nlp.nd.edu/msld25/

18.03.2025 15:50 · 👍 0 · 🔁 0 · 💬 2 · 📌 0
Midwest Speech and Language Days 2025

The meeting will feature keynote addresses by @mohitbansal.bsky.social, @davidrmortensen.bsky.social, Karen Livescu, and Heng Ji. Plus all of your great talks and posters! nlp.nd.edu/msld25

08.03.2025 18:35 · 👍 4 · 🔁 1 · 💬 0 · 📌 0
Midwest Speech and Language Days 2025

Midwest Speech and Language Days will be held Apr 15-16 at @NotreDame! Abstract submissions are due Mar 20, and the registration deadline is Mar 27. Financial assistance for students (lodging, poster printing) is available. nlp.nd.edu/msld25

08.03.2025 18:35 · 👍 0 · 🔁 2 · 💬 1 · 📌 0

Oops, and @sleyna.bsky.social

23.12.2024 22:56 · 👍 0 · 🔁 0 · 💬 0 · 📌 0

Oops, should be @pentagonalize.bsky.social

23.12.2024 22:56 · 👍 0 · 🔁 0 · 💬 1 · 📌 0

showing that arbitrary-precision average-hard attention transformers and poly(n)-precision softmax-attention transformers are in DLOGTIME-uniform TC0.

23.12.2024 22:55 · 👍 0 · 🔁 0 · 💬 0 · 📌 0
Transformers in Uniform TC$^0$
Previous work has shown that the languages recognized by average-hard attention transformers (AHATs) and softmax-attention transformers (SMATs) are within the circuit complexity class TC$^0$. However,...

(3) "Transformers in TC0" arxiv.org/abs/2409.13629. Previous work by
@lambdaviking.bsky.social, Ashish Sabharwal, and @sleyna.bsky.social has shown that transformers with log(n) precision (where n is the input length) are in the circuit complexity class TC0. This paper improves these results,

23.12.2024 22:55 · 👍 1 · 🔁 0 · 💬 1 · 📌 0

We study how transformers express formal language *transductions*. For example, unique-hard attention transformers are equivalent to star-free languages. But star-free languages don't have a single transduction analogue; they have at least three! Which one is it?

23.12.2024 22:55 · 👍 0 · 🔁 0 · 💬 1 · 📌 0

(2) With @sleyna.bsky.social, Dana Angluin, Jon Rawski, and Ashish Sabharwal, "Transformers as Transducers" arxiv.org/abs/2404.02040. To appear in TACL.

23.12.2024 22:55 · 👍 0 · 🔁 0 · 💬 1 · 📌 0

We also show how softmax-attention transformers can simulate many average-hard attention transformers (including Perez et al.'s well-known average-hard attention transformer that simulates a Turing machine). But it's more difficult than is often assumed!

23.12.2024 22:55 · 👍 0 · 🔁 0 · 💬 1 · 📌 0
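A side note on the intuition (mine, not a claim from the paper): the usual argument is that scaling attention scores by a large constant pushes softmax toward a uniform distribution over the maximum-scoring positions, i.e., toward average-hard attention; the subtlety the thread points to is that the required scale depends on the score gaps and the sequence length. A tiny numeric illustration:

```python
# Numeric illustration (not from the paper): as the score scale c grows, softmax
# attention concentrates uniformly on the maximum-scoring positions, mimicking
# average-hard attention. How large c must be depends on the score gap and length.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([1.0, 3.0, 3.0, 2.0])   # two tied maxima (positions 1 and 2)
for c in (1, 10, 100):
    print(c, np.round(softmax(c * scores), 3))
# the weights approach [0, 0.5, 0.5, 0] -- the average-hard attention pattern
```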
