Picture of a paragraph from Emma by Jane Austen. It reads: "Such an adventure as this, a fine young man and a lovely young woman thrown together in such a way, could hardly fail of suggesting certain ideas to the coldest heart and the steadiest brain. So Emma thought, at least. Could a linguist, could a grammarian, could even a mathematician have seen what she did, have witnessed their appearance together, and heard their history of it, without feeling that circumstances had been at work to make them peculiarly interesting to each other? How much more must an imaginist, like herself, be on fire with speculation and foresight? especially with such a groundwork of anticipation as her mind had already made."
According to Jane Austen, linguists are extraordinarily cold-hearted.
(Though at least we're not as bad as mathematicians!)
03.08.2025 16:10
If all goes well, there will be a paper by you on there soon!
16.07.2025 20:24
So much research is being done about LLMs that it's hard to stay on top of the literature.
To help with this, I've made a list of all the most important papers from the past 8 years:
rtmccoy.com/pubs/
I hope you enjoy!
16.07.2025 16:35
July the 4th be with you!
04.07.2025 14:45
New paper: "Large Language Models and Emergence: A Complex Systems Perspective" (D. Krakauer, J. Krakauer, M. Mitchell).
We look at claims of "emergent capabilities" & "emergent intelligence" in LLMs from the perspective of what emergence means in complexity science.
arxiv.org/pdf/2506.11135
16.06.2025 13:15
The word "laundry" contains both steps of the laundry process:
1. Undry
2. Dry
04.06.2025 19:14
Flyer for PragLM @ COLM '25 listing important dates.
Happy to announce the first workshop on Pragmatic Reasoning in Language Models: PragLM @ COLM 2025!
How do LLMs engage in pragmatic reasoning, and what core pragmatic capacities remain beyond their reach?
sites.google.com/berkeley.edu/praglm/
Submit by June 23rd
28.05.2025 18:21
Had a fun visit to UChicago/TTIC over the past couple days - really great group doing NLP/CompLing there!
24.05.2025 14:44
I've been excited about meta-learning lately, in part because its two levels of optimization provide a way to separately model evolution and development. (That said, existing approaches are not very evolutionarily realistic in how the outer loop of optimization is implemented.)
22.05.2025 03:50
Takeaways:
- Bayesian methods & neural networks can work together, and are improved by doing so!
- Neural networks can have strong priors, despite the common view that they are blank slates
- Strong inductive biases do not require strong representational constraints
14/n
20.05.2025 19:16
Left: A plot showing recursion results for standard and prior-trained neural networks. The x-axis shows levels of recursion ranging from 0 to 10, and the y-axis shows accuracy. As the levels of recursion increase, the accuracy drops for both models, but it drops much more rapidly for the standard model than the prior-trained model.
Right: A plot showing priming results. There are 4 sub-plots, for 4 types of sentences: short plausible, long plausible, short implausible, and long implausible. In all 4 plots, the prior-trained network shows a greater degree of priming than the standard neural network.
More dramatically, it substantially outperforms the standard neural network at learning recursion (left) and priming (right; a lower value on the y-axis shows a greater degree of priming).
13/n
20.05.2025 19:16
A plot showing perplexity values. Note that for perplexity, lower is better. A standard neural network achieves perplexity ranging from about 19.70 to 19.80, with a median around 19.75. A prior-trained neural network achieves perplexity ranging from about 19.63 to 19.74, with a median around 19.67. The best model from prior literature is indicated as having a perplexity of about 19.69.
Here, its perplexity is slightly better (i.e., lower) than that of a standard neural network.
12/n
20.05.2025 19:15
Because it has the flexibility of a neural network, the prior-trained model can also learn in a setting that is intractable for the Bayesian model: Learning aspects of English syntax from millions of words of naturalistic text.
11/n
20.05.2025 19:14
Plots showing results for formal languages. On the left is a line graph which has "number of training examples" as its x-axis and "F-score" as its y-axis. Three models have lines in this plot: a Bayesian model, a standard neural network, and a prior-trained neural network. The Bayesian model and prior-trained neural network perform similarly, while the standard neural network does much worse than both of them.
On the right is a table showing the amount of training time used by each approach. The Bayesian model uses from 1 minute to 7 days of training time. The neural networks (whether standard or prior-trained) use from 10 milliseconds to 2.5 minutes.
Even though it is a neural network, the prior-trained model can learn formal languages from small numbers of examples - far outperforming a standard neural network, and matching a Bayesian model at a fraction of the computational cost.
10/n
20.05.2025 19:14
We call the resulting system a *prior-trained neural network*, because it has been trained to have a particular prior.
9/n
20.05.2025 19:13
Two examples of formal languages. The first example shown is AnBn, described as "n copies of A followed by n copies of B", with some example strings from the formal language being AB, AABB, and AAABBB.
The second example shown is XXX, described as "any string X repeated three times", with some example strings from the formal language being AAA, BABABA, and ABBABBABB.
Inspired by a model from Yang & @spiantado.bsky.social, the prior that we use is a distribution over formal languages (a formal language = a set of strings defined by an abstract rule). We have a neural network meta-learn by observing many formal languages sampled from this prior.
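For concreteness, here is a toy Python sketch of what sampling tasks from a prior over formal languages can look like, using the two example languages above. Everything in it (the two-language prior, the generators, the parameters) is invented for illustration; it is not the Yang & Piantadosi-style prior the paper actually uses.

```python
# Toy sketch of "sampling formal languages from a prior" (illustration only;
# the paper's actual prior follows Yang & Piantadosi and is much richer).
# A sampled language is a rule for generating strings; a task is a handful
# of strings drawn from one sampled language.
import random

def sample_anbn(max_n=4):
    """AnBn: n copies of A followed by n copies of B, e.g. AABB."""
    n = random.randint(1, max_n)
    return "A" * n + "B" * n

def sample_xxx(max_len=3, alphabet="AB"):
    """XXX: some string X repeated three times, e.g. BABABA."""
    x = "".join(random.choice(alphabet) for _ in range(random.randint(1, max_len)))
    return x * 3

def sample_task(num_strings=8):
    """Sample a language from the toy prior, then sample strings from it."""
    generator = random.choice([sample_anbn, sample_xxx])  # uniform toy prior
    return [generator() for _ in range(num_strings)]

print(sample_task())  # e.g. ['AB', 'AAABBB', ...] or ['AAA', 'ABABAB', ...]
```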
8/n
20.05.2025 19:13
In MAML, a model is exposed to many tasks. After each task, the model's weights are adjusted so that, if it were taught the same task again, it would perform better. As MAML proceeds, the model converges to a state from which it can learn any task in the distribution.
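As a concrete, heavily simplified illustration of that inner/outer structure, here is a first-order Python sketch on toy 1-D regression tasks. It is not the paper's setup, and full MAML additionally backpropagates through the inner update; this only shows the shape of the two loops.

```python
# Minimal first-order sketch of the MAML idea (illustration only; real MAML
# differentiates through the inner update). Tasks are 1-D regressions
# y = a*x with a task-specific slope a; the "model" is a single weight w.
import random

def make_task():
    a = random.uniform(0.5, 2.0)                      # task-specific slope
    data = [(x, a * x) for x in
            (random.uniform(-1.0, 1.0) for _ in range(10))]
    return data[:5], data[5:]                         # train / test split

def grad(w, data):
    # Gradient of the mean squared error of the prediction w*x on this data.
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)

w = 0.0                                               # the meta-learned initialization
inner_lr, outer_lr = 0.5, 0.01
for _ in range(5000):
    train, test = make_task()                         # one task from the distribution
    w_adapted = w - inner_lr * grad(w, train)         # inner loop: adapt to this task
    w -= outer_lr * grad(w_adapted, test)             # outer loop: improve the initialization
print(w)                                              # a starting point tuned to the task distribution
```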
7/n
20.05.2025 19:12
The key component is meta-learning, aka "learning to learn": a process in which a model is shown many tasks, giving it priors (inductive biases) that allow it to learn new tasks more easily. The type of meta-learning we use is MAML, from @chelseafinn.bsky.social, Abbeel, and Levine.
6/n
20.05.2025 19:08
A schematic diagram of our procedure. We start with a Bayesian model, here visualized with Bayes' rule and some example grammatical rules that could be sampled from a Bayesian model's prior. Then, we sample several tasks from that Bayesian model's prior, which can serve as training data. Finally, we have a neural network meta-learn from these sampled tasks. The whole process is visualized, going from left to right, as "Bayesian model", then an arrow labeled "sampling", then "training data", then an arrow labeled "meta-learning", and finally "neural network."
Our approach (inductive bias distillation) has 3 steps:
1. Use a Bayesian model to define an inductive bias (a prior)
2. Sample learning tasks from the Bayesian model
3. Have a neural network meta-learn from these sampled tasks, to give it the Bayesian model's prior
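Putting the three steps together, a hypothetical skeleton might look like the following; every function name here is a placeholder for illustration, not the paper's actual code or API.

```python
# Hypothetical skeleton of inductive bias distillation; every function name
# below is a placeholder for illustration, not the paper's actual code or API.

def sample_task_from_prior():
    """Steps 1-2: draw one learning task from the Bayesian model's prior,
    e.g. a small set of strings generated by one sampled formal language."""
    raise NotImplementedError  # stands in for the Bayesian model's sampler

def adapt(model, task):
    """Inner loop: take a few gradient steps on the task's data."""
    raise NotImplementedError

def meta_update(model, adapted_model, task):
    """Outer loop: adjust the initial weights so that adaptation works better."""
    raise NotImplementedError

def inductive_bias_distillation(model, num_meta_steps):
    """Step 3: MAML-style meta-learning over tasks sampled from the prior,
    so that the network's initial weights come to encode that prior."""
    for _ in range(num_meta_steps):
        task = sample_task_from_prior()
        adapted = adapt(model, task)
        meta_update(model, adapted, task)
    return model  # a "prior-trained" neural network
```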
5/n
20.05.2025 19:07
In this work with Tom Griffiths @cocoscilab.bsky.social, we propose an approach for creating a system that has the strengths of both modeling traditions - the rapid learning of a Bayesian model combined with the flexible representations of a neural network.
4/n
20.05.2025 19:06
Left: A screenshot of ChatGPT describing itself as an AI language model developed by OpenAI. Right: A bar chart comparing the quantity of text seen by human children vs. GPT-3. The bar for GPT-3 is far higher than for humans, showing that neural networks get far more linguistic data than humans do.
Neural networks have flexible representations that allow them to handle noisy natural data - as evidenced by the success of large language models. However, they notoriously require huge numbers of examples.
3/n
20.05.2025 19:06
Screenshot of a demo of Bayesian word learning (Xu & Tenenbaum 2007). After a few examples, the Bayesian learner figures out that naysayer means "horse" (rather than being more specific, "horse number 4", or more general, "mammal").
Bayesian models can learn from few examples because they have strong inductive biases - factors that guide generalization. But the costs of inference and the difficulty of specifying generative models can make naturalistic data a challenge.
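To make the "strong inductive biases enable learning from few examples" point concrete, here is a toy Python sketch in the spirit of Xu & Tenenbaum's size principle. The hypothesis sets, the uniform prior, and the numbers are all invented for illustration, not taken from the paper or the original study.

```python
# Toy Bayesian word learner with nested hypotheses and the "size principle":
# examples are assumed to be sampled uniformly from the true concept, so
# smaller consistent hypotheses get higher likelihood. All sets, priors, and
# numbers below are made up for illustration.
HYPOTHESES = {
    "this particular horse": {"horse4"},
    "horses":                {"horse1", "horse2", "horse3", "horse4"},
    "mammals":               {"horse1", "horse2", "horse3", "horse4",
                              "dog1", "dog2", "cat1", "cat2"},
}
PRIOR = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}   # uniform prior

def posterior(examples):
    """P(hypothesis | examples) with likelihood (1/|hypothesis|)^n."""
    scores = {}
    for h, extension in HYPOTHESES.items():
        consistent = all(e in extension for e in examples)
        scores[h] = PRIOR[h] * (1 / len(extension)) ** len(examples) if consistent else 0.0
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

# One example leaves room for the most specific hypothesis; three examples of
# different horses already make "horses" far more probable than "mammals".
print(posterior(["horse4"]))
print(posterior(["horse4", "horse2", "horse1"]))
```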
2/n
20.05.2025 19:05
A schematic of our method. On the left are shown Bayesian inference (visualized using Bayes' rule and a portrait of the Reverend Bayes) and neural networks (visualized as a weight matrix). Then, an arrow labeled "meta-learning" combines Bayesian inference and neural networks into a "prior-trained neural network", described as a neural network that has the priors of a Bayesian model, visualized as the same portrait of Reverend Bayes but made out of numbers. Finally, an arrow labeled "learning" goes from the prior-trained neural network to two examples of what it can learn: formal languages (visualized with a finite-state automaton) and aspects of English syntax (visualized with a parse tree for the sentence "colorless green ideas sleep furiously").
Paper out in Nature Communications!
Bayesian models can learn rapidly. Neural networks can handle messy, naturalistic data. How can we combine these strengths?
Our answer: Use meta-learning to distill Bayesian priors into a neural network!
www.nature.com/articles/s41...
1/n
20.05.2025 19:04
At MIT for the day to speak at the NLP seminar! Say hi if you're around and need a break from NeurIPS drafting!
14.05.2025 14:32
Great to hear! I'm always happy to spread trivia about that city
08.05.2025 17:18
Thank you!
08.05.2025 17:17
You must be on the same wavelength as me!
08.05.2025 17:17
Screenshot of the New York Times crossword page, saying "The Crossword. Wednesday, May 7, 2025. By Tom McCoy. Edited by Will Shortz."
I constructed today's NYT crossword!
This one has some personal connections, described in the WordPlay article by @samcorbin.bsky.social (contains spoilers): www.nytimes.com/2025/05/06/c...
I hope you enjoy!
07.05.2025 17:32
(The NACLO solution rates are from high schoolers who competed in the contest, so they're probably skewed toward puzzle enthusiasts: the average NACLO participant might be better at these problems than the average human. But there are plenty of students who do the contest for fun, without any prep.)
03.05.2025 22:28
Linguist, Cognitive Scientist, Occasional AI Researcher, Immigrant in NYC, Co-Author w/ Ingeborg Glimmer of 'Why We Fear AI' - out now: https://bookshop.org/a/114797/9781945335174
Cognitive scientist interested in the processing, acquisition and evolution of language; statistical learning; computational modeling.
Lab website: https://csl-lab.psych.cornell.edu
PhD Candidate, Psychological & Brain Sciences, Johns Hopkins University
concepts | language | plasticity | development | neuroscience
NLP Researcher at EleutherAI, PhD UC San Diego Linguistics.
Previously PleIAs, Edinburgh University.
Interested in multilingual NLP, tokenizers, open science.
Boston. She/her.
https://catherinearnett.github.io/
Postdoc @vectorinstitute.ai | organizer @queerinai.com | previously MIT, CMU LTI | rodent enthusiast | she/they
https://ryskina.github.io/
Postdoc at the Princeton Neuroscience Institute.
Planning in complex environments, RL and network science.
https://www.aekahn.com
PhD Candidate at Princeton University. Studying how people and machines teach.
Assistant Professor at @cs.ubc.ca and @vectorinstitute.ai working on Natural Language Processing. Book: https://lostinautomatictranslation.com/
Linguistics and cognitive science at Northwestern. Opinions are my own. he/him/his
Postdoc at MIT. Research: language, the brain, NLP.
jmichaelov.com
Studying language in biological brains and artificial ones @MIT.
www.tuckute.com
Cognitive scientist @ UC Merced
http://raryskin.github.io
PI of Language, Interaction, & Cognition (LInC) lab: http://linclab0.github.io
she
Research in NLP (mostly LM interpretability & explainability).
Incoming assistant prof at UMD CS + CLIP.
Current postdoc @ai2.bsky.social & @uwnlp.bsky.social
Views my own.
sarahwie.github.io
Cognitive neuroscientist.
Professor at Collège de France in Paris.
Head of the NeuroSpin brain imaging facility in Saclay.
President of the Scientific Council of the French national education ministry (CSEN)
Roses are red, rabbits eat kale. I'm studying for a PhD at Yale.
Social Cognition • Comparative Cognition • Theory of Mind
https://amandaroyka.github.io/
Interested in how & what the brain computes. Professor in Neuroscience & Statistics UC Berkeley
linguistiks knitting kids kats ceramiks and everything with /k/ or 'k', I just decided
she/her
Computational linguist @ Árnastofnun