LMs' dative alternation preferences come from both direct evidence and more general properties of the language. They don't just memorize; they generalize! See the paper for details on animacy too (interestingly, it's more complicated!)
31.03.2025 13:30
LMs' length preference vs. perplexity on the validation set. Models whose training-set manipulation reduces exposure to short-first orderings are the ones with a weaker short-first preference.
The learned length preference changes with the input manipulation: the more "long-first" we make the input, the weaker the short-first preference. We think this shows that the dative preferences in models come not just from datives but from general properties of English.
31.03.2025 13:30
For example, "The primates use tools to eat the green coconuts from the shop" becomes:
- Short-first: [tools] use [the primates] [[to] eat [[the] [green] coconuts [from the shop]]]
- Long-first: [[[from the shop] [the] coconuts [green]] eat [to]] use [the primates] [tools]
31.03.2025 13:30
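As a concrete illustration, here is a minimal sketch of one way such a reordering could be computed; it reproduces the bracketing pattern in the example above but is not necessarily the paper's actual procedure. It assumes a dependency parse (spaCy and its en_core_web_sm model here) and sorts each head's dependents by subtree size while keeping the head's position among them fixed.

```python
# Sketch (assumed procedure, not the paper's pipeline): reorder each head's
# dependents by the size of their subtrees, keeping the number of dependents
# on either side of the head unchanged. Ascending size = "short-first",
# descending size = "long-first".
import spacy

nlp = spacy.load("en_core_web_sm")

def subtree_len(tok):
    return len(list(tok.subtree))

def linearize(head, short_first=True):
    """Recursively linearize `head` and its dependents, reordered by weight."""
    deps = sorted(head.children, key=subtree_len, reverse=not short_first)
    n_before = sum(1 for d in head.children if d.i < head.i)  # keep head position
    parts = [linearize(d, short_first) for d in deps]
    return " ".join(parts[:n_before] + [head.text] + parts[n_before:])

doc = nlp("The primates use tools to eat the green coconuts from the shop")
root = next(t for t in doc if t.head is t)
print(linearize(root, short_first=True))
# e.g. "tools use The primates to eat the green coconuts from the shop"
print(linearize(root, short_first=False))
# e.g. "from the shop the coconuts green eat to use The primates tools"
```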
We think it plausibly comes not from the datives alone but from general properties of English (which is "short-first"). To test that, we manipulate the global structure of the input, creating a corpus where every sentence is short-first and one where they're all long-first.
31.03.2025 13:30
DO preference vs. length difference when we remove all datives (left) and all cases with two postverbal arguments (right). The Pearson correlation r is now -0.24 for the no-datives condition and -0.22 for the no-two-postverbal-arguments condition.
Now what if we get rid of datives, and further of all constructions that have two postverbal arguments? The length preference is back again. Yes, it's smaller (direct evidence matters), but why is it there at all? Where does it come from, if not from the datives?
31.03.2025 13:30
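For intuition, a rough sketch of how the "two postverbal arguments" filter might look is below; the parser and the dependency labels are assumptions, not the paper's exact criteria, and the no-datives condition would additionally require identifying dative verbs.

```python
# Sketch (assumed heuristics): drop any sentence in which some verb has two or
# more nominal arguments to its right; this also removes double-object datives
# such as "gave [him] [the book]".
import spacy

nlp = spacy.load("en_core_web_sm")
ARG_DEPS = {"dobj", "dative", "obj", "iobj", "attr", "oprd"}  # assumed label set

def has_two_postverbal_args(doc):
    for tok in doc:
        if tok.pos_ != "VERB":
            continue
        post_args = [c for c in tok.children if c.i > tok.i and c.dep_ in ARG_DEPS]
        if len(post_args) >= 2:
            return True
    return False

corpus = [
    "She gave him the book that everyone was excited to read.",  # removed
    "She gave the book to him.",                                  # kept
    "The primates use tools.",                                    # kept
]
print([s for s in corpus if not has_two_postverbal_args(nlp(s))])
```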
DO preference vs. length difference for the balanced and swapped-datives manipulations. Left: balanced, Pearson correlation r = -0.33; right: swapped-datives, Pearson correlation r = -0.03.
What if we modify the corpus so that for every DO there is a PO (balancing the direct evidence)? The preferences are still present! But what if we SWAP every dative in the input, so that every DO becomes a PO and every PO a DO? The preference essentially disappears (but is not flipped!)
31.03.2025 13:30
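To make the swap concrete, here is a toy transformation that assumes the verb, recipient, and theme spans of each dative have already been identified (the spans below are hand-supplied; the paper presumably derives them from parsed corpus datives):

```python
# Sketch (illustrative only): rewrite a double-object dative as a prepositional
# dative and vice versa, given pre-identified argument spans.
def do_to_po(verb, recipient, theme):
    # "gave him the book" -> "gave the book to him"
    return f"{verb} {theme} to {recipient}"

def po_to_do(verb, recipient, theme):
    # "gave it to the boy ..." -> "gave the boy ... it"
    return f"{verb} {recipient} {theme}"

print(do_to_po("gave", "him", "the book that everyone was excited to read"))
print(po_to_do("gave", "the boy who signed up for class and was excited", "it"))
```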
Left: plot showing DO preference vs. human judgments, Pearson's r = 0.5; right: plot showing DO preference as a function of the (log) length difference between the recipient and the theme, Pearson's r = -0.43, where the negative sign indicates that short-first is preferred.
To test this, we train small LMs on manipulated datasets where we vary direct (datives) and indirect (non-datives) evidence and test the change in their preferences. First, we see that we get human-like preferences from a model trained on our default BabyLM corpus.
31.03.2025 13:30
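One plausible way to operationalize the DO preference and the length difference is sketched below, with an off-the-shelf GPT-2 standing in for the small BabyLM-trained models; the exact metric used in the paper may differ.

```python
# Sketch (assumed scoring scheme): DO preference = log P(DO variant) - log P(PO
# variant) under a causal LM, compared against the log length difference
# between recipient and theme.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def logprob(sentence):
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss      # mean NLL over predicted tokens
    return -loss.item() * (ids.shape[1] - 1)    # summed log-probability

recipient = "the boy who signed up for class and was excited"
theme = "it"
do = f"She gave {recipient} {theme}."
po = f"She gave {theme} to {recipient}."

do_preference = logprob(do) - logprob(po)   # > 0 means the DO variant is preferred
length_diff = math.log(len(recipient.split())) - math.log(len(theme.split()))
print(do_preference, length_diff)
```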
The English dative preferences come from more general features of the language: short constituents tend to appear earlier across the board, not just in the dative. We hypothesize that LMs rely on direct evidence from datives but also on general word-order preferences (e.g., "easy first") learned from non-datives.
31.03.2025 13:30
Examples of double-object (DO) and prepositional-object (PO) datives with short-first and long-first word orders:
DO (long-first): She gave the boy who signed up for class and was excited it.
PO (short-first): She gave it to the boy who signed up for class and was excited.
DO (short-first): She gave him the book that everyone was excited to read.
PO (long-first): She gave the book that everyone was excited to read to him.
LMs learn argument-based preferences for dative constructions (preferring the recipient first when it's shorter), consistent with humans. Is this from memorizing preferences in training? New paper w/ @kanishka.bsky.social, @weissweiler.bsky.social, @kmahowald.bsky.social
arxiv.org/abs/2503.20850
31.03.2025 13:30