New Paper: LLM developers aim to align models with values like helpfulness or harmlessness. But when these conflict, which values do models choose to support? We introduce ConflictScope, a fully automated evaluation pipeline that reveals how models rank values under conflict.
(Image: xkcd)
02.10.2025 16:04
When LLMs solve tasks with a mid-to-low resource input or target language, their output quality is poor. We know that. But can we put our finger on what breaks inside the LLM? We introduce the translation barrier hypothesis for failed multilingual generation with LLMs. arxiv.org/abs/2506.22724
04.07.2025 17:04
Thrilled to share that this is out in @pnas.org today!
We show that linguistic generalization in language models can be due to underlying analogical mechanisms.
Shoutout to my amazing co-authors @weissweiler.bsky.social, @davidrmortensen.bsky.social, Hinrich Schütze, and Janet Pierrehumbert!
09.05.2025 18:29
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:
1/9
09.06.2025 13:47
RL boosts LLM reasoning, but why stop at math & code?
Meet Nemotron-CrossThink, a method to scale RL-based self-learning across law, physics, social science & more.
Resulting in a model that reasons broadly, adapts dynamically, & uses 28% fewer tokens for correct answers!
01.05.2025 17:41
On my way to #NAACL2025 where I'll give a keynote at the noisy text workshop (WNUT), presenting some of the challenges & methods for dialect NLP + also discussing dialect speakers' perspectives!
Beyond “noisy” text: How (and why) to process dialect data
Saturday, May 3, 9:30–10:30
29.04.2025 09:17
Excited to announce our #NAACL2025 Oral paper!
We carried out the largest systematic study so far to map the links between upstream choices, intrinsic bias, and downstream zero-shot performance across 131 CLIP Vision-language encoders, 26 datasets, and 55 architectures!
29.04.2025 19:11
Can self-supervised models understand allophony? Excited to share my new #NAACL2025 paper: Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment arxiv.org/abs/2502.07029 (1/n)
29.04.2025 17:00
Excited to share a new interp+agents paper: MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools, appearing at #NAACL2025
This was work done at @msftresearch.bsky.social last summer with Jason Eisner, Justin Svegliato, Ben Van Durme, Yu Su, and Sam Thomson
1/
29.04.2025 13:41
When interacting with ChatGPT, have you wondered if it would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! 1/
28.04.2025 20:36
1/ New paper alert
RAG systems excel on academic benchmarks - but are they robust to variations in linguistic style?
We find RAG systems are brittle. Small shifts in phrasing trigger cascading errors, driven by the complexity of the RAG pipeline
17.04.2025 19:55
THIS IS HUGE! Researchers at McMaster University have discovered a NEW peptide antibiotic that targets a broad range of disease-causing bacteria INCLUDING those RESISTANT to existing antibiotics. This discovery marks the first potential new class of antibiotics in NEARLY 30 YEARS.
31.03.2025 16:00
CDS building which looks like a jenga tower
Life update: I'm starting as faculty at Boston University
@bucds.bsky.social in 2026! BU has SCHEMES for LM interpretability & analysis, I couldn't be more pumped to join a burgeoning supergroup w/ @najoung.bsky.social @amuuueller.bsky.social. Looking for my first students, so apply and reach out!
27.03.2025 02:24
You should read Article 1 of the United States Constitution. It's a trip.
19.03.2025 04:49
There can be only one DB joke. And that is DB.
19.03.2025 04:29
Advancing the Database of Cross-Linguistic Colexifications with New Workflows and Data
Lexical resources are crucial for cross-linguistic analysis and can provide new insights into computational models for natural language learning. Here, we present an advanced database for comparative ...
New preprint by @annikatjuka.bsky.social, Robert Forkel, Christoph Rzymski, and myself available, presenting a new version of the Database of Cross-Linguistic Colexifications (CLICS).
"Advancing the Database of Cross-Linguistic Colexifications with New Workflows and Data"
arxiv.org/abs/2503.11377
17.03.2025 10:25
Finally found a way to shorten faculty meetings.
16.03.2025 16:30
No student anywhere in America has said something as antisemitic as this
12.03.2025 18:12
Midwest Speech and Language Days 2025
The meeting will feature keynote addresses by
@mohitbansal.bsky.social, @davidrmortensen.bsky.social, Karen Livescu, and Heng Ji. Plus all of your great talks and posters! nlp.nd.edu/msld25
08.03.2025 18:35
I've been thinking about this reading from Isaiah 58 since I heard it at the Ash Wednesday service today.
“Is not this the fast that I choose:
to loose the bonds of injustice,
to undo the thongs of the yoke,
to let the oppressed go free,
and to break every yoke?
06.03.2025 00:16
Screenshot of Arxiv paper title, "Rejected Dialects: Biases Against African American Language in Reward Models," and author list: Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, and Maarten Sap.
Reward models for LMs are meant to align outputs with human preferences, but do they accidentally encode dialect biases?
Excited to share our paper on biases against African American Language in reward models, accepted to #NAACL2025 Findings!
Paper: arxiv.org/abs/2502.12858 (1/10)
06.03.2025 19:49
I read a paper about search, but I can't quite remember what it's called.
05.03.2025 15:30
Tip of the Tongue Query Elicitation for Simulated Evaluation
Tip-of-the-tongue (TOT) search occurs when a user struggles to recall a specific identifier, such as a document title. While common, existing search systems often fail to effectively support TOT scena...
New Breakthrough in Tip-of-the-Tongue (TOT) Retrieval Research!
We address data limitations and offer a fresh evaluation method for these complex queries.
Curious how TREC TOT track test queries are created? Check out this thread and our paper: arxiv.org/abs/2502.17776
05.03.2025 01:32
everything is so shitty, read this story about a genuinely good man who saw he had an opportunity to save millions of lives and threw himself into doing so. the world is full of heroes like him.
04.03.2025 11:16
I humbly put this forward as a possible campaign for the new electric VW
04.03.2025 01:06
Interdisciplinary research in Speech, Language and Hearing Sciences
Submit your work at https://rp.tandfonline.com/submission/create?journalCode=ILOG
Journal homepage https://www.tandfonline.com/journals/ilog20
PhD student @mainlp.bsky.social (@cislmu.bsky.social, LMU Munich). Interested in language variation & change, currently working on NLP for dialects and low-resource languages.
verenablaschke.github.io
Linguist, cognitive scientist at University of Stuttgart. I study language and how we understand it one word at a time.
Second Language Research Forum (SLRF) 2025 was hosted by NAU and held Sept 25-28, 2025, in Flagstaff, AZ. https://sites.google.com/nau.edu/slrf2025/. SLRF 2026 will be hosted by Université de Montréal.
Computational psycholinguistics PhD student @NYU linguistics | first gen!
NLP Graduate Researcher at The University of Tehran #NLProc
PhD student at @gesis.org & @hhu.de, computational linguist, researching (annotation) disagreement and its impact on model behavior.
Postdoc at MIT BCS, interested in language(s) in humans and LMs
https://andrea-de-varda.github.io/
Lawyer for people at Kline & Specter. Teach @Duqklinelaw, “Faith and Democracy.” Former representative, Marine, prosecutor. Town hall participant.
The real jbouie. Columnist for the New York Times Opinion section. Co-host of the Unclear and Present Danger podcast. b-boy-bouiebaisse on TikTok. jbouienyt on Twitch. National program director of the CHUM Group.
Send me your mutual aid requests.
Computation Cognition Learning
PhD@cmu
Ph.D. in Media & Communication | '24-'25 Carr-Ryan Center Fellow at Harvard Kennedy School | Knight Visiting Nieman Fellow '19 | IVLP Alumnus | IBM Certified AI Developer https://www.linkedin.com/in/emre-kizilkaya #Media #Humanities #AI #Data #Ethics
PhD at EPFL
Ex @MetaAI, @SonyAI, @Microsoft
Egyptian
PhD-Student in the webis.de group. Interested in IR and NLP.
Assistant Prof. in Linguistics at Harvard. P-sider working on rhythm, tone, speech timing, conversation, and gesture, especially in African languages.
O zi à?
Just a passionate dev, learning from this community daily.
Sharing the entire journey - bugs, breakthroughs, and banter.
PhD in linguistics. Researching Impoliteness in children's fiction, and metaphors in virtual health discourses (esp. on dementia). they/she