
Chantal

@chantalsh.bsky.social

PhD (in progress) @ Northeastern! NLP 🤝 LLMs | she/her

57 Followers  |  60 Following  |  17 Posts  |  Joined: 03.12.2024

Latest posts by chantalsh.bsky.social on Bluesky

Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models
For an LLM to correctly respond to an instruction it must understand both the semantics and the domain (i.e., subject area) of a given task-instruction pair. However, syntax can also convey implicit i...

(4/n)

More info here!

Read our paper: arxiv.org/abs/2509.21155
Paper site: cshaib.github.io/syntax_domai...

Thank you to all my wonderful co-authors; happy to continue chatting about any of this!

24.10.2025 16:23 — 👍 0    🔁 0    💬 0    📌 0
[Post image]

(3/n) Perhaps more strikingly, unintended syntactic-domain correlations can be exploited to bypass model refusals (e.g., OLMo-2-Instruct 7B here)

24.10.2025 16:23 — 👍 0    🔁 0    💬 1    📌 0
[Post images]

(2/n) This has important implications for model generalization and safety! We show that this occurs in instruction-tuned models, and propose an evaluation to test for this type of brittleness.

24.10.2025 16:23 — 👍 0    🔁 0    💬 1    📌 0
[Post image]

(1/n) Models learn to rely on *syntactic templates* (frequent patterns of POS tags) that co-occur with particular domains.

LLMs can inadvertently learn "If I see this syntactic pattern, it's domain X" rather than "If I see this semantic content, do task Y."
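
To make the template idea concrete, here is a minimal sketch (a toy illustration, not the paper's exact pipeline) that treats a syntactic template as a frequent part-of-speech n-gram and counts templates per domain. The two tiny "domain" corpora and the 4-gram window are assumptions; it assumes spaCy with the en_core_web_sm model installed.

from collections import Counter
import spacy  # assumes: pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

def pos_templates(text, n=4):
    # Map the text to coarse part-of-speech tags, then slide an n-gram window.
    tags = [tok.pos_ for tok in nlp(text)]
    return [" ".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]

# Toy stand-ins for two domains; real corpora would be far larger.
domains = {
    "legal":  ["The party shall indemnify the client against all claims.",
               "The agreement shall terminate upon written notice."],
    "recipe": ["Chop the onions and add them to the hot pan.",
               "Whisk the eggs and pour them over the cooked rice."],
}

for name, texts in domains.items():
    counts = Counter(t for text in texts for t in pos_templates(text))
    print(name, counts.most_common(3))

The frequent templates differ sharply by domain (e.g., "DET NOUN AUX VERB" in the legal snippets vs. "VERB DET NOUN CCONJ" in the recipes), which is exactly the kind of surface cue a model can latch onto instead of the semantics.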

24.10.2025 16:23 — 👍 0    🔁 0    💬 1    📌 0
[Post images]

Syntax that spuriously correlates with safe domains can jailbreak LLMs (e.g., GPT-4o mini, below).

Our paper (co-authored w/ Vinith Suriyakumar) on syntactic-domain spurious correlations will appear at #NeurIPS2025 as a ✨ spotlight!

+ @marzyehghassemi.bsky.social, @byron.bsky.social, Levent Sagun

24.10.2025 16:23 — 👍 6    🔁 4    💬 3    📌 1

(7/7) For more details, please check out our pre-print!

24.09.2025 13:21 — 👍 3    🔁 0    💬 0    📌 0

(6/7) LLMs are terrible at detecting their own slop: GPT-5, DeepSeek-V3, and o3-mini rarely assign a label of "slop" (avg. 6% of documents), whereas humans marked 34% of texts as "slop."

24.09.2025 13:21 — 👍 4    🔁 0    💬 1    📌 1

(5/7) We lack good/reliable automatic text metrics for 3 of the 5 most important slop features: relevance, coherence, and tone. :-(

24.09.2025 13:21 — 👍 1    🔁 0    💬 1    📌 0

(4/7) Different domains have different slop signatures. In news articles, coherence, density, relevance, and tone issues predict slop. In Q&A tasks, it's factuality and structure. Context matters!

24.09.2025 13:21 — 👍 1    🔁 0    💬 1    📌 0

(3/7) Humans can spot "sloppy" text but may have differing thresholds in their overall assessments. Still, our annotators consistently flagged the same problematic passages, suggesting we know it when we see it...

24.09.2025 13:21 — 👍 1    🔁 0    💬 1    📌 0
[Post image]

(2/7) TL;DR: Measuring the construct of slop is difficult! While somewhat subjective and domain-dependent, it boils down to three key factors: information quality, density, and stylistic choices. We introduce a taxonomy for slop.
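
As a sketch of what annotating against such a taxonomy might look like, here is a hypothetical annotation record. The dimension names are assembled from this thread (information quality, density, style), not copied from the paper's actual schema, and the aggregate rule is a toy.

from dataclasses import dataclass, field

@dataclass
class SlopAnnotation:
    # Hypothetical schema: dimension names are assembled from this thread,
    # not copied from the paper's annotation guide.
    relevance: int    # information quality: does the text address the prompt? (1-5)
    factuality: int   # information quality: are the claims accurate? (1-5)
    coherence: int    # information quality: does it hang together? (1-5)
    density: int      # information per token rather than filler (1-5)
    tone: int         # style: is the register appropriate to the domain? (1-5)
    structure: int    # style: does the formatting serve the content? (1-5)
    flagged_spans: list = field(default_factory=list)  # passages an annotator marked

    def looks_sloppy(self, cutoff=3):
        # Toy aggregate rule, not how overall judgments were reached in the study.
        scores = [self.relevance, self.factuality, self.coherence,
                  self.density, self.tone, self.structure]
        return sum(s < cutoff for s in scores) >= 2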

24.09.2025 13:21 — 👍 2    🔁 0    💬 1    📌 0
[Post images]

"AI slop" seems to be everywhere, but what exactly makes text feel like "slop"?

In our new work (w/ @tuhinchakr.bsky.social, Diego Garcia-Olano, @byron.bsky.social ) we provide a systematic attempt at measuring AI "slop" in text!

arxiv.org/abs/2509.19163

🧵 (1/7)

24.09.2025 13:21 — 👍 31    🔁 16    💬 1    📌 1

Oxford Word of the Year 2024 - Oxford University Press
The Oxford Word of the Year 2024 is 'brain rot'. Discover more about the winner, our shortlist, and 20 years of words that reflect the world.

I'm searching for some comp/ling experts to provide a precise definition of "slop" as it refers to text (see: corp.oup.com/word-of-the-...)

I put together a Google Form that should take no longer than 10 minutes to complete: forms.gle/oWxsCScW3dJU...
If you can help, I'd appreciate your input! 🙏

10.03.2025 20:00 — 👍 10    🔁 8    💬 0    📌 0
Who Taught You That? Tracing Teachers in Model Distillation
Model distillation -- using outputs from a large teacher model to teach a small student model -- is a practical means of creating efficient models for a particular task. We ask: Can we identify a stud...

📒 Can we trace a small distilled model back to its teacher? 🤔 New work (w/ @chantalsh.bsky.social, @silvioamir.bsky.social & @byron.bsky.social) finds some footprints left by LLMs in distillation! [1/6]

🔗 Full paper: arxiv.org/abs/2502.06659
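
One intuitive "footprint" check (a sketch of the general idea, not the paper's actual method): text distilled from a teacher should look unusually likely under that teacher. Below, small public models stand in for a pool of candidate teachers; assumes the torch and transformers packages.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model_name, text):
    # Score the text under one candidate teacher: mean token NLL, exponentiated.
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

candidates = ["gpt2", "distilgpt2"]  # stand-ins for the candidate teacher pool
student_output = "The quick brown fox jumps over the lazy dog."
scores = {name: perplexity(name, student_output) for name in candidates}
print(min(scores, key=scores.get), scores)  # lowest perplexity = best guess at the teacher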

11.02.2025 17:16 — 👍 8    🔁 2    💬 1    📌 0
