Sophie Hao

@profsophie.bsky.social

Assistant professor of Linguistics and Data Science at Boston University. NLP, computational linguistics, interpretability, social bias and fairness. she/her. https://www.notaphonologist.com/

256 Followers 329 Following 7 Posts Joined Nov 2024
6 months ago
Ashima Suvarna🌻 on X: "1/ 🧡 New #EMNLP2025 Paper !! Toxicity detection is subjective; shaped by norms, identity, & context. Existing models and dataset overlook this nuance. Enter MODELCITIZENS: a new dataset designed to address this. ✔️ 6.8K posts, 40K annotations across diverse groups ✔️" / X

(3/3) See full thread on X: x.com/suvarna_ashi...

6 months ago

(2/3) Toxicity detection is shaped by norms, identity, & context, which existing approaches overlook. Enter MODELCITIZENS: a new dataset designed to address this.
βœ”οΈ 6.8K posts, 40K annotations across diverse groups
βœ”οΈ Context-augmented scenarios
βœ”οΈ New fine-tuned models that beat GPT-4o-mini by 5.5%

6 months ago
ModelCitizens: Representing Community Voices in Online Safety
Automatic toxic language detection is critical for creating safe, inclusive online spaces. However, it is a highly subjective task, with perceptions of toxic language shaped by community norms and liv...

(1/3) Please check out our new paper with @skgabrie.bsky.social and her amazing students, to appear in #EMNLP2025!

(🚨 Offensive Content Warning)

arxiv.org/abs/2507.05455

10 months ago

I'm not personally attached to the generative linguistics apparatus per se, but I was asked by the journal to write this paper as a response to another paper, and that paper is primarily opining about the possible "end of (generative) linguistics as we know it."

10 months ago

I didn't say that social relevance will guarantee generative linguistics's survival (note that there is a subtle difference between "theoretical" and "generative"), but rather that social irrelevance will likely guarantee its demise.

10 months ago

I'm glad you liked it! (I am the author)

There are a couple of points of incommensurability between your reaction and my intentions in writing this piece, which I'll explain below.

1 year ago

I keep thinking "Bluesky" is a Slavic patronymic

1 year ago
Screenshot of the paper title "What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length"

💬 Have you or a loved one compared LM probabilities to human linguistic acceptability judgments? You may be overcompensating for the effect of frequency and length!
🌟 In our new paper, we rethink how we should be controlling for these factors 🧡:
