Bufan Gao

@bufangao.bsky.social

Psychology PhD student @UChicago 🔗 jouisseuse.github.io

5 Followers  |  3 Following  |  4 Posts  |  Joined: 09.09.2025

Latest posts by bufangao.bsky.social on Bluesky

Excited to present at #EMNLP2025

Really appreciate @elisakreiss.bsky.social’s kind guidance and encouragement throughout this work 🙏

11.09.2025 16:01 - 👍 2    🔁 0    💬 0    📌 0
Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases

As LLMs are increasingly applied in socially impactful settings, concerns about gender bias have prompted growing efforts both to measure and mitigate such bias. These efforts often rely on evaluation...

👉 Our results highlight the brittleness of current bias evaluations: small prompt changes can reverse conclusions.

📄 Paper: arxiv.org/abs/2509.04373
💻 Code: github.com/jouisseuse/B...

11.09.2025 16:01 - 👍 2    🔁 0    💬 1    📌 0
Pronoun-Specific Shift Probabilities across models. Bars show the mean shift in token probability for each pronoun (he, she, they) across prompt conditions and attributes. Prompts with instructions and gender references increase preference for “they” and decrease preference for “he,” while “she” varies between models.

When prompts contain cues typical of gender bias evaluation setups, models shift pronoun use: fewer “he,” more “they.”

This suggests that LLM benchmark behavior may generalize poorly to non-benchmark settings, raising new concerns about ecological validity.

11.09.2025 16:01 - 👍 2    🔁 0    💬 1    📌 0
Violin plots of Probability of Pronoun Shift. Models show significant sensitivity to prompt changes: when prompts highlight gender evaluation, pronoun use shifts, with decreased “he” and increased “they” use.

🚨 New #EMNLP2025 paper!

Do LLMs exhibit distinct behavior when the prompt looks similar to common evaluation prompts? 👀

We show that prompts that signal bias evaluation can flip the measured bias. See below ⬇️

11.09.2025 16:01 - 👍 3    🔁 1    💬 1    📌 2
