Excited to present at #EMNLP2025
Really appreciate @elisakreiss.bsky.socialβs kind guidance and encouragement throughout this work π
@bufangao.bsky.social
Psychology PhD student @UChicago π jouisseuse.github.io
Excited to present at #EMNLP2025
Really appreciate @elisakreiss.bsky.socialβs kind guidance and encouragement throughout this work π
π Our results highlight the brittleness of current bias evaluations: small prompt changes can reverse conclusions.
π Paper: arxiv.org/abs/2509.04373
π» Code: github.com/jouisseuse/B...
Pronoun-Specific Shift Probabilities across models. Bars show the mean shift in token probability for each pronoun (he, she, they) across prompt conditions and attributes. Prompts with instructions and gender references increase preference for βtheyβ and decrease preference for βhe,β while βsheβ varies between models.
When prompts contain cues typical of gender bias evaluation setups, models shift pronoun use: fewer βhe,β more βthey.β
This suggests that LLM benchmark behavior may generalize less and less to non-benchmark settings, raising new concerns about ecological validity.
Violin plots of Probability of Pronoun Shift. Models show significant sensitivity to prompt changes: when prompts highlight gender evaluation, pronoun use shifts, with decreased βheβ and increased βtheyβ use.
π¨ New #EMNLP2025 paper!
Do LLMs exhibit distinct behavior when the prompt looks similar to common evaluation prompts? π
We show that prompts that signal bias evaluation can flip the measured bias. See below β¬οΈ