
Kshitish Ghate

@kghate.bsky.social

PhD student @ UWCSE; MLT @ CMU-LTI; Responsible AI https://kshitishghate.github.io/

113 Followers  |  186 Following  |  21 Posts  |  Joined: 17.11.2024

Latest posts by kghate.bsky.social on Bluesky

Happy to share that I'm presenting 3 research projects at AIES 2025 🎉

1️⃣ Gender bias over-representation in AI bias research 👫
2️⃣ Stable Diffusion's skin tone bias 🧑🏻🧑🏽🧑🏿
3️⃣ Limitations of human oversight in AI hiring 👤🤖

Let's chat if you're at AIES, or read below / reach out for details!
#AIES25 #AcademicSky

21.10.2025 11:38 — 👍 9    🔁 2    💬 1    📌 0

Work done with amazing collaborators 🙏
@andyliu.bsky.social @devanshrjain.bsky.social @taylor-sorensen.bsky.social @atoosakz.bsky.social @aylincaliskan.bsky.social @monadiab77.bsky.social @maartensap.bsky.social

14.10.2025 15:59 — 👍 5    🔁 1    💬 0    📌 0
EVALUESTEER: Measuring Reward Model Steerability Towards Values and Preferences As large language models (LLMs) are deployed globally, creating pluralistic systems that can accommodate the diverse preferences and values of users worldwide becomes essential. We introduce EVALUESTE...

For more details about our experiments and findings:
Paper: arxiv.org/abs/2510.06370
Code and Data: github.com/kshitishghat...
Please feel free to reach out if you are interested in this work and would like to chat!

14.10.2025 15:59 — 👍 0    🔁 0    💬 1    📌 0

🚨Current RMs may systematically favor certain cultural/stylistic perspectives. EVALUESTEER enables measuring this steerability gap. By controlling values and styles independently, we isolate whether models fail because of their own biases or because they cannot identify and steer toward diverse preferences.

14.10.2025 15:59 — 👍 0    🔁 0    💬 1    📌 0

Finding 3: All RMs exhibit style-over-substance bias. In value-style conflict scenarios:
• Models choose style-aligned responses 57-73% of the time
• Persists even with explicit instructions to prioritize values
• Consistent across all model sizes and types

14.10.2025 15:59 — 👍 0    🔁 0    💬 1    📌 0

Finding 2: The RMs we tested generally show intrinsic value and style biases, preferring:
• Secular over traditional values
• Self-expression over survival values
• Verbose, confident, and formal/cold language

14.10.2025 15:59 — 👍 1    🔁 0    💬 1    📌 0

Finding 1: Even the best RMs struggle to identify which profile aspects matter for a given prompt. GPT-4.1-Mini and Gemini-2.5-Flash reach ~75% accuracy with the full user profile as context, versus >99% in the Oracle setting (only the relevant info provided).

14.10.2025 15:59 — 👍 0    🔁 0    💬 1    📌 0

We generate pairs where responses differ only in value alignment, only in style, or where value and style preferences conflict between responses. This lets us isolate whether models can identify and adapt to the relevant dimension for each prompt despite confounds (a toy construction is sketched below).

14.10.2025 15:59 — 👍 2    🔁 0    💬 1    📌 0
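To make the pair construction just described concrete, here is a minimal Python sketch. It assumes pre-generated response variants indexed by whether they match the user's values and style; the class, the helper, and the labeling convention (values win in conflicts, per the thread's framing) are hypothetical, not from the EVALUESTEER codebase.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str      # response the user's profile should prefer
    rejected: str
    condition: str   # which dimension(s) the pair isolates

def build_pairs(prompt: str, variants: dict) -> list:
    """Hypothetical helper. variants[(value_match, style_match)] maps to a
    response that does/doesn't match the user's values and style."""
    return [
        # Value-only pair: style held fixed, value alignment differs.
        PreferencePair(prompt, variants[(True, True)], variants[(False, True)], "value_only"),
        # Style-only pair: values held fixed, style differs.
        PreferencePair(prompt, variants[(True, True)], variants[(True, False)], "style_only"),
        # Conflict pair: values and style pull in opposite directions; the
        # value-matched but style-mismatched response is labeled "chosen",
        # so an RM that picks the other shows style-over-substance bias.
        PreferencePair(prompt, variants[(True, False)], variants[(False, True)], "conflict"),
    ]
```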

We need controlled variation of both values AND styles to test RM steerability.
We generate 165,888 synthetic preference pairs with profiles that systematically vary (grid sketched below):
• 4 value dimensions from the World Values Survey
• 4 style dimensions (verbosity, confidence, warmth, reading difficulty)

14.10.2025 15:59 — 👍 0    🔁 0    💬 1    📌 0
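A rough sketch of the profile grid implied above. The style poles follow the thread; only two of the four WVS-derived value axes are named in this thread (traditional vs. secular, survival vs. self-expression), so the rest are elided rather than invented.

```python
from itertools import product

# Illustrative poles only; the paper's exact dimension definitions may differ.
VALUE_DIMS = {
    "tradition": ("traditional", "secular"),
    "survival": ("survival", "self_expression"),
    # ... two further WVS-derived value axes, not named in the thread
}
STYLE_DIMS = {
    "verbosity": ("concise", "verbose"),
    "confidence": ("hedged", "confident"),
    "warmth": ("warm", "cold"),
    "reading_difficulty": ("simple", "advanced"),
}

def user_profiles():
    """Enumerate every combination of value and style preferences."""
    dims = {**VALUE_DIMS, **STYLE_DIMS}
    for combo in product(*dims.values()):
        yield dict(zip(dims.keys(), combo))
```

Crossing such profiles with prompts and the three pair conditions is presumably what scales the benchmark to 165,888 pairs.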

Benchmarks like RewardBench test general RM performance in aggregate. The PRISM benchmark has diverse human preferences but lacks ground-truth value/style labels for controlled evaluation.

arxiv.org/abs/2403.13787
arxiv.org/abs/2404.16019

14.10.2025 15:59 — 👍 0    🔁 0    💬 1    📌 0

LLMs serve users with different values (traditional vs secular, survival vs self-expression) and style preferences (verbosity, confidence, warmth, reading difficulty). As a result, we need RMs that can adapt to individual preferences, not just optimize for an "average" user.

14.10.2025 15:59 — 👍 0    🔁 0    💬 1    📌 0

🚨New paper: Reward Models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences?
With EVALUESTEER, we find that even the best RMs we tested exhibit their own value/style biases, and are unable to align with a user >25% of the time. 🧵

14.10.2025 15:59 — 👍 12    🔁 7    💬 1    📌 0
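For readers wondering what "steering" an RM means operationally here: condition the judgment on a user description. A minimal sketch with a generative judge, assuming only a generic text-in/text-out `complete` callable; the prompt wording is illustrative, not the paper's.

```python
def steered_preference(complete, user_profile: str, prompt: str,
                       response_a: str, response_b: str) -> str:
    """Ask a generative reward model which response the described user
    would prefer. `complete` wraps any LLM text-completion call."""
    judge_prompt = (
        f"User profile:\n{user_profile}\n\n"
        f"Prompt: {prompt}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Which response would this user prefer? Answer with 'A' or 'B'."
    )
    answer = complete(judge_prompt).strip().upper()
    # Steerability is then the rate at which this choice matches the
    # profile-implied "chosen" label across the controlled pairs.
    return "A" if answer.startswith("A") else "B"
```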

🚨New Paper: LLM developers aim to align models with values like helpfulness or harmlessness. But when these conflict, which values do models choose to support? We introduce ConflictScope, a fully automated evaluation pipeline that reveals how models rank values under conflict.
(📷 xkcd)

02.10.2025 16:04 — 👍 15    🔁 4    💬 1    📌 3

Honored to be promoted to Associate Professor at the University of Washington! Grateful to my brilliant mentees, students, collaborators, mentors & @techpolicylab.bsky.social for advancing research in AI & Ethics together—and for the invaluable academic freedom to keep shaping trustworthy AI.

16.09.2025 03:20 — 👍 13    🔁 3    💬 3    📌 0
Intrinsic Bias is Predicted by Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders Kshitish Ghate, Isaac Slaughter, Kyra Wilson, Mona T. Diab, Aylin Caliskan. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: ...

🔗 Paper: aclanthology.org/2025.naacl-l...

Work done with amazing collaborators
@isaacslaughter.bsky.social,
@kyrawilson.bsky.social, @aylincaliskan.bsky.social, and @monadiab77.bsky.social!

Catch our Oral presentation at Ballroom B, Thursday, May 1st, 14:00–15:30! 📷✨

29.04.2025 19:29 — 👍 4    🔁 3    💬 0    📌 0

Excited to announce our #NAACL2025 Oral paper! 🎉✨

We carried out the largest systematic study so far to map the links between upstream choices, intrinsic bias, and downstream zero-shot performance across 131 CLIP Vision-language encoders, 26 datasets, and 55 architectures!

29.04.2025 19:11 — 👍 21    🔁 6    💬 1    📌 0

🖼️ ↔️ 📝 Modality shifts biases: Cross-modal analysis reveals modality-specific biases, e.g. image-based 'Age/Valence' tests exhibit differences in bias directions, pointing to the need for vision-language alignment, measurement, and mitigation methods.

29.04.2025 19:11 — 👍 1    🔁 0    💬 1    📌 0

📊 Bias and downstream performance are linked: We find that intrinsic biases are consistently correlated with downstream task performance on the VTAB+ benchmark (r ≈ 0.3–0.8). Improved performance in CLIP models comes at the cost of skewing stereotypes in particular directions.

29.04.2025 19:11 — 👍 1    🔁 0    💬 1    📌 0
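The r ≈ 0.3–0.8 above reads like a per-model correlation between a bias measure and an accuracy measure; here is a sketch of that computation with made-up placeholder numbers (not data from the paper).

```python
from scipy.stats import pearsonr

# Placeholder values, one entry per model: an intrinsic-bias effect size
# and mean zero-shot accuracy on a downstream suite such as VTAB+.
bias_effect_size = [0.41, 0.58, 0.33, 0.72, 0.65]
zero_shot_accuracy = [0.55, 0.61, 0.52, 0.68, 0.64]

r, p = pearsonr(bias_effect_size, zero_shot_accuracy)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```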

โš ๏ธ What data is "high" quality? Pretraining data curated through automated or heuristic-based data filtering methods to ensure high downstream zero-shot performance (e.g. DFN, Commonpool, Datacomp) tend to exhibit the most bias!

29.04.2025 19:11 — 👍 1    🔁 0    💬 1    📌 0

📌 Data is key: We find that the choice of pre-training dataset is the strongest predictor of associations, over and above architectural variations, dataset size & number of model parameters.

29.04.2025 19:11 — 👍 1    🔁 0    💬 1    📌 0

1. Upstream factors: How do dataset, architecture, and size affect intrinsic bias?
2. Performance link: Does better zero-shot accuracy come with more bias?
3. Modality: Do images and text encode prejudice differently?

29.04.2025 19:11 — 👍 1    🔁 0    💬 1    📌 0

We sought to answer some pressing questions about the relationship between bias, model design choices, and performance 👇

29.04.2025 19:11 — 👍 1    🔁 0    💬 1    📌 0

🔧 Our analysis of intrinsic bias uses a more grounded, improved version of the Embedding Association Tests with controlled stimuli (NRC-VAD, OASIS). We reduced measurement variance by 4.8% and saw ~80% alignment with human stereotypes across 3.4K tests.

29.04.2025 19:11 — 👍 1    🔁 0    💬 1    📌 0
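For readers unfamiliar with Embedding Association Tests: they build on the WEAT effect size of Caliskan et al. (2017), sketched below in numpy. The grounded variant described above additionally controls the stimuli (NRC-VAD, OASIS), which this sketch does not capture.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean cosine similarity of target embedding w to
    attribute set A, minus its mean similarity to attribute set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    """Cohen's-d-style association effect size: how differently target
    sets X and Y (e.g. CLIP embeddings of image or text stimuli)
    associate with attribute sets A vs. B. Positive values mean X
    leans toward A relative to Y."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)
```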

🚨 Key takeaway: Unwanted associations in vision-language encoders are deeply rooted in the pretraining data and how it is curated; careful reconsideration of these curation methods is necessary to ensure that fairness concerns are properly addressed.

29.04.2025 19:11 — 👍 1    🔁 0    💬 1    📌 0
Gender, race, and intersectional bias in AI resume screening via language model retrieval Kyra Wilson and Aylin Caliskan examine gender, race, and intersectional bias in AI resume screening and suggest protective policies.

🗞️ Hot off the press! 🗞️
@aylincaliskan.bsky.social and I wrote a blog post about how to make resume screening with AI more equitable, based on findings from our work presented at AIES in 2024. Major takeaways ⬇️ (1/6)

www.brookings.edu/articles/gen...

25.04.2025 16:58 — 👍 6    🔁 4    💬 1    📌 0
Apply - Interfolio

UW's @techpolicylab.bsky.social and I invite applications for a 2-year Postdoctoral Researcher position in "AI Alignment with Ethical Principles," focusing on language technologies, societal impact, and tech policy.

Kindly share!
apply.interfolio.com/162834
Priority review deadline: 3/28/2025

19.02.2025 18:55 — 👍 10    🔁 10    💬 0    📌 0

Looking for all your LTI friends on Bluesky? The LTI Starter Pack is here to help!

go.bsky.app/NhTwCVb

20.11.2024 16:15 — 👍 15    🔁 9    💬 6    📌 1
