Happy to share that I'm presenting 3 research projects at AIES 2025:
1. Gender bias over-representation in AI bias research
2. Stable Diffusion's skin tone bias
3. Limitations of human oversight in AI hiring
Let's chat if you're at AIES, or read below / reach out for details!
#AIES25 #AcademicSky
21.10.2025 11:38
Work done with amazing collaborators:
@andyliu.bsky.social @devanshrjain.bsky.social @taylor-sorensen.bsky.social @atoosakz.bsky.social @aylincaliskan.bsky.social @monadiab77.bsky.social @maartensap.bsky.social
14.10.2025 15:59
Current RMs may systematically favor certain cultural/stylistic perspectives. EVALUESTEER enables measuring this steerability gap: by controlling values and styles independently, we isolate where models fail due to their own biases versus an inability to identify and steer toward diverse preferences.
14.10.2025 15:59
Finding 3: All RMs exhibit style-over-substance bias. In value-style conflict scenarios:
• Models choose style-aligned responses 57-73% of the time
• This persists even with explicit instructions to prioritize values
• The bias is consistent across all model sizes and types
14.10.2025 15:59
Finding 2: The RMs we tested generally show intrinsic value and style biases, preferring:
• Secular over traditional values
• Self-expression over survival values
• Verbose, confident, and formal/cold language
14.10.2025 15:59
Finding 1: Even the best RMs struggle to identify which profile aspects matter for a given prompt. GPT-4.1-Mini and Gemini-2.5-Flash reach ~75% accuracy with the full user profile as context, versus >99% in the Oracle setting (only relevant info provided).
14.10.2025 15:59
We generate pairs where responses differ only on value alignment, only on style, or where value and style preferences conflict between responses. This lets us isolate whether models can identify and adapt to the relevant dimension for each prompt despite confounds.
14.10.2025 15:59
We need controlled variation of both values AND styles to test RM steerability.
We generate 165,888 synthetic preference pairs with profiles that systematically vary:
• 4 value dimensions from the World Values Survey
• 4 style dimensions (verbosity, confidence, warmth, reading difficulty)
14.10.2025 15:59
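The controlled crossing of value and style dimensions described above can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the dimension names and the two-poles-per-dimension simplification are assumptions.

```python
from itertools import product

# Illustrative sketch: enumerate synthetic user profiles by crossing
# 4 value dimensions with 4 style dimensions.
# Names like "value_dim_3" are placeholders, not the paper's labels.
VALUE_DIMS = ["traditional_vs_secular", "survival_vs_selfexpression",
              "value_dim_3", "value_dim_4"]
STYLE_DIMS = ["verbosity", "confidence", "warmth", "reading_difficulty"]

def make_profiles():
    dims = VALUE_DIMS + STYLE_DIMS
    # Each profile pins every dimension to one of its two poles (0 or 1).
    return [dict(zip(dims, poles)) for poles in product((0, 1), repeat=len(dims))]

profiles = make_profiles()
print(len(profiles))  # 2**8 = 256 distinct profiles before pairing with prompts
```

Crossing such profiles with prompts and candidate responses is what grows the benchmark into the six-figure pair counts reported above.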
Benchmarks like RewardBench test general RM performance in an aggregate sense. The PRISM benchmark has diverse human preferences but lacks ground-truth value/style labels for controlled evaluation.
arxiv.org/abs/2403.13787
arxiv.org/abs/2404.16019
14.10.2025 15:59
LLMs serve users with different values (traditional vs secular, survival vs self-expression) and style preferences (verbosity, confidence, warmth, reading difficulty). As a result, we need RMs that can adapt to individual preferences, not just optimize for an "average" user.
14.10.2025 15:59
New paper: Reward Models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences?
With EVALUESTEER, we find that even the best RMs we tested exhibit their own value/style biases and fail to align with a user >25% of the time. Thread below.
14.10.2025 15:59
New Paper: LLM developers aim to align models with values like helpfulness or harmlessness. But when these conflict, which values do models choose to support? We introduce ConflictScope, a fully automated evaluation pipeline that reveals how models rank values under conflict.
(image: xkcd)
02.10.2025 16:04
Honored to be promoted to Associate Professor at the University of Washington! Grateful to my brilliant mentees, students, collaborators, mentors & @techpolicylab.bsky.social for advancing research in AI & Ethics together, and for the invaluable academic freedom to keep shaping trustworthy AI.
16.09.2025 03:20
Intrinsic Bias is Predicted by Pretraining Data and Correlates with Downstream Performance in Vision-Language Encoders
Kshitish Ghate, Isaac Slaughter, Kyra Wilson, Mona T. Diab, Aylin Caliskan. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: ...
Paper: aclanthology.org/2025.naacl-l...
Work done with amazing collaborators
@isaacslaughter.bsky.social,
@kyrawilson.bsky.social, @aylincaliskan.bsky.social, and @monadiab77.bsky.social!
Catch our Oral presentation at Ballroom B, Thursday, May 1st, 14:00-15:30!
29.04.2025 19:29
Excited to announce our #NAACL2025 Oral paper!
We carried out the largest systematic study so far to map the links between upstream choices, intrinsic bias, and downstream zero-shot performance across 131 CLIP Vision-language encoders, 26 datasets, and 55 architectures!
29.04.2025 19:11
Modality shifts biases: Cross-modal analysis reveals modality-specific biases, e.g. image-based 'Age/Valence' tests exhibit differences in bias directions, pointing to the need for vision-language alignment, measurement, and mitigation methods.
29.04.2025 19:11
Bias and downstream performance are linked: We find that intrinsic biases are consistently correlated with downstream task performance on the VTAB+ benchmark (r ≈ 0.3-0.8). Improved performance in CLIP models comes at the cost of skewing stereotypes in particular directions.
29.04.2025 19:11
What data is "high" quality? Models pretrained on data curated with automated or heuristic filtering for high downstream zero-shot performance (e.g. DFN, CommonPool, DataComp) tend to exhibit the most bias!
29.04.2025 19:11
Data is key: We find that the choice of pretraining dataset is the strongest predictor of associations, over and above architectural variations, dataset size & number of model parameters.
29.04.2025 19:11
1. Upstream factors: How do dataset, architecture, and size affect intrinsic bias?
2. Performance link: Does better zero-shot accuracy come with more bias?
3. Modality: Do images and text encode prejudice differently?
29.04.2025 19:11
We sought to answer some pressing questions about the relationship between bias, model design choices, and performance:
29.04.2025 19:11
Our analysis of intrinsic bias uses a more grounded and improved version of the Embedding Association Tests with controlled stimuli (NRC-VAD, OASIS). We reduced measurement variance by 4.8% and saw ~80% alignment with human stereotypes across 3.4K tests.
29.04.2025 19:11
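For intuition, a WEAT-style association effect size on toy embeddings can be sketched as below. This is a simplified illustration of the general embedding association test family; the paper's grounded variant differs in its stimuli and controls.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def weat_effect_size(X, Y, A, B):
    # Differential association of one embedding w with attribute sets A vs B.
    def s(w):
        return (sum(cosine(w, a) for a in A) / len(A)
                - sum(cosine(w, b) for b in B) / len(B))
    sx, sy = [s(x) for x in X], [s(y) for y in Y]
    pooled = sx + sy
    mean = sum(pooled) / len(pooled)
    std = sqrt(sum((v - mean) ** 2 for v in pooled) / (len(pooled) - 1))
    # Cohen's-d-style effect size: positive means target set X is more
    # strongly associated with attribute set A than target set Y is.
    return (sum(sx) / len(sx) - sum(sy) / len(sy)) / std

# Toy 2D "embeddings": X aligns with attribute A, Y with attribute B.
X = [(1.0, 0.1), (0.9, 0.0)]
Y = [(0.1, 1.0), (0.0, 0.9)]
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]
print(round(weat_effect_size(X, Y, A, B), 2))
```

On these toy vectors the effect size is large and positive, since each target set sits near its matching attribute direction.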
Key takeaway: Unwanted associations in vision-language encoders are deeply rooted in the pretraining data and how it is curated; careful reconsideration of these curation methods is necessary to ensure that fairness concerns are properly addressed.
29.04.2025 19:11
Gender, race, and intersectional bias in AI resume screening via language model retrieval
Kyra Wilson and Aylin Caliskan examine gender, race, and intersectional bias in AI resume screening and suggest protective policies.
Hot off the press!
@aylincaliskan.bsky.social and I wrote a blog post about how to make resume screening with AI more equitable, based on findings from our work presented at AIES in 2024. Major takeaways below (1/6)
www.brookings.edu/articles/gen...
25.04.2025 16:58
UW's @techpolicylab.bsky.social and I invite applications for a 2-year Postdoctoral Researcher position in "AI Alignment with Ethical Principles" focusing on language technologies, societal impact, and tech policy.
Kindly share!
apply.interfolio.com/162834
Priority review deadline: 3/28/2025
19.02.2025 18:55
Looking for all your LTI friends on Bluesky? The LTI Starter Pack is here to help!
go.bsky.app/NhTwCVb
20.11.2024 16:15