Thanks - super-interesting, and actually very relevant to some other work we're doing as well. Will pass along!
29.11.2025 16:53
@emmapierson.bsky.social
Assistant professor of CS at UC Berkeley, core faculty in Computational Precision Health. Developing ML methods to study health and inequality. "On the whole, though, I take the side of amazement." https://people.eecs.berkeley.edu/~emmapierson/
We're excited about applications of our test to other datasets that have 1) perceptions of race, gender, etc and 2) multiple observations of the same person.
This work is led by the wonderful Nora Gera, in a great start to her PhD!
Full paper: www.science.org/doi/epdf/10....
See the paper for many robustness checks and discussion of nuances! Our finding persists when using alternate outcomes, statistical models, subsets of the data, and controls satisfying the criteria above.
5/
A benefit of our test is that it doesn't require us to control for all factors legitimately influencing searches. We only have to control for things that influence both searches and perceived race, vary for the same person across stops, and don't themselves suggest bias.
4/
9% of drivers stopped multiple times have inconsistently perceived race across different stops - most perceived as both white + Hispanic.
When perceived as Hispanic, the same driver is likelier to be searched/arrested. This gap is substantial (24% of overall search rate).
3/
Tests for racial bias often compare how two people of different races are treated.
But two people typically differ in many ways besides race.
So instead of comparing two different people, we study the *same person over time*, as perceptions of their race change.
2/
We have a new paper in Science Advances proposing a simple test for bias:
Is the same person treated differently when their race is perceived differently?
Specifically, we study: is the same driver likelier to be searched by police when they are perceived as Hispanic rather than white?
1/
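The core comparison in this thread (the same driver, perceived differently across stops) can be illustrated with a minimal sketch. This is not the paper's statistical model, which includes controls and robustness checks; the function name, the tuple layout, and the toy records below are all hypothetical and exist only to show the within-person idea.

```python
from collections import defaultdict


def within_person_gap(stops):
    """Average within-driver difference in search rate
    (perceived Hispanic minus perceived white).

    `stops` is an iterable of (driver_id, perceived_race, searched) tuples.
    This layout is an assumption for illustration, not the paper's data format.
    Only drivers perceived as BOTH white and Hispanic contribute.
    """
    outcomes = defaultdict(lambda: defaultdict(list))
    for driver_id, race, searched in stops:
        outcomes[driver_id][race].append(1.0 if searched else 0.0)

    diffs = []
    for races in outcomes.values():
        if "hispanic" in races and "white" in races:
            rate_hispanic = sum(races["hispanic"]) / len(races["hispanic"])
            rate_white = sum(races["white"]) / len(races["white"])
            diffs.append(rate_hispanic - rate_white)

    # No driver with inconsistent perceived race: the gap is undefined.
    if not diffs:
        return None
    return sum(diffs) / len(diffs)


# Toy, made-up records (not real data).
toy_stops = [
    ("a", "white", False),
    ("a", "hispanic", True),
    ("b", "white", False),
    ("b", "hispanic", False),
    ("c", "white", True),  # perceived consistently, so excluded
]
gap = within_person_gap(toy_stops)
```

Because each driver is compared only with themselves, anything that stays fixed for that driver cancels out of the difference; factors that vary across stops would still need to be controlled for, as the thread notes.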
New #NeurIPS2025 paper: how should we evaluate machine learning models without a large, labeled dataset? We introduce Semi-Supervised Model Evaluation (SSME), which uses labeled and unlabeled data to estimate performance! We find SSME is far more accurate than standard methods.
17.10.2025 16:29
selfishly i wish we could keep divya in our lab forever but i guess it would be a disservice to the rest of the world. she's been such a wonderful mentor to me - i've learned a lot from how thoughtful, creative, and knowledgeable she is about everything. she's also super funny and amazing at baking
14.10.2025 17:14
Meeting Divya 5 years ago was one of the biggest strokes of luck in my faculty career - she is a brilliant scientist who has been foundational to so many of our lab's projects, and any institution would be lucky to hire her.
14.10.2025 16:01
Apply here - aprecruit.berkeley.edu/JPF05028 by 11/15, but review of applications is ongoing so sooner is better! (Application deadline currently says 9/15 but will be extended.)
22.08.2025 14:11
Broad project areas include:
1) language modelling methods for scientific discovery (building on our recent work - arxiv.org/abs/2502.04382)
2) using language models to support equity (ai.nejm.org/doi/full/10....)
both in collaboration with health+social scientists.
2/3
New postdoc position in our lab at Berkeley EECS!
(please reshare)
We seek applicants with experience in language modeling who are excited about high-impact applications in the health and social sciences!
More info in thread
1/3
New POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts
Despite recent results, SAEs aren't dead! They can still be useful to mech interp, and also much more broadly: across FAccT, computational social science, and ML4H.
SF fog coming up to swallow us in time lapse.
07.07.2025 03:19
Honored to win a #CHIL2025 best paper award for our work modeling inequality in disease progression, led by @ericachiang.bsky.social!
To the NIH: health inequality remains a vital topic to support the health of all Americans. As we prove, failing to account for it biases estimates for everyone.
For folks at @facct.bsky.social, our very own @cornellbowers.bsky.social student @emmharv.bsky.social will present the Best-Paper-Award-winning work she led on Wednesday at 10:45 AM in the "Audit and Evaluation Approaches" session!
In the meantime, thread below and paper here: arxiv.org/abs/2506.04419!
assassinations, handcuffing a senator at press conference, marines detaining a civilian, and a military parade for the president's birthday. rough week for democracy.
14.06.2025 15:14
and... here is the actual GIF
14.06.2025 17:04
The first paper of @ericachiang.bsky.social's PhD, just accepted at #CHIL2025, proposes a model of disease progression which estimates and accounts for 3 types of health disparities to more accurately measure disease severity. See her full thread below!
01.05.2025 15:53
Thanks, Megan!! This is kind :) hope you're doing well.
26.04.2025 22:11
The US government recently flagged my scientific grant in its "woke DEI database". Many people have asked me what I will do.
My answer today in Nature.
We will not be cowed. We will keep using AI to build a fairer, healthier world.
www.nature.com/articles/d41...
A pleasure to join the Tech Policy Press podcast with @natematias.bsky.social, @geomblog.bsky.social, and @justinhendrix.bsky.social to defend the consensus that AI bias is an important concern.
24.04.2025 16:20
Lab had dogathon! Seminal dog discoveries ensued.
02.04.2025 15:10
This work is led by @gsagostini.bsky.social, who gets more excited about geospatial data than anyone I've ever met, and with Rachel Young, Maria Fitzpatrick, and @nkgarg.bsky.social.
Paper: arxiv.org/abs/2503.20989
Website (and data): migrate.tech.cornell.edu
Thread: bsky.app/profile/gsag...
Migration data is critical in the health, environmental, and social sciences.
We're releasing a new dataset, MIGRATE: annual flows between 47 billion pairs of US Census areas. MIGRATE is:
- 4600x more granular than existing public data
- highly correlated with external ground-truth data
1/2
New preprint & Python package: We use sparse autoencoders to generate hypotheses from large text datasets.
Our method, HypotheSAEs, produces interpretable text features that predict a target variable, e.g. features in news headlines that predict engagement. 1/
This work is led by the wonderful @rajmovva.bsky.social and @kennypeng.bsky.social with coauthors @nkgarg.bsky.social and Jon Kleinberg. See Raj's full thread for details, Python package, and project website!
bsky.app/profile/rajm...
HypotheSAEs outperforms strong LLM baselines, generates new discoveries even on well-studied datasets, and comes with easy-to-use code.
We hope this will be helpful not just to CS folks, but to many in social/health sciences - please reshare to help reach them.
We have a new method, HypotheSAEs, for identifying *interpretable text features that predict a target variable* (aka hypothesis generation).
What features of a headline predict engagement?
What features of a clinical note predict whether a patient will develop cancer?
1/