Denied a loan, an interview, or an insurance claim by a machine learning model? You may be entitled to a list of reasons.
In our latest work with @anniewernerfelt.bsky.social, @berkustun.bsky.social, and @friedler.net, we show how existing explanation frameworks fail and present an alternative for recourse.
24.04.2025 06:19
We'll be at #ICLR2025, Poster Session 1, Poster #516!
Come chat if you're interested in learning more!
This is work done with wonderful collaborators: Yang Liu, @fcalmon.bsky.social, and @berkustun.bsky.social.
19.04.2025 23:04
Our algorithm can improve safety and performance by flagging regretful predictions for abstention or data cleaning.
For example, we demonstrate that, by abstaining from prediction using our algorithm, we can reduce mistakes compared to standard approaches.
19.04.2025 23:04
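As a rough illustration of the abstention step (the function name and threshold here are my own, not from the paper): given a per-instance score estimating how likely a prediction is to be regretful, withhold predictions above a threshold and evaluate error only on what remains.

```python
import numpy as np

def selective_error(y_pred, y_true, regret_score, threshold=0.5):
    """Abstain where estimated regret is high; report error and
    coverage on the retained predictions. Illustrative only."""
    keep = regret_score < threshold
    coverage = keep.mean()
    error = (y_pred[keep] != y_true[keep]).mean() if keep.any() else 0.0
    return error, coverage
```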
We develop a method that trains models over plausible clean datasets to anticipate regretful predictions, helping us spot when a model is unreliable at the individual level.
19.04.2025 23:04
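A minimal sketch of that recipe (my paraphrase under an assumed known uniform flip rate, not the paper's implementation): sample several plausible clean versions of the labels, refit a model on each, and score each instance by how often the refits disagree with the model trained on the noisy labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def anticipated_regret(X, y_noisy, flip_prob=0.2, n_draws=20, seed=0):
    """Score each instance by disagreement between the noisy-label
    model and models refit on plausible clean label draws.
    Re-flipping each label with the corruption rate is a crude
    stand-in for sampling from the posterior over clean labels."""
    rng = np.random.default_rng(seed)
    base = LogisticRegression().fit(X, y_noisy).predict(X)
    disagree = np.zeros(len(X))
    for _ in range(n_draws):
        flips = rng.random(len(y_noisy)) < flip_prob
        y_plausible = np.where(flips, 1 - y_noisy, y_noisy)
        h = LogisticRegression().fit(X, y_plausible)
        disagree += h.predict(X) != base
    return disagree / n_draws  # in [0, 1]; high = likely regretful
```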
We capture this effect with a simple measure: regret.
Regret is inevitable with label noise, but it can tell us where models silently fail and how we can guide safer predictions.
19.04.2025 23:04
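To make the measure concrete (a toy construction, not the paper's exact formalism): regret can be read as the event that the model trained on the noisy labels predicts differently than the model we would have trained on the clean ones.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic setup: true labels, plus observed labels flipped
# uniformly at random with probability 0.2 (assumed for illustration).
n, d = 1000, 5
X = rng.normal(size=(n, d))
y_true = (X @ rng.normal(size=d) > 0).astype(int)
y_noisy = np.where(rng.random(n) < 0.2, 1 - y_true, y_true)

h_noisy = LogisticRegression().fit(X, y_noisy)  # model we can train
h_clean = LogisticRegression().fit(X, y_true)   # counterfactual model

# Per-instance regret: the two models disagree on this individual.
regret = h_noisy.predict(X) != h_clean.predict(X)
print(f"fraction of regretful predictions: {regret.mean():.3f}")
```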
This lottery breaks modern ML:
If we can't tell which predictions are wrong, we can't improve models, we can't debug, and we can't trust them in high-stakes tasks like healthcare.
19.04.2025 23:04
We can frame this problem as learning from noisy labels.
Plenty of algorithms have been designed to handle label noise by predicting well on average, but we show how they still fail on specific individuals.
19.04.2025 23:04
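To see why "predicting well on average" is not enough, here is a toy check: two predictors with identical average accuracy can still disagree on roughly a third of individuals, so which individuals get a mistake is effectively a lottery.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 10_000)

def corrupt(y, err, rng):
    # Flip each prediction independently with probability err.
    return np.where(rng.random(len(y)) < err, 1 - y, y)

pred_a = corrupt(y, 0.2, rng)  # model A: ~80% accurate on average
pred_b = corrupt(y, 0.2, rng)  # model B: also ~80% accurate

print((pred_a == y).mean(), (pred_b == y).mean())  # both ~0.80
print((pred_a != pred_b).mean())                   # ~0.32 disagreement
```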
Many ML models predict labels that don't reflect what we care about, e.g.:
• Diagnoses from unreliable tests
• Outcomes from noisy electronic health records
In a new paper w/ @berkustun.bsky.social, we study how this subjects individuals to a lottery of mistakes.
Paper: bit.ly/3Y673uZ
19.04.2025 23:04
Key takeaway: Label noise isn't static, especially in time series.
Come chat with me at #ICLR2025, Poster Session 2!
Shoutout to my amazing colleagues behind this work:
@tomhartvigsen.bsky.social
@berkustun.bsky.social
13.04.2025 17:40
Real-world demo:
We applied our method to stress detection from smartwatches, where we have noisy self-reported labels alongside clean physiological measures.
Our model tracks the true time-varying label noise, reducing test error over baselines.
13.04.2025 17:40
We propose methods to learn this function directly from noisy data.
Results on 4 real-world time series tasks:
✅ Temporal methods beat static baselines
✅ Our methods better approximate the true noise function
✅ They work when the noise function is unknown!
13.04.2025 17:40
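One plausible way to learn such a function from noisy data alone (a generic recipe sketched here, not necessarily the paper's estimator): parameterize the time-varying flip rates with a small network and fit them jointly with the classifier by maximizing the likelihood of the observed noisy labels. PyTorch, binary labels:

```python
import torch
import torch.nn as nn

class TemporalNoise(nn.Module):
    """Illustrative time-varying 2x2 flip matrix Q(t):
    Q(t)[i, j] = P(observed label = j | true label = i, time = t)."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2))

    def forward(self, t):  # t: (batch, 1), e.g. normalized to [0, 1]
        flip = 0.5 * torch.sigmoid(self.net(t))  # keep flip rates < 0.5
        q0 = torch.stack([1 - flip[:, 0], flip[:, 0]], dim=1)  # true = 0
        q1 = torch.stack([flip[:, 1], 1 - flip[:, 1]], dim=1)  # true = 1
        return torch.stack([q0, q1], dim=1)  # (batch, 2, 2)
```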
We formalize this setting:
A temporal label noise function defines how likely each true label is to be flipped, as a function of time.
Using this function, we propose a new time series loss function that is provably robust to label noise.
13.04.2025 17:40
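A standard way to realize such a loss is forward correction: push the model's clean-label probabilities through Q(t) to get probabilities over the noisy labels, then take cross-entropy against what was actually observed. A sketch under those assumptions (binary case; a Q(t) like the `TemporalNoise` sketch above):

```python
import torch

def forward_corrected_nll(clean_probs, Q_t, y_obs):
    """clean_probs: (batch, 2) model's P(true label | x, t)
    Q_t: (batch, 2, 2) with Q_t[i, j] = P(obs = j | true = i)
    y_obs: (batch,) observed noisy labels."""
    # P(obs = j | x, t) = sum_i P(true = i | x, t) * Q_t[i, j]
    noisy_probs = torch.einsum('bi,bij->bj', clean_probs, Q_t)
    nll = -torch.log(noisy_probs.gather(1, y_obs[:, None]) + 1e-8)
    return nll.mean()
```

When Q(t) is known and invertible, minimizing this loss recovers the clean-label model; that is the usual forward-correction guarantee, here applied per time step.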
What is temporal label noise?
In many real-world time series (e.g., wearables, EHRs), label quality fluctuates over time:
➡️ Participants fatigue
➡️ Clinicians miss more during busy shifts
➡️ Self-reports drift seasonally
Existing methods assume static noise, so they fail here.
13.04.2025 17:40
Would be great to be added :)
22.12.2024 02:57
Psychiatrist, Cognitive-behavioural therapist, Clinical epidemiologist, who does systematic reviews, runs randomized clinical trials, performs meta-epidemiological studies and develops smartphone apps.
Director of the Duke University Center for Computational and Digital Health Innovation. Alfred Winborne Mordecai and Victoria Stover Mordecai Associate Professor of Biomedical Engineering. Research in HPC and digital twins for health.
Faculty at the University of Pennsylvania. Lifelong machine learning and AI for robotics and precision medicine: continual learning, transfer & multi-task learning, deep RL, multimodal ML, and human-AI collaboration. seas.upenn.edu/~eeaton
Manchester Centre for AI FUNdamentals | UoM | Alumn UCL, DeepMind, U Alberta, PUCP | Deep Thinker | Posts/reposts might be non-deep | Carpe espresso ☕
a mediocre combination of a mediocre AI scientist, a mediocre physicist, a mediocre chemist, a mediocre manager and a mediocre professor.
see more at https://kyunghyuncho.me/
Stanford Professor | Computational Health Economics & Outcomes | Fair Machine Learning | Causality | Statistics | Health Policy | Health Equity
drsherrirose.org
Lab manual: stanfordhpds.github.io/lab_manual
Personal account
Assistant professor of biomedical data science and dermatology at Stanford. AI for healthcare. Associate editor at NEJM AI and the Journal of Investigative Dermatology. Mother of a sassy girl and a baby boy.
Faculty at UC San Diego. Chief Health AI Officer at UC San Diego Health. #rstats. Creator of Tidier.jl #julialang. #GoBlue. Views own.
ML for healthcare and health equity. Assistant Professor at UC Berkeley and UCSF.
https://irenechen.net/
Assistant Professor at Stanford. Trustworthy, deployable ML/NLP for healthcare.
Assistant Professor at UC Berkeley and UCSF.
Machine Learning and AI for Healthcare. https://alaalab.berkeley.edu/
Ph.D. Student @uwstat; Research fellowship @Netflix; visiting researcher @UCJointCPH; M.A. @UCBStatistics - machine learning; calibration; semiparametrics; causal inference.
https://larsvanderlaan.github.io
Director of Responsible AI @GSK | Interested in Responsible AI, AI for healthcare, fairness and causality | Prev. Google DeepMind, Google Health, UCL, Stanford, ULiege | WiML board/volunteer. She/her.
Assistant Professor at Carnegie Mellon. Machine Learning and social impact. https://bryanwilder.github.io/
Health and AI Lab at Stevens Institute of Technology. Causal inference, precision nutrition, diabetes, decision making, and health informatics. http://www.healthailab.org
Medical AI / AI ethics. DE, previously DK. Father of twins, a dog, and a cat.
Research scientist at Google. Previously Stanford Biomedical Informatics. Researching #fairness #equity #robustness #transparency #causality #healthcare
Assistant Prof. of CS at Johns Hopkins
Visiting Scientist at Abridge AI
Causality & Machine Learning in Healthcare
Prev: PhD at MIT, Postdoc at CMU
AI accountability, audits & eval. Keen on participation & practical outcomes. CS PhDing @UCBerkeley.
Assistant prof at UMN CS. Human-centered AI, online communities, risky mental health behaviors. Mom, lifter, nerd, haver of opinions.