Imbalanced classification: pitfalls and solutions — Probabilistic calibration of cost-sensitive learning
Today at #EuroScipy2025, @glemaitre58.bsky.social and I presented a tutorial on pitfalls of machine learning for imbalanced classification problems.
We discussed what (not) to do when fitting a classifier and obtaining degenerate precision or recall values.
probabl-ai.github.io/calibration-...
19.08.2025 11:58
✨️💥skrub: machine learning with dataframes
New release 💫 0.6
A huge one, with the super powerful new "DataOps", and many improvements all over the library.
Exciting!!
24.07.2025 16:16
Python software engineer for tslearn
Inria job offer
Come work with us on tslearn in beautiful Rennes!
(deadline for application is soon!)
jobs.inria.fr/public/class...
20.02.2025 09:59
Open source software: how to live long and go far
An opinionated guide to building open-source software tools
with a focus on Python and science
A talk that I gave when I was stepping down as a lead…
Just put online a talk I gave summarizing what I have learned over the years as a maintainer of open source.
It's _opinions_ (been there, done that), but I'm willing to defend them, having stewarded my share of successful open source projects.
speakerdeck.com/gaelvaroquau...
06.02.2025 20:31
Our first flagship feature is the `EstimatorReport`. You feed it your scikit-learn compatible estimator and your dataset, and it displays a helper with metrics and plots to help you investigate your estimator. Computed for you in one line of code. Blazing fast thanks to caching. Check out our docs!
23.01.2025 15:49
YouTube video by scikit-learn
scikit-learn Version 1.6.0 Release Highlights
❄️ The Christmas release is here! ❄️
Introducing scikit-learn 1.6 with:
🟢 2 major features & 34 improvements
🔵 5 efficiency boosts & 21 enhancements
🟡 14 API changes
🔴 30 fixes
👥 160 amazing contributors
youtu.be/7wiHChpwJe8
20.12.2024 09:44
This year, there are 16 positions at CNRS in computer science (8 in "applied" domains → ask me; 8 in "fundamental" domains → ask the other David).
@mathurinmassias.bsky.social has a good list of advice mathurinm.github.io/cnrs_inria_a...
Official 🔗 www.ins2i.cnrs.fr/en/cnrsinfo/...
Don't wait!
23.11.2024 19:33
Sometimes you think you are right by doing everything "by the book." But sometimes the book is just a tiny part of the full story. Digging deeper and writing a new chapter with more insights is actually fun...
05.12.2024 10:15
🎉⚡️Release 0.4:
◼ Easily use deep learning for text entries
◼ TableVectorizer can remove columns with too many missing values
◼ TableReport more robust and prettier
...
1/5
27.11.2024 20:46
A high-level summary diagram taken from the slides linked below. It shows the interplay of two main components: a probabilistic model and a decision maker or planner.
Probabilistic predictions of an underfitting polynomial classifier on a noisy XOR task and the corresponding under-confident calibration curve.
Probabilistic predictions of an overfitting polynomial classifier and the resulting overconfident calibration curve on the same noisy XOR problem.
Simulation study to show the relative lack of stability of hyperparameter tuning when using hard metrics such as Accuracy or soft yet not probabilistic metrics such as ROC AUC compared to a strictly proper scoring rule such as the log-loss.
I recently shared some of my reflections on how to use probabilistic classifiers for optimal decision-making under uncertainty at @pydataparis.bsky.social 2024.
Here is the recording of the presentation:
www.youtube.com/watch?v=-gYn...
27.11.2024 14:17
PhD student sup. by Frank Hutter; researching automated machine learning and foundation models for (small) tabular data!
Website: https://ml.informatik.uni-freiburg.de/profile/purucker/
Journalist at Les Echos || Passionate about the history of the Muslim worlds, cycling, and podcasts || 📧 alelievre@lesechos.fr
Research engineer at Inria Saclay, working on the Skrub library.
Python, data preparation, ML, tabular learning.
ORCID: 0000-0002-4448-2959
Hoshiyomi ☄️
https://www.riccardocappuzzo.com
https://github.com/rcap107
PhD student at MIT💻 I love Computer Graphics🎨 Previously at SNU, ETH Zurich, Nvidia, Inria and École polytechnique🔙 I run, cycle and 🍻 during my free time.
Updates on community and events, such as the TRL workshop at NeurIPS 2024 and ACL 2025.
Info: https://table-representation-learning.github.io/
Building the future of neural interfaces from EMG signals at Meta Reality Labs. Ex Research director at Inria, scikit-learn co-founder, mne-python creator.
I have launched Excel once
AI for Science, deep generative models, inverse problems. Professor of AI and deep learning @universitedeliege.bsky.social. Previously @CERN, @nyuniversity. https://glouppe.github.io
Official Twitter account of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.
https://ecmlpkdd.org/
[bridged from https://blog.neurips.cc/ on the web: https://fed.brid.gy/web/blog.neurips.cc ]
Science integrity consultant and crowdfunded volunteer, PhD.
Ex-Stanford University. Maddox Prize/Einstein F Award winner
NL/USA/SFO.
#ImageForensics
@MicrobiomDigest on X.
Blog: ScienceIntegrityDigest.com
Support me: https://www.patreon.com/elisabethbik
🇫🇷 Professor Institut Polytechnique de Paris
I write books 📚 Waiting for Robots, University of Chicago Press (2025)📚 and documentaries 🎥In the Belly of AI (2025)🎥
Chief Technology Officer, Lila Sciences - Professor @ Harvard
Founding Editor NEJM AI, Co-host AI Grand Rounds 🎙️, Co-founder Generate Biomedicines, Inc
International Conference on Learning Representations https://iclr.cc/
Director, @stanforddel.bsky.social
Professor Stanford Institute for Human-centered AI, SIEPR, Stanford department of Economics and GSB
Author https://amazon.com/Second-Machine-Age-Prosperity-Technologies/dp/0393350649
Official Account for the European Conference on Computer Vision (ECCV) #ECCV2026, Malmo 🇸🇪 Hosted by @jbhaurum and @CSProfKGD
Official account for IEEE/CVF Conference on Computer Vision & Pattern Recognition. Hosted by @CSProfKGD with more to come.
📍🌎 🔗 cvpr.thecvf.com 🎂 June 19, 1983