Super hyped that it's finally out!
12.02.2026 14:05
It's perfect now. Thanks
21.01.2026 11:47
Thanks for sharing the blog post. However, the text is a bit hard to read on a mobile device: one has to zoom and pan around to read it. It would be nice to adopt a reflowing layout that adapts to small screen sizes instead.
24.12.2025 07:24
A new version of scikit-learn has been released 🥳 Check out the highlights: scikit-learn.org/stable/auto_...
Thanks everyone who contributed to this release!
Let me know what you think of the experimental GPU support
JupyterLab 4.5 and Jupyter Notebook 7.5 are here!
Highlights:
- Enhanced notebook scrolling behavior
- Native audio and video support
- New Terminal search
- Debugger, Notebook and File Browser improvements
Check out the blog post to learn more!
blog.jupyter.org/jupyterlab-4...
Thanks for sharing. I would be very curious to see if LeJEPA can successfully pretrain good encoders for other input modalities with different kinds of spatial structures and signal smoothness assumptions (audio, time series, signal from robotic sensors, natural language...).
14.11.2025 15:26
LeJEPA: a novel pretraining paradigm free of the (many) heuristics we relied on (stop-grad, teacher, ...)
- 60+ arch., up to 2B params
- 10+ datasets
- in-domain training (>DINOv3)
- corr(train loss, test perf)=95%
The Python Software Foundation was recommended for a $1.5M grant from the National Science Foundation. The terms of the award said PSF could not work on DEI, whether or not the grant funding was used for it.
PSF therefore declined the funding.
Science suffers, but commitment to core values remains
⚡ Release 0.6.2 is out ⚡
github.com/skrub-data/s...
I will speak about probabilistic regressions, @skrub-data.bsky.social and skore contributors will also present their libraries. Come join us!
26.09.2025 08:55
More info about free-threading here: py-free-threading.github.io
02.09.2025 16:51
We set up dedicated automated tests and discovered a bunch of thread-safety bugs. They are now tracked in dedicated issues, and we plan to fix them all, hopefully in time for 1.8.
02.09.2025 16:51
scikit-learn 1.8 will be the first scikit-learn release with native extensions that are officially marked as free-threading compatible.
github.com/scikit-learn...
We're happy to announce our Social Event, taking place on Tuesday 30th September at 6pm at the Cité des sciences. A perfect opportunity to unwind and connect with fellow attendees after a day of interesting talks!
pydata.org/paris2025/so...
pydata.org/paris2025/ti...
Looking forward to attending PyData Paris 2025! I will give a talk about probabilistic predictions for regression problems (I need to start working on my slides ;)
28.08.2025 07:33
JupyterLab and Jupyter Notebook users:
What's one thing you'd love to see improved in JupyterLab, Jupyter Notebook, or JupyterLite?
The team is prepping the upcoming 4.5/7.5 releases and wants to tackle some usability issues.
Drop your feedback below, this will help prioritize what gets fixed!
The video recording is already live!
www.youtube.com/live/jvyWTa1...
However, the Elkan 2001 post-hoc prevalence correction can be used for any (well-specified) probabilistic classifier, including gradient boosting classifiers, assuming the training set is a uniform sample of the population conditionally on the class.
19.08.2025 11:58
Interestingly, for logistic regression, this is equivalent to shifting the intercept by the difference between the logit of the positive class prevalence in the population and the logit of its prevalence in the training set.
19.08.2025 11:58
Equivalently, we can append a monotonic post-hoc transformation to a naively trained classifier to get a prevalence-corrected classifier, as shown in Theorem 2 of cseweb.ucsd.edu/~elkan/resca...
19.08.2025 11:58
In this case, we can use weight-based training to correct the model's probabilistic predictions so that they stay well calibrated with respect to the target deployment setting.
19.08.2025 11:58
This problem typically happens when the class of interest (positive class) is so rare (medical screening, predictive maintenance, fraud detection...) that collecting training features for the negative cases in the correct proportion would be too costly (or even illegal/unethical).
19.08.2025 11:58
Diagram explaining the general flow of data science operations to correct for prevalence shifts. See the linked notebook for an exhaustive description of the setting.
We then discussed another common related problem: how to deal with a prevalence shift between observed data and the deployment setting?
probabl-ai.github.io/calibration-...
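The post-hoc prevalence correction discussed in the posts above can be sketched in a few lines of plain Python. This is only an illustration of the logit-shift form of the correction, not code from the tutorial: the helper name `correct_prevalence` and its parameters are made up for this example, and it assumes the classifier is well calibrated on the training distribution.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def expit(z):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def correct_prevalence(p_train, prevalence_train, prevalence_target):
    """Map a probability predicted under the training prevalence to the
    target (deployment) prevalence.

    Hypothetical helper: shifts the predicted log-odds by the difference
    between the logits of the target and training prevalences.
    """
    shift = logit(prevalence_target) - logit(prevalence_train)
    return expit(logit(p_train) + shift)


# A model trained on balanced data, deployed where positives are 1%.
print(correct_prevalence(0.5, prevalence_train=0.5, prevalence_target=0.01))
```

With these inputs, a score of 0.5 from the balanced model maps to about 0.01, the prevalence in the target population. The transformation is monotonic, so it changes the probabilities but not the ranking of the predictions.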
If you can, consider defining a business-specific cost function and using it to tune the decision threshold automatically for your deployment setting.
We covered that precise setting in an earlier workshop:
probabl-ai.github.io/calibration-...
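Tuning the decision threshold against a business-specific cost can be sketched as a simple grid search. This is a minimal illustration in plain Python, not the workshop's code: the function names, the cost values and the toy data are all assumptions made for this example.

```python
def expected_cost(y_true, y_proba, threshold, cost_fp, cost_fn):
    """Average cost of the decisions obtained by thresholding y_proba."""
    total = 0.0
    for y, p in zip(y_true, y_proba):
        predicted_positive = p >= threshold
        if predicted_positive and y == 0:
            total += cost_fp  # false alarm
        elif not predicted_positive and y == 1:
            total += cost_fn  # missed positive
    return total / len(y_true)


def tune_threshold(y_true, y_proba, cost_fp, cost_fn, n_grid=101):
    """Return the grid threshold with the lowest average cost."""
    candidates = [i / (n_grid - 1) for i in range(n_grid)]
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, y_proba, t, cost_fp, cost_fn),
    )


# Toy data (made up): missing a positive costs 10x more than a false alarm.
y_true = [0, 0, 0, 1]
y_proba = [0.1, 0.2, 0.4, 0.3]
best = tune_threshold(y_true, y_proba, cost_fp=1.0, cost_fn=10.0)
print(best, expected_cost(y_true, y_proba, best, 1.0, 10.0))
```

Because a missed positive is much more expensive here, the selected threshold ends up below 0.5. In practice the threshold should be tuned on held-out data; scikit-learn's `TunedThresholdClassifierCV` does this with cross-validation.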
Instead, you should probably keep the well calibrated model and look at the influence of the decision threshold on your precision-recall trade-off. The default cut-off is 0.5 in scikit-learn, but that value is not necessarily meaningful for turning predicted probabilities into operational decisions.
19.08.2025 11:58
Spoiler: rebalancing the training data is rarely the correct fix. You will break probabilistic calibration and can no longer relate the predicted class probabilities to your deployment setting.
19.08.2025 11:58
Today at #EuroScipy2025, @glemaitre58.bsky.social and I presented a tutorial on pitfalls of machine learning for imbalanced classification problems.
We discussed what (not) to do when fitting a classifier and obtaining degenerate precision or recall values.
probabl-ai.github.io/calibration-...
Photo of Riccardo presenting skrub DataOps in a lecture room to an audience of ~50 people.
Attending the @skrub-data.bsky.social tutorial by @riccardocappuzzo.com and @glemaitre58.bsky.social at #EuroScipy2025. They introduce the new DataOps feature released in skrub 0.6.
Here is the repo with the material for the tutorial: github.com/skrub-data/E...
It's an interesting new deep learning architecture that can be somewhat successfully trained to solve challenging reasoning tasks where other methods completely fail.
30.07.2025 16:30
The paper gives no evidence that it's possible to pre-train HRM modules without supervision and then do transfer learning on other reasoning tasks.
30.07.2025 16:20