Release Skrub release 0.7.0 · skrub-data/skrub
Release 0.7.0
✨ Highlights
Data Ops can now be tuned with Optuna.
It is now possible to pass extra named arguments to an estimator through DataOps.skb.apply.
The TableReport now supports numpy arr...
Skrub 0.7.0 is here!
✨ Main highlights:
- Tune hyperparameter choices with Optuna
- Added support for Pandas 3.0
- Estimators in data ops can now take additional kwargs
16 new contributors helped with this release
Check out the full changelog: github.com/skrub-data/s...
12.12.2025 09:55
YouTube video by dotconferences
Clean code in Data Science - Gael Varoquaux - Skrub DataOps, Probabl:
@skrub-data.bsky.social: better data-science primitives for clean code on dataframes
Watch my dotAI talk, it's fun (live coding)!
www.youtube.com/watch?v=bQS4...
skrub really makes it easy to do machine learning with dataframes
17.11.2025 17:07
For even more control over column selection, skrub provides a collection of selectors that let you partition dataframes by data type, column name, or user-specified functions.
08.10.2025 12:43
All these transformers can be combined and inserted in a scikit-learn pipeline to build a feature matrix with complex column selection operations, and can be seen as an alternative to the scikit-learn ColumnTransformer.
08.10.2025 12:43
ApplyToFrame selects columns in the same way, but then uses all of them at the same time as input to the transformer: this is useful for dimensionality reduction.
SelectCols and DropCols can be used as "filtering blocks" in a pipeline.
08.10.2025 12:43
Skrub includes a powerful set of transformers and selectors that let you transform columns based on various conditions.
ApplyToCols lets you select a subset of columns in your dataframe, then applies a transformer to each selected column separately.
08.10.2025 12:43
Have we already told you that Skrub is cool? And that @riccardocappuzzo.com's talk was really great? Well, have we?
skrub-data.org/skrub-materi...
07.10.2025 14:44
Thanks to @riccardocappuzzo.com , @glemaitre58.bsky.social and Jérôme Dockès for preparing the talk, and mentoring at the sprint!
07.10.2025 14:36
The sprint was also a big hit, with both new and old contributors working on issues and getting to know the repository.
And to cap it all off, thanks to P16 we have stickers now
07.10.2025 14:36
The skrub sticker on the back of a laptop
@pydataparis.bsky.social 2025 is over, and it was a big success!
Our talk was very well received, and we got a lot of great questions, especially about scalability and how to interface with other libraries in production environments.
07.10.2025 14:36
What a banger is skrub @skrub-data.bsky.social !
Big thumbs up for the sklearn team & the maintainer of this package
01.10.2025 08:23
Less than a week away! The talk will be on Oct 1st at 10.05AM in room Louis Armand 1 - Est.
If you want to contribute to skrub, we will also have a sprint on Thursday.
See you there!
26.09.2025 08:50
🛠️ Main bugfixes
- Fixed the display of DataOp objects in Google Colab cell outputs.
- Fixed the range from which choose_float and choose_int sample values when log=False and n_steps is None.
- The SkrubLearner used to run a prediction on the training set during fit(); this has been fixed.
26.09.2025 08:48
Changes and deprecations
- Ken embeddings are now deprecated.
- The accepted values for the parameter how of .skb.apply() have changed. The new values are "auto", "cols", "frame", and "no_wrap".
- The parameter splitter of .skb.train_test_split() has been renamed split_func.
26.09.2025 08:48
New features
- The DataOp.skb.full_report() now displays the time each node took to evaluate.
- The User guide has been reworked and expanded.
26.09.2025 08:48
Reminder: skrub == cool
12.09.2025 13:34
Skrub DataOps applied to forecasting timeseries
The plot in the video was created for our EuroSciPy 2025 tutorial on forecasting time series: skrub-data.org/EuroSciPy202...
12.09.2025 12:56
The plot is interactive: you can select a range of results, and it will highlight only the runs within that range, enabling you to refine your search further. It also tracks fit and score times, so you can identify which parameters most impact runtime.
12.09.2025 12:56
skrub DataOps help you construct complex and extensive hyperparameter search spaces. However, interpreting results from large grids can be challenging.
To address this, skrub generates a parallel coordinate plot that visualizes all runs and the parameters used to achieve specific results.
12.09.2025 12:56
Do you have to deal with numerical features that involve large outliers, and need to train linear models or neural networks?
Then you might want to try the skrub SquashingScaler. The SquashingScaler behaves like scikit-learn RobustScaler, but smoothly clips outliers to predefined boundaries.
05.09.2025 08:47
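To see what "smoothly clips" means in practice, here is a standard-library sketch of the idea: robust scaling followed by a soft clipping function. It does not call skrub. The helper names, the bound of 3 and the formula z / sqrt(1 + (z / bound)^2) are assumptions chosen for illustration, so skrub's actual SquashingScaler may differ in its details.

```python
import math
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return [(v - median) / (q3 - q1) for v in values]


def soft_clip(z, bound=3.0):
    """Smoothly squash z into the open interval (-bound, bound).

    Small values are left almost unchanged; large values approach
    +/- bound without the hard kink of a plain min/max clip.
    """
    return z / math.sqrt(1.0 + (z / bound) ** 2)


# Hypothetical feature: nine ordinary values and one large outlier.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]

scaled = robust_scale(values)              # the outlier is still huge here
squashed = [soft_clip(z) for z in scaled]  # the outlier now sits just below 3

for raw, z, sq in zip(values, scaled, squashed):
    print(f"{raw:8.1f} -> robust {z:8.3f} -> squashed {sq:6.3f}")
```

The ordering of the values is preserved because the squashing function is monotonic, which is one reason to prefer it over dropping outliers when training linear models or neural networks.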
Our first talk tonight is from @gaelvaroquaux.bsky.social on @skrub-data.bsky.social.
Real tables are too messy for sklearn - skrub preprocesses them for you.
02.09.2025 18:28
Had a great PyData London tonight! Was a real treat to hear from @gaelvaroquaux.bsky.social on @skrub-data.bsky.social and the real world data pains it's solving. (Try it if you haven't already; super easy to get going!)
03.09.2025 00:06