Happy new year!
Let's celebrate 2026 with a bugfix release that brings several fixes, some documentation improvements, and a new dataset fetcher:
github.com/skrub-data/s...
@skrub-data.bsky.social
skrub is a Python library to ease preprocessing and feature engineering for tabular machine learning. Our long-term goal is to directly connect database tables to machine learning estimators. https://skrub-data.org https://discord.gg/ABaPnm7fDC
The course covers:
- How to explore and sanitize data with skrub
- How to use the skrub transformers for powerful and reliable feature engineering
- How to put everything together in a machine learning pipeline
Skrub Data Ops are not included (yet).
Do you want to learn how to use skrub like a pro? Then you're in luck!
Inria Academy is providing an introductory course on skrub aimed at IT personnel, engineers, data scientists, and data analysts.
www.inria-academy.fr/formation/sk...
The recording of the talk we did at @pydataparis.bsky.social 2025 is now available on the PyData YouTube channel!
You can find it here if you want to check it out:
www.youtube.com/watch?v=k9MN...
Skrub 0.7.0 is here!
✨ Main highlights:
- Tune hyperparameter choices with Optuna
- Added support for Pandas 3.0
- Estimators in data ops can now take additional kwargs
16 new contributors helped with this release 🔥
Check out the full changelog: github.com/skrub-data/s...
@skrub-data.bsky.social: better data-science primitives for clean code on dataframes
Watch my dotAI talk, it's fun (live coding)!
www.youtube.com/watch?v=bQS4...
skrub really makes it easy to do machine learning with dataframes
Example: skrub-data.org/stable/auto_...
08.10.2025 12:43
For even more control over column selection, skrub provides a collection of selectors that let you partition dataframes by data type, column name, or user-specified functions.
08.10.2025 12:43
All these transformers can be chained and inserted in a scikit-learn pipeline to build a feature matrix with complex column selection operations, and can be seen as an alternative to the scikit-learn ColumnTransformer.
08.10.2025 12:43
ApplyToFrame selects columns in the same way, but then passes all of them together as input to the transformer: this is useful for dimensionality reduction.
SelectCols and DropCols can be used as "filtering blocks" in a pipeline.
Skrub includes a powerful set of transformers and selectors that let you transform columns based on various conditions.
ApplyToCols lets you select a subset of columns in your dataframe and then apply a transformer to each selected column separately.
Have we told you yet that Skrub is cool? And that @riccardocappuzzo.com's talk was really great? Well, have we?
skrub-data.org/skrub-materi...
Slides:
skrub-data.org/skrub-materi...
Thanks to @riccardocappuzzo.com, @glemaitre58.bsky.social and Jérôme Dockès for preparing the talk and mentoring at the sprint!
07.10.2025 14:36
The sprint was also a big hit, with both new and old contributors working on issues and getting to know the repository.
And to cap it all off, thanks to P16 we have stickers now!
[Image: the skrub sticker on the back of a laptop]
@pydataparis.bsky.social 2025 is over, and it was a big success!
Our talk was very well received, and we got a lot of great questions, especially about scalability and how to interface with other libraries in production environments.
What a banger is skrub @skrub-data.bsky.social !
Big thumbs up for the sklearn team & the maintainer of this package
Less than a week away! The talk will be on Oct 1st at 10:05 AM in room Louis Armand 1 - Est.
If you want to contribute to skrub, we will also have a sprint on Thursday.
See you there!
🛠️ Main bugfixes
- Fixed the display of DataOp objects in Google Colab cell outputs.
- Fixed the range from which choose_float and choose_int sample values when log=False and n_steps is None.
- Fixed the SkrubLearner making a prediction on the train set during fit().
Changes and deprecations
- Ken embeddings are now deprecated.
- The accepted values for the parameter how of .skb.apply() have changed. The new values are "auto", "cols", "frame", and "no_wrap".
- The parameter splitter of .skb.train_test_split() has been renamed to split_func.
New features
- DataOp.skb.full_report() now displays the time each node took to evaluate.
- The User guide has been reworked and expanded.
⚡ Release 0.6.2 is out ⚡
github.com/skrub-data/s...
Reminder: skrub == cool
12.09.2025 13:34
Here's another example of how to tune ML models with skrub Data Ops: skrub-data.org/stable/auto_...
12.09.2025 12:56
The plot in the video was created for our EuroSciPy 2025 tutorial on forecasting time series: skrub-data.org/EuroSciPy202...
12.09.2025 12:56
The plot is interactive: you can select a range of results, and it will highlight only the runs within that range, enabling you to refine your search further. It also tracks fit and score times, so you can identify which parameters most impact runtime.
12.09.2025 12:56
skrub DataOps help you construct complex and extensive hyperparameter search spaces. However, interpreting results from large grids can be challenging.
To address this, skrub generates a parallel coordinate plot that visualizes all runs and the parameters used to achieve specific results.