And even more changes and improvements!
We hope you enjoy the new release, and if you do, don't forget to star the repo!
github.com/skrub-data/s...
@skrub-data.bsky.social
skrub is a Python library to ease preprocessing and feature engineering for tabular machine learning. Our long-term goal is to directly connect database tables to machine learning estimators. https://skrub-data.org https://discord.gg/ABaPnm7fDC
📢 We changed the default high-cardinality encoder: the StringEncoder is now used by default.
skrub-data.org/stable/refer...
⚙️ A global Skrub config has been introduced, which lets you set a number of parameters to customize the behavior of Skrub.
skrub-data.org/stable/refer...
🗑️ DropUninformative is a transformer that uses various heuristics to remove columns that are unlikely to bring information for training a model.
skrub-data.org/dev/referenc...
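To give an idea of what such heuristics can look like, here is a minimal pure-Python sketch that flags constant columns and columns that are mostly null. It is an illustration only, not the DropUninformative implementation: the function name, the `null_fraction_threshold` parameter, and the dict-of-lists table are all assumptions made for the example.

```python
def uninformative_columns(table, null_fraction_threshold=1.0):
    """Return the names of columns that are constant or (mostly) null.

    `table` is a plain dict mapping column names to lists of values;
    both the name and the threshold parameter are hypothetical.
    """
    flagged = []
    for name, values in table.items():
        n_null = sum(v is None for v in values)
        distinct = {v for v in values if v is not None}
        mostly_null = bool(values) and n_null / len(values) >= null_fraction_threshold
        constant = n_null == 0 and len(distinct) <= 1
        if mostly_null or constant:
            flagged.append(name)
    return flagged


table = {
    "city": ["Paris", "Rome", "Oslo"],
    "continent": ["Europe", "Europe", "Europe"],  # constant column
    "notes": [None, None, None],                  # entirely null column
}
flagged = uninformative_columns(table)
```

Here `flagged` contains "continent" and "notes", while "city" is kept because its values vary.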
The SquashingScaler has been added: it robustly rescales and smoothly clips numerical columns, enabling more robust handling of numerical columns by neural networks.
skrub-data.org/dev/auto_exa...
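The idea of "robustly rescale, then smoothly clip" can be sketched in a few lines of plain Python. This sketch assumes centering on the median, scaling by the interquartile range, and the soft-clipping function z / sqrt(1 + (z / B)^2); the actual SquashingScaler may differ in its details, and `squash` and `max_absolute_value` are names chosen for this example.

```python
import math
import statistics


def squash(values, max_absolute_value=3.0):
    """Rescale with median and IQR, then smoothly clip to (-B, B)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # avoid dividing by zero on constant data
    scaled = [(v - median) / iqr for v in values]
    b = max_absolute_value
    # Soft clip: close to the identity near 0, saturates towards +/- b.
    return [z / math.sqrt(1.0 + (z / b) ** 2) for z in scaled]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]  # one extreme outlier
squashed = squash(data)
```

The outlier ends up just below the bound instead of dominating the scale, and the ordering of the values is preserved.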
🛠️ selectors, ApplyToCols and ApplyToFrame are now available, providing flexible utilities for selecting the columns to which a transformer should be applied.
skrub-data.org/dev/auto_exa...
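The underlying pattern (select some columns with a rule, transform only those, pass the rest through) can be sketched without any library. The helper names below are hypothetical and are not the skrub API; tables are plain dicts of lists to keep the example self-contained.

```python
def apply_to_cols(table, selector, transform):
    """Apply `transform` to each value of the selected columns; copy the others."""
    return {
        name: [transform(v) for v in values] if selector(name, values) else list(values)
        for name, values in table.items()
    }


def numeric(name, values):
    """Hypothetical selector: keep columns whose values are all numbers."""
    return all(isinstance(v, (int, float)) for v in values)


table = {"age": [20, 30], "name": ["ann", "bob"]}
result = apply_to_cols(table, numeric, lambda v: v / 10)
```

Only "age" is transformed; "name" is passed through unchanged.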
The TableReport has been improved with many new features: series are now supported directly, and plot generation can be skipped when the number of columns in the dataframe exceeds a user-defined threshold. Columns with high cardinality and sorted columns are now highlighted.
24.07.2025 15:55
Get started with DataOps in the user guide:
skrub-data.org/dev/userguid...
Form complex DataOps plans to train and tune machine learning models, then export the plans as learners, standalone objects that can be used on new data.
Tune hyperparameters where they're defined, and explore the resulting space with a parallel coordinate plot.
Major feature! Skrub DataOps are a powerful new way of combining dataframe transformations over multiple tables with machine learning pipelines.
24.07.2025 15:55
Keep reading for the highlights, or check out the full changelog here: skrub-data.org/stable/CHANG...
24.07.2025 15:55
On top of that, we revamped most of the user guide, documentation, and API reference to make it easier to learn how to use the features of Skrub.
skrub-data.org/dev/document...
⚡ Release 0.6.0 is now out! ⚡
Major update! Skrub DataOps, various improvements for the TableReport, new tools for applying transformers to columns, and a new robust transformer for numerical features are only some of the features included in this release.
Finally, the DatetimeEncoder can also add periodic features: trigonometric (circular) or B-spline features can be generated directly by setting the corresponding parameter. 4/4
Docs below!
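As a quick illustration of why circular features help, here is a small sketch using only the standard library. It is not the DatetimeEncoder implementation; `circular_features` is a hypothetical helper that maps a periodic value (here the hour of the day) to a sine/cosine pair.

```python
import math
from datetime import datetime


def circular_features(value, period):
    """Map a periodic value to a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


late = datetime(2025, 6, 19, 23, 0)
early = datetime(2025, 6, 20, 1, 0)
noon = datetime(2025, 6, 20, 12, 0)

late_xy = circular_features(late.hour, 24)
early_xy = circular_features(early.hour, 24)
noon_xy = circular_features(noon.hour, 24)

# Distance between two encoded times on the circle.
near = math.dist(late_xy, early_xy)
far = math.dist(early_xy, noon_xy)
```

With the raw hour, 23:00 and 01:00 look 22 units apart; with the circular encoding they are close neighbours, while 01:00 and 12:00 stay far apart.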
For more extensive feature engineering, the DatetimeEncoder parses datetimes, then converts each datetime part into a numerical column (hours, minutes, seconds, days, etc.). Additional features such as weekdays and time since epoch can also be added. 3/
19.06.2025 12:45
⏱️ If you need to convert columns from strings to datetimes, to_datetime() does that for you. It's also available as a scikit-learn compatible transformer, ToDatetime(). Both parse most common formats automatically, but can accept a specific time format if needed. 2/
19.06.2025 12:45
The skrub API includes various functions and objects that help deal with datetime strings. 1/
19.06.2025 12:45
⚡ Release 0.5.4:
Maintenance release!
This release makes skrub compatible with scikit-learn 1.7.
Changelog:
skrub-data.org/stable/CHANG...
In the dev docs you will find examples of how to use expressions, like this one:
skrub-data.org/dev/auto_exa...
⚠️ As a disclaimer, expressions are still under development and things may change. However, if you're interested in learning more or testing them out, you can do so by checking the dev docs and examples, or by cloning the main branch of the skrub repo.
04.06.2025 12:46
Finally, results can be shown with a parallel coordinate plot to find out the impact of different hyperparameters on the prediction task.
04.06.2025 12:46
Once you're happy with the parameter grid, it's possible to either cross-validate it with default values, or run a full randomized or grid search on the parameter grid.
04.06.2025 12:46
Even better, choices can be nested: an estimator in a choose_from can be defined with its own set of choices, which are then expanded by the library.
04.06.2025 12:46
With skrub expressions, it will be possible to build complex hyperparameter grids by composing "choose_" functions: choose from a list of values or estimators, generate a linear or logarithmic distribution of integers or floats, or select boolean flags.
04.06.2025 12:46
In scikit-learn, parameter grids are often built by setting all required parameters in a dictionary, then passing the dictionary to GridSearchCV or RandomizedSearchCV. This process adds a lot of redundant code and may lead to missing some configurations.
04.06.2025 12:46
This week's post will be another sneak peek into skrub expressions, an upcoming feature that will ease the preparation and execution of machine learning pipelines on dataframes.
This time we will focus on how expressions can simplify the construction of complex hyperparameter grids.
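To make the idea of nested choices concrete, here is a small library-free sketch of how a nested specification can be expanded into every concrete configuration. It only illustrates the concept: `Choice` and `expand` are hypothetical names, not the skrub expressions API, and the model names are placeholder strings rather than real estimators.

```python
from itertools import product


class Choice:
    """Hypothetical marker: exactly one of the options must be picked."""

    def __init__(self, *options):
        self.options = options


def expand(spec):
    """Yield every concrete configuration described by `spec`."""
    if isinstance(spec, Choice):
        # Each option may itself contain further choices.
        for option in spec.options:
            yield from expand(option)
    elif isinstance(spec, dict):
        keys = list(spec)
        per_key = [list(expand(spec[key])) for key in keys]
        for combo in product(*per_key):
            yield dict(zip(keys, combo))
    else:
        yield spec  # a plain value has a single configuration


grid = Choice(
    {"model": "ridge", "alpha": Choice(0.1, 1.0, 10.0)},
    {"model": "forest", "n_estimators": Choice(50, 100), "max_depth": Choice(3, None)},
)
configs = list(expand(grid))
```

Each model carries its own choices, so no configuration is written out by hand: the ridge branch yields 3 configurations and the forest branch 4, for 7 in total.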
Just remember that language models are expensive to run. In the worst case, there's always the skrub StringEncoder.
28.05.2025 08:43