Gaël Varoquaux

@gaelvaroquaux.bsky.social

Research & code: Research director @inria ►Data, Health, & Computer science ►Python coder, (co)founder of scikit-learn, joblib, & @probabl.bsky.social ►Sometimes does art photography ►Physics PhD

14,010 Followers  |  213 Following  |  597 Posts  |  Joined: 26.08.2023

Posts by Gaël Varoquaux (@gaelvaroquaux.bsky.social)

Skore Is Live: Track Your Data Science Skore by Probabl: The collaboration platform for data scientists. Evaluate models, automate reports, and bridge the gap from notebooks to production.

📝 Read the announcement: blog.probabl.ai/skore-is-live
🚀 Try Skore: skore.probabl.ai
👨💻 Explore the code: github.com/probabl-ai/s...
📖 Read the docs: docs.skore.probabl.ai

05.03.2026 15:57 — 👍 1    🔁 0    💬 0    📌 0

Skore's tagline is “track your data science”. We’re relentlessly improving the tools, and we will push them to redefine the data-science experience.

We’re opening early, because at Probabl we believe in transparency.
This is just the beginning🔥

05.03.2026 15:57 — 👍 2    🔁 0    💬 1    📌 0

Skore complements building blocks such as @scikit-learn.org, helping to track and validate the work.
It comes both as a powerful stand-alone open-source library, which can be used fully offline, and as an online platform to track and share months of iterations.
docs.skore.probabl.ai/stable/auto_...

05.03.2026 15:57 — 👍 0    🔁 0    💬 1    📌 0

At Probabl, we're focused on improving the data-science stack.

We just made the Skore platform public, augmenting the open-source library.
Our dream is that, with Skore, a data scientist will feel not overwhelmed by the iterative process of data science, but empowered.
blog.probabl.ai/skore-is-live

05.03.2026 15:57 — 👍 12    🔁 2    💬 1    📌 0
Mayors' expense reports: "I'm not going to give them to you"

Are the mayors of France's hundred most populous cities, some of whom are running for re-election, willing to make their expense reports public? More than half did not answer the question from "Mediapart", thereby failing to comply with the law.

05.03.2026 06:09 — 👍 100    🔁 67    💬 2    📌 6
"Academic freedom is a condition of democracy itself" OP-ED. France must learn a lesson from the American experience: without a strong legal status, academic freedom does not long withstand political shocks, explains, in an op-ed for "Mo...

www.lemonde.fr/idees/articl...

01.03.2026 16:35 — 👍 75    🔁 30    💬 0    📌 1

My altimeter watch typically reads 1500 m.

Perhaps the progress comes from the fact that the pressure drops more slowly?

26.02.2026 17:25 — 👍 2    🔁 0    💬 0    📌 0

They are very interested in power. That naturally leads to this kind of reading.

26.02.2026 14:36 — 👍 1    🔁 0    💬 0    📌 0

I don't see why epistemology would be relevant only to the experimental sciences.

20.02.2026 06:31 — 👍 1    🔁 0    💬 0    📌 0

TabICLv2 is the next big step in foundation models for tabular data.
Try the package, read the paper.
Enjoy!
Installation: pypi.org/project/tabi...
Preprint: arxiv.org/abs/2602.11139
Open source code: github.com/soda-inria/t...

12.02.2026 13:26 — 👍 5    🔁 1    💬 0    📌 0

Looking beyond the median cost per sample, compute costs grow non-linearly with the data size.

Figure 6 shows the influence of sample size, revealing a marked benefit of TabICLv2 over TabPFN-2.5 that becomes more pronounced as the sample size grows.
And TabICLv2 runs well on CPU.

12.02.2026 13:26 — 👍 3    🔁 0    💬 1    📌 0

TabICLv2 is state of the art, and fast.

Figure 1 in the paper gives the results on the standard TabArena benchmark, showing that TabICLv2 is the best predictor without needing any hyper-parameter tuning.

12.02.2026 13:26 — 👍 3    🔁 0    💬 1    📌 0
TabICLv2: A better, faster, scalable, and open tabular foundation model Tabular foundation models, such as TabPFNv2 and TabICL, have recently dethroned gradient-boosted trees at the top of predictive benchmarks, demonstrating the value of in-context learning for tabular d...

See arxiv.org/abs/2602.11139 for the technical information, including benchmarks and the “details” that make TabICLv2 work so well.

12.02.2026 13:26 — 👍 4    🔁 0    💬 1    📌 0

🎉 Announcing TabICLv2: State-of-the-art Table Foundation Model, fast and open source

A breakthrough for tabular ML: better prediction and faster runtime than alternatives. Work by Jingang Qu, David Holzmüller (@dholzmueller.bsky.social), Marine Le Morvan, and myself 👇

12.02.2026 13:26 — 👍 51    🔁 11    💬 1    📌 2
Using PyTorch (via skorch) in DataOps This example shows how to wrap a PyTorch model with skorch and plug it into a skrub DataOps plan. The main goal here is to show the integration pattern: PyTorch defines the model (an nn.Module), sk...

- A new example has been added to show how skrub Data Ops can be used with PyTorch and skorch to solve an image classification task.

skrub-data.org/stable/auto_...

10.02.2026 13:32 — 👍 0    🔁 1    💬 0    📌 0

Main changes:
- The StringEncoder now exposes the vocabulary parameter, allowing it to be passed to the underlying TfidfVectorizer.
- The function compute_ngram_distance has been made private to reduce clutter.
- The repository wheel has been made smaller by removing some benchmarking material.

10.02.2026 13:32 — 👍 1    🔁 1    💬 1    📌 0
Release Skrub release 0.7.2 · skrub-data/skrub ✨ skrub version 0.7.2 has been released ✨ In this release we squashed more bugs, improved the API reference, and added a new example. Main changes: The StringEncoder now exposes the vocabulary par...

✨ skrub version 0.7.2 has been released ✨

In this release we squashed more bugs, improved the API reference, and added a new example.

github.com/skrub-data/s...

10.02.2026 13:32 — 👍 2    🔁 1    💬 1    📌 0
Tuning DataOps with Optuna This example shows how to use Optuna to tune the hyperparameters of a skrub DataOp. As seen in the previous example, skrub DataOps can contain “choices”, objects created with choose_from(), choose_...

Here is a full example of how to use skrub Data Ops with Optuna

skrub-data.org/stable/auto_...

05.02.2026 08:52 — 👍 0    🔁 1    💬 0    📌 0

At the end, you get a fully-fledged Optuna study to work with. Of course, that includes support for the Optuna dashboard and access to the Optuna reporting and plotting interfaces.

05.02.2026 08:52 — 👍 0    🔁 1    💬 1    📌 0
Three snippets of Python code showing how to use skrub Data Ops with the Optuna optimization library. The first snippet shows a standard randomized search with the Data Ops. The second snippet adds the parameter "backend", which is set to "optuna". The third snippet uses the Optuna visualization API to plot information from the study.

Did you know that the skrub Data Ops support Optuna as a backend to run hyperparameter search?

It's as easy as writing "backend='optuna'": this will set up a default Optuna study (and the TPE sampler) to replace the standard random sampler.
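
For readers unfamiliar with the baseline being replaced, here is a minimal sketch of a plain random sampler over a search space. It uses only the Python standard library and is not skrub's or Optuna's implementation: the objective function, the parameter names (`lr`, `depth`) and their ranges are all hypothetical. The loop draws every candidate independently, whereas a TPE sampler uses the scores of past trials to propose the next candidate.

```python
import random


def objective(params):
    # Hypothetical toy loss, smallest at lr=0.1 and depth=5.
    return (params["lr"] - 0.1) ** 2 + 0.01 * (params["depth"] - 5) ** 2


def random_search(n_trials, seed=0):
    # Draw each candidate independently of all previous trials.
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {
            "lr": rng.uniform(0.001, 1.0),  # hypothetical range
            "depth": rng.randint(1, 10),    # hypothetical range
        }
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(n_trials=50)
print(best_params, best_score)
```

With a fixed seed the search is reproducible, and running more trials can only keep or improve the best score found.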

05.02.2026 08:52 — 👍 4    🔁 2    💬 1    📌 0

We read the bike counters and we were not disappointed 🤗! Impressive 2025 cumulative totals on the counters placed at the four corners of Antony, and really significant increases in ridership!

We fully intend to build on this turnout to defend cyclists' interests! 🚴‍♂️🚴‍♀️

01.02.2026 08:10 — 👍 4    🔁 3    💬 1    📌 0

The sadness of this cyclist crushed to death in my neighborhood + the indecency of the reaction of minister Tabarot, pointing to "the strengthening of blind-spot signage on heavy trucks": a sticker on a heavy truck has never stopped anyone from being run over!

27.01.2026 21:32 — 👍 51    🔁 15    💬 3    📌 1
A polar bear at a Metallica concert

Debugging a @pola.rs memory leak, Metallica version

***

But the memory remains

...

Hashes to hashes
Rust to rust
Fade to black (no, to ruff!)

26.01.2026 18:02 — 👍 10    🔁 0    💬 1    📌 1

I consider such behavior to reflect pretty low standards in the production of science. A clear preference for speed over correctness.

How many other aspects of the paper are like this? Did people vibe code their experiments and not check these?...

24.01.2026 19:29 — 👍 12    🔁 0    💬 2    📌 0

Definitely love "against method".
Cited it in a paper about LLMs and knowledge engineering:
hal.science/hal-05383445...

Sorry, the paper is both in French, and philosophy 😄

23.01.2026 21:34 — 👍 3    🔁 0    💬 0    📌 0

Because you have hard-working open-source developers, with a huge amount of professionalism and process, who consolidate reusable building blocks.

True story, cf @scikit-learn.org @pytorch.org @python.org

21.01.2026 19:32 — 👍 10    🔁 0    💬 0    📌 0

Except for the security requirements that come with access... (Inria's IT department, the DSI, adds more of its own, by the way)

18.01.2026 19:29 — 👍 2    🔁 0    💬 1    📌 0

The municipal elections are approaching. Here are the various points on which we would like Antony's candidates to take a position.

Political will needs to accompany this change in practices in Antony (+34% participation in the 2025 Baromètre vélo)

17.01.2026 20:26 — 👍 4    🔁 2    💬 0    📌 0

Scikit-learn is on the @linuxfoundation.org Open Source Insights board, alongside many other central projects:
insights.linuxfoundation.org/project/scik...

What an honor to contribute to the open-source tissue that cements the world!

16.01.2026 09:55 — 👍 25    🔁 2    💬 0    📌 0