@gaelvaroquaux.bsky.social
Research & code: Research director @inria ►Data, Health, & Computer science ►Python coder, (co)founder of scikit-learn, joblib, & @probabl.bsky.social ►Sometimes does art photography ►Physics PhD
My altimeter watch typically reads 1500 m.
Perhaps the improvement comes from the pressure dropping more slowly?
They are very interested in power. That naturally leads to this kind of reading.
26.02.2026 14:36 — 👍 1 🔁 0 💬 0 📌 0
I don't see why epistemology would be relevant only to the experimental sciences.
20.02.2026 06:31 — 👍 1 🔁 0 💬 0 📌 0
TabICLv2 is the next big step in foundation models for tabular data.
Try the package, read the paper.
Enjoy!
Installation: pypi.org/project/tabi...
Preprint: arxiv.org/abs/2602.11139
Open source code: github.com/soda-inria/t...
Looking beyond the median cost per sample, compute costs grow non-linearly with the data size.
Figure 6 shows the influence of sample size: the marked benefit of TabICLv2 over TabPFN-2.5 becomes more pronounced as the sample size grows.
And TabICLv2 runs well on CPU.
TabICLv2 is state of the art, and fast.
Figure 1 in the paper gives the results on the standard TabArena benchmark, showing that TabICLv2 is the best predictor without needing any hyper-parameter tuning.
See arxiv.org/abs/2602.11139 for the technical information, including benchmarks and the “details” that make TabICLv2 work so well.
12.02.2026 13:26 — 👍 4 🔁 0 💬 1 📌 0
🎉 Announcing TabICLv2: state-of-the-art Table Foundation Model, fast and open source
A breakthrough for tabular ML: better prediction and faster runtime than alternatives, work by Jingang Qu, David Holzmüller @dholzmueller.bsky.social , Marine Le Morvan, and myself 👇
- A new example has been added to show how skrub Data Ops can be used with pytorch and skorch to solve an image classification task.
skrub-data.org/stable/auto_...
Main changes:
- The StringEncoder now exposes the vocabulary parameter, allowing it to be passed to the underlying TfidfVectorizer.
- The function compute_ngram_distance has been made private to reduce clutter.
- The repository wheel has been made smaller by removing some benchmarking material.
✨ skrub version 0.7.2 has been released ✨
In this release we squashed more bugs, improved the API reference, and added a new example.
github.com/skrub-data/s...
Here is a full example on how to use skrub Data Ops with Optuna
skrub-data.org/stable/auto_...
At the end, you get a fully-fledged Optuna study to work with. Of course, that includes support for the Optuna dashboard and access to the Optuna reporting and plotting interfaces.
Three snippets of Python code showing how to use skrub Data Ops with the Optuna optimization library. The first snippet shows a standard randomized search with the Data Ops. The second snippet adds the parameter "backend", which is set to "optuna". The third snippet uses the Optuna visualization API to plot information from the study.
Did you know that the skrub Data Ops support Optuna as a backend to run hyperparameter search?
It's as easy as writing "backend='optuna'": this will set up a default Optuna study (and the TPE sampler) to replace the standard random sampler.
We read the bike counters and we were not disappointed 🤗! Impressive 2025 totals on the counters placed at the four corners of Antony, and really significant increases in ridership!
We fully intend to build on this turnout to defend cyclists' interests! 🚴‍♂️🚴‍♀️
The sadness of this cyclist crushed to death in my neighborhood, plus the indecency of Minister Tabarot's reaction announcing "reinforced blind-spot signage on heavy trucks": a sticker on a heavy truck has never stopped anyone from being run over!
27.01.2026 21:32 — 👍 51 🔁 15 💬 3 📌 1
A polar bear at a Metallica concert
Debugging a @pola.rs memory leak, Metallica version
***
But the memory remains
...
Hashes to hashes
Rust to rust
Fade to black (no, to ruff!)
I consider such behavior a pretty low standard for the production of science. A clear preference for speed over correctness.
How many other aspects of the paper are like this? Did people vibe code their experiments and not check these?...
Definitely love "Against Method".
Cited it in a paper about LLMs and knowledge engineering:
hal.science/hal-05383445...
Sorry, the paper is both in French, and philosophy 😄
Because you have hard-working open-source developers, with a huge amount of professionalism and process, who consolidate reusable building blocks.
True story, cf @scikit-learn.org @pytorch.org @python.org
Except for the security requirements that come with the access... (Inria's IT department adds some of its own, by the way)
18.01.2026 19:29 — 👍 2 🔁 0 💬 1 📌 0
The municipal elections are approaching. Here are the points on which we want the candidates in Antony to take a position.
Political will needs to accompany this change in practices in Antony (+34% participation in the Baromètre vélo 2025)
Scikit-learn is on the @linuxfoundation.org Open Source Insights board, alongside many other central projects:
insights.linuxfoundation.org/project/scik...
What an honor to contribute to the open-source fabric that cements the world!
Noted, the "Robert Robichet" 😄
10.01.2026 13:57 — 👍 0 🔁 0 💬 0 📌 0
Yes, but since the ZRR rules are out of step with the needs of research, we will violate them constantly (without even realizing it), for example by letting people into our offices whom we should not let in.
We will be blamed for it.
David Mauger, who heads the list of the Antony Terre Citoyenne collective, sends you his best wishes for 2026.
09.01.2026 17:12 — 👍 3 🔁 2 💬 0 📌 0