
Samuel Müller

@sammuller.bsky.social

(Tab)PFNs, TrivialAugment etc.

351 Followers  |  116 Following  |  37 Posts  |  Joined: 18.11.2024

Posts by Samuel Müller (@sammuller.bsky.social)

The recent days have been horrific. We can't become numb to repeated instances of illegal and unconstitutional action by government agencies. It's even worse when public officials are blatantly lying in ways that contradict dozens of pieces of video evidence.

09.01.2026 21:40 — 👍 197    🔁 25    💬 5    📌 0
Position: The Future of Bayesian Prediction Is Prior-Fitted
Training neural networks on randomly generated artificial datasets yields Bayesian models that capture the prior defined by the dataset-generating distribution. Prior-data Fitted Networks (PFNs) are a...

Check out our position paper and come to our ICML poster (Thursday 4:30 PM, East Exhibition Hall A-B E-606).

arxiv.org/abs/2505.23947 n/n

08.07.2025 20:03 — 👍 0    🔁 0    💬 0    📌 0
Show HN: TabPFN v2 – A SOTA foundation model for small tabular data | Hacker News

There are already early examples of this, which we discuss, in areas as diverse as biology, Bayesian optimization, time-series forecasting, and tabular data. The most prominent is TabPFN (Nature '25). 5/n

news.ycombinator.com/item?id=4264...

08.07.2025 20:03 — 👍 0    🔁 0    💬 1    📌 0

We go into detailed comparisons with other Bayesian methods and the trade-offs that lead us to the conclusion that PFNs will become dominant for Bayesian prediction, and further that Bayesian prediction will become more important overall as priors improve. 4/n

08.07.2025 20:03 — 👍 1    🔁 0    💬 1    📌 0

What's nice is that, after training on this random data, the model will start to make sense of real-world data, too. It will approximate the posterior belonging to the prior of choice, e.g., a BNN, a GP, or, in the most interesting cases, a Bayesian model that doesn't exist yet. 3/n

08.07.2025 20:03 — 👍 1    🔁 0    💬 1    📌 0

Prior-data fitted networks (PFNs) do just that!

The PFN idea is to use a prior, e.g. a Bayesian neural network (BNN) prior, sample datasets from that prior, and then train to predict the hold-out labels of these datasets (no training on real-world data). 2/n
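To make the sampling step concrete, here is a minimal sketch of the data-generating side (a toy linear-Gaussian prior standing in as a placeholder for a real BNN or GP prior; the transformer training itself is omitted):

```python
import numpy as np

def sample_dataset_from_prior(rng, n_points=30, n_features=4):
    # Draw one random "task" from the prior: here a linear function
    # with Gaussian noise stands in for a richer BNN/GP prior.
    w = rng.normal(size=n_features)
    X = rng.normal(size=(n_points, n_features))
    y = X @ w + 0.1 * rng.normal(size=n_points)
    return X, y

def make_training_example(rng, n_train=20):
    # Split each sampled dataset into a context part the network
    # conditions on and hold-out points whose labels are the targets.
    X, y = sample_dataset_from_prior(rng)
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

rng = np.random.default_rng(0)
(ctx_X, ctx_y), (qry_X, qry_y) = make_training_example(rng)
```

Repeating this over millions of sampled datasets gives the synthetic training stream; real-world data only ever appears at inference time.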

08.07.2025 20:03 — 👍 2    🔁 0    💬 1    📌 0

Compute is increasing much faster than data. How can we improve classical supervised learning (the tech underlying most of GenAI) in the long term?

Our ICML position paper's answer: simply train on a bunch of artificial data (noise) and only do inference on real-world data! 1/n

08.07.2025 20:03 — 👍 9    🔁 3    💬 1    📌 0
Foundation Models for Structured Data

I am so proud to co-organize the workshop on foundation models for structured data at ICML. At this workshop, we will discuss how to further extend the GenAI revolution to tabular data, time-series forecasting, etc. Consider submitting your work by May 19!
icml-structured-fm-workshop.github.io

28.04.2025 17:16 — 👍 4    🔁 0    💬 0    📌 0

Could it be that @fchollet.bsky.social is not Francois Chollet?? They have a lot of ML followers 😅

25.04.2025 21:25 — 👍 1    🔁 0    💬 2    📌 0

To then change it? As in "overhaul"?

15.04.2025 13:57 — 👍 0    🔁 0    💬 0    📌 0
GitHub - SamuelGabriel/LMARENA-GAMING

Find my full write-up (including scenarios with bad actors, as well as the prompts used) plus the game here: github.com/SamuelGabrie...
You think my single-person experiment is not to be trusted? You are right, so try it yourself!

24.02.2025 13:17 — 👍 1    🔁 0    💬 0    📌 0

In combination with the large employee numbers at top AI labs, the small numbers of votes on lmarena lead me to the conclusion that lmarena scores are probably dominated by biased votes.

24.02.2025 13:17 — 👍 0    🔁 0    💬 1    📌 0

In hard mode I attributed 13/20 completely correctly, much higher than the ~3.3 expected from random guessing.
That is, I could identify all 3 models correctly in 13/20 cases after practicing with 20 questions.
That means attributing responses to LLMs is super easy for humans.
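As a sanity check on the 3.3 figure: with three models per round there are 3! = 6 possible assignments, so pure guessing matches all three with probability 1/6. A short sketch of the expectation and of how unlikely 13/20 would be by chance:

```python
from math import comb

# Matching all three responses to the right model by pure chance
# succeeds with probability 1/3! = 1/6 per round.
p = 1 / 6
n = 20
expected = n * p  # ≈ 3.33 fully correct rounds expected from guessing

# Binomial tail: probability of 13 or more fully correct rounds
# out of 20 under pure guessing.
p_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(13, n + 1))
```

The tail probability comes out vanishingly small, so 13/20 is far beyond what guessing explains.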

24.02.2025 13:17 — 👍 0    🔁 0    💬 1    📌 0

I first played easy mode (see below), where I got two answers from each model and needed to match them.
I used 20 interactions in easy mode to learn the models' behaviors.
In hard mode (see prev post), you need to match three responses to the LLM names.

24.02.2025 13:17 — 👍 0    🔁 0    💬 1    📌 0

Second, employees are very likely able to tell models apart based on their gut feeling.
To figure out if this is the case, I created a game with two modes.
The game is about identifying which answer was provided by which LLM.

24.02.2025 13:17 — 👍 0    🔁 0    💬 1    📌 0

First, AI labs have enough employees to bias the benchmarks.
E.g. Grok 3 only has 10K votes and there are 2.7M votes in total on lmarena.
If half of e.g. OpenAI (2,000 employees) voted just once a day, they would make up > 10% of all 2.7M lmarena votes over its one-year existence.
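The arithmetic behind the >10% claim, using the numbers from the post:

```python
total_votes = 2_700_000           # total lmarena votes over ~1 year
voters = 2_000 // 2               # half of a 2,000-employee lab
votes_cast = voters * 365         # one vote per voter per day for a year
share = votes_cast / total_votes  # fraction of all lmarena votes
```

That works out to roughly 13.5% of all votes from a single lab.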

24.02.2025 13:17 — 👍 0    🔁 0    💬 1    📌 0
Chatbot Arena (formerly LMSYS): Free AI Chat to Compare & Test Best AI Chatbots

I believe lmarena.ai scores are not to be trusted, as the people voting are likely to come from the AI labs in the leaderboard and push their own models unintentionally. A thread 🧵

24.02.2025 13:17 — 👍 1    🔁 1    💬 1    📌 0

How wrong do you think the lmarena scores are? Grok must be very easy to distinguish from other models in a blind evaluation.

23.02.2025 10:04 — 👍 0    🔁 0    💬 0    📌 0

seems to beat boosting there, too, but prob a bit early to make definitive statements

07.02.2025 08:43 — 👍 1    🔁 0    💬 0    📌 0

TabPFN for Chemistry, check it out

07.02.2025 08:42 — 👍 3    🔁 0    💬 1    📌 0

What did you think was interesting? The interview had such bad timing, a few days before the r1 launch

26.01.2025 07:12 — 👍 0    🔁 0    💬 1    📌 0

MiniMax-01 takeaways

- 7 of 8 layers are linear attention
- implemented a flash variant of linear attention + ring attention
- post-norm is back in large models! (using DeepNorm)
- probably wrong scaling laws, as the LR schedule is not adapted (see Chinchilla)

Let's see how it fares in the arena!
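For context, a minimal non-causal linear-attention sketch (with ReLU as a placeholder feature map; not MiniMax's actual implementation): computing φ(K)ᵀV first makes the cost linear rather than quadratic in sequence length.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    # Simple non-negative feature map; practical variants use
    # e.g. elu(x) + 1 instead of ReLU.
    phi = lambda x: np.maximum(x, 0.0)
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V            # (d, d_v): aggregate keys/values once
    Z = Qp @ Kp.sum(axis=0)  # (n,): normalizer per query
    return (Qp @ KV) / (Z[:, None] + eps)

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 4)), rng.normal(size=(8, 4)),
           rng.normal(size=(8, 5)))
out = linear_attention(Q, K, V)
```

Because the (d, d_v) summary is independent of sequence position, the same trick composes with ring-attention-style sharding across devices.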

15.01.2025 10:21 — 👍 0    🔁 0    💬 0    📌 0
GitHub - robintibor/R-tabpfn

We currently have an R implementation under development. See here: github.com/robintibor/R...

11.01.2025 14:10 — 👍 1    🔁 0    💬 1    📌 0

Thank you :) So far, we only open-source the model itself and how to use it. We do not open-source exactly how to train it, sorry for that :| There is a company starting based on the model, thus it is kind of its moat.

09.01.2025 11:53 — 👍 0    🔁 0    💬 1    📌 0

Pretrained models for tabular data (TabPFN) could be the new state of the art for regression and classification. 🙄 It will have to be tested. The GitHub link is at the end of the thread. This one in particular is huge, and if it behaves as they claim, it is a great leap forward for the state of the art in the field.

09.01.2025 09:59 — 👍 4    🔁 1    💬 0    📌 0

Groundbreaking work, congrats to the team!! 🎉 When I started my PhD 3 years ago, our tabular benchmark showed tree-based models miles ahead of neural networks. On the same benchmark, TabPFN v2 now reaches in 10s what CatBoost achieves in 4h of tuning 🤯

09.01.2025 09:21 — 👍 4    🔁 1    💬 1    📌 0

The tabular foundation model TabPFN v2 is finally public 🎉🥳
This is excellent news for (small) tabular ML! Check out our Nature article (nature.com/articles/s41...) and code (github.com/PriorLabs/Ta...)

09.01.2025 08:33 — 👍 11    🔁 1    💬 0    📌 0
GitHub - PriorLabs/TabPFN: ⚡ TabPFN: Foundation Model for Tabular Data ⚡

Thanks to all of my collaborators, without you this would not have been possible. 🙏🏼🙏🏼 If you want to use our model, check out the following:

Nature article: nature.com/articles/s41...
Try on free cloud: github.com/PriorLabs/ta...
Try locally (gpu recommended): github.com/PriorLabs/Ta...

08.01.2025 18:00 — 👍 4    🔁 0    💬 0    📌 0

The new TabPFN even outperforms AutoGluon, a tool that combines the best already-existing methods (e.g. boosted trees and random forests). See plot (c) for classification results and (d) for regression results.

08.01.2025 18:00 — 👍 3    🔁 0    💬 1    📌 0