Actual problems like AI in space?
www.spacex.com/updates#xai-...
@antonbaumann.bsky.social
Excited to share our new work on Self-Distillation Policy Optimization (SDPO)!
SDPO is a simple algorithm that turns textual feedback into logit-level learning signals, enabling sample-efficient RL from runtime errors, LLM judges, and even binary feedback.
Preprint: arxiv.org/abs/2601.20802
SDPO enables RL agents to learn from rich feedback: not only whether an attempt failed, but why it failed (e.g., error messages). Even without such rich feedback, SDPO can reflect on past attempts and outperform GRPO. SDPO also accelerates solution discovery at test time!
30.01.2026 07:17

Training LLMs with verifiable rewards provides a single bit of signal per generated response. This hides why the model failed.
Today, we introduce a simple algorithm that lets the model learn from any rich feedback and turns it into dense supervision!
(1/n)
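To make the "dense supervision" idea concrete, here is a hypothetical toy sketch of one way textual feedback could become a logit-level signal: the model's own distribution, re-computed with the feedback (e.g., a runtime error message) in context, serves as a teacher that is distilled into the plain policy via a per-token KL loss. This setup, and all names in it, are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Toy logits over a 3-token vocabulary at one position.
# teacher: the model's distribution AFTER conditioning on the rich
# feedback (hypothetical; stands in for the feedback-informed pass).
teacher_logits = np.array([2.0, 0.5, -1.0])
# student: the plain policy, without the feedback in context.
student_logits = np.array([0.5, 1.5, 0.0])

teacher = softmax(teacher_logits)
student = softmax(student_logits)

# Dense, logit-level training signal: a per-position KL term,
# instead of a single 1-bit pass/fail reward for the whole response.
loss = kl(teacher, student)
```

In a real implementation this KL term would be summed over token positions and minimized with respect to the student's parameters; the toy above only shows the shape of the signal.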
This has now been accepted at @iclr-conf.bsky.social!
26.01.2026 15:52

It's really hard to tell nowadays what is a made-up joke and what is reality.
16.01.2026 09:41

The Nobel Prize committee should announce the World Cup winner tomorrow
06.12.2025 04:29

Super interesting! Will the talk be recorded, or will the slides be available afterward?
06.12.2025 16:09

I am hiring a PhD student and a postdoc to work with me at KTH on probabilistic machine learning. Both positions are fully funded and part of WASP.
I will be attending @euripsconf.bsky.social. If you are around and want to talk about the positions or about what we do at KTH, ping me and we can meet.
Want to work on Trustworthy AI?
I'm seeking exceptional candidates to apply for the Digital Futures Postdoctoral Fellowship to work with me on Uncertainty Quantification, Bayesian Deep Learning, and Reliability of ML Systems.
The position will be co-advised by Hossein Azizpour or Henrik Boström.
Unfortunately, our #NeurIPS submission didn't go through with scores of (5, 4, 4, 3). But because I think it's an excellent paper, I decided to share it anyway.
We show how to efficiently apply Bayesian learning in VLMs, improve calibration, and do active learning. Cool stuff!
Preprint: arxiv.org/abs/2412.06014
I'm very excited to share notes on Probabilistic AI that I have been writing with @arkrause.bsky.social!
arxiv.org/pdf/2502.05244
These notes aim to give a graduate-level introduction to probabilistic ML + sequential decision-making.
I'm super glad to be able to share them with all of you now!
Tomorrow I'll be presenting our recent work on improving LLMs via local transductive learning at the FITML workshop at NeurIPS.
Join us for our oral at 10:30am in East Exhibition Hall A.
Joint work with my fantastic collaborators Sascha Bongni, @idoh.bsky.social, @arkrause.bsky.social
I will present two BDU workshop papers @ NeurIPS: one by Rui Li (looking for internships) and one by Anton Baumann.
Links to extended versions:
1. "How can we make predictions in BDL efficiently?" arxiv.org/abs/2411.18425
2. "How can we do probabilistic active learning in VLMs?" arxiv.org/abs/2412.06014