I'm in Oslo this week for WATOC, hit me up if you wanna have a coffee ☕
22.06.2025 20:22 — 👍 1 🔁 0 💬 1 📌 0
9/ Check it yourself:
🔗: github.com/khaledkah/tv...
📄: www.arxiv.org/abs/2502.08598
Thanks to Khaled, Winnie, Oliver, Klaus, and Shin for the cool collaboration as well as @bifold.berlin, TU Berlin, RIKEN, and DeepMind
8/ Takeaway: Exploding TV isn’t needed. Control TV + SNR separately for faster, better sampling. Method generalizes across domains (molecules, images).
12.03.2025 15:55 — 👍 1 🔁 0 💬 1 📌 0
7/ Why does it work? Our empirical analysis shows:
1. Straight trajectories near data (t ≈ 0) are important (see the inset plot)
2. Broad support of pₜ(𝐱) early on → robust to errors (note how SMLD goes from a small to a huge range instead of staying constant)
6/ Images: Matches EDM with a uniform grid
No fancy time grid like EDM's needed! VP-ISSNR on CIFAR-10/FFHQ ≈ EDM, but with fewer hyperparameters!
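For context (my own sketch, not from the thread): the "fancy" grid is the ρ-warped sigma spacing from Karras et al. (2022), while the uniform grid is just a linspace. Quick comparison:

```python
import numpy as np

def edm_time_grid(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Karras et al. (2022) rho-warped sigma grid, densely spaced near sigma_min."""
    i = np.arange(n)
    return (sigma_max**(1/rho) + i/(n-1) * (sigma_min**(1/rho) - sigma_max**(1/rho)))**rho

uniform_grid = np.linspace(1.0, 0.0, 9)  # the plain 8-step grid that suffices here
print(edm_time_grid(9))                  # strongly non-uniform by comparison
```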
5/ Molecules in 8 Steps:
VP-ISSNR achieves 74% stability with 8 steps, 95% with 64 (SDE). Beats all baselines!
4/ We propose a new VP schedule 📈:
Exponential inverse sigmoid SNR (ISSNR) → rapid decay at start/end. Generalizes Optimal Transport Flow Matching.
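A minimal numpy sketch of the idea, assuming the usual x_t = α·x₀ + σ·ε convention with unit data variance; the functional form below is my illustrative guess at an "inverse sigmoid" log-SNR, not necessarily the paper's exact parameterization:

```python
import numpy as np

def issnr(t, scale=2.0, shift=0.0, eps=1e-6):
    """Illustrative 'inverse sigmoid' SNR: log-SNR linear in logit(1 - t).

    SNR -> infinity as t -> 0 (clean data) and SNR -> 0 as t -> 1 (pure noise),
    decaying rapidly near both endpoints. Hypothetical form, may differ from the paper.
    """
    t = np.clip(t, eps, 1.0 - eps)
    return np.exp(scale * np.log((1.0 - t) / t) + shift)

def vp_coefficients(t):
    """VP schedule for x_t = alpha * x0 + sigma * eps with alpha^2 + sigma^2 = 1."""
    snr = issnr(t)
    alpha = np.sqrt(snr / (1.0 + snr))  # signal scale
    sigma = np.sqrt(1.0 / (1.0 + snr))  # noise scale
    return alpha, sigma

t = np.linspace(0.0, 1.0, 9)                  # a uniform 8-step grid
alpha, sigma = vp_coefficients(t)
assert np.allclose(alpha**2 + sigma**2, 1.0)  # total variance stays constant
```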
3/ VP variants improve existing schedules:
Take SMLD/EDM (exploding TV) → force TV=1. Result: +30% stability for molecules with 8 steps
(x-axis is NFE=number of function evals).
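"Forcing TV = 1" on a VE schedule is just a rescaling that leaves the SNR untouched; a hedged sketch (my own minimal version, assuming unit data variance and x_t = α·x₀ + σ·ε):

```python
import numpy as np

def ve_to_vp(sigma_ve):
    """Rescale a variance-exploding schedule (alpha = 1, sigma = sigma_ve) so that
    total variance alpha^2 + sigma^2 = 1 while SNR = alpha^2 / sigma^2 is unchanged."""
    norm = np.sqrt(1.0 + sigma_ve**2)
    return 1.0 / norm, sigma_ve / norm

sigma_ve = np.geomspace(0.002, 80.0, 9)  # e.g. an EDM-style sigma range
alpha, sigma = ve_to_vp(sigma_ve)
assert np.allclose(alpha**2 + sigma**2, 1.0)                  # TV forced to 1
assert np.allclose(alpha**2 / sigma**2, 1.0 / sigma_ve**2)    # SNR preserved
```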
2/ Most schedules (like EDM (Karras et al.) or SMLD (Song & Ermon)) let TV explode (VE = variance exploding).
We show constant TV (variance preserving, VP) + optimized SNR works better (ISSNR)!
(it's a wild table, sorry, but notice our VP variants I circled)
1/ Problem: Diffusion models are slow due to repeated network evaluations, but reducing the number of steps hurts quality if the noise schedule isn't optimal. Other schedules adjust total variance only passively. Can we do better?
🔑Insight: control Total Variance (TV) and signal-to-noise-ratio (SNR) independently!
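Spelling the insight out: with x_t = α_t·x₀ + σ_t·ε and unit data variance, TV(t) = α_t² + σ_t² and SNR(t) = α_t²/σ_t², so any (TV, SNR) pair determines (α_t, σ_t). A tiny sketch of that disentanglement (my own illustration):

```python
import numpy as np

def coeffs_from_tv_snr(tv, snr):
    """Solve alpha^2 + sigma^2 = tv and alpha^2 / sigma^2 = snr for (alpha, sigma).

    Any total-variance curve can be paired with any SNR curve: TV and SNR
    become independent design choices instead of byproducts of one schedule.
    """
    alpha = np.sqrt(tv * snr / (1.0 + snr))
    sigma = np.sqrt(tv / (1.0 + snr))
    return alpha, sigma

snr = np.array([1e4, 1.0, 1e-4])                      # high near data, low near noise
a_vp, s_vp = coeffs_from_tv_snr(1.0, snr)             # variance preserving: TV = 1
a_ve, s_ve = coeffs_from_tv_snr(1.0 + 1.0/snr, snr)   # variance exploding: alpha = 1
assert np.allclose(a_ve, 1.0)
```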
We have a new paper on diffusion!📄
Faster diffusion models with total variance/signal-to-noise ratio disentanglement! ⚡️
Our new work shows how to generate stable molecules in as few as 8 steps and match EDM's image quality with a uniform time grid. 🧵
We are already on Day 3 of the workshop "Density Functional Theory and Artificial Intelligence learning from each other" at sunny CECAM-HQ.
This afternoon, Fang Liu (Emory University) & Michael Herbst (EPFL) will present their talks.
our GPU cluster tonight after the ICML deadline
31.01.2025 17:55 — 👍 4 🔁 0 💬 0 📌 0
[Image alt text] Figure comparing automatic differentiation (AD) and automatic sparse differentiation (ASD). (a) Given a function f, AD backends return a function computing vector-Jacobian products (VJPs). (b) Standard AD computes Jacobians row-by-row by evaluating VJPs with all standard basis vectors. (c) ASD reduces the number of VJP evaluations by first detecting a sparsity pattern of non-zero values, coloring orthogonal rows in the pattern, and simultaneously evaluating VJPs of orthogonal rows. The concepts shown in this figure translate directly to forward mode, which computes Jacobians column-by-column instead of row-by-row.
You think Jacobian and Hessian matrices are prohibitively expensive to compute on your problem? Our latest preprint with @gdalle.bsky.social might change your mind!
arxiv.org/abs/2501.17737
🧵1/8
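A toy pure-numpy version of the ASD recipe from the figure (my own sketch; the real implementations are far more general): greedily color rows with disjoint sparsity patterns, evaluate one VJP per color with the summed basis vectors, then scatter the entries back.

```python
import numpy as np

def greedy_row_coloring(pattern):
    """Assign rows with pairwise-disjoint nonzero columns to the same color."""
    colors, used_cols = [], []               # used_cols[c] = column support of color c
    for row in pattern:
        support = set(np.flatnonzero(row))
        for c, cols in enumerate(used_cols):
            if cols.isdisjoint(support):     # row is 'orthogonal' to this color
                colors.append(c)
                cols |= support
                break
        else:
            colors.append(len(used_cols))    # open a new color
            used_cols.append(support)
    return np.array(colors)

def sparse_jacobian(vjp, pattern):
    """Recover a sparse Jacobian with one VJP call per color instead of one per row."""
    colors = greedy_row_coloring(pattern)
    J = np.zeros(pattern.shape)
    for c in range(colors.max() + 1):
        seed = (colors == c).astype(float)   # sum of basis vectors of this color
        compressed = vjp(seed)               # one reverse-mode sweep
        for i in np.flatnonzero(colors == c):
            cols = np.flatnonzero(pattern[i])
            J[i, cols] = compressed[cols]    # decompression: supports are disjoint
    return J

# toy example: f(x) = x**2 elementwise has a diagonal Jacobian -> 1 color, 1 VJP
x = np.array([1.0, 2.0, 3.0])
vjp = lambda v: v * 2 * x                    # v^T J for J = diag(2x)
print(sparse_jacobian(vjp, np.eye(3, dtype=bool)))  # diag(2, 4, 6)
```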
We submitted our stuff to ICML today at lunch and I am so happy about it. It's juicy, it might get rejected, but its heart is in the right place :]
31.01.2025 14:39 — 👍 3 🔁 0 💬 1 📌 0
acab includes the raclette police 💡
25.12.2024 23:44 — 👍 4 🔁 0 💬 0 📌 0
i guess their claim would be that it blows up for mysterious NN reasons rather than the integrator or the time step. 2 fs is a bit of a chonky step, i agree, but if it explodes at, say, 0.1 fs i'd start wondering about the NN more than about the time step
17.12.2024 19:11 — 👍 0 🔁 0 💬 1 📌 0
stuff like this, (Ala)_2 at 2 fs or water at 1 fs? at <=0.5 fs they wouldn't explode for curl-free forces, i assume?
17.12.2024 16:50 — 👍 0 🔁 0 💬 1 📌 0
why was that paper bad? i thought it was more of a benchmark than proposing their own thing anyways?
17.12.2024 16:14 — 👍 1 🔁 0 💬 1 📌 0
i raise you to ful midammis oml
26.11.2024 16:46 — 👍 1 🔁 0 💬 0 📌 0
A mapping of how Bluesky is becoming the new Scientific Twitter
mikeyoungacademy.dk/bluesky-is-e...
Seems like it refers to DFT. They actually give an intuition from metallurgy in the appendix about simulated annealing. I didn't know it's about removing impurities 🙉
20.11.2024 18:44 — 👍 0 🔁 0 💬 0 📌 0
Excellent question. Alas, I can't say i like mcmc (or tmcmc?) as a word either. A string of nondescript names is just ... eugh
20.11.2024 17:09 — 👍 0 🔁 0 💬 1 📌 0
Ah yes, 'annealing', i do it every day and have a super intuitive understanding of what it is. In fact, i'm annealing right now.
20.11.2024 16:23 — 👍 0 🔁 0 💬 1 📌 0
Is "simulated annealing" really the best word for what it describes?
20.11.2024 15:58 — 👍 1 🔁 0 💬 2 📌 0
So cool, have fun!
20.11.2024 11:45 — 👍 1 🔁 0 💬 1 📌 0
Tihi ^-^
20.11.2024 11:16 — 👍 0 🔁 0 💬 0 📌 0
Yesss, nice! I'm learning Sudanese Arabic. ✨ Mashallah, Yulia!
20.11.2024 10:34 — 👍 1 🔁 0 💬 1 📌 0
The Pillar of Autumn (i know, i know)
19.11.2024 22:40 — 👍 2 🔁 0 💬 0 📌 0
Week 2 actively using Bluesky
19.11.2024 16:23 — 👍 229 🔁 15 💬 6 📌 4