thanks for this -- it's a very useful resource. it occupies a slightly different niche than i'm looking for, though. what if you want to train on CIFAR-10, ImageNet 64x64, etc., so the latent space isn't needed? are there any standard UNet implementations around in jax, like huggingface diffusers?
18.02.2025 14:18
PS: i was on Vikas's team at brain back in the day, so nice to interact with you @froystig.bsky.social! (7/7)
05.02.2025 15:15
but certainly interested in tpus too.
anyways, thanks for listening! i really think it would help the ai + scientific computing community to have models implemented. jax is just so much easier for almost all tasks except actually using SoTA models, in my opinion. (6/n)
05.02.2025 15:15
note that these comments on graph networks also apply to applications of generative models to molecular sciences, which is now a huge area.
i mostly use gpus, but have played with tpus through google research cloud. i'd probably prioritize gpus (they're mostly what's available in academia) (5/n)
05.02.2025 15:14
applications of diffusion-based techniques to particle systems in statistical physics and quantum mechanics, where graph networks are very natural. jraph is not nearly as mature as pytorch geometric, which makes it hard to use jax. otherwise, jax is clearly much better for the entire workflow. (4/n)
05.02.2025 15:13
this meant we couldn't really use jax even for cifar-10, and had to switch to pytorch. personally i release all my code in jax (nmboffi.github.io/code/) so this was a big disappointment to me.
another example is graph neural networks. there's jraph, but it's very "roll your own". i work on (3/n)
05.02.2025 15:12
or the huggingface diffusers implementation of a u-net. these are often used in large-scale experiments by people at meta, in some of yang song's works, etc. when i was playing with the jax implementation of these models by huggingface last year, the same model parameters would perform worse. (2/n)
05.02.2025 15:11
i work mostly on generative models and scientific applications right now, so for me, good implementations of things like u-nets, transformers, etc. would help a lot. for example, if you want to train a diffusion model in pytorch, there's the lucidrains implementation, (1/n)
05.02.2025 15:10
i absolutely love jax, but an issue that often comes up in practice is the lack of available pre-implemented models (SoTA UNets, transformers, etc).
is there any plan for google to release a package with model implementations? their absence seems to be the dominant issue for scaling jax in research
05.02.2025 00:49
it's a bummer because i much prefer jax, but writing your own everything is not always a viable option
03.02.2025 13:48
what's the status on using jax for image experiments (e.g. with diffusion models)? it seems like standard packages like huggingface diffusers have much less robust jax implementations of the same neural networks than their pytorch counterparts?
03.02.2025 13:48
very nice post!
01.02.2025 15:37
here the l_2 norm picks up a \sqrt{d} dependence while the infinity norm does not. this is where choosing the mirror map to be the entropy comes into play, recovering exponential weights methods
29.01.2025 13:16
the point is that it doesn't change the 1/t convergence rate of gradient descent but it can change the dimensionality dependence. the canonical example is if you take a problem where the gradients tend to be the same in each component (such as problems over the probability simplex)
29.01.2025 13:16
isn't the basic idea that you want the bregman divergence to be strongly convex with respect to a norm \Vert\cdot\Vert such that the gradients are bounded in the corresponding dual norm? see below for a slide from my phd thesis -- this is covered in the cited textbook by Nemirovsky and Yudin
29.01.2025 13:14
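to make the thread above concrete, here's a minimal sketch of mirror descent with the entropy mirror map, i.e. exponentiated gradient / exponential weights, over the probability simplex. the linear loss and step size are hypothetical, purely for illustration:

```python
import math

def exponentiated_gradient(grad, x0, eta, n_steps):
    """Mirror descent on the simplex with the (negative) entropy mirror map.

    The update x_{t+1, i} ∝ x_{t, i} * exp(-eta * g_i) is the proximal
    step measured in KL divergence. Since negative entropy is strongly
    convex w.r.t. the l1 norm, the resulting bound depends on the l∞
    norm of the gradients rather than the l2 norm.
    """
    x = list(x0)
    for _ in range(n_steps):
        g = grad(x)
        # multiplicative update, then renormalize back onto the simplex
        w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, g)]
        z = sum(w)
        x = [wi / z for wi in w]
    return x

# hypothetical example: minimize <c, x> over the simplex; the minimizer
# concentrates all mass on the smallest component of c.
c = [0.9, 0.1, 0.5]
x = exponentiated_gradient(lambda x: c, [1 / 3, 1 / 3, 1 / 3],
                           eta=0.5, n_steps=200)
```

the l∞-vs-l2 distinction in the dual norm is exactly where the \sqrt{d} savings mentioned above comes from when the gradients are roughly uniform across components.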
Excited to receive the NSF CAREER Award. Our group is looking for PhD students and postdocs interested in Nonlinear Dynamics and the Physics of Living Systems, with support from NSF, HFSP, and other sources. We'd appreciate your help in spreading the word! @ucsdphysci.bsky.social
16.01.2025 21:08
how do these techniques, which approach the HJB equation directly, compare to more traditional RL algorithms? can they be used in RL pipelines, such as for problems in robotics? can RL algorithms be flipped on their head and used to solve classes of high-d PDEs? (3/n)
16.01.2025 14:07
we know high-d HJB equations characterize solutions to optimal control problems, and this is precisely what RL aims to solve. so, RL must be implicitly approximating the solution to a high-dimensional HJB equation. (2/n)
16.01.2025 14:06
in the spirit of using this platform for scientific discussion, i'll post a question i've been wondering about that may or may not be very well formulated
techniques like the above method can be used, in principle, to solve high-dimensional HJB equations (1/n)
16.01.2025 14:05
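for reference, one standard form of the equation in question (a sketch in my own notation, not taken from the paper): for controlled dynamics $dX_t = b(X_t, u_t)\,dt + \sigma\,dW_t$ with cost $\mathbb{E}\big[\int_t^T \ell(X_s, u_s)\,ds + g(X_T)\big]$, the value function $V$ solves

```latex
\partial_t V(x, t)
  + \min_{u} \Big\{ b(x, u) \cdot \nabla_x V(x, t) + \ell(x, u) \Big\}
  + \tfrac{1}{2} \sigma^2 \Delta_x V(x, t) = 0,
\qquad V(x, T) = g(x).
```

the min over controls is the nonlinearity that makes this hard in high dimension, and the control $u^*(x, t)$ attaining the min is exactly what an RL policy approximates.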
Deep Picard Iteration for High-Dimensional Nonlinear PDEs
We present the Deep Picard Iteration (DPI) method, a new deep learning approach for solving high-dimensional partial differential equations (PDEs). The core innovation of DPI lies in its use of Picard...
some really nice new work by @jiequnh.bsky.social and collaborators arxiv.org/abs/2409.08526
of course i'm super biased, but i think that figuring out how to solve high-dimensional scientific computing problems with ML has potentially very high impact. i'd love to see more work like this
16.01.2025 14:03
mark is a hero -- extremely kind, unpretentious individual too.
15.01.2025 18:44
this was what i thought as well; morally speaking you could say learning surrogate models for the "ground-truth" MD sampler?
i've read your recent ITO papers to try to learn more about this: what's the right assumption on the data? do we have it, or do we need to sample given U but with no data?
10.01.2025 13:50
what are the most important open problems in molecular simulation right now that stand to benefit from ML-based methods? any good reviews or references to get up to speed rapidly? @gcorso.bsky.social @hannes-stark.bsky.social?
10.01.2025 13:07
i'd guess newton's method for root-finding might be a good choice here, as the convergence theory relies on bounds on the derivatives, though of course the convergence conditions are pretty non-prescriptive so you'll just have to try it. could also combine with an initial bisection warm start.
09.01.2025 14:26
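a minimal sketch of the bisection-then-Newton hybrid suggested above (the function, derivative, and bracket here are hypothetical, just to show the shape of the scheme):

```python
def bisect_then_newton(f, df, a, b, n_bisect=10, tol=1e-12, max_newton=50):
    """Hybrid 1-d root finder: a few bisection steps to shrink the
    bracket into a region where Newton's quadratic convergence kicks
    in, then Newton iterations from the bracket midpoint.

    Assumes f is continuous with f(a) * f(b) < 0.
    """
    fa = f(a)
    for _ in range(n_bisect):
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:
            b = m            # root lies in [a, m]
        else:
            a, fa = m, fm    # root lies in [m, b]
    x = 0.5 * (a + b)
    for _ in range(max_newton):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / df(x)
    return x

# hypothetical example: the positive root of x^2 - 2
root = bisect_then_newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 0.0, 2.0)
```

the warm start matters because Newton's local convergence guarantee only holds once the iterate is close enough (in a derivative-dependent sense) to the root, which is exactly the non-prescriptive condition mentioned above.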
your thesis was very far ahead of its time, much deserved :)
08.01.2025 15:14
wow, at least a couple of hours per day! that's quite impressive.
i've recently tried to initiate a 1.5 hour per day reading block at night, because the benefits do seem hard to ignore. but there is always a psychological pull back to just writing more papers or code...
08.01.2025 13:25
there's a lot of interest right now in using diffusion and flow matching models for sampling (i.e., no data but access to the energy). is anyone aware of works using diffusion models or score-based approaches for data assimilation? seems like it could be a natural evolution of kalman filter ideas.
07.01.2025 13:22
i guess at the end of the day what i'm describing is basically a classic exploitation-exploration tradeoff, and the optimal solution is probably some kind of directed reading with a bit of randomness sprinkled in.
06.01.2025 13:08