that's a wrap!
you can find the code here: github.com/maxxxzdn/erwin
and here is a cute visualisation of Ball Tree building:
To scale Erwin further to gigantic industrial datasets, we combine Transolver and Erwin by applying ball tree attention over latent tokens.
The key insight is that by using Erwin, we can afford larger bottleneck sizes while maintaining efficiency.
Stay tuned for updates!
To improve the processing of long-range information while keeping the cost sub-quadratic, we combine the ball tree with Native Sparse Attention (NSA); the two align naturally with each other.
Accepted to the Long-Context Foundation Models workshop at ICML 2025!
paper: arxiv.org/abs/2506.12541
There are many possible directions for building upon this work. I will highlight two that I have been working on together with my students.
On top of the initial experiments in the first version of the paper, we include two more PDE-related benchmarks, where Erwin shows strong performance, achieving state-of-the-art on multiple tasks.
Original experiments: cosmology, molecular dynamics and turbulent fluid dynamics (EAGLE).
The ball tree representation comes with a huge advantage: a contiguous memory layout, which makes all operations extremely simple and efficient.
Implemented naively, however, the model will not be able to exchange information between partitions and will thus be unable to capture global interactions.
To overcome this, we adapt the shifted-window idea from the Swin Transformer: instead of shifting windows, we rotate trees.
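In spirit, successive attention layers can alternate between the balls of the original tree and those of a tree built on rotated coordinates. A minimal PyTorch sketch of that idea (the permutation tensors and the layer structure are illustrative placeholders, not the repo's API):

```python
import torch

def alternating_ball_attention(x, layers, perm_main, perm_rot):
    """Sketch: even layers attend within balls of the original tree, odd
    layers within balls of a tree built on rotated coordinates, so that
    information can cross partition boundaries."""
    inv_main, inv_rot = torch.argsort(perm_main), torch.argsort(perm_rot)
    for i, ball_attention in enumerate(layers):
        perm, inv = (perm_main, inv_main) if i % 2 == 0 else (perm_rot, inv_rot)
        x = x[perm]            # reorder into this tree's contiguous ball layout
        x = ball_attention(x)  # local attention within that tree's balls
        x = x[inv]             # restore the original point order
    return x
```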
For that reason, we impose a regular structure onto irregular data via ball trees.
That allows us to restrict attention computation to local partitions, reducing the cost to linear.
Erwin follows the second approach, as we believe the model should work at full resolution and in the original representation for as long as possible: let the data guide the model in deciding which information to use and which to discard.
In the context of modeling large physical systems, this is critical, as the largest applications can include millions of points, which makes full attention unrealistic.
There are two ways: either change the data representation or the data structure.
When it comes to irregular data, however, there is no natural ordering of points, so standard sparse attention mechanisms break down: there is no guarantee that the locality they rely on still exists.
We start with sparse (sub-quadratic) attention, which comes in different flavors but always follows the same idea: exploiting the regular structure of the data to model interactions between tokens.
New blog post!
I write about our recent work on using hierarchical trees to enable sparse attention over irregular data (point clouds, meshes) - Erwin Transformer, accepted to ICML 2025
blog: maxxxzdn.github.io/blog/erwin/
paper: arxiv.org/abs/2502.17019
Compressed version in the thread below:
And that is a wrap!
We believe models like Erwin will enable the application of deep learning to physical tasks involving large particle systems, where runtime was previously a bottleneck.
for details, see the preprint: arxiv.org/abs/2502.17019
Bonus: to minimize the computational overhead of ball tree construction, we develop a fast, parallelized implementation in C++.
15/N
On the large-scale EAGLE benchmark (1M meshes, each with ~3500 nodes), we achieve SOTA performance in simulating unsteady fluid dynamics.
14/N
On the molecular dynamics task, Erwin pushes the Pareto frontier of performance vs. runtime, proving to be a viable alternative to message-passing-based architectures.
13/N
The cosmological data exhibits long-range dependencies. As each point cloud is relatively large (5,000 points), this poses a challenge for message-passing models. Erwin, on the other hand, is able to capture those effects.
12/N
We validated Erwin's performance on a variety of large-scale tasks, including:
- cosmology (5k nodes per data point)
- molecular dynamics (~1k nodes per data point)
- turbulent fluid dynamics (~3.5k nodes per data point)
11/N
Due to the simplicity of implementation, Erwin is blazing fast. For a batch of 16 point clouds, 4096 points each, it only takes ~30 ms to compute the forward pass!
10/N
The ball tree is stored in memory contiguously - at each level of the tree, points in the same ball are stored next to each other.
This property is critical and allows us to implement the key operations described above simply via .view() or .mean()!
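As a small illustration (the sizes and channel counts below are made up), grouping points into balls and pooling them are just reshapes once the ball-tree permutation has been applied:

```python
import torch

x = torch.randn(4096, 64)       # point features, already in ball-tree order

balls  = x.view(-1, 16, 64)     # group into balls of 16 for local attention
pooled = balls.mean(dim=1)      # coarsen: one token per ball, no gather/scatter
back   = pooled.repeat_interleave(16, dim=0)  # broadcast coarse tokens to points
```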
9/N
As Ball Tree Attention is local, we progressively coarsen and then refine the ball tree while keeping the ball size for attention fixed, following a U-Net-like architecture.
This allows us to learn multi-scale features and effectively increase the model's receptive field.
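A rough sketch of that coarsen/attend/refine pattern (function names and the plain mean pooling are my simplifications; the paper's learned pooling and channel changes are omitted):

```python
import torch

def coarsen(x, ball_size):
    # merge each ball into a single coarse token (mean pooling for simplicity)
    n, c = x.shape
    return x.view(n // ball_size, ball_size, c).mean(dim=1)

def refine(x_coarse, skip, ball_size):
    # broadcast coarse tokens back to their balls and add the encoder skip
    return skip + x_coarse.repeat_interleave(ball_size, dim=0)

def unet_forward(x, down_layers, up_layers, ball_size=16):
    skips = []
    for ball_attention in down_layers:       # encoder: attend locally, then pool
        x = ball_attention(x)
        skips.append(x)
        x = coarsen(x, ball_size)
    for ball_attention in up_layers:         # decoder: unpool, then attend again
        x = refine(x, skips.pop(), ball_size)
        x = ball_attention(x)
    return x
```

Because the attention ball size stays fixed while the token count shrinks, each pooling step lets the same local attention span a larger region of the original point cloud.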
8/N
The main idea of the paper is to compute attention within the ball tree partitions.
Once the tree is built, one can choose the level of the tree and compute attention (Ball Tree Attention, BTA) within the balls in parallel.
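In code, the core of the idea is one reshape followed by batched attention. A simplified sketch (the class name, bare QKV projections, and missing positional encodings are my simplifications, not the repo's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BallTreeAttention(nn.Module):
    """Sketch of Ball Tree Attention: multi-head self-attention restricted
    to balls. Expects x of shape (N, C) already permuted into ball-tree
    order, so each consecutive chunk of `ball_size` points is one ball."""

    def __init__(self, dim, ball_size, num_heads=8):
        super().__init__()
        self.ball_size, self.num_heads = ball_size, num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        N, C = x.shape
        B, H = N // self.ball_size, self.num_heads
        # one view() turns the flat point list into independent balls
        q, k, v = (
            self.qkv(x)
            .view(B, self.ball_size, 3, H, C // H)
            .permute(2, 0, 3, 1, 4)   # (3, balls, heads, ball_size, head_dim)
        )
        out = F.scaled_dot_product_attention(q, k, v)  # attention per ball, in parallel
        return self.proj(out.transpose(1, 2).reshape(N, C))
```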
7/N
Erwin organizes the computation via a ball tree - a hierarchical data structure that recursively partitions points into nested sets of similar size, where each set is represented by a ball that covers all the points in the set.
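A toy version of such a construction, splitting recursively along the axis of largest spread (an illustration only; the paper uses a fast parallelized C++ builder):

```python
import torch

def ball_tree_permutation(points: torch.Tensor) -> torch.Tensor:
    """Toy ball-tree builder: recursively split the point set in half along
    the axis of largest spread and return a permutation such that, at every
    level, each ball occupies a contiguous slice. Assumes the number of
    points is a power of two (pad otherwise)."""
    def split(idx):
        if idx.numel() <= 1:
            return idx
        pts = points[idx]
        axis = (pts.max(0).values - pts.min(0).values).argmax()  # widest dimension
        order = idx[pts[:, axis].argsort()]                      # sort along it
        mid = idx.numel() // 2
        return torch.cat([split(order[:mid]), split(order[mid:])])
    return split(torch.arange(len(points)))

# usage: permute once, then every ball of size 2^k is a consecutive chunk
points = torch.rand(4096, 3)
perm = ball_tree_permutation(points)
balls = points[perm].view(-1, 16, 3)   # 256 balls of 16 points each
```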
6/N
In this paper, we combine the scalability of tree-based algorithms and the efficiency of attention to build a transformer that:
- scales linearly with the number of points;
- captures global dependencies in data through hierarchical learning across multiple scales.
5/N
While highly successful in numerical simulations, tree-based algorithms synergize poorly with GPUs.
On the other hand, attention is highly hardware-optimized, yet it suffers from the very problem tree-based algorithms solve: the quadratic cost of all-to-all computation.
4/N
The problem was studied extensively in the 1980s in many-body physics.
To avoid the brute-force computation, tree-based algorithms were introduced that organize particles hierarchically, so that the resolution at which interactions are computed depends on the distance between them.
3/N
Physical systems with complex geometries are often represented using irregular meshes or point clouds.
These systems often exhibit long-range effects that are computationally challenging to model, as calculating all-to-all interactions scales quadratically with node count.
2/N