Morten Vassvik

@vassvik.bsky.social

Simulation and rendering nerd. Co-founder and CTO @JangaFX. Working on EmberGen and more. Discord: vassvik. Mastodon: @vassvik@mastodon.gamedev.place

1,847 Followers  |  1,667 Following  |  842 Posts  |  Joined: 30.07.2023  |  2.0874

Latest posts by vassvik.bsky.social on Bluesky

All the above on a desktop 4090.

Pretty shitty image quality though, hard to read anything xD

10.12.2025 13:43 | 👍 1    🔁 0    💬 1    📌 0

2026 will be mostly about fleshing out the remaining features and actually shipping something usable. Shouldn't be that much longer now, hopefully.

10.12.2025 13:37 | 👍 3    🔁 0    💬 1    📌 0

At the moment I'm really happy with the performance itself, which comes close to matching that of a typical dense solver despite the solver actually being sparse. Having a really fast baseline leaves a lot of room to add features that will ultimately take some of that performance away.

10.12.2025 13:37 | 👍 2    🔁 0    💬 1    📌 0
Post image

There's also an efficient downsampler, doing 32x32x32 -> 16x16x16 -> 8x8x8 -> 4x4x4 -> 2x2x2 -> 1x1x1 downsamples in one pass at high effective bandwidth.

10.12.2025 13:37 | 👍 0    🔁 0    💬 1    📌 0
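
To make the 32x32x32 -> 1x1x1 chain in the post above concrete, here is a minimal CPU reference in C of what such a chain computes. It is only a sketch under assumptions: the post does not say which filter is used, so a plain 2x2x2 box average is assumed, and the names `downsample_2x`, `build_mip_chain`, `BLOCK_DIM` and `MIP_TOTAL` are made up for illustration. It says nothing about how the actual GPU kernel fuses the levels into one pass.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK_DIM = 32, MIP_TOTAL = 4096 + 512 + 64 + 8 + 1 }; /* 16^3 + 8^3 + 4^3 + 2^3 + 1^3 */

/* Average each 2x2x2 block of an n^3 grid (n even) into an (n/2)^3 grid.
   Assumption: box filter; the original post does not name the filter. */
static inline void downsample_2x(const float *src, float *dst, size_t n)
{
    assert(n >= 2 && n % 2 == 0);
    size_t h = n / 2;
    for (size_t z = 0; z < h; ++z)
        for (size_t y = 0; y < h; ++y)
            for (size_t x = 0; x < h; ++x) {
                float sum = 0.0f;
                for (size_t dz = 0; dz < 2; ++dz)
                    for (size_t dy = 0; dy < 2; ++dy)
                        for (size_t dx = 0; dx < 2; ++dx)
                            sum += src[((2 * z + dz) * n + (2 * y + dy)) * n + (2 * x + dx)];
                dst[(z * h + y) * h + x] = sum * 0.125f;
            }
}

/* Build all coarser levels of one 32^3 block, stored back to back in mips:
   16^3 first, then 8^3, 4^3, 2^3 and finally the single 1^3 value. */
static inline void build_mip_chain(const float *block, float *mips)
{
    const float *src = block;
    float *dst = mips;
    for (size_t n = BLOCK_DIM; n > 1; n /= 2) {
        size_t h = n / 2;
        downsample_2x(src, dst, n);
        src = dst;          /* the level just written feeds the next one */
        dst += h * h * h;   /* advance past the level just written */
    }
}
```

A caller would pass a 32768-float block and a `MIP_TOTAL`-float output buffer; the last element of the output is the average of the whole block.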
Post image

There are other neat bits, like a single-pass vorticity confinement that runs at slightly above 50% bandwidth, which is alright since it combines two passes that would each probably have run at 80-90% bandwidth.

10.12.2025 13:37 | 👍 0    🔁 0    💬 1    📌 0
Post image

The other big part of the sim itself: The advection.

There's cubic interpolation on the velocity, smoke and flames components, while the fuel and temperature components are linearly interpolated.

Runs at roughly 1/6 occupancy and roughly 1/6 of peak bandwidth, which seems alright given what it does.

10.12.2025 13:37 | 👍 2    🔁 0    💬 1    📌 0
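
As a rough illustration of the difference between the two interpolation orders mentioned in the post above, here is a 1D sketch in C. The post does not say which cubic kernel the solver uses, so a Catmull-Rom spline is assumed purely as an example, and the function names `lerp1` and `catmull_rom1` are hypothetical. A 3D version would apply the same weights along each axis.

```c
/* Linear interpolation between two neighbouring samples, t in [0, 1]. */
static inline float lerp1(float p0, float p1, float t)
{
    return p0 + t * (p1 - p0);
}

/* Cubic interpolation through four neighbouring samples, t in [0, 1]
   measured from p0 towards p1. Assumption: Catmull-Rom weights; the
   solver's actual kernel is not stated in the post. */
static inline float catmull_rom1(float pm1, float p0, float p1, float p2, float t)
{
    float a = -0.5f * pm1 + 1.5f * p0 - 1.5f * p1 + 0.5f * p2;
    float b =         pm1 - 2.5f * p0 + 2.0f * p1 - 0.5f * p2;
    float c = -0.5f * pm1              + 0.5f * p1;
    return ((a * t + b) * t + c) * t + p0; /* Horner form of a*t^3 + b*t^2 + c*t + p0 */
}
```

The cubic version reads four samples per axis instead of two, so in 3D it touches 64 voxels rather than 8, which is one plausible reason to reserve it for the fields where sharpness matters most.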

The tradeoffs have been in favor of larger sims, which makes it fairly easy to push towards high effective bandwidths even with a sparse solver, which is based on a relatively simple brick map structure. For smaller simulations, especially in a realtime game setting, the tradeoffs would be different.

10.12.2025 13:37 | 👍 0    🔁 0    💬 1    📌 0
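
For readers unfamiliar with the term, a brick map in its simplest form is a coarse grid of indices pointing into a pool of small dense voxel blocks. The C sketch below shows a voxel read through such a structure. Everything in it is an assumption for illustration: the 8x8x8 brick size, the `brick_map` layout, the `EMPTY_BRICK` sentinel and the function name are hypothetical and not taken from the solver described above.

```c
#include <stddef.h>
#include <stdint.h>

enum { BRICK_DIM = 8, BRICK_VOXELS = BRICK_DIM * BRICK_DIM * BRICK_DIM, EMPTY_BRICK = -1 };

/* Hypothetical layout: a coarse grid of brick slots plus a pool of dense bricks. */
typedef struct {
    uint32_t bricks_x, bricks_y, bricks_z; /* coarse grid size, in bricks */
    const int32_t *brick_index;            /* one slot per coarse cell, EMPTY_BRICK if unallocated */
    const float *brick_data;               /* pool of bricks, BRICK_VOXELS floats each */
} brick_map;

/* Read one voxel. Unallocated bricks and out-of-range coordinates read as 0. */
static inline float brick_map_read(const brick_map *m, uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t bx = x / BRICK_DIM, by = y / BRICK_DIM, bz = z / BRICK_DIM;
    if (bx >= m->bricks_x || by >= m->bricks_y || bz >= m->bricks_z)
        return 0.0f;

    int32_t slot = m->brick_index[((size_t)bz * m->bricks_y + by) * m->bricks_x + bx];
    if (slot == EMPTY_BRICK)
        return 0.0f;

    /* Position inside the brick, then a dense lookup in the pool. */
    uint32_t lx = x % BRICK_DIM, ly = y % BRICK_DIM, lz = z % BRICK_DIM;
    size_t local = ((size_t)lz * BRICK_DIM + ly) * BRICK_DIM + lx;
    return m->brick_data[(size_t)slot * BRICK_VOXELS + local];
}
```

The appeal of this kind of layout is that work inside an allocated brick looks exactly like work on a small dense grid, which fits the observation above that a sparse solver can still reach dense-like bandwidth on large simulations.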

For the tool itself, whose main purpose will be generating assets in a semi-offline manner, the tradeoffs have been in favor of maximizing fidelity (broadly voxel count) and performance (making it run fast). The voxel resolution in this capture is roughly 50-100x larger than what you'd use in a game

10.12.2025 13:37 | 👍 0    🔁 0    💬 1    📌 0
Post image

The solver has 3 main parts:
- Advection (movement)
- Injection
- Projection (vortices)

This shows an Nsight capture of the last part, the projection.

In this case it's simulating a dense 512x256x256 grid, even though the solver is sparse. Effective bandwidth close to 90% across the board.

10.12.2025 13:37 | 👍 4    🔁 1    💬 1    📌 0

It's roughly 4 years since I started to work on the 3D sparse fluid solver for what will ultimately become EmberGen 2.0.

A quick look at the current performance state of the solver.

A previous retrospective thread from a couple years ago on that other place: x.com/vassvik/stat...

10.12.2025 13:37 | 👍 23    🔁 4    💬 1    📌 0
🐇 meshoptimizer v1.0 Mesh optimization library that makes meshes smaller and faster to render

After nine years of development, meshoptimizer has reached its first major version, 1.0!

This release focuses on improvements in clusterization and simplification as well as stabilization. Here's a release announcement with more details on past, present and future; please RT!

meshoptimizer.org/v1

08.12.2025 16:56 | 👍 243    🔁 75    💬 8    📌 1

This lets me do a 2x2 and 4x4 reduction efficiently using very few subgroup shuffles, as long as the subgroup size is 16 or bigger. With a little work and using texture filtering I can do even more work with very little effort.

08.12.2025 21:12 | 👍 1    🔁 0    💬 1    📌 0
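
The reduction described in the post above can be simulated on the CPU to see why Morton-ordered lanes make it cheap. In the C sketch below, `xor_shuffle_add` stands in for a subgroup XOR shuffle followed by an add (in GLSL that would be something like `subgroupShuffleXor`), and a 16-lane subgroup is assumed. The function names and the choice of a sum as the reduction operator are illustrative assumptions, not the author's shader code.

```c
#include <assert.h>

enum { SUBGROUP_SIZE = 16 };

/* One butterfly step: every lane adds the value held by lane ^ mask,
   mimicking a subgroup XOR shuffle followed by an add. */
static inline void xor_shuffle_add(float lanes[SUBGROUP_SIZE], unsigned mask)
{
    assert(mask > 0 && mask < SUBGROUP_SIZE);
    float shuffled[SUBGROUP_SIZE];
    for (unsigned i = 0; i < SUBGROUP_SIZE; ++i)
        shuffled[i] = lanes[i ^ mask];
    for (unsigned i = 0; i < SUBGROUP_SIZE; ++i)
        lanes[i] += shuffled[i];
}

/* With lanes in Morton order, 4 consecutive lanes form a 2x2 block, so two
   shuffles (masks 1 and 2) leave the block sum in every lane of that block. */
static inline void reduce_2x2(float lanes[SUBGROUP_SIZE])
{
    xor_shuffle_add(lanes, 1u);
    xor_shuffle_add(lanes, 2u);
}

/* Two more shuffles (masks 4 and 8) extend this to the 4x4 block spanned
   by 16 consecutive lanes. */
static inline void reduce_4x4(float lanes[SUBGROUP_SIZE])
{
    reduce_2x2(lanes);
    xor_shuffle_add(lanes, 4u);
    xor_shuffle_add(lanes, 8u);
}
```

Without the Morton ordering, the four texels of a 2x2 block would sit in lanes that are a full row apart, and a 16-lane subgroup covering one row of 16 texels could not finish even the 2x2 step on its own.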

For example, if I have a 256x1x1 workgroup I can use decode_morton2_8b to map gl_LocalInvocationID.x to values in a 16x16 grid, where contiguous groups of 4 threads span 2x2 values, and contiguous groups of 16 span 4x4 values, and so on.

08.12.2025 21:12 | 👍 1    🔁 0    💬 1    📌 0
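
The mapping in the post above can be checked with a small C sketch. The function `morton2_decode_8bit_sketch` is a hypothetical stand-in for `decode_morton2_8b`, whose real implementation only appears in the attached images, and it uses the textbook mask-and-shift bit compaction. The convention that x lives in the even bits and y in the odd bits is also an assumption.

```c
#include <stdint.h>

typedef struct { uint32_t x, y; } cell2;

/* Pack the even-indexed bits (0, 2, 4, 6) of an 8-bit value into the low 4 bits. */
static inline uint32_t compact_even_bits_8(uint32_t v)
{
    v &= 0x55u;                 /* keep bits 0, 2, 4, 6 */
    v = (v | (v >> 1)) & 0x33u; /* pair them up: bits 0-1 and 4-5 */
    v = (v | (v >> 2)) & 0x0Fu; /* gather into bits 0-3 */
    return v;
}

/* Hypothetical stand-in for decode_morton2_8b: maps an index in 0..255
   (e.g. gl_LocalInvocationID.x of a 256x1x1 workgroup) to a 16x16 cell.
   Assumption: x in the even bits, y in the odd bits. */
static inline cell2 morton2_decode_8bit_sketch(uint32_t index)
{
    cell2 c = { compact_even_bits_8(index), compact_even_bits_8(index >> 1) };
    return c;
}
```

Indices 0 to 3 decode to (0,0), (1,0), (0,1), (1,1), which is one 2x2 block, and indices 0 to 15 cover the 4x4 block with x and y both in 0..3.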

Swizzling compute shader threads for doing mip downsampling, and swizzling between workgroups as well. At least that's what spurred all of this

08.12.2025 21:12 | 👍 2    🔁 0    💬 1    📌 0

Final version for the 2D Morton decode functions. Fairly happy with the structure in general.

A 64-bit version should be easy enough to produce, by just using the 32-bit version twice and combining the results.

08.12.2025 20:01 | 👍 34    🔁 4    💬 1    📌 0
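
The remark above about building a 64-bit decode from two 32-bit decodes can be sketched in C as follows. The 32-bit decode here is the textbook mask-and-shift version, not the optimized one from the attached image, and all names (`compact_even_bits_32`, `morton2_decode_32`, `morton2_decode_64`, `coord2`) are made up for illustration.

```c
#include <stdint.h>

typedef struct { uint32_t x, y; } coord2;

/* Pack the even-indexed bits of v into the low 16 bits. */
static inline uint32_t compact_even_bits_32(uint32_t v)
{
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

/* 32-bit Morton code -> two 16-bit coordinates (x in even bits, y in odd bits). */
static inline coord2 morton2_decode_32(uint32_t code)
{
    coord2 c = { compact_even_bits_32(code), compact_even_bits_32(code >> 1) };
    return c;
}

/* 64-bit Morton code -> two 32-bit coordinates. The low and high 32-bit halves
   are decoded independently; the high half supplies the upper 16 bits of x and y. */
static inline coord2 morton2_decode_64(uint64_t code)
{
    coord2 lo = morton2_decode_32((uint32_t)code);
    coord2 hi = morton2_decode_32((uint32_t)(code >> 32));
    coord2 c = { lo.x | (hi.x << 16), lo.y | (hi.y << 16) };
    return c;
}
```

This works because bit 32 of the 64-bit code is again an even bit, so the high half has the same x/y interleaving as the low half.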
Post image

Similar for the packed 16-bit and 8-bit versions:

07.12.2025 23:35 | 👍 0    🔁 0    💬 0    📌 0
Post image

GPUs have "left shift and add" instructions, e.g. V_LSHL_ADD_U32 on RDNA (and likely the same on NV/SASS), and all the muls mentioned so far are basically x + (x << a) anyway.

So we could use those instead since the inputs are generally 32 bit in the generalized 32-bit case:

07.12.2025 23:35 | 👍 0    🔁 0    💬 1    📌 0
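
Since the actual code for this is only in the attached images, here is a small C sketch of the identity the thread relies on: multiplying by 1 + 2^a is the same as x + (x << a), and when the two terms share no set bits the add behaves like an OR. The example applies it to one step of a textbook Morton bit compaction. The function names are hypothetical, and this is not claimed to be the exact formulation used in the final decode functions.

```c
#include <stdint.h>

/* Classic compaction step: OR with a shifted copy, then mask. */
static inline uint32_t step_or_shift(uint32_t v)
{
    v &= 0x55555555u;
    return (v | (v >> 1)) & 0x33333333u;
}

/* Same step with a multiply. v only has even bits set, so v and (v << 1)
   never overlap and v * 3 == v + (v << 1) == v | (v << 1) with no carries.
   Shifting right by 1 then yields (v >> 1) | v before the mask is applied. */
static inline uint32_t step_multiply(uint32_t v)
{
    v &= 0x55555555u;
    return ((v * 3u) >> 1) & 0x33333333u;
}

/* Same step with the multiply spelled out as a shift and an add, which is
   the shape a fused "left shift and add" instruction can execute directly. */
static inline uint32_t step_shift_add(uint32_t v)
{
    v &= 0x55555555u;
    return ((v + (v << 1)) >> 1) & 0x33333333u;
}
```

All three functions return the same value for every 32-bit input; which one is cheapest depends on the target's multiply and shift-add throughput.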

Spent some more time on this one, with some input from ryg as well.

So general integer multiplication isn't really ideal if either of the inputs is 32 bits, and it'll effectively be worse than a shift and an add no matter the platform.

If both inputs are 24 bits it's usually fast on the GPU, though.

07.12.2025 23:35 | 👍 0    🔁 0    💬 2    📌 0

gah, typo in the comment

07.12.2025 21:20 | 👍 0    🔁 0    💬 0    📌 0
Post image

Took a bit of time before I could go over these. Really nice trick!

Can obviously apply these to the "specialized" packed versions as well, e.g. could get rid of 3 ops for decode_morton2_256x256, and so on, mostly replacing the | and >> combinations with a single multiply.

07.12.2025 21:20 | 👍 1    🔁 0    💬 2    📌 0

Static images really don't do it justice 😅

07.12.2025 08:38 | 👍 1    🔁 0    💬 0    📌 0

Everything about this could at least be half the size/length, still

07.12.2025 08:37 | 👍 1    🔁 0    💬 0    📌 0

Would seeing it in person feel any less surreal?

07.12.2025 08:32 | 👍 0    🔁 0    💬 1    📌 0

Or, I get stuck in a critical loop trying to find the right context and the right light to evaluate the results in:

Does this make sense?
Are these results good?
Do these results make sense?

Evaluating things in the context for which they were written and not my own is often very hard, too

05.12.2025 23:55 | 👍 1    🔁 0    💬 1    📌 0

Often I'm reading a paper, but then I repeatedly get sidetracked by tangents seeded by something I read, or I get stuck trying to parse something that happens to be missing some crucial context that's necessary for it to click. Actually reaching the end can be a lot of work 😅

05.12.2025 23:49 | 👍 1    🔁 0    💬 1    📌 0

Feels like I've come full circle.

I'm sort of curious how common or inevitable this is.

In a way I feel very much "content" in my own little bubble of niche interests and expertise, exploring things on my own without external influence or pressure, maybe to a fault.

05.12.2025 23:37 | 👍 4    🔁 0    💬 2    📌 0

Looking back at the time when I worked on a PhD (which I never finished), I felt rather academically "lonely", in the sense that very few others had the same level of interest and expertise in the things I worked on to give genuine feedback and have a serious discussion on the topic.

05.12.2025 23:37 | 👍 4    🔁 0    💬 1    📌 0

At conferences most of the technical content barely interests me, unless it's an interesting story or generally just an excellent presentation (typically on a topic I do know something about in the first place).

The people and interactions are generally what I'm there for instead these days.

05.12.2025 23:37 | 👍 4    🔁 0    💬 1    📌 0

In almost everything I do these days I get an idea first, spend a lot of time on it in isolation, and then when I'm done or nearly done I might stumble upon something similar or maybe even the exact same thing in the wild.

Makes it hard to empathize with other people's work.

05.12.2025 23:37 | 👍 4    🔁 0    💬 1    📌 0

I can't really recall the last time I actually read a paper/article for the sake of learning something, or for staying up to date, and I can barely muster the motivation to skim new research papers even on topics I genuinely like.

Anything that isn't explicitly "mine" is really hard to focus on

05.12.2025 23:37 | 👍 10    🔁 0    💬 1    📌 0