Been a while since I had a good old-fashioned LL vs. LR vs. PEG debate
(LR is the best, don't @ me)
Gone too long without saying this so to reiterate yet again:
• Reference counting does not prevent GC pauses; you can deallocate a lot of objects at once
• It is possible to write RC that cleans up cycles and as a user you should demand this
• Most tracing GCs are bad and you should demand better
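To make the first bullet concrete, here is a minimal sketch in Rust using `std::rc::Rc`. The `Node` type, the shared drop counter, and the `drops_from_releasing_head` helper are all made up for illustration. The sketch assumes a simple singly linked chain: releasing the one reference to the head frees every node in the chain before `drop` returns, so the pause scales with the size of the structure.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical list node that bumps a shared counter when it is freed.
struct Node {
    // Never read directly; it exists to keep the rest of the chain alive.
    #[allow(dead_code)]
    next: Option<Rc<Node>>,
    drops: Rc<Cell<usize>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Builds a chain of `len` reference-counted nodes, releases the head,
/// and returns how many nodes that single release freed.
fn drops_from_releasing_head(len: usize) -> usize {
    let drops = Rc::new(Cell::new(0));
    let mut head: Option<Rc<Node>> = None;
    for _ in 0..len {
        head = Some(Rc::new(Node {
            next: head.take(),
            drops: Rc::clone(&drops),
        }));
    }
    // Nothing has been freed while the head is still alive.
    assert_eq!(drops.get(), 0);
    // One release; the whole chain is torn down synchronously, right here.
    drop(head);
    drops.get()
}
```

Calling `drops_from_releasing_head(1000)` frees all 1000 nodes in that one `drop` call. The teardown here is recursive, so a very long chain would also need an iterative `Drop` to avoid exhausting the stack.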
Maybe it's just me, but it feels like, more than any project I've worked on, the Discourse around Bevy is disproportionately just incorrect.
Not sure if it's because:
* Bevy moves so fast (aside from the editor);
* Bevy is relatively niche;
* Gamedev attracts a lot of confidently incorrect people
Oftentimes when people say "oh, there's room for X and Y to coexist", that's just wishful thinking and either X or Y ends up totally displacing the other in the end. But regarding Go and Rust, that's actually ended up being true! Definitely a nice outcome.
e.g. You do leave some performance on the table if you don't use meshlets in Bevy, but you keep the workflow artists are used to, and you still get quite good perf, though not the absolute best. UE5's philosophy is "just use Nanite"--they're all in on the high end.
Both philosophies are defensible.
I'd say Bevy has a different philosophy. Unreal achieves megaworld scale by introducing new opt-in systems (Nanite, MassEntity) that are radical overhauls, while keeping the old as the default case. Bevy tends to focus on making the default systems as fast as possible.
My goal is to make Bevy "just work" when scaling to millions of entities. You shouldn't have to use a special DOTS/MassEntity/etc. system to scale. There's just one kind of entity, and the ECS scales seamlessly from small to large.
Really excited about the performance benefits coming in Bevy for mega-worlds. With all my patches we're starting to be able to render millions of mesh instance entities with hundreds of thousands in view with just a handful of drawcalls.
🧵 I've been experimenting with caching the best lights in world space to improve NEE sampling. Inspired by ReGIR, MegaLights, and www.yiningkarlli.com/projects/cac....
hacked and adapted Bevy's existing atmosphere support in last night, re-enabled lighting, and made a few more tweaks. enjoy some sunsets #bevy
The GPU clustering itself runs at about 110 μs, a speedup of 30x over Bevy 0.18 on this stress test. It automatically resizes the clusters as necessary for best performance. There are no arbitrary limits on the number of lights or clusters. The same system handles light probes and decals too.
In my GPU clustering branch, which is making its way through review, Bevy 0.19 can render ~8,000 visible lights (of 100k total) at about 200 FPS on my laptop 4070. This also adds the infrastructure for particle systems to emit lights entirely from GPU without any CPU involvement at all.
I'm taking no position on the technical merits of Godot here, but I will say that this is the exact kind of thing people used to say about GCC right up until it and LLVM killed all the other compilers because they couldn't keep up.
Landed light probe falloff and blending in Bevy 0.19: github.com/bevyengine/b...
Along with parallax correction, I think that's the last of the features that are needed to make light probes really usable. Still would be nice to have in-engine baking, of course.
TIL about Autoconf quadrigraphs and I'm screaming
My fork of Bevy Hanabi, Hanabi-Batched, has been updated to support 0.18 and has many more improvements, such as lookup textures, PBR particles, and GPU mergesort for ribbons: github.com/pcwalton/bev...
If you're looking for a way to use Hanabi on 0.18, feel free to grab it!
Bevy 0.18 is out! My main contribution to this one was portals and mirrors: github.com/bevyengine/b...
Strangest issue I've encountered in the wild when fuzzing: `vaddps xmm0,xmm0,xmm1` and `vaddps xmm0,xmm1,xmm0` are *not* the same on x86 when it comes to which NaN payload gets chosen. But LLVM will reorder the arguments anyway!
Lesson learned: always canonicalize your NaNs when fuzzing.
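A minimal sketch of what canonicalizing can look like in a differential fuzzer, assuming `f32` results are compared bit for bit. The helper names are made up for illustration, and `0x7fc0_0000` is simply the usual positive quiet NaN, picked here as the canonical value.

```rust
/// Bit pattern chosen as the one canonical NaN (positive quiet NaN, zero payload).
const CANONICAL_NAN_BITS: u32 = 0x7fc0_0000;

/// Maps every NaN, whatever its sign or payload, to the canonical NaN.
/// Non-NaN values pass through unchanged.
fn canonicalize_nan(x: f32) -> f32 {
    if x.is_nan() {
        f32::from_bits(CANONICAL_NAN_BITS)
    } else {
        x
    }
}

/// Bitwise comparison of two results after canonicalization, so that a
/// difference in NaN payload alone is not reported as a mismatch.
fn same_after_canonicalization(a: f32, b: f32) -> bool {
    canonicalize_nan(a).to_bits() == canonicalize_nan(b).to_bits()
}
```

With this, two NaNs that differ only in payload compare equal, while ordinary values (including `0.0` vs. `-0.0`) are still compared exactly.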
16 different versions of glam in my Bevy project. The ecosystem *might* want to improve this a bit :)
I dusted off an old patch and landed the infrastructure for portals and mirrors in Bevy for 0.18: github.com/bevyengine/bevy/pull/13797
This builds the Lengyel oblique clip plane technique into the engine, which is the fastest way to do the clipping necessary for mirrors to work.
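For anyone unfamiliar with the technique, here is a rough sketch of the classic formulation for OpenGL-style projection matrices (column-major, clip-space z in [-w, w]). This is not the code from the PR: Bevy's renderer has its own depth conventions, so the engine's version will differ in the details. The `Mat4` alias and all function names are made up for illustration. The idea is to overwrite the third row of the projection matrix so that the near plane coincides with the mirror plane, which makes ordinary near-plane clipping do the work.

```rust
/// Column-major 4x4 matrix, OpenGL style: element (row, col) lives at col * 4 + row.
type Mat4 = [f32; 16];

/// Symmetric OpenGL-style perspective projection (assumed convention for this sketch).
fn perspective(fov_y_radians: f32, aspect: f32, near: f32, far: f32) -> Mat4 {
    let f = 1.0 / (fov_y_radians / 2.0).tan();
    let mut m = [0.0; 16];
    m[0] = f / aspect;
    m[5] = f;
    m[10] = (far + near) / (near - far);
    m[11] = -1.0;
    m[14] = 2.0 * far * near / (near - far);
    m
}

/// Sign function that returns 0 for 0.
fn sgn(a: f32) -> f32 {
    if a > 0.0 {
        1.0
    } else if a < 0.0 {
        -1.0
    } else {
        0.0
    }
}

/// Replaces the near plane of `m` with `plane` = (nx, ny, nz, d), given in
/// camera space with the camera on the plane's negative side.
fn make_oblique(m: &mut Mat4, plane: [f32; 4]) {
    // Camera-space position of the frustum corner opposite the clip plane.
    let q = [
        (sgn(plane[0]) + m[8]) / m[0],
        (sgn(plane[1]) + m[9]) / m[5],
        -1.0,
        (1.0 + m[10]) / m[14],
    ];
    let dot: f32 = plane.iter().zip(&q).map(|(p, q)| p * q).sum();
    let scale = 2.0 / dot;
    // Overwrite the third row with the scaled plane minus the fourth row (0, 0, -1, 0).
    m[2] = plane[0] * scale;
    m[6] = plane[1] * scale;
    m[10] = plane[2] * scale + 1.0;
    m[14] = plane[3] * scale;
}

/// Returns (clip-space z, clip-space w) for a camera-space point.
fn clip_z_w(m: &Mat4, p: [f32; 4]) -> (f32, f32) {
    let z = m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14] * p[3];
    let w = m[3] * p[0] + m[7] * p[1] + m[11] * p[2] + m[15] * p[3];
    (z, w)
}
```

After `make_oblique`, any point lying on the plane projects to clip-space z = -w, i.e. exactly onto the near plane, so geometry behind the mirror is clipped for free. The cost is that the far plane shifts and depth precision changes.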
Landed normal maps, metallic/roughness maps, and emissive maps for clustered decals in Bevy 0.18: github.com/bevyengine/b...
They compose with other decals and whatever maps are on the base material, if any. Additionally, in a custom shader you can use these textures for whatever you want.
After nine years of development, meshoptimizer has reached its first major version, 1.0!
This release focuses on improvements in clusterization and simplification as well as stabilization. Here's a release announcement with more details on past, present and future; please RT!
meshoptimizer.org/v1
Congrats!
"Analyzing the Performance of WebAssembly vs. Native Code" places a lot of the blame for the worse performance of wasm on register spills, esp. with JS engines' reserved registers. Sounds like APX could actually help by bumping the register count from 16 to 32? ar5iv.labs.arxiv.org/html/1901.09...
Seriously considering putting a bounty on x86-64 support for copy.sh/v86: github.com/copy/v86/iss...
A proper modern JITting emulator on the Web platform (including non-jailbroken iOS) would be amazing! I'm impressed by how fast the jitcode can be, even with softmmu.
Noooo, even with APX the DIV and IDIV instructions are hardwired to rdx:rax :(
Note that these instructions by themselves won't enable fast emulation without the virtual memory proposal.
I filed an issue proposing what I think is the minimum set of instructions wasm needs to efficiently implement JITs for systems with MMUs: try-load and try-store. github.com/WebAssembly/... Interested in feedback.
(From what I see, this would enable fast emulation on non-jailbroken iOS.)
Yeah, I looked it over. I filed a couple of issues to add things that v86 would need: github.com/WebAssembly/... and github.com/WebAssembly/...
Well, I guess nommu doesn't help that much when you still need to invalidate jitcode. You're paying most of the cost of a TLB in that case...
Wish wasm had better support for setting page permissions and catching faults.