Ah that's something I didn't think of! This will be the next optimization I'll try on my renderer, now that I've saved a lot of memory I can afford some padding.
16.10.2025 09:05
@zino2201.bsky.social
21 | Engine Programmer @ Asobo Studio
I write a lot of C++ and Rust
Been a long time since I've written a blogpost.
"Heterogenous AoS instance encoding for a GPU-driven renderer"
zino2201.substack.com/p/heterogeno...
Are there engines or apps out there that use Vulkan as their sole gfx API/RHI and provide homemade Vulkan implementations for other backends/platforms (with homemade extensions too)?
16.09.2025 18:00
25H2*
15.09.2025 23:42
So it seems like a miracle happened in Win11 24H2. UpdateTileMappings() doesn't seem as unpredictable and slow on NVIDIA. I only did simple tests though, with buffers and not giga textures.
15.09.2025 23:40
Graphics Programming weekly - Issue 381 - March 2nd, 2025 www.jendrikillner.com/post/graphic...
03.03.2025 15:18
helps a lot to generalize systems. Combined with ECS, I think it is a very powerful mindset for games that are heavily data-driven/have A LOT of data and interactions.
26.01.2025 16:59
ajmmertens.medium.com/why-it-is-ti...
This resonates with me a lot; recently I've been prototyping a game that involves a lot of data-driven logic and relationships between entities, factions, resources, etc. Seeing this world as a database, a flat array of entities with relationships to each other
TIL MSVC only applies NRVO when using /O2, /std:c++20 or later, /permissive- or /Zc:nrvo. We were wondering why a copy constructor was called when returning from a function and the return value was not constructed as part of the return statement.
devblogs.microsoft.com/cppblog/impr...
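A minimal sketch of how one might observe this, assuming C++17 or later. `Tracked`, `make_named`, `make_prvalue` and `reset_counters` are hypothetical names made up for illustration. Returning a named local (`make_named`) is the NRVO case, where elision is allowed but optional, so the result depends on the compiler and flags. Returning a prvalue (`make_prvalue`) is elided by the language rules since C++17.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied or moved.
struct Tracked {
    static inline int copies = 0;
    static inline int moves = 0;
    int value = 0;

    Tracked() = default;
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked(Tracked&& other) noexcept : value(other.value) { ++moves; }
};

// Named local returned by value: NRVO may construct `t` directly in the
// caller's storage. If the compiler does not apply it, `t` is moved here
// (or copied, for a type without a move constructor).
Tracked make_named() {
    Tracked t;
    t.value = 42;
    return t;
}

// The return value is constructed as part of the return statement: since
// C++17 no copy or move happens, whatever the optimization flags.
Tracked make_prvalue() {
    return Tracked{};
}

// Resets the counters between experiments.
void reset_counters() {
    Tracked::copies = 0;
    Tracked::moves = 0;
}
```

Calling `make_named()` and then reading `Tracked::copies` and `Tracked::moves` shows whether the build elided the return. Because `Tracked` declares a move constructor, a build without NRVO reports one move. A type that only declares a copy constructor would report a copy instead, which matches what the post describes.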
21 yo... why is time accelerating?
22.01.2025 14:40
A table showing the experimental results of applying 6 different compute shader versions of a simple 3x3x3 box blur on a 512x512x512 texture using either GL_R16F or GL_R32F internal format for storage, for eight different GPUs spanning several GPU architectures and vendors. The upper table shows the absolute effective bandwidth (measured as the sum of total bytes read and written divided by execution time), whereas the lower table shows the effective bandwidth relative to the theoretical bandwidth as a percentage. Each row corresponds to a specific shader variant (except for the "theoretical" row, which displays the theoretical bandwidth according to the GPU specification), and each column corresponds to a specific GPU. The color coding is per column in the upper table, and it's a single color coding on the entire lower table. Each version will be explained in detail in the subsequent posts. Version 6 uses half-precision floating point for the shared memory cache, and the relevant extension does not exist in the Intel drivers for Windows. Likewise, this version is not applied to the GL_R32F internal format benchmarks since that would destroy the precision of the backing format anyway. The code was written and initially tested on a desktop 4090 (the first column), which naturally skews the results a bit since everything was evaluated and tested on that GPU. Had I used another GPU I might have picked slightly different compromises, and the results would have been slightly different. One interesting observation is that the RTX 4000 series (Ada Lovelace architecture) significantly outperforms everything else, with the 7900 XTX (RDNA3) slightly behind. A large part of these overwhelmingly efficient results is due to the massive caches these devices sport (72 MiB on the desktop 4090, 64 MiB on the laptop 4090, etc.), which really help reach peak bandwidth much more easily.
Let's wrap up this lovely week with a nice technical post
This is the "case study" from my Masterclass at GPC, where I apply a series of optimizations to improve the effective bandwidth of a 3x3x3 blur (a proxy for a huge set of operations on volumetric data)
Check ALT text for (a lot of) context.
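The metric described in the alt text (bytes read plus bytes written, divided by execution time) can be sketched as below. This is only an illustration under a stated assumption: each voxel is counted as read once and written once, which may not be how the original benchmark counts bytes. The function names are hypothetical, and any timings passed in would be placeholders, not measured results.

```cpp
#include <cassert>
#include <cstdint>

// Bytes moved by one blur pass over a dim^3 volume, ASSUMING every voxel is
// read once and written once (an idealized count, not the author's).
constexpr std::uint64_t bytes_moved(std::uint64_t dim, std::uint64_t bytes_per_voxel) {
    const std::uint64_t voxels = dim * dim * dim;
    return 2 * voxels * bytes_per_voxel;
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for a given execution time.
constexpr double effective_bandwidth_gbps(std::uint64_t bytes, double seconds) {
    return static_cast<double>(bytes) / seconds / 1e9;
}

// Effective bandwidth as a percentage of the theoretical bandwidth.
constexpr double percent_of_peak(double effective_gbps, double theoretical_gbps) {
    return 100.0 * effective_gbps / theoretical_gbps;
}
```

Under that assumption, a 512x512x512 GL_R16F volume (2 bytes per voxel) moves 536,870,912 bytes per pass, and a GL_R32F volume moves twice that.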
Some of you may remember Itanium.
09.01.2025 16:46
Graphics programming changed my PoV about CPU architectures. I want to see a more ILP/SIMD-oriented approach to software. I saw some architectures that have what's called SPM (scratchpad memory), which is similar to groupshared memory. I wonder what computing would look like if we had gone full VLIW early.
09.01.2025 16:45
The actor model is very elegant, and message passing in general has always worked great for me to exchange data between threads. I would love to generalize my engine towards more actors, but I'm still skeptical about the overhead vs. classical fibers/job graph with fine-grained locking (what I have).
28.12.2024 23:47
oops, Avatar: Frontiers of Pandora* I mixed up two different things haha
19.12.2024 04:53
as rendering 30 low-triangle electrical boxes. rasterizing & shading a big open forest is not comparable to rendering a city. and that's only the technical side of the problem. I hate these unfair comparisons. it reminds me of UE Blueprints vs C++ perf videos doing 1 billion iterations. why??
19.12.2024 04:21
yea so I watched some Threat Interactive videos again, I agree on some of his points, but no, Nanite is not a perf disaster. no serious game developer will use something without evaluating its tradeoffs against their content. rendering avatar: the last frontier is not the same
19.12.2024 04:21
First look at Wakanda in 'Marvel 1943: The Rise of Hydra'
It will be a playable location
"We wanted to tell a globe-trotting story ... There are yet-to-be-revealed locations in between"
(via EW)
(2/2)
- Offset the AABB depth using the gradients and test, with a fallback to a conservative test for steep gradients.
Could it work? 🤔
(1/2)
Has anyone experimented with, or does anyone have resources about, depth-gradient-based occlusion culling? I wonder how it could look in a meshlet renderer.
This is how I see it:
- Store (or compute?) the gradients in your hierarchical depth buffer
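One possible reading of the two steps in this thread, as a CPU-side sketch. Everything here is an assumption made for illustration: the `HiZTexel` layout, a per-texel plane model of the occluder depth, the `steep_threshold` parameter, and a depth convention where 0 is near and 1 is far. It is not a tested culling scheme, and the plane estimate is only as reliable as the stored gradients.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical hierarchical-depth texel: the classic conservative max depth
// plus a plane (depth at the texel center and per-pixel gradients).
struct HiZTexel {
    float max_depth;     // farthest occluder depth in the texel (classic HiZ)
    float center_depth;  // occluder depth at the texel center
    float grad_x;        // depth change per pixel along x
    float grad_y;        // depth change per pixel along y
    float center_x;      // texel center, in pixels
    float center_y;
};

// Screen-space footprint of an AABB and its nearest depth (0 = near, 1 = far).
struct ScreenRect {
    float min_x, min_y, max_x, max_y;
    float nearest_depth;
};

// Farthest occluder depth under the rect. Steep gradients fall back to the
// conservative per-texel maximum, as suggested in the thread.
inline float farthest_occluder_depth(const HiZTexel& t, const ScreenRect& r,
                                     float steep_threshold) {
    if (std::fabs(t.grad_x) > steep_threshold || std::fabs(t.grad_y) > steep_threshold) {
        return t.max_depth;
    }
    const float cx = 0.5f * (r.min_x + r.max_x);
    const float cy = 0.5f * (r.min_y + r.max_y);
    const float half_w = 0.5f * (r.max_x - r.min_x);
    const float half_h = 0.5f * (r.max_y - r.min_y);
    // Plane depth at the rect center, pushed to its maximum over the rect.
    const float at_center = t.center_depth + t.grad_x * (cx - t.center_x)
                                           + t.grad_y * (cy - t.center_y);
    const float plane_max = at_center + std::fabs(t.grad_x) * half_w
                                      + std::fabs(t.grad_y) * half_h;
    // Never report anything farther than the conservative value.
    return std::min(plane_max, t.max_depth);
}

// The object is occluded if its nearest point lies behind the farthest
// occluder depth under its footprint.
inline bool is_occluded(const HiZTexel& t, const ScreenRect& r, float steep_threshold) {
    return r.nearest_depth > farthest_occluder_depth(t, r, steep_threshold);
}
```

The potential gain is for small footprints inside a texel that covers a sloped surface: the plane can give a nearer occluder depth there than the texel-wide maximum, so an object the conservative test keeps may be culled.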
Repost if BlueSky is now your primary social media site.
28.11.2024 20:06
I'm doing so much photo mode in games and I don't have my virtual gallery website anymore, so here is a thread.
I will try to dump a game a day.
Today Plague Tale: Requiem.
The unsafe part is usable and I think it can cover a big part of the library. But making it safe is quite hard since OpenUSD uses a lot of mutability and shared pointers; in Rust you somehow need to attach a Mutex or RefCell to OpenUSD's smart pointers, or add an indirection. IMO too ugly.
27.11.2024 18:02
Follow-up on that: I started writing the safe part of the bindings and I hit several issues due to the differences between C++ and Rust, especially around mutability. So yesterday I started writing a USDA parser... let's see where this goes
27.11.2024 18:00
yay I've been able to bind a small part of the OpenUSD API to Rust, only unsafe bindings for now
24.11.2024 17:28
:o !!
21.11.2024 18:13
France!
21.11.2024 15:41
something* oops
21.11.2024 15:37
Still, I'm so grateful to be able to work as an engine programmer. I truly like what I do, and being paid to do it is sometime I hope everyone can experience in their life.
Time will tell if I made the right choice ;)