bind groups suck because they are essentially descriptor sets, and descriptor sets suck because they make a thing that should be easy, extremely difficult
20.03.2025 01:45
@track33r.bsky.social
Graphics @ Blizzard
In today's new blog post, I try to provide some intuition and clear up some misconceptions about early Z cull behavior on desktop GPUs. Have a read if that interests you!
therealmjp.github.io/posts/to-ear...
In case it's useful, a while back I wrote a cubemap prefilter shader for GGX: github.com/voithos/quar...
I can't recall if I verified its output with any rigor, but it looked reasonable enough at the time (I mostly wrote it as a learning experience).
Oblivion isn't my favorite TES but the remaster feels so damn fresh after the recent RPGs, and I see people on Steam share my opinion. Plus I'm getting like 200 FPS on a laptop!
Great job, Bethesda, now do Morrowind!
Graphics Programming weekly - Issue 389 - April 27th, 2025 www.jendrikillner.com/post/graphic...
28.04.2025 13:16
In game screenshot of the game X-Com above, with a blocky screenshot showing the collision view below it approximating the solidness of the visual view. The screenshot is highlighting a glitch where diagonals allow peering into closed areas due to a quirk of how collision is done
A graphic showing how a tree's collision in X-Com is defined by vertical slices of predefined patterns
The 112 predefined pattern slices X-Com uses for collision
this is pretty cool. the original X-Com was a 2D isometric game but the collision system itself was 3D. objects in the world would have collisions defined by vertical stacks of predefined patterns. this allowed the game to do accurate line of sight and hit detection. very cool!
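The stacked-pattern idea can be sketched as a small lookup. This is only an illustration, not the game's actual data layout: the 16x16 slice size, the bitmask encoding, and the names `Pattern`, `CollisionShape` and `is_solid` are all assumptions made for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// One horizontal slice of collision: 16 rows, each a 16-bit mask where
// bit x set means "solid at column x". The 16x16 size is an assumption.
using Pattern = std::array<std::uint16_t, 16>;

// An object's collision: one pattern index per height layer, bottom to top.
struct CollisionShape {
    std::vector<std::size_t> layers;
};

// Returns true if the voxel (x, y, z) inside the object is solid.
// Anything outside the slice or above the stack counts as empty.
bool is_solid(const std::vector<Pattern>& patterns,
              const CollisionShape& shape, int x, int y, int z) {
    if (x < 0 || x >= 16 || y < 0 || y >= 16 || z < 0) return false;
    const std::size_t layer = static_cast<std::size_t>(z);
    if (layer >= shape.layers.size()) return false;
    const std::size_t index = shape.layers[layer];
    if (index >= patterns.size()) return false;
    return ((patterns[index][static_cast<std::size_t>(y)] >> x) & 1u) != 0;
}
```

A line-of-sight or hit test would then step along a ray and call `is_solid` at each sample, which is how a 2D-looking game can still answer 3D visibility questions.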
28.12.2024 15:47
ShaderGlass 1.0 has been released! Big update!
Apply shaders to any app on your desktop: gaming, pixel art and video.
Free and open source! Links in reply.
Graphics Programming weekly - Issue 378 - February 9th, 2025 www.jendrikillner.com/post/graphic...
10.02.2025 14:29
Introduction to Computer Graphics course notes touching upon topics such as rasterisation, PBR, raytracing, image and geometry processing and fluid simulation perso.liris.cnrs.fr/nicolas.bonn...
29.01.2025 20:53
TerminateProcess() monitored by another watchdog process?
31.01.2025 04:44
More like a sign of burnout.
31.01.2025 04:42
OK, AMD's Reshape is amazing. Clearing out so many errors that I would have previously struggled for days and days to catch. Thanks so much @gpuopen.bsky.social, @aurolou.bsky.social, @miguel-oenp.bsky.social and everyone who had a hand in creating this one ❤️
29.01.2025 03:18
Judging by the Blackwell whitepaper, RTX Mega Geometry is an implementation of ideas from this HPG 2023 paper by Carsten Benthin and me. It is cool to see this broadly deployed so soon. Hopefully, cross-vendor standardization will be just as swift.
momentsingraphics.de/HPG2023.html
I love how Microsoft deprecated _ReadWriteBarrier(), but there's no alternative for it except "use std::atomic instead".
So std::atomic has to push/pop deprecation warnings every time it needs to call it.
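For reference, standard C++ does offer two pieces that cover the usual reasons people reached for a compiler barrier. A minimal sketch, assuming the goal is either a compiler-only reordering barrier (`std::atomic_signal_fence`) or publishing data to another thread (release/acquire on a `std::atomic`); the names `compiler_barrier`, `publish`, `try_consume`, `g_payload` and `g_ready` are hypothetical, and whether either piece is a suitable substitute depends on what the original code relied on.

```cpp
#include <atomic>

// Compiler-only barrier: limits compiler reordering across this point
// without requesting a hardware fence instruction.
inline void compiler_barrier() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Hypothetical shared state for the cross-thread case.
static int g_payload = 0;
static std::atomic<bool> g_ready{false};

// Writer: store the payload, then publish it with a release store so the
// payload write cannot be reordered after the flag becomes visible.
void publish(int value) {
    g_payload = value;
    g_ready.store(true, std::memory_order_release);
}

// Reader: an acquire load of the flag makes the payload write visible
// once the flag is observed as true.
bool try_consume(int& out) {
    if (!g_ready.load(std::memory_order_acquire)) return false;
    out = g_payload;
    return true;
}
```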
My team is hiring careers.amd.com/careers-home...
01.12.2024 19:29
ReSTIR is one of those moments in computer graphics where a method is too useful to ignore, but too complex to fit in a textbook paragraph - and with time, the explanation simplifies and it becomes canon. This course by Chris Wyman is a huge effort to explain and simplify
20.11.2024 15:13
HLSL++ reached 600 stars today. I started the project because I didn't like the interface of the math library we had at Tt. I understand why though, it's a lot of effort. Thank you to everyone who finds it useful and contributed.
github.com/redorav/hlslpp
Still one of the most accessible explanations of the rendering equation out there
18.11.2024 20:26
Graphics Programming weekly - Issue 366 - November 17th, 2024 www.jendrikillner.com/post/graphic...
18.11.2024 16:05
Signal boosting from last week
gpuopen.com/learn/work-g...
Provides a great introduction to GPU work graphs.
Nice summary of occupancy as a way to hide memory latency, but it is not the only (and sometimes not the best) way. Helping the compiler put more instructions between issuing the memory read and using the data, e.g. by partially unrolling a loop, might help as well (from gpu-primitives-course.github.io)
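The partial-unrolling idea can be sketched on a plain reduction loop. This is ordinary C++ rather than shader code, the function names are hypothetical, and whether it actually helps depends on the compiler and hardware; it only shows the shape of the transformation, where four independent loads are issued before any of them is consumed.

```cpp
#include <cstddef>
#include <vector>

// Baseline: each iteration loads one value and uses it immediately.
float sum_simple(const std::vector<float>& data) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < data.size(); ++i) acc += data[i];
    return acc;
}

// Partially unrolled by four: the loads are grouped ahead of the adds and
// feed separate accumulators, so no load waits on the previous add.
float sum_unrolled4(const std::vector<float>& data) {
    const std::size_t n = data.size();
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const float a = data[i];
        const float b = data[i + 1];
        const float c = data[i + 2];
        const float d = data[i + 3];
        acc0 += a;
        acc1 += b;
        acc2 += c;
        acc3 += d;
    }
    // Remainder loop for sizes that are not a multiple of four.
    for (; i < n; ++i) acc0 += data[i];
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Note that the unrolled version adds in a different order, so floating-point results can differ slightly from the baseline for general inputs.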
17.11.2024 12:46
valve has published a 2-hour documentary for the 20th anniversary of hl2. currently watching it.
i was jazzed from literally the first seconds, when i saw the shots and heard that eastern european accent, because i knew it was viktor antonov.
www.youtube.com/watch?v=YCjN...
tiny_bvh.h 0.9.1 adds support for using a custom alloc/free:
github.com/jbikker/tiny...
As the Dutch say: "Voor u verandert er verder niets", i.e. the lib has good default options that it will use transparently unless you insist on exerting that kind of control.
I feel the need to shout out this awesome D3D12 feature table that a few people from the DirectX discord put together: d3d12infodb.boolka.dev/FeatureTable...
18.11.2024 02:23
"Real-time denoising of importance sampled direct lighting", MSc thesis describing the denoising approach used for ReSTIR DI in Northlight engine for Alan Wake 2, also nice summary and reference for various denoising techniques aaltodoc.aalto.fi/server/api/c...
18.11.2024 17:59
tiny_bvh.h, speedtest app: First OpenCL BVH traversal kernel is now available, for the 'Aila & Laine' 2-way GPU-friendly format. About 200M rays/s for the sphere flake on Intel Xe integrated graphics.
github.com/jbikker/tiny...
A table showing the experimental results of applying 6 different compute shader versions of a simple 3x3x3 box blur on a 512x512x512 texture, using either the GL_R16F or GL_R32F internal format for storage, for eight different GPUs spanning several GPU architectures and vendors. The upper table shows the absolute effective bandwidth (measured as the sum of total bytes read and written divided by execution time), whereas the lower table shows the effective bandwidth relative to the theoretical bandwidth as a percentage. Each row corresponds to a specific shader variant (except for the "theoretical" row, which displays the theoretical bandwidth according to the GPU specification), and each column corresponds to a specific GPU. The color coding is per column in the upper table, and it's a single color coding on the entire lower table.

Each version will be explained in detail in the subsequent posts. Version 6 uses half precision floating point for the shared memory cache, and the relevant extension does not exist in the Intel drivers for Windows. Likewise, this version is not applied to the GL_R32F internal format benchmarks since that would destroy the precision of the backing format anyway. The code was written and initially tested on a desktop 4090 (the first column), which naturally skews the results a bit since everything was evaluated and tested on that GPU. Had I used another GPU I might have picked slightly different compromises, and the results would have been slightly different.

One interesting observation is that the RTX 4000 series (Ada Lovelace architecture) significantly outperforms everything else, with the 7900 XTX (RDNA3) slightly behind. A large part of these overwhelmingly efficient results is due to the massive caches these devices sport (72 MiB on the desktop 4090, 64 MiB on the laptop 4090, etc.), which makes it a lot easier to reach peak bandwidth.
Let's wrap up this lovely week with a nice technical post
This is the "case study" from my Masterclass at GPC, where I apply a series of optimizations to improve the effective bandwidth of a 3x3x3 blur (a proxy for a huge set of operations on volumetric data)
Check ALT text for (a lot of) context.
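The effective-bandwidth metric described in the ALT text can be written down in a few lines. A minimal sketch: the helper names are hypothetical, GB/s is taken as 10^9 bytes per second, and the texel sizes (2 bytes for GL_R16F, 4 bytes for GL_R32F) follow from the formats themselves. No timings from the table are reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of a cubic volume texture with the given texel size.
std::uint64_t volume_bytes(std::uint64_t dim, std::uint64_t bytes_per_texel) {
    return dim * dim * dim * bytes_per_texel;
}

// Effective bandwidth in GB/s: total bytes read plus written, divided by
// the execution time in seconds.
double effective_bandwidth_gbps(std::uint64_t bytes_read,
                                std::uint64_t bytes_written,
                                double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes_read + bytes_written) / seconds / 1e9;
}

// Effective bandwidth as a percentage of the theoretical peak.
double percent_of_theoretical(double effective_gbps, double theoretical_gbps) {
    assert(theoretical_gbps > 0.0);
    return effective_gbps / theoretical_gbps * 100.0;
}
```

For the 512x512x512 case, `volume_bytes(512, 2)` gives the size of one GL_R16F volume; if each texel is counted as read once and written once (an assumption about how the byte totals are tallied), the same value serves as both `bytes_read` and `bytes_written`.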
Maybe now is the time to share!
A quick way to reduce bandwidth A LOT when dealing with Deferred Renderer on mobile
Possibly you have non-optimal order of passes when dealing with shadowed lights. This saves 120+MB/frame with 7 lights on iPad
More: www.gamedeveloper.com/game-platfor...
If you all don't mind, I'm going to go shill this blog post I wrote two years ago about dealing with memory pools and uploading from SysRAM->VRAM. It's one I link to frequently for people starting out in D3D12. And it has fun graphs!
therealmjp.github.io/posts/gpu-me...
Graphics Programming weekly - Issue 364 - November 3rd, 2024 www.jendrikillner.com/post/graphic...
04.11.2024 15:00