They'll have to pry the idea of accumulating samples from my cold dead hands
12.07.2025 22:33
@osor.io.bsky.social
Senior VFX/Graphics Programmer working @ Rockstar Games All posts/opinions/views my own :)
Thanks so much Albert! 🧡
🟥🟩🟦 Give us the sub-pixels!!! 🟥🟩🟦
I don't have plans to do it right now but maybe at some point in the future if time allows.
Part of the message I'm trying to send here is that this isn't too hard to do. I'd love for people to have a go with their own implementations and share improvements if they can! 🧡
It's an honour to be featured. Thank you so much! ❤️
18.06.2025 16:48
Screenshots not a great strength either I see
The high-res stuff is in the post anyway, like this one with a bunch of different fonts rendered at high res straight out of the test app I was running this on:
osor.io/text/lorem_i...
Oh no!
12.06.2025 21:45
Image of a joke fake article over a blue background, it reads: Breaking: Graphics Programmer Does Text Again! The rabbithole police has been called to the codebase after sightings of an anonymous graphics programmer onanistically replacing their whole text rendering implementation. 'It's just a waste of time' said a witness of the large changelists submitted to the repository; "SDFs looked pretty good, I really don't understand" said a former colleague of the suspect. An in-depth article will follow expanding on the alleged wrongdoings of the perpetrator. More news at eleven.
Video is still not Bluesky's forte eh? Here's a screenshot!
12.06.2025 16:46
My first one got an unexpected amount of interest. Huge thanks to everyone who read it! (Especially @jendrikillner.bsky.social since he was probably the biggest reason)
This topic gets way more coverage but I've never seen it done/presented like this, so trying to make my contribution.
Hola again graphics peeps!
I found myself with enough bits and pieces related to text rendering to warrant a write-up. So here it is!
osor.io/text
Spiced up with direct vector rendering, sub-pixel anti-aliasing, run-time atlas packing, temporal accumulation, and more!
I hope you enjoy it! 🧡
The most sensible approach is obviously that half res and quarter res *both* mean half in each axis / quarter of the pixels.
🧨🧨🧨🧨🧨🧨🧨
Hey! Thanks so much!
Unfortunately it was a one-off build since I don't have much time these days for this kind of project.
I would encourage people to attempt their own custom builds though. Or look for custom controller/arcade-stick builders directly since there are some already out there.
Thanks man!
06.05.2025 20:54
Thanks Andrés! ❤️
06.05.2025 20:52
Been quiet but busy
Hope you all like it! 🧡
rockstargames.com/VI
www.youtube.com/watch?v=VQRL...
Brain played it with the exact cadence and slapped the music right after
06.03.2025 06:29
The first active thread of the wave does the atomic and retrieves the global offset for the wave, WaveReadLaneFirst then broadcasts it.
The local offset within the wave comes from WavePrefixCountBits, since it's just the count of how many threads with a lower index are also writing one element.
If you need the correct index per-thread, as you do when you're going to write the samples to the buffer, there's some more wave-ops involved, since you also need to calculate the local offset for each thread on the wave.
WaveReadLaneFirst/WavePrefixCountBits sorts you out, here is how it'd look:
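The shader code itself isn't reproduced here, so as a stand-in here is a CPU-side sketch of the same pattern in C++. Everything named below (`kWaveSize`, `popcount32`, `append_wave`) is hypothetical and only for illustration: a 32-bit mask plays the role of the wave, one `fetch_add` stands in for the single atomic done by the first active lane plus the WaveReadLaneFirst broadcast, and counting the lower set bits stands in for WavePrefixCountBits. Reserving the total number of writing lanes in one go is an assumption about how the wave's allocation is sized.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

constexpr int kWaveSize = 32; // assumed wave width for this sketch

// Portable bit count, standing in for the hardware popcount.
inline uint32_t popcount32(uint32_t v) {
    uint32_t n = 0;
    for (; v != 0; v &= v - 1) ++n; // clears the lowest set bit each step
    return n;
}

// Emulates one wave appending to a shared buffer.
// wants_write: bit i is set when lane i writes exactly one element.
// Returns the destination index per lane, or -1 for lanes that don't write.
std::array<int, kWaveSize> append_wave(std::atomic<uint32_t>& global_counter,
                                       uint32_t wants_write) {
    std::array<int, kWaveSize> dst;
    dst.fill(-1);
    if (wants_write == 0) return dst; // nothing to reserve for this wave

    // One atomic for the whole wave (the "first active lane" does it),
    // and every lane then sees the same base (the WaveReadLaneFirst step).
    const uint32_t wave_base = global_counter.fetch_add(popcount32(wants_write));

    for (int lane = 0; lane < kWaveSize; ++lane) {
        if (((wants_write >> lane) & 1u) == 0) continue;
        // Local offset: how many lower lanes also write (WavePrefixCountBits).
        const uint32_t lower_lanes = wants_write & ((1u << lane) - 1u);
        dst[lane] = static_cast<int>(wave_base + popcount32(lower_lanes));
    }
    return dst;
}
```

With a counter starting at 10 and lanes 0, 1 and 3 writing, the lanes would land on indices 10, 11 and 12, and the counter ends at 13 after a single atomic.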
Oh! Also worth mentioning. In this sort of system you'll see a lot of contention when writing to shared counters.
It's a good idea to minimize this by doing the global write once per wave or group.
A neat trick is to also scalarize on the shader/draw for when a wave sees different values there.
Paying my respects with a video rendering 10% of the pixels each frame (hacking this in just now so turning all denoising and TAA off, no reprojection of "empty" pixels either).
(Prepare for the bsky video butchering though)
@adrien-t.bsky.social also made me aware of @h3r2tic.bsky.social's amazing presentation in h3.gd/a-deferred-m.... Super cool to see the per-draw lists and all the spatial and temporal VRS experiments ❤️
31.12.2024 14:23
And because this approach ends up compacting the list of pixels per-draw, it responds really well to scenes with heavy dithering into the visibility buffer.
Some of the other tile-based approaches I tried struggled with this, since they'd need to dispatch multiple resolve tiles per tile on-screen.
Plus you can also select variations of the shaders to further optimize!
If a pixel is not seeing any local lights, you can dispatch a version of the resolve shader that has all the local light code compiled out. Or if a pixel is fully in shadow from directional light, nuke all that code too, etc...
I *really* like the flexibility of this approach while keeping the resolve waves full.
With this you can do software VRS easily, both spatially and temporally, which is super cool! You can just write any logic to decide how the visibility buffer values map to a sample to resolve.
There are only a few waves per frame that see two or more draws when resolving, which are the only cases where the waves aren't fully utilized.
This is as good as it can be: if those waves weren't going to shade another draw, they would have been inactive in a "wave-perfect" resolve anyway.
This can include a lot of data like per-draw transforms, tinting, material stuff, and notably the bindless texture indices. Nice not to need any of that NonUniformResourceIndex ugliness.
31.12.2024 14:12
Because the list is not only sorted per-shader but also per-draw, most of the waves will just see one draw!
To take advantage of this, I scalarize based on the draw index, so all of the draw-related data can be uniform and stay in scalar registers. With a noticeable win in occupancy too.
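To make the scalarization idea concrete, here is a small CPU-side sketch in C++ of the usual loop shape. The names (`kWaveSize`, `scalarize_on_draw`) are made up for illustration, and mapping this to a shader loop that picks a draw with WaveReadLaneFirst is an assumption about the implementation, not something spelled out above. Each iteration takes the draw index of the first lane that hasn't been shaded yet, treats it as uniform, and retires every lane that shares it.

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr int kWaveSize = 32; // assumed wave width for this sketch

// Returns the draw index handled by each loop iteration, in order.
// One entry means the whole wave shaded a single draw in one pass.
std::vector<uint32_t> scalarize_on_draw(
        const std::array<uint32_t, kWaveSize>& draw_of_lane) {
    std::vector<uint32_t> iterations;
    uint32_t pending = 0xFFFFFFFFu; // lanes that still need shading

    while (pending != 0) {
        // First pending lane decides the uniform draw for this iteration.
        int first = 0;
        while (((pending >> first) & 1u) == 0) ++first;
        const uint32_t uniform_draw = draw_of_lane[first];
        iterations.push_back(uniform_draw);

        // Lanes with a matching draw would do their work here, reading
        // per-draw data through the uniform index; then they are retired.
        for (int lane = 0; lane < kWaveSize; ++lane) {
            if (draw_of_lane[lane] == uniform_draw) pending &= ~(1u << lane);
        }
    }
    return iterations;
}
```

Because the lists are sorted per draw, the common case is a single iteration; a wave that straddles the boundary between two draws takes two.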
The resolve is dispatched per-shader, not per-draw, so it's still a very manageable amount of dispatch indirect calls.
As in, a single one for all your variations of standard, anisotropic, whatever... so there aren't a ton of concerns about empty dispatch indirect arguments.
Tried writing them once unsorted then sorting but, like I imagined, it's just too expensive.
I do believe you could make something like this work if you tiled the whole resolve and kept the working set in local memory. You'd need enough tiles/work in flight to feed a big GPU though.
To collect the lists, I first count how many pixels each draw and shader want, then allocate space in a big buffer for each shader, then allocate space for each draw within the chunk allocated for their shader.
Then repeat the same logic but writing the pixels to the buffer with the right offset.
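The count / allocate / write steps can be sketched on the CPU like this. It's a plain serial C++ version under stated assumptions, not the GPU implementation: `Pixel`, `PixelLists`, `build_lists` and the `shader_of_draw` lookup are all hypothetical names, and draws are laid out within their shader's chunk in draw-index order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Pixel {
    uint32_t id;   // packed pixel coordinate (hypothetical encoding)
    uint32_t draw; // draw index read from the visibility buffer
};

struct PixelLists {
    std::vector<uint32_t> shader_offset; // start of each shader's chunk, plus end
    std::vector<uint32_t> draw_offset;   // start of each draw's range
    std::vector<uint32_t> pixels;        // the big buffer of pixel ids
};

PixelLists build_lists(const std::vector<Pixel>& visible,
                       const std::vector<uint32_t>& shader_of_draw,
                       std::size_t shader_count) {
    const std::size_t draw_count = shader_of_draw.size();

    // Pass 1: count how many pixels each draw and each shader want.
    std::vector<uint32_t> per_draw(draw_count, 0);
    std::vector<uint32_t> per_shader(shader_count, 0);
    for (const Pixel& p : visible) {
        ++per_draw[p.draw];
        ++per_shader[shader_of_draw[p.draw]];
    }

    // Allocate one chunk of the buffer per shader.
    PixelLists out;
    out.shader_offset.assign(shader_count + 1, 0);
    for (std::size_t s = 0; s < shader_count; ++s) {
        out.shader_offset[s + 1] = out.shader_offset[s] + per_shader[s];
    }

    // Allocate each draw's range inside the chunk of its shader.
    std::vector<uint32_t> cursor(out.shader_offset.begin(),
                                 out.shader_offset.end() - 1);
    out.draw_offset.assign(draw_count, 0);
    for (std::size_t d = 0; d < draw_count; ++d) {
        const uint32_t s = shader_of_draw[d];
        out.draw_offset[d] = cursor[s];
        cursor[s] += per_draw[d];
    }

    // Pass 2: same walk over the pixels, now writing at the right offset.
    std::vector<uint32_t> write_pos = out.draw_offset;
    out.pixels.assign(visible.size(), 0);
    for (const Pixel& p : visible) {
        out.pixels[write_pos[p.draw]++] = p.id;
    }
    return out;
}
```

A per-shader dispatch would then cover the range between `shader_offset[s]` and `shader_offset[s + 1]`, with pixels of the same draw sitting next to each other inside it.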
Bunch of blurry colours together, a fun bug while looking at strategies for resolving visibility buffer.
Tried a few ways to resolve a visibility buffer as a nightly adventure a few days ago.
So far my favourite I could come up with is collecting lists of pixels per shader, internally sorted per draw, then a dispatch indirect per shader scalarizing on the draw index.
Plus it can have cute bugs!
🧵