
Paul Bass

@crispypotatobarrel.bsky.social

SWE. I like writing fast, correct programs. Currently interested in cuda, file systems, formal methods. https://github.com/BassP97

44 Followers  |  462 Following  |  15 Posts  |  Joined: 24.08.2023

Latest posts by crispypotatobarrel.bsky.social on Bluesky


Describing a JVM deployable as “a secure file called the jar” is amazing, I’m gonna start doing that at work now (from @matt-levine.bsky.social’s newsletter today)

21.01.2025 19:17 — 👍 2    🔁 0    💬 0    📌 0

Are they stuck into shared memory (sorry for the nvidia-specific concept) when possible, or is it possible that they might reside in main memory? I assume it's implementation specific... but in most cases can I assume "accessing constants is speedy", or is that not a safe assumption?

03.01.2025 05:07 — 👍 0    🔁 0    💬 1    📌 0

👋

Loved the blog post, it answered a few questions of mine :)

Something I'm still a little confused about: where do constants (const-expressions evaluated at shader creation time) "live" on the GPU, memory-wise?

03.01.2025 05:07 — 👍 1    🔁 0    💬 1    📌 0
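For what it's worth, in WGSL terms the constants in question are module-scope `const` declarations (evaluated when the shader module is created), as opposed to `override` constants (pinned at pipeline creation). A hedged sketch — the names below are illustrative, and where the values physically live is implementation-specific, though compilers typically fold `const` values straight into the instruction stream as immediates, meaning no memory traffic at all:

```wgsl
// Module-scope const: a const-expression, evaluated at shader-module
// creation time and usually emitted as an immediate in the compiled code.
const GRAVITY: f32 = 6.674e-11;

// Override constant: left symbolic until createComputePipeline(), where
// it can be replaced via the pipeline's `constants` option.
override WORKGROUP_SIZE: u32 = 64;

@compute @workgroup_size(WORKGROUP_SIZE)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    // Using GRAVITY here costs no buffer fetch on typical hardware,
    // so "accessing constants is speedy" is usually a safe assumption.
}
```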

Next up:
- Speed things up! The simulation slows down (runs at <60 fps) at around ~10k particles on my GTX 1650 mobile, and ~50k particles on my M1 Max. I'm guessing using Barnes-Hut (rather than a naive n^2 approach) will help, fingers crossed.
- Add a third dimension to the simulation!

31.12.2024 22:45 — 👍 0    🔁 0    💬 0    📌 0
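The Barnes-Hut speedup mentioned above rests on one approximation: a far-away cluster of bodies can be replaced by a single pseudo-body at its center of mass whenever (cluster extent) / (distance) falls below an opening angle theta. A minimal CPU sketch of just that core idea — the names are illustrative, not taken from the project:

```typescript
type Body = { x: number; y: number; m: number };

// Aggregate a cluster into one pseudo-body at its center of mass.
function centerOfMass(bodies: Body[]): Body {
  const m = bodies.reduce((s, b) => s + b.m, 0);
  return {
    x: bodies.reduce((s, b) => s + b.x * b.m, 0) / m,
    y: bodies.reduce((s, b) => s + b.y * b.m, 0) / m,
    m,
  };
}

// Gravitational force magnitude on `target` from a point mass (G = 1).
function force(target: Body, src: Body): number {
  const dx = src.x - target.x, dy = src.y - target.y;
  return (target.m * src.m) / (dx * dx + dy * dy);
}

// Barnes-Hut acceptance criterion: treat a cluster as a single body
// when the ratio of its extent `s` to its distance `d` is below theta.
function accept(s: number, d: number, theta = 0.5): boolean {
  return s / d < theta;
}
```

For a distant cluster, `force(target, centerOfMass(cluster))` differs from the exact pairwise sum by well under a percent, which is exactly why the tree walk can skip O(n) work per body.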
A picture of an n-body problem simulation - a cloud of particles, some of which are clumping together. There are controls along the bottom that allow users to control the simulation's speed, the number of particles/planets, and the strength of gravity

Done with my MVP of my webgpu accelerated n-body problem simulation (bassp97.github.io/N-Body-Probl...) :)

It not only renders with webgpu; it also performs velocity/force/movement calculations on the gpu using compute shaders

Code: github.com/BassP97/N-Bo...

31.12.2024 22:45 — 👍 4    🔁 0    💬 1    📌 0
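The compute-shader pass described above (velocity/force/movement on the GPU) amounts to the classic naive O(n²) update. A CPU reference of one step, hedged — this is a sketch of the general technique, not the project's actual shader code, and the softening term is an assumption:

```typescript
type P = { x: number; y: number; vx: number; vy: number; m: number };

// One naive n-body step (G = 1): accumulate pairwise accelerations,
// then integrate velocity and position (semi-implicit Euler).
// `soft` avoids the singularity when two bodies nearly overlap.
function step(ps: P[], dt: number, soft = 1e-3): void {
  const ax = new Array<number>(ps.length).fill(0);
  const ay = new Array<number>(ps.length).fill(0);
  for (let i = 0; i < ps.length; i++) {
    for (let j = 0; j < ps.length; j++) {
      if (i === j) continue;
      const dx = ps[j].x - ps[i].x, dy = ps[j].y - ps[i].y;
      const r2 = dx * dx + dy * dy + soft * soft;
      const invR3 = 1 / (Math.sqrt(r2) * r2);
      ax[i] += ps[j].m * dx * invR3; // a_i += G * m_j * r / |r|^3
      ay[i] += ps[j].m * dy * invR3;
    }
  }
  for (let i = 0; i < ps.length; i++) {
    ps[i].vx += ax[i] * dt; ps[i].vy += ay[i] * dt;
    ps[i].x += ps[i].vx * dt; ps[i].y += ps[i].vy * dt;
  }
}
```

On the GPU, each thread owns one `i` and runs the inner `j` loop, which is why this maps so cleanly onto a compute shader.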
Golden circles randomly scattered across a canvas in a web browser; some of the circles are bright gold, others are dim, and others are almost black.

Progress on my webgpu n-body problem simulation: I got a bunch of circles to render 😄. The brighter a given body/planet is, the more mass it has

31.12.2024 16:06 — 👍 1    🔁 0    💬 0    📌 0

I'm sure I'm missing something here - maybe the 256 thread/group limit is rooted in some intel/amd driver limitation? I wonder if I should have branching logic that changes the workgroup size based on the detected adapter

30.12.2024 22:20 — 👍 0    🔁 0    💬 0    📌 0

on memory stalls to achieve high throughput. If you're not running hundreds of threads per SM, you're probably leaving performance on the table because your SMs have nothing to do while waiting on memory reads!

30.12.2024 22:20 — 👍 0    🔁 0    💬 1    📌 0
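The "hundreds of threads per SM" point lends itself to quick occupancy arithmetic. A rough sketch — the per-SM limits below are illustrative (Turing-class parts like the GTX 1650 top out around 1024 resident threads, i.e. 32 warps, per SM; real values vary by chip), and `occupancy` is a made-up helper, not a real API:

```typescript
// Latency hiding in one number: occupancy = resident warps / max warps.
// An SM with spare resident warps can swap one in whenever another
// stalls on a memory read.
const WARP_SIZE = 32;
const MAX_WARPS_PER_SM = 32; // e.g. 1024 threads / 32 on Turing-class parts

function occupancy(threadsPerWorkgroup: number, groupsPerSM: number): number {
  const warpsPerGroup = Math.ceil(threadsPerWorkgroup / WARP_SIZE);
  const resident = Math.min(warpsPerGroup * groupsPerSM, MAX_WARPS_PER_SM);
  return resident / MAX_WARPS_PER_SM;
}
```

A single 64-thread workgroup per SM gives `occupancy(64, 1)` ≈ 6%, leaving almost nothing to hide memory stalls behind, while `occupancy(256, 4)` reaches 100% under these assumed limits.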

But most GPUs can run way more than 64 threads at once. On nvidia chips, each SM can execute up to 128 threads at once, and even cheap nvidia chips have a double digit number of SMs. Also, you want to be running way more than 32-64 threads per SM, because SMs context switch between threads...

30.12.2024 22:19 — 👍 0    🔁 0    💬 1    📌 0

The documentation about this is also pretty confusing! Eg (webgpufundamentals.org/webgpu/lesso...):

> the general advice for WebGPU is to choose a workgroup size of 64 unless you have some specific reason to choose another size. Apparently most GPUs can efficiently run 64 things in lockstep.

30.12.2024 21:50 — 👍 0    🔁 0    💬 1    📌 0

WebGPU compute is cool, but I'm confused about some of the limits; eg why can't I have >256 threads per workgroup? Running 256 threads/block doesn't achieve high occupancy on nvidia chips, leaving performance on the table. Maybe WG != block and the compiler is doing some clever stuff under the hood?

30.12.2024 21:31 — 👍 0    🔁 0    💬 1    📌 0
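One partial answer to the 256 question above: 256 is WebGPU's *default* value for the `maxComputeInvocationsPerWorkgroup` limit, not a hard cap — many adapters advertise a higher value in `adapter.limits`, which can be requested via the `requiredLimits` option to `requestDevice()` (assuming the adapter supports it). The GPU calls themselves need a browser, but the size-picking logic is a pure function; `pickWorkgroupSize` is a hypothetical helper, not part of the API:

```typescript
// Pick the largest power-of-two workgroup size that fits both the
// preferred size and the device limit. WebGPU guarantees at least 256
// invocations per workgroup; the real limit comes from
// device.limits.maxComputeInvocationsPerWorkgroup at runtime.
function pickWorkgroupSize(preferred: number, deviceLimit: number): number {
  let size = 1;
  while (size * 2 <= Math.min(preferred, deviceLimit)) size *= 2;
  return size;
}
```

Paired with an `override` workgroup-size constant in the shader, this is one way to get the branching-by-adapter logic without hard-coding 256.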

I’m implementing a Barnes-Hut N body problem simulation, and the outputs are mesmerizing

I particularly like this one; watching the orderly circle descend into chaos is so neat 😊

05.12.2024 19:51 — 👍 3    🔁 1    💬 0    📌 0

The real downside of Linux on the desktop is tripping every single website’s risk-o-meter and having to constantly 2FA

03.12.2024 01:29 — 👍 0    🔁 0    💬 0    📌 0

I loved reading this paper - the key insight (ordering matters at read time, not write time) feels so obvious in hindsight, but I’d have never thought of it in a million years; really clever stuff

11.11.2024 22:16 — 👍 2    🔁 1    💬 0    📌 0

The other site is pretty terrible, so here I am!

11.11.2024 03:38 — 👍 2    🔁 0    💬 1    📌 0
