I think we both (sadly) already know the answer.
14.01.2026 22:05
@willkill07.bsky.social
HPC / Runtimes / Programming Models / C++ I like making things go fast. Senior SW Engineer @ NVIDIA trying to make agentic systems go fast. ex-Voltron Data. ex-NextSilicon. The views and opinions expressed in this account are those of my own.
Ehhh. They aren't resisting at all.
"Spotify's stance has not meaningfully shifted in the slightest. The ads are gone only because the money is, and the company remains careful not to say what it will not do."
www.pastemagazine.com/music/spotif...
Bonus: also the world's first standard protocol across all languages
06.01.2026 03:28
This is a wonderful community, so if you (1) are a part of it or (2) generally have interest in supporting an HPC-related community, please consider donating!
22.12.2025 12:48
I had the pleasure of being mentored by Ian while in graduate school. Queen's is incredibly lucky to have him!
18.12.2025 22:37
Day 12 of #AdventOfCode was done twice (cudf and OpenACC)!
github.com/willkill07/A...
github.com/willkill07/A...
This was a very fun year with the shorter duration and increased challenge of a new GPU programming model per day.
Day 11 of #AdventOfCode was done in C++ with std::execution!
github.com/willkill07/A...
Neither the cleanest nor the shortest, but I had to get this programming model in for the event :(
Day 10 of #AdventOfCode was done in CUDA C++!
github.com/willkill07/A...
Parallel BFS for Part 1.
Wrote my own solver for Part 2.
Runs in less than 1.3ms
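For readers unfamiliar with the idea, a parallel BFS is usually organized level by level: every node in the current frontier is expanded at once, and the newly discovered nodes form the next frontier. The sketch below is plain sequential Python, not the CUDA code from the repository, and the `bfs_levels` name and the adjacency-dict input are assumptions made only for illustration.

```python
def bfs_levels(adjacency, source):
    """Level-synchronous BFS: return {node: distance from source}.

    `adjacency` is a hypothetical dict mapping each node to its neighbours.
    """
    distance = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # On a GPU, this loop over the frontier is the part that runs in
        # parallel (one thread per frontier node), with an atomic or a
        # visited bitmap guarding the "first discovery" check below.
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour not in distance:
                    distance[neighbour] = level
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return distance


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
print(bfs_levels(graph, 0))
```

The distance assigned to a node does not depend on which frontier node discovers it first, which is why the per-level loop can be parallelized safely.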
Day 9 of #AdventOfCode was done in C++ with OpenMP Offload!
github.com/willkill07/A...
Part 1 and Part 2 are similar structurally, with Part 2 effectively checking for bounding box intersection between the candidate and each edge.
Runs in less than 130 microseconds.
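A rough sketch of the bounding-box check described above, in plain Python rather than OpenMP Offload. It assumes a candidate is an axis-aligned rectangle given by two opposite corners and that the edges come from a polygon listed as consecutive vertices; the function names and that input layout are assumptions for illustration, not taken from the repository.

```python
def bounding_box(p, q):
    """Axis-aligned bounding box (x0, y0, x1, y1) of two points."""
    (x1, y1), (x2, y2) = p, q
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def boxes_overlap_strictly(a, b):
    """True when box b reaches into the interior of box a.

    Strict inequalities mean that merely touching the border does not count.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return bx0 < ax1 and bx1 > ax0 and by0 < ay1 and by1 > ay0


def candidate_is_clear(corner_a, corner_b, polygon):
    """Check the candidate rectangle against the bounding box of every edge."""
    rect = bounding_box(corner_a, corner_b)
    n = len(polygon)
    for i in range(n):
        edge = bounding_box(polygon[i], polygon[(i + 1) % n])
        if boxes_overlap_strictly(rect, edge):
            return False
    return True


l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(candidate_is_clear((0, 0), (2, 4), l_shape))  # fits inside the L
print(candidate_is_clear((0, 0), (4, 4), l_shape))  # crossed by an inner edge
```

Each edge test is independent of the others, so the loop over edges (and the outer loop over candidates) maps naturally onto a parallel reduction.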
Day 8 of #AdventOfCode was done in CUDA with CCCL!
github.com/willkill07/A...
Uses CCCL to precompute and sort all pairs of nodes by distance and then uses a disjoint set on the GPU (single threaded) to compute the results for part 1 and part 2 separately.
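The overall shape of that approach (sort every pair by distance, then merge with a disjoint set) can be sketched in a few lines of sequential Python. This is not the CCCL code from the repository: `connect_closest`, its `limit` parameter, and the returned values are hypothetical names chosen for illustration, and Euclidean distance is assumed.

```python
from itertools import combinations
from math import dist


def find(parent, x):
    """Find the root of x, halving the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, size, a, b):
    """Merge the sets of a and b; return False if already joined."""
    ra, rb = find(parent, a), find(parent, b)
    if ra == rb:
        return False
    if size[ra] < size[rb]:
        ra, rb = rb, ra
    parent[rb] = ra
    size[ra] += size[rb]
    return True


def connect_closest(points, limit=None):
    """Process pairs from closest to farthest.

    Stops after `limit` pairs (if given) or once everything is connected.
    Returns (component sizes, largest first; last pair that merged two sets).
    """
    n = len(points)
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: dist(points[p[0]], points[p[1]]))
    parent, size = list(range(n)), [1] * n
    components, last = n, None
    for k, (a, b) in enumerate(pairs):
        if limit is not None and k >= limit:
            break
        if union(parent, size, a, b):
            components -= 1
            last = (a, b)
            if components == 1:
                break
    sizes = sorted((size[i] for i in range(n) if parent[i] == i), reverse=True)
    return sizes, last


pts = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0), (100, 0, 0)]
print(connect_closest(pts, limit=2))
print(connect_closest(pts))
```

The sort is the part that parallelizes well; the disjoint-set pass is inherently sequential, which matches the single-threaded GPU step mentioned in the post.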
Day 7 of #AdventOfCode was done in ISO C++23 with C++ Standard Parallelism!
github.com/willkill07/A...
Day 6 of #AdventOfCode was done in numba-CUDA! A single kernel was used for both parts.
github.com/willkill07/A...
Day 5 of #AdventOfCode was done in NVIDIA's Python CCCL library!
github.com/willkill07/A...
A tiny bit of cupy was used, but almost all of the algorithms were dispatched with cccl
Day 4 of #AdventOfCode was so much fun that I did it twice! OpenACC and cudf
OpenACC: github.com/willkill07/A...
cudf: github.com/willkill07/A...
Each is relatively idiomatic. I almost wrote a custom cupy stencil kernel for cudf, but I felt like that was cheating.
Day 3 of #AdventOfCode done using Warp!
github.com/willkill07/A...
The kernel itself yields an array of integer values in the correct order. numpy then reconstructs the number.
The GPU kernel generated runs on a single block where each thread gets its own line of input. Runs in <100us on device.
Day 2 of #AdventOfCode done using cupy!
github.com/willkill07/A...
I tried to avoid nasty repeated division and modulus on the GPU and aimed to do a lot of array-based programming.
With kernel fusion this would be much faster but it still works! 10 more days means 10 GPU programming models left!
Day 1 done using CuTe!
github.com/willkill07/A...
I realize this is kind of cheating in the sense that it only runs on one GPU thread, but I do take advantage of CuTe Tensors and compile-time JIT via Constexpr!
I do wish that there was an easy way to return a simple value from a CuTe kernel.
My #AdventOfCode challenge to myself is to ensure all solutions run on my GPU in 12 different programming models officially supported by NVIDIA. I've only programmed in half of them
README is live:
github.com/willkill07/A...
The Advent of Code Day 1 countdown showing 23:46:29 remaining until the first puzzle unlocks.
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
30.11.2025 05:15
Definitely one of those
"Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should."
situations
Day 2 of #sc25 evening events (official and unofficial)
HPC Ignites Plenary in America's Ballroom 5:30PM-6:45PM
Ribbon Cutting 6:45PM (Hall 4 corner)
Opening Gala in Exhibit Hall 7:00PM-9:00PM
Beowulf Bash afterwards at City Museum (conference badge required, ~7 blocks west of convention center)
Day 2 of #SC25
Workshops and Tutorials continue to be in full swing today!
Student Programming Events (Room 263)
- Resume Workshop 9:00-9:45AM
- Portfolio Workshop 10:00-10:45AM
- Navigating Education Systems Internationally Panel 11AM-12PM
- 1:1 Career Coaching (preregistration required) 1PM-4PM
The longer you are in the HPC space, the smaller it becomes.
17.11.2025 12:01
#HPC #SC24. Current registration is 15,446. Peak Scinet bandwidth is 14.72 TBPS. 560 exhibitors.
16.11.2025 17:48
I'm unsure due to my obligations with the Student Programming events
16.11.2025 15:12
Day 1 of #sc25
Tutorials and Workshops kick off this morning!
Some Student Programming events are also running today:
- HPC/AI Crash Course - 8:30AM until 3:30PM in Room 263. Please note this requires preregistration
- Career Panel - 3:45PM until 4:45PM in Room 263
Wonderful view of the St Louis skyline coming in on my flight. Eagerly anticipating this year's #sc25 #hpcignites
15.11.2025 18:09
Frontiere ran on Frontier. This alone is worth recognition.
The TLDR is that a really big HACC job ran on Frontier with a bunch of new algorithmic optimizations. Neato.
Safe (and uneventful) travels!
15.11.2025 11:16
Pinned to my lanyard as well :)
15.11.2025 01:11