
Will Killian

@willkill07.bsky.social

HPC / Runtimes / Programming Models / C++
I like making things go fast. Senior SW Engineer @ NVIDIA trying to make agentic systems go fast. ex-Voltron Data. ex-NextSilicon. The views and opinions expressed in this account are my own.

172 Followers  |  69 Following  |  64 Posts  |  Joined: 23.11.2023

Latest posts by willkill07.bsky.social on Bluesky

I think we both (sadly) already know the answer.

14.01.2026 22:05 | 👍 1 🔁 0 💬 0 📌 0
Spotify won’t say it’s done with ICE (Paste Magazine)

Ehhh. They aren’t resisting at all.

“Spotify’s stance has not meaningfully shifted in the slightest. The ads are gone only because the money is, and the company remains careful not to say what it will not do.”

www.pastemagazine.com/music/spotif...

11.01.2026 18:28 | 👍 0 🔁 0 💬 1 📌 0

Bonus: also the world’s first standard protocol across all languages

06.01.2026 03:28 | 👍 1 🔁 0 💬 0 📌 0

This is a wonderful community, so if you (1) are a part of it or (2) generally have interest in supporting an HPC-related community, please consider donating!

22.12.2025 12:48 | 👍 2 🔁 2 💬 0 📌 0

I had the pleasure of being mentored by Ian while in graduate school. Queen’s is incredibly lucky to have him!

18.12.2025 22:37 | 👍 1 🔁 0 💬 0 📌 0

Day 12 of #AdventOfCode was done twice (cudf and OpenACC)!

github.com/willkill07/A...

github.com/willkill07/A...

This was a very fun year with the shorter duration and increased challenge of a new GPU programming model per day.

13.12.2025 04:41 | 👍 2 🔁 0 💬 0 📌 0

Day 11 of #AdventOfCode was done in C++ with std::execution!

github.com/willkill07/A...

Neither the cleanest nor the shortest, but I had to get this programming model in for the event :(

11.12.2025 20:53 | 👍 3 🔁 0 💬 0 📌 1

Day 10 of #AdventOfCode was done in CUDA C++!

github.com/willkill07/A...

Parallel BFS for Part 1.
Wrote my own solver for Part 2.

Runs in less than 1.3 ms 🚀

11.12.2025 02:53 | 👍 3 🔁 0 💬 0 📌 1
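The Day 10 post does not say how the Part 1 BFS is parallelized. One common GPU-friendly form is level-synchronous (frontier-based) BFS, where every node in the current frontier is expanded in a single step. A minimal sequential Python sketch of that general technique, assuming a graph stored as a plain adjacency dict; `level_sync_bfs` is a hypothetical name and this is not the code in the linked repository:

```python
def level_sync_bfs(adjacency, source):
    """Level-synchronous BFS (illustrative sketch).

    adjacency: dict mapping node -> iterable of neighbor nodes (assumed format).
    Returns a dict mapping each reachable node to its BFS depth.
    """
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # A GPU version would typically assign one thread per frontier node here.
        for node in frontier:
            for nbr in adjacency.get(node, ()):
                # On a GPU this check-and-set would need to be atomic.
                if nbr not in dist:
                    dist[nbr] = level
                    next_frontier.append(nbr)
        frontier = next_frontier
    return dist
```

Each iteration of the `while` loop processes exactly one depth level, which is what makes the inner loops safe to run in parallel.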

Day 9 of #AdventOfCode was done in C++ with OpenMP Offload!

github.com/willkill07/A...

Part 1 and Part 2 are similar structurally, with Part 2 effectively checking for bounding box intersection between the candidate and each edge.

Runs in less than 130 microseconds.

09.12.2025 15:05 | 👍 2 🔁 0 💬 0 📌 1
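The Part 2 check from the Day 9 post (bounding-box intersection between the candidate and each edge) can be illustrated with axis-aligned boxes. A minimal Python sketch, assuming points are (x, y) tuples, a candidate rectangle is given by two opposite corners, and an edge disqualifies the candidate only if it reaches into the rectangle's open interior (touching the border is allowed). These assumptions and the helper names are hypothetical, not taken from the repository:

```python
def bbox(p, q):
    """Axis-aligned bounding box of two points as (xmin, ymin, xmax, ymax)."""
    (x1, y1), (x2, y2) = p, q
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def crosses_interior(rect, edge):
    """True if the edge's bounding box overlaps the open interior of rect.

    Strict inequalities mean an edge lying on the rectangle's border does
    not count as an intersection (an assumption for this sketch).
    """
    rx0, ry0, rx1, ry1 = bbox(*rect)
    ex0, ey0, ex1, ey1 = bbox(*edge)
    return ex0 < rx1 and ex1 > rx0 and ey0 < ry1 and ey1 > ry0


def candidate_ok(rect, edges):
    """Keep the candidate only if no edge cuts into it."""
    return not any(crosses_interior(rect, e) for e in edges)
```

For an axis-aligned edge, its bounding box is the edge itself, so the box test is an exact intersection test in that case.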

Day 8 of #AdventOfCode was done in CUDA with CCCL!

github.com/willkill07/A...

Uses CCCL to precompute and sort all pairs of nodes by distance and then uses a disjoint set on the GPU (single-threaded) to compute the results for part 1 and part 2 separately.

09.12.2025 02:08 | 👍 2 🔁 0 💬 0 📌 1
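Sorting all pairs by distance and then merging them with a disjoint set, as the Day 8 post describes, is the same pattern as Kruskal's algorithm. A minimal CPU-side Python sketch, assuming nodes are coordinate tuples and distance is Euclidean (compared squared to avoid square roots). `sorted_pairs` and `last_merge` are hypothetical helpers, and the puzzle's per-part answers are not reproduced:

```python
from itertools import combinations


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.components = n

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.components -= 1
        return True


def sorted_pairs(points):
    """All index pairs ordered by squared Euclidean distance."""
    def d2(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return sorted(combinations(range(len(points)), 2), key=lambda ij: d2(*ij))


def last_merge(points):
    """Merge pairs in distance order until one component remains.

    Returns the pair that completed the merge, or None if there is none.
    """
    ds = DisjointSet(len(points))
    for i, j in sorted_pairs(points):
        if ds.union(i, j) and ds.components == 1:
            return i, j
    return None
```

In the post the sort runs on the GPU via CCCL and only the disjoint-set pass is single-threaded; here both run in plain Python.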

Day 7 of #AdventOfCode was done in ISO C++23 with C++ Standard Parallelism!

github.com/willkill07/A...

07.12.2025 21:30 | 👍 3 🔁 0 💬 0 📌 1

Day 6 of #AdventOfCode was done in numba-CUDA! A single kernel was used for both parts.

github.com/willkill07/A...

06.12.2025 21:47 | 👍 3 🔁 0 💬 0 📌 1

Day 5 of #AdventOfCode was done in NVIDIA’s Python CCCL library!

github.com/willkill07/A...

A tiny bit of cupy was used, but almost all of the algorithms were dispatched with cccl

05.12.2025 20:01 | 👍 3 🔁 0 💬 0 📌 1

Day 4 of #AdventOfCode was so much fun that I did it twice! OpenACC and cudf

OpenACC: github.com/willkill07/A...

cudf: github.com/willkill07/A...

Each is relatively idiomatic. I almost wrote a custom cupy stencil kernel for cudf, but I felt like that was cheating.

05.12.2025 04:24 | 👍 3 🔁 0 💬 0 📌 1

Day 3 of #AdventOfCode done using Warp!

github.com/willkill07/A...

The kernel itself yields an array of integer values in the correct order. numpy then reconstructs the number.

The generated GPU kernel runs on a single block, where each thread gets its own line of input. Runs in under 100 µs on device.

03.12.2025 15:32 | 👍 2 🔁 0 💬 0 📌 1
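In the Day 3 post, the Warp kernel emits integer values in order and numpy reconstructs the number. Assuming each input line yields a row of base-10 digits, most significant first (the post only says "integer values in the correct order"), the reconstruction is Horner's rule. The post uses numpy for this step; this sketch uses plain Python to stay dependency-free, and `digits_to_int` is a hypothetical name:

```python
def digits_to_int(digits):
    """Fold base-10 digits (most significant first) into one integer.

    Equivalent to sum(d * 10**(n - 1 - i) for i, d in enumerate(digits)).
    """
    value = 0
    for d in digits:
        value = value * 10 + d
    return value


# One row per input line, mirroring one GPU thread per line (assumed layout).
rows = [[9, 8, 7], [0, 4, 2]]
numbers = [digits_to_int(row) for row in rows]
```

Here `numbers` ends up as `[987, 42]`.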

Day 2 of #AdventOfCode done using cupy!

github.com/willkill07/A...

I tried to avoid nasty repeated division and modulus on the GPU and aimed to do a lot of array-based programming.

With kernel fusion this would be much faster but it still works! 10 more days means 10 GPU programming models left!

02.12.2025 23:53 | 👍 2 🔁 0 💬 0 📌 1

Day 1 done using CuTe!

github.com/willkill07/A...

I realize this is kind of cheating in the sense that it only runs on one GPU thread, but I do take advantage of CuTe Tensors and compile-time JIT via Constexpr!

I do wish that there was an easy way to return a simple value from a CuTe kernel.

01.12.2025 16:59 | 👍 0 🔁 0 💬 0 📌 1
GitHub - willkill07/AdventOfCode2025: My solutions to https://adventofcode.com/2025/

My #AdventOfCode challenge to myself is to ensure all solutions run on my GPU in 12 different programming models officially supported by NVIDIA. I’ve only programmed in half of them 😅

README is live:

github.com/willkill07/A...

30.11.2025 23:37 | 👍 3 🔁 0 💬 1 📌 1
The Advent of Code Day 1 countdown showing 23:46:29 remaining until the first puzzle unlocks.

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

30.11.2025 05:15 | 👍 149 🔁 14 💬 4 📌 9

Definitely one of those

“Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should.”

situations

26.11.2025 02:07 | 👍 1 🔁 0 💬 0 📌 0

Day 2 of #sc25 evening events (official and unofficial)

HPC Ignites Plenary in America’s Ballroom 5:30PM-6:45PM

Ribbon Cutting 6:45PM (Hall 4 corner)

Opening Gala in Exhibit Hall 7:00PM-9:00PM

Beowulf Bash afterwards at City Museum (conference badge required, ~7 blocks west of convention center)

17.11.2025 13:28 | 👍 2 🔁 0 💬 0 📌 0

Day 2 of #SC25

Workshops and Tutorials continue to be in full swing today!

Student Programming Events (Room 263)
- Resume Workshop 9:00-9:45AM
- Portfolio Workshop 10:00-10:45AM
- Navigating Education Systems Internationally Panel 11AM-12PM
- 1:1 Career Coaching (preregistration required) 1PM-4PM

17.11.2025 13:22 | 👍 1 🔁 0 💬 1 📌 0

The longer you are in the HPC space, the smaller it becomes.

17.11.2025 12:01 | 👍 3 🔁 0 💬 1 📌 0

#HPC #SC25. Current registration is 15,446. Peak SCinet bandwidth is 14.72 Tbps. 560 exhibitors.

16.11.2025 17:48 | 👍 10 🔁 3 💬 3 📌 0

I’m unsure due to my obligations with the Student Programming events

16.11.2025 15:12 | 👍 1 🔁 0 💬 0 📌 0

Day 1 of #sc25

Tutorials and Workshops kick off this morning!

Some Student Programming events are also running today:
- HPC/AI Crash Course - 8:30AM until 3:30PM in Room 263. Please note this requires preregistration.
- Career Panel - 3:45PM until 4:45PM in Room 263

16.11.2025 13:57 | 👍 1 🔁 1 💬 1 📌 0

Wonderful view of the St. Louis skyline coming in on my flight. Eagerly anticipating this year’s #sc25 #hpcignites

15.11.2025 18:09 | 👍 6 🔁 0 💬 0 📌 0

Frontiere ran on Frontier. This alone is worth recognition.

The TLDR is that a really big HACC job ran on Frontier with a bunch of new algorithmic optimizations. Neato.

15.11.2025 14:54 | 👍 6 🔁 1 💬 0 📌 0

Safe (and uneventful) travels!

15.11.2025 11:16 | 👍 1 🔁 0 💬 0 📌 0

Pinned to my lanyard as well :)

15.11.2025 01:11 | 👍 2 🔁 0 💬 0 📌 0
