
Jordan Costello

@jordancostello.bsky.social

154 Followers  |  120 Following  |  192 Posts  |  Joined: 06.11.2023

Latest posts by jordancostello.bsky.social on Bluesky

Square Enix says it wants generative AI to be doing 70% of its QA and debugging by the end of 2027 | VGC
The publisher is researching "Game QA Automation Technology" with the University of Tokyo…

Software companies already don't listen to QA; it's one of the reasons everything is getting worse

Using AI to do the majority of testing will just make ignoring bugs even easier

www.videogameschronicle.com/news/square-...

06.11.2025 16:08 — 👍 1    🔁 1    💬 0    📌 0

"oooo i'd draw that but i can't draw"

brother get this through your head

NOBODY can draw

we are literally

ALL

BULLSHITTING

04.11.2025 20:39 — 👍 7981    🔁 1884    💬 123    📌 52

What the heck is a trampoline, anyway?

The blog post is now live! Come one, come all - enjoy this deep dive that commemorates going down the compiler rabbit hole (twice! in the Paris airport!)

savannah.dev/posts/what-t...
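For the impatient: a trampoline, in the recursion/compiler sense, is a loop that keeps calling returned thunks until it gets back a real value, so deep recursion never grows the call stack. A minimal sketch (illustrative only, not code from the linked post):

```js
// Trampoline: repeatedly invoke returned thunks until a non-function
// value comes back, keeping the call stack at constant depth.
function trampoline(fn, ...args) {
  let result = fn(...args);
  while (typeof result === "function") {
    result = result(); // take one more bounce
  }
  return result;
}

// A countdown that would blow the stack if it recursed directly;
// instead, each step returns a thunk for the next step.
function countdown(n) {
  return n <= 0 ? "done" : () => countdown(n - 1);
}

console.log(trampoline(countdown, 1_000_000)); // "done", no stack overflow
```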

05.11.2025 05:40 — 👍 25    🔁 7    💬 2    📌 1

Going to be thinking about this thread for a while

30.10.2025 19:33 — 👍 0    🔁 0    💬 0    📌 0

I hope upscaling counts too

28.10.2025 22:36 — 👍 1    🔁 0    💬 0    📌 0

Best I can say about how to make your own tests at first: you just need to know and define what you want to go into your mystery box and what you want to come out of it. Inputs and expected outputs, all day every day
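As a sketch of that mindset in plain Node (the `add` function is a hypothetical stand-in; any assertion style works the same way):

```js
// The whole "mystery box" contract: known inputs, expected outputs.
const test = require("node:test");
const assert = require("node:assert");

// Hypothetical function under test.
function add(a, b) {
  return a + b;
}

test("add: inputs and expected outputs", () => {
  assert.strictEqual(add(2, 3), 5);  // what goes in -> what should come out
  assert.strictEqual(add(-1, 1), 0); // edge case: crossing zero
});
```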

28.10.2025 01:43 — 👍 1    🔁 0    💬 1    📌 0

You can get so much more done*
---

*things people will have to spend 10x as much time redoing as you spent making them

27.10.2025 17:46 — 👍 5    🔁 1    💬 0    📌 0
The Limits of LLM-Generated Unit Tests

Developers often ask LLMs like OpenAI Codex to write tests for their code - and they do. The tests compile, run, and even pass, giving a sense of confidence. But how much can we really trust those tests? If an LLM only sees the source code and a few comments, does it truly understand what the software is supposed to do?

#hackernews #llm #openai

25.10.2025 18:43 — 👍 1    🔁 1    💬 0    📌 0
# Kernel Reflection

## Problem

When unit tests fail, we add diagnostics. When those diagnostics aren't enough, we add more. Tests become cluttered with ad-hoc inspection code. Each kernel reinvents its own debugging helpers.

We need a systematic way to inspect kernel state without polluting tests with diagnostic code.

## Solution

Every kernel has built-in reflection: a method that captures its complete computational state. This state is both machine-readable (for assertions) and human-readable (for test failures).

The reflection is opt-in. Normal GPU execution is unaffected. Only when explicitly invoked does the kernel synchronize, read back GPU data, compute statistics, and format results.

## Design

Kernels expose three methods:

**`run()`** executes the kernel (GPU computation). Returns a metadata object with execution details:
```js
{
  renderCount: 42  // Number of times kernel has been executed (tracked internally)
}
```

**`valueOf(options)`** returns a plain object containing all relevant state: textures, buffers, parameters, validation flags. This object is JSON-serializable for structural comparisons. 

Options:
- `pixels` (boolean|undefined): Whether to capture pixel data. `true` = always capture, `false` = never capture, `undefined` = auto-decide based on size (default).

**`toString()`** returns a compact string summarizing the kernel's state in human-readable form. Uses dense notation with Unicode block characters for visual profiles. Useful for console inspection and test failure messages.

The snapshot object returned by `valueOf()` also has its own `toString()` method that produces the same compact output. 
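To make the flow concrete, here is a sketch of how the three methods could combine in a test; `makeBlurKernel` and the snapshot's `params` shape are illustrative stand-ins, not part of the documented API:

```js
const assert = require("node:assert");

// Hypothetical kernel; construction isn't covered in this excerpt.
const kernel = makeBlurKernel({ radius: 4 });

const meta = kernel.run(); // GPU execution
assert.strictEqual(meta.renderCount, 1);

// Machine-readable snapshot; skip pixel readback to keep it cheap.
const snapshot = kernel.valueOf({ pixels: false });
assert.deepStrictEqual(snapshot.params, { radius: 4 });

// Human-readable form, already frozen at snapshot time.
console.log(`${snapshot}`); // same compact output as kernel.toString()
```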

**Critical implementation notes:**
- `toString()` generates its output string **immediately** when `valueOf()` is called, not lazily when `toString()` is invoked. This ensures the string reflects the exact state at snapshot time, even if the kernel state changes later.
- All values in the snapshot are **deep-copied** to prevent later mutation…


Absolutely wild stuff!

Debugging and unit-testing WebGL2 compute shaders.

The depth of abstraction these things reach is PROPER.

github.com/oyin-bo/thre...

26.10.2025 00:22 — 👍 1    🔁 1    💬 0    📌 0

πŸ‘€β—οΈ

26.10.2025 03:15 — 👍 1    🔁 0    💬 0    📌 0

Bliss; Conned-Descension

On everything
I know nothing
In reality
I know nothing
Of the cosmos
I know nothing
Of the world
I know nothing
Of my land
I know nothing
Of all people
I know nothing
Of my people
I know nothing
Of myself
I know nothing
Of my mind
I know nothing
And in my mind
I know it all

25.10.2025 17:26 — 👍 2    🔁 0    💬 0    📌 0

Is an oracle obliged to fix the future if they lack the power to influence the gods?

The gods would say: yes

25.10.2025 03:23 — 👍 3    🔁 1    💬 1    📌 0

I think a hybrid drive is an HDD for total storage and an SSD for storing copies of critical/frequently used stuff

24.10.2025 05:54 — 👍 0    🔁 0    💬 0    📌 0

In these cases I wonder if it evens out: bugs caught by good tests that wouldn't otherwise have been written vs. bugs missed by poorly reviewing wrong tests.

Or if any teams will ignore the test results and feel accomplished with tests existing (80% is still a good grade, right folks?)

24.10.2025 03:17 — 👍 2    🔁 0    💬 1    📌 0

I'm wondering to what extent (AI) psychosis is a statistical linguistic outcome of a person (or LLM) talking to themselves for too long without accepting a context of external, challenging feedback

22.10.2025 15:02 — 👍 0    🔁 0    💬 0    📌 0

Top cause of adult hearing loss is having headphones on when a podcast's Shopify ad ends.

21.10.2025 23:51 — 👍 27    🔁 1    💬 0    📌 0

Been seeing Sora videos on socials with the watermarks removed. The automated removal tools aren't perfect and leave artifacts, so it's worth remembering the Sora watermark pattern (for portrait videos): top-left, middle-right, bottom-left. Watch for alternating artifacts in those regions.

22.10.2025 10:33 — 👍 146    🔁 37    💬 6    📌 3

I wonder if the current AI label on LLMs and diffusion models will stick this time by being mainstream enough, or if the label will shift to yet another technology 10 years from now. Guessing the latter.

22.10.2025 03:01 — 👍 0    🔁 0    💬 0    📌 0

AI is a label that keeps getting shifted to the most recent technology. I figure marketing is doing what it can to help people chase the elusive fantasy of whatever "artificial intelligence" means to people. Something deeply psychological that I'm not putting together right now.

22.10.2025 03:01 — 👍 1    🔁 0    💬 1    📌 0

I like when different things have their own names so you can, you know, communicate about them and comprehend their differences. And it's wild: that deeply matters with computers that process info distinctly and literally.

22.10.2025 03:01 — 👍 0    🔁 0    💬 1    📌 0

Playing against a CPU? AI. An NPC looks at player? AI. Path finding? Any algorithm? An if statement? Computer vision? Deep learning? LLMs? Diffusion models? All AI. Every 10 years the goalposts move. In a way, that disturbs me.

22.10.2025 03:01 — 👍 3    🔁 0    💬 1    📌 0

Kind of sounds like emergent gameplay systems in video games applied more generally. Bunch of simulations or subsystems let loose to have their way with each other.

Sounds excellent

21.10.2025 23:43 — 👍 1    🔁 0    💬 0    📌 0
GitHub - EnricoMi/publish-unit-test-result-action: GitHub Action to publish unit test results on GitHub

This repo looks like it's in the ballpark, but you've probably seen it and have more specific needs, and I don't have the experience to vouch for it personally.

If you do find something that fits your case I'd be curious to see it too.

21.10.2025 21:13 — 👍 1    🔁 0    💬 1    📌 0

Would be great to get those compressed and save new results as artifacts in a CI pipeline. But I assume the non-zero cost of that is too expensive. Cheaper to waste time and bandwidth.
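The compression half is cheap to sketch in Node (file names hypothetical; the artifact-upload step is CI-specific and omitted):

```js
const fs = require("node:fs");
const zlib = require("node:zlib");
const { pipeline } = require("node:stream");

// Gzip a test report before a CI step uploads it as an artifact.
pipeline(
  fs.createReadStream("test-results.xml"),
  zlib.createGzip(),
  fs.createWriteStream("test-results.xml.gz"),
  (err) => {
    if (err) throw err;
  }
);
```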

My condolences

21.10.2025 20:24 — 👍 0    🔁 0    💬 1    📌 0

Like 14 GB of test result reports? Or like product system reports? Either way: 👀😱

21.10.2025 20:04 — 👍 0    🔁 0    💬 1    📌 0
Exploring Dead Games - YouTube

Reminds me of some video game servers that are silently ticking along for clients that no one remembers.

21.10.2025 07:51 — 👍 1    🔁 0    💬 0    📌 0

I knew you'd be prepared with a plan

21.10.2025 01:39 — 👍 1    🔁 0    💬 1    📌 0

Hypothetically speaking: if everyone does have brain damage, do we still shoot for democracy or do we go with something else?

21.10.2025 01:24 — 👍 1    🔁 0    💬 1    📌 0

Bonus points if the solution is easy to read when you need to remember, and easy to update if the process needs to change

15.10.2025 19:00 — 👍 0    🔁 0    💬 0    📌 0

In a home without doors, the needy cat is king πŸ’€β˜•οΈ

07.05.2025 11:59 — 👍 1    🔁 0    💬 0    📌 0
