Andrew Gross's Avatar

Andrew Gross

@gross.systems.bsky.social

Engineer at YipitData. NYC Area https://github.com/andrewgross/ https://gross.systems I was told I had to add AI Engineer to my profile for the bots to find me. Views my own, not my employer etc etc.

266 Followers  |  2,844 Following  |  215 Posts  |  Joined: 07.08.2023

Latest posts by gross.systems on Bluesky

Anyway, I'd need to reproduce it to understand more, but they seem to actually try to bridge the gap from math to a real implementation via their algorithm.

01.11.2025 14:58 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

As for temperature, their work seems to focus on the state of the token distribution at the last token. I think they aren't looking to reverse each step of generation; instead they take the state when the last token is generated and show you can get back to the prompt from there.

01.11.2025 14:55 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Worth browsing the paper. They start with the mathematical treatment, ignoring some practical realities, and show that the mappings are reversible. But they also build out an algorithm that can reproduce the input prompt from a set of hidden states for a model.
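A toy version of the "mathematically reversible" idea (not the paper's actual algorithm, just an illustrative sketch): a single linear layer with a generic square weight matrix can be inverted to recover its input exactly, up to floating-point error.

```python
import numpy as np

# A random square matrix is almost surely invertible, so the "forward pass"
# h = W @ x loses no information and x can be recovered from h.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))      # weight matrix of a single linear layer
x = rng.normal(size=4)           # the "input" we want to recover later
h = W @ x                        # forward pass: hidden state
x_rec = np.linalg.solve(W, h)    # invert the layer to recover the input
```

The hard part the paper tackles is doing something like this through many nonlinear layers and a real sampling process, not one linear map.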

01.11.2025 14:53 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
A small number of samples can poison LLMs of any size Anthropic research on data-poisoning attacks in large language models

I wonder how long until we see all these tools that are meant to stop overly-aggressive AI data crawlers start poisoning their data www.anthropic.com/research/sma....

26.10.2025 18:05 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Every new benchmark or tool I see screams that the real limiting factor for making effective systems with LLMs/ML is context + evals. Model "intelligence" is rarely the deciding factor now.

26.09.2025 13:54 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Preview
it/its + skull/skulls | Friendly Neighborhood Werewolf: Image [Header ID: Stars against a black background, in the colors of the xiqyne pride flag: dark magenta, magenta, sky blue, electric blue, light green, and pale yellow. End Header ID.][Icon ID: a digital drawing of a flower with many thin petals, with each petal a different color of the xiqyne pride flag: dark magenta, magenta, sky blue, electric blue, light green, and yellow. A wasp is sitting on the flower in the colors of the aroace flag: orange, yellow, white, light blue, navy blue. End icon ID.]<br />Pronouns are it/its/itself and skull/skulls/skullself. You can use either, or alternate :)

64.media.tumblr.com/0656369524c5...

25.09.2025 11:39 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Preview
OpenAI Platform Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.

Astounding to me that OpenAI has had their new billing dashboard for this long without a good way to tie an API key to usage. API keys get human-readable names, but billing refers to them as `key_XXXXXXXXX`, with no mapping between the two. You have to use the legacy dashboard platform.openai.com/account/usag...

22.09.2025 15:38 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

"May you create a successful open source project" - Ancient Developer Curse

14.09.2025 19:06 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Context is also used very poorly in a couple of ways. I feel like most tool calls etc. need to be done in a sub-agent, with a concise summary returned to the main context (somehow without dropping important details). The info density of English tokens is bad, and I can see something else being used.
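A minimal sketch of that sub-agent pattern (all names here are hypothetical, not any framework's API): the noisy tool output never enters the main context, only a capped summary does.

```python
def run_in_subagent(tool_call, summarize, max_chars=500):
    """Run a tool in an isolated context; return only a compact summary."""
    raw = tool_call()               # full output stays out of the main context
    summary = summarize(raw)        # sub-agent condenses the result
    return summary[:max_chars]      # hard cap on what re-enters the context
```

For example, `run_in_subagent(lambda: "x" * 10_000, lambda s: f"{len(s)} chars of output")` keeps a 10k-character result down to one short line. The open problem is the summarizer dropping the one detail that mattered.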

12.09.2025 11:48 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Context retrieval is one of the worst areas right now. Since context fills up so fast, it's hard to convey instructions effectively, since most things fall off after 10-20k tokens. Not great to need to re-import your claude md before every prompt.

12.09.2025 11:46 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

1mm context seems like a lot, and in many ways it is, but it fills up extremely fast between code, prompts, instructions (claude md etc.), tool descriptions, tool call results, etc. We really need way more, or a different system a la long-term/short-term memory. 100mm+ context to start.

12.09.2025 11:42 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

It's becoming pretty apparent from using agentic systems and tools that there are a few big blockers to making them more effective:

1. Context is way too small
2. Retrieval from that context sucks
3. Density of context is terrible

12.09.2025 11:41 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

I've tried using an LSP server with refactoring/renaming tools. It was great when Claude used it, but the hardest part was getting Claude to actually use the MCP reliably. Probably a prompting issue. Could be fun to block the ability to use the Write() tool call using Hooks and try to force it.
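One way that blocking idea could look, as a sketch of a Claude Code `settings.json` hooks entry (the matcher and message here are illustrative): a PreToolUse hook that exits with code 2 blocks the tool call and feeds stderr back to the model.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Use the LSP MCP refactoring tools instead of direct writes' >&2; exit 2"
          }
        ]
      }
    ]
  }
}
```

That turns "please prefer the MCP" from a prompt suggestion into a hard constraint, at the cost of blocking legitimate direct edits too.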

12.09.2025 00:16 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

It was a little annoying that Claude Code didn't have a way to limit the context, so that other models could be used without manually running compact. I ended up hacking on the JS blob after reviewing the unminified code to find what I needed. I did feel only having 128K context vs the 200k/1mm

11.09.2025 00:35 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Surprised I haven't seen more discussion of the MTP features in GLM-4.5. Once it's configured it really lets the model fly. Went from 70 tok/s to over 200 tok/s. Pretty incredible speedup, but no one seems to be running with it.

10.09.2025 23:55 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Some sort of digital sonder

09.09.2025 16:37 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Finally got SGLang working with FP8 on Blackwell. Enabling MTP took GLM 4.5 Air from 70 tok/s to around 200. Pretty great performance! Looks like vLLM does support MTP but hard-codes looking only one token ahead, which doesn't do much.

09.09.2025 15:15 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

It's surprising how bad the UX is in the GitHub UI for adding people to repos. Despite having people in our org whose names match the search prefix, GitHub loves to suggest random usernames we have never interacted with. Super easy to accidentally add the wrong people.

09.09.2025 14:46 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Today in SGLang configs documented nowhere, `USE_TRITON_W8A8_FP8_KERNEL`. If you have a non-enterprise blackwell GPU, you should set this when running FP8 models (like GLM-Air-FP8). It will allow the model to run and should let you use the tuned triton Blackwell RTX 6000 config.
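For anyone in the same spot, the launch would look roughly like this (model path and flags are illustrative, not a verified recipe; the env var is the documented-nowhere part):

```shell
# Hypothetical SGLang launch for an FP8 model on a consumer Blackwell GPU
# (e.g. RTX 6000 class). The env var routes MoE FP8 GEMMs through the tuned
# Triton kernels instead of kernels that need enterprise-class hardware.
USE_TRITON_W8A8_FP8_KERNEL=1 python -m sglang.launch_server \
  --model-path zai-org/GLM-4.5-Air-FP8 \
  --quantization fp8
```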

09.09.2025 01:40 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Preview
sgl-project/sglang SGLang is a fast serving framework for large language models and vision language models. - sgl-project/sglang

Man, Blackwell has been out for almost a year and it is still like pulling teeth to get things working on it. Today's adventure is getting SGLang to play nice with MoE FP8 kernels (hint: use Triton), and then getting SGLang to play nice with itself.

github.com/sgl-project/...

08.09.2025 20:42 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

These days it feels like buying a couple tickets to the lottery is less about escapism and more about playing to your outs.

07.09.2025 14:53 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

The state of the web is bad enough that I am pondering using a small LLM just to do a better job of filling out address / CC form fields.

07.09.2025 11:42 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Preview
GitHub - andrewgross/claude-code-unminified Contribute to andrewgross/claude-code-unminified development by creating an account on GitHub.

Ran claude code (clis.js) through Humanify to get a version that's a little more readable. github.com/andrewgross/...

Working on some tooling to make this easier, faster and a bit cleaner on the output.

04.09.2025 14:49 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Today I learned the hazard of having a dated version of libnccl-dev installed in a container where the CUDA Toolkit and drivers are newer. However, you can go too far the other way: installing the cuda13.0 NCCL build with CUDA 12.9 installed will not work.
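The rule of thumb, sketched as a tiny (hypothetical) helper: NCCL packages are built against a CUDA major version, and the majors have to match, in either direction.

```python
def nccl_compatible(cuda_version: str, nccl_cuda_version: str) -> bool:
    """Compare CUDA major versions, e.g. "12.9" vs the "13.0" an NCCL
    package was built for. Mismatched majors won't load."""
    return cuda_version.split(".")[0] == nccl_cuda_version.split(".")[0]
```

So a cuda12.x NCCL (even a dated one) at least loads against CUDA 12.9, while the cuda13.0 build does not.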

02.09.2025 01:34 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Looks like a lot of this was discovered months ago. Almost done translating the minified JS to something readable, curious how well it will track.

18.08.2025 16:39 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

This also has all the words in all the supported languages to trigger these conditions.

18.08.2025 15:19 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
claude_code_thinking.js GitHub Gist: instantly share code, notes, and snippets.

Turns out all those tips about setting Thinking in Claude Code using terms like ULTRATHINK or MEGATHINK aren't encoded into the model, but just set the thinking token budget: gist.github.com/andrewgross/...
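The mapping looks roughly like this (tier values as reported from the unminified code; treat them and the helper as illustrative, not the actual implementation):

```python
# Keyword -> thinking-token budget tiers reportedly found in Claude Code.
THINKING_BUDGETS = {
    "think": 4_000,        # basic tier
    "megathink": 10_000,   # middle tier ("think hard" etc. map here too)
    "ultrathink": 31_999,  # highest tier
}

def budget_for(prompt: str) -> int:
    """Largest budget whose trigger word appears in the prompt, else 0."""
    tiers = [b for kw, b in THINKING_BUDGETS.items() if kw in prompt.lower()]
    return max(tiers, default=0)
```

Note "ultrathink" contains "think" as a substring, so taking the max of all matching tiers is what keeps the strongest keyword winning.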

18.08.2025 15:18 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
GitHub - andrewgross/claude_configs Contribute to andrewgross/claude_configs development by creating an account on GitHub.

Toying around with tracking some global claude configs in git. Some commands, an agent or two, and a global claude md (python focused). github.com/andrewgross/...

14.08.2025 16:49 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Fun fact: if you run `pip install pyspark` in Databricks and restart the session, it will crash. Although you are running PySpark, it does not present as an installed Python package, and installing it will overwrite key libraries and break the session.
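A defensive way to see this before installing anything (hypothetical helper name): check where `import pyspark` would resolve from. In a Databricks notebook it's wired in specially, so what this reports may not match what `pip` thinks is installed.

```python
import importlib.util

def pyspark_origin():
    """Path that `import pyspark` would load from, or None if the module
    isn't on the import path at all."""
    spec = importlib.util.find_spec("pyspark")
    return spec.origin if spec else None
```

If this returns a path under your own site-packages after a `pip install`, you've shadowed the runtime's copy, which is exactly the session-breaking case.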

09.08.2025 17:02 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

The last digits of UUID4 and UUID5 are "random" enough that they distribute well.
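Easy to check empirically; this sketch buckets random UUID4s by their final hex character, which is what makes them usable as a cheap partitioning key:

```python
import uuid
from collections import Counter

# Bucket 10,000 random UUID4s by their last hex digit.
# With 16 buckets, each should land near the expected 10_000 / 16 = 625.
counts = Counter(str(uuid.uuid4())[-1] for _ in range(10_000))
```

(For UUID5 the same holds in practice because the trailing bits come from a hash of the name, though they're deterministic per input rather than random.)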

07.08.2025 02:33 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
