@gross.systems.bsky.social
Engineer at YipitData. NYC Area https://github.com/andrewgross/ https://gross.systems I was told I had to add AI Engineer to my profile for the bots to find me. Views my own, not my employer etc etc.
Anyways, would need to reproduce it to understand more about it, but they seem to actually try to bridge the gap from math to a real implementation via their algorithm.
01.11.2025 14:58

As for temperature, their work seems to focus on the state of the token distribution for the last token. I think they aren't looking to reverse each step of generation, but instead take the state when the last token is generated and show you can get back to the prompt from there.
01.11.2025 14:55

Worth browsing the paper. They start with the mathematical approach, ignoring some of the realities, and show that the mappings are reversible. But they also build out an algorithm for producing the input prompt from a set of hidden states for a model.
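To make the reversibility idea concrete, here's a toy sketch (entirely my own, not the paper's algorithm): if the map from prompt to final hidden state is injective, the prompt can be recovered from that state by search. The "model" below is a made-up injective state update, and the vocab size and prompt length are chosen to keep exhaustive search trivial.

```python
# Toy prompt recovery from a final hidden state, assuming injectivity.
from itertools import product

VOCAB = list(range(10))   # made-up vocabulary of 10 token ids
PROMPT_LEN = 3            # fixed prompt length so exhaustive search is cheap

def step(state: int, token: int) -> int:
    # Stand-in for a transformer's state update; injective for tokens < 31.
    return state * 31 + token + 7

def final_state(prompt) -> int:
    # Run the toy "model" over the prompt, returning the last hidden state.
    s = 1
    for t in prompt:
        s = step(s, t)
    return s

def invert(target_state: int):
    # Brute-force the unique prompt whose final state matches the observed one.
    for cand in product(VOCAB, repeat=PROMPT_LEN):
        if final_state(cand) == target_state:
            return cand
    return None

observed = final_state((4, 2, 7))   # the "leaked" hidden state
print(invert(observed))             # recovers (4, 2, 7)
```

A real model has float states and a huge vocabulary, which is why the paper needs an actual search algorithm rather than this exhaustive enumeration.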
01.11.2025 14:53

I wonder how long until we see all these tools that are meant to stop overly aggressive AI data crawlers start poisoning their data: www.anthropic.com/research/sma....
26.10.2025 18:05

Every new benchmark or tool I see screams that the real limiting factor for making effective systems with LLMs/ML is context + evals. Model "intelligence" is rarely the deciding factor now.
26.09.2025 13:54

Astounding to me that OpenAI has had their new billing dashboard for this long without a good way to tie an API key to usage. API keys get human names, but billing refers to them by `key_XXXXXXXXX`, with no mapping between them. You have to use the legacy dashboard instead: platform.openai.com/account/usag...
22.09.2025 15:38

"May you create a successful open source project" - Ancient Developer Curse
14.09.2025 19:06

Context is also used very poorly in a couple of ways. I feel like most tool calls need to be done in a sub-agent, with a concise summary returned to the main context (somehow without dropping important details). The info density of English tokens is bad, and I can see something else being used.
12.09.2025 11:48

Context retrieval is one of the worst areas right now. Since context fills up so fast, it's hard to convey instructions effectively; most things fall off after 10-20k tokens. Not great to need to re-import your CLAUDE.md before every prompt.
12.09.2025 11:46

1mm context seems like a lot, and is in many ways, but it fills up extremely fast between code, prompts, instructions (CLAUDE.md etc.), tool descriptions, tool call results, and so on. We really need way more, or a different system a la long-term/short-term memory. 100mm+ context to start.
12.09.2025 11:42

It's becoming pretty apparent from using agentic systems and tools that there are a few big blockers keeping them from being more effective:
1. Context is way too small
2. Retrieval from that context sucks
3. Density of context is terrible
I've tried using an LSP server with refactoring/renaming tools. It was great when Claude used it, but the hardest part was getting Claude to actually use the MCP reliably. Probably a prompting issue. Could be fun to block the ability to use the Write() tool call using Hooks and try to force it.
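For anyone curious, blocking writes via Hooks might look roughly like this in Claude Code's settings file. This is a sketch from my reading of the hooks docs, not a tested config; the matcher names and the "exit code 2 blocks the tool and feeds stderr back to Claude" convention may differ by version:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "echo 'Use the LSP MCP rename/refactor tools instead of direct file writes' >&2; exit 2"
          }
        ]
      }
    ]
  }
}
```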
12.09.2025 00:16

It was a little annoying that Claude Code didn't have a way to limit the context so that it was easy to use other models without manually running compact. I ended up hacking on the JS blob after reviewing the unminified code to find what I needed. I did feel only having 128K context vs the 200k/1mm
11.09.2025 00:35

Surprised I haven't seen more discussion of the MTP features in GLM-4.5. Once it's configured, it really lets the model fly: went from 70 tok/s to over 200 tok/s. Pretty incredible speedup, but no one seems to be running with it.
10.09.2025 23:55

Some sort of digital sonder
09.09.2025 16:37

Finally got SGLang working with FP8 on Blackwell. Enabling MTP took GLM-4.5 Air from 70 tok/s to around 200. Pretty great performance! Looks like vLLM does support MTP, but it hard-codes looking only one token ahead, which doesn't do much.
09.09.2025 15:15

It's surprising how bad the UX is in the GitHub UI for adding people to repos. Despite having people in our org with names matching the search prefix, GitHub loves to suggest random usernames we have never interacted with. Super easy to accidentally add the wrong person.
09.09.2025 14:46

Today in SGLang configs documented nowhere: `USE_TRITON_W8A8_FP8_KERNEL`. If you have a non-enterprise Blackwell GPU, you should set this when running FP8 models (like GLM-Air-FP8). It will allow the model to run and should let you use the tuned Triton Blackwell RTX 6000 config.
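Pulling the two SGLang posts above together, a launch invocation might look roughly like this. The `--speculative-*` flag names are SGLang's speculative-decoding options as I recall them and may differ by version; the model path, port, and step/draft-token values are placeholders, not the exact settings from the posts:

```shell
# Env var forces the Triton FP8 kernels (needed on RTX 6000 Blackwell);
# the speculative flags enable MTP-style multi-token prediction.
USE_TRITON_W8A8_FP8_KERNEL=1 python -m sglang.launch_server \
  --model-path zai-org/GLM-4.5-Air-FP8 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --port 30000
```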
09.09.2025 01:40

Man, Blackwell has been out for almost a year and it's still like pulling teeth to get things working on it. Today's adventure is getting SGLang to play nice with MoE FP8 kernels (hint: use Triton), and then getting SGLang to play nice with itself.
github.com/sgl-project/...
These days it feels like buying a couple of tickets to the lottery is less about escapism and more about playing to your outs.
07.09.2025 14:53

The state of the web is bad enough that I am pondering using a small LLM just to do a better job of filling out address/CC form fields.
07.09.2025 11:42

Ran Claude Code (cli.js) through Humanify to get a version that's a little more readable: github.com/andrewgross/...
Working on some tooling to make this easier, faster and a bit cleaner on the output.
Today I learned the hazard of having a dated version of libnccl-dev installed in a container where the CUDA Toolkit and drivers are a newer version. However, you can go too far: installing the cuda13.0 NCCL build with CUDA 12.9 installed will not work.
02.09.2025 01:34

Looks like a lot of this was discovered months ago. Almost done translating the minified JS into something readable; curious how well it will track.
18.08.2025 16:39

This also has all the words, in all the supported languages, that trigger these conditions.
18.08.2025 15:19

Turns out all those tips about setting thinking in Claude Code with terms like ULTRATHINK or MEGATHINK aren't encoded into the model; they just set the thinking token budget: gist.github.com/andrewgross/...
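The mechanism reportedly looks something like the sketch below. The trigger phrases and budget numbers here are my recollection of community decompilations of the bundled JS, not the linked gist's exact values, and they may well have changed between Claude Code versions:

```python
# Hypothetical reconstruction: keyword -> thinking-token-budget mapping.
# Phrases and numbers are assumptions, not confirmed values.
THINKING_BUDGETS = {
    "think": 4_000,
    "megathink": 10_000,
    "ultrathink": 31_999,
}

def thinking_budget(prompt: str) -> int:
    # Highest budget whose trigger phrase appears in the prompt; 0 otherwise.
    p = prompt.lower()
    return max((b for kw, b in THINKING_BUDGETS.items() if kw in p), default=0)

print(thinking_budget("ULTRATHINK about this refactor"))  # 31999
```

The point of the post stands either way: these magic words are plain string matches in the client, not something the model was trained on.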
18.08.2025 15:18

Toying around with tracking some global Claude configs in git. Some commands, an agent or two, and a global CLAUDE.md (Python-focused): github.com/andrewgross/...
14.08.2025 16:49

Fun fact: if you run `pip install pyspark` in Databricks and restart the session, it will crash. Although you are running PySpark, it does not present as an installed Python package, and when you install it, it will overwrite key libraries and break the session.
09.08.2025 17:02

The last digits of UUID4 and UUID5 are "random" enough that they distribute well.
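A quick empirical check of the claim (my own sketch): bucket a pile of UUID4s by their last hex digit and confirm the 16 buckets come out roughly uniform. (For UUID4 the tail bits are random; for UUID5 they're SHA-1 output, so the same trick applies.)

```python
# Bucket 50,000 UUID4s by last hex digit; expect ~N/16 per bucket.
import uuid
from collections import Counter

N = 50_000
counts = Counter(str(uuid.uuid4())[-1] for _ in range(N))

expected = N / 16
assert len(counts) == 16
# Each bucket within 10% of expected (~5-6 sigma, so effectively always true).
assert all(abs(c - expected) / expected < 0.10 for c in counts.values())
print(sorted(counts.values()))
```

This is why the UUID tail works fine as a cheap shard/partition key.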
07.08.2025 02:33