DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference
www.youtube.com/live/Bh-jlh5...
@suraj.io.bsky.social
@Microsoft.com | ex-@kinvolkio ex-@RedHat | bibliophile | He/Him | Opinions are my own. 🟥 🟩 🟦 🟨
DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference
www.youtube.com/live/Bh-jlh5...
The Kubernetes Network Driver Model: A Composable Architecture for High-Performance Networking
arxiv.org/html/2506.23...
This is a handy database to look at the pricing, supported input and context window size:
models.dev
Using LLMs to write meaningful commit messages from CLI
suraj.io/post/2025/ll...
Reverse engineering claude code: simonwillison.net/2025/Jun/2/c...
17.07.2025 23:16 — 👍 0 🔁 0 💬 0 📌 0A Model Context Protocol (MCP) server that provides browser automation capabilities using Playwright. This server enables LLMs to interact with web pages through structured accessibility snapshots, bypassing the need for screenshots or visually-tuned models.
github.com/microsoft/pl...
InfiniBand Multilayered Security Protects Data Centers and AI Workloads developer.nvidia.com/blog/infinib...
17.07.2025 22:40 — 👍 0 🔁 0 💬 0 📌 0AI for you CI CD: github.com/githubnext/a...
17.07.2025 22:20 — 👍 0 🔁 0 💬 0 📌 0How Susceptible Are You to the Sunk Cost Fallacy?
hbr.org/2021/07/how-...
Keep ublock origin working on Google Chrome: www.reddit.com/r/Adblock/co...
13.07.2025 18:08 — 👍 0 🔁 0 💬 0 📌 0The Speed of Thought: Navigate LLM Inference Autoscaling for a Gen AI Application Toward Production
www.nvidia.com/en-us/on-dem...
Making your own MCP server in VS Code
youtu.be/SYcQXozpb_E?...
Benchmarking LLM Inference Costs for Smarter Scaling and Deployment developer.nvidia.com/blog/benchma...
04.07.2025 02:58 — 👍 0 🔁 0 💬 0 📌 0Use OpenAI's Codex with Grok on Azure: suraj.io/post/2025/de...
02.07.2025 17:52 — 👍 0 🔁 0 💬 0 📌 0Reference: H100 Inference Performance - Max Throughput
Llama v3.1 70B and 8B
developer.nvidia.com/deep-learnin...
LLM Inference Benchmarking: Fundamental Concepts developer.nvidia.com/blog/llm-ben...
28.06.2025 18:07 — 👍 0 🔁 0 💬 0 📌 0lol 😂
27.06.2025 19:52 — 👍 0 🔁 0 💬 0 📌 0The “S” in MCP Stands for Security
elenacross7.medium.com/%EF%B8%8F-th...
The first copyright ruling on generative AI training is a win for AI labs
www.understandingai.org/p/the-first-...
TIL: Gemini CLI is almost free to use
open.substack.com/pub/simonw/p...
Deploying #grok 3 on #Azure AI Foundry
suraj.io/post/2025/de...
Intercepting Claude code requests
simonwillison.net/2025/Jun/2/c...
Yuval Noah Harari on the Dangers of AI
youtu.be/uuBLxWowDqI?...
Seven replies to the viral Apple reasoning paper – and why they fall short
open.substack.com/pub/garymarc...
Speed up any video to more than the defined playback-speed-control. Paste this in your browser devtools console:
```
document.querySelector('video').playbackRate = 2.5
```
simonwillison.net/2025/Jun/19/...
Model pricing per input & output tokens
www.llm-prices.com
Mark Moyou, PhD - Understanding the end-to-end LLM training and inference pipeline
youtu.be/V2L6hufE2X4?...
vLLM Office Hours - Disaggregated Prefill and KV Cache Storage in vLLM - November 14, 2024
www.youtube.com/watch?v=FPr3...
If anyone wants to sign up & join the Azure terraform community call this is the form . They also ask for speakers if you want to submit a topic.
I usually just catch the recordings but they sometimes do APAC timeslots. forms.office.com/Pages/Respon...
This is a good awareness video to show people about the challenges of AI imagery and the scams that are now easier to create.
youtu.be/xyaSVBXF1K8?...