Suraj Deshmukh | सुरज देशमुख @suraj.io

DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference YouTube video by PyTorch

DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference
www.youtube.com/live/Bh-jlh5...

25.07.2025 03:49 — 👍 0 🔁 0 💬 0 📌 0

The Kubernetes Network Driver Model: A Composable Architecture for High-Performance Networking
arxiv.org/html/2506.23...

25.07.2025 02:48 — 👍 0 🔁 0 💬 0 📌 0

Models.dev — An open-source database of AI models Models.dev is a comprehensive open-source database of AI model specifications, pricing, and features.

A handy database for looking up model pricing, supported input types, and context window size:
models.dev

18.07.2025 15:55 — 👍 0 🔁 0 💬 0 📌 0

Using LLMs to write meaningful commit messages Learn how to use the llm CLI tool with GitHub Copilot models to generate meaningful commit messages directly from your terminal.

Using LLMs to write meaningful commit messages from CLI
suraj.io/post/2025/ll...

18.07.2025 03:56 — 👍 0 🔁 0 💬 0 📌 0

claude-trace I've been thinking for a while it would be interesting to run some kind of HTTP proxy against the Claude Code CLI app and take a peek at how it …

Reverse engineering Claude Code: simonwillison.net/2025/Jun/2/c...

17.07.2025 23:16 — 👍 0 🔁 0 💬 0 📌 0

GitHub - microsoft/playwright-mcp: Playwright MCP server Playwright MCP server. Contribute to microsoft/playwright-mcp development by creating an account on GitHub.

A Model Context Protocol (MCP) server that provides browser automation capabilities using Playwright. This server enables LLMs to interact with web pages through structured accessibility snapshots, bypassing the need for screenshots or visually-tuned models.
github.com/microsoft/pl...

17.07.2025 22:53 — 👍 1 🔁 0 💬 2 📌 0

InfiniBand Multilayered Security Protects Data Centers and AI Workloads | NVIDIA Technical Blog In today’s data-driven world, security isn’t just a feature—it’s the foundation. With the exponential growth of AI, HPC, and hyperscale cloud computing, the integrity of the network fabric is more…

InfiniBand Multilayered Security Protects Data Centers and AI Workloads developer.nvidia.com/blog/infinib...

17.07.2025 22:40 — 👍 0 🔁 0 💬 0 📌 0

GitHub - githubnext/awesome-continuous-ai: An awesome list of Continuous AI Actions and Frameworks An awesome list of Continuous AI Actions and Frameworks - githubnext/awesome-continuous-ai

AI for your CI/CD: github.com/githubnext/a...

17.07.2025 22:20 — 👍 0 🔁 0 💬 0 📌 0

How Susceptible Are You to the Sunk Cost Fallacy? Many managers are susceptible to the famous sunk cost effect, whereby they persist investing in a money-losing project even when it makes sense to invest the new money in alternative new projects. The...

How Susceptible Are You to the Sunk Cost Fallacy?

hbr.org/2021/07/how-...

16.07.2025 01:51 — 👍 0 🔁 0 💬 0 📌 0

Pasko13's comment on "whats currently the best way to force re-enable ublock origin in chrome?" Explore this conversation and more from the Adblock community

Keep uBlock Origin working on Google Chrome: www.reddit.com/r/Adblock/co...

13.07.2025 18:08 — 👍 0 🔁 0 💬 0 📌 0

The Speed of Thought: Navigate LLM Inference Autoscaling for a Gen AI Application Toward Production DLIT71339 | GTC 2025 | NVIDIA On-Demand Learn how to choose the autoscaling hyperparameters for your LLM applications by understanding the key metrics during inference

The Speed of Thought: Navigate LLM Inference Autoscaling for a Gen AI Application Toward Production
www.nvidia.com/en-us/on-dem...

07.07.2025 00:24 — 👍 0 🔁 0 💬 0 📌 0

YouTube video by Microsoft Developer Making your own MCP server in VS Code

Making your own MCP server in VS Code
youtu.be/SYcQXozpb_E?...

04.07.2025 13:22 — 👍 0 🔁 0 💬 0 📌 0

Benchmarking LLM Inference Costs for Smarter Scaling and Deployment | NVIDIA Technical Blog This is the third post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to determine the cost of LLM inference by estimating the total cost…

Benchmarking LLM Inference Costs for Smarter Scaling and Deployment developer.nvidia.com/blog/benchma...

04.07.2025 02:58 — 👍 0 🔁 0 💬 0 📌 0

Deploying Grok-3 on Azure: A Complete Guide to Running xAI's Latest Model Learn how to deploy and configure Grok-3 on Azure AI Foundry with this step-by-step guide. Set up your own instance of xAI's powerful language model in the cloud.

Use OpenAI's Codex with Grok on Azure: suraj.io/post/2025/de...

02.07.2025 17:52 — 👍 0 🔁 0 💬 0 📌 0

Inference Performance for Data Center Deep Learning Deliver great user experiences by lowering latency.

Reference: H100 Inference Performance - Max Throughput
Llama v3.1 70B and 8B
developer.nvidia.com/deep-learnin...

28.06.2025 19:00 — 👍 0 🔁 0 💬 0 📌 0

LLM Inference Benchmarking: Fundamental Concepts | NVIDIA Technical Blog This is the first post in the large language model latency-throughput benchmarking series, which aims to instruct developers on common metrics used for LLM benchmarking, fundamental concepts…

LLM Inference Benchmarking: Fundamental Concepts developer.nvidia.com/blog/llm-ben...

28.06.2025 18:07 — 👍 0 🔁 0 💬 0 📌 0

lol 😂

27.06.2025 19:52 — 👍 0 🔁 0 💬 0 📌 0

The “S” in MCP Stands for Security Spoiler: it doesn’t. But it should.

The “S” in MCP Stands for Security

elenacross7.medium.com/%EF%B8%8F-th...

27.06.2025 05:48 — 👍 1 🔁 0 💬 1 📌 0

The first copyright ruling on generative AI training is a win for AI labs New ruling provides a blueprint for AI companies to stay on the right side of the law.

The first copyright ruling on generative AI training is a win for AI labs
www.understandingai.org/p/the-first-...

26.06.2025 22:28 — 👍 0 🔁 0 💬 0 📌 0

Phoenix.new is Fly's entry into the prompt-driven app development space Plus exploring the system prompts for Gemini CLI and Claude AI artifacts

TIL: Gemini CLI is almost free to use
open.substack.com/pub/simonw/p...

26.06.2025 22:16 — 👍 1 🔁 0 💬 0 📌 0

Deploying Grok-3 on Azure: A Complete Guide to Running xAI's Latest Model Learn how to deploy and configure Grok-3 on Azure AI Foundry with this step-by-step guide. Set up your own instance of xAI's powerful language model in the cloud.

Deploying #grok 3 on #Azure AI Foundry

suraj.io/post/2025/de...

25.06.2025 00:22 — 👍 0 🔁 0 💬 0 📌 0

claude-trace I've been thinking for a while it would be interesting to run some kind of HTTP proxy against the Claude Code CLI app and take a peek at how it …

Intercepting Claude Code requests
simonwillison.net/2025/Jun/2/c...

25.06.2025 00:20 — 👍 0 🔁 0 💬 0 📌 0

YouTube video by Reid Hoffman Yuval Noah Harari on the Dangers of AI

Yuval Noah Harari on the Dangers of AI
youtu.be/uuBLxWowDqI?...

24.06.2025 22:40 — 👍 1 🔁 0 💬 0 📌 0

Seven replies to the viral Apple reasoning paper – and why they fall short Also: another paper that seals the deal

Seven replies to the viral Apple reasoning paper – and why they fall short
open.substack.com/pub/garymarc...

24.06.2025 22:38 — 👍 0 🔁 0 💬 0 📌 0

playbackrate Here's a tip that works on YouTube and almost any other web page that shows you a video. You can increase the playback rate beyond the usually-exposed 2x by running …

Speed up any video beyond the fastest rate its playback controls offer. Paste this in your browser devtools console:

```
document.querySelector('video').playbackRate = 2.5
```
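
The one-liner above only touches the first video on the page and fails if there is none. A minimal sketch of a more defensive variant follows; `setPlaybackRate` is a hypothetical helper name, not something from the linked post, and rejecting non-positive rates is a choice made here for simplicity. Browsers also enforce their own upper limit and may throw for very high rates.

```javascript
// Hypothetical helper (not from the linked post): apply one playback rate
// to every <video> element under `root`, not only the first match.
function setPlaybackRate(root, rate) {
  // Assumption: only positive, finite rates are accepted here.
  if (!Number.isFinite(rate) || rate <= 0) {
    throw new RangeError(`playback rate must be a positive number, got ${rate}`);
  }
  const videos = Array.from(root.querySelectorAll('video'));
  for (const video of videos) {
    video.playbackRate = rate;
  }
  return videos.length; // number of videos updated
}

// In a browser devtools console:
// setPlaybackRate(document, 2.5);
```

The return value is the number of videos updated, so a result of 0 means the page has no matching `<video>` element in the top-level document (for example, when the player lives inside an iframe).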

simonwillison.net/2025/Jun/19/...

22.06.2025 00:28 — 👍 0 🔁 0 💬 0 📌 0

LLM pricing calculator

Model pricing per input & output tokens
www.llm-prices.com

21.06.2025 22:27 — 👍 0 🔁 0 💬 0 📌 0

YouTube video by PyData Mark Moyou, PhD - Understanding the end-to-end LLM training and inference pipeline

Mark Moyou, PhD - Understanding the end-to-end LLM training and inference pipeline
youtu.be/V2L6hufE2X4?...

14.06.2025 23:35 — 👍 1 🔁 0 💬 0 📌 0

YouTube video by Neural Magic vLLM Office Hours - Disaggregated Prefill and KV Cache Storage in vLLM - November 14, 2024

vLLM Office Hours - Disaggregated Prefill and KV Cache Storage in vLLM - November 14, 2024
www.youtube.com/watch?v=FPr3...

06.06.2025 05:20 — 👍 0 🔁 0 💬 0 📌 0

Microsoft Forms

If anyone wants to sign up and join the Azure Terraform community call, this is the form. They also ask for speakers if you want to submit a topic.

I usually just catch the recordings but they sometimes do APAC timeslots. forms.office.com/Pages/Respon...

03.06.2025 05:19 — 👍 0 🔁 3 💬 0 📌 0

YouTube video by @soapsoupproductions The Deepfake Scams You're Not Ready For (Made with Google Veo 3)

This is a good awareness video to show people the challenges of AI imagery and the scams that are now easier to create.

youtu.be/xyaSVBXF1K8?...

01.06.2025 16:30 — 👍 8 🔁 8 💬 0 📌 0


Latest posts by suraj.io on Bluesky
