Making sense of KV Cache optimizations, Ep. 1: An overview
Let's make sense of the zoo of techniques that exist out there.
KV caching is a necessity for modern #LLMs, but it's not easy to do right. In this post I go through a recent survey that categorizes the most important KV caching techniques. Brace yourself for a deep dive!
www.zansara.dev/posts/2025-1...
#AI #GenAI #LLM #KVcaching #vllm
29.10.2025 12:23
How does prompt caching work?
Nearly all inference libraries can do it for you. But what's really going on under the hood?
Do you know how exactly prompt caching works in #GPT models? What is cached, at which stage? Let's have a deep dive into KV caching and how it makes your #LLM inference speed constant regardless of the prompt size.
www.zansara.dev/posts/2025-1...
#AI #GenAI #kvcaching
23.10.2025 15:45
What is prompt caching?
Caching prompts can have an outsized impact on the cost and latency of your AI apps. But what exactly to cache and how?
For today's post about common #GenAI questions, let's talk about prompt caching.
Caching sounds like a good idea when you hit speed and cost issues at scale, but you should be careful about what you cache to make it pay off for its added complexity.
www.zansara.dev/posts/2025-1...
#AI #LLMs
17.10.2025 13:54
Why using a reranker?
And is the added latency worth it? Let's understand what they do and how they can improve the quality of your RAG pipelines so drastically.
I'm starting a series of small blog posts addressing some common doubts about practical details of #GenAI tech like #RAG, agents, #LLM inference or training, etc.
Here is the first one on rerankers: www.zansara.dev/posts/2025-1...
Do you use them in your RAG pipelines?
#AI #LLMs #rerankers
13.10.2025 15:07
Code Mode: the better way to use MCP
It turns out we've all been using MCP wrong. Most agents today use MCP by exposing the
I've seen several approaches to fix the "tools overload" issue that plagues most MCP-heavy apps, but this one is the most interesting so far.
blog.cloudflare.com/code-mode/
#GenAI #AI #MCP
30.09.2025 10:40
GitHub - deepset-ai/haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data...
📦 deepset-ai / haystack
⭐ 22,263 (+30)
Python
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's be...
14.09.2025 12:02
Trying to play "Guess Who" with an LLM
I expected a different kind of fun.
How can we trust LLMs to handle users' credentials when they can't be made to hide the identity of their character in a Guess Who game? And if you think that affects only small models, think again - flagship proprietary models have the same issues as small OSS ones.
www.zansara.dev/posts/2025-0...
15.09.2025 15:52
Play 'Guess Who' with LLMs!
Play 'Guess Who' against your favorite LLMs
How about games? I'm working on a little game that makes you play Guess Who against a model of your choice and I'm loving how delirious the gameplay gets at times. www.zansara.dev/guess-who/
06.09.2025 10:04
Play 'Guess Who' with LLMs!
Play 'Guess Who' against your favorite LLMs
LLMs are fantastic personal assistants... and terrible tabletop game players.
Do you want to challenge GPT-5 or Claude Opus 4.1 at a round of Guess Who? Give it a try and share your most unexpected gameplays! 🎲
www.zansara.dev/guess-who/
#LLM #GenAI #GPT #GPT5 #AI
06.09.2025 01:01
GPT-5: Key characteristics, pricing and model card
I've had preview access to the new GPT-5 model family for the past two weeks, and have been using GPT-5 as my daily-driver. It's my new favorite model. It's still …
I've had preview access to GPT-5 for a couple of weeks, so I have a lot to say about it. Here's my first post, focusing just on core characteristics, pricing (it's VERY competitively priced) and interesting details from the GPT-5 system card simonwillison.net/2025/Aug/7/g...
07.08.2025 17:44
Speechify: Free Text to Speech with Humanlike AI Voices
No sign-up required. Turn any text into speech in seconds. Used by 50M+ users & 500k+ 5-star reviews. Perfect for PDFs, books, docs - anything.
Your target language is so small that #Speechify does not directly support it? Just clone your teacher's or friend's voice and the app will read any text with it 🔥
Try it here: speechify.com/text-to-spee...
#TTS #LanguageLearning #VoiceCloning #AI #TextToSpeech
14.06.2025 18:55
🗣️ Learning uncommon languages in the age of #AI has become so much more enjoyable! Check out #Speechify: just take a picture of a page, and it will read it out loud like your teacher would.
Try it here: speechify.com/text-to-spee...
#TTS #LanguageLearning #TextToSpeech #OCR
14.06.2025 18:55
Can you really interrupt an LLM?
With the recent release of Voice Mode for Claude, it seems like Voice AI is a solved problem. Now that LLMs can speak natively, there's apparently no more need for any of the complex voice pipelines t...
💡 It turns out, this is a very tricky feature for Voice AI LLMs, and I can bet your voice agents suffer from this problem as well!
Do you want to learn more about this issue? Check out my latest blog post!
www.zansara.dev/posts/2025-0...
#GenAI #AI #LLMs #VoiceAI
02.06.2025 17:18
Alright, maybe Gemini has a bug too. GPT 4o will SURELY manage to nail this!
#GenAI #AI #GPT #GPT4o #OpenAI #VoiceAI
02.06.2025 17:18
Surely this must be a Claude bug. Let's try with Gemini ✨
#GenAI #Ai #Gemini2.0 #VoiceAI
02.06.2025 17:18
Have you ever tried to interrupt a Voice AI mid-sentence? Probably yes.
But the LLM did not perceive the interruption the same way you did.
Let's see what Claude does when we interrupt while it counts...
#GenAI #Ai #Claude4 #VoiceAI
02.06.2025 17:18
🧠 Reasoning #LLMs may overthink or jump to conclusions when the reasoning effort is set to the wrong value.
✨ AutoThink runs the query through a classifier and decides how much effort the query needs.
Have you tried it?
papers.ssrn.com/sol3/papers...
#GenAI #AI
28.05.2025 09:43
GitHub - anthropics/claude-code: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo...
Skyrocketing! (200+ new stars)
📦 anthropics / claude-code
⭐ 9,088 (+205)
Shell
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows...
23.05.2025 18:02
📢 Don't overlook this in the wave of releases! #MistralAI has a new coding LLM: it's #Devstral, an open model perfect for on-prem, private and local deployments.
📰 Have a look at the announcement: mistral.ai/news/devstral
#MistralAI #GenAI #LLMs #SWEBench
23.05.2025 15:01
Italian orthography - Wikipedia
If you're curious, Wikipedia has a nice explanation! en.m.wikipedia.org/wiki/Italian... It's called a shallow or phonemic orthography. However, many other languages have this feature (Spanish, for example!), so I don't know why the models still prefer Italian over them...
23.05.2025 14:49
It may be. Or it may be due to the fact that in Italian each letter (or small group of letters) corresponds to a specific sound in a very consistent way. It makes it a lot easier to transcribe for humans as well!
23.05.2025 10:09
Vibecoding with Claude 4 [Original video at this link: www.zansara.dev/posts/2025-0... ] #vibecoding #AI #GenAI #Claude4 #LLMs #Coding #AgenticAI #VSCode #AnthropicAI
22.05.2025 21:50
🧠 Another flagship model released! @anthropic.com just unveiled Claude Opus 4 and Claude Sonnet 4, and they are at the top of the leaderboard for coding 💻
📰 Check out the announcement: www.anthropic.com/news/claude-4
#GenAI #LLMs #Claude #Claude4 #SweBench
22.05.2025 16:48
Google for Developers Blog - News about Web, Mobile, AI and Cloud
Small models are making giant leaps! #Google just released Gemma 3n, a mobile-first #multimodal LLM that can understand text, images, audio and even video input while running on your phone 📱
📰 Read the announcement here: developers.googleblog.com/en/introduc...
#GenAI #LLMs #Gemma #SLM
22.05.2025 09:05
Kudos to @deepgram.com for their fantastic transcription quality and generous free tier 💸 They make these little experiments accessible to everyone.
21.05.2025 16:05
Using Llama Models in the EU
The Llama 4 family was released over a month ago and I finally found some time to explore it. Or so I wished to do, until I realized one crucial issue with these models:
They are banned in the EU...
⚠️ Attention! If you or your company:
- 🇪🇺 are based in the EU
- 🦙 are thinking of integrating Llama models into your product
Pay close attention to their license: you may be breaking Meta's terms!
www.zansara.dev/posts/2025-0...
#GenAI #Llama #Multimodal #LLM #AI #AIAct
16.05.2025 15:26
Writer http://jalammar.github.io. O'Reilly Author http://LLM-book.com. LLM Builder Cohere.com.
Founding list[float] engineer. Recsys. Personalization. Infra. Systems. Normcore code. Nutella. Vectors. Words. Vibes. Bad puns (soon).
https://vickiboykis.com/what_are_embeddings/
Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet.
[bridged from https://blog.cloudflare.com/ on the web: https://fed.brid.gy/web/blog.cloudflare.com ]
Cloudflare is the world's leading connectivity cloud, and we have our eyes set on an ambitious goal: to help build a better Internet.
Author of Hands-On Data Analysis with #Pandas. Full-stack software engineer, international speaker, and open source contributor. Opinions are my own.
stefaniemolin.com
Data Analyst | Exploring Data Science, ML & AI |
Simplifying data | Sharing insights | Learning out loud.
Just a passionate dev, learning from this community daily.
✨ Sharing the entire journey - bugs, breakthroughs, and banter.
Independent AI researcher, creator of datasette.io and llm.datasette.io, building open source tools for data journalism, writing about a lot of stuff at https://simonwillison.net/
GitHub trending auto-post bot. Not an official GitHub product.
Made by: @kawamataryo.bsky.social
Code: https://github.com/kawamataryo/bsky-github-trending-bot
Norwegian photojournalist and tech enthusiast. Worried about how AI will change society.
Currently working for TV 2 Norway.
Big Tech and startups, from the inside. Highly relevant for software engineers and managers, useful for those working in tech. The #1 technology newsletter on Substack. https://newsletter.pragmaticengineer.com/about
Writing The Pragmatic Engineer (@pragmaticengineer.com), the #1 technology newsletter on Substack. Author of The Software Engineer's Guidebook (engguidebook.com). Formerly at Uber, Skype, Skyscanner. More at pragmaticengineer.com
Original news, reviews, analysis of tech trends, and expert advice on the most fundamental aspects of tech.
At wired.com where tomorrow is realized || Sign up for our newsletters: https://wrd.cm/newsletters
Find our WIRED journalists here: https://bsky.app/starter-pack/couts.bsky.social/3l6vez3xaus27
Technology news and analysis with a focus on founders and startup teams.
Got a tip? http://techcrunch.com/tips
Covering life in the future
https://www.theverge.com/subscribe
We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant Claude at Claude.ai.
The best place to find out what's new in science, and why it matters.