ZanSara @zansara - Bluesky Profile

Making sense of KV Cache optimizations, Ep. 1: An overview Let's make sense of the zoo of techniques that exist out there.

KV caching is a necessity on modern #LLMs, but it's not easy to do right. In this post I go through a recent survey that categorizes the most important KV caching techniques. Brace yourself for a deep dive!

www.zansara.dev/posts/2025-1...

#AI #GenAI #LLM #KVcaching #vllm

29.10.2025 12:23 — 👍 0 🔁 0 💬 0 📌 0

How does prompt caching work? Nearly all inference libraries can do it for you. But what's really going on under the hood?

Do you know how exactly prompt caching works in #GPT models? What is cached, at which stage? Let's have a deep dive into KV caching and how it makes your #LLM inference speed constant regardless of the prompt size.

www.zansara.dev/posts/2025-1...

#AI #GenAI #kvcaching

23.10.2025 15:45 — 👍 1 🔁 0 💬 0 📌 0

What is prompt caching? Caching prompts can have an outsized impact on the cost and latency of your AI apps. But what exactly to cache and how?

For today's post about common #GenAI questions, let's talk about prompt caching.

Caching sounds like a good idea when you hit speed and cost issues at scale, but you should be careful about what you cache to make it pay off for its added complexity.

www.zansara.dev/posts/2025-1...

#AI #LLMs

17.10.2025 13:54 — 👍 2 🔁 0 💬 0 📌 0

Why using a reranker? And is the added latency worth it? Let's understand what they do and how they can improve the quality of your RAG pipelines so drastically.

I'm starting a series of small blog posts addressing some common doubts about practical details of #GenAI tech like #RAG, agents, #LLM inference or training, etc.

Here is the first one on rerankers: www.zansara.dev/posts/2025-1...

Do you use them in your RAG pipelines?

#AI #LLMs #rerankers

13.10.2025 15:07 — 👍 1 🔁 0 💬 0 📌 0

Code Mode: the better way to use MCP It turns out we've all been using MCP wrong. Most agents today use MCP by exposing the

I've seen several approaches to fix the "tools overload" issue that plagues most MCP-heavy apps, but this one is the most interesting so far.

blog.cloudflare.com/code-mode/

#GenAI #AI #MCP

30.09.2025 10:40 — 👍 0 🔁 0 💬 0 📌 0

GitHub - deepset-ai/haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

📦 deepset-ai / haystack
⭐ 22,263 (+30)
🗒 Python

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's be...

14.09.2025 12:02 — 👍 1 🔁 1 💬 0 📌 0

Trying to play "Guess Who" with an LLM I expected a different kind of fun.

How can we trust LLMs to handle users' credentials when they can't be made to hide the identity of their character in a Guess Who game? And if you think that affects only small models, think again - flagship proprietary models have the same issues as small OSS ones.

www.zansara.dev/posts/2025-0...

15.09.2025 15:52 — 👍 1 🔁 0 💬 0 📌 0

Play 'Guess Who' with LLMs! Play 'Guess Who' against your favorite LLMs

How about games? I'm working on a little game that makes you play Guess Who against a model of your choice and I'm loving how delirious the gameplay gets at times. www.zansara.dev/guess-who/

06.09.2025 10:04 — 👍 1 🔁 0 💬 0 📌 0

Play 'Guess Who' with LLMs! Play 'Guess Who' against your favorite LLMs

LLMs are fantastic personal assistants... and terrible tabletop game players. ♟️

Do you want to challenge GPT-5 or Claude Opus 4.1 at a round of Guess Who? Give it a try and share your most unexpected gameplays! 🎲

👉 www.zansara.dev/guess-who/

#LLM #GenAI #GPT #GPT5 #AI

06.09.2025 01:01 — 👍 3 🔁 0 💬 3 📌 0

GPT-5: Key characteristics, pricing and model card I’ve had preview access to the new GPT-5 model family for the past two weeks, and have been using GPT-5 as my daily-driver. It’s my new favorite model. It’s still …

I've had preview access to GPT-5 for a couple of weeks, so I have a lot to say about it. Here's my first post, focusing just on core characteristics, pricing (it's VERY competitively priced) and interesting details from the GPT-5 system card simonwillison.net/2025/Aug/7/g...

07.08.2025 17:44 — 👍 179 🔁 32 💬 13 📌 3

Speechify: Free Text to Speech with Humanlike AI Voices No sign-up required. Turn any text into speech in seconds. Used by 50M+ users & 500k+ 5-star reviews. Perfect for PDFs, books, docs – anything.

❓Your target language is so small that #Speechify does not directly support it? Just clone your teacher's or friend's voice and the app will read any text with it 👥

👉 Try it here: speechify.com/text-to-spee...

#TTS #LanguageLearning #VoiceCloning #AI #TextToSpeech

14.06.2025 18:55 — 👍 0 🔁 0 💬 0 📌 0

🗣️ Learning uncommon languages in the age of #AI has become so much more enjoyable! Check out #Speechify: just take a picture of a page, and it will read it out loud like your teacher would 📖

👉 Try it here: speechify.com/text-to-spee...

#TTS #LanguageLearning #TextToSpeech #OCR

14.06.2025 18:55 — 👍 1 🔁 0 💬 1 📌 0

Can you really interrupt an LLM? With the recent release of Voice Mode for Claude, it seems like Voice AI is a solved problem. Now that LLMs can speak natively, there’s apparently no more need for any of the complex voice pipelines t...

💡 It turns out this is a very tricky feature for Voice AI LLMs, and I bet your voice agents suffer from this problem as well!

🔍 Do you want to learn more about this issue? Check out my latest blog post! 👇

www.zansara.dev/posts/2025-0...

#GenAI #AI #LLMs #VoiceAI

02.06.2025 17:18 — 👍 1 🔁 0 💬 0 📌 0

Alright maybe Gemini has a bug too 🤔 GPT 4o will SURELY manage to nail this!

#GenAI #AI #GPT #GPT4o #OpenAI #VoiceAI

02.06.2025 17:18 — 👍 1 🔁 0 💬 1 📌 0

Surely this must be a Claude bug 🐛 Let's try with Gemini ✨
#GenAI #Ai #Gemini2.0 #VoiceAI

02.06.2025 17:18 — 👍 0 🔁 0 💬 1 📌 0

✋Have you ever tried to interrupt a Voice AI mid-sentence? Probably yes.

💭 But the LLM did not perceive the interruption the same way you did.

👤 Let's see what Claude does when we interrupt while it counts...

#GenAI #Ai #Claude4 #VoiceAI

02.06.2025 17:18 — 👍 2 🔁 0 💬 1 📌 0

🧠 Reasoning #LLMs may overthink or jump to conclusions when the reasoning effort is set to the wrong value.
✨ AutoThink runs the query through a classifier and decides how much effort the query needs.
❓ Have you tried it?
papers.ssrn.com/sol3/papers...
#GenAI #AI

28.05.2025 09:43 — 👍 2 🔁 0 💬 0 📌 0

GitHub - anthropics/claude-code: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.

🚀 Skyrocketing！ 🚀 (200+ new stars)

📦 anthropics / claude-code
⭐ 9,088 (+205)
🗒 Shell

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows...

23.05.2025 18:02 — 👍 4 🔁 1 💬 0 📌 0

📢 Don't overlook this in the wave of releases! #MistralAI has a new coding LLM: it's #Devstral, an open model perfect for on-prem, private and local deployments 🐈

📰 Have a look at the announcement: mistral.ai/news/devstral

#MistralAI #GenAI #LLMs #SWEBench

23.05.2025 15:01 — 👍 1 🔁 0 💬 0 📌 0

Italian orthography - Wikipedia

If you're curious, Wikipedia has a nice explanation! en.m.wikipedia.org/wiki/Italian... It's called a shallow or phonemic orthography. However, many other languages have this feature (Spanish, for example!), so I don't know why the models still prefer Italian over them...

23.05.2025 14:49 — 👍 0 🔁 0 💬 1 📌 0

It may be 😁 Or it may be due to the fact that in Italian each letter (or small group of letters) corresponds to a specific sound in a very consistent way. That makes it a lot easier for humans to transcribe as well!

23.05.2025 10:09 — 👍 0 🔁 0 💬 1 📌 0

Vibecoding with Claude 4 🎶 [Original video at this link: www.zansara.dev/posts/2025-0... ] #vibecoding #AI #GenAI #Claude4 #LLMs #Coding #AgenticAI #VSCode #AnthropicAI

22.05.2025 21:50 — 👍 1 🔁 0 💬 0 📌 0

🧠 Another flagship model released! @anthropic.com just unveiled Claude Opus 4 and Claude Sonnet 4, and they are at the top of the leaderboard for coding 💻

📰 Check out the announcement: www.anthropic.com/news/claude-4

#GenAI #LLMs #Claude #Claude4 #SweBench

22.05.2025 16:48 — 👍 1 🔁 0 💬 0 📌 0

Google for Developers Blog - News about Web, Mobile, AI and Cloud

🐜 Small models are making giant leaps! #Google just released Gemma 3n, a mobile-first #multimodal LLM that can understand text, images, audio and even video input while running on your phone 📱

📰 Read the announcement here: developers.googleblog.com/en/introduc...

#GenAI #LLMs #Gemma #SLM

22.05.2025 09:05 — 👍 2 🔁 0 💬 0 📌 0

Kudos to @deepgram.com for their fantastic transcription quality and generous free tier 💸 They make these little experiments accessible to everyone 🙌

21.05.2025 16:05 — 👍 1 🔁 0 💬 0 📌 0

A simple vibecoding exercise Sometimes, after an entire day of coding, the last thing you want to do is to code some more. It would be so great if I could just sit down and enjoy some YouTube videos… Being abroad, most of the videos I watch are in a foreign language, and it helps immensely to have subtitles when I’m not in the mood for hard focus. However, YouTube subtitles are often terrible or missing entirely.

Did you know that GenAI can help you finish that side project that has been gathering dust for months, waiting for its time to shine? ✨

In my latest blog post, I vibecode a small subtitle generator with o4-mini-high and Claude 3.7 Sonnet 🎬

www.zansara.dev/posts/2025-...

#GenAI #LLMs

21.05.2025 16:01 — 👍 0 🔁 0 💬 1 📌 0

Using Llama Models in the EU The Llama 4 family was released over a month ago and I finally found some time to explore it. Or so I wished to do, until I realized one crucial issue with these models: They are banned in the EU...

⚠️ Attention! If you or your company:

- 🇪🇺 are based in the EU
- 🦙 are thinking of integrating Llama models into your product

📜 Pay close attention to its license: you may be breaking Meta’s terms!

www.zansara.dev/posts/2025-0...

#GenAI #Llama #Multimodal #LLM #AI #AIAct

16.05.2025 15:26 — 👍 1 🔁 0 💬 0 📌 0

Beyond the hype of reasoning models: debunking three common misunderstandings With the release of OpenAI’s o1 and similar models such as DeepSeek R1, Gemini 2.0 Flash Thinking, Phi 4 Reasoning and more, a new type of LLM entered the scene: the so-called reasoning models. With ...

Wanna learn more about reasoning LLMs? Check out this short blog post where we debunk three common misunderstandings about these models, and join me at ODSC East 2025 for a complete webinar on the topic!

www.zansara.dev/posts/2025-0...

#AI #GenAI #LLMs #ODSCEast #webinar

15.05.2025 17:17 — 👍 0 🔁 0 💬 0 📌 0

GitHub - intentional-ai/intentional: Intentional is an open-source framework to build reliable LLM chatbots that actually talk and behave as you expect.

😵‍💫 Piling up instructions in the system prompt of your #LLM doesn't scale!

📢 Intentional makes #GenAI #chatbots able to handle an endless number of tasks while keeping them under control at all times. Leave it a star on GitHub and try out the demo!

github.com/intentional-...

21.12.2024 16:11 — 👍 1 🔁 0 💬 0 📌 0

ZanSara

Latest posts by zansara.bsky.social on Bluesky
