ZanSara

@zansara.bsky.social

✨ GenAI expert | 🐍 Python coder | 🪐 Sci-fi reader | 🇭🇺 Studying weird languages | zansara.dev

14 Followers  |  25 Following  |  26 Posts  |  Joined: 21.12.2024

Latest posts by zansara.bsky.social on Bluesky

Making sense of KV Cache optimizations, Ep. 1: An overview
Let's make sense of the zoo of techniques that exist out there.

KV caching is a necessity on modern #LLMs, but it's not easy to do right. In this post I go through a recent survey that categorizes the most important KV caching techniques. Brace yourself for a deep dive!

www.zansara.dev/posts/2025-1...

#AI #GenAI #LLM #KVcaching #vllm

29.10.2025 12:23
How does prompt caching work? Nearly all inference libraries can do it for you. But what's really going on under the hood?

Do you know exactly how prompt caching works in #GPT models? What is cached, and at which stage? Let's take a deep dive into KV caching and how it makes your #LLM inference speed constant regardless of the prompt size.
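To make the idea concrete, here is a toy, pure-Python sketch of one attention head during decoding; it is not how vLLM or any production library implements caching. Without a cache, keys and values for every previous token are recomputed at each step; with a cache, only the newest token is projected and older keys/values are read back. The sizes, random weights and the `ToyAttention` class are hypothetical. The attention step still reads every cached entry; what the cache removes is the repeated K/V projection work.

```python
import math
import random

D = 4  # toy hidden size (hypothetical)


def matvec(m, v):
    """Multiply a D x D matrix (list of rows) by a length-D vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attend(q, keys, values):
    """Scaled dot-product attention for one query over all given positions."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(D)]


class ToyAttention:
    """One attention head with random (hypothetical) projection weights."""

    def __init__(self, seed=0):
        rng = random.Random(seed)

        def rand_matrix():
            return [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]

        self.wq, self.wk, self.wv = rand_matrix(), rand_matrix(), rand_matrix()
        self.kv_projections = 0  # how many tokens we computed K and V for

    def project_kv(self, x):
        self.kv_projections += 1
        return matvec(self.wk, x), matvec(self.wv, x)

    def step_without_cache(self, tokens):
        # Recompute K and V for the whole sequence at every decoding step.
        keys, values = zip(*(self.project_kv(x) for x in tokens))
        return attend(matvec(self.wq, tokens[-1]), keys, values)

    def step_with_cache(self, new_token, cache):
        # Only the newest token is projected; older K/V are reused from the cache.
        k, v = self.project_kv(new_token)
        cache["k"].append(k)
        cache["v"].append(v)
        return attend(matvec(self.wq, new_token), cache["k"], cache["v"])


if __name__ == "__main__":
    rng = random.Random(1)
    tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(10)]
    uncached, cached = ToyAttention(), ToyAttention()  # same seed, same weights
    cache = {"k": [], "v": []}
    for t in range(1, len(tokens) + 1):
        a = uncached.step_without_cache(tokens[:t])
        b = cached.step_with_cache(tokens[t - 1], cache)
        assert all(math.isclose(x, y) for x, y in zip(a, b))  # identical outputs
    print(uncached.kv_projections, cached.kv_projections)  # 55 10
```

Both paths produce the same attention output, but over 10 tokens the uncached path projects 1 + 2 + ... + 10 = 55 tokens while the cached path projects each token once.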

www.zansara.dev/posts/2025-1...

#AI #GenAI #kvcaching

23.10.2025 15:45
What is prompt caching? Caching prompts can have an outsized impact on the cost and latency of your AI apps. But what exactly to cache and how?

For today's post about common #GenAI questions, let's talk about prompt caching.

Caching sounds like a good idea when you hit speed and cost issues at scale, but you should be careful about what you cache if you want it to pay off its added complexity.

www.zansara.dev/posts/2025-1...

#AI #LLMs

17.10.2025 13:54
Why use a reranker? And is the added latency worth it? Let's understand what they do and how they can improve the quality of your RAG pipelines so drastically.
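A minimal sketch of the two-stage retrieve-then-rerank pattern, assuming a cheap first-stage retriever and a slower, more precise second-stage scorer. Both scorers here are toy stand-ins (shared words, then shared word pairs), not real BM25 or a real cross-encoder model; `first_stage`, `toy_cross_score` and `retrieve_and_rerank` are hypothetical names.

```python
def tokenize(text):
    """Lowercase and strip basic punctuation (toy tokenizer)."""
    return [w.strip(".,?!").lower() for w in text.split()]


def first_stage(query, docs, k):
    """Cheap retriever stand-in: rank by shared words, keep the top k.

    Real pipelines would typically use BM25 or embedding similarity here.
    """
    q = set(tokenize(query))
    ranked = sorted(docs, key=lambda d: len(q & set(tokenize(d))), reverse=True)
    return ranked[:k]


def toy_cross_score(query, doc):
    """Hypothetical stand-in for a cross-encoder reranker.

    A real reranker is a model that reads query and document together; here we
    only mimic that by rewarding shared word pairs, so word order matters.
    """
    def bigrams(words):
        return set(zip(words, words[1:]))

    return len(bigrams(tokenize(query)) & bigrams(tokenize(doc)))


def retrieve_and_rerank(query, docs, k=2, top_n=1, score=toy_cross_score):
    candidates = first_stage(query, docs, k)  # fast, recall-oriented
    reranked = sorted(candidates, key=lambda d: score(query, d), reverse=True)
    return reranked[:top_n]  # slower, precision-oriented


docs = [
    "latency reduces when batching, not caching",
    "prompt caching reduces latency for long prompts",
    "a recipe for sourdough bread",
]
query = "caching reduces latency"
print(first_stage(query, docs, 2)[0])       # latency reduces when batching, not caching
print(retrieve_and_rerank(query, docs)[0])  # prompt caching reduces latency for long prompts
```

The first stage cannot tell the two candidates apart (both share three words with the query), while the second stage promotes the one that actually answers it. The added latency comes from that second stage: a real cross-encoder runs once per (query, candidate) pair, which is why it is usually applied only to the top k candidates.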

I'm starting a series of short blog posts addressing some common questions about practical details of #GenAI tech like #RAG, agents, #LLM inference or training, etc.

Here is the first one on rerankers: www.zansara.dev/posts/2025-1...

Do you use them in your RAG pipelines?

#AI #LLMs #rerankers

13.10.2025 15:07
Code Mode: the better way to use MCP
It turns out we've all been using MCP wrong. Most agents today use MCP by exposing the...

I've seen several approaches to fix the "tools overload" issue that plagues most MCP-heavy apps, but this one is the most interesting so far.

blog.cloudflare.com/code-mode/

#GenAI #AI #MCP

30.09.2025 10:40
GitHub - deepset-ai/haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

📦 deepset-ai / haystack
⭐ 22,263 (+30)
🗒 Python

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's be...

14.09.2025 12:02
Trying to play "Guess Who" with an LLM
I expected a different kind of fun.

How can we trust LLMs to handle users' credentials when they can't be made to hide the identity of their character in a game of Guess Who? And if you think this affects only small models, think again: flagship proprietary models have the same issues as small OSS ones.

www.zansara.dev/posts/2025-0...

15.09.2025 15:52
Play 'Guess Who' with LLMs! Play 'Guess Who' against your favorite LLMs

How about games? I'm working on a little game that makes you play Guess Who against a model of your choice and I'm loving how delirious the gameplay gets at times. www.zansara.dev/guess-who/

06.09.2025 10:04
Play 'Guess Who' with LLMs! Play 'Guess Who' against your favorite LLMs

LLMs are fantastic personal assistants... and terrible tabletop game players. ♟️

Do you want to challenge GPT-5 or Claude Opus 4.1 to a round of Guess Who? Give it a try and share your most unexpected games! 🎲

👉 www.zansara.dev/guess-who/

#LLM #GenAI #GPT #GPT5 #AI

06.09.2025 01:01
GPT-5: Key characteristics, pricing and model card
I've had preview access to the new GPT-5 model family for the past two weeks, and have been using GPT-5 as my daily-driver. It's my new favorite model. It's still ...

I've had preview access to GPT-5 for a couple of weeks, so I have a lot to say about it. Here's my first post, focusing just on core characteristics, pricing (it's VERY competitively priced) and interesting details from the GPT-5 system card simonwillison.net/2025/Aug/7/g...

07.08.2025 17:44
Speechify: Free Text to Speech with Humanlike AI Voices
No sign-up required. Turn any text into speech in seconds. Used by 50M+ users & 500k+ 5-star reviews. Perfect for PDFs, books, docs, anything.

❓ Is your target language so small that #Speechify does not support it directly? Just clone your teacher's or friend's voice and the app will read any text with it 👥

👉 Try it here: speechify.com/text-to-spee...

#TTS #LanguageLearning #VoiceCloning #AI #TextToSpeech

14.06.2025 18:55

πŸ—£οΈ Learning uncommon languages in the age of #AI has become so much more enjoyable! Check out #Speechify: just take a picture of a page, and it will read it out loud like your teacher would πŸ“–

👉 Try it here: speechify.com/text-to-spee...

#TTS #LanguageLearning #TextToSpeech #OCR

14.06.2025 18:55
Can you really interrupt an LLM?
With the recent release of Voice Mode for Claude, it seems like Voice AI is a solved problem. Now that LLMs can speak natively, there's apparently no more need for any of the complex voice pipelines t...

💡 It turns out this is a very tricky feature for Voice AI LLMs, and I bet your voice agents suffer from this problem as well!

πŸ” Do you want to learn more about this issue? Check out my latest blog post! πŸ‘‡

www.zansara.dev/posts/2025-0...

#GenAI #AI #LLMs #VoiceAI

02.06.2025 17:18

Alright, maybe Gemini has a bug too 🤔 GPT-4o will SURELY manage to nail this!

#GenAI #AI #GPT #GPT4o #OpenAI #VoiceAI

02.06.2025 17:18

Surely this must be a Claude bug 🐛 Let's try with Gemini ✨
#GenAI #Ai #Gemini2.0 #VoiceAI

02.06.2025 17:18

✋ Have you ever tried to interrupt a Voice AI mid-sentence? Probably yes.

💭 But the LLM did not perceive the interruption the same way you did.

👀 Let's see what Claude does when we interrupt while it counts...

#GenAI #Ai #Claude4 #VoiceAI

02.06.2025 17:18

🧠 Reasoning #LLMs may overthink or jump to conclusions when the reasoning effort is set to the wrong value.
✨ AutoThink runs the query through a classifier and decides how much effort the query needs.
❓ Have you tried it?
papers.ssrn.com/sol3/papers...
#GenAI #AI
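The routing idea (classify the query first, then set the reasoning effort) can be sketched as below. This is not AutoThink's actual classifier or API; the keyword heuristic, the three labels, the token budgets and the names `route` and `toy_complexity_classifier` are hypothetical placeholders for a trained classifier and whatever effort setting your model exposes.

```python
from dataclasses import dataclass

# Hypothetical reasoning-token budgets per complexity class.
BUDGETS = {"low": 256, "medium": 2048, "high": 8192}

# Crude cues standing in for what a trained classifier would learn.
HARD_HINTS = ("prove", "derive", "optimize", "step by step", "debug")


@dataclass
class EffortDecision:
    label: str            # "low", "medium" or "high"
    thinking_budget: int  # max reasoning tokens to allow for this query


def toy_complexity_classifier(query: str) -> str:
    """Stand-in for a trained query classifier (keyword and length heuristic)."""
    q = query.lower()
    if any(hint in q for hint in HARD_HINTS):
        return "high"
    if len(q.split()) > 25:
        return "medium"
    return "low"


def route(query: str, classify=toy_complexity_classifier) -> EffortDecision:
    """Classify the query first, then pick a reasoning budget for the LLM call."""
    label = classify(query)
    return EffortDecision(label, BUDGETS[label])


print(route("What is the capital of France?"))
# EffortDecision(label='low', thinking_budget=256)
print(route("Prove that the square root of 2 is irrational"))
# EffortDecision(label='high', thinking_budget=8192)
```

Keeping the classifier as a pluggable argument makes it easy to swap the heuristic for a real model without touching the routing logic.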

28.05.2025 09:43
GitHub - anthropics/claude-code: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows - all through natural language commands.

🚀 Skyrocketing! 🚀 (200+ new stars)

📦 anthropics / claude-code
⭐ 9,088 (+205)
🗒 Shell

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows...

23.05.2025 18:02

📢 Don't overlook this in the wave of releases! #MistralAI has a new coding LLM: it's #Devstral, an open model perfect for on-prem, private and local deployments 🐈

📰 Have a look at the announcement: mistral.ai/news/devstral

#MistralAI #GenAI #LLMs #SWEBench

23.05.2025 15:01
Italian orthography - Wikipedia

If you're curious, Wikipedia has a nice explanation! en.m.wikipedia.org/wiki/Italian... It's called a shallow or phonemic orthography. However, many other languages have this feature (Spanish, for example!), so I don't know why the models still prefer Italian over them...

23.05.2025 14:49

It may be 😁 Or it may be due to the fact that in Italian each letter (or small group of letters) corresponds to a specific sound in a very consistent way. That makes it a lot easier for humans to transcribe as well!

23.05.2025 10:09

Vibecoding with Claude 4 🎢 [Original video at this link: www.zansara.dev/posts/2025-0... ] #vibecoding #AI #GenAI #Claude4 #LLMs #Coding #AgenticAI #VSCode #AnthropicAI

22.05.2025 21:50

🧠 Another flagship model released! @anthropic.com just unveiled Claude Opus 4 and Claude Sonnet 4, and they are at the top of the leaderboard for coding 💻

📰 Check out the announcement: www.anthropic.com/news/claude-4

#GenAI #LLMs #Claude #Claude4 #SweBench

22.05.2025 16:48
Google for Developers Blog - News about Web, Mobile, AI and Cloud

🐜 Small models are making giant leaps! #Google just released Gemma 3n, a mobile-first #multimodal LLM that can understand text, images, audio and even video input while running on your phone 📱

📰 Read the announcement here: developers.googleblog.com/en/introduc...

#GenAI #LLMs #Gemma #SLM

22.05.2025 09:05

Kudos to @deepgram.com for their fantastic transcription quality and generous free tier 💸 They make these little experiments accessible to everyone 🙌

21.05.2025 16:05
A simple vibecoding exercise
Sometimes, after an entire day of coding, the last thing you want to do is to code some more. It would be so great if I could just sit down and enjoy some Youtube videos... Being abroad, most of the videos I watch are in a foreign language, and it helps immensely to have subtitles when I'm not in the mood for hard focus. However, Youtube subtitles are often terrible or missing entirely.

Did you know that GenAI can help you finish that side project that has been gathering dust for months, waiting for its time to shine? ✨

In my last blog post I vibecode a small subtitle generator with o4-mini-high and Claude 3.7 Sonnet 🎬

www.zansara.dev/posts/2025-...

#GenAI #LLMs

21.05.2025 16:01
Using Llama Models in the EU
The Llama 4 family was released over a month ago and I finally found some time to explore it. Or so I wished, until I realized one crucial issue with these models: They are banned in the EU...

⚠️ Attention! If you or your company:

- 🇪🇺 are based in the EU
- 🦙 are thinking of integrating Llama models into your product

📜 Pay close attention to the license: you may be breaking Meta's terms!

www.zansara.dev/posts/2025-0...

#GenAI #Llama #Multimodal #LLM #AI #AIAct

16.05.2025 15:26
Beyond the hype of reasoning models: debunking three common misunderstandings
With the release of OpenAI's o1 and similar models such as DeepSeek R1, Gemini 2.0 Flash Thinking, Phi 4 Reasoning and more, a new type of LLM entered the scene: the so-called reasoning models. With ...

Wanna learn more about reasoning LLMs? Check out this short blog post where we debunk three common misunderstandings about these models, and join me at ODSC East 2025 for a complete webinar on the topic!

www.zansara.dev/posts/2025-0...

#AI #GenAI #LLMs #ODSCEast #webinar

15.05.2025 17:17
GitHub - intentional-ai/intentional: Intentional is an open-source framework to build reliable LLM chatbots that actually talk and behave as you expect.

πŸ˜΅β€πŸ’« Piling up instructions in the system prompt of your #LLM doesn't scale!

πŸ“’ Intentional makes #GenAI #chatbots able to handle an endless amount of tasks while keeping them under control at all times. Leave it a star on GitHub and try out the demo!

github.com/intentional-...

21.12.2024 16:11
