March Voice AI Meetup - Wednesday the 5th
lu.ma/ffpyl57n
@kwindla.bsky.social
Low, low, low latency. Daily.co and Pipecat.ai
The new model is GA today:
developers.googleblog.com/en/gemini-2-...
Gemini 2.0 Flash is competitive with GPT-4o on:
- TTFT,
- instruction following,
- function calling, and
- natural conversation dynamics.
GPT-4o was ahead on all of these attributes by a wide enough margin that using any other LLM for voice AI mostly didn't make sense. Now there's competition!
Source code is here:
github.com/pipecat-ai/p...
My favorite thing about this demo is that it's a really nice example of composite function calling.
Here are the function definitions. Gemini figures out solely from the argument descriptions how to find a conversation from "a few minutes ago"!
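The demo's actual definitions are in the linked repo; as a rough sketch of the idea, here's what description-driven tool schemas like these might look like in the Gemini function-declaration format (the names and wording are illustrative, not the demo's code):

```python
# Hypothetical tool schemas in the Gemini function-declaration style.
# The model decides when and how to chain these calls purely from the
# natural-language descriptions below.
get_conversations_schema = {
    "name": "get_saved_conversation_filenames",
    "description": (
        "List saved conversation files. Filenames are ISO-8601 timestamps "
        "recording when each conversation was saved."
    ),
    "parameters": {"type": "object", "properties": {}},
}

get_conversation_schema = {
    "name": "get_conversation_content",
    "description": (
        "Load the contents of a saved conversation. To find a conversation "
        "from 'a few minutes ago', first list the saved filenames, then pick "
        "the one whose timestamp is closest to the requested time."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "filename": {
                "type": "string",
                "description": "A filename returned by get_saved_conversation_filenames.",
            }
        },
        "required": ["filename"],
    },
}
```

The composite behavior falls out of the descriptions alone: the model chains the "list" call into the "load" call without any explicit orchestration code.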
Memory for voice AI agents (and composite function calling) ...
There are several ways to store (and later, retrieve) conversation state. One of the simplest is just to define a couple of functions and use your local filesystem!
Here, @chadbailey.net shows how to do that, using Gemini 2.0 Flash.
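A minimal sketch of that filesystem approach (the function names, directory layout, and JSON format here are my assumptions, not the actual code from the demo):

```python
import json
import pathlib
from datetime import datetime, timezone

# Hypothetical storage location for saved conversations.
STORAGE_DIR = pathlib.Path("conversations")


def save_conversation(messages: list[dict]) -> str:
    """Write the current message list to a timestamped JSON file.

    The timestamp-as-filename convention is what lets an LLM later reason
    about "a conversation from a few minutes ago".
    """
    STORAGE_DIR.mkdir(exist_ok=True)
    filename = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ") + ".json"
    (STORAGE_DIR / filename).write_text(json.dumps(messages, indent=2))
    return filename


def load_conversation(filename: str) -> list[dict]:
    """Read a previously saved message list back from disk."""
    return json.loads((STORAGE_DIR / filename).read_text())
```

Register those two functions as tools and the LLM can decide on its own when to persist or recall state.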
The Pion community is vibrant and welcoming, and Sean also wrote the definitive guide to WebRTC, "WebRTC For The Curious."
webrtcforthecurious.com
If you're interested in diving into WebRTC, read Sean's guide and check out the Pion code and forums.
Full video here:
www.youtube.com/watch?v=l_rT...
We talked about:
- WebRTC history.
- Why you probably need to use WebRTC instead of WebSockets for your voice AI application.
- How we see things evolving for WebRTC x multimodal AI.
- Embedded WebRTC, telephony, and surprising use cases.
Sean DuBois is one of my favorite people to talk to about WebRTC, audio and video, designing good libraries, and hacking in general.
Sean is the creator of Pion. Pion is an Open Source WebRTC implementation that is influential and very widely used (including at OpenAI, where Sean works).
Usually, when you work on a system like this, you never manage to write up all the lessons learned even for your own use, much less publish them in such an accessible paper. Kudos to the DeepSeek team.
lnkd.in/gz8SBvuM
Writing really, really optimized distributed systems code is very satisfying. I've written a lot of both GPU code and networking code over the years, so the overlap here makes me particularly happy!
But my favorite, favorite part is that they also wrote a section, "Suggestions on Hardware Design."
It would be fun to see this code, though the actual implementation is tightly coupled enough to the architecture of the H800 and their cluster design (the specifics of NVLink and InfiniBand) that it wouldn't be useful as an open source building block.
30.01.2025 20:46
My favorite part of the DeepSeek-V3 Technical Report is the stuff about the all-to-all communication kernels. (Mostly in section 3.2.2, "Efficient Implementation of Cross-Node All-to-All Communication.")
30.01.2025 20:46
Using Gemini search metadata in a voice AI application
Filipi added support in Pipecat for Google Gemini's `groundingMetadata`. This makes it easy to do things like:
- link to URLs
- log searches for observability
- use specific search result chunks for RAG
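As a rough illustration of consuming that metadata, here's a helper that pulls search queries and source links out of a `groundingMetadata` payload, shown on a plain dict shaped like the REST response (the field names follow Google's docs, but verify them against your SDK version):

```python
def extract_grounding(metadata: dict) -> dict:
    """Pull search queries and web sources out of a Gemini groundingMetadata dict."""
    queries = metadata.get("webSearchQueries", [])
    sources = [
        # Each grounding chunk carries the title and URI of a search result.
        {"title": chunk["web"].get("title"), "uri": chunk["web"].get("uri")}
        for chunk in metadata.get("groundingChunks", [])
        if "web" in chunk
    ]
    return {"queries": queries, "sources": sources}
```

The returned queries are handy for observability logs, and the source URIs are what you'd surface as links or feed into a RAG step.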
youtu.be/oL9w-3Hbag0
Voice AI programming with Gemini² and Cursor
Adrian built a Gemini voice + vision AI agent that writes software indirectly, collaborating with a human and with Gemini running inside Cursor. Really nice glimpse of the future (and nice example of a "multi-agent" architecture).
youtu.be/0VFZWZfU0vw
Pipecat 0.0.53 release is out today.
31 entries in the Changelog, including:
- Frame observers: for implementing loggers, debuggers, and pipeline tools
- Heartbeat frames: pipeline traversal timing, and warnings if system frames get blocked anywhere in the pipeline
github.com/pipecat-ai/p...
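The observer idea is simple: register an object that gets notified for every frame pushed through the pipeline. Here's a toy sketch of the pattern; this is not Pipecat's actual base class or callback signature, so see the linked changelog for the real API:

```python
class LoggingObserver:
    """Toy frame observer: logs every frame that moves through the pipeline.

    Hypothetical sketch; the real Pipecat observer base class and callback
    signature may differ.
    """

    def on_push_frame(self, source: str, frame: object) -> None:
        print(f"{source}: {type(frame).__name__}")


class Pipeline:
    """Minimal stand-in pipeline that notifies observers on every frame push."""

    def __init__(self, observers=None):
        self.observers = observers or []

    def push(self, source: str, frame: object) -> None:
        # Notify every registered observer before (conceptually) forwarding
        # the frame to the next processor in the pipeline.
        for obs in self.observers:
            obs.on_push_frame(source, frame)
```

The value of the pattern is that loggers and debuggers see every frame without being wired into the pipeline's processing path.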
Search is a built-in tool in the Gemini Multimodal Live API.
Here's an iOS starter project that shows:
- how to use the Gemini search built-in tool
- combining the built-in search with custom functions
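As a sketch, combining the two kinds of tools amounts to listing both in the session's tool config. The key names below follow Gemini's REST conventions and the `get_weather` function is purely illustrative; check the linked starter project for the real setup:

```python
# Hypothetical tool config for a Gemini session that enables both the
# built-in search tool and a custom function. Verify the exact key
# casing (snake_case vs camelCase) against your SDK version.
tools = [
    {"google_search": {}},  # built-in search grounding
    {
        "function_declarations": [
            {
                "name": "get_weather",  # illustrative custom function
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name."}
                    },
                    "required": ["city"],
                },
            }
        ]
    },
]
```

The model then picks per turn between searching the web and calling your function, based on the descriptions alone.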
Here's the code: github.com/pipecat-ai/p...
youtube.com/shorts/7jX7l...
Maslow's hierarchy of voice AI
u are here ⤵️
◻️◻️◻️◻️🟦◻️◻️◻️◻️
◻️◻️◻️🟦🟦🟦◻️◻️◻️
◻️◻️🟦🟦🟦🟦🟦◻️◻️
◻️🟦🟦🟦🟦🟦🟦🟦◻️
🟦🟦🟦🟦🟦🟦🟦🟦🟦
Network transport ▶️ Turn detection ▶️ Interruption handling ▶️ Natural voices ▶️ Tool use
www.youtube.com/watch?v=tAQW...
"Voice AI in 2025" panel recording from the Voice AI Meetup last night.
Thank you to panelists Karan Goel, Niamh Gavin, Shrestha Basu Mallick, and Swyx.
And thank you to Chroma for hosting the meetup in their fantastic office in SF.
www.youtube.com/live/B6zTwHh...
Christian built a voice AI assistant to control Spotify.
Tech:
- Google Gemini 2.0
- Pipecat
- Deepgram
- Cartesia
Code is here: github.com/pipecat-ai/s...
youtu.be/q6v-3BQem3Y
Gemini Multimodal Live API + iOS + WebRTC
Nice walk-through from Paul: youtu.be/nU3K8h_pkeQ
- set up a voice client in your iOS app
- specify WebSockets or WebRTC for network transport
- attach a delegate to handle lifecycle events (for example "connected", "LLM ready")
I listen to a fair amount of bluegrass, and alt country that overlaps with bluegrass, and I love a lot of mainstream country that overlaps with alt country!
Also, of course, the hip-hop and r&b of my youth, and hip-hop and r&b today that reminds me of the hip-hop and r&b of my youth.
Sunday morning listening ... and hacking.
12.01.2025 14:52
Today's reminder of how early we are in the generative AI / deep learning technology transition: I moved a moderately complex prompt to a different LLM and 150% of my evals broke. 150% because evals I didn't even have (but obviously needed) broke, too.
11.01.2025 17:48
Oh, wait. I take it back.
10.01.2025 23:15
They know what they're doing over there in Cupertino (and Shenzhen).
10.01.2025 23:13
The Simple Chatbot iOS example code is here:
github.com/pipecat-ai/p...
Clone the repo -> add your API keys to the .env file -> build -> run on your phone!
iOS + Gemini Multimodal Live + WebRTC
Filipi Fuchter added an iOS example to the Pipecat "Simple Chatbot" repo. With the Pipecat iOS SDK, you can build apps that use Gemini Multimodal Live and Gemini Flash with WebRTC, WebSockets, and HTTP networking.
I had a lot of fun talking to Eric Landau about the state of Voice AI at the end of 2024, what's coming in 2025, what the pain points are today if you're scaling voice AI agents in production, and โ of course โ the importance of data tooling and evals.
open.spotify.com/episode/5Fjj...