🇺🇦 Alex Polozov @alexpolozov.com

"Yes, them too" is looking increasingly likely.

02.02.2025 20:14 — 👍 3 🔁 0 💬 1 📌 0

Forgive me if I'm not up to date on every new shiny reasoning model. My emotional bandwidth these days is spent on checking if my parents are being deported out of the country 😡

02.02.2025 20:04 — 👍 12 🔁 0 💬 2 📌 0

I feel so old.

08.01.2025 00:38 — 👍 4 🔁 0 💬 0 📌 0

We don't appreciate SWE-bench, and the way it moved lab discourse, enough.

Sure, it's only Python, SWE life is more than bugfixing, not all code can be tested...

But.

A benchmark is really a model of your target use case. And, as we know, all models are wrong but some are useful 😉

19.12.2024 21:38 — 👍 7 🔁 0 💬 0 📌 0

I'm so excited for the world where every engineering team or sole builder can focus their time on creation, not bugfixing or maintenance 😍

Come chat at NeurIPS! Stop by the Google DeepMind booth from 10-11am to chat 1:1, or from 12-1pm to meet many members of the Gemini team.

11.12.2024 16:16 — 👍 2 🔁 0 💬 0 📌 0

We really went all-in on exploiting the strengths of Flash 🚀 The agent samples hundreds of candidates, explores multiple tool-assisted strategies for repo understanding and editing, and validates its own work every step of the way. All dirt cheap thanks to Flash!

11.12.2024 16:16 — 👍 0 🔁 0 💬 1 📌 0

The research version of Jules, powered by Gemini 2.0 Flash, achieved an impressive 51.8% on SWE-bench Verified, beating much more expensive models.

Work is underway to integrate this into Jules as we speak. Also hope to make an official SWE-bench submission once the team gets a moment to rest 😅

11.12.2024 16:16 — 👍 0 🔁 0 💬 1 📌 0

🦑 Jules, an experimental AI assistant for SWEs, works with you to accomplish tasks you would rather offload. It gets feedback on every step of the plan, understands your codebase patterns, and validates its own work.

In preview with trusted testers today. Aiming to make it broadly available soon!

11.12.2024 16:16 — 👍 0 🔁 0 💬 1 📌 0

The next chapter of the Gemini era for developers
Explore the latest with the release of Gemini 2.0 Flash and new coding agents, now available for testing in Google AI Studio.

Welcome to the Gemini 2.0 era!

I am thrilled about ✨ Gemini 2.0 Flash, as it allowed us to build the next-generation Code Agents experience: developers.googleblog.com/en/the-next-...

11.12.2024 16:16 — 👍 9 🔁 4 💬 1 📌 0

H100s don't grow on trees, you know. #NeurIPS2024

11.12.2024 01:32 — 👍 4 🔁 0 💬 1 📌 0

Hello Vancouver!
For the NeurIPS week, they should've replaced this ⭐ with a ✨ lol.

11.12.2024 00:36 — 👍 1 🔁 0 💬 0 📌 0

Oh, and I'll also be at #NeurIPS! My first since the pandemic, too. Do I still remember how to shake hands and make eye contact? Is the Vancouver Conv Center map burned into my memory? Let's find out.

✨ Stoked to chat about Gemini, code/SWE agents, and whether our industry is doomed to make itself obsolete.

10.12.2024 14:34 — 👍 4 🔁 0 💬 0 📌 0

*taps mic* Is this thing on?
Well, as good a time for an intro as any 😅

Hello world! I'm Alex. In no particular order:
• research scientist at Google DeepMind
• Gemini SWE Agents co-lead
• Ukrainian
• New Yorker
• movie nerd

Happy to try again on a new forum. Maybe it'll feel like 2019 again 😊

10.12.2024 14:32 — 👍 7 🔁 0 💬 1 📌 0

Latest posts by alexpolozov.com on Bluesky