🇺🇦 Alex Polozov

@alexpolozov.com.bsky.social

Sr. Staff Research Scientist @ Google DeepMind • previously Google X, Microsoft Research, UW • program synthesis, AI for Code and SWE • he/him • alexpolozov.com

1,311 Followers  |  317 Following  |  13 Posts  |  Joined: 11.05.2023

Latest posts by alexpolozov.com on Bluesky

"Yes, them too" is looking increasingly likely.

02.02.2025 20:14 | 👍 3    🔁 0    💬 1    📌 0

Forgive me if I'm not up to date on every new shiny reasoning model. My emotional bandwidth these days is spent on checking if my parents are being deported out of the country 😡

02.02.2025 20:04 | 👍 12    🔁 0    💬 2    📌 0

I feel so old.

08.01.2025 00:38 | 👍 4    🔁 0    💬 0    📌 0

We don't appreciate SWE-bench, and the way it moved lab discourse, enough.

Sure, it's only Python, SWE life is more than bugfixing, not all code can be tested...

But.

A benchmark is really a model of your target use case. And, as we know, all models are wrong but some are useful 😉

19.12.2024 21:38 | 👍 7    🔁 0    💬 0    📌 0

I'm so excited for the world where every engineering team or sole builder can focus their time on creation, not bugfixing or maintenance 😍

Come chat at NeurIPS! Stop by the Google DeepMind booth at 10-11am to chat 1:1, or 12-1pm to meet many members of the Gemini team.

11.12.2024 16:16 | 👍 2    🔁 0    💬 0    📌 0

We really went all-in on exploiting the strengths of Flash 🚀 The agent samples hundreds of candidates, explores multiple tool-assisted strategies for repo understanding and editing, and validates its own work every step of the way. All dirt cheap thanks to Flash!

11.12.2024 16:16 | 👍 0    🔁 0    💬 1    📌 0

The research version of Jules powered by Gemini 2.0 Flash accomplished an impressive 51.8% on SWE-bench Verified, beating much more expensive models.

Work is underway to integrate this into Jules as we speak. Also hope to make an official SWE-bench submission once the team gets a moment to rest 😅

11.12.2024 16:16 | 👍 0    🔁 0    💬 1    📌 0

🦑 Jules, an experimental AI assistant for SWEs, works with you to accomplish tasks you would rather offload. It gets feedback on every step of the plan, understands your codebase patterns, and validates its own work.

In preview with trusted testers today. Aiming to make it broadly available soon!

11.12.2024 16:16 | 👍 0    🔁 0    💬 1    📌 0
Preview
The next chapter of the Gemini era for developers
Explore the latest with the release of Gemini 2.0 Flash and new coding agents, now available for testing in Google AI Studio.

Welcome to the Gemini 2.0 era!

I am thrilled about ✨ Gemini 2.0 Flash as it allowed us to build the next generation of Code Agents experience: developers.googleblog.com/en/the-next-...

11.12.2024 16:16 | 👍 9    🔁 4    💬 1    📌 0
Post image

H100s don't grow on trees, you know. #NeurIPS2024

11.12.2024 01:32 | 👍 4    🔁 0    💬 1    📌 0
Post image

Hello Vancouver!
For the NeurIPS week, they should've replaced this ⭐ with a ✨ lol.

11.12.2024 00:36 | 👍 1    🔁 0    💬 0    📌 0

Oh, and I'll also be at #NeurIPS! First after the pandemic, too. Do I still remember to shake hands and make eye contact? Is the Vancouver Conv Center map burned into my memory? Let's find out.

✨ Stoked to chat about Gemini, code/SWE agents, and whether our industry is doomed to obsolete ourselves.

10.12.2024 14:34 | 👍 4    🔁 0    💬 0    📌 0

*taps mic* Is this thing on?
Well, as good a time for an intro as any 😅

Hello world! I'm Alex. In no particular order:
• research scientist at Google DeepMind
• Gemini SWE Agents co-lead
• Ukrainian
• New Yorker
• movie nerd

Happy to try again on a new forum. Maybe it'll feel like 2019 again 😊

10.12.2024 14:32 | 👍 7    🔁 0    💬 1    📌 0
