"Yes, them too" is looking increasingly likely.
02.02.2025 20:14
@alexpolozov.com.bsky.social
Sr. Staff Research Scientist @ Google DeepMind • previously Google X, Microsoft Research, UW • program synthesis, AI for Code and SWE • he/him • alexpolozov.com
"Yes, them too" is looking increasingly likely.
Forgive me if I'm not up to date on every new shiny reasoning model. My emotional bandwidth these days is spent on checking if my parents are being deported out of the country.
02.02.2025 20:04
I feel so old.
08.01.2025 00:38
We don't appreciate SWE-bench, and the way it moved lab discourse, enough.
Sure, it's only Python, SWE life is more than bugfixing, not all code can be tested...
But.
A benchmark is really a model of your target use case. And, as we know, all models are wrong but some are useful.
I'm so excited for the world where every engineering team or sole builder can focus their time on creation, not bugfixing or maintenance.
Come chat at NeurIPS! Stop by the Google DeepMind booth at 10-11am to chat 1:1, or 12-1pm to meet many members of the Gemini team.
We really went all-in on exploiting the strengths of Flash. The agent samples hundreds of candidates, explores multiple tool-assisted strategies for repo understanding and editing, and validates its own work every step of the way. All dirt cheap thanks to Flash!
11.12.2024 16:16
The research version of Jules powered by Gemini 2.0 Flash accomplished an impressive 51.8% on SWE-bench Verified, beating much more expensive models.
Work is underway to integrate this into Jules as we speak. Also hope to make an official SWE-bench submission once the team gets a moment to rest.
Jules, an experimental AI assistant for SWEs, works with you to accomplish tasks you would rather offload. It gets feedback on every step of the plan, understands your codebase patterns, and validates its own work.
In preview with trusted testers today. Aiming to make it broadly available soon!
Welcome to Gemini 2.0 era!
I am thrilled about ✨ Gemini 2.0 Flash as it allowed us to build the next generation of Code Agents experience: developers.googleblog.com/en/the-next-...
H100s don't grow on trees, you know. #NeurIPS2024
11.12.2024 01:32
Hello Vancouver!
For the NeurIPS week, they should've replaced this [emoji] with a ✨ lol.
Oh, and I'll also be at #NeurIPS! First after the pandemic, too. Do I still remember to shake hands and make eye contact? Is the Vancouver Conv Center map burned into my memory? Let's find out.
✨ Stoked to chat about Gemini, code/SWE agents, and whether our industry is doomed to obsolete ourselves.
*taps mic* Is this thing on?
Well, as good a time for an intro as any.
Hello world! I'm Alex. In no particular order:
• research scientist at Google DeepMind
• Gemini SWE Agents co-lead
• Ukrainian
• New Yorker
• movie nerd
Happy to try again on a new forum. Maybe it'll feel like 2019 again.