Aman

@amangotchu.bsky.social

Founder @ firebender.com

Joined: 15.01.2025

Posts by Aman (@amangotchu.bsky.social)

Kotlin-bench V2: Agentic LLM Evaluation
Evaluating AI models on real-world Kotlin & Android tasks using Firebender's agentic harness with IDE tool integration.

View all the results on our blog and GitHub:
firebender.com/blog/kotlin-...
github.com/firebenders/...

08.01.2026 21:04

Introducing Kotlin-bench V2, the first benchmark that evaluates agentic coding models for Android and Kotlin. 🚀

Claude Opus 4.5 is the most intelligent model for Android.

Gemini 3 Flash delivers the best cost-to-intelligence ratio: strong intelligence at roughly 10x lower cost.

08.01.2026 21:03
Firebender - Most powerful AI assistant in Android Studio
Write code 10x faster with Firebender, the most powerful AI assistant for Android Studio.

View the full leaderboard here: firebender.com/leaderboard

Try out Claude 4 Sonnet in Firebender: plugins.jetbrains.com/plugin/25224...

23.05.2025 20:49

Claude 4 Sonnet is officially the best AI model for Android and Kotlin development 🚀

Claude 4 Sonnet solved 26% of Kotlin-bench tasks, outperforming OpenAI's o3.

Claude 4 Sonnet & Opus are available in Firebender today for all users of JetBrains IDEs. Try them out and let us know what you think!

23.05.2025 20:48

o4-mini and o3 are now available in Firebender 0.9.20 on Android Studio/IntelliJ.

Agent benchmarks for Kotlin-bench are coming soon.

16.04.2025 18:01

Updated our Kotlin-bench leaderboard with results for Grok 3 and GPT-4.1!

TL;DR: Grok 3 is a very capable coding model for Android & Kotlin development. GPT-4.1 shows improvement but still trails other major competitors.

See the full leaderboard here:
firebender.com/leaderboard

15.04.2025 17:21

I just released Kotlin-bench, the first-ever benchmark that evaluates LLMs against real-world Kotlin & Android GitHub issues.

Gemini 2.5 topped the leaderboard, solving 14% of issues, with Claude 3.7 (thinking) in 2nd place at 12%.

Code, datasets, and results here: firebender.com/blog/kotlin-...

04.04.2025 01:56

The future of Android development will be multiple agents that write code, fix bugs, and implement UI changes from Figma, all autonomously. Firebender 0.9.6 proves this, and here's why:

03.03.2025 23:23

First impressions after about an hour of using it.

1. Absolutely love how it fixes its own errors. 😂
2. Autocomplete feels much faster than Copilot.
3. Very eager to make changes outside the scope of the file I'm working on. Might be user error, and might be fixable with rules.

03.03.2025 10:00

DMing

03.03.2025 01:26

Try it out and let us know what you think! We just made it IntelliJ-compatible in addition to our standard Android Studio offering.

Very curious to hear how it helps with your KMP/CMP work!

02.03.2025 18:39

Yep!!

You can specify guidelines and rules for how you want the AI to write tests, what architecture pattern you expect, and more.

More info here:
docs.firebender.com/context/rules

02.03.2025 18:38