Massimo Caccia

@masscaccia.bsky.social

Research Scientist at ServiceNow. Gradient-descent enthusiast building LLM agents. Formerly Mila, DeepMind, Amazon, Element AI, Spotify.

151 Followers  |  477 Following  |  5 Posts  |  Joined: 20.11.2024

Posts by Massimo Caccia (@masscaccia.bsky.social)

Pretty cool people are being added to the LLM Agent & LLM Reasoning group. Thanks @lisaalaz.bsky.social for suggesting @jhamrick.bsky.social @gabepsilon.bsky.social and others.

Feel free to mention yourself and others. :)

go.bsky.app/LUrLWXe

#LLMAgents #LLMReasoning

23.11.2024 19:36 | 👍 10 | 🔁 1 | 💬 8 | 📌 0

Nice starter pack! Would love to be added. I develop LLM agent benchmarks (WorkArena, BrowserGym) and tools to design LLM agents.

See x.com/alex_lacoste...

Or x.com/alex_lacoste...

16.12.2024 17:36 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

We're really excited to release this large collaborative work, which unifies web agent benchmarks under one roof.

In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also present some unexpected results from Claude 3.5 Sonnet.

12.12.2024 17:55 | 👍 20 | 🔁 11 | 💬 1 | 📌 2

I finally created my first starter pack for #buildinpublic, #indiehacker, and #founder folks who are building in the AI agent space.

If you are building AI agent tools, I'd be happy to include you 😁

💡 Share what you're building in the comments
🧡 Like and repost for visibility

go.bsky.app/JPx5hfV

30.11.2024 13:24 | 👍 15 | 🔁 2 | 💬 9 | 📌 0

Would love to be added :)
I build web agents, in particular open-source packages for developing and evaluating web agents.
e.g. x.com/alex_lacoste...

09.12.2024 00:41 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

I've created a starter pack of researchers working on digital agents (focusing on web, mobile and OS agents).

I am missing a lot, and many are not on bsky yet, so if I missed you or someone you know, please send me a DM with the link to a relevant paper and I will update the starter pack!

05.12.2024 19:21 | 👍 10 | 🔁 2 | 💬 1 | 📌 1

I finally created my first starter pack for #buildinpublic, #indiehacker, and #founder folks who are building tools related to AI and LLMs.

If you are building AI tools, I'd be happy to include you 😁

💡 Share what you're building in the comments
🧡 Like and repost for visibility

go.bsky.app/UcofkF4

25.11.2024 22:23 | 👍 43 | 🔁 19 | 💬 33 | 📌 1

Would love to be added :)
I build web agents, in particular open-source packages for developing and evaluating web agents.
e.g. x.com/alex_lacoste...

08.12.2024 00:30 | 👍 1 | 🔁 0 | 💬 0 | 📌 0

Would love to be added :)
I build web agents, in particular open-source packages for developing and evaluating web agents.

08.12.2024 00:29 | 👍 2 | 🔁 0 | 💬 1 | 📌 0

Hello! Mila alum (one year) here ✋ Would love to be added.

08.12.2024 00:26 | 👍 0 | 🔁 0 | 💬 0 | 📌 0
AgentLab diagram.

The image describes AgentLab, a framework for efficient parallel experiments with agents. It highlights:

Core Agent Features:
Dynamic Prompting and a Unified LLM API for interacting with large language models.

BrowserGym Platform:
A tool for testing agents on benchmarks like WebArena, WorkArena, MiniWoB, and others.

Key Features:
Reproducibility, a Unified Leaderboard, an analysis tool called Xray, and a Dataset for sharing agent traces.

Blue elements represent AgentLab components.

🧡-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.

03.12.2024 21:02 | 👍 18 | 🔁 15 | 💬 2 | 📌 0