
Jonathan Bragg

@jbragg.bsky.social

Leading agents R&D at AI2. AI & HCI research scientist. https://jonathanbragg.com

258 Followers  |  24 Following  |  7 Posts  |  Joined: 05.10.2023

Latest posts by jbragg.bsky.social on Bluesky


Brooke Vlahos, Peter Clark, Doug Downey, @yoavgo.bsky.social, Ashish Sabharwal, Daniel S. Weld

06.11.2025 17:01 | 👍 0 | 🔁 0 | 💬 0 | 📌 0

Amanpreet Singh, Harshit Surana, Aryeh Tiktinsky, Rosni Vasu, @guywiener.bsky.social, Chloe Anastasiades, Stefan Candra, Jason Dunkelberger, Dan Emery, Rob Evans, Malachi Hamada, Regan Huff, Rodney Kinney, Matt Latzke, Jaron Lochner, Ruben Lozano-Aguilera, Cecile Nguyen, Smita Rao, Amber Tanaka...

06.11.2025 17:01 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

🙏 Many thanks to my @ai2.bsky.social teammates: Mike D’Arcy, @nbalepur.bsky.social, Dan Bareket, Bhavana Dalvi, @sergeyf.bsky.social, Dany Haddad, Jena D. Hwang, @peterjansen-ai.bsky.social, Varsha Kishore, Bodhisattwa Majumder, @arnaik19.bsky.social, Sigal Rahamimov, Kyle Richardson...

06.11.2025 17:01 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

We tested 22 agent classes, more *kinds* than other benchmarks

🤖 AgentBaselines makes them reusable, incl. our SOTA science agents: github.com/allenai/agent-baselines

📚 Blog: allenai.org/blog/astabench
📄 Paper: arxiv.org/abs/2510.21652
📊 Leaderboard: huggingface.co/spaces/allenai/asta-bench-leaderboard

06.11.2025 17:01 | 👍 0 | 🔁 0 | 💬 1 | 📌 0

🛠️ AstaBench is the first to provide reproducible (date-limited) large-scale search tools, plus a full scientific research environment for agents.

📊 Our leaderboard highlights agents that use these tools, enabling more controlled measurement of *AI*. (We measure LLM costs too.)

06.11.2025 17:01 | 👍 0 | 🔁 0 | 💬 1 | 📌 0
AstaBench with abstract measurement icons


Agent benchmarks don't measure true *AI* advances

We built one that's hard & trustworthy:
👉 AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems
👉 SOTA results across 22 agent *classes*
👉 AgentBaselines agents suite

🆕 arxiv.org/abs/2510.21652

🧵👇

06.11.2025 17:01 | 👍 7 | 🔁 1 | 💬 1 | 📌 0

@kylelo.bsky.social your gifs are an unapproved manipulation of my human attention

09.10.2025 21:06 | 👍 2 | 🔁 0 | 💬 0 | 📌 0
