Brooke Vlahos, Peter Clark, Doug Downey, @yoavgo.bsky.social, Ashish Sabharwal, Daniel S. Weld
06.11.2025 17:01
Amanpreet Singh, Harshit Surana, Aryeh Tiktinsky, Rosni Vasu, @guywiener.bsky.social, Chloe Anastasiades, Stefan Candra, Jason Dunkelberger, Dan Emery, Rob Evans, Malachi Hamada, Regan Huff, Rodney Kinney, Matt Latzke, Jaron Lochner, Ruben Lozano-Aguilera, Cecile Nguyen, Smita Rao, Amber Tanaka...
06.11.2025 17:01
Many thanks to my @ai2.bsky.social teammates: Mike D'Arcy, @nbalepur.bsky.social, Dan Bareket, Bhavana Dalvi, @sergeyf.bsky.social, Dany Haddad, Jena D. Hwang, @peterjansen-ai.bsky.social, Varsha Kishore, Bodhisattwa Majumder, @arnaik19.bsky.social, Sigal Rahamimov, Kyle Richardson...
06.11.2025 17:01
We tested 22 agent classes, more *kinds* than other benchmarks.
AgentBaselines makes them reusable, incl. our SOTA science agents: github.com/allenai/agent-baselines
Blog: allenai.org/blog/astabench
Paper: arxiv.org/abs/2510.21652
Leaderboard: huggingface.co/spaces/allenai/asta-bench-leaderboard
06.11.2025 17:01
AstaBench is the first to provide reproducible (date-limited) large-scale search tools, plus a full scientific research environment for agents.
Our leaderboard highlights agents that use these tools, enabling more controlled measurement of *AI*. (We measure LLM costs too.)
06.11.2025 17:01
[Image: AstaBench with abstract measurement icons]
Agent benchmarks don't measure true *AI* advances
We built one that's hard & trustworthy:
AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems
SOTA results across 22 agent *classes*
AgentBaselines agent suite
arxiv.org/abs/2510.21652
🧵
06.11.2025 17:01
@kylelo.bsky.social your gifs are an unapproved manipulation of my human attention
09.10.2025 21:06
CS PhD Student. Trying to find that dog in me at UMD. Babysitting (aligning) + Bullying (evaluating) LLMs
nbalepur.github.io
jmhessel.com
@Anthropic. Seattle bike lane enjoyer. Opinions my own.
Research Scientist at Google DeepMind, in the People + AI Research (PAIR) team.
savvaspetridis.github.io
Breakthrough AI to solve the world's biggest problems.
► Join us: http://allenai.org/careers
► Get our newsletter: https://share.hsforms.com/1uJkWs5aDRHWhiky3aHooIg3ioxm
Hummus, people, and data. Co-Founder & CTO of B12. Previously Locu, MIT CSAIL. He/him.
https://marcua.net/
Queens is the future.
Associate Professor of Computer Science, Virginia Tech
HCI Professor, UW CS
Director, makeabilitylab.cs.uw.edu
Co-founder, projectsidewalk.org
Visiting Researcher, Google Research
Assistant Professor at BU CDS
EconCS | Theory of CS | MD+AI+DS4SG | MD4SG co-founder
Previously Columbia, UW, Oberlin. Views are mine alone.
www.kiragoldner.com
Ex NY Times, now author of Substack Paul Krugman. Nobel laureate and, according to Donald Trump, "Deranged BUM"
Economics editor at The Bulwark. MS NOW (formerly MSNBC) anchor.
Previously WaPo op-ed columnist and NYT reporter.
Econ, politics, immigration, tax, etc. + occasional theater nerdery.
Assistant Professor at the Polaris Lab @ Princeton (https://www.polarislab.org/); Researching: RL, Strategic Decision-Making+Exploration; AI+Law
PhD Student @MIT | Previous @allen_ai | #NLP #HCI | www.szj.io
Associate prof at @UMich in SI and CSE working in computational social science and natural language processing. PI of the Blablablab blablablab.si.umich.edu
Assistant Professor @ the University of Washington iSchool | formerly an Innovator in Residence @ Library of Congress | essays in WIRED, Gawker, The New Republic, Longreads, Current Affairs, etc.
www.bcglee.com
Research Scientist at the Allen Institute for AI (AI2), interested in information extraction, NLP for healthcare and transfer learning, PhD from CMU LTI. Website: https://www.cs.cmu.edu/~anaik/
Asst Prof @uwischool.bsky.social; #NLP #healthinformatics #accessibility #scholcomm
in Seattle; llwang.net; she/her
Associate professor of social computing at UW CSE, leading @socialfutureslab.bsky.social
social.cs.washington.edu
asst prof of computer science at cu boulder
nlp, cultural analytics, narratives, communities
books, bikes, games, art
https://maria-antoniak.github.io