
Andrew Wang

@andrewwnlp.bsky.social

PhD student @jhuclsp.bsky.social

366 Followers  |  40 Following  |  4 Posts  |  Joined: 18.11.2024

Latest posts by andrewwnlp.bsky.social on Bluesky

Thanks to my collaborators Sophia Hager, Adi Asija, Nick Andrews, and @danielkhashabi.bsky.social at @jhuclsp.bsky.social !

Arxiv: arxiv.org/abs/2508.11027
Code: github.com/JHU-CLSP/hell-or-high-water
(Data coming soon!)

19.09.2025 14:06 — 👍 0    🔁 0    💬 0    📌 0

More tools = worse at handling tool failures

When tool schemas are provided in-context, we find that the performance gap between adversarial and non-adversarial settings increases with the number of schemas.

19.09.2025 14:05 — 👍 0    🔁 0    💬 1    📌 0

LLM agents do not handle tool failures well

With RAG on tool schemas, we observe a substantial performance gap between adversarial and non-adversarial settings.

19.09.2025 14:04 — 👍 0    🔁 0    💬 1    📌 0

Tools break in the real world all the time, but not much attention has been given to how well LLMs deal with tool failures.

We introduce HOHW, a tool-use benchmark where problems remain solvable even when tools break adversarially.

19.09.2025 14:04 — 👍 1    🔁 1    💬 1    📌 0
