Gabi Stanovsky @gabistanovsky

Latest posts by gabistanovsky.bsky.social on Bluesky

The 5th Generation, Evaluation, and Metrics (GEM) Workshop will be at #ACL2026!

Call for papers is out. Topics include:
🐟 LMs as evaluators
🐠 Living benchmarks
🍣 Eval with humans
and more

New for 2026: Opinion & Statement Papers!

Full CFP: gem-workshop.com/call-for-pap...

27.01.2026 19:17 — 👍 21 🔁 7 💬 0 📌 1

🚨New paper alert🚨

🧠
Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing?

Excited to share our new paper, accepted to CoLM 2025🎉!
See thread below 👇
#BiasInAI #LLMs #MachineLearning #NLProc

15.07.2025 13:38 — 👍 4 🔁 1 💬 1 📌 1

Can RAG performance get *worse* with more relevant documents?📄
We put the number of retrieved documents in RAG to the test!
💥Preprint💥: arxiv.org/abs/2503.04388
1/3

11.03.2025 14:32 — 👍 3 🔁 3 💬 2 📌 0

🚨New arXiv preprint!🚨
LLMs can hallucinate, but did you know they can do so with high certainty even when they know the correct answer? 🤯
We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov

19.02.2025 15:50 — 👍 21 🔁 10 💬 3 📌 2

GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you.

Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work.

CfP can be found at gem-benchmark.com/workshop

12.02.2025 14:25 — 👍 9 🔁 5 💬 0 📌 1

A vote to stop defining what LLMs are at the start of every paper

06.02.2025 08:30 — 👍 1 🔁 0 💬 0 📌 0

Joint work with @rkeydar.bsky.social, Gadi Perl, @eliyahabba.bsky.social
We hope this will help spur a much-needed multidisciplinary discussion about realistic regulation measures. Happy to hear your thoughts!

03.02.2025 08:04 — 👍 2 🔁 0 💬 0 📌 0

There's a lot of talk about regulating AI, but do regulators know the technology well enough?
In our new paper, we survey major reg efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it?
arxiv.org/pdf/2501.15693

03.02.2025 08:04 — 👍 0 🔁 2 💬 1 📌 2

@gabistanovsky is following 20 prominent accounts

@shaharl6000

@evavnmssnhv

Hamish Ivison
@hamishivi

I (try to) do NLP research. Antipodean abroad. Currently doing PhD @uwcse, prev @usyd @ai2 🇦🇺🇨🇦🇬🇧 ivison.id.au

Joseph Chang
@josephc

https://josephcc.com

Ben Newman
@benn9

NLP research - PhD student at UW

Amy Zhang
@axz

Associate professor of social computing at UW CSE, leading @socialfutureslab.bsky.social social.cs.washington.edu

Ronen Tamari
@ronentk.me

Sewon Min
@sewonm

UC Berkeley/BAIR, AI2 || Prev: UWNLP, Meta/FAIR || sewonmin.com

Ananya Harsh Jha
@ananyahjha93

https://ananyahjha93.github.io Second-year PhD at @uwcse.bsky.social with @hanna-nlp.bsky.social and @lukezettlemoyer.bsky.social

Harsh Trivedi
@harsh3vedi

🤖 Building AI agents & interactive environments: 🌍 AppWorld (https://appworld.dev) #NLProc PhD @stonybrooku. Past intern Allen AI & visitor CILVR at NYU. 🐦 https://x.com/harsh3vedi 🌐 https://harshtrivedi.me/

Mechanical Dirk
@mechanicaldirk

Training big models at @ai2.bsky.social.

Rose E Wang
@rewang

https://cs.stanford.edu/~rewang AI & Education ✨ On academic+industry job market. CS PhD @stanfordnlp prev: MIT 2020, Google Brain, Google Brain Robotics, @allen_ai

Pao Siangliulue
@paopow

🎒 {Creativity, AI, People} | HCI researcher & software eng | @allenai

Akari Asai
@akariasai

Ph.D. student at University of Washington CSE. NLP. IBM Ph.D. fellow (2022-2023). Meta student researcher (2023-). ☕️ 🐕 🏃‍♀️🧗‍♀️🍳

Hyunwoo Kim
@hyunwoo-kim

Social Reasoning/Cognition + AI, Postdoc at NVIDIA | Previously @ai2.bsky.social | PhD from Seoul Natl Univ. http://hyunwookim.com

Maarten Sap
@maartensap

Working on #NLProc for social good. Currently at LTI at CMU. 🏳‍🌈

Bill Yuchen Lin
@billyuchenlin

Research Scientist @allen_ai & Affiliate Assistant Prof @UW; Researching LLM alignment, eval, synthetic data, reasoning, agents. Ex: Google, Meta FAIR.

Sameer Singh
@sameer-singh

CS Prof at UC Irvine, CTO/Cofounder at Envive AI. Work on evaluation and robustness of LLMs

Fangyuan Xu
@fangyuanxu

许方园👩🏻‍💻phd student @ nyu, interested in natural language processing 🌍: carriex.github.io

Chaitanya Malaviya
@cmalaviya

Senior research scientist @ GoogleDeepMind | benchmarking and evaluation | prev @upenn.edu @ai2.bsky.social, and @ltiatcmu.bsky.social chaitanyamalaviya.github.io