
Gabi Stanovsky

@gabistanovsky.bsky.social

Assistant professor at the Hebrew University.

299 Followers | 224 Following | 3 Posts | Joined: 11.11.2024

Latest posts by gabistanovsky.bsky.social on Bluesky


🚨New paper alert🚨

🧠
Instruction-tuned LLMs show amplified cognitive biases. But are these new behaviors, or pretraining ghosts resurfacing?

Excited to share our new paper, accepted to CoLM 2025🎉!
See thread below 👇
#BiasInAI #LLMs #MachineLearning #NLProc

15.07.2025 13:38 | 👍 4  🔁 1  💬 1  📌 1

Can RAG performance get *worse* with more relevant documents? 📄
We put the number of retrieved documents in RAG to the test!
💥Preprint💥: arxiv.org/abs/2503.04388
1/3

11.03.2025 14:32 | 👍 3  🔁 3  💬 2  📌 0

🚨New arXiv preprint!🚨
LLMs can hallucinate, but did you know they can do so with high certainty even when they know the correct answer? 🤯
We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov

19.02.2025 15:50 | 👍 21  🔁 10  💬 3  📌 2

GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you.

Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work.

CfP can be found at gem-benchmark.com/workshop

12.02.2025 14:25 | 👍 9  🔁 5  💬 0  📌 1

A vote to stop defining what LLMs are at the start of every paper

06.02.2025 08:30 | 👍 1  🔁 0  💬 0  📌 0

Joint work with @rkeydar.bsky.social, Gadi Perl, @eliyahabba.bsky.social
We hope this will help spur a much-needed multidisciplinary discussion about realistic regulation measures. Happy to hear your thoughts!

03.02.2025 08:04 | 👍 2  🔁 0  💬 0  📌 0

There's a lot of talk about regulating AI, but do regulators know the technology well enough?
In our new paper, we survey major reg efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it?
arxiv.org/pdf/2501.15693

03.02.2025 08:04 | 👍 0  🔁 2  💬 1  📌 2

@gabistanovsky is following 20 prominent accounts