🚨New paper alert🚨
Instruction-tuned LLMs show amplified cognitive biases - but are these new behaviors, or pretraining ghosts resurfacing?
Excited to share our new paper, accepted to CoLM 2025!
See thread below 👇
#BiasInAI #LLMs #MachineLearning #NLProc
15.07.2025 13:38
Can RAG performance get *worse* with more relevant documents?
We put the number of retrieved documents in RAG to the test!
🔥Preprint🔥: arxiv.org/abs/2503.04388
1/3
11.03.2025 14:32
🚨New arXiv preprint!🚨
LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🤯
We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov
19.02.2025 15:50
GEM is so back! Our workshop for Generation, Evaluation, and Metrics is coming to an ACL near you.
Evaluation in the world of GenAI is more important than ever, so please consider submitting your amazing work.
CfP can be found at gem-benchmark.com/workshop
12.02.2025 14:25
A vote to stop defining what LLMs are at the start of every paper
06.02.2025 08:30
Joint work with @rkeydar.bsky.social, Gadi Perl, @eliyahabba.bsky.social
We hope this will help spur a much needed multidisciplinary discussion about realistic regulation measures. Happy to hear your thoughts!
03.02.2025 08:04
There's a lot of talk about regulating AI, but do regulators know the technology well enough?
In our new paper, we survey major reg efforts & find they rely on benchmarking, which we know to be problematic. How did this happen & what can we do about it?
arxiv.org/pdf/2501.15693
03.02.2025 08:04
I (try to) do NLP research. Antipodean abroad.
currently doing PhD @uwcse,
prev @usyd @ai2
🇦🇺🇨🇦🇬🇧
ivison.id.au
NLP research - PhD student at UW
Associate professor of social computing at UW CSE, leading @socialfutureslab.bsky.social
social.cs.washington.edu
Researcher & entrepreneur | Building a collective sensemaking layer for research @cosmik.network | Stigmergic cognition | https://ronentk.me/ | Prev- Open Science Fellow @asterainstitute.bsky.social
UC Berkeley/BAIR, AI2 || Prev: UWNLP, Meta/FAIR || sewonmin.com
https://ananyahjha93.github.io
Second year PhD at @uwcse.bsky.social with @hanna-nlp.bsky.social and @lukezettlemoyer.bsky.social
🤖 Building AI agents & interactive environments: AppWorld (https://appworld.dev) #NLProc PhD @stonybrooku. Past intern Allen AI & visitor CILVR at NYU.
🐦 https://x.com/harsh3vedi
https://harshtrivedi.me/
Training big models at @ai2.bsky.social.
https://cs.stanford.edu/~rewang
AI & Education ✨ On academic+industry job market. CS PhD @stanfordnlp
prev: MIT 2020, Google Brain, Google Brain Robotics,
@allen_ai
{Creativity, AI, People} | HCI researcher & software eng | @allenai
Ph.D. student at University of Washington CSE. NLP. IBM Ph.D. fellow (2022-2023). Meta student researcher (2023-).
Social Reasoning/Cognition + AI, Postdoc at NVIDIA | Previously @ai2.bsky.social | PhD from Seoul Natl Univ.
http://hyunwookim.com
Working on #NLProc for social good.
Currently at LTI at CMU. 🏳️‍🌈
Research Scientist @allen_ai & Affiliate Assistant Prof @UW; researching LLM alignment, eval, synthetic data, reasoning, agents. Ex: Google, Meta FAIR
CS Prof at UC Irvine, CTO/Cofounder at Envive AI
Work on evaluation and robustness of LLMs
许方园 👩🏻‍💻 PhD student @ NYU, interested in natural language processing
carriex.github.io
Senior research scientist @ GoogleDeepMind | benchmarking and evaluation | prev @upenn.edu @ai2.bsky.social, and @ltiatcmu.bsky.social
chaitanyamalaviya.github.io