Screenshot of first page of paper. It is here: https://arxiv.org/pdf/2507.00828
Abstract: Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this protocol, we collect extensive crowdworker annotations of outputs from a diverse set of topic models on two datasets. We then use these annotations to validate automated proxies, finding that the best LLM proxies are statistically indistinguishable from a human annotator and can therefore serve as a reasonable substitute in automated evaluations.
Evaluating topic models (and document clustering methods) is hard. In fact, since our paper critiquing standard evaluation practices four years ago, there hasn't been a good replacement metric.
That ends today (we hope)! Our new ACL paper introduces an LLM-based evaluation protocol 🧵
08.07.2025 12:40 — 👍 52 🔁 10 💬 3 📌 2
Congrats and have a great start at Mila! 🙂
01.07.2025 22:39 — 👍 1 🔁 0 💬 0 📌 0
🎓 I recently defended my PhD and moved from one dream team at ETH Zurich to another at DeepMind—a huge thank you to the many people who have supported me along the way!
11.06.2025 09:39 — 👍 31 🔁 0 💬 0 📌 0
@vesteinns.bsky.social
04.04.2025 19:50 — 👍 0 🔁 0 💬 0 📌 0
Our paper "A Practical Method for Generating String Counterfactuals" has been accepted to the Findings of NAACL 2025! Joint work with @matan-avitan.bsky.social, @yoavgo.bsky.social and Ryan Cotterell. We propose "Intervention Lens", a technique to explain interventions in natural language. (1/6)
12.02.2025 15:19 — 👍 38 🔁 4 💬 1 📌 2
Are LLMs biased when they write about political issues?
We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before.
Long 🧵with spicy results 👇
13.02.2025 14:08 — 👍 82 🔁 28 💬 4 📌 2
Can we understand and control how language models balance context and prior knowledge? Our latest paper shows it’s all about a 1D knob! 🎛️
arxiv.org/abs/2411.07404
Co-led with @kevdududu.bsky.social - @niklasstoehr.bsky.social, Giovanni Monea, @wendlerc.bsky.social, Robert West & Ryan Cotterell.
22.11.2024 15:49 — 👍 13 🔁 3 💬 1 📌 0
mech interp: bsky.app/starter-pack...
women in nlp: bsky.app/starter-pack...
nlp #1: bsky.app/starter-pack...
nlp #2: bsky.app/starter-pack...
ml/data/tech: bsky.app/starter-pack...
robotics & ai: bsky.app/starter-pack...
19.11.2024 19:22 — 👍 74 🔁 19 💬 7 📌 4
If you’re interested in mechanistic interpretability, I just found this starter pack and wanted to boost it (thanks for creating it @butanium.bsky.social !). Excited to have a mech interp community on bluesky 🎉
go.bsky.app/LisK3CP
19.11.2024 00:28 — 👍 36 🔁 8 💬 3 📌 2
Just launched a Political Comm/NLP/Text-as-Data Starter Pack. 🦋🤗
Join us and/or drop a message to be added!
go.bsky.app/39MWTjg #starterpack #polsci
18.11.2024 15:01 — 👍 30 🔁 10 💬 3 📌 0
Trying to bring ML/NLP/etal people from ETH Zürich together. Ping me to add you. 🙂
bsky.app/starter-pack...
18.11.2024 10:51 — 👍 26 🔁 6 💬 1 📌 0
☝️🤗
17.11.2024 16:47 — 👍 1 🔁 0 💬 1 📌 0
Niklas Stoehr - ACL Anthology
@mginn.bsky.social May I please ask to be added to the list as well? ☺️ Many thanks!
aclanthology.org/people/n/nik...
17.11.2024 10:51 — 👍 1 🔁 0 💬 0 📌 0
PhD student in Machine Learning at ETH Zurich & Max Planck Institute
PhD student in Computer Science and Natural Language Processing at ETH Zürich
Asst Prof of Information @ UMich thinking about assumptions built into AI
Postdoc at Utrecht University, previously PhD candidate at the University of Amsterdam
Multimodal NLP, Vision and Language, Cognitively Inspired NLP
https://ecekt.github.io/
The largest workshop on analysing and interpreting neural networks for NLP.
BlackboxNLP will be held at EMNLP 2025 in Suzhou, China
blackboxnlp.github.io
Welcome to ETH AI Center! We are ethz.ch/en 's central hub leading the way towards trustworthy, accessible and inclusive #artificialintelligence
ai.ethz.ch
PhD student at Vector Institute / University of Toronto. Building tools to study neural nets and find out what they know. He/him.
www.danieldjohnson.com
Associate Professor of Social Networks at ETH Zürich, Switzerland. Co-director of the ETH Social Networks Lab
@socialnetworkslab.bsky.social
ELLIS PhD Fellow @belongielab.org | @aicentre.dk | University of Copenhagen | @amsterdamnlp.bsky.social | @ellis.eu
Multi-modal ML | Alignment | Culture | Evaluations & Safety | AI & Society
Web: https://www.srishti.dev/
#NLP Postdoc at Mila - Quebec AI Institute and McGill University | Former PhD @ University of Copenhagen (CopeNLU)
🌐 karstanczak.github.io
i build esoteric tools to help find patterns in epistemic data.
Assistant Professor at UZH, group leader of the ALPILab 🌼
Working on RL, multi-agent, imitation learning, and other sequential decision-making stuff
Asst Prof. @ UCSD | PI of LeM🍋N Lab | Former Postdoc at ETH Zürich, PhD @ NYU | computational linguistics, NLProc, CogSci, pragmatics | he/him 🏳️‍🌈
alexwarstadt.github.io
PhD student at Cambridge University. Causality & language models. Passionate musician, professional debugger.
pietrolesci.github.io
working on secure agentic AI, CTO @ invariantlabs.ai
PhD @ SRI Lab, ETH Zurich. Also lmql.ai author.