There's a reviewer at ICLR who apparently always writes *exactly* 40 weaknesses and comments no matter what paper he's reviewing.
Exhibit A: openreview.net/forum?id=8qk...
Exhibit B: openreview.net/forum?id=GlX...
Exhibit C: openreview.net/forum?id=kDh...
15.11.2025 14:42 • 8 likes • 2 reposts • 1 reply • 2 quotes
within the next 3-4 days, so sadly that doesn't work
11.11.2025 11:03 • 0 likes • 0 reposts • 0 replies • 0 quotes
*Urgently* looking for emergency reviewers for the ARR October Interpretability track 🙏🙏
ReSkies much appreciated
11.11.2025 10:29 • 1 like • 9 reposts • 1 reply • 0 quotes
We're hiring new faculty members!
KSoC: utah.peopleadmin.com/postings/190... (AI broadly)
Education + AI:
- utah.peopleadmin.com/postings/189...
- utah.peopleadmin.com/postings/190...
Computer Vision:
- utah.peopleadmin.com/postings/183...
07.11.2025 23:35 • 16 likes • 10 reposts • 1 reply • 0 quotes
Thank you Josipa 🥳
07.11.2025 10:58 • 1 like • 0 reposts • 0 replies • 0 quotes
Thank you Gabriele :)
07.11.2025 10:52 • 1 like • 0 reposts • 0 replies • 0 quotes
Very honored to be one of the seven outstanding papers at this year's EMNLP :)
Huge thanks to my amazing collaborators @fatemehc.bsky.social @anamarasovic.bsky.social @boknilev.bsky.social , this would not have been possible without them!
07.11.2025 08:58 • 23 likes • 6 reposts • 2 replies • 2 quotes
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Gabriele Sarti, Vilรฉm Zouhar, Malvina Nissim, Arianna Bisazza. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
Presenting our work "Unsupervised Word-level Quality Estimation Through the Lens of Annotators (Dis)agreement" today at the Machine Translation morning session (Room A301, 11:45 China time). See you there!
Paper: aclanthology.org/2025.emnlp-m...
Slides/video/poster: underline.io/events/502/s...
06.11.2025 01:19 • 5 likes • 2 reposts • 0 replies • 0 quotes
Here's a custom feed for #EMNLP2025. Click the pin to save it to your home screen!
02.11.2025 15:15 • 11 likes • 4 reposts • 0 replies • 0 quotes
literally any book by sally rooney
(jk I know you don't like her)
24.10.2025 14:20 • 1 like • 0 reposts • 0 replies • 0 quotes
⏰ One week left to apply for the two PhD Fellowships in Trustworthy NLP and Explainable NLU! The two positions have a starting date in spring 2026. Check the original post for more details!
24.10.2025 08:30 • 4 likes • 1 repost • 0 replies • 0 quotes
The only benefit of them being humanoid is training data, I guess?
Companies have lots of videos of e.g. factory workers doing repetitive tasks, so you have more signal on the intermediate steps of some actions to train the robot's behavior.
23.10.2025 14:17 • 0 likes • 0 reposts • 0 replies • 0 quotes
📣 Tomorrow at #COLM2025:
1️⃣ Purbid's poster at SoLaR on catching redundant preference pairs & how pruning them hurts accuracy; www.anamarasovic.com/publications...
2️⃣ My talk at XLLM-Reason-Plan on measuring CoT faithfulness by looking at internals, not just behaviorally
1/3
09.10.2025 16:54 • 14 likes • 3 reposts • 1 reply • 1 quote
If you're at COLM, check out various works by Ana and her group!
09.10.2025 16:58 • 3 likes • 0 reposts • 0 replies • 0 quotes
Huge thanks to @adisimhi.bsky.social for leading the work & Jonathan Herzig, @itay-itzhak.bsky.social, Idan Szpektor, @boknilev.bsky.social
ManagerBench:
📄 arxiv.org/pdf/2510.00857
👩‍💻 github.com/technion-cs-...
🌐 technion-cs-nlp.github.io/ManagerBench...
🤗 huggingface.co/datasets/Adi...
08.10.2025 15:14 • 3 likes • 0 reposts • 0 replies • 0 quotes
Here's the twist: LLMs' harm assessments actually align well with human judgments 💯
The problem? Flawed prioritization!
08.10.2025 15:14 • 2 likes • 0 reposts • 1 reply • 0 quotes
The results? Frontier LLMs struggle badly with this trade-off:
Many consistently choose harmful options to achieve operational goals
Others become overly cautious, avoiding harm but becoming ineffective
The sweet spot of safe AND pragmatic? Largely missing!
08.10.2025 15:14 • 2 likes • 0 reposts • 1 reply • 0 quotes
ManagerBench evaluates LLMs on realistic managerial scenarios validated by humans. Each scenario forces a choice:
❌ a pragmatic but harmful action that achieves the goal
✅ a safe action with worse operational performance
➕ control scenarios with only inanimate objects at risk
08.10.2025 15:14 • 2 likes • 0 reposts • 1 reply • 0 quotes
Many works investigate the relationship between LLMs, goals, and safety.
We create realistic management scenarios where LLMs have an explicit motivation to choose the harmful option, while always having a harmless alternative.
08.10.2025 15:14 • 2 likes • 0 reposts • 1 reply • 0 quotes
🤔 What happens when LLM agents must choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid harming humans?
New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs 👇🧵
08.10.2025 15:14 • 8 likes • 2 reposts • 1 reply • 2 quotes
I won't be at COLM, so come see Yonatan talk about our work on estimating CoT faithfulness using machine unlearning!
Check out the thread for the (many) other interesting works from his group 👇
07.10.2025 13:47 • 3 likes • 1 repost • 0 replies • 0 quotes
Here's a #COLM2025 feed!
Pin it ๐ to follow along with the conference this week!
06.10.2025 20:26 • 26 likes • 17 reposts • 2 replies • 1 quote
Josip Jukić, Martin Tutek, Jan Šnajder
Context Parametrization with Compositional Adapters
https://arxiv.org/abs/2509.22158
29.09.2025 07:47 • 1 like • 1 repost • 0 replies • 0 quotes
Adi Simhi, Jonathan Herzig, Martin Tutek, Itay Itzhak, Idan Szpektor, Yonatan Belinkov
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
https://arxiv.org/abs/2510.00857
02.10.2025 06:59 • 1 like • 1 repost • 0 replies • 0 quotes
Opportunities to join my group in fall 2026:
* PhD applications direct or via ELLIS @ellis.eu (ellis.eu/news/ellis-p...)
* Post-doc applications direct or via Azrieli (azrielifoundation.org/fellows/inte...) or Zuckerman (zuckermanstem.org/ourprograms/...)
01.10.2025 13:44 • 3 likes • 1 repost • 0 replies • 0 quotes
What's the right unit of analysis for understanding LLM internals? We explore this in our mech interp survey (a major update of our 2024 manuscript).
Weโve added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!
01.10.2025 14:03 • 40 likes • 14 reposts • 2 replies • 2 quotes
Hints of an OpenReview x Overleaf stealth collab, sharing data of future works? 🤔
30.09.2025 19:19 • 0 likes • 0 reposts • 0 replies • 0 quotes
PhD candidate at University of Mannheim | LLMs and synthetic data
Website: https://maxbenkre.github.io/
PhD student @ Huji | Student Researcher @Google
Interested in the overlap of mech interp & cog comp neuroscience
https://daria-lioubashevski.github.io/
linguist turned NLP researcher, PhD student @cambridgenlp
Explainable AI research from the machine learning group of Prof. Klaus-Robert Müller at @tuberlin.bsky.social & @bifold.berlin
asst prof @Stanford linguistics | director of social interaction lab 🌱 | bluskies about computational cognitive science & language
Professor @milanlp.bsky.social for #NLProc, compsocsci, #ML
Also at http://dirkhovy.com/
Phd student @ University of Mannheim | Social NLP | she/her
Associate Professor at GroNLP ( @gronlp.bsky.social ) #NLP | Multilingualism | Interpretability | Language Learning in Humans vs NeuralNets | Mum^2
Head of the InClow research group: https://inclow-lm.github.io/
PhD student at ILLC / University of Amsterdam, interested in safety, bias, and stereotypes in conversational and generative AI #NLProc
https://veranep.github.io/
He teaches information science at Cornell. http://mimno.infosci.cornell.edu
Researcher trying to shape AI towards positive outcomes. ML & Ethics +birds. Generally trying to do the right thing. TIME 100 | TED speaker | Senate testimony provider | Navigating public life as a recluse.
Former: Google, Microsoft; Current: Hugging Face
PhD student USC NLP working on generalization and reasoning, prev UMassAmherst, IITG (he/him)
Associate Professor in EECS at MIT. Neural nets, generative models, representation learning, computer vision, robotics, cog sci, AI.
https://web.mit.edu/phillipi/
NLP & ML research @cohereforai.bsky.social 🇨🇦
PhD student @ Charles University. Researching evaluation and explainability of reasoning in language models.
Chief scientist at Redwood Research (https://www.redwoodresearch.org/), focused on technical AI safety research to reduce risks from rogue AIs
Research Scientist at Apple for uncertainty quantification.
Assistant Professor at Bocconi University in MilaNLP group • Working in #NLP, #HateSpeech and #Ethics • She/her • #ERCStG PERSONAE