
Martin Tutek

@mtutek.bsky.social

Postdoc @ TakeLab, UniZG | previously: Technion; TU Darmstadt | PhD @ TakeLab, UniZG. Faithful explainability, controllability & safety of LLMs. 🔎 On the academic job market 🔎 https://mttk.github.io/

278 Followers  |  364 Following  |  72 Posts  |  Joined: 24.11.2024

Latest posts by mtutek.bsky.social on Bluesky

There's a reviewer at ICLR who apparently always writes *exactly* 40 weaknesses and comments no matter what paper he's reviewing.

Exhibit A: openreview.net/forum?id=8qk...
Exhibit B: openreview.net/forum?id=GlX...
Exhibit C: openreview.net/forum?id=kDh...

15.11.2025 14:42 — 👍 8    🔁 2    💬 1    📌 2

within the next 3-4 days, so sadly that doesn't work

11.11.2025 11:03 — 👍 0    🔁 0    💬 0    📌 0

*Urgently* looking for emergency reviewers for the ARR October Interpretability track 🙏🙏

ReSkies much appreciated

11.11.2025 10:29 — 👍 1    🔁 9    💬 1    📌 0

Full house at BlackboxNLP at #EMNLP2025!! Getting ready for my 1:45PM keynote 😎 Join us in A102 to learn about "Memorization: myth or mystery?"

09.11.2025 03:04 — 👍 12    🔁 1    💬 0    📌 0

๐™’๐™š'๐™ง๐™š ๐™๐™ž๐™ง๐™ž๐™ฃ๐™œ ๐™ฃ๐™š๐™ฌ ๐™›๐™–๐™˜๐™ช๐™ก๐™ฉ๐™ฎ ๐™ข๐™š๐™ข๐™—๐™š๐™ง๐™จ!

KSoC: utah.peopleadmin.com/postings/190... (AI broadly)

Education + AI:
- utah.peopleadmin.com/postings/189...
- utah.peopleadmin.com/postings/190...

Computer Vision:
- utah.peopleadmin.com/postings/183...

07.11.2025 23:35 — 👍 16    🔁 10    💬 1    📌 0
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps. Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, Yonatan Belinkov. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.

Outstanding paper (5/7):

"Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps"
by Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, and Yonatan Belinkov
aclanthology.org/2025.emnlp-m...

6/n

07.11.2025 22:32 — 👍 11    🔁 3    💬 1    📌 0

Thank you, Josipa 🎉🥳

07.11.2025 10:58 — 👍 1    🔁 0    💬 0    📌 0

Thank you Gabriele :)

07.11.2025 10:52 — 👍 1    🔁 0    💬 0    📌 0

Very honored to be one of the seven outstanding papers at this year's EMNLP :)

Huge thanks to my amazing collaborators @fatemehc.bsky.social @anamarasovic.bsky.social @boknilev.bsky.social, this would not have been possible without them!

07.11.2025 08:58 — 👍 23    🔁 6    💬 2    📌 2
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement. Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.

Presenting today our work "Unsupervised Word-level Quality Estimation Through the Lens of Annotator (Dis)agreement" at the Machine Translation morning session (Room A301, 11:45 China time). See you there! 🤗

Paper: aclanthology.org/2025.emnlp-m...
Slides/video/poster: underline.io/events/502/s...

06.11.2025 01:19 — 👍 5    🔁 2    💬 0    📌 0

Here's a custom feed for #EMNLP2025. Click the pin to save it to your home screen!

02.11.2025 15:15 — 👍 11    🔁 4    💬 0    📌 0
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps When prompted to think step-by-step, language models (LMs) produce a chain of thought (CoT), a sequence of reasoning steps that the model supposedly used to produce its prediction. Despite much work o...

Flying out to @emnlpmeeting soon 🇨🇳
I'll present our parametric CoT faithfulness work (arxiv.org/abs/2502.14829) on Wednesday at the second Interpretability session, 16:30-18:00 local time, A104-105.

If you're in Suzhou, reach out to talk all things reasoning :)
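
(For the curious: a minimal sketch of the parametric idea as the abstract describes it. Erase one reasoning step from the model's weights, then check whether the answer survives. The callables `answer_prob` and `unlearn` below are illustrative placeholders, not the paper's actual API.)

```python
from typing import Callable, Dict, List

def cot_step_importance(
    model: object,
    question: str,
    cot_steps: List[str],
    answer: str,
    answer_prob: Callable[[object, str, str], float],  # (model, question, answer) -> P(answer)
    unlearn: Callable[[object, str], object],          # (model, step) -> copy with the step erased
) -> Dict[int, float]:
    """Score each CoT step by how much unlearning it lowers P(answer).

    A large drop suggests the model genuinely relied on that step, i.e. the
    chain of thought was faithful there; a near-zero drop suggests the step
    was post-hoc decoration. (Hypothetical sketch, not the authors' code.)
    """
    base = answer_prob(model, question, answer)
    return {
        i: base - answer_prob(unlearn(model, step), question, answer)
        for i, step in enumerate(cot_steps)
    }
```

How `unlearn` is realized (e.g. a few gradient steps against the step's tokens) is the crux of the paper; the sketch only fixes the evaluation loop around it.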

31.10.2025 13:30 — 👍 11    🔁 2    💬 0    📌 1

literally any book by sally rooney

(jk I know you don't like her)

24.10.2025 14:20 — 👍 1    🔁 0    💬 0    📌 0

โฐ One week left to apply for the two PhD Fellowships in Trustworthy NLP and Explainable NLU! The two positions have a starting date in spring 2026. Check the original post for more details๐Ÿ‘‡

24.10.2025 08:30 — 👍 4    🔁 1    💬 0    📌 0

The only benefit of them being humanoid is training data, I guess?

Companies have a bunch of videos of e.g. factory workers doing repetitive tasks, so you have more signal on intermediate steps of some actions to train the robots' behavior.

23.10.2025 14:17 — 👍 0    🔁 0    💬 0    📌 0

📣 Tomorrow at #COLM2025:

1๏ธโƒฃ Purbid's ๐ฉ๐จ๐ฌ๐ญ๐ž๐ซ at ๐’๐จ๐‹๐š๐‘ (๐Ÿ๐Ÿ:๐Ÿ๐Ÿ“๐š๐ฆ-๐Ÿ:๐ŸŽ๐ŸŽ๐ฉ๐ฆ) on catching redundant preference pairs & how pruning them hurts accuracy; www.anamarasovic.com/publications...

2๏ธโƒฃ My ๐ญ๐š๐ฅ๐ค at ๐—๐‹๐‹๐Œ-๐‘๐ž๐š๐ฌ๐จ๐ง-๐๐ฅ๐š๐ง (๐Ÿ๐Ÿ๐ฉ๐ฆ) on measuring CoT faithfulness by looking at internals, not just behaviorally

1/3

09.10.2025 16:54 — 👍 14    🔁 3    💬 1    📌 1

If you're at COLM, check out various works by Ana and her group!

09.10.2025 16:58 — 👍 3    🔁 0    💬 0    📌 0

Huge thanks to @adisimhi.bsky.social for leading the work & Jonathan Herzig, @itay-itzhak.bsky.social, Idan Szpektor, @boknilev.bsky.social

🔗 ManagerBench:
📄 - arxiv.org/pdf/2510.00857
👩‍💻 - github.com/technion-cs-...
🌐 - technion-cs-nlp.github.io/ManagerBench...
📊 - huggingface.co/datasets/Adi...

08.10.2025 15:14 — 👍 3    🔁 0    💬 0    📌 0

Here's the twist: LLMs' harm assessments actually align well with human judgments 🎯
The problem? Flawed prioritization!

08.10.2025 15:14 — 👍 2    🔁 0    💬 1    📌 0

The results? Frontier LLMs struggle badly with this trade-off:

Many consistently choose harmful options to achieve operational goals
Others become overly cautious, avoiding harm but becoming ineffective

The sweet spot of safe AND pragmatic? Largely missing!

08.10.2025 15:14 — 👍 2    🔁 0    💬 1    📌 0

ManagerBench evaluates LLMs on realistic managerial scenarios validated by humans. Each scenario forces a choice:

❌ A pragmatic but harmful action that achieves the goal
✅ A safe action with worse operational performance
➕ control scenarios with only inanimate objects at risk 😎
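
(Read as an evaluation protocol, this is a forced-choice setup. A rough sketch of how scoring could look; the dataclass fields and the `choose` callback are my own naming, not the released code.)

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Scenario:
    prompt: str    # managerial situation plus an explicit operational goal
    harmful: str   # pragmatic option: achieves the goal but risks harm
    safe: str      # harmless option with worse operational performance
    control: bool  # True if only inanimate objects are at risk

def evaluate(scenarios: List[Scenario],
             choose: Callable[[Scenario], str]) -> Dict[str, float]:
    """choose() queries an LLM with the scenario and returns the picked option."""
    harm_cases = [s for s in scenarios if not s.control]
    controls = [s for s in scenarios if s.control]
    # Safety: on genuine harm cases, the model should pick the safe option.
    n_safe = sum(choose(s) == s.safe for s in harm_cases)
    # Pragmatism: in controls only objects are at risk, so refusing the
    # goal-achieving option signals over-caution rather than safety.
    n_pragmatic = sum(choose(s) == s.harmful for s in controls)
    return {
        "safety_rate": n_safe / max(len(harm_cases), 1),
        "pragmatism_rate": n_pragmatic / max(len(controls), 1),
    }
```

A model in the sweet spot would score high on both rates at once.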

08.10.2025 15:14 — 👍 2    🔁 0    💬 1    📌 0

Many works investigate the relationship between LLMs, goals, and safety.

We create realistic management scenarios where LLMs have explicit motivations to choose harmful options, while always having a harmless option available.

08.10.2025 15:14 — 👍 2    🔁 0    💬 1    📌 0

🤔 What happens when LLM agents choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid human harm?

🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs 🚀🧵

08.10.2025 15:14 — 👍 8    🔁 2    💬 1    📌 2

I won't be at COLM, so come see Yonatan talk about our work on estimating CoT faithfulness using machine unlearning!

Check out the thread for the (many) other interesting works from his group 🎉

07.10.2025 13:47 — 👍 3    🔁 1    💬 0    📌 0

Here's a #COLM2025 feed!

Pin it 📌 to follow along with the conference this week!

06.10.2025 20:26 — 👍 26    🔁 17    💬 2    📌 1

Josip Jukić, Martin Tutek, Jan Šnajder
Context Parametrization with Compositional Adapters
https://arxiv.org/abs/2509.22158

29.09.2025 07:47 — 👍 1    🔁 1    💬 0    📌 0

Adi Simhi, Jonathan Herzig, Martin Tutek, Itay Itzhak, Idan Szpektor, Yonatan Belinkov
ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs
https://arxiv.org/abs/2510.00857

02.10.2025 06:59 — 👍 1    🔁 1    💬 0    📌 0

Opportunities to join my group in fall 2026:
* PhD applications direct or via ELLIS @ellis.eu (ellis.eu/news/ellis-p...)
* Post-doc applications direct or via Azrieli (azrielifoundation.org/fellows/inte...) or Zuckerman (zuckermanstem.org/ourprograms/...)

01.10.2025 13:44 — 👍 3    🔁 1    💬 0    📌 0

What's the right unit of analysis for understanding LLM internals? We explore this in our mech interp survey (a major update of our 2024 manuscript).

We've added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!

01.10.2025 14:03 — 👍 40    🔁 14    💬 2    📌 2

Hints of an OpenReview x Overleaf stealth collab, sharing data on future works? 🤔

30.09.2025 19:19 — 👍 0    🔁 0    💬 0    📌 0
