Philipp Mondorf @pmondorf - Bluesky Profile

👥 @veraneplenbroek.bsky.social, Sandro Pezzelle, @barbaraplank.bsky.social, @davidschlangen.bsky.social, Alessandro Suglia, @akskuchi.bsky.social, @ecekt.bsky.social, and @alberto-testoni.bsky.social.
📍Poster Session 2 — Hall 4/5, 11:00–12:30, Monday, July 28.

#MaiNLP #MCML #NLProc

18.07.2025 10:19 — 👍 3 🔁 0 💬 0 📌 0

👥 Special thanks to @annabavaresco.bsky.social, @raffagbernardi.bsky.social, @leobertolazzi.bsky.social, @delliott.bsky.social, Raquel Fernández, Albert Gatt, @esamghaleb.bsky.social, Mario Giulianelli, @michaelwhanna.bsky.social, @akoller.bsky.social, @andre-t-martins.bsky.social

18.07.2025 10:19 — 👍 3 🔁 0 💬 1 📌 0

👥 This work is the result of a wonderful collaboration involving 20 researchers from 11 different universities.

18.07.2025 10:19 — 👍 0 🔁 0 💬 1 📌 0

🔎Based on evaluations across 11 recent LLMs, we find that model judgments should be used with care, as they exhibit notable variability depending on the task and samples being evaluated. We argue that LLMs should be carefully validated against human judgments before being used as evaluators.

18.07.2025 10:19 — 👍 0 🔁 0 💬 1 📌 0

🔎 In this work, we study whether LLM judgments can be reliably used as proxies for human judgments. We introduce JUDGE-BENCH, an extensive collection of 20 datasets with human annotations covering a variety of NLP tasks.

18.07.2025 10:19 — 👍 0 🔁 0 💬 1 📌 0

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
There is an increasing trend towards evaluating NLP models with LLMs instead of human judgments, raising questions about the validity of these evaluations, as well as their reproducibility in the case...

📄 [ACL 2025 main] LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks (doi.org/10.48550/arX...)

18.07.2025 10:19 — 👍 9 🔁 4 💬 1 📌 0

👥 Huge thanks to my collaborators and co-authors, Sondre Wold and @barbaraplank.bsky.social
📍Poster Session 7 — Hall 4/5, 10:30–12:00, Tuesday, July 29.

18.07.2025 10:19 — 👍 1 🔁 0 💬 1 📌 0

🔎 Moreover, we show that these circuits can be reused and combined through set operations to represent more complex functional capabilities of the model. For more information, check out the paper!

18.07.2025 10:19 — 👍 1 🔁 0 💬 1 📌 0

🔎 In this work, we study the relationship between transformer circuits identified for highly compositional and functionally related tasks. We find that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness.

18.07.2025 10:19 — 👍 1 🔁 0 💬 1 📌 0

Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
A fundamental question in interpretability research is to what extent neural networks, particularly language models, implement reusable functions through subnetworks that can be composed to perform mo...

📄 [ACL 2025 main] Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models (doi.org/10.48550/arX...)

18.07.2025 10:19 — 👍 5 🔁 2 💬 1 📌 0

I am happy to share that I’ll be attending #ACL2025 in Vienna 🇦🇹, where I’ll be presenting two papers (more information below)!

18.07.2025 10:19 — 👍 11 🔁 0 💬 1 📌 0

The hand-drawn sign from three years ago.

🎉MaiNLP is turning 3 today!🎂🥳 We’ve grown a lot since @barbaraplank.bsky.social started this group with nothing but three aspiring researchers and a hand-drawn sign on the door. Huge thanks to all the amazing people who have joined or visited us since. Here’s to many more years of exciting research!🚀

01.04.2025 10:40 — 👍 19 🔁 9 💬 1 📌 2

🙋‍♂️

25.11.2024 18:03 — 👍 0 🔁 0 💬 0 📌 0
