🧑🔬I’m recruiting PhD students in Natural Language Processing @unileipzig.bsky.social Computer Science, together with @scadsai.bsky.social!
Topics include, but aren’t limited to:
🔎Linguistic Interpretability
🌍Multilingual Evaluation
📖Computational Typology
Please share!
#NLProc #NLP
11.12.2025 13:36 — 👍 41 🔁 25 💬 1 📌 3
This week's team meeting featured a talk by Alexandre Vérine (PSL) on "Quality and Diversity in generative models through the lens of f-divergences."
Thanks a lot for this interesting talk!
24.11.2025 18:13 — 👍 0 🔁 1 💬 0 📌 0
Accepted to a Workshop (1/2):
"Self-Retrieval from Distant Contexts for Document-Level Machine Translation", accepted to the Conference on Machine Translation (WMT25), by @ziqianpeng.bsky.social, @rachelbawden.bsky.social, and @yvofr.bsky.social
28.10.2025 08:57 — 👍 0 🔁 2 💬 1 📌 0
There are many directions this could go (multilingual, low-resource languages, interpretability), depending on your profile, and the internship may lead to a PhD, provided we get funding!
06.11.2025 09:07 — 👍 1 🔁 1 💬 0 📌 0
Come work with @yvofr.bsky.social, @weissweiler.bsky.social, and me at @mlia-isir.bsky.social for an M2 internship on Assessing the Morphological Competence of LLMs! It runs for 5-6 months, starting in February or March 2026, and pays 600€/month.
06.11.2025 09:02 — 👍 3 🔁 2 💬 1 📌 0
What's the plural of "LLM-as-a-Judge"?
24.10.2025 15:19 — 👍 0 🔁 0 💬 0 📌 0
Work done with @yvofr.bsky.social as part of the Democratic Commons programme. Many thanks to our colleagues at Make, Sciences Po, and Sorbonne! about.make.org/democratic-c...
23.10.2025 16:16 — 👍 0 🔁 0 💬 1 📌 0
Here's what one example from the dataset looks like. There are 72,234 just like this one (I miss my multimodal days, when there were pictures in my papers)
23.10.2025 16:09 — 👍 0 🔁 0 💬 1 📌 0
Assessing the Political Fairness of Multilingual LLMs: A Case Study based on a 21-way Multiparallel EuroParl Dataset
The political biases of Large Language Models (LLMs) are usually assessed by simulating their answers to English surveys. In this work, we propose an alternative framing of political biases, relying on principles of fairness in multilingual translation. We systematically compare the translation quality of speeches in the European Parliament (EP), observing systematic differences with majority parties from left, center, and right being better translated than outsider parties. This study is made possible by a new, 21-way multiparallel version of EuroParl, the parliamentary proceedings of the EP, which includes the political affiliations of each speaker. The dataset consists of 1.5M sentences for a total of 40M words and 249M characters. It covers three years, 1000+ speakers, 7 countries, 12 EU parties, 25 EU committees, and hundreds of national parties.
Using a new version of EuroParl, fully multi-parallel and including (political) metadata, we find that LLMs translate some political parties unfairly
hal.science/hal-05328251
23.10.2025 16:07 — 👍 1 🔁 0 💬 1 📌 0
I aimed for a Pythonic library; have a look at the example notebook: colab.research.google.com/github/PaulL...
15.10.2025 17:27 — 👍 0 🔁 0 💬 0 📌 0
🤔 ppllm is benchmarked against:
- a vLLM-based implementation: ppllm is 4.15 times faster!
- a naive Hugging Face implementation, which does not sort texts by length: ppllm is 4.61 times faster!
15.10.2025 17:26 — 👍 0 🔁 0 💬 1 📌 0
🤔 ppllm implements windowed PPL, which makes it possible to compute the PPL of arbitrarily long texts.
It aims to be feature-complete for many information-theoretic metrics, including Perplexity (PPL), Surprisal, and bits per character (BPC), along with their word-level counterparts.
15.10.2025 17:26 — 👍 0 🔁 0 💬 1 📌 0
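A minimal sketch of the metrics named above, in plain Python. It assumes you already have per-token natural-log probabilities from some model. The function names and the `windows` helper are hypothetical illustrations, not ppllm's API, and the strided sliding-window scheme shown here is a common approach that may differ from what ppllm actually does.

```python
import math


def surprisal_bits(token_logprobs):
    """Per-token surprisal in bits, given natural-log probabilities."""
    return [-lp / math.log(2) for lp in token_logprobs]


def perplexity(token_logprobs):
    """PPL: exponential of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def bits_per_character(token_logprobs, text):
    """BPC: total surprisal in bits divided by the number of characters."""
    return sum(surprisal_bits(token_logprobs)) / len(text)


def windows(n_tokens, max_len, stride):
    """Yield (begin, end, first_scored) spans for overlapping windows.

    Tokens in [first_scored, end) are scored in this window; tokens in
    [begin, first_scored) only serve as left context. Every token is
    scored exactly once, so a text longer than max_len can be handled.
    """
    if not 0 < stride <= max_len:
        raise ValueError("stride must be in (0, max_len]")
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + max_len, n_tokens)
        yield begin, end, prev_end
        prev_end = end
        if end == n_tokens:
            break


# Hypothetical toy input: four tokens, each assigned probability 0.25,
# covering an 8-character text.
logprobs = [math.log(0.25)] * 4
text = "abcdefgh"

print("PPL:", perplexity(logprobs))
print("Surprisal (bits):", surprisal_bits(logprobs))
print("BPC:", bits_per_character(logprobs, text))
print("Windows:", list(windows(n_tokens=10, max_len=4, stride=2)))
```

A smaller stride gives each scored token more left context at the cost of more forward passes. Setting the stride equal to the window length gives disjoint chunks.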
"in 2025 we will have flying cars" 😂😂😂
05.07.2025 16:17 — 👍 399 🔁 91 💬 8 📌 35
Work done with Laurène Cave, @haldaume3.bsky.social, Léo Labat, Gaël Lejeune, Pierre-Antoine Lequeu, @bpiwowar.bsky.social, Nazanin Shafiabadi, and @yvofr.bsky.social. Read the paper here: talnarchives.atala.org/ateliers/202...
Any feedback is appreciated :)
07.07.2025 08:02 — 👍 0 🔁 0 💬 0 📌 0
Last week, I presented my work on "Assessing the Political Biases of Multilingual LLMs" at the EALM workshop @ TALN 2025! Thanks again to the ANR Diké project for organizing the workshop.
07.07.2025 07:59 — 👍 0 🔁 0 💬 1 📌 0
Investigating Length Issues in Document-level Machine Translation
Transformer architectures are increasingly effective at processing and generating very long chunks of texts, opening new perspectives for document-level machine translation (MT). In this work, we chal...
📢 🎉 The team has one paper accepted to #MTsummit2025!
"Investigating Length Issues in Document-level Machine Translation" by @ziqianpeng.bsky.social, @rachelbawden.bsky.social and @yvofr.bsky.social in collaboration with @inriaparisnlp.bsky.social
📍 Geneva | 🗓️ 23-27 June
📕 arxiv.org/abs/2412.17592
10.06.2025 18:45 — 👍 1 🔁 1 💬 1 📌 0
"meticulously" is so absent from this list (from aclanthology.org/2025.coling-... )
16.06.2025 08:25 — 👍 0 🔁 0 💬 0 📌 0
Am I the only reviewer who actually fills in this "Reviewer Checklist"? And why do Area Chairs never answer when a paper needs to be desk-rejected? And reviews are due in 3 days 🫠
16.06.2025 07:46 — 👍 0 🔁 0 💬 0 📌 0
For the EALM Workshop
"On Assessing the Political Biases of Multilingual Large Language Models" by @lernerp.bsky.social, Laurène Cave, @haldaume3.bsky.social, Léo Labat, Gaël Lejeune, Pierre-Antoine Lequeu, @bpiwowar.bsky.social, Nazanin Shafiabadi, and @yvofr.bsky.social, in collaboration with the STIH lab
10.06.2025 18:39 — 👍 0 🔁 2 💬 1 📌 0
", I am" 🤔
Don't you think this would increase the imbalance in multilingual LLMs?
28.03.2025 08:27 — 👍 0 🔁 0 💬 0 📌 0
Amazed at what a COLING paper could look like in the 80's
20.02.2025 09:26 — 👍 1 🔁 0 💬 0 📌 0
Hope you enjoyed our poster at #AISummit! I'm standing next to Pierre-Antoine Lequeu, @salimhafid.bsky.social, and @manonberriche.bsky.social, but there are more people involved! Zoom in to read their names, or learn more about the project here: about.make.org/democratic-c...
11.02.2025 09:54 — 👍 3 🔁 1 💬 0 📌 0
We are the Leuven AI Group of Multilingual NLP (LAGoM NLP), a research lab at the department of Computer Science at KU Leuven, led by @mdlhx
PhD @ ETH Zürich | working on (multilingual) evaluation of NLP | on the academic job market | go #vegan | https://vilda.net
PhD Student @ Sorbonne Université (@mlia-isir.bsky.social)
Research in information retrieval and conversational search: Towards Language models that know what they know 🧠
Homepage : victormorand.github.io
PhD student @mainlp.bsky.social (@cislmu.bsky.social, LMU Munich). Interested in language variation & change, currently working on NLP for dialects and low-resource languages.
verenablaschke.github.io
MLIA research team at CNRS/ISIR lab in Sorbonne University @sorbonne-universite.fr
https://www.isir.upmc.fr/equipes/mlia/
a mediocre combination of a mediocre AI scientist, a mediocre physicist, a mediocre chemist, a mediocre manager and a mediocre professor.
see more at https://kyunghyuncho.me/
NLP assistant prof at KU Leuven, PI @lagom-nlp.bsky.social. I like syntax more than most people. Also multilingual NLP, interpretability, mountains and beer. (She/her)
Postdoc at Uppsala University Computational Linguistics with Joakim Nivre
PhD from LMU Munich, prev. UT Austin, Princeton, @ltiatcmu.bsky.social, Cambridge
computational linguistics, construction grammar, morphosyntax
leonieweissweiler.github.io
⚙️ Gameplay Programmer • Unreal Engine
🎮 Working on UNRECORD @DRAMA
📍 Rennes, France • he/him
Directeur de recherche at Inria, former invited professor at Collège de France, co-founder of opensquare
Ingénieure d'études @ CNRS
Chargée de projets en traduction et terminologie scientifique
Translation and terminology 🇫🇷 🇺🇸 🇪🇸
Postdoc @milanlp.bsky.social working on LLM safety and societal impacts. Previously PhD @oii.ox.ac.uk and CTO / co-founder of Rewire (acquired '23)
https://paulrottger.com/
👩💻 Postdoc researcher @medialab-scpo.bsky.social
🔍 Exploring the use of LLMs in participatory democracy frameworks
🤝 Democratic Commons Project with Cevipof, ISIR and Make.org
🎓 PhD in Sociology on Misinformation Reception
🌐 manonberriche.github.io
Postdoctoral Researcher in AI for Democracy @SciencesPo
NLP Researcher at EleutherAI, PhD UC San Diego Linguistics.
Interested in multilingual NLP, tokenizers, open science.
📍Boston. She/her.
https://catherinearnett.github.io/
Research scientist at INA