
Paul Lerner

@lernerp.bsky.social

Postdoc @mlia_isir@sciences.re (Sorbonne Université, CNRS, ISIR) / Teacher @ aivancity / Teaching Assistant @ Sorbonne Université https://paullerner.github.io/

41 Followers  |  52 Following  |  44 Posts  |  Joined: 15.11.2024  |  2.1855

Latest posts by lernerp.bsky.social on Bluesky

Post image

🧑‍🔬I’m recruiting PhD students in Natural Language Processing @unileipzig.bsky.social Computer Science, together with @scadsai.bsky.social!

Topics include, but aren’t limited to:

🔎Linguistic Interpretability
🌍Multilingual Evaluation
📖Computational Typology

Please share!

#NLProc #NLP

11.12.2025 13:36 — 👍 41    🔁 25    💬 1    📌 3
Post image

This week's team meeting featured a talk by Alexandre Vérine (PSL) on "Quality and Diversity in generative models through the lens of f-divergences."
Thanks a lot for this interesting talk!

24.11.2025 18:13 — 👍 0    🔁 1    💬 0    📌 0

Accepted to a Workshop (1/2):

"Self-Retrieval from Distant Contexts for Document-Level Machine Translation", accepted to the Conference on Machine Translation (WMT25), by @ziqianpeng.bsky.social, @rachelbawden.bsky.social, and @yvofr.bsky.social

28.10.2025 08:57 — 👍 0    🔁 2    💬 1    📌 0

There are many directions this could go in (multilingual, low-resource languages, interpretability) depending on your profile, and the internship may lead to a PhD, provided we get funding!

06.11.2025 09:07 — 👍 1    🔁 1    💬 0    📌 0
Unlike “Likely”, “Unlike” is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs Paul Lerner, François Yvon. Proceedings of the 31st International Conference on Computational Linguistics. 2025.

As we found in aclanthology.org/2025.coling-..., BPE-based LLMs (i.e. pretty much all LLMs) do not handle prefixation well

06.11.2025 09:06 — 👍 1    🔁 1    💬 1    📌 0
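The segmentation problem can be illustrated with a toy BPE encoder. The merge table below is entirely hypothetical (real vocabularies are learned from corpus frequencies, and this is not the paper's setup), but it shows the mechanism: merges are applied in priority order, so a high-priority merge that crosses the prefix boundary (here `n` + `l`) fragments the base, while the suffixed form keeps `like` intact. The names `bpe_encode` and `TOY_MERGES` are illustrative only.

```python
def bpe_encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment `word` by repeatedly applying the highest-priority merge.

    `merges` is ordered by priority: earlier pairs are merged first,
    as in standard BPE encoding.
    """
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no known merge applies any more
        merged, i = [], 0
        while i < len(symbols):
            # merge every left-to-right occurrence of the best pair
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge table: ("n", "l") has top priority and spans the un-|like boundary
TOY_MERGES = [("n", "l"), ("l", "i"), ("li", "k"), ("lik", "e"), ("l", "y"), ("u", "n")]

print(bpe_encode("likely", TOY_MERGES))    # ['like', 'ly']
print(bpe_encode("unlikely", TOY_MERGES))  # ['u', 'nl', 'i', 'k', 'e', 'ly']
```

Moving `("n", "l")` to the end of the table yields `['un', 'like', 'ly']` for `unlikely`: the same word, but now the base is visible to the model as a unit.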
Derivational morphology reveals analogical generalization in large language models | PNAS What mechanisms underlie linguistic generalization in large language models (LLMs)? This question has attracted considerable attention, with most s...

Basically the idea is to extend www.pnas.org/doi/10.1073/... to see how well LLMs model competition between affixes, not only suffixes (e.g. -ity vs. -ness) but also prefixes (e.g. un- vs. non-)

06.11.2025 09:04 — 👍 2    🔁 1    💬 1    📌 0
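One simple way to probe affix competition is to score each candidate derivative and check which one the model prefers, for suffixes and prefixes alike. In this sketch, `score` is assumed to return a log-probability; `toy_score` stands in for a real LLM and its numbers are made up, so this is not the method of the PNAS paper or of the internship, and all names are hypothetical.

```python
import math
from typing import Callable


def preferred_derivative(base: str, affixes: list[str],
                         score: Callable[[str], float],
                         prefix: bool = False) -> str:
    """Return the candidate derivative with the highest model score.

    Candidates are built by plain concatenation, which ignores spelling
    changes (e.g. curious -> curiosity), so bases must be chosen with care.
    """
    candidates = [affix + base if prefix else base + affix for affix in affixes]
    return max(candidates, key=score)


# Made-up log-probabilities standing in for a real LLM
TOY_SCORES = {"oddity": -8.0, "oddness": -9.5, "unofficial": -7.0, "nonofficial": -12.0}


def toy_score(word: str) -> float:
    return TOY_SCORES.get(word, -math.inf)


print(preferred_derivative("odd", ["ity", "ness"], toy_score))                  # oddity
print(preferred_derivative("official", ["un", "non"], toy_score, prefix=True))  # unofficial
```

With a real model, the score would typically be a sum of token log-probabilities, possibly inside a carrier sentence; raw sums tend to favour shorter strings, so a length-normalized score may be worth comparing.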

Come work with @yvofr.bsky.social, @weissweiler.bsky.social, and me at @mlia-isir.bsky.social on an M2 internship on Assessing the Morphological Competence of LLMs! It lasts 5-6 months, starting in February or March 2026, and is paid 600€/month.

06.11.2025 09:02 — 👍 3    🔁 2    💬 1    📌 0

What's the plural of "LLM-as-a-Judge"?

24.10.2025 15:19 — 👍 0    🔁 0    💬 0    📌 0
Assessing the Political Fairness of Multilingual LLMs: A Case Study based on a 21-way Multiparallel EuroParl Dataset The political biases of Large Language Models (LLMs) are usually assessed by simulating their answers to English surveys. In this work, we propose an alternative framing of political biases, relying o...

Now on arXiv: arxiv.org/abs/2510.20508

24.10.2025 07:30 — 👍 0    🔁 0    💬 0    📌 0

work done with @yvofr.bsky.social as part of the Democratic Commons programme, many thanks to our colleagues at Make, Sciences Po, and Sorbonne! about.make.org/democratic-c...

23.10.2025 16:16 — 👍 0    🔁 0    💬 1    📌 0
GitHub - PaulLerner/21-EuroParl: Dataset and code for the paper "Assessing the Political Fairness of Multilingual LLMs: A Case Study based on a 21-way Multiparallel EuroParl Dataset" (Lerner and Yvon,... Dataset and code for the paper "Assessing the Political Fairness of Multilingual LLMs: A Case Study based on a 21-way Multiparallel EuroParl Dataset" (Lerner and Yvon, 2025) - PaulLerner/...

The dataset and code are available at github.com/PaulLerner/2...

23.10.2025 16:13 — 👍 0    🔁 0    💬 1    📌 0
Post image

Here's what one example from the dataset looks like; there are 72,234 just like this one (I miss my multimodal days, when there were pictures in my papers)

23.10.2025 16:09 — 👍 0    🔁 0    💬 1    📌 0
Assessing the Political Fairness of Multilingual LLMs: A Case Study based on a 21-way Multiparallel EuroParl Dataset The political biases of Large Language Models (LLMs) are usually assessed by simulating their answers to English surveys. In this work, we propose an alternative framing of political biases, relying on principles of fairness in multilingual translation. We systematically compare the translation quality of speeches in the European Parliament (EP), observing systematic differences with majority parties from left, center, and right being better translated than outsider parties. This study is made possible by a new, 21-way multiparallel version of EuroParl, the parliamentary proceedings of the EP, which includes the political affiliations of each speaker. The dataset consists of 1.5M sentences for a total of 40M words and 249M characters. It covers three years, 1000+ speakers, 7 countries, 12 EU parties, 25 EU committees, and hundreds of national parties.

Using a new version of EuroParl, fully multiparallel and including (political) metadata, we find that LLMs translate some political parties unfairly
hal.science/hal-05328251

23.10.2025 16:07 — 👍 1    🔁 0    💬 1    📌 0
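As a rough illustration of the comparison described in the abstract, the sketch below groups sentence-level translation quality scores by party and reports each party's mean and the gap between the best- and worst-translated parties. The party labels and scores are invented, the function names are hypothetical, and the paper's actual metrics and analysis may differ.

```python
from collections import defaultdict
from statistics import mean


def quality_by_party(records: list[tuple[str, float]]) -> dict[str, float]:
    """Average a sentence-level translation quality score per party."""
    groups: dict[str, list[float]] = defaultdict(list)
    for party, score in records:
        groups[party].append(score)
    return {party: mean(scores) for party, scores in groups.items()}


def quality_gap(means: dict[str, float]) -> float:
    """Difference between the best- and worst-translated party."""
    return max(means.values()) - min(means.values())


# Invented scores; real ones would come from an MT metric applied to EP speeches
toy = [("Party A", 0.82), ("Party A", 0.86), ("Party B", 0.80),
       ("Party B", 0.84), ("Party C", 0.70), ("Party C", 0.74)]
means = quality_by_party(toy)
print(means)               # roughly {'Party A': 0.84, 'Party B': 0.82, 'Party C': 0.72}
print(quality_gap(means))  # roughly 0.12
```

A raw gap says nothing about significance on its own; before calling a difference systematic, one would also want confidence intervals or a statistical test.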
Google Colab

I aimed for a Pythonic library; have a look at the example notebook colab.research.google.com/github/PaulL...

15.10.2025 17:27 — 👍 0    🔁 0    💬 0    📌 0

🤔 ppllm is benchmarked against:
- a vLLM-based implementation: ppllm is 4.15 times faster!
- a naive Hugging Face implementation, which does not sort texts by length: ppllm is 4.61 times faster!

15.10.2025 17:26 — 👍 0    🔁 0    💬 1    📌 0
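One reason sorting by length can matter: when each batch is padded to its longest sequence, mixing short and long texts spends compute on pad tokens. This toy sketch assumes exactly that padding scheme and is not taken from ppllm; `padding_tokens` is a hypothetical helper that counts pad tokens with and without sorting.

```python
def padding_tokens(lengths: list[int], batch_size: int, sort: bool) -> int:
    """Count pad tokens when each batch is padded to its longest sequence."""
    order = sorted(lengths) if sort else list(lengths)
    pads = 0
    for i in range(0, len(order), batch_size):
        batch = order[i:i + batch_size]
        pads += sum(max(batch) - n for n in batch)
    return pads


lengths = [5, 100, 7, 98, 6, 99, 8, 97]  # toy token counts
print(padding_tokens(lengths, batch_size=2, sort=False))  # 368
print(padding_tokens(lengths, batch_size=2, sort=True))   # 4
```

Sorting changes the order in which results come out, so an implementation that does this has to keep each text's original index to return scores in input order.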
Post image

🤔 ppllm implements windowed PPL, which makes it possible to compute the PPL of arbitrarily long texts.
It aims to be feature-complete for many information-theoretic metrics, including Perplexity (PPL), Surprisal, and bits per character (BPC), as well as their word-level counterparts.

15.10.2025 17:26 — 👍 0    🔁 0    💬 1    📌 0
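To make these metrics concrete, here is a minimal sketch of how perplexity, surprisal, and bits per character relate, assuming we already have the natural-log probability the model assigns to each token. The `windows` helper is a hypothetical illustration of the common sliding-window scheme for long texts: each window re-reads some earlier tokens as context but scores only tokens the previous window did not. None of these function names are ppllm's API, and word-level variants are left out.

```python
import math


def surprisal_bits(logprobs: list[float]) -> list[float]:
    """Per-token surprisal in bits, from natural-log token probabilities."""
    return [-lp / math.log(2) for lp in logprobs]


def perplexity(logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(logprobs) / len(logprobs))


def bits_per_character(logprobs: list[float], text: str) -> float:
    """Total surprisal in bits divided by the number of characters in `text`."""
    return sum(surprisal_bits(logprobs)) / len(text)


def windows(n_tokens: int, max_len: int, stride: int) -> list[tuple[int, int, int]]:
    """Hypothetical sliding-window plan for texts longer than `max_len` tokens.

    Each (start, end, first_scored) triple means: feed tokens [start, end)
    to the model, but only score tokens [first_scored, end), so every token
    is scored exactly once while still seeing earlier tokens as context.
    """
    assert 0 < stride <= max_len, "a stride above max_len would skip tokens"
    plan, prev_end = [], 0
    for start in range(0, n_tokens, stride):
        end = min(start + max_len, n_tokens)
        plan.append((start, end, prev_end))
        prev_end = end
        if end == n_tokens:
            break
    return plan


lps = [math.log(0.5)] * 4  # toy: four tokens, each predicted with probability 0.5
print(perplexity(lps))                      # approximately 2.0
print(bits_per_character(lps, "abcdefgh"))  # approximately 0.5 (4 bits over 8 chars)
print(windows(10, max_len=4, stride=2))
# [(0, 4, 0), (2, 6, 4), (4, 8, 6), (6, 10, 8)]
```

A smaller stride gives each scored token more context at the cost of more forward passes; a stride equal to `max_len` means no overlap and no extra context.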
GitHub - PaulLerner/ppllm: 🤔 A Python Library to Compute LLM's Perplexity and Surprisal 🤔 A Python Library to Compute LLM's Perplexity and Surprisal - PaulLerner/ppllm

introducing 🤔 ppllm, a Python Library to Compute LLM's Perplexity and Surprisal github.com/PaulLerner/p...

15.10.2025 17:24 — 👍 0    🔁 0    💬 1    📌 0
Contribute to current consultations - How can AI improve the lives of French people while limiting the risks? - Make.org Finding proposals is easier when working together. Discover a democratic place where you can discuss the big issues you care about, submit your proposals concerning them and vote on proposals proposed...

make.org/FR/consultat...

04.09.2025 14:00 — 👍 0    🔁 0    💬 0    📌 0
Post image

"in 2025 we will have flying cars" 😂😂😂

05.07.2025 16:17 — 👍 399    🔁 91    💬 8    📌 35

Work done with Laurène Cave,
@haldaume3.bsky.social, Léo Labat, Gaël Lejeune, Pierre-Antoine Lequeu,
@bpiwowar.bsky.social, Nazanin Shafiabadi and @yvofr.bsky.social, read the paper here talnarchives.atala.org/ateliers/202...
Any feedback is appreciated :)

07.07.2025 08:02 — 👍 0    🔁 0    💬 0    📌 0
Post image

Last week, I presented my work on "Assessing the Political Biases of Multilingual LLMs" at the EALM workshop @ TALN 2025! Thanks again to the ANR Diké project for organizing the workshop.

07.07.2025 07:59 — 👍 0    🔁 0    💬 1    📌 0
Investigating Length Issues in Document-level Machine Translation Transformer architectures are increasingly effective at processing and generating very long chunks of texts, opening new perspectives for document-level machine translation (MT). In this work, we chal...

📢 🎉 The team has one paper accepted to #MTsummit2025!
"Investigating Length Issues in Document-level Machine Translation" by @ziqianpeng.bsky.social, @rachelbawden.bsky.social and @yvofr.bsky.social in collaboration with @inriaparisnlp.bsky.social
📍 Geneva | 🗓️ 23-27 June
📕 arxiv.org/abs/2412.17592

10.06.2025 18:45 — 👍 1    🔁 1    💬 1    📌 0
Post image

"meticulously" is so absent from this list (from aclanthology.org/2025.coling-... )

16.06.2025 08:25 — 👍 0    🔁 0    💬 0    📌 0

Am I the only reviewer who actually fills in this "Reviewer Checklist"? And why do Area Chairs never answer when a paper needs to be desk-rejected? And reviews are due in 3 days 🫠

16.06.2025 07:46 — 👍 0    🔁 0    💬 0    📌 0

For the EALM Workshop
"On Assessing the Political Biases of Multilingual Large Language Models" by @lernerp.bsky.social, Laurène Cave, @haldaume3.bsky.social, Léo Labat, Gaël Lejeune, Pierre-Antoine Lequeu, @bpiwowar.bsky.social, Nazanin Shafiabadi and @yvofr.bsky.social, in collaboration with the STIH lab

10.06.2025 18:39 — 👍 0    🔁 2    💬 1    📌 0
PLMlatex, Online LaTeX Editor. An easy-to-use online LaTeX editor. No installation, real-time collaboration, version control, hundreds of LaTeX document templates, and more.

@mdlhx.bsky.social PS: I just found out we can actually share projects outside of CNRS plmlatex.math.cnrs.fr/6632958859wh...

07.05.2025 07:16 — 👍 1    🔁 0    💬 0    📌 0
Login. An easy-to-use online LaTeX editor. No installation, real-time collaboration, version control, hundreds of LaTeX document templates, and more.

CNRS provides plmlatex.math.cnrs.fr, which covers most of the features. I guess it's not that complicated to host (the software is open source)

06.05.2025 13:04 — 👍 1    🔁 0    💬 1    📌 0

", I am" 🤔
Don't you think this would increase the imbalance in multilingual LLMs?

28.03.2025 08:27 — 👍 0    🔁 0    💬 0    📌 0
Post image

Amazed at what a COLING paper could look like in the 80s

20.02.2025 09:26 — 👍 1    🔁 0    💬 0    📌 0
Post image

Hope you enjoyed our poster at #AISummit! I'm standing next to Pierre-Antoine Lequeu, @salimhafid.bsky.social, and @manonberriche.bsky.social, but there are more people involved! Zoom in to read their names or learn more about the project here about.make.org/democratic-c...

11.02.2025 09:54 — 👍 3    🔁 1    💬 0    📌 0
