Thanks Clara Meister for presenting "Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization" at our lab seminar.
#NLProc #tokenization #fairness
@deboranozza.bsky.social
Assistant Professor at Bocconi University in MilaNLP group • Working in #NLP, #HateSpeech and #Ethics • She/her • #ERCStG PERSONAE
📚 For today’s reading group @arimuti.bsky.social presented Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs (Betley et al., 2025).
🧩 arxiv.org/abs/2502.17424
#NLProc #AIAlignment #LLMs
#TBT #NLProc "Explaining Speech Classification Models" by Pastor et al. (2024) makes speech classification more transparent! 🔍 Their research reveals which words matter most and how tone and background noise impact decisions.
09.10.2025 15:05 — 👍 2 🔁 1 💬 0 📌 0
📢 Are you interested in a PhD in #NLProc to study and improve how AI models emotions and social signals?
🚨Exciting news:🚨 I’m hiring a PhD candidate at LIACS, @unileiden.bsky.social.
📍 Leiden, The Netherlands
📅 Deadline: 17 Nov 2025
👉 Position details and application link: tinyurl.com/5x5v6zsa
#MemoryMonday #NLProc 'Measuring Harmful Representations in Scandinavian Language Models' uncovers gender bias, challenging Scandinavia's equity image. #MachineLearning
06.10.2025 15:05 — 👍 2 🔁 1 💬 0 📌 0
#TBT #NLProc Explore 'Wisdom of Instruction-Tuned LLM Crowds' by Plaza et al. Aggregated LLM labels outperform single models across tasks & languages, but few-shot can't top zero-shot, and supervised models still rule.
02.10.2025 15:06 — 👍 2 🔁 2 💬 0 📌 0
🚨 Are you looking for a PhD in #NLProc dealing with #LLMs?
🎉 Good news: I am hiring! 🎉
The position is part of the “Contested Climate Futures” project. 🌱🌍 You will focus on developing next-generation AI methods 🤖 to analyze climate-related concepts in content, including texts, images, and videos.
📣 New Preprint!
Have you ever wondered what political content is in LLMs' training data? What political opinions are expressed? What is the proportion of left- vs right-leaning documents in the pre- and post-training data? Do they correlate with the political biases reflected in models?
📖For our last Reading Group @donyarn.bsky.social presented
"Culture is Everywhere: A Call for Intentionally Cultural Evaluation" by Oh et al.
Paper: arxiv.org/pdf/2509.01301
#NLProc
#MemoryMonday #NLProc 'Universal Joy: A Data Set and Results for Classifying Emotions Across Languages' by Lamprinidis et al. (2021) introduces a data set and results for classifying emotions across languages.
29.09.2025 15:05 — 👍 3 🔁 2 💬 0 📌 0
MilaNLPers at CLiC-it 2025 presenting "Probing Feminist Representations: A Study of Bias in LLMs and Word Embeddings"
Check the paper at clic2025.unica.it/wp-content/u...
#NLProc #clicit25
What makes LLMs agree even when they shouldn’t? 🤔
At our last lab seminar, Jan Batzner presented The Brief History of LLM Sycophancy as We Know It.
#NLProc #sycophancy #LLM
#TBT #NLProc 'Classist Tools: Social Class Correlates with Performance in NLP' by Curry et al. (2024) examines how social class correlates with the performance of NLP tools.
25.09.2025 15:04 — 👍 3 🔁 2 💬 0 📌 0
#MemoryMonday #NLProc 'Benchmarking Post-Hoc Interpretability Approaches for Transformer-based Misogyny Detection' by Attanasio et al. explores the reliability of interpretability methods in hate speech detection.
22.09.2025 15:05 — 👍 4 🔁 2 💬 0 📌 0
📖For this Thursday's Reading Group @elisabassignana.bsky.social presented "The sociolinguistic foundations of language modeling" by Grieve et al.
Paper: www.frontiersin.org/journals/art...
#NLProc #LLM #sociolinguistics
#TBT #NLProc 'Geolocation with Attention-Based Multitask Learning Models' by Tommaso Fornaciari, @dirkhovy.bsky.social (2019) presents attention-based multitask learning models for geolocation. #SocialMedia
18.09.2025 15:06 — 👍 4 🔁 2 💬 0 📌 0
🎓We're back with our Friday lab seminars! Today we had Suyash Fulay presenting "Truth, Political Bias, and AI Representation".
#NLProc
#MemoryMonday #NLProc 'Dense Node Representation for Geolocation' by Fornaciari & @dirkhovy.bsky.social reveals efficient geolocation methods using node2vec & doc2vec models. Greater network size, fewer parameters. /publication/2019_m2v/2019_m2v
15.09.2025 15:06 — 👍 3 🔁 2 💬 0 📌 0
📖For this Thursday's Reading Group @deboranozza.bsky.social presented two papers on #sycophancy in #LLMs.
Papers: arxiv.org/pdf/2505.13995, arxiv.org/pdf/2310.13548
#NLProc
#TBT #NLProc 'MilaNLP @ WASSA: Does BERT Feel Sad When You Cry?' by Fornaciari et al. (2021) indicates that emotion and empathy are not related tasks for prediction.
11.09.2025 15:07 — 👍 2 🔁 2 💬 0 📌 0
#MemoryMonday #NLProc 'My Answer is C' by Wang et al. (2024) underscores the need to scrutinize full-text responses in multiple-choice evaluations of LLMs.
08.09.2025 15:04 — 👍 3 🔁 2 💬 0 📌 0
We're back with the reading group! 🚀
Today @paul-rottger.bsky.social presented The Levers of Political Persuasion with Conversational AI by Hackenburg et al. (2025)
Paper: arxiv.org/pdf/2507.13919
#NLProc
#TBT #NLProc 'Temporal and Second Language Influence on Intra-Annotator Agreement and Stability in Hate Speech Labelling' by Abercrombie, @dirkhovy.bsky.social, & Prabhakaran (2023). NLP models: inclusive or not?
04.09.2025 15:03 — 👍 3 🔁 2 💬 0 📌 0
#MemoryMonday #NLProc Hung et al.'s 2023 paper, 'Can Demographic Factors Improve Text Classification?', finds that demographic adaptations of Transformer NLP models don't notably boost performance.
01.09.2025 15:04 — 👍 4 🔁 2 💬 0 📌 0
#TBT #NLProc Outstanding Paper at ACL 2024! 'Towards More Meaningful Evaluations for Values and Opinions in Large Language Models' by @paul-rottger.bsky.social et al. Enhancing political bias evaluation.
28.08.2025 15:04 — 👍 4 🔁 2 💬 0 📌 0
#MemoryMonday #NLProc 'Detecting Misogynous Memes with Text & Image Modalities' by Attanasio, @deboranozza.bsky.social, Bianchi. Their novel system uses Perceiver IO, surpassing all previous benchmarks.
25.08.2025 15:01 — 👍 1 🔁 1 💬 0 📌 0
#TBT #NLProc 'XLM-EMO: Multilingual Emotion Prediction in Social Media Text' by Bianchi, @deboranozza.bsky.social, @dirkhovy.bsky.social (2022) advances cross-language emotion detection, especially for low-resource languages.
21.08.2025 15:04 — 👍 4 🔁 2 💬 0 📌 0
#MemoryMonday #NLProc Plaza-del-Arco, @debora_nozza, @dirkhovy.bsky.social's 2024 paper "Wisdom of Instruction-Tuned Language Model Crowds" shows that multiple LLMs can be BETTER than a single model & specialize across tasks & languages.
18.08.2025 15:05 — 👍 3 🔁 3 💬 0 📌 0
#TBT #NLProc 'Make Natural Language Processing About People Again' by @dirkhovy.bsky.social (2018) argues for making NLP about people again. #AIEthics
14.08.2025 15:08 — 👍 3 🔁 1 💬 0 📌 0
#MemoryMonday #NLProc 'Pipelines for Social Bias Testing of Large Language Models' by @deboranozza.bsky.social, Federico Bianchi, @dirkhovy.bsky.social (2022). Proposes social bias tests akin to software testing in AI dev pipelines.
11.08.2025 15:01 — 👍 5 🔁 2 💬 0 📌 0