Multiple LLM Agents Debate for Equitable Cultural Alignment
Large Language Models (LLMs) need to adapt their predictions to diverse cultural contexts to benefit diverse communities across the world. While previous efforts have focused on single-LLM, single-tur...
8/ Huge thanks to @marinecarpuat.bsky.social, Rachel, and @zhoutianyi.bsky.social for their guidance, and a special shoutout to the amazing UMD CLIP team!
Check out our paper and code below:
Paper: arxiv.org/abs/2505.24671
Dataset: github.com/dayeonki/cul...
12.06.2025 23:33
7/ What's next for Multi-Agent Debate?
Some exciting future directions:
1. Assigning specific roles to represent diverse cultural perspectives
2. Discovering optimal strategies for multi-LLM collaboration
3. Designing better adjudication methods to resolve disagreements fairly
12.06.2025 23:33
6/ But do these gains hold across cultures?
We measure cultural parity across diverse groups and find that Multi-Agent Debate not only boosts average accuracy but also leads to more equitable cultural alignment.
12.06.2025 23:33
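The post doesn't spell out how cultural parity is computed, so here is a minimal sketch of one plausible measure: per-culture accuracy plus a best-vs-worst gap, where a smaller gap means more equitable alignment. The `examples` format and the gap metric are assumptions for illustration, not necessarily the paper's definition.

```python
from collections import defaultdict

def accuracy_by_culture(examples):
    """examples: iterable of (culture, was_prediction_correct) pairs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for culture, correct in examples:
        totals[culture] += 1
        hits[culture] += int(correct)
    return {c: hits[c] / totals[c] for c in totals}

def parity_gap(acc):
    """Spread between best- and worst-served groups; 0 = perfectly equitable."""
    return max(acc.values()) - min(acc.values())
```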
5/ How do model decisions evolve through debate?
We track three phases of LLM behavior:
1. Initial decision correctness
2. Final decision correctness
3. Judge's decision correctness
Multi-Agent Debate is most valuable when models initially disagree!
12.06.2025 23:33
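To make the three phases concrete, here is a minimal sketch of a two-agent debate loop with a judge, assuming each agent is a plain `prompt -> str` callable. The prompt wording, number of rounds, and adjudication format are assumptions; the paper's actual protocol will differ.

```python
def debate(scenario, agent_a, agent_b, judge, rounds=1):
    """Two agents answer independently, exchange rationales, then a judge
    adjudicates. agent_a / agent_b / judge are any prompt -> str callables."""
    ask = (f"Scenario: {scenario}\n"
           "Is this behavior acceptable in the target culture? "
           "Answer yes/no and justify briefly.")
    ans_a, ans_b = agent_a(ask), agent_b(ask)  # phase 1: initial decisions
    for _ in range(rounds):  # debate: each agent sees its peer's argument
        ans_a = agent_a(f"{ask}\nYour peer argued:\n{ans_b}\nDefend or revise your answer.")
        ans_b = agent_b(f"{ask}\nYour peer argued:\n{ans_a}\nDefend or revise your answer.")
    # phase 2 is the agents' final answers; phase 3 is the judge's decision
    return judge(f"{ask}\nAgent A:\n{ans_a}\nAgent B:\n{ans_b}\n"
                 "Give the final decision (yes/no).")
```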
4/ Distinct LLMs are complementary!
We find that:
- Multi-Agent Debate lets smaller LLMs (7B) match the performance of much larger ones (27B)
- Best combo? Gemma-2 9B + EXAONE-3 7B
12.06.2025 23:33
3/ Before bringing in two #LLMs, we first maximize single-LLM performance through:
1. Cultural Contextualization: adding relevant rules-of-thumb for the target culture
2. Self-Reflection: having the model evaluate and improve its own outputs
These serve as strong baselines before we introduce collaboration.
12.06.2025 23:33
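As a rough illustration of these two single-LLM baselines, here is a sketch assuming a generic `llm(prompt) -> str` callable; the function names, prompt wording, and rules-of-thumb format are hypothetical, not taken from the paper.

```python
def contextualized_prompt(scenario, culture, rules_of_thumb):
    """Cultural Contextualization: prepend target-culture rules-of-thumb."""
    rot = "\n".join(f"- {r}" for r in rules_of_thumb)
    return (f"Rules-of-thumb for {culture}:\n{rot}\n\n"
            f"Scenario: {scenario}\n"
            f"Is this acceptable in {culture}? Answer yes/no and explain.")

def self_reflect(llm, prompt):
    """Self-Reflection: draft, critique, then revise the model's own answer."""
    draft = llm(prompt)
    critique = llm(f"{prompt}\n\nDraft answer:\n{draft}\n\n"
                   "Critique this answer for cultural accuracy.")
    return llm(f"{prompt}\n\nDraft:\n{draft}\nCritique:\n{critique}\n\n"
               "Write an improved final answer.")
```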
2/ Why involve multiple #LLMs?
Different LLMs bring complementary perspectives and reasoning paths, thanks to variations in:
- Training data
- Alignment processes
- Language and cultural coverage
We explore one common form of collaboration: debate.
12.06.2025 23:33
1/ Are two #LLMs better than one for equitable cultural alignment?
We introduce a Multi-Agent Debate framework, where two LLM agents debate the cultural adaptability of a given scenario.
#ACL2025
12.06.2025 23:33
Trying to collect all the MT people here. I probably missed many. Ping me!
bsky.app/starter-pack...
02.12.2024 08:39
7/ Can AskQE handle naturally occurring translation errors too?
Yes! It shows:
- Stronger correlation with human judgments
- Better decision-making accuracy than standard QE metrics
21.05.2025 17:48
6/ What kinds of questions does AskQE generate?
Most commonly:
- Extent: How many COVID-19 cases were reported today? (24.6%)
- Concept: What is another name for paracetamol? (23.6%)
21.05.2025 17:48
5/ We test AskQE on ContraTICO and find:
- It effectively distinguishes between minor and critical translation errors
- It aligns closely with established quality estimation (QE) metrics
21.05.2025 17:48
4/ We introduce ContraTICO, a dataset of 8 contrastive MT error types in the COVID-19 domain.
- Minor errors: spelling, word order, synonym, intensifier, expansion (no impact)
- Critical errors: expansion (impact), omission, alteration
21.05.2025 17:48
3/ AskQE has two main components:
- Question Generation (QG): conditioned on the source + its entailed facts
- Question Answering (QA): based on the source and backtranslated MT
If the answers don't match... there's likely an error.
21.05.2025 17:48
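A minimal sketch of this QG + QA loop, again assuming a generic `llm(prompt) -> str` callable. Two simplifications to flag: the real AskQE conditions question generation on facts entailed by the source, and answer comparison in practice needs softer matching than the exact-string check below.

```python
def askqe(source, backtranslated_mt, llm, n_questions=5):
    """Flag likely MT errors by comparing answers grounded in the source
    against answers grounded in the backtranslated MT."""
    qg = llm(f"Write {n_questions} factual questions answerable from this text:\n{source}")
    questions = [q.strip() for q in qg.splitlines() if q.strip()]
    flagged = []
    for q in questions:
        a_src = llm(f"Answer using only this text:\n{source}\n\nQ: {q}")
        a_mt = llm(f"Answer using only this text:\n{backtranslated_mt}\n\nQ: {q}")
        if a_src.strip().lower() != a_mt.strip().lower():
            flagged.append((q, a_src, a_mt))  # mismatched answers = likely error
    return flagged  # non-empty list suggests a translation problem
```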
2/ But why question answering?
1. Provides functional explanations of MT quality
2. Users can weigh the evidence based on their own judgment
3. Aligns well with real-world cross-lingual communication strategies
21.05.2025 17:48
1/ How can a monolingual English speaker decide if an automatic French translation is good enough to be shared?
Introducing AskQE, an #LLM-based Question Generation + Answering framework that detects critical MT errors and provides actionable feedback.
#ACL2025
21.05.2025 17:48
How does the public conceptualize AI? Rather than self-reported measures, we use metaphors to understand the nuance and complexity of people's mental models. In our #FAccT2025 paper, we analyzed 12,000 metaphors collected over 12 months to track shifts in public perceptions.
02.05.2025 01:19
Multilinguality is happening at #NAACL2025
@crystinaz.bsky.social
@oxxoskeets.bsky.social
@dayeonki.bsky.social @onadegibert.bsky.social
30.04.2025 23:18
"It was 80% me, 20% AI": Seeking Authenticity in Co-Writing with Large Language Models
Given the rising proliferation and diversity of AI writing assistance tools, especially those powered by large language models (LLMs), both writers and readers may have concerns about the impact of th...
Starting my journey on Bluesky with a topic that I care deeply about: AI tools can support creators in various ways, but disclosing AI use may risk devaluing creative work.
Check out our abstract here: angelhwang.github.io/doc/ic2s2_AI...
Inspired by our past work: arxiv.org/abs/2411.13032
18.04.2025 21:38
7/ Taken together, we show that simpler texts are more translatable, and more broadly, #LLM-assisted input rewriting is a promising direction for improving translations!
As LLM-based writing assistants grow, we encourage future work on interactive, rewriting-based approaches to MT.
17.04.2025 01:32
6/ Do humans actually prefer translations of simplified inputs?
Yes! They rated these to be:
- More contextually appropriate
- Easier to read
- More comprehensible
compared to translations of original inputs!
17.04.2025 01:32
5/ What does input rewriting actually change?
Here are 3 key findings:
1. Better translatability trades off against meaning preservation
2. Simplification boosts both input & output readability
3. Input rewriting > Output post-editing
17.04.2025 01:32
4/ Can we have more selective strategies?
Yes! By selecting rewrites based on translatability scores at inference time, we outperform all other methods.
17.04.2025 01:32
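A minimal sketch of this inference-time selection, assuming hypothetical `translate` and `qe_score` callables (e.g., an MT system plus a reference-free QE metric such as COMET-QE); the paper's exact scoring setup may differ.

```python
def select_best_rewrite(source, rewrites, translate, qe_score):
    """Translate each candidate rewrite and keep the one whose translation
    scores highest under a reference-free QE metric."""
    candidates = [source] + list(rewrites)  # keep the original as a fallback
    scored = [(qe_score(src=c, hyp=translate(c)), c) for c in candidates]
    _, best = max(scored, key=lambda t: t[0])
    return best
```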
3/ Which rewriting strategy works best?
Simpler texts are easier to translate!
But... simplification isn't always a win for MT quality.
17.04.2025 01:32
2/ How should inputs be rewritten for machine translation?
We explore 21 methods with different levels of MT-awareness:
- MT-Agnostic: no knowledge of the task
- Task-Aware: aware of the end task (MT)
- Translatability-Aware: guided by quality estimation scores
17.04.2025 01:32
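As a rough illustration of the first two awareness levels, here are hypothetical prompt templates (the paper's 21 methods are more varied); the Translatability-Aware level combines such rewrites with the QE-based selection sketched under post 4/ above.

```python
REWRITE_PROMPTS = {
    # MT-Agnostic: rewrites without knowing translation is the goal
    "mt_agnostic": "Simplify the following text, preserving its meaning:\n{text}",
    # Task-Aware: the prompt names the downstream task
    "task_aware": ("Rewrite the following text so that it is easy to "
                   "machine-translate into {target_lang}:\n{text}"),
}

prompt = REWRITE_PROMPTS["task_aware"].format(text="...", target_lang="French")
```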
New Paper
1/ We often assume that well-written text is easier to translate.
But can #LLMs automatically rewrite inputs to improve machine translation?
Here's what we found!
17.04.2025 01:32
Tokenization Workshop @ ICML 2025
NEW WORKSHOP ALERT
We're thrilled to announce the first-ever Tokenization Workshop (TokShop) at #ICML2025 @icmlconf.bsky.social!
Submissions are open for work on tokenization across all areas of machine learning.
Submission deadline: May 30, 2025
tokenization-workshop.github.io
15.04.2025 17:23
Thrilled our global data ecosystem audit was accepted to #ICLR2025!
Empirically, it shows:
1. Soaring synthetic text data: ~10M tokens (pre-2018) to 100B+ (2024).
2. YouTube is now 70%+ of speech/video data but could block third-party collection.
3. <0.2% of data from Africa/South America.
1/
14.04.2025 15:28
PhD Student at @gronlp.bsky.social, core dev @inseq.org. Interpretability ∩ HCI ∩ #NLProc.
gsarti.com
Language and keyboard stuff at Google + PhD student at Tokyo Institute of Technology.
I like computers and Korean and computers-and-Korean and high school CS education.
Georgia Tech → Yonsei University → Tokyo Institute of Technology.
https://theoreticallygoodwithcomputers.com/
PhD student in linguistics at the University of Kansas. Morphosyntax, variation, change, revitalization, and a whole lot of food. https://theycallmezeal.me he/him
(Sworn) Translator (ella/she). Translation teacher at Universitat Rovira i Virgili. PhD student; diss. on translation teacher's competence in Spain.
sarahorcas@gmail.com
PhD, CDT in NLP, University of Edinburgh. Prev: IIT Madras | University of Mumbai. She/her.
SNSF Professor at University of Zurich. #NLP / #ML.
http://www.cl.uzh.ch/sennrich
Prof. @ Karlsruhe Institute of Technology, NLP
CTO of the MITRA project @BAIR, UC Berkeley.
Research in ancient Asian low resource languages, especially text reuse, machine translation, semantic similarity search.
Buddhist studies MA, now PhD in computational linguistics @Duesseldorf university.
Dublin. 29. 🏳️‍🌈. He/Him. Brazilian. Partnered. Instructional Designer. PhD researcher in AI - QA for machine translation. Keratoconus is my nemesis.
This is a personal account.
Insta is @johnrihawf.
Associate Professor of Translation and Human-Centred AI @LeidenHumanities (NL). Loves metaphor, stylistics and (machine) translation. PI of NWO-Vidi project "Metaphors in Machine Translation: Reactions, Responses, Repercussions" (2025-2030).
Postdoc at @hitz-zentroa.bsky.social / internship @IKER zentroa (UMR5478)
Participatory research
Human-Centered NLP
Machine Translation
Personal account (in Basque):
https://mastodon.eus/@XabierSoto
Researcher in ML/NLP at the University of Edinburgh (faculty at Informatics and EdinburghNLP), Co-Founder/CTO at www.miniml.ai, ELLIS (@ELLIS.eu) Scholar, Generative AI Lab (GAIL, https://gail.ed.ac.uk/) Fellow -- www.neuralnoise.com, he/they
Assistant professor at Universitat Autònoma de Barcelona (UAB) researching machine translation
Computational Linguistics, Speech Technology
Postdoc @ Saarland University
Postdoc @ Brown DSI
VP R&D @ ClearMash
Passionate about high-fidelity numerical representations of reality, aligned with human perception.
https://omri.alphaxiv.io/
#nlp #multimodality #retrieval #hci #multi-agent
NLP. NMT. Main author of Marian NMT. Research Scientist at Microsoft Translator.
https://marian-nmt.github.io
Principal Research Scientist at IBM Research AI in New York. Speech, Formal/Natural Language Processing. Currently LLM post-training, structured SDG and RL. Opinions my own and non-stationary.
ramon.astudillo.com
Research Scientist at FAIR, Meta. My opinions are my own.
Researcher at @fbk-mt.bsky.social | inclusive and trustworthy machine translation | #NLP #Fairness #Ethics | she/her
https://yuzhaouoe.github.io/ | PhD Student @ University of Edinburgh | Opening the Black Box for Efficient Training/Inference