
Dayeon (Zoey) Ki

@dayeonki.bsky.social

CS PhD @umdclip Multilingual / Culture #NLProc, MT https://dayeonki.github.io/

157 Followers  |  219 Following  |  24 Posts  |  Joined: 05.12.2024

Latest posts by dayeonki.bsky.social on Bluesky

Multiple LLM Agents Debate for Equitable Cultural Alignment Large Language Models (LLMs) need to adapt their predictions to diverse cultural contexts to benefit diverse communities across the world. While previous efforts have focused on single-LLM, single-tur...

8/ 💌 Huge thanks to @marinecarpuat.bsky.social, Rachel, and @zhoutianyi.bsky.social for their guidance, and a special shoutout to the amazing UMD CLIP team!

Check out our paper and code below 🚀
📄 Paper: arxiv.org/abs/2505.24671
🤖 Dataset: github.com/dayeonki/cul...

12.06.2025 23:33 · 👍 1  🔁 0  💬 0  📌 0

7/ 🌟 What's next for Multi-Agent Debate?

Some exciting future directions:
1️⃣ Assigning specific roles to represent diverse cultural perspectives
2️⃣ Discovering optimal strategies for multi-LLM collaboration
3️⃣ Designing better adjudication methods to resolve disagreements fairly 🤝

12.06.2025 23:33 · 👍 1  🔁 0  💬 1  📌 0

6/ But do these gains hold across cultures? 🗾

🫂 We measure cultural parity across diverse groups, and find that Multi-Agent Debate not only boosts average accuracy but also leads to more equitable cultural alignment 🌍

12.06.2025 23:33 · 👍 0  🔁 0  💬 1  📌 0

5/ How do model decisions evolve through debate?

We track three phases of LLM behavior:
💗 Initial decision correctness
💚 Final decision correctness
💙 Judge's decision correctness

✨ Multi-Agent Debate is most valuable when models initially disagree!
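A toy sketch of how that last observation can be checked: split final-decision accuracy by whether the two agents' initial answers agreed. All names here are illustrative, not from the paper's code.

```python
from statistics import mean

def accuracy_by_initial_agreement(records):
    """records: (initial_a, initial_b, final, gold) tuples, one per example.
    Returns final-decision accuracy for initially-agreeing vs. initially-
    disagreeing cases, so the benefit of debate can be compared."""
    agree = [final == gold for a, b, final, gold in records if a == b]
    disagree = [final == gold for a, b, final, gold in records if a != b]
    acc = lambda outcomes: mean(outcomes) if outcomes else float("nan")
    return acc(agree), acc(disagree)
```

If debate helps most on initial disagreement, the second number should exceed the first.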

12.06.2025 23:33 · 👍 2  🔁 0  💬 1  📌 0

4/ 🔥 Distinct LLMs are complementary!

We find that:
🤯 Multi-Agent Debate lets smaller LLMs (7B) match the performance of much larger ones (27B)
🏆 Best combo? Gemma-2 9B + EXAONE-3 7B 💪

12.06.2025 23:33 · 👍 1  🔁 0  💬 1  📌 0

3/ Before bringing in two #LLMs, we first 📈 maximize single-LLM performance through:

1️⃣ Cultural Contextualization: adding relevant rules-of-thumb for the target culture
2️⃣ Self-Reflection: having the model evaluate and improve its own outputs

These serve as strong baselines before we introduce collaboration 🤝
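A minimal sketch of these two baselines as a prompt pipeline; `llm` is any prompt-to-text callable, and the prompt wording is illustrative rather than the paper's.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # any prompt -> completion function

def contextualize_and_reflect(llm: LLM, scenario: str, culture: str,
                              rules_of_thumb: List[str]) -> str:
    """Cultural Contextualization: prepend rules-of-thumb for the target
    culture. Self-Reflection: ask the model to critique and revise its
    own draft before committing to a final answer."""
    rules = "\n".join(f"- {r}" for r in rules_of_thumb)
    draft = llm(
        f"Rules of thumb for {culture}:\n{rules}\n"
        f"Scenario: {scenario}\nIs this socially acceptable? Answer briefly."
    )
    return llm(
        f"Scenario: {scenario}\nYour draft answer: {draft}\n"
        f"Reflect on whether the draft fits {culture}, then give a final answer."
    )
```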

12.06.2025 23:33 · 👍 1  🔁 0  💬 1  📌 0

2/ 🤔 Why involve multiple #LLMs?

Different LLMs bring complementary perspectives and reasoning paths, thanks to variations in:
💽 Training data
🧠 Alignment processes
🌍 Language and cultural coverage

We explore one common form of collaboration: debate.

12.06.2025 23:33 · 👍 1  🔁 0  💬 1  📌 0

1/ Are two #LLMs better than one for equitable cultural alignment? 🌍

We introduce a Multi-Agent Debate framework in which two LLM agents debate the cultural adaptability of a given scenario.

#ACL2025 🧵👇

12.06.2025 23:33 · 👍 6  🔁 0  💬 1  📌 1
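The framework described in this thread can be sketched as a simple loop, under my assumption (not necessarily the paper's exact protocol) that agents exchange answers for a fixed number of rounds and a judge adjudicates only on disagreement:

```python
from typing import Callable

LLM = Callable[[str], str]  # any prompt -> completion function

def multi_agent_debate(question: str, agent_a: LLM, agent_b: LLM,
                       judge: LLM, rounds: int = 2) -> str:
    """Two agents answer, see each other's answer, and may revise;
    if they still disagree after the last round, a judge decides."""
    ans_a, ans_b = agent_a(question), agent_b(question)
    for _ in range(rounds - 1):
        ans_a = agent_a(f"{question}\nThe other agent said: {ans_b}\nRevise or defend your answer.")
        ans_b = agent_b(f"{question}\nThe other agent said: {ans_a}\nRevise or defend your answer.")
    if ans_a.strip().lower() == ans_b.strip().lower():
        return ans_a  # consensus: no adjudication needed
    return judge(f"{question}\nAgent A: {ans_a}\nAgent B: {ans_b}\nPick the better answer, verbatim.")
```

In practice each callable would wrap a different model (e.g. two distinct open LLMs), which is where the complementarity discussed above comes from.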

Trying to collect all the MT people here. I probably missed many. Ping me!

bsky.app/starter-pack...

02.12.2024 08:39 · 👍 24  🔁 8  💬 9  📌 0
AskQE: Question Answering as Automatic Evaluation for Machine Translation How can a monolingual English speaker determine whether an automatic translation in French is good enough to be shared? Existing MT error detection and quality estimation (QE) techniques do not addres...

8/ ❤️ Huge thanks to @marinecarpuat.bsky.social, Kevin Duh, and the amazing UMD CLIP team for all the feedback and inspiration throughout this work!

We'd love for you to check it out 🚀
📄 Paper: arxiv.org/abs/2504.11582
🤖 Dataset: github.com/dayeonki/askqe

21.05.2025 17:48 · 👍 1  🔁 0  💬 0  📌 0

7/ Can AskQE handle naturally occurring translation errors too? 🍃

Yes! It shows:
💁‍♀️ Stronger correlation with human judgments
✅ Better decision-making accuracy than standard QE metrics

21.05.2025 17:48 · 👍 0  🔁 0  💬 1  📌 0

6/ 🤖 What kinds of questions does AskQE generate?

Most commonly:
📏 Extent: How many COVID-19 cases were reported today? (24.6%)
💡 Concept: What is another name for paracetamol? (23.6%)

21.05.2025 17:48 · 👍 0  🔁 0  💬 1  📌 0

5/ 🔥 We test AskQE on ContraTICO and find:

📉 It effectively distinguishes minor from critical translation errors
👭 It aligns closely with established quality estimation (QE) metrics

21.05.2025 17:48 · 👍 0  🔁 0  💬 1  📌 0

4/ We introduce ContraTICO, a dataset of 8 contrastive MT error types in the COVID-19 domain 😷🦠

⚠️ Minor errors: spelling, word order, synonym, intensifier, expansion (no impact)
📛 Critical errors: expansion (impact), omission, alteration
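Contrastive examples like these can be built by perturbing a correct translation; a toy illustration of two of the listed error types (my simplification, not the dataset's actual generation pipeline):

```python
def omission(translation: str, index: int) -> str:
    """Critical `omission` error: silently drop one word."""
    words = translation.split()
    return " ".join(words[:index] + words[index + 1:])

def word_order(translation: str, i: int, j: int) -> str:
    """Minor `word order` error: swap two words in place."""
    words = translation.split()
    words[i], words[j] = words[j], words[i]
    return " ".join(words)
```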

21.05.2025 17:48 · 👍 0  🔁 0  💬 1  📌 0

3/ AskQE has two main components:

โ“ Question Generation (QG): conditioned on the source + its entailed facts
โ• Question Answering (QA): based on the source and backtranslated MT

If the answers donโ€™t match... there's likely an error โš ๏ธ
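The two components compose into a simple check. A minimal sketch, assuming `gen_questions` and `answer` wrap LLM calls (the names and the match-rate score are illustrative, not the released code):

```python
from typing import Callable, List, Tuple

def askqe(source: str, backtranslation: str,
          gen_questions: Callable[[str], List[str]],
          answer: Callable[[str, str], str]) -> Tuple[float, List[str]]:
    """Ask the same questions of the source and of the backtranslated MT;
    mismatched answers flag likely translation errors. Returns a match
    rate plus the mismatched questions as actionable feedback."""
    questions = gen_questions(source)
    mismatched = [
        q for q in questions
        if answer(source, q).strip().lower() != answer(backtranslation, q).strip().lower()
    ]
    match_rate = 1 - len(mismatched) / max(len(questions), 1)
    return match_rate, mismatched
```

The mismatched questions themselves are what a monolingual user can read to decide whether the translation is safe to share.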

21.05.2025 17:48 · 👍 0  🔁 0  💬 1  📌 0

2/ But why question answering? 🤔

1️⃣ Provides functional explanations of MT quality
2️⃣ Users can weigh the evidence based on their own judgment
3️⃣ Aligns well with real-world cross-lingual communication strategies 🌍

21.05.2025 17:48 · 👍 0  🔁 0  💬 1  📌 0

1/ How can a monolingual English speaker 🇺🇸 decide if an automatic French translation 🇫🇷 is good enough to be shared?

Introducing ❓AskQE❓, an #LLM-based Question Generation + Answering framework that detects critical MT errors and provides actionable feedback 🗣️

#ACL2025

21.05.2025 17:48 · 👍 1  🔁 2  💬 1  📌 0

How does the public conceptualize AI? Rather than relying on self-reported measures, we use metaphors to understand the nuance and complexity of people's mental models. In our #FAccT2025 paper, we analyzed 12,000 metaphors collected over 12 months to track shifts in public perceptions.

02.05.2025 01:19 · 👍 49  🔁 14  💬 3  📌 1

Multilinguality is happening at #NAACL2025

@crystinaz.bsky.social
@oxxoskeets.bsky.social
@dayeonki.bsky.social @onadegibert.bsky.social

30.04.2025 23:18 · 👍 14  🔁 1  💬 0  📌 0
"It was 80% me, 20% AI": Seeking Authenticity in Co-Writing with Large Language Models Given the rising proliferation and diversity of AI writing assistance tools, especially those powered by large language models (LLMs), both writers and readers may have concerns about the impact of th...

Starting my journey on Bluesky with a topic that I care deeply about: AI tools can support creators in various ways, but disclosing AI use may risk devaluing creative work.

Check out our abstract here: angelhwang.github.io/doc/ic2s2_AI...
Inspired by our past work: arxiv.org/abs/2411.13032

18.04.2025 21:38 · 👍 24  🔁 4  💬 1  📌 1
Automatic Input Rewriting Improves Translation with Large Language Models Can we improve machine translation (MT) with LLMs by rewriting their inputs automatically? Users commonly rely on the intuition that well-written text is easier to translate when using off-the-shelf M...

8/ 🫶 Huge thanks to my advisor @marinecarpuat.bsky.social and the amazing UMD CLIP folks for all the insightful discussions!

Please check out our paper accepted to NAACL 2025 🚀
📄 Paper: arxiv.org/abs/2502.16682
🤖 Code: github.com/dayeonki/rew...

17.04.2025 01:32 · 👍 1  🔁 0  💬 0  📌 0

7/ Taken together, we show that simpler texts are more translatable, and more broadly, that #LLM-assisted input rewriting is a promising direction for improving translations! 💥

As LLM-based writing assistants grow, we encourage future work on interactive, rewriting-based approaches to MT 🫡

17.04.2025 01:32 · 👍 1  🔁 0  💬 1  📌 0

6/ 🧑‍⚖️ Do humans actually prefer translations of simplified inputs?

Yes! They rated these to be:
📝 More contextually appropriate
👁️ Easier to read
🤗 More comprehensible
compared to translations of original inputs!

17.04.2025 01:32 · 👍 0  🔁 0  💬 1  📌 0

5/ What does input rewriting actually change? 🧐

Here are 3 key findings:
1️⃣ Better translatability trades off against meaning preservation
2️⃣ Simplification boosts both input & output readability 📖
3️⃣ Input rewriting > Output post-editing 🤯

17.04.2025 01:32 · 👍 0  🔁 0  💬 1  📌 0

4/ 🤔 Can we have more selective strategies?

Yes! By selecting rewrites based on translatability scores at inference time, we outperform all other methods 🔥
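That selection step can be sketched as best-of-n over candidate rewrites, assuming some reference-free translatability scorer; the scorer and all names here are placeholders, not the paper's implementation.

```python
from typing import Callable, List

def select_rewrite(source: str,
                   rewriters: List[Callable[[str], str]],
                   translatability: Callable[[str], float]) -> str:
    """Generate candidate rewrites (keeping the original as a fallback)
    and return the candidate the scorer expects to translate best."""
    candidates = [source] + [rewrite(source) for rewrite in rewriters]
    return max(candidates, key=translatability)
```

At inference time `translatability` would be something like a QE model's score for the candidate's machine translation; here any `str -> float` function works.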

17.04.2025 01:32 · 👍 0  🔁 0  💬 1  📌 0

3/ 🔍 Which rewriting strategy works best?

Simpler texts are easier to translate!
But... simplification isn't always a win for MT quality 😞

17.04.2025 01:32 · 👍 0  🔁 0  💬 1  📌 0

2/ How should inputs be rewritten for machine translation? ✏️

We explore 21 methods with different levels of MT-awareness 👇
📝 MT-Agnostic: no knowledge of the task
🌍 Task-Aware: aware of the end task (MT)
🏅 Translatability-Aware: guided by quality estimation scores

17.04.2025 01:32 · 👍 0  🔁 0  💬 1  📌 0

🚨 New Paper 🚨

1/ We often assume that well-written text is easier to translate ✏️

But can #LLMs automatically rewrite inputs to improve machine translation? 🌍

Here's what we found 🧵

17.04.2025 01:32 · 👍 8  🔁 4  💬 1  📌 0
Tokenization Workshop @ ICML 2025

🚨 NEW WORKSHOP ALERT 🚨

We're thrilled to announce the first-ever Tokenization Workshop (TokShop) at #ICML2025 @icmlconf.bsky.social! 🎉

Submissions are open for work on tokenization across all areas of machine learning.

📅 Submission deadline: May 30, 2025
🔗 tokenization-workshop.github.io

15.04.2025 17:23 · 👍 23  🔁 7  💬 1  📌 4

Thrilled our global data ecosystem audit was accepted to #ICLR2025!

Empirically, it shows:

1๏ธโƒฃ Soaring synthetic text data: ~10M tokens (pre-2018) to 100B+ (2024).

2๏ธโƒฃ YouTube is now 70%+ of speech/video data but could block third-party collection.

3๏ธโƒฃ <0.2% of data from Africa/South America.

1/

14.04.2025 15:28 · 👍 12  🔁 4  💬 1  📌 1
