
Raphaël Merx

@rapha.dev.bsky.social

PhD @ UniMelb NLP, with a healthy dose of MT. Based in 🇮🇩, worked in 🇹🇱 🇵🇬, from 🇫🇷

42 Followers  |  87 Following  |  25 Posts  |  Joined: 16.11.2024

Latest posts by rapha.dev on Bluesky

Tulun: Transparent and Adaptable Low-resource Machine Translation Raphael Merx, Hanna Suominen, Lois Yinghui Hong, Nick Thieberger, Trevor Cohn, Ekaterina Vylomova. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 3: Sy...

paper: aclanthology.org/2025.acl-dem...
demo: youtu.be/fQFwOxzR4MI

27.07.2025 15:59 · 👍 0    🔁 0    💬 0    📌 0

in Vienna for ACL, presenting Tulun, a system for low-resource in-domain translation, using LLMs
Tuesday @ 4pm

Working w/ 2 real use cases: medical translation into Tetun 🇹🇱 & disaster relief speech translation in Bislama 🇻🇺

27.07.2025 15:59 · 👍 2    🔁 1    💬 1    📌 0

Cool paper, at the intersection of grammar and LLM interpretability.

I like that they use linguistic datasets for their experiments, then get results that can contribute to linguistics as a field too! (on structural priming vs L1/L2)

08.06.2025 04:47 · 👍 1    🔁 0    💬 0    📌 0

Thanks a lot! I didn't make it to Albuquerque unfortunately, but I hope to be in Vienna for ACL. Might see you there?

26.05.2025 02:25 · 👍 0    🔁 0    💬 1    📌 0
Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service Raphael Merx, Adérito José Guterres Correia, Hanna Suominen, Ekaterina Vylomova. Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025). 20...

Many thanks to Adérito Correia (Timor-Leste INL), and my supervisors Hanna Suominen and Katerina Vylomova!

Paper at aclanthology.org/2025.loresmt..., video presentation at youtu.be/8zenieJWRyg

25.05.2025 01:11 · 👍 1    🔁 0    💬 0    📌 0

(3) The vast majority of usage is on mobile (over 90% of users / over 80k devices)

Takeaway: publishing MT models in mobile apps is probably more impactful than setting up a website / HuggingFace space.

25.05.2025 01:11 · 👍 2    🔁 0    💬 1    📌 0

(2) Translation into Tetun is in higher demand (by >2x) than translation from Tetun

Takeaway for us MT folks: focus on translation into low-res langs, harder but more impactful

25.05.2025 01:11 · 👍 0    🔁 0    💬 1    📌 0

We find that
(1) a LOT of usage is for educational purposes (>50% of translated text)
→ contrasts sharply with Tetun corpora (e.g. MADLAD), which are dominated by news & religion.

Takeaway: don't evaluate MT on overrepresented domains (e.g. religion)! You risk misrepresenting the end-user experience.

25.05.2025 01:11 · 👍 0    🔁 0    💬 1    📌 0

Our paper on who uses tetun.org, and what for, got published at the LoResMT 2025 workshop! An emotional paper for me, going back to the project that got me into a machine learning PhD in the first place.

25.05.2025 01:11 · 👍 3    🔁 0    💬 2    📌 0

Very interesting findings, particularly the benefit (or lack thereof) of test-time scaling across domains

13.05.2025 00:40 · 👍 0    🔁 0    💬 0    📌 0

My favourite ICLR paper so far. Methodology, findings and their implications are all very cool.

In particular Fig. 2 + this discussion point:

08.05.2025 10:20 · 👍 3    🔁 1    💬 0    📌 0

Incredible paper, finding that large companies can game the LMArena through statistical noise (via many model submissions), over-sampling of their models, and overfitting to Arena-style prompts (without real gains on model reasoning)

The experiments they run to show this are pretty cool too!

02.05.2025 13:28 · 👍 4    🔁 0    💬 0    📌 0

Cool summary of issues with multilingual LLM eval, and potential solutions!

If you're doubtful of all these non-reproducible evals on translated multiple choice questions, this paper is for you

23.04.2025 09:50 · 👍 2    🔁 0    💬 1    📌 0
GitHub - MaLA-LM/GlotEval: a unified evaluation toolkit designed to benchmark Large Language Models (LLMs) in a language-specific way

GlotEval - a unified framework for multilingual eval of LLMs, on 7 different tasks, by @tiedeman.bsky.social @helsinki-nlp.bsky.social

Just wish it supported eval of closed models (e.g. through LiteLLM? see the sketch below)

github.com/MaLA-LM/Glot...
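
For a concrete picture of the gap mentioned above: closed-model MT eval through LiteLLM could look roughly like the sketch below. This is not part of GlotEval; the model string, prompt wording, and translate() helper are illustrative assumptions.

```python
# Minimal sketch (not part of GlotEval): translating benchmark sentences with
# a closed model through LiteLLM's OpenAI-compatible completion() call.
# The model string, prompt wording, and helper name are assumptions.
from litellm import completion

def translate(sentence: str, src: str, tgt: str, model: str = "gpt-4o") -> str:
    """Ask a closed model (any LiteLLM provider/model string) to translate."""
    response = completion(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                f"Translate this {src} sentence into {tgt}. "
                f"Reply with the translation only.\n\n{sentence}"
            ),
        }],
    )
    return response.choices[0].message.content.strip()

# Hypotheses produced this way could then be scored with chrF/BLEU against
# the benchmark references, exactly as for an open model.
print(translate("Bondia, diak ka lae?", "Tetun", "English"))
```

Because LiteLLM routes any provider/model string through the same call, open and closed models would share one eval code path.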

11.04.2025 07:41 · 👍 1    🔁 0    💬 0    📌 0
PyConAU We are on BlueSky! Follow us and stay tuned! @pyconau.bsky.social

👋 Hey Bluesky!

We've just touched down and we're excited to be here 🌤️🐍

This is the official PyCon AU account, your go-to space for updates, announcements, and all things Python in Australia ✨

Hit that follow button and stay tuned because weโ€™ve got some awesome things coming your way!

#PyConAU

30.03.2025 22:30 · 👍 6    🔁 7    💬 0    📌 0

AI dev tools. In particular agents: are they hype or useful or both?

31.03.2025 03:20 · 👍 0    🔁 0    💬 0    📌 0

Perceptricon

26.03.2025 08:29 · 👍 1    🔁 0    💬 0    📌 0

The right thing to do, thanks for this *SEM

17.03.2025 08:19 · 👍 2    🔁 0    💬 0    📌 0

Super impactful, thank you for this! A natural sequel to Gatitos.

I'm esp. fond of your "researcher in the loop" method to ensure wide vocab coverage.

20.02.2025 22:23 · 👍 1    🔁 0    💬 0    📌 0

😼 SMOL DATA ALERT! 😼 Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301
HuggingFace: huggingface.co/datasets/goo...

19.02.2025 17:36 · 👍 14    🔁 8    💬 2    📌 1

Been hearing a lot about recency bias lately. Must be pretty important

15.01.2025 03:10 · 👍 121    🔁 27    💬 0    📌 0

Such a well put together video! Gherkins in the background got a supporting role

17.02.2025 01:35 · 👍 1    🔁 0    💬 1    📌 0

Congrats! I'm just getting started but really liked your papers. Cool, impactful and well-written

24.01.2025 08:57 · 👍 1    🔁 0    💬 0    📌 0

Our paper on generating bilingual example sentences with LLMs got best paper award @ ALTA in Canberra!

arxiv.org/abs/2410.03182

We work with French / Indonesian / Tetun, find that annotators don't agree about what's a "good example", but that LLMs can align with a specific annotator.
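
One generic way to picture "aligning with a specific annotator" is in-context conditioning on that annotator's past judgments. The sketch below illustrates the idea only; the data, prompt wording, and model choice are invented and not necessarily the paper's actual setup.

```python
# Generic sketch of annotator alignment via in-context examples. The rated
# examples, prompt wording, and model string are invented for illustration.
from litellm import completion

# Hypothetical past judgments from one annotator on Tetun-English examples.
rated = [
    ("Ha'u ba eskola.", "I go to school.",
     "good: short, everyday vocabulary"),
    ("Governu aprova orsamentu jerál.",
     "The government approves the general budget.",
     "bad: too formal for a learner resource"),
]

def make_prompt(headword: str) -> str:
    """Build a prompt that shows one annotator's verdicts as few-shot context."""
    shots = "\n".join(f"{src} / {tgt} -> {verdict}" for src, tgt, verdict in rated)
    return (
        "You write bilingual example sentences for a Tetun-English dictionary.\n"
        f"One annotator judged earlier examples like this:\n{shots}\n\n"
        f"Write one Tetun sentence using '{headword}', with an English "
        "translation, that this annotator would judge as good."
    )

resp = completion(model="gpt-4o",
                  messages=[{"role": "user", "content": make_prompt("eskola")}])
print(resp.choices[0].message.content)
```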

05.12.2024 03:12 · 👍 1    🔁 0    💬 0    📌 0
OpenAI's Whisper Faces Bad Press for Hallucinations in Healthcare Transcription The Associated Press, WIRED, Fortune, and other major media sources report hallucinations in healthcare by the OpenAI transcription tool Whisper.

Another example of why we need evals that take clinical risk into account when training NLP models for health
slator.com/openais-whis...

22.11.2024 13:27 · 👍 0    🔁 0    💬 0    📌 0

Yes pls!

21.11.2024 01:32 · 👍 1    🔁 0    💬 0    📌 0

What does it mean to be a "low resourced" language? I've seen definitions ranging from less training data to a low number of speakers. Great to see this important clarifying work at #EMNLP2024 from @hellinanigatu.bsky.social et al

aclanthology.org/2024.emnlp-m...

15.11.2024 21:43 · 👍 24    🔁 3    💬 0    📌 0

this guy lives rent free in my hippocampus

20.11.2024 07:01 · 👍 0    🔁 0    💬 0    📌 0
The Bitter Religion: AI's Holy War Over Scaling Laws | The Generalist The AI community is locked in a doctrinal battle about its future and whether sufficient scale will create God.

Is productionisation (and move to gimmicks like CoT in o1-preview) at OpenAI and Anthropic a sign that scaling laws are slowing? And if so, where are we headed in LLMs?

Slightly pretentious but enjoyable read: www.generalist.com/briefing/the...

20.11.2024 06:55 · 👍 3    🔁 0    💬 0    📌 0
