Jamo-Level Subword Tokenization in Low-Resource Korean Machine Translation
Junyoung Lee, Marco Cognetta, Sangwhan Moon, Naoaki Okazaki. Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025). 2025.
Looks like the NAACL proceedings are up on the ACL Anthology, and so too is our (Junyoung Lee + @sxm.bsky.social + Naoaki Okazaki) paper on Jamo-level Subword Tokenization for Korean Machine Translation (from LoResMT).
#tokenization #korean #nlp
28.04.2025 23:29
Vivaldi Mail really is unforgiving of fat fingers. I wonder if I should file a feature request for a manual "send all in draft" button...
09.07.2023 11:17
Oh yes, you do. I should have read more carefully. :-)
09.07.2023 07:34
There is also the negging variant of that pattern: "X, Y, and Z have this, and if you don't implement it we have no choice but to switch".
09.07.2023 04:48
Been sifting through Common Crawl a bit for a side project. A fresh reminder that the internet is full of spam and garbage...
I'm fairly confident I can get any model that has seen Common Crawl to write online casino and Viagra spam in Korean.
07.07.2023 16:47
Starting to post here as Threads made a conscious decision to not have a web client. Let's see which one sticks. (I like the concept of Mastodon, but the clients frankly are sort of crap.)
07.07.2023 15:04
VP of Ecosystem Innovation at Raptive, usually either the privacy person in the ad meeting or the ad person in the privacy meeting. Speaking only for myself here he/his/him
https://blog.zgp.org/about/
Engineer on Google Chrome. Involved in CSS, W3C, and WHATWG standards. Previously Mozilla (2003-2020), W3C TAG (2015-2021). Massachusetts, USA.
Drinking best practices; Ex Fastly, W3C TAG, Financial Times. I will sort your pens in order of length while you're not looking.
NLP PhD student @ Northeastern
Multilingual NLP, tokenizers
https://genesith.github.io/
Community Products @ Twitch
Formerly known as a Silicon Valley engineer, now in Honolulu and still miss SF Mission taquerias.
🐱-as-a-Service
HTTP Status Cats author
Tech doodler
Works at Microsoft
Loves animals
@girlie_mac on Twitter
@girliemac on GitHub
Long live the web! Engineer on Google Chrome, ex-Opera.
PhD student @ ETH Zürich | all aspects of NLP but mostly evaluation and MT | go vegan | https://vilda.net
Now DuckDuckGo, formerly W3C & Opera.
Got a ukulele too.
國漢文混用맨 (roughly, "mixed Hangul-Hanja script guy"). https://hongminhee.org/
Google Chrome DevRel Identity Tech Lead - Anything about browser identity features: passwords, OTPs, passkeys, identity federation, digital credentials, etc
SWE at Google DeepMind. Based in California. Likes/RPs ≠ endorsements. Opinions are my own. My hobby is probably points-hunting (poikatsu) in the US. Posts are, of course, my personal opinions and have nothing whatsoever to do with my employer or the like.
Software engineer, mother
Putting human agency back into technology. Brussels, 🇪🇺.
• tech, governance, science, politics, philosophy, infrastructure, cats, terrible puns
• blog: https://berjon.com/
• fmr W3C, NYT, ScienceAI, Protocol Labs
• he/him/Ishmael
• Signal robin.77
On a mission to make the web faster, one perf feature at a time. Web platform @ Shopify. WebPerfWG and WICG co-chair. RICG4life. Opinions are my own, etc.
Brian, you know, from the Internet. Dev Advocate at Igalia | Co-author Extensible Web Manifesto | Standards Dude (Igalia AC/OpenJS) https://bkardell.com/links | he/him
Architect for #openstandards, the Web and barcodes of all types. Tech policy. Ex-UK gov. Active in GS1, NHS & W3C. @hadleybeeman@w3c.social