Check out the BLT poster at @aclmeeting.bsky.social. It's just a foretaste before the main presentation by Artidoro Pagnoni at @tokshop.bsky.social next week!
18.07.2025 20:11
@tomlim.bsky.social
Postdoc in NLP at Meta and the University of Washington. Before: PhD from Charles University (Prague). Interested in exploring the inner workings of neural networks, multilinguality, tokenization, and fairer NLP (he/him)
Looking forward to our panel at 3:30. We'll talk about the future of tokenization: BLT, SuperBPE (@alisawuffles.bsky.social), H-Nets (Albert Gu), and further breakthroughs in tokenization, together with @uvp.bsky.social, Sander Land, and Kris Cao.
bsky.app/profile/toks...
It'd be great to meet at the Tokenization Workshop @tokshop.bsky.social #icml tomorrow, July 18, starting at 8:45 in Meeting 112-113!
The TokShop schedule is now live! Join us at #ICML2025 for invited talks, poster sessions, and a panel on the future of tokenization. tokenization-workshop.github.io/schedule #Tokenization #LLM #NLP
15.07.2025 22:28
I'm pleased to be in Vancouver for @ICML this week 🇨🇦. I'll be happy to chat about multilingual and multimodal LMs, and about tokenization (and tokenization-free modeling).
16.07.2025 01:16
If you have experience with tokenization (who doesn't?), your help with reviewing will be hugely appreciated!
02.06.2025 21:23
Got a good tokenization paper under review at COLM, but the scores were a letdown?
Why bother with rebuttal when the perfect venue is right around the corner!
Submit your paper to the #ICML2025 Tokenization Workshop (TokShop) by May 30!
#NAACL2025 ended more than a week ago & @ufal-cuni.bsky.social folks were there:
Main conf: @kathaem.bsky.social presented joint work w/ @tomlim.bsky.social, @jlibovicky.bsky.social and Alex Fraser: Beyond Literal Token Overlap: Token Alignability for Multilinguality aclanthology.org/2025.naacl-s...
Call for Papers Alert: TokShop @ ICML 2025
TokShop explores tokenization across all data modalities. Topics include: subword NLP techniques, multimodal approaches, multilingual challenges, post-training modification, alternative representations, and statistical perspectives.
It's finally official: the long-awaited Tokenization Workshop is here!
15.04.2025 17:10
So, apparently, confusing these two buttons can ignite a serious flame war in the reviewer-author discussion @aclmeeting.bsky.social
03.04.2025 17:01
Excited to continue my research adventure as a postdoc at @uwnlp.bsky.social and Meta! I've joined @lukezettlemoyer.bsky.social's fantastic lab. Together, we plan to rethink how LLMs perceive data, extending their capabilities to uncharted languages and, further, beyond text!
31.03.2025 14:23
Paper "Beyond Literal Token Overlap: Token Alignability for Multilinguality" by @kathaem.bsky.social, @tomlim.bsky.social, @jlibovicky.bsky.social and Alex Fraser will appear at #NAACL2025! arxiv.org/abs/2502.06468 Congratulations to all authors! 🥳
10.03.2025 15:52
Happy to say that our paper "Beyond Literal Token Overlap: Token Alignability for Multilinguality" will be presented at #NAACL2025!
This is work with @tomlim.bsky.social, @jlibovicky.bsky.social, and Alex Fraser.
arxiv.org/abs/2502.06468
#newpaper #NLP #NLProc
It'd be great to stay in touch!
21.11.2024 10:59
Work in progress -- suggestions for NLP-ers based in the EU/Europe & already on Bluesky very welcome!
go.bsky.app/NZDc31B
Haha, that's me, both name and surname
21.11.2024 10:15
Ahh, what a pity to miss that.
20.11.2024 09:57
Thanks, I'm happy to hear that. Do you have a rough estimate of when to expect a call for workshop proposals?
17.11.2024 21:49
How about workshops before or after the main conference?
17.11.2024 21:42
Good to see you here! #nlp
14.11.2024 17:50
Also this one:
Lexically Grounded Subword Segmentation
aclanthology.org/2024.emnlp-m...
Poster Session Nov 12 (Tue) 2 pm
Fantastic list, thank you!
12.11.2024 08:44
Tokenization is so back at #EMNLP!
12.11.2024 08:41
Also, if you are in Miami for EMNLP this week, don't miss Hila Gonen's MRL keynote about fair multilingual tokenization (including MYTE).
Happening on Saturday (Nov 16) at 9:50 am ET at the MRL workshop (room: Jasmine).
#firstpost
Are you working on NLP for low-resource or non-Latin-script languages?
If yes, I have great news for you! Our MYTE tokenizer and MyT5 models 🪲 are now easily available through Hugging Face. It's easy to try:

# Load a pretrained MyT5 model and its MYTE tokenizer from the Hugging Face Hub
from transformers import T5ForConditionalGeneration
from transformers import MyT5Tokenizer

MODEL_SIZE = "large"  # small, base, or large
MODEL = f"Tomlim/myt5_{MODEL_SIZE}"

model = T5ForConditionalGeneration.from_pretrained(MODEL, use_safetensors=True)
tokenizer = MyT5Tokenizer.from_pretrained(MODEL)
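Why do byte-level models shortchange non-Latin scripts? The sketch below gives a rough sense of the length disparity that MYTE is meant to reduce. It does not implement MYTE or call the MyT5 tokenizer. It only counts plain UTF-8 bytes, which is what a vanilla byte-level model would consume. The helper names `utf8_length` and `bytes_per_char` and the sample greetings are illustrative assumptions.

```python
def utf8_length(text: str) -> int:
    """Number of bytes a plain byte-level model would see for `text` (UTF-8)."""
    return len(text.encode("utf-8"))


def bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point in `text`."""
    return utf8_length(text) / len(text)


# Hypothetical sample greetings, chosen only to cover a few scripts
samples = {
    "English": "hello",
    "Russian": "привет",
    "Hindi": "नमस्ते",
    "Chinese": "你好",
}

for language, text in samples.items():
    print(
        f"{language:8s} code points={len(text):2d} "
        f"bytes={utf8_length(text):2d} bytes/char={bytes_per_char(text):.1f}"
    )
```

In UTF-8 these Latin letters take one byte each, Cyrillic letters two, and Devanagari or Chinese characters three. A short greeting can therefore become a sequence several times longer for the same amount of meaning. Note that `len` counts code points rather than visible characters, so the Hindi word's virama and vowel sign are counted separately.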
If you are interested in AI, follow the folks in this starter pack! I have just updated it to include a few new arrivals here, but please let me know who else is missing
go.bsky.app/SipA7it
Great list, thanks for making the start on Bluesky easier. I'd also love to be added to the list!
08.11.2024 21:18
That's awesome. Time for a fresh start on Bluesky!
08.11.2024 14:38
A starter pack for #NLP #NLProc researchers!
go.bsky.app/SngwGeS