
Tomasz Limisiewicz

@tomlim.bsky.social

Postdoc in NLP at Meta and the University of Washington. Before: PhD from Charles University (Prague 🏰). Interested in the inner workings of neural networks 🔍, multilinguality 🌍, tokenization 🔑, and fairer NLP ⚖️ (he/him)

3,499 Followers  |  724 Following  |  21 Posts  |  Joined: 02.12.2023  |  2.0369

Latest posts by tomlim.bsky.social on Bluesky


Check out the BLT poster at @aclmeeting.bsky.social. It’s just a foretaste of the main presentation by Artidoro Pagnoni at @tokshop.bsky.social next week!

18.07.2025 20:11 | 👍 10    🔁 0    💬 0    📌 0

Looking forward to our panel at 3:30. We’ll talk about the future of tokenization: BLT, SuperBPE (@alisawuffles.bsky.social), H-nets (Albert Gu), and further breakthroughs in tokenization with @uvp.bsky.social, Sander Land, and Kris Cao.

bsky.app/profile/toks...

18.07.2025 02:24 | 👍 2    🔁 0    💬 0    📌 0

It’d be great to meet at the Tokenization Workshop @tokshop.bsky.social #icml
tomorrow, July 18, starting at 8:45 in Meeting 112-113!

18.07.2025 02:24 | 👍 0    🔁 0    💬 1    📌 0

The TokShop schedule is now live! Join us at #ICML2025 for invited talks, poster sessions, and a panel on the future of tokenization. tokenization-workshop.github.io/schedule #Tokenization #LLM #NLP

15.07.2025 22:28 | 👍 11    🔁 3    💬 0    📌 1

I'm pleased to be in Vancouver for @ICML this week 🇨🇦🤖. I'll be happy to chat about multilingual and multimodal LMs, and about tokenization (and tokenization-free models).

16.07.2025 01:16 | 👍 4    🔁 0    💬 0    📌 0

If you have experience with tokenization (who doesn’t?), your help with reviewing will be hugely appreciated! 🔍 🔑

02.06.2025 21:23 | 👍 2    🔁 0    💬 0    📌 0

Got a good tokenization paper under review at COLM, but the scores were a letdown? 😬

Why bother with rebuttal when the perfect venue is right around the corner!

Submit your paper to the #ICML2025 Tokenization Workshop (TokShop) by May 30! 🚀

28.05.2025 08:24 | 👍 11    🔁 4    💬 0    📌 0

#NAACL2025 ended more than a week ago & @ufal-cuni.bsky.social folks were there:
Main conf: @kathaem.bsky.social presented joint work w/ @tomlim.bsky.social, @jlibovicky.bsky.social and Alex Fraser: Beyond Literal Token Overlap: Token Alignability for Multilinguality aclanthology.org/2025.naacl-s...

16.05.2025 12:07 | 👍 14    🔁 3    💬 1    📌 0
ICML 2025 Workshop TokShop Welcome to the OpenReview homepage for ICML 2025 Workshop TokShop

📣 Call for Papers Alert: TokShop @ ICML 2025
TokShop explores tokenization across all data modalities. Topics include: subword NLP techniques, multimodal approaches, multilingual challenges, post-training modification, alternative representations, and statistical perspectives.

14.05.2025 13:31 | 👍 18    🔁 12    💬 1    📌 2

It’s finally official: the long-awaited Tokenization Workshop is here!

15.04.2025 17:10 | 👍 1    🔁 1    💬 0    📌 0

So, apparently, confusing these two buttons can ignite a serious flame war in the reviewer-author discussion 🔥 @aclmeeting.bsky.social

03.04.2025 17:01 | 👍 6    🔁 0    💬 0    📌 0

Excited to continue my research adventure as a postdoc at @uwnlp.bsky.social and Meta! I’ve joined @lukezettlemoyer.bsky.social’s fantastic lab. Together, we plan to rethink how LLMs perceive data, to extend their capabilities to uncharted languages and, further, beyond text!

31.03.2025 14:23 | 👍 14    🔁 1    💬 1    📌 0

Paper 👉 Beyond Literal Token Overlap: Token Alignability for Multilinguality 👈 by @kathaem.bsky.social, @tomlim.bsky.social, @jlibovicky.bsky.social and Alex Fraser will appear at #NAACL2025! arxiv.org/abs/2502.06468 Congratulations to all authors! 🥳

10.03.2025 15:52 | 👍 5    🔁 1    💬 0    📌 0
Beyond Literal Token Overlap: Token Alignability for Multilinguality Previous work has considered token overlap, or even similarity of token distributions, as predictors for multilinguality and cross-lingual knowledge transfer in language models. However, these very li...

Happy to say that our paper "Beyond Literal Token Overlap: Token Alignability for Multilinguality" will be presented at #NAACL2025!

This is work with @tomlim.bsky.social, @jlibovicky.bsky.social, and Alex Fraser.

arxiv.org/abs/2502.06468

#newpaper #NLP #NLProc

03.03.2025 17:04 | 👍 11    🔁 3    💬 1    📌 2
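The abstract above notes that earlier work treated token overlap between languages as a predictor of multilinguality. As a rough illustration of that baseline only (not the token alignability measure the paper proposes), here is a minimal Python sketch that scores literal overlap as the Jaccard index of two corpora's token types. Whitespace splitting stands in for a real subword tokenizer, and the Jaccard formulation and the helper names `token_types` and `jaccard_overlap` are illustrative assumptions.

```python
# Minimal sketch of "literal token overlap" between two languages:
# the Jaccard index |A ∩ B| / |A ∪ B| over token types.
# Assumptions: whitespace splitting stands in for a subword tokenizer,
# and Jaccard is just one simple way to quantify overlap.
# This is NOT the token alignability measure from the paper.


def token_types(sentences: list[str]) -> set[str]:
    """Collect the set of distinct tokens (types) in a corpus."""
    return {tok for s in sentences for tok in s.split()}


def jaccard_overlap(corpus_a: list[str], corpus_b: list[str]) -> float:
    """Share of token types found in both corpora; 0.0 if both are empty."""
    a, b = token_types(corpus_a), token_types(corpus_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # A sentence and its translation: only "GPU" is shared literally.
    english = ["the GPU trains the model"]
    german = ["die GPU trainiert das Modell"]
    print(f"overlap = {jaccard_overlap(english, german):.3f}")
```

The toy pair prints `overlap = 0.125`: the sentences are translations of each other, yet they share only one literal token, which is the kind of gap between surface overlap and real correspondence that the paper's title alludes to.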

It’d be great to stay in touch!

21.11.2024 10:59 | 👍 0    🔁 0    💬 0    📌 0

Work in progress -- suggestions for NLP-ers based in the EU/Europe & already on Bluesky very welcome!

go.bsky.app/NZDc31B

10.11.2024 17:24 | 👍 67    🔁 20    💬 48    📌 0

Haha, that's me, both name and surname 😁

21.11.2024 10:15 | 👍 0    🔁 0    💬 0    📌 0

Ahh, that's a pity to miss.

20.11.2024 09:57 | 👍 0    🔁 0    💬 0    📌 0

Thanks, I'm happy to hear that 🙂. Do you have a rough estimate of when to expect a call for workshop proposals?

17.11.2024 21:49 | 👍 0    🔁 0    💬 1    📌 0

How about workshops before or after the main conference?

17.11.2024 21:42 | 👍 0    🔁 0    💬 1    📌 0

Good to see you here! #nlp

14.11.2024 17:50 | 👍 0    🔁 0    💬 0    📌 0
Lexically Grounded Subword Segmentation Jindřich Libovický, Jindřich Helcl. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.

Also this one:

Lexically Grounded Subword Segmentation
aclanthology.org/2024.emnlp-m...

Poster Session Nov 12 (Tue) 2 pm 🙂

12.11.2024 10:55 | 👍 1    🔁 0    💬 0    📌 0

Fantastic list, thank you!

12.11.2024 08:44 | 👍 0    🔁 0    💬 0    📌 0

Tokenization is so back at #EMNLP!

12.11.2024 08:41 | 👍 7    🔁 0    💬 0    📌 0

Also, if you are in Miami for EMNLP this week, don’t miss Hila Gonen's MRL keynote about fair multilingual tokenization (including MYTE).

Happening on Saturday (Nov 16) at 9:50 am ET at the MRL workshop (room: Jasmine).

11.11.2024 16:13 | 👍 3    🔁 0    💬 0    📌 0
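# Load a MyT5 model and its MYTE tokenizer from the Hugging Face Hub
# (MyT5Tokenizer is only available in recent versions of transformers).
from transformers import T5ForConditionalGeneration
from transformers import MyT5Tokenizer

MODEL_SIZE = "large"  # small, base, or large
MODEL = f"Tomlim/myt5_{MODEL_SIZE}"

# MyT5 uses the T5 encoder-decoder architecture, so the standard T5 class loads it.
model = T5ForConditionalGeneration.from_pretrained(
    MODEL, use_safetensors=True)

tokenizer = MyT5Tokenizer.from_pretrained(MODEL)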


#firstpost

Are you working on NLP for low-resource or non-Latin script languages?

If yes, I have great news for you! Our MYTE tokenizer and MyT5 models 🪲 are now easily available through 🤗. It’s easy to try:

11.11.2024 16:13 | 👍 8    🔁 0    💬 1    📌 0
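MyT5 models are byte-level, so a useful mental model is the plain UTF-8 byte encoding they start from. The Python sketch below is a hypothetical helper, not the `MyT5Tokenizer` API: it maps each UTF-8 byte to an id shifted past three reserved special tokens (the ByT5 convention, assumed here) and compares sequence lengths across scripts. It does not implement MYTE's morphology-driven remapping; it only shows the length disparity for non-Latin scripts that MYTE is meant to reduce.

```python
# Minimal sketch of ByT5-style byte-level tokenization on raw UTF-8 bytes.
# Assumption: ids 0, 1, 2 are reserved for <pad>, </s>, <unk>, so byte value b
# maps to id b + 3. This is NOT MYTE (which remaps bytes into
# morphology-driven codes); it is the plain UTF-8 baseline only.

SPECIAL_OFFSET = 3  # number of reserved special-token ids before the byte range
EOS_ID = 1          # id of the end-of-sequence token </s>


def byte_encode(text: str, add_eos: bool = True) -> list[int]:
    """Map text to byte-level token ids (UTF-8 byte value + offset)."""
    ids = [b + SPECIAL_OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def byte_decode(ids: list[int]) -> str:
    """Invert byte_encode, skipping special-token ids (< SPECIAL_OFFSET)."""
    raw = bytes(i - SPECIAL_OFFSET for i in ids if i >= SPECIAL_OFFSET)
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    # The same greeting in three scripts: Latin, Cyrillic, Devanagari.
    samples = {"English": "hello", "Russian": "привет", "Hindi": "नमस्ते"}
    for lang, text in samples.items():
        n_chars = len(text)  # Unicode code points
        n_tokens = len(byte_encode(text, add_eos=False))
        print(f"{lang:8s} chars={n_chars:2d} byte tokens={n_tokens:2d}")
```

It reports 5, 12, and 18 byte tokens for English, Russian, and Hindi, because UTF-8 spends one, two, and three bytes per code point on these scripts; a fixed context window therefore covers less text in non-Latin scripts under plain byte encoding.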

If you are interested in AI, follow the folks in this starter pack! I have just updated it to include a few new arrivals here, but please let me know who else is missing

go.bsky.app/SipA7it

08.11.2024 09:44 | 👍 63    🔁 26    💬 26    📌 1

Great list, thanks for making the start on 🦋 easier. I’d also love to be added to the list!

08.11.2024 21:18 | 👍 1    🔁 0    💬 1    📌 0

That's awesome. Time for a fresh start at 🦋

08.11.2024 14:38 | 👍 1    🔁 0    💬 0    📌 0

A starter pack for #NLP #NLProc researchers! 🎉

go.bsky.app/SngwGeS

04.11.2024 10:01 | 👍 254    🔁 101    💬 45    📌 14

@tomlim is following 20 prominent accounts