
Raphael Schumann

@schumann.bsky.social

Natural Language Processing PhD Student @ Heidelberg University. https://schumann.pub #NLP #NLProc #ML #AI

1,805 Followers | 868 Following | 12 Posts | Joined: 13.09.2023

Latest posts by schumann.bsky.social on Bluesky

Same boat as your AC

02.03.2025 11:13 · 👍 2 · 🔁 0 · 💬 1 · 📌 0

Could you add me please?

14.01.2025 18:31 · 👍 5 · 🔁 0 · 💬 0 · 📌 0

CBOW vs. Skip-gram

20.12.2024 11:59 · 👍 6 · 🔁 0 · 💬 0 · 📌 0

Great work! Are you going to release the models?

14.12.2024 11:16 · 👍 6 · 🔁 0 · 💬 0 · 📌 0

A starter pack for #NLP #NLProc researchers! 🎉

go.bsky.app/SngwGeS

04.11.2024 10:01 · 👍 253 · 🔁 100 · 💬 45 · 📌 13

#EMNLP has a nice set of tokenization/subword modeling papers this year.

It's a good mix of tokenization algorithms, tokenization evaluation, tokenization-free methods, and subword embedding probing. Lmk if I missed some!

Here is a list with links + presentation time (in chronological order).

11.11.2024 22:38 · 👍 48 · 🔁 16 · 💬 5 · 📌 2

First time ML/NLP Bluesky feels alive.

07.11.2024 21:39 · 👍 3 · 🔁 0 · 💬 0 · 📌 0

This helped a lot!

07.11.2024 21:27 · 👍 1 · 🔁 0 · 💬 0 · 📌 0

I even make sure to delete paths containing my username from the code in supplementary material

05.01.2024 15:49 · 👍 1 · 🔁 0 · 💬 0 · 📌 0
State of the art - ACL Wiki

TIL that the ACL Wiki has/had a state-of-the-art overview:

aclweb.org/aclwiki/Stat...

27.11.2023 09:12 · 👍 1 · 🔁 0 · 💬 0 · 📌 0
Post image

It also works with Flash Attention 2, although I don't see additional speedups. I don't think FA is optimized for generation.
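For context, a minimal sketch of enabling Flash Attention 2 when loading a model in recent transformers versions; the attn_implementation flag is the current API, and the checkpoint name is an illustrative assumption (FA2 also needs the flash-attn package, a GPU, and fp16/bf16 weights):

# Sketch: load a causal LM with Flash Attention 2 enabled (transformers >= 4.36).
# The checkpoint name is illustrative; any FA2-supported model works the same way.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",               # assumption: illustrative checkpoint
    torch_dtype=torch.float16,                # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",
).to("cuda")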

13.10.2023 11:35 · 👍 0 · 🔁 0 · 💬 0 · 📌 0
Using padding and prefill during inference in huggingface transformers - run_padding_prefill.py

Conceptually it is clear that this works, but I wasn't aware that huggingface passes this through correctly.
GitHub Gist to reproduce:
gist.github.com/raphael-sch/...

13.10.2023 11:35 · 👍 0 · 🔁 0 · 💬 1 · 📌 0

You have to place the padding tokens between the prefill and the input tokens (example with 3 prefilled tokens and 2 padding tokens):
input_ids: [0, 0, X, X, X, X]
position_ids: [0, 0, 3, 4, 5, 6]
attn_mask: [1, 1, 1, 0, 0, 1, 1, 1, 1] (the first three 1s cover the cached prefill tokens)
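A minimal sketch of the same idea in plain transformers (not the linked gist; the checkpoint, prompts, and the legacy tuple KV-cache format are illustrative assumptions):

# Sketch: prefill a shared prompt once, then run a padded batch whose
# attention_mask also covers the cached prefill tokens (illustrative, not the gist).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                                  # assumption: any causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prefill_text = "You are a helpful assistant."        # shared system prompt
prompts = ["Translate: Hallo Welt", "Hi"]            # per-instance inputs

# 1) Prefill the shared prompt once (batch size 1) and keep its KV cache.
prefill_ids = tokenizer(prefill_text, return_tensors="pt").input_ids
with torch.no_grad():
    prefill_out = model(prefill_ids, use_cache=True)
n_prefill = prefill_ids.shape[1]

# 2) Build the padded batch: padding sits between the cached prefill and the
#    real input tokens, with position_ids continuing right after the prefill.
pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
encoded = [tokenizer(p).input_ids for p in prompts]
max_len = max(len(ids) for ids in encoded)

input_ids, position_ids, attn_mask = [], [], []
for ids in encoded:
    n_pad = max_len - len(ids)
    input_ids.append([pad_id] * n_pad + ids)
    position_ids.append([0] * n_pad + list(range(n_prefill, n_prefill + len(ids))))
    attn_mask.append([1] * n_prefill + [0] * n_pad + [1] * len(ids))  # prefill + pad + input

input_ids = torch.tensor(input_ids)
position_ids = torch.tensor(position_ids)
attn_mask = torch.tensor(attn_mask)

# 3) Reuse the cached keys/values for every batch row and run the batch.
past = tuple(
    (k.expand(len(prompts), -1, -1, -1), v.expand(len(prompts), -1, -1, -1))
    for k, v in prefill_out.past_key_values
)
with torch.no_grad():
    out = model(
        input_ids=input_ids,
        attention_mask=attn_mask,
        position_ids=position_ids,
        past_key_values=past,
        use_cache=True,
    )
next_tokens = out.logits[:, -1].argmax(-1)           # greedy next token per row

Because the cache is shared across the batch, the common prefix is computed once instead of once per row, which is where the speedup comes from.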

13.10.2023 11:35 · 👍 0 · 🔁 0 · 💬 1 · 📌 0
Post image

Turns out that with the right attention_mask and position_ids you can prefill tokens AND pad batches in huggingface transformers. This speeds up inference, especially if each instance has the same system prompt prepended. Code below ↓

13.10.2023 11:34 · 👍 4 · 🔁 0 · 💬 1 · 📌 1
