
Raphael Schumann

@schumann.bsky.social

Natural Language Processing PhD Student @ Heidelberg University. https://schumann.pub #NLP #NLProc #ML #AI

1,805 Followers | 868 Following | 12 Posts | Joined: 13.09.2023

Latest posts by schumann.bsky.social on Bluesky

Same boat as your AC

02.03.2025 11:13 · 👍 2 · 🔁 0 · 💬 1 · 📌 0

Could you add me please?

14.01.2025 18:31 · 👍 5 · 🔁 0 · 💬 0 · 📌 0

CBOW vs. Skip-gram

20.12.2024 11:59 · 👍 6 · 🔁 0 · 💬 0 · 📌 0

Great work! Are you going to release the models?

14.12.2024 11:16 · 👍 6 · 🔁 0 · 💬 0 · 📌 0

A starter pack for #NLP #NLProc researchers! 🎉

go.bsky.app/SngwGeS

04.11.2024 10:01 · 👍 253 · 🔁 100 · 💬 45 · 📌 13

#EMNLP has a nice set of tokenization/subword modeling papers this year.

It's a good mix of tokenization algorithms, tokenization evaluation, tokenization-free methods, and subword embedding probing. Lmk if I missed some!

Here is a list with links + presentation time (in chronological order).

11.11.2024 22:38 · 👍 48 · 🔁 16 · 💬 5 · 📌 2

First time ML/NLP Bluesky feels alive.

07.11.2024 21:39 · 👍 3 · 🔁 0 · 💬 0 · 📌 0

This helped a lot!

07.11.2024 21:27 · 👍 1 · 🔁 0 · 💬 0 · 📌 0

I even make sure to delete paths containing my username from the code in supplementary material

05.01.2024 15:49 · 👍 1 · 🔁 0 · 💬 0 · 📌 0
State of the art - ACL Wiki

TIL that the ACL Wiki has/had a state-of-the-art overview:

aclweb.org/aclwiki/Stat...

27.11.2023 09:12 · 👍 1 · 🔁 0 · 💬 0 · 📌 0
Post image

It also works with Flash Attention 2, although I don't see additional speedups. I don't think FA is optimized for generation.
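For context, a minimal sketch of enabling Flash Attention 2 when loading a model in recent transformers versions; the attn_implementation flag is the current API, and the checkpoint name is an illustrative assumption (FA2 also needs the flash-attn package, a GPU, and fp16/bf16 weights):

# Sketch: load a causal LM with Flash Attention 2 enabled (transformers >= 4.36).
# The checkpoint name is illustrative; any FA2-supported model works the same way.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",               # assumption: illustrative checkpoint
    torch_dtype=torch.float16,                # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",
).to("cuda")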

13.10.2023 11:35 · 👍 0 · 🔁 0 · 💬 0 · 📌 0
Using padding and prefill during inference in huggingface transformers - run_padding_prefill.py

Conceptually it is clear that this works, but I wasn't aware that huggingface passes this through correctly.
GitHub Gist to reproduce:
gist.github.com/raphael-sch/...

13.10.2023 11:35 · 👍 0 · 🔁 0 · 💬 1 · 📌 0

You have to place the padding tokens between the prefill and the input tokens (example with 3 prefilled tokens and 2 padding tokens):
input_ids: [0, 0, X, X, X, X]
position_ids: [0, 0, 3, 4, 5, 6]
attn_mask: [1, 1, 1, 0, 0, 1, 1, 1, 1] (the first three 1s cover the cached prefill tokens)
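A minimal sketch of the same idea in plain transformers (not the linked gist; the checkpoint, prompts, and the legacy tuple KV-cache format are illustrative assumptions):

# Sketch: prefill a shared prompt once, then run a padded batch whose
# attention_mask also covers the cached prefill tokens (illustrative, not the gist).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                                  # assumption: any causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prefill_text = "You are a helpful assistant."        # shared system prompt
prompts = ["Translate: Hallo Welt", "Hi"]            # per-instance inputs

# 1) Prefill the shared prompt once (batch size 1) and keep its KV cache.
prefill_ids = tokenizer(prefill_text, return_tensors="pt").input_ids
with torch.no_grad():
    prefill_out = model(prefill_ids, use_cache=True)
n_prefill = prefill_ids.shape[1]

# 2) Build the padded batch: padding sits between the cached prefill and the
#    real input tokens, with position_ids continuing right after the prefill.
pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
encoded = [tokenizer(p).input_ids for p in prompts]
max_len = max(len(ids) for ids in encoded)

input_ids, position_ids, attn_mask = [], [], []
for ids in encoded:
    n_pad = max_len - len(ids)
    input_ids.append([pad_id] * n_pad + ids)
    position_ids.append([0] * n_pad + list(range(n_prefill, n_prefill + len(ids))))
    attn_mask.append([1] * n_prefill + [0] * n_pad + [1] * len(ids))  # prefill + pad + input

input_ids = torch.tensor(input_ids)
position_ids = torch.tensor(position_ids)
attn_mask = torch.tensor(attn_mask)

# 3) Reuse the cached keys/values for every batch row and run the batch.
past = tuple(
    (k.expand(len(prompts), -1, -1, -1), v.expand(len(prompts), -1, -1, -1))
    for k, v in prefill_out.past_key_values
)
with torch.no_grad():
    out = model(
        input_ids=input_ids,
        attention_mask=attn_mask,
        position_ids=position_ids,
        past_key_values=past,
        use_cache=True,
    )
next_tokens = out.logits[:, -1].argmax(-1)           # greedy next token per row

Because the cache is shared across the batch, the common prefix is computed once instead of once per row, which is where the speedup comes from.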

13.10.2023 11:35 · 👍 0 · 🔁 0 · 💬 1 · 📌 0
Post image

Turns out that with the right attention_mask and position_ids you can prefill tokens AND pad batches in huggingface transformers. This speeds up inference, especially if each instance has the same system prompt prepended. Code below ↓

13.10.2023 11:34 · 👍 4 · 🔁 0 · 💬 1 · 📌 1
