All this history is nice, but which method actually performs best for math?
Read our latest blog to find out:
huggingface.co/spaces/huggi...
[6/N]
Rumor has it that earlier Claude models used a modified three-digit tokenization, processing numbers right-to-left instead of left-to-right.
This mirrors how we usually read and write numbers, grouping digits from the right with commas (e.g., 1,234,567). In theory, that should help with math reasoning! A quick sketch of the two grouping orders is below.
[5/N]
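To make the right-to-left idea concrete, here's a minimal sketch (illustrative only, not any model's actual tokenizer code) of chunking a digit string into threes from the right versus from the left:

```python
# Minimal sketch: right-to-left vs. left-to-right three-digit chunking.
# Illustrative only; not the actual Claude or LLaMA tokenizer logic.

def chunk_right_to_left(digits: str, size: int = 3) -> list[str]:
    """Group digits into chunks of `size`, starting from the rightmost digit
    (the way commas group numbers: '1234567' -> ['1', '234', '567'])."""
    chunks = []
    for end in range(len(digits), 0, -size):
        chunks.append(digits[max(0, end - size):end])
    return list(reversed(chunks))

def chunk_left_to_right(digits: str, size: int = 3) -> list[str]:
    """Group digits into chunks of `size`, starting from the leftmost digit
    ('1234567' -> ['123', '456', '7'])."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]

print(chunk_right_to_left("1234567"))   # ['1', '234', '567'] -- matches 1,234,567
print(chunk_left_to_right("1234567"))   # ['123', '456', '7']
```

Right-to-left grouping keeps each chunk aligned with place value (thousands, millions, ...), which is the intuition behind the rumored Claude variant.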
Alas, tokenizing numbers as digits was costly:
A 10-digit number now took 10 tokens instead of 3-4, roughly 2-3x more than before. That's a significant hit on training & inference costs!
LLaMA 3 fixed this by grouping digits into threes, balancing compression and consistency.
[4/N]
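As a rough back-of-the-envelope check (a sketch, not measurements on any real tokenizer), here's the token-count math for a 10-digit number under digit-level splitting versus three-digit grouping:

```python
# Rough token-count comparison for a 10-digit number.
# Illustrative only: real BPE tokenizers use learned merges, so actual counts vary.

number = "9876543210"  # 10 digits

digit_level = list(number)                                         # one token per digit
three_digit = [number[i:i + 3] for i in range(0, len(number), 3)]  # groups of three

print(len(digit_level))   # 10 tokens
print(len(three_digit))   # 4 tokens
```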
Then came LLaMA 1, which took a clever approach to fixing number inconsistencies: it tokenized numbers into individual digits (0-9), meaning any number, no matter how large, could be represented using a vocabulary of just 10 digit tokens.
The consistent representation of numbers made mathematical reasoning much better!
[3/N]
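Here's a minimal sketch (my own illustration, not LLaMA's actual code) of what digit-level tokenization means in practice: every number decomposes into tokens drawn from a fixed 10-symbol vocabulary.

```python
# Illustrative sketch of digit-level number tokenization.
# Every number breaks down into tokens from a fixed 10-symbol digit vocabulary.

DIGIT_VOCAB = [str(d) for d in range(10)]  # '0' .. '9'

def tokenize_number(number: str) -> list[str]:
    """Split a number into single-digit tokens."""
    return list(number)

tokens = tokenize_number("3141592653")
print(tokens)                                 # ['3', '1', '4', '1', '5', ...]
print(all(t in DIGIT_VOCAB for t in tokens))  # True: only 10 distinct tokens needed
```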
When GPT-2 came out in 2019, its tokenizer used byte-pair encoding (BPE), still common today:
• Merges frequent substrings into single tokens, so inputs need far fewer tokens than character-level input
• However, the vocabulary depends on the training data
• Common numbers (e.g., 1999) get single tokens; less common ones are split into pieces (see the sketch below)
[2/N]
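If you want to see this for yourself, a quick way (assuming the `tiktoken` package and its bundled GPT-2 encoding) is to print how a few numbers get split. The exact splits depend on GPT-2's learned merges, so treat the output as something to verify rather than a given:

```python
# Inspect how GPT-2's BPE splits a few numbers (assumes `pip install tiktoken`).
# Which numbers get a single token depends entirely on the learned merge table.
import tiktoken

enc = tiktoken.get_encoding("gpt2")

for text in ["1999", "1998", "29183"]:
    ids = enc.encode(text)
    pieces = [enc.decode_single_token_bytes(i).decode("utf-8") for i in ids]
    print(f"{text!r} -> {len(ids)} token(s): {pieces}")
```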
With Meta's recent paper replacing tokenization in LLMs with patches, I figured it's a great time to revisit how tokenization has evolved over the years using everyone's favourite medium: memes!
Let's take a trip down memory lane!
[1/N]
Shouted out by the goat
I made a simple CLI tool to write conventional git commit messages using the Hugging Face Inference API (with some useful functionality baked into it)
To install: `pip install gcmt`