Cong Lu @cong-ml - Bluesky Profile

Are you interested in Open-Endedness and AI for Science? 🧪

I'm hiring a Student Researcher at Google DeepMind for a 6-month role. Join us to work on building agents capable of novel scientific discoveries! 🔬

Reach out if this sounds like you, and apply here 👇

docs.google.com/forms/d/e/1F...

11.11.2025 11:47 — 👍 4 🔁 0 💬 1 📌 0

StochasTok: Improving Fine-Grained Subword Understanding in LLMs Subword-level understanding is integral to numerous tasks, including understanding multi-digit numbers, spelling mistakes, abbreviations, rhyming, and wordplay. Despite this, current large language mo...

📄 Paper: arxiv.org/abs/2506.01687
💻 Code: github.com/anyasims/sto...
A massive 🙏 to my incredible co-authors: Anya Sims, Thom Foster, @klarakaleb.bsky.social, Tuan-Duy H. Nguyen, Joseph Lee, @jfoerst.bsky.social, @yeewhye.bsky.social!

[8/8]

11.06.2025 12:09 — 👍 1 🔁 0 💬 0 📌 0

The significant gains from this minimal change are super exciting, and we see huge potential for larger models and more complex tasks like coding, scientific reasoning, and beyond! We invite you to explore the paper and code!

[7/]

11.06.2025 12:09 — 👍 0 🔁 0 💬 1 📌 0

More major advantages! 🌟

COST-EFFECTIVE: StochasTok allows enhanced subword skills to be seamlessly 'retrofitted' into existing pretrained models - thus avoiding costly pretraining!
ENHANCED ROBUSTNESS: Improves resilience to alternative tokenizations! (see examples)

[6/]

11.06.2025 12:09 — 👍 0 🔁 0 💬 1 📌 0

Empirically, we find:
LANGUAGE: As hoped, StochasTok unlocks language manipulation ability! (see task examples below)
MATH: Furthermore, StochasTok dramatically changes multi-digit addition, enabling grokking and even generalization to UNSEEN TOKENIZERS!🤯

[5/]

11.06.2025 12:09 — 👍 0 🔁 0 💬 1 📌 0

Practically, StochasTok is:
✅Computationally lightweight🪶
✅A simple dataset preprocessing step — No training loop or inference time changes required!🛠️
✅Compatible with ANY base tokenizer — Allows us to retrofit pretrained models!💰
✅Robust to hyperparameter choice!🔥

[4/]

11.06.2025 12:09 — 👍 1 🔁 0 💬 1 📌 0

The underlying StochasTok algorithm is extremely simple!

1️⃣ Simply tokenize text with ANY base tokenizer,
2️⃣ Then, stochastically split some of those tokens into equivalent token pairs.

That’s basically it! Repeat step 2 for the desired granularity.

[3/]

11.06.2025 12:09 — 👍 0 🔁 0 💬 1 📌 0
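
The two steps from the [3/] post above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the toy vocabulary, the greedy `base_tokenize` stand-in, the helper names, and the `num_splits` parameter are all assumptions made for the example. The released code (linked in [8/8]) is the reference.

```python
import random

# Hypothetical toy vocabulary: token string -> token ID.
VOCAB = {s: i for i, s in enumerate(
    ["b", "c", "o", "k", "bo", "co", "ok", "oo",
     "boo", "coo", "ook", "book", "cook"]
)}
ID_TO_STR = {i: s for s, i in VOCAB.items()}


def base_tokenize(text):
    """Step 1 stand-in: greedy longest-match tokenization over VOCAB."""
    ids, pos = [], 0
    while pos < len(text):
        for end in range(len(text), pos, -1):
            if text[pos:end] in VOCAB:
                ids.append(VOCAB[text[pos:end]])
                pos = end
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


def equivalent_pairs(token_id):
    """In-vocabulary token pairs whose strings concatenate to this token."""
    s = ID_TO_STR[token_id]
    return [(VOCAB[s[:i]], VOCAB[s[i:]])
            for i in range(1, len(s))
            if s[:i] in VOCAB and s[i:] in VOCAB]


def stochastok_expand(ids, num_splits, rng):
    """Step 2: pick a random token and, if it can be split, split it in two.

    Repeating this num_splits times controls the granularity (assumed knob).
    """
    ids = list(ids)
    if not ids:
        return ids
    for _ in range(num_splits):
        pos = rng.randrange(len(ids))
        pairs = equivalent_pairs(ids[pos])
        if pairs:  # tokens with no valid split are left unchanged
            ids[pos:pos + 1] = rng.choice(pairs)
    return ids


def detokenize(ids):
    """Map token IDs back to text; splitting must never change this."""
    return "".join(ID_TO_STR[i] for i in ids)


if __name__ == "__main__":
    rng = random.Random(0)
    ids = base_tokenize("bookcook")
    expanded = stochastok_expand(ids, num_splits=3, rng=rng)
    print(ids, "->", expanded, "->", detokenize(expanded))
```

Because each split swaps a token for a pair that spells the same string, the expanded sequence always decodes to the original text. That is why it can be applied as a dataset preprocessing step without touching the training loop.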

🤔The problem: Standard tokenization assigns every token its own unrelated ID, which makes subword structure unnecessarily hard to learn. E.g., ‘book’=3092 and ‘cook’=171691 differ by a single letter, but nothing in their IDs shows it.

🎉The solution: Allow LLMs to naturally 'see inside' tokens via alternative tokenizations!

[2/]

11.06.2025 12:09 — 👍 0 🔁 0 💬 1 📌 0

🚀Introducing “StochasTok: Improving Fine-Grained Subword Understanding in LLMs”!🚀

LLMs are incredible but still struggle disproportionately with subword tasks, e.g., for character counts, wordplay, multi-digit numbers, fixing typos… Enter StochasTok, led by Anya Sims!

[1/]

11.06.2025 12:09 — 👍 4 🔁 2 💬 1 📌 1

It was an honor to be on Quirks and Quarks (the CBC science show) with @cong-ml.bsky.social talking about The AI Scientist and the impact of AI on science.

Science is being transformed by the AI revolution
cbc.ca/listen/live-...

14.02.2025 22:26 — 👍 8 🔁 2 💬 1 📌 0

Introducing Automated Capability Discovery!

ACD automatically identifies surprising new capabilities and failure modes in foundation models, via "self-exploration" (models exploring their own abilities).

Led by @cong-ml.bsky.social & @shengranhu.bsky.social
🔬🤖🧠🔎 [1/9]

12.02.2025 06:59 — 👍 19 🔁 3 💬 1 📌 0

It's an honor that The AI Scientist is #1 on this list!

www.linkedin.com/feed/update/...

Congrats @chris-lu.bsky.social @cong-ml.bsky.social @RobertTLange @hardmaru.bsky.social @jfoerst.bsky.social

08.01.2025 18:50 — 👍 23 🔁 3 💬 0 📌 0

Lots of interest in ADAS! Thanks everyone, and congrats Shengran Hu and @cong-ml.bsky.social! 🚀🚀🚀

16.12.2024 18:19 — 👍 10 🔁 3 💬 0 📌 0

Honored to receive this award for ADAS!!

16.12.2024 21:33 — 👍 5 🔁 0 💬 0 📌 0

Our in-progress work Quality-Diversity Self-Play (w/ @cong-ml.bsky.social and @jeffclune.com) will have a poster presentation at #NeurIPS2024 workshops (@IMOLNeurIPS2024 Sunday West meeting room 217 - 219 and OpenworldAgents Sunday East Meeting Room 1-3, Foyer). Please come visit us!

14.12.2024 18:59 — 👍 9 🔁 1 💬 0 📌 1

Our work Automated Design of Agentic Systems (w/ Shengran Hu & @cong-ml.bsky.social) will have ✨two orals✨ @ #NeurIPS2024 workshops (LanGame Sat 10:20, OWA Sun 4:50). Please come visit us😃

We would also love to chat about open-endedness, LLM agents, etc. Come by if you want to meet!

10.12.2024 21:49 — 👍 12 🔁 2 💬 0 📌 0

Interested in robust model-based offline RL algorithms? Come check out Anya Sims presenting our new paper investigating the edge of reach problem in offline MBRL!

📍East Exhibit Hall A-C #4603

#NeurIPS2024

12.12.2024 00:34 — 👍 1 🔁 0 💬 0 📌 0

A new golden age of discovery In this essay, we take a tour of how AI is transforming scientific disciplines from genomics to computer science to weather forecasting. Some scientists are training their own AI models, while...

A great new essay on AI for Science from our colleagues here:

deepmind.google/public-polic...

26.11.2024 13:35 — 👍 22 🔁 5 💬 0 📌 1

The RL (and some non-RL folks) starter pack is almost full. Pretty clear that the academic move here has succeeded
go.bsky.app/3WPHcHg

18.11.2024 20:30 — 👍 104 🔁 32 💬 12 📌 3

Now that @jeffclune.bsky.social and @joelbot3000.bsky.social are here, time for an Open-Endedness starter pack.

go.bsky.app/MdVxrtD

20.11.2024 07:08 — 👍 105 🔁 32 💬 16 📌 5
