learning how to do something is a first-order use case for LMs, the development bottleneck has been collecting data covering a wide diversity of topics, until now ✍🏻
10.02.2026 20:34
@kylelo.bsky.social
language model pretraining @ai2.bsky.social, co-lead of data research w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him, kyleclo.com
incredibly fun project led by our intern yapei chang
we mined the web for thousands of real-world "how to do X" step-by-step instructions and turned it into a dataset, synth data training procedure, eval suite, etc.
lol rip 😮‍💨
It's a score calculated against gold reference citations in the generated lit review, so even humans don't score high. i think the eval is saturated cuz there's so much subjectivity in what counts as an appropriate citation. better phrasing is maybe that the citations are sensible up to some X
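for intuition only (hypothetical sketch, not the actual eval): scoring generated citations as set overlap against gold references is harsh precisely because a sensible-but-different citation counts as a miss:

```python
# Hypothetical sketch of a citation-overlap metric (NOT the actual eval):
# score the set of papers cited in a generated lit review against a gold
# reference list. Any sensible-but-different citation counts as a miss,
# which keeps scores low even for humans.
def citation_f1(predicted_ids, gold_ids):
    """F1 over cited paper IDs vs. gold reference IDs."""
    pred, gold = set(predicted_ids), set(gold_ids)
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# a human citing one "wrong" (but sensible) paper already drops to 0.5:
citation_f1(["paperA", "paperB"], ["paperA", "paperC"])  # -> 0.5
```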
they're separate, poorly named systems lol. Separate projects approaching the same problem from different angles: Scholar QA approaches from agentic system design, use whatever model; Open Scholar approaches from model-first, very light on system. The teams are working together to fuse ideas
05.02.2026 02:33
our open model proving out specialized RAG LMs over scientific literature has been published in nature ✍🏻
congrats to our lead @akariasai.bsky.social & team of students and Ai2 researchers/engineers
www.nature.com/articles/s41...
0 days since last mixup of eval results between "copa" (choice of plausible alternatives) & "coqa" (conversational QA) tasks
03.02.2026 20:01
The 5th Generation, Evaluation, and Metrics (GEM) Workshop will be at #ACL2026!
Call for papers is out. Topics include:
- LMs as evaluators
- Living benchmarks
- Eval with humans
and more
New for 2026: Opinion & Statement Papers!
Full CFP: gem-workshop.com/call-for-pap...
mm yea i think that's always the case w productivity tools.
imo ability to adopt new tools is a core part of the job. just like the transition from plain text editors to IDEs, from sending files via FTP to using git for collab, from ad hoc Makefiles to package managers, etc. AI is just the latest thing
my concern is the growing pool of "unknown unknowns" as i interact less with code directly.
imo probably why i subconsciously have been leaning toward cursor over claude code or similar agents, even if the latter has a higher code-to-keystrokes ratio
i dont feel worse at this even if im not writing papers from-scratch as much as during early career
but coding feels different due to mismatch between what i express to the system (english) and what the system returns (code). i've already realized some gaps in libraries I used to know well.
whether my ability to review code will degrade as I offload increasingly larger workloads to AI
of course, this shift is present in other forms of generation, like paper writing, where my role has shifted to reviewing/editing (student's) drafts.
some thoughts about skill degradation w/ AI coding
im onboard w views that "english is the new programming language" & "software engineering", translating ambiguous goals to technical specs/execution, is still a skill.
im more concerned w shift from my role as a writer to a reviewer and
lucky to chat w sen. patty murray about olmo & importance of fully open AI
18.01.2026 03:09
using opus to extract research topics from papers & it was giving me useless words like "training", "datasets", and "evaluation"
kept prompting it w examples of more informative topics and it ended up with "LLM training", "LLM datasets", and "LLM evaluation"
thx
yo endorse me for python skills
16.01.2026 18:37
just realized ive had food on my face all day & nobody at office told me, thx ai2 frens
16.01.2026 00:05
u gotta shitpost more maria, ur content too informative
15.01.2026 21:54
i appreciate bsky has less AI product advertising; i do want to see more memes/shitposting/fun stuff and insights from industry/open source sphere, even if they dont have an attached paper
15.01.2026 17:50
amaazinggg thxx
14.01.2026 21:31
ive been clicking around in UI but i cant find it 😭 pls help
14.01.2026 21:27
bsky wish list
i like the idea of different feeds but i actually want my subscription to select feeds to be taken as a preference signal ("more like this") that informs a "home/default" feed.
i really dislike the UX of having to tab through each subscribed feed, esp when there's also post overlap
some notion of 'views/impressions'? it kinda sucks to post and only see a couple of likes & no replies. if there's some intermediate signal that shows people at least read the post, that'd incentivize posting more imo
14.01.2026 20:15
sports
09.01.2026 22:05
nope just an admirer of room 003
08.01.2026 01:57
just in case it wasn't clear which room this is
07.01.2026 17:58
some citation graph data pipelines will create new "paper" nodes based on extracted bibstrings from PDFs
so in 2026 the papers we hallucinated in 2025 might end up being "real" papers on gscholar or sthn lol
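a minimal sketch of the failure mode (hypothetical code, not any specific pipeline): if nodes get minted straight from extracted bibstrings with no existence check, a hallucinated reference becomes a "paper" just like a real one:

```python
# Hypothetical sketch (not any specific pipeline): minting citation-graph
# "paper" nodes directly from bibstrings extracted out of PDFs. Nothing
# here verifies the cited paper exists, so a hallucinated reference
# becomes a node just like a real one.
import hashlib

def add_paper_node(graph, bibstring):
    """Key the node by a hash of the normalized bibstring; dedupes only."""
    key = hashlib.sha1(bibstring.strip().lower().encode()).hexdigest()[:12]
    graph.setdefault(key, {"raw_citation": bibstring, "verified": False})
    return key

graph = {}
add_paper_node(graph, "Doe et al. 2025. A Paper That Was Never Written.")
add_paper_node(graph, "doe et al. 2025. a paper that was never written. ")
len(graph)  # -> 1: one node, hallucinated or not
```

the only dedupe is string normalization, so two extractions of the same (possibly fake) bibstring collapse into one node, and downstream consumers can't tell it apart from a verified paper.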
ya ur rite! we'll update it
20.12.2025 00:34
just had hechalou's yin yang milk tea and i think i've transcended 🤤
14.12.2025 01:43
new olmo 3.1 artifacts: huggingface.co/collections/...
paper (arxiv soon): allenai.org/papers/olmo3
demo: playground.allenai.org
paper has:
- more on our eval ideology
- more baselines
- more about RL Zero
etc
we picked the final model (internally called moonlit surfer) not just on bench scores but good vibes 🥰