
Kyle Lo

@kylelo.bsky.social

language model pretraining @ai2.bsky.social, co-lead of data research w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him, 🧋 kyleclo.com

6,567 Followers  |  590 Following  |  525 Posts  |  Joined: 17.02.2023

Latest posts by kylelo.bsky.social on Bluesky


yess!! sry bout the x-axis, still thinkin how to make figure clearer

it's exactly what you're saying -- each point refers to a stage of development. our release has data+ckpts+evals for all stages we use (figure) and wanted to show how it compares to other models, which typically release only a few stages

21.11.2025 22:00 · 👍 2  🔁 0  💬 0  📌 0
Research Internship, OLMo Seattle, WA

We're hiring too!

Olmo 3 was our biggest effort yet, but we're still a small team (67 authors!) compared to a lot of the big labs, which means everyone (especially interns) gets to own a major piece of the Olmo puzzle

job-boards.greenhouse.io/thealleninst...

20.11.2025 18:20 · 👍 2  🔁 1  💬 0  📌 0

๐ŸTry the model: playground.allenai.org
๐ŸDownload the collection: huggingface.co/collections/...
๐ŸŒRead the blog: allenai.org/blog/olmo3
๐ŸŽAnd our 100+ page paper lol www.datocms-assets.com/64837/176364...

20.11.2025 18:20 · 👍 3  🔁 0  💬 1  📌 0

๐Ÿ•Finally, we all know midtraining is an exciting time to get a ton of performance boost

But team organization to sustain consistent model improvements (without burnout) is important!

We have explorers "own" target capabilities while a centralized assessment team runs "integration tests"

20.11.2025 18:20 · 👍 1  🔁 0  💬 1  📌 0

๐ŸจData quality signals matter but also how you use them!

The traditional way to use data quality scores is to threshold: define a cutoff and take all documents above it.

But why not sample *proportional* to data quality?

We use Quality-Aware Upsampling to do exactly this

20.11.2025 18:20 · 👍 3  🔁 0  💬 1  📌 0
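The thresholding-vs-upsampling contrast above is easy to sketch. This is not the Olmo pipeline, just a minimal illustration with made-up quality scores: draw documents with probability proportional to their score instead of a hard keep/drop cutoff.

```python
import random
from collections import Counter

def threshold_filter(docs, cutoff):
    """Traditional approach: hard cutoff on the quality score."""
    return [d for d in docs if d["quality"] >= cutoff]

def quality_aware_upsample(docs, total_draws, seed=0):
    """Draw documents with probability proportional to quality score,
    so better documents repeat more often instead of a binary keep/drop."""
    rng = random.Random(seed)
    weights = [d["quality"] for d in docs]
    return rng.choices(docs, weights=weights, k=total_draws)

# Invented scores; a real quality classifier would supply these.
docs = [{"id": "a", "quality": 0.9},
        {"id": "b", "quality": 0.5},
        {"id": "c", "quality": 0.1}]

kept = threshold_filter(docs, cutoff=0.6)  # hard cutoff keeps only "a"
counts = Counter(d["id"] for d in quality_aware_upsample(docs, 1000))
# "a" dominates the sample, but "c" is downweighted rather than dropped
```

The practical difference: thresholding discards the long tail entirely, while proportional sampling keeps every document reachable with a frequency that tracks its score.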

๐ŸฃData mixing is a little too powerful

It's easy to learn "optimal" mixes that heavily oversample certain pockets. eg, STEM docs are valuable for climbing MMLU, but you don't have infinite STEM docs

We approach mixing as Token Constrained Optimization over diverse evals

20.11.2025 18:20 · 👍 4  🔁 0  💬 1  📌 0
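A toy version of token-constrained mixing (not the actual Olmo method, which optimizes over diverse evals): with a single utility score per source and a cap on available tokens, the constrained optimum is a greedy fill. All source names and numbers here are invented.

```python
def mix_tokens(sources, budget):
    """Exact greedy solution to: maximize sum(utility_i * tokens_i)
    subject to 0 <= tokens_i <= available_i and sum(tokens_i) <= budget.
    With only per-source caps and one total budget, filling sources in
    descending-utility order is optimal."""
    alloc = {name: 0.0 for name, _, _ in sources}
    remaining = budget
    for name, utility, available in sorted(sources, key=lambda s: -s[1]):
        take = min(available, remaining)
        alloc[name] = take
        remaining -= take
    return alloc

# Invented numbers: (source, utility per token, tokens available).
sources = [("stem", 3.0, 5e9), ("code", 2.0, 20e9), ("web", 1.0, 100e9)]
alloc = mix_tokens(sources, budget=50e9)
# stem is most valuable but capped at 5B tokens, so the remaining
# budget spills into code, then web -- no "infinite STEM docs" assumed
```

Without the availability constraints, the same objective would dump the entire budget into the highest-utility source, which is exactly the oversampling failure mode the post describes.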

🦈Invest in your experimental design!

We create evals suited to different compute scales: an "easy" set of tasks+metrics supports very small-scale experiments, before switching to our "main" set of evals, on which smaller models are below the noise floor

20.11.2025 18:20 · 👍 6  🔁 0  💬 1  📌 1
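One way to make "below the noise floor" concrete (an illustrative check, not the paper's procedure): test whether a benchmark score is statistically distinguishable from random guessing, given the number of questions.

```python
import math

def above_noise_floor(correct, n_questions, chance=0.25, z=1.96):
    """Is observed accuracy more than z standard errors above
    random-guessing accuracy, under a normal approximation
    to the binomial distribution of correct answers?"""
    acc = correct / n_questions
    stderr = math.sqrt(chance * (1 - chance) / n_questions)
    return acc > chance + z * stderr

# 4-choice benchmark with 500 questions (numbers invented):
small_model = above_noise_floor(135, 500)   # 27% acc: inside the noise
bigger_model = above_noise_floor(165, 500)  # 33% acc: clears the floor
```

When small models sit in that noise band on the "main" evals, their score differences carry no signal, which is the motivation for a separate "easy" suite at small scale.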

we released Olmo 3! lots of exciting stuff but wanna focus on:

๐ŸŸOlmo 3 32B Base, the best fully-open base model to-date, near Qwen 2.5 & Gemma 3 on diverse evals
๐Ÿ Olmo 3 32B Think, first fully-open reasoning model approaching Qwen 3 levels
๐Ÿก12 training datasets corresp to different staged training

20.11.2025 18:20 · 👍 42  🔁 7  💬 1  📌 1

going live with a mukbang tmr 🍱

19.11.2025 17:35 · 👍 3  🔁 0  💬 0  📌 0

not happy abt gpt 5.1 update. it's making way more mistakes compared to gpt 5 on basic stuff

latex table formatting errors (straight up missing "&" so columns misaligned, or dropping a whole column, or shifting values by 1 position), feels unusable imo 😒

14.11.2025 12:26 · 👍 4  🔁 0  💬 0  📌 0

congrats!! 🎊

13.11.2025 00:18 · 👍 0  🔁 0  💬 0  📌 0

omg is this a “simple trick they don’t want u to know”?

13.11.2025 00:17 · 👍 1  🔁 0  💬 1  📌 0

picking between 3 checkpoints w/ same benchmark scores but what if one of them is agi

12.11.2025 17:31 · 👍 11  🔁 0  💬 1  📌 0

Yay congrats!!

07.11.2025 17:18 · 👍 3  🔁 0  💬 1  📌 0

correct framing can make or break research contributions

06.11.2025 00:32 · 👍 1  🔁 0  💬 0  📌 0
Research Internship, OLMo Seattle, WA

apply here: job-boards.greenhouse.io/thealleninst...

i answer some FAQs on my site: kyleclo.com/mentorship/

05.11.2025 23:11 · 👍 4  🔁 1  💬 0  📌 0

why intern at Ai2?

๐ŸŸinterns own major parts of our model development, sometimes even leading whole projects
๐Ÿกwe're committed to open science & actively help our interns publish their work

reach out if u wanna build open language models together 🤝

links 👇

05.11.2025 23:11 · 👍 27  🔁 8  💬 2  📌 1

🙌🏻🙌🏻🙌🏻

05.11.2025 01:56 · 👍 1  🔁 0  💬 0  📌 0

congrats to our olmo earth team 🌎

small multimodal foundation language models + system for finetuning for important uses like agriculture, wildfire management, conservation & more 🌿

04.11.2025 17:57 · 👍 10  🔁 0  💬 0  📌 0

thanks for explaining & sorry it's come to this 😮‍💨

curious your thoughts on other measures, like restricting such pieces to senior authors with a publication record in the surveyed area?

01.11.2025 20:17 · 👍 3  🔁 0  💬 0  📌 0

more stories from scholar land 😅

gs tends to have higher cite counts than s2

s2 clusters paper copies & each cluster grants only +1 citation. without proper clusters, each version (eg, preprint vs published) grants its own citation.

sadly, users can be unhappy when s2 cite counts are lower cuz of this 😥

28.10.2025 00:58 · 👍 5  🔁 0  💬 1  📌 1
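A toy model of the clustering behavior described above (not S2's actual pipeline): all versions of a paper share one cluster, and each *citing* cluster contributes at most +1.

```python
def clustered_citation_counts(paper_versions, citations):
    """Count citations the clustered way: dedupe citing papers per
    cited cluster, so citing both the preprint and the published
    version of a paper still adds only +1.

    paper_versions: {version_id: cluster_id}
    citations: iterable of (citing_version_id, cited_version_id)
    """
    citing_clusters = {}  # cited cluster -> set of citing clusters
    for src, dst in citations:
        cited = paper_versions[dst]
        citing_clusters.setdefault(cited, set()).add(paper_versions[src])
    return {cluster: len(s) for cluster, s in citing_clusters.items()}

# One paper with a preprint (v1) and a published version (v2),
# cited by two papers -- one of which cites *both* versions.
versions = {"v1": "paperA", "v2": "paperA", "x": "paperX", "y": "paperY"}
cites = [("x", "v1"), ("x", "v2"), ("y", "v2")]
counts = clustered_citation_counts(versions, cites)
# unclustered counting would report 3 citations; clustered, it's 2
```

This is exactly why an index that counts each version separately tends to report higher numbers than one that clusters first.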

lol yea

also not widely known but a core difference between gscholar & semantic scholar (s2)

gscholar separates UI & data, so when you merge papers, the change is local to your page and doesn't propagate to your coauthors'

s2 updates the underlying data, and the UI reflects ground truth for all users

27.10.2025 18:26 · 👍 4  🔁 0  💬 1  📌 0
Google Scholar is manipulatable Citations are widely considered in scientists' evaluation. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and citation ...

lol also arxiv.org/abs/2402.04607

26.10.2025 06:38 · 👍 2  🔁 0  💬 2  📌 0

woah guess VLMs for OCR are the hottest research topic this week 😆 since the first olmOCR, we've been..

🔥training our VLM using RLVR with binary unit test rewards🔥

it's incredibly effective & unit test creation is easy to scale w/ synthetic data pipelines

check it out at olmocr.allen.ai

22.10.2025 18:02 · 👍 21  🔁 3  💬 0  📌 0
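A minimal sketch of a binary unit-test reward in the RLVR style described above. The checks and strings here are entirely made up, not olmOCR's real tests: the idea is simply that a rollout gets reward 1.0 only if its output passes every test.

```python
def binary_reward(model_output, unit_tests):
    """RLVR-style verifiable reward: 1.0 if the model's output passes
    every unit test, else 0.0. Each test is a predicate over the
    output string; no partial credit."""
    return 1.0 if all(test(model_output) for test in unit_tests) else 0.0

# Hypothetical checks for an OCR transcription of one page:
tests = [
    lambda out: "Table 1" in out,    # kept the table caption
    lambda out: "\x0c" not in out,   # no stray form-feed characters
    lambda out: out.strip() != "",   # non-empty transcription
]

good = binary_reward("Table 1: results on held-out pages ...", tests)
bad = binary_reward("", tests)
```

Because each test is a cheap deterministic predicate, generating many (document, tests) pairs with a synthetic pipeline scales easily, which is the point the post makes.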

nice read thx for sharing! I think the piece could use a follow up / complement discussing misaligned incentives that push scientists to compete rather than collaborate (notably, the section on data fragmentation)

12.10.2025 03:03 · 👍 6  🔁 0  💬 1  📌 0
Post image

bye #colm2025 big fan of the montreal bagels 🥯 hot take I like them better than

11.10.2025 18:15 · 👍 12  🔁 0  💬 0  📌 0
Post image

lol so much love for prepost-postpre training

09.10.2025 17:13 · 👍 0  🔁 0  💬 0  📌 0
Post image

any other fans of pre-pretraining?

09.10.2025 14:53 · 👍 2  🔁 0  💬 1  📌 0

come say hi this morning at our OLMo 2 and fluid benchmarking posters 👋 and don't miss @valentinhofmann.bsky.social's talk in the morning #colm2025 @ai2.bsky.social vry proud of my gifs

09.10.2025 13:14 · 👍 7  🔁 0  💬 2  📌 0

@josephc.bsky.social @mariaa.bsky.social and I are at poster #21

findings from large scale survey of 800 researchers on how they use LMs in their research #colm2025

08.10.2025 20:12 · 👍 15  🔁 3  💬 0  📌 0
