yess!! sry bout the x-axis, still thinkin how to make figure clearer
it's exactly what you're saying -- each point refers to a stage of development. our release has data+ckpts+evals for all the stages we use (figure), and we wanted to show how that compares to other models, which typically release only a few stages
21.11.2025 22:00
Research Internship, OLMo
Seattle, WA
We're hiring too!
Olmo 3 was our biggest effort yet, but we're still a small team (67 authors!) compared to a lot of the big labs, which means everyone (especially interns) gets to own a major piece of the Olmo puzzle
job-boards.greenhouse.io/thealleninst...
20.11.2025 18:20
Try the model: playground.allenai.org
Download the collection: huggingface.co/collections/...
Read the blog: allenai.org/blog/olmo3
And our 100+ page paper lol www.datocms-assets.com/64837/176364...
20.11.2025 18:20
Finally, we all know midtraining is an exciting time to get a ton of performance boost
But team organization matters for sustaining consistent model improvements (without burnout)!
We have explorers "own" target capabilities & a centralized assessment team runs "integration tests"
20.11.2025 18:20
🚨Data quality signals matter, but so does how you use them!
The traditional way to use data quality is thresholding: define a cutoff and take every document above it.
But why not sample *proportional* to data quality?
We use Quality-Aware Upsampling to do exactly this
20.11.2025 18:20
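The upsampling idea in the post above can be sketched in a few lines. This is a toy illustration, not Ai2's actual pipeline: the document structure and the use of raw quality scores directly as sampling weights are assumptions.

```python
import random

def threshold_sample(docs, cutoff):
    # Traditional approach: keep every document at or above a quality cutoff.
    return [d for d in docs if d["quality"] >= cutoff]

def quality_aware_upsample(docs, k, seed=0):
    # Quality-aware alternative: draw k documents (with replacement)
    # with probability proportional to each document's quality score,
    # so low-quality docs still appear occasionally instead of being
    # discarded outright.
    rng = random.Random(seed)
    weights = [d["quality"] for d in docs]
    return rng.choices(docs, weights=weights, k=k)

docs = [{"id": i, "quality": q} for i, q in enumerate([0.1, 0.4, 0.9, 0.95])]
kept = threshold_sample(docs, cutoff=0.5)       # only the two best docs survive
sampled = quality_aware_upsample(docs, k=1000)  # weaker docs still appear, just rarely
```

The practical difference: thresholding throws away the long tail entirely, while proportional sampling keeps some diversity and upweights quality smoothly.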
📣Data mixing is a little too powerful
It's easy to learn "optimal" mixes that heavily oversample certain pockets. eg, STEM docs are valuable for climbing MMLU, but you don't have infinite STEM docs
We approach mixing as Token Constrained Optimization over diverse evals
20.11.2025 18:20
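A minimal sketch of why the token constraint matters, assuming a linear per-token value for each source (a big simplification; the post's actual method optimizes over diverse evals, and these source names and values are made up). Without the availability cap, an "optimal" mix would pour the whole budget into the highest-value source.

```python
def mix_tokens(budget, sources):
    """Allocate a training-token budget across data sources, greedily
    filling the highest-value source first but never taking more
    tokens than a source actually has."""
    alloc = {name: 0 for name in sources}
    for name in sorted(sources, key=lambda n: sources[n]["value"], reverse=True):
        take = min(budget, sources[name]["available"])
        alloc[name] = take
        budget -= take
        if budget == 0:
            break
    return alloc

sources = {
    "stem": {"value": 3.0, "available": 20},   # valuable but finite
    "code": {"value": 2.0, "available": 50},
    "web":  {"value": 1.0, "available": 500},
}
mix = mix_tokens(100, sources)  # stem capped at 20; remainder spills to code, then web
```

The cap is the whole point: "stem" is the best source per token, but only 20 tokens of it exist, so the optimizer must spread the rest of the budget.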
Invest in your experimental design!
We create evals suited to different compute scales: our "easy" set of tasks+metrics supports very small-scale experiments before switching to our "main" set of evals, on which smaller models are below the noise floor
20.11.2025 18:20
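One standard way to reason about a "noise floor" on an accuracy benchmark (a sketch of the general idea, not necessarily the team's exact criterion; the numbers below are illustrative): an estimate over n questions has binomial standard error sqrt(p(1-p)/n), so a small model's score must clear the chance rate by several standard errors before it carries signal.

```python
import math

def above_noise_floor(score, chance, n_questions, z=3.0):
    # Binomial standard error of an accuracy estimate at the chance rate;
    # require the score to exceed chance by z standard errors.
    se = math.sqrt(chance * (1 - chance) / n_questions)
    return score > chance + z * se

# A small model scoring 27% on a 500-question, 4-way multiple-choice
# benchmark (chance = 25%) is indistinguishable from guessing...
small_main = above_noise_floor(0.27, 0.25, 500)
# ...but 40% on an equally sized "easy" set clearly is signal.
small_easy = above_noise_floor(0.40, 0.25, 500)
```

This is why an easier eval suite helps at small scale: it moves the achievable scores well above the floor that random variation sets.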
we released Olmo 3! lot of exciting stuff but wanna focus on:
Olmo 3 32B Base, the best fully-open base model to date, near Qwen 2.5 & Gemma 3 on diverse evals
Olmo 3 32B Think, the first fully-open reasoning model approaching Qwen 3 levels
💡12 training datasets corresponding to the different training stages
20.11.2025 18:20
going live with a mukbang tmr 😱
19.11.2025 17:35
not happy abt gpt 5.1 update. it's making way more mistakes compared to gpt 5 on basic stuff
latex table formatting errors (straight up missing "&" so columns misaligned, or dropping a whole column, or shifting values by 1 position), feels unusable imo
14.11.2025 12:26
congrats!!
13.11.2025 00:18
omg is this a "simple trick they don't want u to know"?
13.11.2025 00:17
picking between 3 checkpoints w/ same benchmark scores but what if one of them is agi
12.11.2025 17:31
Yay congrats!!
07.11.2025 17:18
correct framing can make or break research contributions
06.11.2025 00:32
Research Internship, OLMo
Seattle, WA
apply here: job-boards.greenhouse.io/thealleninst...
i answer some FAQs on my site: kyleclo.com/mentorship/
05.11.2025 23:11
why intern at Ai2?
interns own major parts of our model development, sometimes even leading whole projects
💡we're committed to open science & actively help our interns publish their work
reach out if u wanna build open language models together 🤝
links 👇
05.11.2025 23:11
👏🏻👏🏻👏🏻
05.11.2025 01:56
congrats to our olmo earth team
small multimodal foundation language models + a system for finetuning them for important uses like agriculture, wildfire management, conservation & more 🌿
04.11.2025 17:57
thanks for explaining & sorry it's come to this 😮‍💨
curious about your thoughts on other measures, like restricting such pieces to senior authors with a publication record in the surveyed area?
01.11.2025 20:17
more stories from scholar land
gs tends to have higher cite counts than s2
s2 clusters paper copies & each cluster grants only +1 citation. without proper clusters, each version (eg, preprint vs published) grants citations separately.
sadly, users can be unhappy when s2 cite counts are lower cuz of this 😥
28.10.2025 00:58
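The clustering point above in a toy example (illustrative data structure, not s2's actual schema): if the preprint and published version of a paper aren't clustered, a paper citing both versions counts twice; with clustering, each citing paper grants exactly +1.

```python
def citation_count(version_citers, clustered=False):
    """Count citations to a paper given a map from each version id
    to the set of papers citing that version."""
    if not clustered:
        # Unclustered (the higher, Scholar-style number): every
        # version's incoming citations add up separately.
        return sum(len(citers) for citers in version_citers.values())
    # Clustered (the s2-style number): versions collapse into one
    # cluster, so each distinct citing paper counts once.
    return len(set().union(*version_citers.values()))

citations = {
    "arxiv_v1":  {"paperA", "paperB"},
    "published": {"paperB", "paperC"},  # paperB cites both versions
}
naive = citation_count(citations)                   # paperB counted twice
merged = citation_count(citations, clustered=True)  # paperB counted once
```

Here the unclustered count is 4 while the clustered count is 3 -- exactly the gap users notice between the two services.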
lol yea
also not widely known but a core difference between gscholar & semantic scholar (s2)
gscholar separates UI & data, so when you merge papers, the change is local to your page and doesn't show up for your coauthors
s2 updates the underlying data, and UI reflects ground truth for all users
27.10.2025 18:26
woah, guess VLMs for OCR are the hottest research topic this week. since the first olmOCR, we've been..
🔥training our VLM using RLVR with binary unit test rewards🔥
it's incredibly effective & unit test creation is easy to scale w synthetic data pipelines
check it out at olmocr.allen.ai
22.10.2025 18:02
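A binary unit-test reward for RLVR can be sketched like this (a hypothetical reward function, not olmOCR's actual code; the test predicates and function names are made up for illustration): run the checks against the model's output and return 1.0 only if all of them pass.

```python
def binary_unit_test_reward(output: str, tests) -> float:
    """Return 1.0 iff the model output passes every unit test, else 0.0.

    `tests` is a list of predicates over the output string -- e.g. for
    an OCR model, checks that a specific heading or table cell was
    transcribed faithfully.
    """
    try:
        return 1.0 if all(t(output) for t in tests) else 0.0
    except Exception:
        return 0.0  # a crashing test counts as a failure

tests = [
    lambda s: "Table 1" in s,     # section header preserved
    lambda s: "&" not in s,       # no stray LaTeX column separators
]
r_good = binary_unit_test_reward("Table 1: results ...", tests)
r_bad = binary_unit_test_reward("Tble 1 & garbage", tests)
```

The appeal of the binary form is that the reward is verifiable and unhackable by partial credit, and (as the post notes) new tests are cheap to generate synthetically.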
nice read thx for sharing! I think the piece could use a follow up / complement discussing misaligned incentives that push scientists to compete rather than collaborate (notably, the section on data fragmentation)
12.10.2025 03:03
bye #colm2025 big fan of the montreal bagels 🥯 hot take I like them better than
11.10.2025 18:15
lol so much love for prepost-postpre training
09.10.2025 17:13
any other fans of pre-pretraining?
09.10.2025 14:53
come say hi this morning at our OLMo 2 and fluid benchmarking posters, and don't miss @valentinhofmann.bsky.social's talk in the morning #colm2025 @ai2.bsky.social vry proud of my gifs
09.10.2025 13:14
@josephc.bsky.social @mariaa.bsky.social and I are at poster #21
findings from a large-scale survey of 800 researchers on how they use LMs in their research #colm2025
08.10.2025 20:12