@mechanicaldirk.bsky.social
Training big models at @ai2.bsky.social.

Almost all post-training is "dusting off capable base models"
28.07.2025 02:52

Unverified second hand information: In the US, all fish has to be flash-frozen before being served raw. In Canada, it does not.
18.07.2025 22:39

I think for the moment we're competing on a different axis. They do quite well on impact per GPU hour. We do well on impact per person hour.
24.06.2025 06:28

In ML, you can get surprisingly far without ever looking at your training data, and yet you'll always be limited. Thus, in ML, "look at the data" means, "Don't just stir the pot of linear algebra, find out what's really happening."
08.06.2025 16:48

This project is a perfect model of an OLMo contribution. Well scoped, practical, sound theoretical underpinnings, and @lambdaviking.bsky.social submitted the paper 24h before the deadline.
It's integrated into the OLMo trainer here: github.com/allenai/OLMo...
Meanwhile, OLMo is now the citation for QK norm, which we definitely didn't invent? You win some, you lose some.
13.05.2025 17:22

Finally, OLMo 1B. This is the most commonly requested OLMo feature, and it's finally here.
01.05.2025 22:31

After ICML, I decided all conferences should be in Vienna from now on.
23.04.2025 21:34

I'm in Singapore for @iclr-conf.bsky.social! Come check out our spotlight paper on the environmental impact of training OLMo (link in next tweet) during the Saturday morning poster session from 10-12:30 -- happy to chat about this or anything else! DMs should be open, email works too.
23.04.2025 15:21

Came across arxiv.org/pdf/2504.05058 today. What a cool example of work you can do when LLM training data is open!
18.04.2025 17:46

[Plot: compute spent predicting a ranking of datasets vs. how accurately that ranking reflects performance at the target (1B) scale of models pretrained from scratch on those datasets.]
Ever wonder how LLM developers choose their pretraining data? It's not guesswork: all AI labs create small-scale models as experiments, but the models and their data are rarely shared.
DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks.
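
As a rough illustration of what the plot measures, here is a minimal sketch (not the DataDecide code) of the core comparison: rank candidate pretraining datasets by benchmark scores from cheap small-scale runs, then check how well that ranking agrees with the ranking from models trained at the 1B target scale. The dataset names are real corpora, but the scores below are made up for illustration.

```python
# Minimal sketch (not the DataDecide code): how well does a cheap,
# small-scale ranking of pretraining datasets predict the ranking you
# would get from 1B-scale models trained on the same datasets?
from scipy.stats import spearmanr

# Hypothetical benchmark scores per candidate dataset (higher is better).
small_scale  = {"dclm": 0.41, "c4": 0.33, "fineweb": 0.39, "dolma": 0.40}  # small proxy models
target_scale = {"dclm": 0.58, "c4": 0.47, "fineweb": 0.57, "dolma": 0.55}  # 1B models, the expensive truth

datasets = sorted(small_scale)
rho, _ = spearmanr([small_scale[d] for d in datasets],
                   [target_scale[d] for d in datasets])
print(f"Rank correlation between proxy and target ranking: {rho:.2f}")  # 0.80 with these made-up numbers
```

The interesting question, which the plot answers empirically, is how much proxy compute you need before that correlation gets high enough to trust.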

Today we're unveiling OLMoTrace, a tool that enables everyone to understand the outputs of LLMs by connecting them to their training data.
We do this at unprecedented scale and in real time: finding matching text between model outputs and 4 trillion training tokens within seconds.
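
For intuition, here is a toy sketch of the matching problem, deliberately brute-force and not how OLMoTrace is actually implemented (the real system relies on a pre-built index over the 4 trillion training tokens to answer queries in seconds). The corpus and output strings are invented.

```python
# Toy illustration of the matching problem OLMoTrace solves, not the real
# system. Here we brute-force the longest verbatim spans shared between a
# model output and a tiny in-memory "training corpus".
def longest_training_spans(output_words, corpus_text, min_len=4):
    """Return maximal word spans of the output that occur verbatim in the corpus."""
    spans, i = [], 0
    while i < len(output_words):
        best = None
        for j in range(len(output_words), i, -1):          # try longest span first
            candidate = " ".join(output_words[i:j])
            if j - i >= min_len and candidate in corpus_text:
                best = (i, j, candidate)
                break
        if best:
            spans.append(best[2])
            i = best[1]                                     # skip past the matched span
        else:
            i += 1
    return spans

corpus = "the quick brown fox jumps over the lazy dog near the river bank"
output = "a quick brown fox jumps over the lazy dog today".split()
print(longest_training_spans(output, corpus))
# ['quick brown fox jumps over the lazy dog']
```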

The fact that my Bsky feed is all tariffs and no Llama 4 means the platform is pretty much cooked for research purposes.
07.04.2025 16:15

[Image: segmentation of the sentence "By the way, I am a fan of the Milky Way" under BPE and SuperBPE.]
We created SuperBPE, a *superword* tokenizer that includes tokens spanning multiple words.
When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.
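
A toy sketch of why superword tokens matter: with a vocabulary that is allowed to contain multi-word strings, a greedy longest-match segmenter covers the same text in far fewer tokens, which is where the inference-time savings come from. The two vocabularies below are hand-made for illustration and are not real BPE or SuperBPE vocabularies, and this is not the SuperBPE training recipe.

```python
# Toy sketch of the superword-token idea, not the actual SuperBPE recipe.
# Both vocabularies are hypothetical; real vocabularies are learned from data.
def greedy_tokenize(text, vocab):
    """Greedy longest-match segmentation of `text` using tokens in `vocab`."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):      # longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])             # fall back to a single character
            i += 1
    return tokens

text = "by the way, i am a fan of the milky way"
bpe_vocab = {"by", " the", " way", ",", " i", " am", " a", " fan", " of", " milky"}
superbpe_vocab = bpe_vocab | {"by the way", " i am a fan of", " the milky way"}  # superword tokens

print(len(greedy_tokenize(text, bpe_vocab)))       # 12 tokens, none crossing whitespace
print(len(greedy_tokenize(text, superbpe_vocab)))  # 4 tokens -> shorter sequences, cheaper inference
```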

It costs $90k. The $1,000 is just a down payment.
16.03.2025 18:56

Error bars! @hails.computer will be so proud!
13.03.2025 22:32

Biggest one yet! Best one yet!
Plus, some fun training stories at the bottom of the blog post (allenai.org/blog/olmo2-32B).

When I played Civilization, I always named my religion "PDF", so I could convert cities to PDF.
10.03.2025 21:05

Introducing olmOCR, our open-source tool to extract clean plain text from PDFs!
Built for scale, olmOCR handles many document types with high throughput. Run it on your own GPU for free: at over 3,000 tokens/s, that's equivalent to $190 per million pages, or 1/32 the cost of GPT-4o!
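
A back-of-envelope sketch of where a figure like $190 per million pages can come from. The tokens-per-page and GPU-price values below are my own assumptions, not numbers from the olmOCR release; only the 3,000 tokens/s throughput is taken from the post.

```python
# Back-of-envelope check of the throughput -> cost claim. The GPU price and
# tokens-per-page are assumptions for illustration, not olmOCR's numbers.
tokens_per_second = 3000        # quoted throughput on a single GPU
tokens_per_page = 1000          # assumed average output tokens per PDF page
gpu_dollars_per_hour = 2.00     # assumed on-demand GPU price

pages_per_hour = tokens_per_second * 3600 / tokens_per_page
cost_per_million_pages = 1_000_000 / pages_per_hour * gpu_dollars_per_hour
print(f"{pages_per_hour:,.0f} pages/hour -> ${cost_per_million_pages:,.0f} per million pages")
# ~10,800 pages/hour -> ~$185 per million pages, the same ballpark as the quoted $190
```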

That seems like a completely normal sleep schedule for a 3-month-old. Source: My kid.
19.02.2025 23:54

We took our most efficient model and made an open-source iOS app. But why?
As phones get faster, more AI will happen on device. With OLMoE, researchers, developers, and users can get a feel for this future: fully private LLMs, available anytime.
Learn more from @soldaini.net: youtu.be/rEK_FZE5rqQ
#humblebrag
10.02.2025 17:52

I used Thinkmate. I want to roughly pick my own specs while knowing nothing about compatibility. I don't like RGB lights everywhere. And I want the thing to be reliable. No regrets.
26.01.2025 01:00

14.8T tokens in 2.8M GPU-hours is about 1,500 tokens per second per GPU. That's a very good number for 37B active parameters, but by no means unbelievable.
26.01.2025 00:57
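
The arithmetic behind that estimate, using only the two numbers in the post:

```python
# Total training tokens divided by total GPU-hours gives per-GPU throughput.
tokens = 14.8e12          # total training tokens
gpu_hours = 2.8e6         # total GPU-hours for the run
tokens_per_gpu_second = tokens / (gpu_hours * 3600)
print(f"{tokens_per_gpu_second:,.0f} tokens per second per GPU")  # ~1,468, i.e. about 1,500
```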

You posted about AI.
26.01.2025 00:52

I haven't read it. But I did listen to an AI-generated conversation about its contents...
25.01.2025 06:15

Behind the scenes with what it's like to build language models and pursue (hopefully) cutting-edge AI research.
Interviewing OLMo 2 leads: Open secrets of training language models
What we have learned and are going to do next.
YouTube: https://buff.ly/40IlSFF
Podcast / notes:

In November, every post here was about NLP. Now it's all about TikTok. We're doing the Twitter speed run.
19.01.2025 20:15

A few days ago, we did finally release the OLMo 2 tech report: arxiv.org/pdf/2501.00656. There is a lot of good stuff in there, but the stability work we did over the summer makes me particularly proud.
06.01.2025 20:03

Everyone wants open-source language models, but no one wants to lift these heavy-ass weights.
We just released our paper "2 OLMo 2 Furious".
Can't stop us in 2025. Links below.