
Mechanical Dirk

@mechanicaldirk.bsky.social

Training big models at @ai2.bsky.social.

504 Followers  |  241 Following  |  51 Posts  |  Joined: 13.12.2023

Latest posts by mechanicaldirk.bsky.social on Bluesky

Almost all post-training is "dusting off capable base models"

28.07.2025 02:52 — 👍 1    🔁 0    💬 1    📌 0

Unverified secondhand information: In the US, all fish has to be flash-frozen before being served raw. In Canada, it does not.

18.07.2025 22:39 — 👍 0    🔁 0    💬 0    📌 0

I think for the moment we're competing on a different axis. They do quite well on impact per GPU hour. We do well on impact per person hour.

24.06.2025 06:28 — 👍 1    🔁 0    💬 0    📌 0

In ML, you can get surprisingly far without ever looking at your training data, and yet you'll always be limited. Thus, in ML, "look at the data" means, "Don't just stir the pot of linear algebra, find out what's really happening."

08.06.2025 16:48 — 👍 4    🔁 0    💬 0    📌 0

This project is a perfect model of an OLMo contribution. Well scoped, practical, sound theoretical underpinnings, and @lambdaviking.bsky.social
submitted the paper 24h before the deadline 😍.

It's integrated into the OLMo trainer here: github.com/allenai/OLMo...

03.06.2025 17:06 — 👍 2    🔁 0    💬 0    📌 0

Meanwhile, OLMo is now the citation for QK norm, which we definitely didn't invent? You win some, you lose some.

13.05.2025 17:22 — 👍 2    🔁 0    💬 0    📌 0

Finally, OLMo 1B. This is the most commonly requested OLMo feature, and it's finally here.

01.05.2025 22:31 — 👍 1    🔁 0    💬 0    📌 0

After ICML, I decided all conferences should be in Vienna from now on.

23.04.2025 21:34 — 👍 0    🔁 0    💬 0    📌 0

I'm in Singapore for @iclr-conf.bsky.social! Come check out our spotlight paper on the environmental impact of training OLMo (link in next tweet) during the Saturday morning poster session from 10-12:30 -- happy to chat about this or anything else! DMs should be open, and email works too.

23.04.2025 15:21 — 👍 10    🔁 5    💬 1    📌 1

Came across arxiv.org/pdf/2504.05058 today. What a cool example of work you can do when LLM training data is open!

18.04.2025 17:46 — 👍 7    🔁 0    💬 1    📌 0
Plot shows the relationship between compute used to predict a ranking of datasets and how accurately that ranking reflects performance at the target (1B) scale of models pretrained from scratch on those datasets.

Ever wonder how LLM developers choose their pretraining data? It's not guesswork — all AI labs create small-scale models as experiments, but the models and their data are rarely shared.
DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧡

15.04.2025 13:01 — 👍 53    🔁 11    💬 1    📌 3

Today we're unveiling OLMoTrace, a tool that enables everyone to understand the outputs of LLMs by connecting them to their training data.

We do this at unprecedented scale and in real time: finding matching text between model outputs and 4 trillion training tokens within seconds. ✨

09.04.2025 13:37 — 👍 40    🔁 5    💬 1    📌 2
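
The post doesn't say how the matching works under the hood, but one standard way to make exact-substring lookup over a large corpus fast is a suffix array. The sketch below is a toy Python illustration of that idea with a made-up corpus and query; it is not the OLMoTrace implementation, which operates over trillions of tokens rather than one in-memory string.

```python
# Toy sketch: exact-substring lookup via a suffix array. This illustrates the
# general technique for fast matching against a large corpus; it is NOT the
# OLMoTrace implementation, and the corpus/query here are invented examples.
import bisect

def build_suffix_array(text: str) -> list[int]:
    """Indices of all suffixes of `text`, sorted lexicographically (naive toy build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text: str, sa: list[int], query: str) -> list[int]:
    """All start positions of `query` in `text`, via binary search over the suffix array."""
    # Toy shortcut: materialize each suffix's prefix of len(query) characters.
    # Real systems compare lazily and work over token IDs on disk, not Python strings.
    prefixes = [text[i:i + len(query)] for i in sa]
    lo = bisect.bisect_left(prefixes, query)
    hi = bisect.bisect_right(prefixes, query)
    return sorted(sa[lo:hi])

corpus = "the cat sat on the mat. the cat slept."
sa = build_suffix_array(corpus)
print(find_occurrences(corpus, sa, "the cat"))  # -> [0, 24]
```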

The fact that my Bsky feed is all tariffs and no Llama 4 means the platform is pretty much cooked for research purposes.

07.04.2025 16:15 — 👍 1    🔁 0    💬 1    📌 0
Segmentation of the sentence "By the way, I am a fan of the Milky Way" under BPE and SuperBPE.

We created SuperBPE 🚀, a *superword* tokenizer that includes tokens spanning multiple words.

When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time. 🧡

21.03.2025 16:48 — 👍 85    🔁 17    💬 3    📌 5
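
As a rough intuition for why superword tokens shorten sequences, here is a toy greedy segmenter over two hand-written vocabularies, one with and one without multi-word pieces. The vocabularies and the greedy longest-match rule are simplifications invented for this example; real BPE and SuperBPE learn their vocabularies from data and segment via merge rules, not a hand-coded list.

```python
# Toy illustration (not the SuperBPE algorithm): a greedy longest-match segmenter
# over a hand-written vocabulary, showing how tokens that cross whitespace
# shorten the token sequence. Real BPE/SuperBPE learn their merges from data.

def greedy_segment(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation; unknown characters fall back to single chars."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # no vocab entry starts here; emit one char
            i += 1
    return tokens

subword_vocab = {"By", " the", " way", ",", " I", " am", " a", " fan", " of", " Milky", " Way"}
superword_vocab = subword_vocab | {"By the way,", " a fan of", " the Milky Way"}

sentence = "By the way, I am a fan of the Milky Way"
print(len(greedy_segment(sentence, subword_vocab)))    # 12 tokens without superwords
print(len(greedy_segment(sentence, superword_vocab)))  # 5 tokens once multi-word pieces exist
```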

It costs $90k. The $1,000 is just a down payment.

16.03.2025 18:56 — 👍 1    🔁 0    💬 1    📌 0

Error bars! @hails.computer will be so proud!

13.03.2025 22:32 — 👍 2    🔁 0    💬 0    📌 0
OLMo 2 32B: First fully open model to outperform GPT 3.5 and GPT 4o mini | Ai2
Introducing OLMo 2 32B, the most capable and largest model in the OLMo 2 family.

Biggest one yet! Best one yet!
Plus, some fun training stories at the bottom of the blog post (allenai.org/blog/olmo2-32B).

13.03.2025 18:39 — 👍 1    🔁 0    💬 0    📌 0

When I played Civilization, I always named my religion "PDF", so I could convert cities to PDF.

10.03.2025 21:05 — 👍 2    🔁 0    💬 0    📌 0

Introducing olmOCR, our open-source tool to extract clean plain text from PDFs!

Built for scale, olmOCR handles many document types with high throughput. Run it on your own GPU for free — at over 3,000 tokens/s, equivalent to $190 per million pages, or 1/32 the cost of GPT-4o!

25.02.2025 17:03 — 👍 82    🔁 13    💬 3    📌 3
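
The $190-per-million-pages figure follows from the throughput once you assume a GPU price and an average page length. The values below (about 1,000 output tokens per page and roughly $2 per GPU-hour) are hypothetical round numbers chosen only to show the arithmetic; only the 3,000 tokens/s comes from the post.

```python
# Back-of-envelope check of the cost claim. The tokens-per-page and GPU price
# are hypothetical assumptions for illustration; only the throughput is from the post.
throughput_tps = 3000        # tokens per second (from the post)
tokens_per_page = 1000       # assumed average output tokens per PDF page
gpu_cost_per_hour = 2.00     # assumed on-demand GPU price in USD

seconds_per_million_pages = 1_000_000 * tokens_per_page / throughput_tps
cost = gpu_cost_per_hour * seconds_per_million_pages / 3600
print(f"~${cost:,.0f} per million pages")  # ~$185 with these assumptions, near the quoted $190
```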

That seems like a completely normal sleep schedule for a 3 month old. Source: My kid.

19.02.2025 23:54 — 👍 1    🔁 0    💬 0    📌 0
Ai2 OLMoE: Fully open source, running entirely on-device

We took our most efficient model and made an open-source iOS app 📱, but why?

As phones get faster, more AI will happen on device. With OLMoE, researchers, developers, and users can get a feel for this future: fully private LLMs, available anytime.

Learn more from @soldaini.net 👇 youtu.be/rEK_FZE5rqQ

11.02.2025 14:04 — 👍 30    🔁 14    💬 2    📌 5

#humblebrag

10.02.2025 17:52 — 👍 4    🔁 0    💬 0    📌 0

I used Thinkmate. I want to roughly pick my own specs while knowing nothing about compatibility. I don't like RGB lights everywhere. And I want the thing to be reliable. No regrets.

26.01.2025 01:00 — 👍 0    🔁 0    💬 0    📌 0

14.8T tokens in 2.8M hours is about 1500 tokens per second. That's a very good number for 37B active parameters, but by no means unbelievable.

26.01.2025 00:57 — 👍 0    🔁 0    💬 0    📌 0
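
For the record, the arithmetic behind that number, reading the post's "2.8M hours" as GPU-hours (an assumption, consistent with quoting a per-GPU throughput):

```python
# Checking the arithmetic in the post above; assumes the 2.8M hours are GPU-hours,
# so the result is per-GPU training throughput.
tokens = 14.8e12                       # 14.8T training tokens
gpu_hours = 2.8e6                      # 2.8M GPU-hours
tokens_per_second = tokens / (gpu_hours * 3600)
print(f"{tokens_per_second:,.0f} tokens/s per GPU")  # ~1,468, i.e. roughly 1,500
```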

You posted about AI.

26.01.2025 00:52 — 👍 1    🔁 0    💬 0    📌 0

I haven't read it. But I did listen to an AI generated conversation about its contents...

25.01.2025 06:15 — 👍 3    🔁 0    💬 0    📌 0

Behind the scenes on what it's like to build language models and pursue (hopefully) cutting-edge AI research.

Interviewing OLMo 2 leads: Open secrets of training language models
What we have learned and are going to do next.
YouTube: https://buff.ly/40IlSFF
Podcast / notes:

22.01.2025 15:52 — 👍 33    🔁 8    💬 1    📌 0

In November, every post here was about NLP. Now it's all about TikTok. We're doing the Twitter speed run.

19.01.2025 20:15 — 👍 2    🔁 0    💬 0    📌 0

A few days ago, we did finally release the OLMo 2 tech report: arxiv.org/pdf/2501.00656. There is a lot of good stuff in there, but the stability work we did over the summer makes me particularly proud.

06.01.2025 20:03 — 👍 1    🔁 0    💬 0    📌 0

Everyone wants open-source language models but no one wants to lift these heavy ass weights.

We just released our paper "2 OLMo 2 Furious"
Can't stop us in 2025. Links below.

03.01.2025 19:13 — 👍 56    🔁 10    💬 6    📌 2

@mechanicaldirk is following 20 prominent accounts