Tobi's Avatar

Tobi

@mocutobi.bsky.social

Founder @ ??? Ex-Machine Learning Engineer @ FAANG. Researched DeepRL+Affective Computing @usc

6,335 Followers  |  2,092 Following  |  2,357 Posts  |  Joined: 12.04.2023

Posts by Tobi (@mocutobi.bsky.social)

Text Shot: New findings challenge the widespread belief that AI is an environmental villain. By analyzing U.S. economic data and AI usage across industries, researchers discovered that AI’s energy consumption—while significant locally—barely registers at national or global scales. Even more surprising, AI could help accelerate green technologies rather than hinder them.

AI’s climate impact is much smaller than many feared www.sciencedaily.com/releases/2025/… #AI #environment

06.02.2026 03:13 — 👍 38    🔁 4    💬 0    📌 1

🔁 If you are enjoying the feed, please like and share it with others for discoverability!

22.09.2025 16:52 — 👍 32    🔁 7    💬 1    📌 2

This is what I’ve been feeling as Twitter hypes up Claude Code like it’s the second coming of Jesus. Yes, it’s very good, but it struggles tremendously with high-complexity projects and consistently produces worse designs than I’d implement myself, and people aren’t honest about admitting this

06.02.2026 22:15 — 👍 0    🔁 0    💬 0    📌 0
The generated code is not very efficient. Even with all optimizations enabled, it outputs less efficient code than GCC with all optimizations disabled.
The Rust code quality is reasonable, but is nowhere near the quality of what an expert Rust programmer might produce.

this "claude made a c compiler" thing feels pretty dishonest and marketing-hype when you scroll to the very bottom to find these lines and the token cost lol

i mean, sure it's impressive (particularly that it was a fully offline environment), but it seems like hype bait

05.02.2026 22:54 — 👍 110    🔁 4    💬 9    📌 2

I tend to prefer GLM 4.5 Air when I’m going for cheap and fast, but Gemini Flash is pretty solid

06.02.2026 19:53 — 👍 2    🔁 0    💬 1    📌 0

Also a good thing to keep in mind: afaik the current "subscription token economy" is heavily subsidized, and I'm not sure Anthropic/OpenAI are necessarily charging the full amount (i.e. the same price they would charge as API cost)

06.02.2026 19:42 — 👍 3    🔁 2    💬 1    📌 0

This is why I genuinely believe that in the model game, there is no moat, and there will always be tremendous pressure for open source and cheaper models. The majority of the world economy doesn't want to pay $21 (!!!) per million tokens, even if they can afford to
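For scale, a back-of-the-envelope sketch of what that price means for a heavy workload (the $21/M figure comes from the post; the token counts and task volume below are hypothetical):

```python
# Back-of-the-envelope token cost at $21 per million tokens (price from the
# post above); workload numbers are made up for illustration.
PRICE_PER_MILLION = 21.00

def cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION) -> float:
    """Cost in USD for a given number of billed tokens."""
    return tokens / 1_000_000 * price_per_million

# A hypothetical agent loop emitting ~50k tokens per task, 1,000 tasks/day:
daily = cost_usd(50_000) * 1_000
print(f"${daily:,.2f}/day")  # → $1,050.00/day
```

At that rate even a modest automated pipeline runs into six figures a year, which is the pressure toward cheaper and open models the post describes.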

06.02.2026 19:08 — 👍 7    🔁 0    💬 1    📌 1
Post image

I've been evaluating Claude Opus 4.6, GPT 5.2, and other models in a simulation environment I've built. The verdict: I ran out of money

06.02.2026 19:07 — 👍 23    🔁 2    💬 2    📌 0

If Trump hadn’t sued Trevor Noah for that joke at the Grammys, I probably never would’ve heard it. Talk about the Streisand Effect

02.02.2026 17:57 — 👍 6    🔁 2    💬 1    📌 0
Post image

ByteDance Seed's ConceptMoE: moving beyond uniform token-level processing to adaptive concept-level computation in LLMs!

Why waste equal compute on trivially predictable tokens when you can merge similar tokens into concepts while preserving fine-grained processing for complex content?
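A toy sketch of the idea (my own illustration, not ByteDance's implementation): mean-pool runs of adjacent token embeddings that the model is highly confident about into a single "concept" vector, while keeping hard tokens at full resolution. The `confidences` input and the 0.9 threshold are assumptions for the sketch.

```python
import numpy as np

def merge_into_concepts(embs: np.ndarray, confidences: np.ndarray,
                        threshold: float = 0.9) -> list[np.ndarray]:
    """Toy concept merging: consecutive high-confidence token embeddings
    are mean-pooled into one concept vector; hard tokens stay as-is."""
    concepts, run = [], []
    for emb, conf in zip(embs, confidences):
        if conf >= threshold:            # trivially predictable: buffer it
            run.append(emb)
        else:                            # hard token: flush the run, keep token
            if run:
                concepts.append(np.mean(run, axis=0))
                run = []
            concepts.append(emb)
    if run:                              # flush any trailing run
        concepts.append(np.mean(run, axis=0))
    return concepts

embs = np.eye(5)                         # 5 tokens, 5-dim one-hot embeddings
conf = np.array([0.95, 0.97, 0.30, 0.99, 0.99])
out = merge_into_concepts(embs, conf)
print(len(out))  # 3: merged concept, hard token, merged concept
```

The sequence shrinks from 5 positions to 3, which is where the compute saving would come from; the real method decides merges inside the model rather than from an external confidence score.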

02.02.2026 14:09 — 👍 28    🔁 3    💬 1    📌 1
Post image

Thought this was referring to @lastpositivist.bsky.social and got really worried

02.02.2026 00:03 — 👍 2    🔁 0    💬 0    📌 0

What batch sizes and sequence lengths are you about to get up to on your dgx spark? I was doing 32b 2048L but saw much faster throughput when I dropped down to 16b

01.02.2026 23:15 — 👍 0    🔁 0    💬 1    📌 0

What do you use to have Claude make slides?

01.02.2026 22:05 — 👍 0    🔁 0    💬 0    📌 0

Sidenote: All experiments were performed on the Nvidia DGX Spark machine, my new favorite toy.

01.02.2026 21:55 — 👍 2    🔁 0    💬 0    📌 0

These experiments clarify the importance of more holistic evaluation metrics during pretraining. As we move labeled data for reasoning, planning, and other higher order thinking into the pretraining stage, pretraining evaluation that's more aligned with downstream tasks becomes more important

01.02.2026 21:55 — 👍 1    🔁 0    💬 1    📌 0

Outside of HumanEval, the model pretrained on the SYNTH dataset outperforms the standard nanochat on every task. I report results for both the standard chat template (Harmony) as well as the Qwen3 Chat template used in the SYNTH dataset.

01.02.2026 21:55 — 👍 4    🔁 0    💬 1    📌 1
Markdown representation of the above table:

| Task | Default Template | Qwen3 Template | Synthetic + Fineweb (Default) | Synthetic + Fineweb (Qwen3) | Synthetic + Fineweb (Qwen3, no thinking) |
|------|------------------|----------------|-------------------------------|----------------------------|-------------------------------------------|
| ARC-Easy | 38.85% | 25.34% | 42.85% | 30.26% | 25.08% |
| ARC-Challenge | 29.95% | 22.78% | 32.94% | 25.85% | 23.04% |
| MMLU | 32.69% | 24.08% | 33.13% | 26.98% | 24.21% |
| GSM8K | 3.64% | 0.08% | 6.14% | 2.20% | 0.08% |
| HumanEval | 8.54% | 1.22% | 1.83% | 0.00% | 0.61% |
| SpellingBee | 98.05% | 0.00% | 98.05% | 35.94% | 0.39% |

The results seemed to speak for themselves, but I was skeptical. The SYNTH dataset is composed of reasoning-style chat messages; it seemed unlikely that general internet data would perform better on downstream chat-style tasks. So I decided to do midtraining, where the results flipped:

01.02.2026 21:55 — 👍 0    🔁 0    💬 1    📌 0
Post image

I used a 3:1 SYNTH:FineWeb data ratio when pretraining. I evaluated nanochat with and without SYNTH data at both 1B and 3B pretraining tokens to see if adding the data led to faster convergence. While SYNTH data led to lower val/bpb, the CORE metric was consistently lower.
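A minimal sketch of how a 3:1 document-level mixture like this can be sampled (my own illustration; the actual nanochat data loader may work differently):

```python
import random

def mixture_stream(synth_docs, fineweb_docs, ratio=(3, 1), seed=0):
    """Yield documents so that on average ratio[0] SYNTH docs appear for
    every ratio[1] FineWeb doc (sampling with replacement)."""
    rng = random.Random(seed)
    p_synth = ratio[0] / (ratio[0] + ratio[1])   # 0.75 for a 3:1 mix
    while True:
        src = synth_docs if rng.random() < p_synth else fineweb_docs
        yield rng.choice(src)

stream = mixture_stream(["synth"], ["fineweb"])
sample = [next(stream) for _ in range(10_000)]
print(sample.count("synth") / len(sample))  # ≈ 0.75
```

Bernoulli sampling like this only hits the ratio in expectation; a deterministic interleave (3 SYNTH, then 1 FineWeb, repeating) enforces it exactly per window.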

01.02.2026 21:55 — 👍 0    🔁 0    💬 1    📌 0
Pretraining results for training on SYNTH + FineWeb vs FineWeb alone. 

Four rows are displayed: dgx1-continued, dgx1-synth-continued, dgx1-synth, and dgx1.

Rows labeled *-synth are trained with 3:1 SYNTH:FineWeb mixture. Results with *-continued* are trained for 3B tokens, 1B otherwise. CORE metric:

nanochat + SYNTH, 3B tokens: 0.147
nanochat, 3B tokens: 0.166
nanochat + SYNTH, 1B tokens: 0.123
nanochat, 1B tokens: 0.139

I did some experiments with nanochat and the SYNTH dataset from Pleias. I wanted to see how a model pretrained on a mixture of SYNTH and FineWeb would compare to the standard FineWeb mixture. Results: pretraining on FineWeb alone looks good (see below), but the story changes after midtraining:

01.02.2026 21:55 — 👍 9    🔁 1    💬 1    📌 0

Can anyone find that meme with the cartoon cats where all of their butts are AI company logos? I need it for a project

28.01.2026 21:20 — 👍 0    🔁 0    💬 0    📌 0

Been running comparisons of nanochat pretrained on FineWebEdu vs synthetic data, and it’s not looking great for the synthetic data runs thus far. We’ll see what it looks like after midtraining

22.01.2026 00:09 — 👍 1    🔁 0    💬 0    📌 0

At my day job, I’m building a lot of Agent stuff! On my free time, I’m experimenting with model architectures and pretraining recipes!

12.01.2026 02:17 — 👍 5    🔁 0    💬 0    📌 0

Excited to see the AI community has grown on here! Gonna try sharing more about my work here!

12.01.2026 00:30 — 👍 47    🔁 1    💬 4    📌 0

When Bill Clinton left office, we had a budget surplus. After 8 years of disastrous foreign interventions and tax cuts, we had a deficit and a recession. Now with Trump we’re repeating the same mistakes made under Bush

03.01.2026 21:31 — 👍 5    🔁 0    💬 0    📌 0

I heard this place is getting a dislike button?

01.11.2025 07:12 — 👍 2    🔁 0    💬 0    📌 0

If you assume that there is an infinite number of tech workers with six figure salaries who want to live in the Bay or NY then the NIMBYs are right and no amount of new housing will help. But there is not an infinite number of tech workers, and the ones here are being laid off.

10.08.2025 23:09 — 👍 5    🔁 0    💬 0    📌 0

People talk about artificial intelligence but don't understand what the technology actually is, what the term "AI" refers to, or the best use cases for the different tools that exist.

Nor do they understand that they've already been using AI assistance for YEARS before LLMs became public.

09.08.2025 17:29 — 👍 1    🔁 1    💬 0    📌 0

All these things apply to a small percentage of AI research and companies, yet this website treats all AI models and research like it comes from xAI. You ran the actual open source and AI safety researchers off the app

09.08.2025 17:24 — 👍 1    🔁 0    💬 2    📌 0

I think it’s because they’re threatened by it honestly

08.08.2025 11:35 — 👍 6    🔁 0    💬 4    📌 3

Them: It's not about race, it's about class.

Hopkins: Ok, we're making a class-based policy.

Them: We still don't like it, because what we actually meant was...

18.07.2025 14:10 — 👍 415    🔁 98    💬 8    📌 1