Alexander Doria

@dorialexander.bsky.social

LLM for the commons.

7,478 Followers  |  683 Following  |  1,863 Posts  |  Joined: 02.09.2023

Latest posts by dorialexander.bsky.social on Bluesky

yeah, the correct implementation would be something like agentic gimp (+ capability to generate artifacts on demand on top)

12.02.2026 18:02 | 👍 2 🔁 0 💬 0 📌 0

Right now the most limiting factor for AI capabilities is synthetic environment design. It's scalable but costly, as you need high-level AI training, data design, and in-domain skills simultaneously. Not worth it for small markets, so art will likely stay with semi-bad diffusion for a while.

12.02.2026 17:35 | 👍 4 🔁 0 💬 0 📌 0

+1. Existing agentic models could be considerably better at art/illustration design (not just creating an image in one batch, but iteratively enhancing it, adding details, etc.) if there were incentives for it. But programming/law/management are just much more lucrative.

12.02.2026 17:27 | 👍 18 🔁 0 💬 2 📌 0
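
For illustration, a minimal sketch of the iterative enhancement loop this post gestures at, as opposed to one-batch generation. Every function below is a hypothetical stand-in (stubbed so the sketch runs), not a real image model or API.

```python
# Hypothetical sketch: an agent that refines an illustration over several
# rounds instead of generating it in one batch. All functions are stand-ins.

def generate(brief: str) -> str:
    return f"draft for: {brief}"

def critique(image: str, brief: str) -> list[str]:
    # A real agent would ask a vision-language model for concrete defects;
    # this stub flags one fixable issue on the first draft only.
    return ["background lacks detail"] if image.startswith("draft") else []

def edit_region(image: str, issue: str) -> str:
    # Targeted edit of one region, rather than regenerating the whole image.
    return image.replace("draft", "revision") + f" [fixed: {issue}]"

def refine_illustration(brief: str, max_rounds: int = 5) -> str:
    image = generate(brief)                    # one-shot first pass
    for _ in range(max_rounds):
        issues = critique(image, brief)        # list concrete defects
        if not issues:                         # good enough: stop early
            break
        for issue in issues:
            image = edit_region(image, issue)  # enhance, add details, etc.
    return image

print(refine_illustration("a lighthouse at dusk"))
```
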
Post image

Full list of conferences/talks, some with the usual suspects (obviously Wikimedia)

12.02.2026 15:22 | 👍 0 🔁 0 💬 0 📌 0
Post image

If you're ever in Delhi next week: Pleias has been invited to the AI Impact Summit, and my co-founder will give several talks on SLMs and building AI infrastructure as a commons.

12.02.2026 14:53 | 👍 7 🔁 0 💬 1 📌 0

In my experience: they are good provided they have some ability to backtrack/self-correct. Might be a trade-off with the overall agent design.

11.02.2026 16:47 | 👍 1 🔁 0 💬 0 📌 0

Could it be delegation to Haiku? Opus course-corrected by itself, but just yesterday it had offloaded some annotation tasks to Haiku sub-agents, and a few of them settled on a bad rule-based design that generated pure junk.

11.02.2026 14:08 | 👍 2 🔁 0 💬 1 📌 0
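
One plausible guard against that failure mode, sketched below under assumptions (all names invented): have the orchestrator spot-check a random sample of each sub-agent's annotation batch with a stronger judge before accepting it. Degenerate rule-based heuristics tend to fail uniformly, so even a small sample catches them.

```python
# Hypothetical sketch: reject a sub-agent's annotation batch when a small
# judged sample looks like rule-based junk. `judge` is a stub standing in
# for a stronger model grading one annotation.
import random

def judge(example: dict) -> bool:
    # Stand-in check: a real judge would grade annotation quality.
    return bool(example.get("label", "").strip())

def accept_batch(batch: list[dict], sample_size: int = 20,
                 min_pass_rate: float = 0.9) -> bool:
    sample = random.sample(batch, min(sample_size, len(batch)))
    pass_rate = sum(judge(ex) for ex in sample) / len(sample)
    # Cheap degenerate heuristics fail uniformly across examples,
    # so a small sample is usually enough to catch them.
    return pass_rate >= min_pass_rate

batch = [{"label": "entity: Paris"}, {"label": ""}] * 15
print("accepted" if accept_batch(batch) else "rejected as junk")
```
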

Censorship? It fits, at any rate, for the 1805-1814 period, which really saw the birth of the feuilleton.

10.02.2026 15:56 | 👍 3 🔁 0 💬 1 📌 0

I still have the older DH-pilled version (in French) at vintagedata.org/pclanglais. And maybe the next blogpost will be about seeing synth generation through a DH lens.

08.02.2026 15:45 | 👍 4 🔁 0 💬 2 📌 0

As a result, I did the long-delayed/much-needed update of my personal web page: vintagedata.org

08.02.2026 15:28 | 👍 19 🔁 0 💬 1 📌 0

this time it's absolutely no better on this side of the Atlantic. France/Italy have nearly everything blocked.

06.02.2026 21:46 | 👍 4 🔁 0 💬 2 📌 0

I’m not sure if it’s limited to the EU, but sex censorship of the web is getting absolutely ridiculous. 1950s level.

06.02.2026 21:42 | 👍 27 🔁 1 💬 2 📌 1

i guess i’m really an ai researcher now

06.02.2026 16:42 | 👍 66 🔁 0 💬 3 📌 1

Huh, this reminds me a bit of en.wikipedia.org/wiki/Celesti... (in case you haven't already read it)

06.02.2026 10:00 | 👍 0 🔁 0 💬 1 📌 0

Wouldn’t have bet, 2+ years ago, on Bender & co. getting stuck in a weird cultish mood while EA people respond very sensibly, but here we are. benthams.substack.com/p/the-ai-con...

03.02.2026 18:27 | 👍 12 🔁 0 💬 1 📌 0

Part of the issue: it’s difficult to say what is actually "hard" for a model, and the answer is frequently counter-intuitive. It could work for more compositional problems or even agentic trajectories, but we just don’t know for now.

02.02.2026 20:20 | 👍 1 🔁 0 💬 1 📌 0

Yes, it's curriculum training. It used to be more discussed 1-2 years ago, and it's still roughly what mid-training is doing, in a very broad sense. The current consensus is rather in the direction of pooling everything from the start and letting the model sort it out.

02.02.2026 18:45 | 👍 2 🔁 0 💬 2 📌 0
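
For illustration, a rough sketch of the two data strategies contrasted above (source names and weights invented): a fixed pooled mixture versus a curriculum schedule that shifts weights over training, roughly what mid-training does.

```python
# Hypothetical sketch: pooled vs. curriculum data mixing. Sources and
# weights are made up; a real pipeline would operate over token streams.
import random

SOURCES = {"web": 0.7, "code": 0.2, "synthetic": 0.1}

def pooled_batch(n: int) -> list[str]:
    # Pool everything from the start with fixed weights and let the
    # model sort it out (the current-consensus flavour).
    names, weights = zip(*SOURCES.items())
    return random.choices(names, weights=weights, k=n)

def curriculum_batch(n: int, step: int, total_steps: int) -> list[str]:
    # Curriculum flavour: interpolate the mixture over training,
    # upweighting synthetic/"harder" data late.
    t = step / total_steps
    weights = {"web": 0.8 * (1 - t) + 0.2 * t,
               "code": 0.2,
               "synthetic": 0.6 * t}
    names, w = zip(*weights.items())
    return random.choices(names, weights=w, k=n)

print(pooled_batch(5))
print(curriculum_batch(5, step=900, total_steps=1000))  # late: synthetic-heavy
```
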
Post image

Was literally one of my SYNTH slides.

(unfortunately practical here, as people otherwise get too confused between mid- and post-training, but I trust the plan)

02.02.2026 14:57 | 👍 0 🔁 0 💬 3 📌 0

Meanwhile, further independent evaluation confirms SYNTH performs better than Nanochat/Fineweb across nearly all benchmarks.

(HumanEval was to be expected: we did not put any code inside yet)

01.02.2026 22:50 | 👍 16 🔁 0 💬 2 📌 0

Hopefully more like months :)

(Very fun practical industrial use case too, millions of daily users)

01.02.2026 18:33 | 👍 2 🔁 0 💬 0 📌 0

*a tentative typology of the different shapes of synthetic compilation, from memorization to hardwired logic,
*speculative discussions drawn from model interpretability research (quanta & reasoning primitives),
*and why, in the end, maybe we should actually plan for greatness.

Enjoy.

01.02.2026 17:54 | 👍 3 🔁 1 💬 0 📌 0

Featuring:
*a first attempt at a systematic definition, informed by practical realities in the field (scaling + scheduling),
*historical context going back to Phi-1.5, why it failed, and why you need to treat data as a research axis to make it happen,

01.02.2026 17:54 | 👍 3 🔁 0 💬 1 📌 0
Post image

It took me weeks, but finally it's there: an overlong blogpost on synthetic pretraining. vintagedata.org/blog/posts/s...

01.02.2026 17:53 | 👍 87 🔁 21 💬 3 📌 2

maybe it's awfully european of me, but i'm not convinced dumping a mass of non-contextualized private documents is ever a good thing

01.02.2026 14:42 | 👍 19 🔁 2 💬 2 📌 1

seems we’re dangerously close to a point where someone will tweet you can slurp three gas town juices in a moltbot - and kill a few ai bubbles.

01.02.2026 11:56 | 👍 86 🔁 8 💬 0 📌 1

Common Corpus paper is going to @iclr-conf.bsky.social !

26.01.2026 13:08 | 👍 58 🔁 2 💬 1 📌 1

My point is that it’s not at all a contrarian idea anymore. In the EU it’s fast becoming the default funding option for grants, as I’m afraid people have internalized that LLM/verbal reasoning research is beyond their grasp, which is actually damaging.

25.01.2026 17:43 | 👍 3 🔁 0 💬 1 📌 0

If I see another headline about Yann LeCun’s "contrarian" bet, I’m killing a non-verbal model.

25.01.2026 17:26 | 👍 25 🔁 3 💬 1 📌 0
Post image

Now starting.

From what I see, it's generally compared to Joyce and Kafka, but for now the obvious comparison point is Robert Musil (did Musil read it?)

25.01.2026 15:25 | 👍 12 🔁 0 💬 1 📌 0
Post image

If you ever wondered whether SYNTH could be usable in mid-training for larger models: Step-DeepResearch (from StepFun) is now out in private beta. stepfun.ai/deep-researc...

25.01.2026 09:11 | 👍 24 🔁 2 💬 1 📌 0
