@dorialexander.bsky.social
LLM for the commons.

yeah, the correct implementation would be something like agentic gimp (+ capability to generate artifacts on demand on top)
12.02.2026 18:02

Right now the most limiting factor for AI capabilities is synthetic environment design. It's scalable but costly, as you simultaneously need high-level AI training, data design, and in-domain skills. Not worth it for small markets, so art will likely stay for a while with semi-bad diffusion.
12.02.2026 17:35

+1. Existing agentic models could be considerably better at art/illustration design (not just creating an image in one batch, but iteratively enhancing it, adding details, etc.) if there were incentives for it. But programming/law/management are just much more lucrative.
12.02.2026 17:27
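
To make the "iteratively enhancing it" idea concrete, here is a minimal sketch of a propose/apply/critique editing loop. It is purely illustrative: every function below is a hypothetical placeholder, not a real GIMP, diffusion, or model API.

```python
# Sketch of an agentic image-editing loop: the agent keeps reworking the
# same artifact across passes instead of generating the whole image once.
# All functions are hypothetical stubs standing in for real tool calls.
from dataclasses import dataclass

@dataclass
class Edit:
    region: tuple       # (x, y, w, h) of the area to rework
    instruction: str    # e.g. "add rim lighting on the left figure"

def propose_edits(image_path, brief):
    """Placeholder: ask the model which localized edits would improve the image."""
    return [Edit((0, 0, 256, 256), f"refine composition for: {brief}")]

def apply_edit(image_path, edit):
    """Placeholder: apply one edit with an editing tool (inpainting, masks, layers...)."""
    return image_path  # would return the path of the updated image

def critique(image_path, brief):
    """Placeholder: score how well the current image matches the brief (0..1)."""
    return 1.0

def refine(image_path, brief, max_passes=4, target=0.9):
    """Iteratively enhance an image instead of producing it in a single batch."""
    for _ in range(max_passes):
        if critique(image_path, brief) >= target:
            break
        for edit in propose_edits(image_path, brief):
            image_path = apply_edit(image_path, edit)
    return image_path
```

The point is only the control flow: the agent keeps editing the same artifact across passes, and could still generate new assets on demand inside the edit step, rather than emitting the image in one shot.
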
Full list of talks/appearances, some with the usual suspects (obviously Wikimedia).
12.02.2026 15:22

If you're ever in Delhi next week: Pleias has been invited to the AI Impact Summit and my co-founder will give several talks on SLMs and building AI infrastructure as a commons.
12.02.2026 14:53

In my experience: they are good provided they have some ability to backtrack/self-correct. Might be a trade-off with the overwhelming agent design.
11.02.2026 16:47

Could it be delegation to Haiku? Opus corrected course by itself, but just yesterday it had offloaded some annotation tasks to Haiku sub-agents, and a few of them settled on a bad rule-based design that generated pure junk.
11.02.2026 14:08

Censorship? It fits, in any case, with the 1805-1814 period, which really sees the birth of the feuilleton (serialized fiction).
10.02.2026 15:56

I still have the older DH-pilled version (in French): vintagedata.org/pclanglais And maybe the next blogpost will be about seeing synth generation through DH lenses.
08.02.2026 15:45

As a result, I did the long-delayed/much-needed update of my personal web page: vintagedata.org
08.02.2026 15:28

this time it's absolutely no better on this side of the Atlantic. France/Italy have nearly everything blocked.
06.02.2026 21:46

I'm not sure if it's limited to the EU, but sex censorship of the web is getting absolutely ridiculous. 1950s level.
06.02.2026 21:42

i guess i'm really an ai researcher now
06.02.2026 16:42

Huh, reminds me a bit of en.wikipedia.org/wiki/Celesti... (in case you haven't already read it)
06.02.2026 10:00

Wouldn't have bet, 2+ years ago, on Bender & co getting stuck in a weird cultish mood while EA people counteract very sensibly, but here we are. benthams.substack.com/p/the-ai-con...
03.02.2026 18:27

Part of the issue: it's difficult to say what is actually "hard" for a model, and it's frequently counter-intuitive. It could work for more compositional problems or even agentic trajectories, but we just don't know for now.
02.02.2026 20:20

Yes, it's curriculum training. It used to be discussed more 1-2 years ago; it's still roughly what mid-training is doing in a way, but in a very broad sense. The current consensus is rather in the direction of pooling everything from the start and letting the model sort it out.
02.02.2026 18:45
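
For readers unfamiliar with the distinction, a purely illustrative sketch of the two data strategies mentioned above: a staged curriculum that schedules sources over training time, versus pooling every source at fixed weights from step 0. Source names and weights are made up for the example.

```python
# Illustrative contrast only: curriculum scheduling vs pooled data mixing.
import random

sources = {
    "simple_synthetic": ["short clean exercise ..."],
    "web_text": ["broad pretraining document ..."],
    "hard_reasoning": ["long compositional problem ..."],
}

def curriculum_batches(steps):
    """Curriculum: easier data early, harder data scheduled for later phases."""
    phases = [("simple_synthetic", 0.4), ("web_text", 0.8), ("hard_reasoning", 1.0)]
    for step in range(steps):
        progress = step / steps
        name = next(n for n, end in phases if progress < end)
        yield random.choice(sources[name])

def pooled_batches(steps, weights=(0.2, 0.6, 0.2)):
    """Pooled: sample from every source at fixed weights from step 0."""
    names = list(sources)
    for _ in range(steps):
        name = random.choices(names, weights=weights, k=1)[0]
        yield random.choice(sources[name])

# e.g. compare list(curriculum_batches(10)) with list(pooled_batches(10))
```

This is roughly why the post above calls mid-training curriculum "in a very broad sense": the mixture gets re-weighted late in the run rather than strictly staged.
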
Was literally one of my SYNTH slides.

(unfortunately practical here, as people are otherwise too confused with mid/post-training, but I trust the plan)

Meanwhile, further independent evaluation confirms SYNTH performs better than Nanochat/Fineweb across nearly all benchmarks.
(HumanEval is to be expected: we did not put any code inside yet)

Hopefully more like months :)

(Very fun practical industrial use case too, millions of daily users)
It took me weeks, but finally it's there: an overlong blogpost on synthetic pretraining. vintagedata.org/blog/posts/s...

Featuring:
*a first attempt at a systematic definition, informed by practical realities in the field (scaling + scheduling),
*historical context going back to Phi 1.5, why it failed, and why you need to treat data as a research axis to make it happen,
*a tentative typology of the different shapes of synthetic compilations, from memorization to hardwired logic,
*speculative discussions drawn from model interpretability research (quanta & reasoning primitives),
*and why, in the end, maybe we should actually plan for greatness.

Enjoy.
01.02.2026 17:53

maybe it's awfully european of me, but i'm not convinced dumping a mass of non-contextualized private documents is ever a good thing
01.02.2026 14:42

seems we're dangerously close to a point where someone will tweet you can slurp three gas town juices in a moltbot - and kill a few ai bubbles.
01.02.2026 11:56

Common Corpus paper is going to @iclr-conf.bsky.social!
26.01.2026 13:08

My point is that it's not at all a contrarian idea anymore. In the EU it's fast becoming the default funding option for grants, as I'm afraid people have internalized that LLM/verbal reasoning research is beyond their grasp, so it's actually damaging.
25.01.2026 17:43

If I see another headline about Yann le Cun's "contrarian" bet, I'm killing a non-verbal model.
25.01.2026 17:26

Now starting.
From what I see it's generally compared to Joyce and Kafka, but for now the obvious comparison point is Robert Musil (did Musil read it?)
If you ever wondered whether SYNTH could be usable in mid-training for larger models: Step-DeepResearch (from StepFun) is now out in private beta. stepfun.ai/deep-researc...
25.01.2026 09:11