
Alexander Doria

@dorialexander.bsky.social

LLM for the commons.

6,475 Followers  |  624 Following  |  1,197 Posts  |  Joined: 02.09.2023  |  1.6143

Latest posts by dorialexander.bsky.social on Bluesky

It's going to be much faster. Everything is going to be much faster (and that's a major issue).

03.08.2025 18:31 · 👍 3  🔁 1  💬 2  📌 0

I'd need 1-2 hours of careful reading, but one of the most significant factors is the introduction of a dedicated model system for geometry/spatial reasoning. Auto-regressive LLMs are notoriously deficient on this front.

03.08.2025 12:54 · 👍 5  🔁 0  💬 0  📌 0

The one paper to read this weekend. ByteDance seems to have achieved a major breakthrough in AI math: its new LLM systems claim to solve half of PutnamBench, while the previous SOTA from DeepSeek and Goedel sat at 10-15%. arxiv.org/abs/2507.23726

03.08.2025 12:50 · 👍 15  🔁 1  💬 1  📌 0

Correct, and it's the reason you won't see a lot of AI regulation in the foreseeable future:

03.08.2025 12:47 · 👍 31  🔁 6  💬 0  📌 0

I'm increasingly suspicious that a lot of gender/sex issues stem straight from the lack of sex-positive discourse for men.

03.08.2025 00:03 · 👍 19  🔁 1  💬 1  📌 0

Kind of a black pill for quick AGI: no lab seemingly attempted to compete at the International Physics Olympiad.

02.08.2025 10:19 · 👍 14  🔁 0  💬 4  📌 0

Also, code models frequently include a "fill in the middle" feature, which does use future tokens at prediction time but does not predict backwards.

01.08.2025 14:38 · 👍 2  🔁 0  💬 0  📌 0
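The fill-in-the-middle idea in this post can be sketched as a training-data transformation. This is a minimal sketch in the common prefix-suffix-middle (PSM) ordering; the sentinel strings `<PRE>`, `<SUF>`, `<MID>` and the helper names are hypothetical placeholders, since each code model defines its own special tokens. It shows the point being made: the model sees the suffix (future text) before it produces the middle, yet it still generates strictly left to right.

```python
import random

# Hypothetical sentinel strings; real code models each define their own special tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def to_fim_psm(doc: str, rng: random.Random) -> str:
    """Cut a non-empty document at two random points and reorder it prefix-suffix-middle.

    Training on this string teaches the model to emit the middle left to right
    after it has already seen both the prefix and the (future) suffix.
    """
    i, j = sorted(rng.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"


def fim_prompt(prefix: str, suffix: str) -> str:
    """At inference time the model continues this string with the missing middle."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


rng = random.Random(0)
print(to_fim_psm("def add(a, b):\n    return a + b\n", rng))
print(fim_prompt("def add(a, b):\n", "\n"))
```

Nothing here runs in reverse: the suffix is simply moved earlier in the sequence, so an ordinary next-token predictor can condition on it.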

Not directly, but one of the interesting features of diffusion models comes close: since they generate the whole answer in one batch and denoise it, they do reassess past tokens in light of (likely) future tokens.

01.08.2025 14:36 · 👍 1  🔁 0  💬 1  📌 0
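A toy illustration (not a diffusion model) of the attention-mask difference behind this post: an auto-regressive decoder uses a causal mask, so position i only attends to positions up to i, whereas a diffusion-style denoiser typically updates every position with full bidirectional attention, so an earlier token's representation can shift when a later one changes. The scalar "embeddings" and the untrained dot-product attention are simplifying assumptions.

```python
import math
from typing import Callable, List

Mask = Callable[[int, int], bool]


def causal(i: int, j: int) -> bool:
    """Auto-regressive mask: position i sees only positions 0..i."""
    return j <= i


def bidirectional(i: int, j: int) -> bool:
    """Denoiser-style mask: every position sees every other, including later ones."""
    return True


def self_attention(x: List[float], allowed: Mask) -> List[float]:
    """Toy single-head attention over scalar embeddings (query = key = value = the scalar)."""
    out = []
    for i, q in enumerate(x):
        js = [j for j in range(len(x)) if allowed(i, j)]
        scores = [q * x[j] for j in js]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        out.append(sum(w * x[j] for w, j in zip(weights, js)) / sum(weights))
    return out


before = [1.0, 0.5, -0.3]
after = [1.0, 0.5, 2.0]  # only the last ("future") token changed
print(self_attention(before, causal)[0], self_attention(after, causal)[0])
print(self_attention(before, bidirectional)[0], self_attention(after, bidirectional)[0])
```

Under the causal mask the first position's output is identical in both runs; under the bidirectional mask it moves, which is the sense in which a denoiser can revisit earlier tokens given later ones.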

Someone got access to the weights and attempted to reverse-engineer inference, without success so far. Still, plenty of architecture details. gist.github.com/main-horse/a...

01.08.2025 09:23 · 👍 5  🔁 0  💬 0  📌 0

None yet, but I really hope Sam Altman is committed to an actual open one.

01.08.2025 07:57 · 👍 2  🔁 0  💬 0  📌 0

Someone reposted what seems to be the config on Discord. Maybe a mixture of experts after all.

01.08.2025 07:51 · 👍 6  🔁 0  💬 2  📌 0

A new OpenAI open-source model briefly appeared on HuggingFace, and it's not just phone-sized: 120 billion parameters, apparently dense. There's also a 20B. Unclear whether either of them is Horizon Alpha, currently in preview on OpenRouter.

01.08.2025 07:39 · 👍 22  🔁 3  💬 2  📌 0

Since they focused on triplet combinations, it's almost surprising they did not work straight from Wikidata or other RDF sources.

31.07.2025 18:01 · 👍 3  🔁 0  💬 0  📌 0

Very intriguing paper on agentic search from the Qwen team. It introduces advanced synthetic-data methods for queries and retrieval with a heavily symbolic approach (it almost looks like a math prover; they do cite DeepSeek-Prover). arxiv.org/pdf/2507.15061

31.07.2025 18:01 · 👍 12  🔁 1  💬 1  📌 0

Relatedly, I was surprised that people in symbolic AI are so negatively polarized against generative AI, given the many opportunities for crossover. But now that I'm going deep into expert systems, it does make some sense: it's really, really hard to get an LM working reliably.

31.07.2025 15:48 · 👍 11  🔁 0  💬 1  📌 0

2027: you need a 2FA digital ID to read the Wikipedia article on contraception, all while your phone generates 5 hours of endless hentai.

30.07.2025 19:36 · 👍 14  🔁 2  💬 0  📌 0

Local models improving a lot amid widespread internet censorship will certainly make for a funny contrast.

30.07.2025 19:36 · 👍 11  🔁 0  💬 1  📌 0

Hmm, I really think it's more of an architecture choice: probably a mix of auto-regressive + diffusion, with a natural drift toward warm colors. Synthetic data has been used for image models for a while (well, back then we talked more about data augmentation).

27.07.2025 15:27 · 👍 0  🔁 0  💬 0  📌 0

Been creating fake synthetic Wikipedia articles, as I plan to use real ones as seeds for math problems, and some are accidentally funny.

27.07.2025 14:15 · 👍 12  🔁 1  💬 0  📌 0

I dare say the concept of a synthetic playground has completely changed the way I approach pre/mid/post-training (is it all just training now?). The next Pleias releases will be very different.

27.07.2025 08:05 · 👍 4  🔁 0  💬 0  📌 0

We now get a major preview of the next chapter of Physics of Language Models. And it's not toy models anymore: systematic synthetic data/design experiments to create a SOTA 7B, especially for memorization/recall. github.com/facebookrese...

27.07.2025 08:05 · 👍 15  🔁 2  💬 1  📌 0

anyway, thanks Mastercard and Visa for reviving the big-tent anti-censorship movements of the early 2010s. i can confirm it isn't any more popular on the other website

25.07.2025 19:28 · 👍 8  🔁 0  💬 0  📌 0

Just started it and it goes straight to the theme. Really well written and interesting; surprised it's not better known.

16.07.2025 19:42 · 👍 0  🔁 0  💬 1  📌 0

Except that we might now argue computers can somewhat "think", not much has changed.

16.07.2025 19:41 · 👍 3  🔁 0  💬 1  📌 0

Randomly bought one of the few (?) novels about digital humanities. From 1968, but completely fitting the definition: the narrator is going to do text statistics on Italian cultural heritage with a giant IBM computer.

16.07.2025 19:39 · 👍 6  🔁 0  💬 2  📌 0

This is especially welcome as models deal much more with structured data. For our semantic data experiments, I ended up dropping the classic RDF triplets as they were too token-consuming: no longer the case here.

14.07.2025 22:20 · 👍 4  🔁 0  💬 0  📌 0

Great breakdown of H-Net, the new rising alternative to vanilla tokenization, with a demonstration of its strength: encoding low-information/repetitive patterns (like sequences of the same letter) as a single token. main-horse.github.io/posts/hnet-i...

14.07.2025 22:19 · 👍 27  🔁 4  💬 4  📌 0
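H-Net learns where to place chunk boundaries end to end; the sketch below swaps in a fixed, hand-written rule purely to illustrate the effect described in the post, in the spirit of comparing adjacent positions. It scores a boundary as p = (1 - cos(h_t, h_{t-1})) / 2 on fixed one-hot byte vectors (an assumption standing in for learned hidden states), and opens a new chunk when p >= 0.5. With one-hot vectors this reduces to grouping runs of identical bytes, which is exactly the "sequence of the same letter becomes one unit" behavior.

```python
import math
from typing import List, Sequence


def one_hot(byte: int) -> List[float]:
    """Fixed one-hot vector for a byte (stand-in for a learned hidden state)."""
    vec = [0.0] * 256
    vec[byte] = 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def toy_chunk(text: str) -> List[bytes]:
    """Split UTF-8 bytes into chunks, opening a new chunk where neighbours look dissimilar."""
    data = text.encode("utf-8")
    if not data:
        return []
    vecs = [one_hot(b) for b in data]
    chunks, start = [], 0
    for t in range(1, len(data)):
        # Boundary score: 0 for identical neighbouring bytes, 0.5 for different ones.
        p = 0.5 * (1.0 - cosine(vecs[t], vecs[t - 1]))
        if p >= 0.5:
            chunks.append(data[start:t])
            start = t
    chunks.append(data[start:])
    return chunks


print(toy_chunk("zzzzzzzz!"))  # 9 bytes -> 2 chunks
print(toy_chunk("hello"))
```

Chunks are returned as bytes rather than strings because a byte-level boundary can fall inside a multi-byte UTF-8 character.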

Yes! As far as I know, they seem to have used some parts of Common Corpus (and its associated tooling as well).

13.07.2025 20:04 · 👍 1  🔁 0  💬 0  📌 0

Actually, I think nothing is out of reach anymore when it comes to short written/visual productions. You just need the data and a way to model the target objective (not necessarily verifiable: right now, most reinforcement learning research is on soft rewards).

13.07.2025 14:19 · 👍 4  🔁 0  💬 0  📌 0

The problem with this kind of argument is that we keep running behind the technology's evolution. Labs are starting to move into the aesthetic niche now that math evaluations are saturated: OpenAI is reportedly preparing a "literary" model, and K2 (Moonshot) is getting noticed partly on this front.

13.07.2025 14:13 · 👍 6  🔁 0  💬 1  📌 0
