Al_th @alth.fr - Bluesky Profile

Penja white pepper is one of my favorites, if you ever get the chance

15.11.2025 11:43 — 👍 1 🔁 0 💬 1 📌 0

How good is the vanilla?

I saw the same ads on Instagram and, given the price, I admit I was a bit put off… I thought it was a scam

15.11.2025 11:42 — 👍 0 🔁 0 💬 1 📌 0

Running is no flan (or is it?) The proliferation of treat-driven running clubs shows that comfort now counts as much as effort. At the Running Flan Club, in Paris, people wear themselves out over a few kilometers before meeting up...

www.lemonde.fr/m-perso/arti...

"Flan is not pretentious like a macaron; it is « terroir », but not snobbish like pâté en croûte."

I swear, I am insanely triggered.

21.03.2025 10:43 — 👍 0 🔁 0 💬 0 📌 0

🔥🔥🔥 CV Folks, I have some news! We're organizing a 1-day meeting in central Paris on June 6th before CVPR called CVPR@Paris (similar to NeurIPS@Paris) 🥐🍾🥖🍷

Registration is open (it's free) with priority given to authors of accepted papers: cvprinparis.github.io/CVPR2025InPa...

Big 🧵👇 with details!

21.03.2025 06:43 — 👍 137 🔁 51 💬 8 📌 10

A bit frustrated by how arXiv accounts are integrated in the #MLSky feed.

Endless scrolling of links without context is uninformative, and just leads me to ignore them all.

I can block but is this really a good route…

25.02.2025 06:58 — 👍 1 🔁 1 💬 0 📌 0

Living… living… that's a big word.

Some days it's just survival 🤣

03.03.2025 06:36 — 👍 0 🔁 0 💬 0 📌 0

Crossing the uncanny valley of conversational voice At Sesame, our goal is to achieve “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued.

Impressive.

I really like the fact that you can interrupt. It's always difficult to speak with an AI algorithm, or even to do speech-to-text, because the moment you stop talking, it's the algorithm's "turn".

IRL we do not play turn-based, it's much more subtle than that.

28.02.2025 12:36 — 👍 0 🔁 0 💬 0 📌 0

You have to understand them. The MO is expensive! 😂

24.02.2025 07:02 — 👍 0 🔁 0 💬 0 📌 0

There is a discussion between two people in this room.

The one who wears the pants, if you'll forgive this slightly dated expression, is not the one you think.

Go girl!

22.02.2025 09:31 — 👍 1 🔁 0 💬 0 📌 0

Some time ago, I DM'd @dorialexander.bsky.social about a similar (yet somewhat different) idea:

While there is a point in fixing the generated tokens, we do squash an enormous amount of information by actually looking at whether the cat is dead or alive.

AFAIK, the issue with diff is fixed context size tho

19.02.2025 08:45 — 👍 1 🔁 0 💬 0 📌 0

An arguably "easy to read" simple GRPO implementation, for teaching purposes

#MLSky

07.02.2025 12:46 — 👍 1 🔁 0 💬 0 📌 0

It’s really funny to me that the hottest RL algorithm in town is just a simplification (z-score normalization for advantage calculation) of a simplification (KL penalization over hard KL constraint).

GRPO is quite intuitive, although I guess the devil is in the details and « convergence » speeds
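
To make the two simplifications concrete, here is a minimal pure-Python sketch, not the code from any particular implementation. It z-scores the rewards of one group of completions sampled for the same prompt, then combines a clipped per-token surrogate with a soft KL penalty. The function names, the use of the population standard deviation, and the default values of `clip_eps` and `beta` are assumptions made for illustration.

```python
import math
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Z-score the rewards of one group of completions for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std: an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


def kl_penalty(logp_new, logp_ref):
    """Per-token KL estimate pi_ref/pi_new - log(pi_ref/pi_new) - 1 (never negative)."""
    log_ratio = logp_ref - logp_new
    return math.exp(log_ratio) - log_ratio - 1.0


def grpo_token_objective(logp_new, logp_old, logp_ref, advantage,
                         clip_eps=0.2, beta=0.04):
    """Clipped surrogate for one token minus a soft KL penalty (to be maximized).

    clip_eps and beta are hypothetical defaults, not values from the posts.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped * advantage)
    return surrogate - beta * kl_penalty(logp_new, logp_ref)
```

With outcome supervision, every token of a completion would share that completion's advantage, so the group statistics replace a learned value function.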

06.02.2025 17:07 — 👍 1 🔁 0 💬 0 📌 0

The new policy logprob computation seems a bit clunky for now.

It's currently generic enough to use any generation length in the GRPO output generation step, but I guess it would be much more efficient to generate only a context-size chunk and use the fact that you have the full logits available...
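
A rough sketch of that idea, assuming the whole sequence fits in the context window: one forward pass yields logits at every position, so the log-prob of each token can be read off directly instead of being recomputed step by step. The helper names are hypothetical and plain lists stand in for tensors.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_logprobs(logits, token_ids):
    """Log-prob of each token given its prefix, from a single forward pass.

    logits[t] holds the scores produced at position t, i.e. for token t + 1,
    so logits needs at least len(token_ids) - 1 rows. The first token has no
    prefix and therefore gets no log-prob.
    """
    return [
        log_softmax(logits[t])[token_ids[t + 1]]
        for t in range(len(token_ids) - 1)
    ]
```

In a GRPO loop only the entries that correspond to generated tokens would be kept, with the prompt positions masked out.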

06.02.2025 10:14 — 👍 2 🔁 0 💬 0 📌 0

GitHub - Al-th/grpo_experiment: Experiment on reimplementation of GRPO RL. Contribute to Al-th/grpo_experiment development by creating an account on GitHub.

github.com/Al-th/grpo_e...

I hope it's a reasonable implementation...

Tokenizer and Transformer models are very naive, based on Karpathy's transformer from scratch video. Data is also based on Karpathy's video.

06.02.2025 10:00 — 👍 1 🔁 0 💬 1 📌 1

Probably can share that yeah

Needs a bit of cleanup first but I’ll ping you.

05.02.2025 19:53 — 👍 1 🔁 0 💬 0 📌 0

To be fair, the GRPO-optimized model doesn't shout; the RL cheated by having more people speak (as names are capitalized in the dataset I'm using)

(Left is base transformer, right is post GRPO)

05.02.2025 17:22 — 👍 1 🔁 0 💬 0 📌 0

I implemented GRPO from scratch to RL a tiny toy LLM and it works surprisingly well.

Rule-based reward inspired by @dorialexander.bsky.social to make my Shakespeare shout more.

I went for Outcome Supervision as both OS and PS were kind of close in the DeepSeekMath paper…
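
For illustration only, a rule-based "shout" reward could be as simple as the fraction of letters that are uppercase. This is an assumed stand-in, not the reward actually used in the experiment, and `shout_reward` is a hypothetical name.

```python
def shout_reward(text):
    """Fraction of alphabetic characters that are uppercase, in [0, 1]."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0  # no letters at all: nothing to reward
    return sum(c.isupper() for c in letters) / len(letters)
```

Under outcome supervision, this single score would be computed once per completed sample. Note that a reward like this can also be raised by emitting more capitalized speaker names rather than by actually shouting.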

05.02.2025 17:20 — 👍 2 🔁 0 💬 2 📌 0

Given the reported radioactivity levels (2.7 to 26.4, with a median of 14.4 Bq/kg), honestly I'm no expert, but I think we can say "seen, and couldn't care less"…

I also liked this passage: « These values are 10^8 times lower than levels authorized by EU (55) (3 × 10^-3 mSv day^-1) »

03.02.2025 21:14 — 👍 12 🔁 1 💬 0 📌 0

Saharan dust: the radioactivity does not come from the nuclear tests conducted by France. Desert dust is the world's largest source, by mass, of aerosols in the atmosphere.

2/2

This conclusion comes from several types of combined analyses (geochemistry, grain-size analysis, clay mineralogy, radionuclide activities and their isotopic signature, air-mass back-trajectories...)

Source @cnrs.bsky.social INSU : www.insu.cnrs.fr/fr/cnrsinfo/...

03.02.2025 20:50 — 👍 72 🔁 12 💬 4 📌 0

I release my first attempts at training a base model with GRPO. In a similar spirit to R0, this colab notebook transforms Pleias-350m into an RL poet without any post-training data, using only reward functions. t.co/tYSp8NYI1s

02.02.2025 23:30 — 👍 45 🔁 9 💬 1 📌 0

At my job, so internally, a decision has been waiting to be made for two years. I'm cracking.

And meanwhile, obviously, the context changes, competitors move forward, etc…

02.02.2025 07:32 — 👍 1 🔁 0 💬 1 📌 0

Is it really challenging conventional AI wisdom though ?

It has been known for quite some time that training data quality is one of the most important factors when working with supervised algorithms, even though real-world data might be noisy.

Isn’t it the same but in the RL environment ?

30.01.2025 07:01 — 👍 0 🔁 0 💬 0 📌 0

Tbh I don't think any of it is (in case this was what you implied) a shift in cultural behavior.

In my view, it's more a manifestation of the economic incentive: if you are first, you don't disclose, to keep your advantage. If you are not, open-sourcing can hurt the top player.

28.01.2025 07:23 — 👍 0 🔁 0 💬 0 📌 0

Ping! Report in ;)

27.01.2025 07:01 — 👍 1 🔁 0 💬 0 📌 0

Yes, probably… but for 15 prescriptions a year, it's really 😵

26.01.2025 18:40 — 👍 0 🔁 0 💬 0 📌 0

I'm dying of laughter: a retired doctor >>>>no longer carrying out any paid medical activity<<<< still has to hand over his 100 bucks to the Ordre des médecins 😂

26.01.2025 08:19 — 👍 0 🔁 0 💬 1 📌 0

It really will be interesting to get a look at the algorithms of "X". Our account is barred from publishing "community notes". Supposedly we have too many "not helpful" ratings. Yet the 2nd screenshot proves that this is false, and we went and checked.
Conclusion? X's algorithms manipulate the results.

24.01.2025 15:45 — 👍 84 🔁 22 💬 5 📌 3

AGI milestone passed

24.01.2025 09:15 — 👍 0 🔁 0 💬 0 📌 0

I like this article because it uses a news item as a pretext to educate.

Better still, it educates about both semantics and cancer care.

23.01.2025 10:55 — 👍 2 🔁 0 💬 1 📌 0
