
Lex

@notesbylex.com.bsky.social

Senior MLE at @Canva. Full-stack developer. Kaggle Notebook Master and collector of competition silver medals. https://www.kaggle.com/lextoumbourou Talks about Software dev, ML, generative art, note-taking, dog photos. Mostly cross-posting from Mastodon.

71 Followers  |  566 Following  |  43 Posts  |  Joined: 29.11.2023

Latest posts by notesbylex.com on Bluesky


This new image editing model from Black Forest Labs called **FLUX.1 Kontext** is really good. I ran some experiments on photos of Doggo, and couldn't believe how well it could maintain character consistency across multiple turns of editing.

https://notesbylex.com/absurdly-good-doggo-consistency-wit…

01.06.2025 23:25 · 👍 0    🔁 0    💬 0    📌 0

Learning to Reason without External Rewards (aka Self-confidence Is All You Need)

Turns out we can just use the LLM's internal sense of confidence as the reward signal to train a reasoning model: no reward model, ground-truth examples, or self-play needed.

Amazing.

https://notesbylex.com/learning…
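
A minimal sketch of the core idea, assuming we score each sampled answer by the model's own average token log-probability and feed that back as the RL reward (my illustration, not the paper's exact self-certainty objective):

```python
import torch
import torch.nn.functional as F

def confidence_reward(logits: torch.Tensor, token_ids: torch.Tensor) -> float:
    """Reward = mean log-probability the model assigned to the tokens it sampled.

    logits:    (seq_len, vocab_size) scores for each generated position
    token_ids: (seq_len,) the tokens that were actually sampled
    """
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1)
    return chosen.mean().item()

# Toy usage: a policy-gradient loop would maximise this score in place of an
# external reward model's output or a ground-truth check.
logits = torch.randn(5, 100)
tokens = logits.argmax(dim=-1)  # the model's most-confident continuation
print(confidence_reward(logits, tokens))
```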

01.06.2025 23:25 · 👍 0    🔁 0    💬 0    📌 0

"My new hobby: watching AI slowly drive Microsoft employees insane" https://old.reddit.com/r/ExperiencedDevs/comments/1krttqo/my_new_hobby_watching_ai_slowly_drive_microsoft/

22.05.2025 06:43 · 👍 0    🔁 0    💬 0    📌 0

A cool approach to iteratively improving generated images, using o3 as an LLM-judge to generate targeted masks for improvements: https://simulate.trybezel.com/research/image_agent

21.05.2025 22:34 · 👍 0    🔁 0    💬 0    📌 0

If Trump was really looking out for Elon, he would have posted that on Twitter.

11.03.2025 12:10 · 👍 0    🔁 0    💬 0    📌 0

You're giving them way too much credit. Trump has always had really really bad ideas, but this time around there are no adults in government to shut them down, only sycophantic yes men.

11.03.2025 11:12 · 👍 1    🔁 0    💬 0    📌 0

All that pep talk just to make social media content.

10.03.2025 20:24 · 👍 3    🔁 0    💬 0    📌 0

Can you link me to it? Surprisingly hard to Google for.

09.03.2025 01:01 · 👍 0    🔁 0    💬 1    📌 0
Teaser figures in ACL template papers (GitHub Gist)

ARR deadline is coming up! If you're wondering how to make a beautiful full-width teaser figure on your first page, above the abstract, in LaTeX, check out this gist I made showing how I do it!

gist.github.com/michaelsaxon...

12.02.2025 21:35 · 👍 7    🔁 1    💬 0    📌 0

🔥 allenai/Llama-3.1-Tulu-3-8B (trained with PPO) -> allenai/Llama-3.1-Tulu-3.1-8B (trained with GRPO)

We are happy to "quietly" release our latest GRPO-trained Tulu 3.1 model, which is considerably better in MATH and GSM8K!

12.02.2025 17:33 · 👍 23    🔁 5    💬 1    📌 2

"As a former tech lead at Meta for 6 years... I got 'meets all' or 'exceeds' every single half except the one in which I took parental leave."

www.reddit.com/r/business/c...

12.02.2025 19:58 · 👍 0    🔁 0    💬 0    📌 0

Won't someone please think of the child processes?

10.02.2025 03:32 · 👍 14    🔁 0    💬 0    📌 0

Media Watch has Chas derangement syndrome!

08.02.2025 03:49 · 👍 0    🔁 0    💬 0    📌 0

Really awesome work and big thank you for sharing it on Bluesky!

07.02.2025 23:35 · 👍 1    🔁 0    💬 0    📌 0
Example of injecting Wait token into the model generation.

A hilariously simple repro of OpenAI's test-time scaling paradigm, via a trick called "budget forcing": end the thinking when your token budget is met, or append "Wait" to the model's generation to keep it thinking, allowing the model to fix incorrect reasoning steps.

arxiv.org/abs/2501.19393
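
A rough sketch of that loop, assuming a hypothetical `generate(prompt, stop, max_new_tokens)` helper for your inference stack and whitespace splitting as a stand-in for token counting (not the paper's actual implementation):

```python
THINK_END = "</think>"
WAIT = " Wait,"

def budget_forced_thinking(generate, prompt: str, min_tokens: int, max_tokens: int) -> str:
    """Return a thinking trace whose length is forced into [min_tokens, max_tokens]."""
    thinking = ""
    while True:
        budget_left = max_tokens - len(thinking.split())
        if budget_left <= 0:
            # Budget exhausted: force the end-of-thinking marker so the model answers.
            return thinking + THINK_END
        thinking += generate(prompt + thinking, stop=THINK_END, max_new_tokens=budget_left)
        if len(thinking.split()) < min_tokens:
            # Model tried to stop early: suppress the stop and append "Wait" so it
            # re-examines its reasoning before answering.
            thinking += WAIT
            continue
        return thinking + THINK_END
```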

03.02.2025 06:14 · 👍 0    🔁 0    💬 0    📌 0

True.

31.01.2025 05:49 · 👍 1    🔁 0    💬 0    📌 0
Abstract and figures from paper R.I.P.: Better Models by Survival of the Fittest Prompts

A method for evaluating data for preference optimisation.

Rejecting Instruction Preferences (RIP) can filter prompts from existing training sets or make high-quality synthetic datasets. They see large performance gains across various benchmarks compared to unfiltered data.

arxiv.org/abs/2501.18578

31.01.2025 05:21 · 👍 0    🔁 0    💬 0    📌 0
GitHub - Jiayi-Pan/TinyZero: Clean, accessible reproduction of DeepSeek R1-Zero

A reproduction of DeepSeek R1-Zero.

"The recipe:

We follow DeepSeek R1-Zero alg -- Given a base LM, prompts and ground-truth reward, we run RL.

We apply it to CountDown: a game where players combine numbers with basic arithmetic to reach a target number."

github.com/Jiayi-Pan/Ti...
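
For intuition, a verifiable CountDown reward is easy to write down. This is my own sketch, not TinyZero's actual code: parse the proposed arithmetic expression, check it only uses the provided numbers, and pay out 1.0 only if it hits the target.

```python
import ast
from collections import Counter

def countdown_reward(expression: str, numbers: list[int], target: int) -> float:
    """1.0 if `expression` is plain arithmetic over the given numbers (each used no
    more times than provided) and evaluates to `target`, else 0.0."""
    try:
        tree = ast.parse(expression, mode="eval")
        allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
                   ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub)
        if not all(isinstance(node, allowed) for node in ast.walk(tree)):
            return 0.0  # reject anything that isn't plain arithmetic
        used = [n.value for n in ast.walk(tree) if isinstance(n, ast.Constant)]
        if Counter(used) - Counter(numbers):
            return 0.0  # used a number it wasn't given
        # Safe to evaluate: only whitelisted arithmetic nodes remain.
        return 1.0 if abs(eval(compile(tree, "<expr>", "eval")) - target) < 1e-6 else 0.0
    except Exception:
        return 0.0

print(countdown_reward("(100 - 4) * 1", [100, 4, 1, 7], 96))  # -> 1.0
```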

30.01.2025 20:05 · 👍 0    🔁 0    💬 0    📌 0

Reasoning models can be useful for generating high-quality few-shot examples:

1. generate 10-20 examples from the criteria in different styles with r1/o1/CoT, etc.
2. have a model rate each example based on quality + adherence.
3. filter/edit the top examples by hand

Repeat for each category of output (rough sketch below).
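
A rough sketch of that loop, assuming a hypothetical `call_llm` helper standing in for whatever client you use to prompt a reasoning model (not a definitive implementation):

```python
def generate_candidates(call_llm, criteria: str, styles: list[str], n_per_style: int = 5) -> list[str]:
    """Step 1: draft candidate few-shot examples in several styles."""
    prompts = [
        f"Write one example that satisfies these criteria:\n{criteria}\n"
        f"Style: {style}. Return only the example."
        for style in styles
        for _ in range(n_per_style)
    ]
    return [call_llm(p) for p in prompts]

def rate_candidate(call_llm, example: str, criteria: str) -> int:
    """Step 2: score a candidate 1-10 for quality and adherence to the criteria."""
    reply = call_llm(
        f"Criteria:\n{criteria}\n\nExample:\n{example}\n\n"
        "Rate this example 1-10 for quality and adherence. Reply with the number only.")
    return int(reply.strip())

def top_examples(call_llm, criteria: str, styles: list[str], k: int = 5) -> list[str]:
    """Step 3: keep the top-k by score, then hand-edit them before use as few-shots."""
    candidates = generate_candidates(call_llm, criteria, styles)
    ranked = sorted(candidates, key=lambda ex: rate_candidate(call_llm, ex, criteria), reverse=True)
    return ranked[:k]
```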

29.01.2025 04:32 · 👍 0    🔁 0    💬 0    📌 0
My dog Doggo, a stag hound X bull arab, chilling in the grass

Happy dog.

28.01.2025 21:36 · 👍 2    🔁 0    💬 0    📌 0

The Illustrated DeepSeek-R1

Spent the weekend reading the paper and sorting through the intuitions. Here's a visual guide and the main intuitions to understand the model and the process that created it.

newsletter.languagemodels.co/p/the-illust...

27.01.2025 20:22 · 👍 71    🔁 22    💬 1    📌 4
mlx-examples/llms/mlx_lm/models/deepseek_v3.py at main · ml-explore/mlx-examples

The DeepSeek V3 model file in ~450 lines of code in MLX LM.

github.com/ml-explore/m...

via awnihannun on Twitter.

28.01.2025 05:48 · 👍 2    🔁 0    💬 0    📌 0

So DeepSeek found a way to train a GPT-4-quality model for *only* $6M worth of Nvidia hardware, and the market thinks this is bad for Nvidia?

27.01.2025 19:53 · 👍 1    🔁 0    💬 0    📌 0

Trump's Achilles heel is that he's a sucker for compliments and flattery. Just tell him how smart and good at business he is, and you'll get whatever you want from him.

You can see the billionaires leaning into that approach this time around, and it really seems to be working for them.

24.01.2025 22:57 · 👍 0    🔁 0    💬 0    📌 0

Imo the Google AI summaries are really helpful. The usefulness of an LLM increases substantially when it can reference its sources.

24.01.2025 00:07 · 👍 0    🔁 0    💬 0    📌 0
It's a chart of OpenAI outages reported in the last 24 hours by Down Detector (which is just user reports). In the last few minutes, there's a huge spike.

ChatGPT seems to be down.

23.01.2025 11:54 · 👍 1    🔁 0    💬 0    📌 0
Announcing The Stargate Project

"The Stargate Project is a new company which intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States"

$500 billion! For comparison, the 1960s Apollo project, when adjusted for inflation, cost around $250B.

openai.com/index/announ...

22.01.2025 01:30 · 👍 0    🔁 0    💬 0    📌 0

"Verified DeepSeek performance on ARC-AGI's Public Eval (400 tasks) + Semi-Private (100 tasks)

DeepSeek V3:
* Semi-Private: 7.3%
* Public Eval: 14%

DeepSeek Reasoner:
* Semi-Private: 15.8%
* Public Eval: 20.5%

Performance is on par, albeit slightly lower, than o1-preview"

x.com/arcprize/sta...

21.01.2025 21:28 · 👍 2    🔁 0    💬 0    📌 0
how_many_rs.md (GitHub Gist)

This is gold. DeepSeek-R1's thought process for "how many 'r's in strawberry."

"So positions 3, 8, and 9 are Rs? No, that can't be right because the word is spelled as S-T-R-A-W-B-E-R-R-Y, which has two Rs at the end...

Wait, maybe I'm overcomplicating it...."

gist.github.com/IAmStoxe/1a1...

21.01.2025 04:05 · 👍 3    🔁 1    💬 0    📌 0
