
Aakash Kumar Nain

@ak-nain.bsky.social

Sr. ML Engineer | Keras 3 Collaborator | @GoogleDevExpert in Machine Learning | @TensorFlow addons maintainer | ML is all I do | Views are my own!

907 Followers  |  133 Following  |  141 Posts  |  Joined: 06.11.2024

Latest posts by ak-nain.bsky.social on Bluesky


I want to share my latest (very short) blog post: "Active Learning vs. Data Filtering: Selection vs. Rejection."

What is the fundamental difference between active learning and data filtering?

Well, obviously, the difference is that:

1/11

17.05.2025 11:47 · 👍 39    🔁 11    💬 1    📌 1
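
To make the distinction the post teases concrete, here is a minimal sketch, not taken from the blog post itself, contrasting selection (active learning picks the few most informative points to label) with rejection (filtering drops points that fail a quality bar). The uncertainty and quality scores are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
pool = rng.normal(size=(1000, 8))        # unlabeled candidate pool
uncertainty = rng.random(1000)           # stand-in for model uncertainty
quality = rng.random(1000)               # stand-in for a data-quality score

# Active learning: *select* the k points the model would learn most from.
k = 32
selected = pool[np.argsort(uncertainty)[-k:]]   # most uncertain -> label these

# Data filtering: *reject* points below a quality bar, keep everything else.
keep_mask = quality >= 0.5
filtered = pool[keep_mask]

print(f"active learning selected {len(selected)} of {len(pool)} points")
print(f"filtering kept {keep_mask.sum()} of {len(pool)} points")
```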

This is peak performance whether you believe it or not 😂😂

10.03.2025 14:08 · 👍 1    🔁 0    💬 0    📌 0

Summary:
x.com/A_K_Nain/sta...

10.03.2025 01:51 · 👍 0    🔁 0    💬 0    📌 0

What if you want to control the length of CoT sequences? Can you put a budget constraint on reasoner models at test time while maintaining performance? This latest paper from CMU addresses these two questions via RL. Here is a summary of LCPO in case you are interested:

10.03.2025 01:50 · 👍 1    🔁 0    💬 1    📌 0
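
As a rough illustration of the idea, not the paper's exact objective: RL reward shaping that trades correctness against deviation from a token budget supplied at test time. The functional form and the penalty weight below are assumptions for illustration.

```python
# Hypothetical reward shaping in the spirit of LCPO: reward correct answers,
# penalize deviation from the length budget given in the prompt. The exact
# form and alpha are illustrative assumptions, not the paper's values.
def length_controlled_reward(is_correct: bool, gen_len: int,
                             budget: int, alpha: float = 1e-3) -> float:
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(budget - gen_len)
    return correctness - length_penalty

print(length_controlled_reward(True, gen_len=480, budget=512))    # near budget
print(length_controlled_reward(True, gen_len=2048, budget=512))   # overshoot
```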

x.com/A_K_Nain/sta...

14.02.2025 02:39 · 👍 1    🔁 0    💬 0    📌 0
https://x.com/A_K_Nain/status/1890226873332092997

Matryoshka Quantization: Another fantastic paper from GDM! MatQuant came out last week. It was a very refreshing read. Here is a summary in case you are interested:

14.02.2025 02:38 · 👍 1    🔁 0    💬 1    📌 0
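
The core trick, as I understand the paper (a sketch, not the authors' code): int8 weights are trained so that their most significant bits also function as int4 and int2 weights, one nested inside the other. Slicing MSBs from a signed integer looks like this; the co-training losses that make the sliced models actually work are omitted.

```python
import numpy as np

w_int8 = np.array([97, -42, 120, -7], dtype=np.int8)   # toy quantized weights

def slice_msbs(w: np.ndarray, bits: int) -> np.ndarray:
    # Arithmetic right shift keeps the top `bits` bits (and the sign).
    return (w.astype(np.int32) >> (8 - bits)).astype(np.int8)

print(slice_msbs(w_int8, 4))   # the nested int4 model's weights
print(slice_msbs(w_int8, 2))   # the nested int2 model's weights
```
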
The Latent: Code the Maths - Vector Fields

3/3
magic-with-latents.github.io/latent/posts...

12.02.2025 02:41 · 👍 3    🔁 0    💬 0    📌 0
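
Not from the tutorial itself, but a minimal sketch of the object it covers: a 2D vector field v(x) and an Euler integrator that follows it, the basic ingredient flow matching builds on. The field here is an arbitrary rotation.

```python
import numpy as np

def v(x: np.ndarray) -> np.ndarray:
    # A simple rotational vector field: v(x, y) = (-y, x).
    return np.stack([-x[..., 1], x[..., 0]], axis=-1)

x = np.array([1.0, 0.0])
dt = 0.01
for _ in range(100):            # Euler-integrate dx/dt = v(x) over t in [0, 1]
    x = x + dt * v(x)
print(x)                        # ~ the start point rotated by 1 radian
```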

2/3
Now, we are starting a new series on Flow Matching with the same objective. To that end, I am happy to announce the first post of the series (link in the thread). Enjoy! 🍻

12.02.2025 02:41 · 👍 0    🔁 0    💬 1    📌 0

1/3
Two years ago, we started a series on Diffusion Models that covered everything related to these models in depth. We decided to write those tutorials around intuition and the fundamentals because we could not find any high-quality diffusion tutorials at the time.

12.02.2025 02:40 · 👍 0    🔁 0    💬 1    📌 0

x.com/A_K_Nain/sta...

28.01.2025 02:21 · 👍 1    🔁 0    💬 0    📌 0

JanusPro is here, the next generation of the Janus model, with a few surprises (even for me!). I liked JanusFlow a lot, but the JanusPro 1B is what caught my eye. Here is a summary of the paper in case you are interested:

28.01.2025 02:20 · 👍 3    🔁 0    💬 1    📌 0
Aakash Nain - Blog Posts

I put up all these summaries here as well. Will update by EOD:
aakashkumarnain.github.io/blog.html

21.01.2025 07:23 · 👍 2    🔁 0    💬 0    📌 0

Yes, that is doable. Thanks

21.01.2025 07:22 · 👍 0    🔁 0    💬 0    📌 0

This site does not allow long content yet 😞

21.01.2025 07:05 · 👍 0    🔁 0    💬 1    📌 0

x.com/A_K_Nain/sta...

21.01.2025 02:23 · 👍 2    🔁 0    💬 1    📌 0

I read the R1 paper last night, and here is a summary with highlights from the paper (technical report, to be more precise):

21.01.2025 02:22 · 👍 3    🔁 0    💬 1    📌 0

x.com/A_K_Nain/sta...

20.01.2025 03:53 · 👍 0    🔁 0    💬 0    📌 0

Everyone has heard enough about scaling inference-time compute for LLMs in the past month. Diffusion models, on the other hand, have an innate flexibility for allocating varied compute at inference time. Here is a summary of how researchers at GDM exploit this property: 👇

20.01.2025 03:53 · 👍 2    🔁 0    💬 1    📌 0
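
A hedged sketch of one way that flexibility can be spent (stand-in components, not the paper's method): sample several initial noises, run the sampler on each, and keep the candidate a verifier scores highest, so more candidates means more inference compute.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_sampler(noise: np.ndarray) -> np.ndarray:
    return 0.1 * noise                       # placeholder for a denoising loop

def verifier_score(sample: np.ndarray) -> float:
    return -float(np.abs(sample).mean())     # placeholder quality metric

# Best-of-N search over initial noises: N controls inference-time compute.
candidates = [run_sampler(rng.normal(size=4)) for _ in range(16)]
best = max(candidates, key=verifier_score)
print(best)
```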

I just finished reading the DeepSeekv3 paper. Here is everything you need to know about it: 👇

x.com/A_K_Nain/sta...

27.12.2024 13:14 · 👍 6    🔁 1    💬 0    📌 0
https://x.com/A_K_Nain/status/1870068712709173645

I just finished reading one of the latest papers from Meta Research, MetaMorph. Except for two things (neither of them good), it is an okay paper: simple, concise, and to the point. Here is a quick summary in case you are interested:
x.com/A_K_Nain/sta...

20.12.2024 11:34 · 👍 2    🔁 0    💬 0    📌 0

Super cool. Congrats! 💥

16.12.2024 17:25 · 👍 1    🔁 0    💬 0    📌 0
Veo 2: Veo is our state-of-the-art video generation model. It creates high-quality video clips that match the style and content of a user's prompts, at resolutions up to 4K.

Proud to see the release of Veo V2! deepmind.google/technologies...

"Veo has achieved state of the art results in head-to-head comparisons of outputs by human raters over top video generation models"

16.12.2024 17:15 · 👍 11    🔁 4    💬 2    📌 0

What if I told you that you can train a SOTA gaze estimation model in 1 hour on an RTX 4090 GPU? Too good to be true? I was also skeptical of that claim made in the Gaze-LLE paper, but it is true. DINOv2 FTW! I finished reading the paper, and here is a summary:
x.com/A_K_Nain/sta...

16.12.2024 08:06 · 👍 8    🔁 0    💬 0    📌 0
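
Why the 1-hour claim is plausible, as a sketch: the paper freezes a large pretrained encoder (DINOv2) and trains only a small decoder head, so very few parameters receive gradients. The stand-in encoder and head below are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 64, 8, 8), nn.GELU())   # stand-in encoder
for p in backbone.parameters():
    p.requires_grad = False                                    # frozen, like DINOv2

head = nn.Conv2d(64, 1, 1)                                     # tiny trainable head
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)            # only head params train

x = torch.randn(2, 3, 224, 224)                                # fake image batch
target = torch.rand(2, 1, 28, 28)                              # fake gaze heatmaps
with torch.no_grad():
    feats = backbone(x)                                        # features, no gradients
loss = nn.functional.binary_cross_entropy_with_logits(head(feats), target)
loss.backward()
opt.step()
print(float(loss))
```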

Summary:
x.com/A_K_Nain/sta...

13.12.2024 11:50 · 👍 0    🔁 0    💬 0    📌 0

Can you pre-train and fine-tune your VLMs in FP8? Can you get more than 2x efficiency with some simple tricks? Nvidia presents NVILA, an efficient frontier VLM that achieves all of the above. I finished reading the paper, and here is a summary in case you are interested:

13.12.2024 11:50 · 👍 1    🔁 0    💬 1    📌 0
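
Not NVILA's actual recipe (which trains with scaled FP8 matmuls end to end), just a sketch of why FP8 halves memory relative to FP16: each value is stored in one byte, with a per-tensor scale to fit e4m3's narrow range. Requires a recent PyTorch with float8 dtypes.

```python
import torch

w = torch.randn(4, 4)
scale = w.abs().max() / 448.0                   # 448 = max normal value of e4m3
w_fp8 = (w / scale).to(torch.float8_e4m3fn)     # 1 byte per element
w_back = w_fp8.to(torch.float32) * scale        # dequantize before use
print((w - w_back).abs().max())                 # quantization error
```
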
Aakash Nain - Rotary Position Encoding

aakashkumarnain.github.io/posts/ml_dl_...

11.12.2024 03:48 · 👍 7    🔁 0    💬 0    📌 0
https://aakashkumarnain.github.io/posts/ml_dl_concepts/rope.html

I am back to writing math-heavy yet intuitive blog posts. Almost two years ago, I wrote the diffusion tutorials with a similar intention. This time, I am targeting the fundamental concepts of LLMs and MLLMs. And here is the first post in that direction: Rotary Position Encodings. Enjoy reading! 🍻

11.12.2024 03:04 · 👍 13    🔁 0    💬 2    📌 1
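
A compact sketch of the idea the post covers (one common "rotate-half" layout, not the blog's exact code): RoPE rotates pairs of feature dimensions by a position-dependent angle, so the query-key dot product depends only on relative position.

```python
import numpy as np

def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)     # one frequency per dim pair
    theta = pos * freqs
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * np.cos(theta) - x2 * np.sin(theta),
                           x1 * np.sin(theta) + x2 * np.cos(theta)], axis=-1)

q, k = np.random.randn(8), np.random.randn(8)
# Same relative offset (4) -> same dot product, whatever the absolute positions.
print(np.dot(rope(q, 3), rope(k, 7)), np.dot(rope(q, 13), rope(k, 17)))
```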

x.com/A_K_Nain/sta...

09.12.2024 08:43 · 👍 0    🔁 0    💬 0    📌 0

1/2
Google DeepMind announced PaliGemma 2 last week. It is an upgrade of the PaliGemma open Vision-Language Model (VLM) based on the Gemma 2 family of language models. What does this generation of PaliGemma bring to the table? I finished reading the technical report, and here is a summary:

09.12.2024 08:43 · 👍 1    🔁 0    💬 1    📌 0

It will be interesting to see how it fares against Sonnet. Sonnet has been my go-to model for a while now. The one thing I hate about it is that it is too verbose and chatty. If Gemini 2.0 performs at a similar level without being verbose, I would happily use it for my tasks.

08.12.2024 17:31 · 👍 0    🔁 0    💬 0    📌 0
