kabir

@kabir25.bsky.social

20| ML https://github.com/kabir2505/Deep-Learning-papers

49 Followers  |  991 Following  |  25 Posts  |  Joined: 05.12.2024  |  1.7604

Latest posts by kabir25.bsky.social on Bluesky

Built a *tiny-Mixtral model* (~172M parameters, 8 experts) from scratch with:
- Grouped Query Attention
- Rolling Buffer KV Cache
- Sparse MoEs
- Rotary Positional Embeddings
Trained it on TinyStories.

github.com/kabir2505/ti...
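Mixtral inherits Mistral's sliding-window attention, where the rolling buffer cache has a fixed size W and the key/value for position i is written to slot i mod W, overwriting the oldest entry once the buffer is full. A minimal pure-Python sketch of that indexing, assuming scalar stand-ins for the key/value tensors; the class and method names are hypothetical and not taken from the linked repo:

```python
from typing import Any, List, Tuple


class RollingKVCache:
    """Hypothetical fixed-size KV cache: position i is stored in slot i % window."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.keys: List[Any] = [None] * window
        self.values: List[Any] = [None] * window
        self.pos = 0  # number of tokens seen so far

    def append(self, key: Any, value: Any) -> None:
        # Once the buffer is full, this overwrites the oldest entry.
        slot = self.pos % self.window
        self.keys[slot] = key
        self.values[slot] = value
        self.pos += 1

    def get(self) -> Tuple[List[Any], List[Any]]:
        # Return the cached entries in chronological order (oldest first).
        n = min(self.pos, self.window)
        order = [i % self.window for i in range(self.pos - n, self.pos)]
        return [self.keys[s] for s in order], [self.values[s] for s in order]


cache = RollingKVCache(window=3)
for t in range(5):
    cache.append(f"k{t}", f"v{t}")
keys, values = cache.get()
print(keys)  # ['k2', 'k3', 'k4']
```

Memory stays O(W) no matter how long generation runs; the trade-off is that tokens older than W positions are no longer directly attendable from the cache.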

05.05.2025 07:59 · 👍 0    🔁 0    💬 0    📌 0
LLMS Know More than they Show | Notion Key Contributions

Logged the full summary here: www.notion.so/kabir25/LLMS...

08.04.2025 15:09 · 👍 0    🔁 0    💬 0    📌 0

• Different errors, different signals: internal states can even help predict what kind of error a model will make (factual, reasoning, etc.).
• Hidden knowledge: sometimes a model internally encodes the right answer… but still generates the wrong one.

08.04.2025 15:09 · 👍 0    🔁 0    💬 1    📌 0
LLMS Know More than they Show | Notion Key Contributions

Takeaways:
• Truth is token-specific: truthfulness signals are concentrated in specific tokens, and probing those tokens can significantly improve error detection.
• Generalization is tough: these probing techniques don’t generalize well across datasets, suggesting that LLMs hold multiple, fragmented notions of truth rather than a single universal one.

08.04.2025 15:09 · 👍 0    🔁 0    💬 1    📌 0

Read a super interesting paper recently: “LLMs Know More Than They Show” (openreview.net/forum?id=KRn...). It digs into how large language models internally encode far more about truthfulness than their outputs reveal.

08.04.2025 15:09 · 👍 0    🔁 0    💬 1    📌 0

which city in Maharashtra?

23.03.2025 11:25 · 👍 1    🔁 0    💬 1    📌 0

hello! are you hiring undergrad ML research interns?

20.03.2025 12:53 · 👍 1    🔁 0    💬 1    📌 0

implemented the Llama architecture from scratch in PyTorch

github.com/kabir2505/De...
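One of Llama's departures from the original Transformer is RMSNorm in place of LayerNorm: each vector is divided by the root-mean-square of its entries (no mean subtraction, no bias) and then scaled by a learned gain. A minimal pure-Python sketch; the function name, the `eps` default, and the all-ones stand-in for the learned gain are illustrative assumptions, not code from the linked repo:

```python
import math
from typing import List, Optional


def rms_norm(x: List[float], weight: Optional[List[float]] = None,
             eps: float = 1e-6) -> List[float]:
    """RMSNorm: x / sqrt(mean(x^2) + eps) * weight (no mean subtraction, no bias)."""
    if weight is None:
        weight = [1.0] * len(x)  # stand-in for the learned per-feature gain
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


out = rms_norm([3.0, 4.0], eps=0.0)
# With eps = 0 and unit gain, the output has a mean square of (almost exactly) 1.
print(sum(v * v for v in out) / len(out))
```

Skipping the mean subtraction makes it slightly cheaper than LayerNorm while keeping the re-scaling invariance that stabilizes training.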

18.03.2025 02:50 · 👍 2    🔁 0    💬 0    📌 0

let's implement Llama today 😋

09.03.2025 07:12 · 👍 0    🔁 0    💬 0    📌 0

Not enough ML/DL folks on my feed

06.03.2025 05:59 · 👍 1    🔁 0    💬 0    📌 0

hahahaha same

06.03.2025 03:20 · 👍 0    🔁 0    💬 0    📌 0
Deep-Learning-History/GANs/WGan at main · kabir2505/Deep-Learning-History: Deep learning paper implementations.

implemented WGAN & WGAN-GP in torch
github.com/kabir2505/De...
github.com/kabir2505/De...

on to some more GAN models & VAEs :)
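For reference, the two critic objectives differ only in how the critic is kept (approximately) 1-Lipschitz: WGAN clips the critic's weights to a small box $[-c, c]$, while WGAN-GP (Gulrajani et al., 2017) drops clipping and instead penalizes the critic's gradient norm at points interpolated between real and generated samples. Written as losses to minimize, with $D$ the critic, $G$ the generator, $P_r$ the data distribution, and $\lambda$ the penalty weight (the WGAN-GP paper uses $\lambda = 10$):

```latex
\begin{aligned}
\mathcal{L}_{D}^{\text{WGAN}} &= \mathbb{E}_{z}\big[D(G(z))\big] - \mathbb{E}_{x \sim P_r}\big[D(x)\big],
  \qquad w \leftarrow \operatorname{clip}(w, -c, c) \\
\mathcal{L}_{D}^{\text{WGAN-GP}} &= \mathbb{E}_{z}\big[D(G(z))\big] - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
  + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big] \\
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, G(z), \qquad \epsilon \sim U[0, 1] \\
\mathcal{L}_{G} &= -\,\mathbb{E}_{z}\big[D(G(z))\big]
\end{aligned}
```

The generator loss is the same in both variants; only the critic's constraint changes.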

06.03.2025 03:18 · 👍 0    🔁 0    💬 0    📌 0

Spent the day revisiting dropout, so I figured I’d turn it into a blog - kabir25.notion.site/Dropout-16e3...
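Most frameworks implement "inverted" dropout: during training each activation is zeroed with probability p and the survivors are scaled by 1/(1 - p), so the expected activation is unchanged and inference is a plain identity. A minimal pure-Python sketch of that idea; the function name and signature are illustrative and not taken from the linked blog:

```python
import random
from typing import List, Optional


def dropout(x: List[float], p: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each element with prob. p, scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(x)  # identity at inference time (no rescaling needed)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # Keep an element with probability 1 - p, scaled up to preserve the expectation.
    return [v * scale if rng.random() >= p else 0.0 for v in x]


out = dropout([1.0] * 10, p=0.5, rng=random.Random(0))
print(out)  # each entry is either 0.0 or 2.0
```

Scaling at training time (rather than multiplying by 1 - p at test time, as in the original dropout paper) keeps the inference path free of any dropout-specific logic.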

01.01.2025 13:52 · 👍 0    🔁 0    💬 0    📌 0

Implemented *instruction fine-tuning* on a GPT-2 model using a small dataset & my model claimed Robert Frost wrote Pride and Prejudice 😅

github.com/kabir2505/pr...

28.12.2024 17:31 · 👍 1    🔁 0    💬 0    📌 0

today's agenda..

21.12.2024 04:27 · 👍 1    🔁 0    💬 0    📌 0

my notes on the GPT-3 paper: kabir25.notion.site/GPT3-1603fc0...

20.12.2024 15:37 · 👍 1    🔁 0    💬 1    📌 0

good morning!!

20.12.2024 15:36 · 👍 2    🔁 0    💬 0    📌 0

this is insane, huge congrats! 👍

19.12.2024 12:09 · 👍 1    🔁 0    💬 0    📌 0

If you are into ML theory (RL or not) with a proven track record, and you are interested in an industry research position, PM me. Feel free to spread the word.

19.12.2024 00:55 · 👍 74    🔁 31    💬 2    📌 0

today's read :)

18.12.2024 14:33 · 👍 0    🔁 0    💬 0    📌 1

NeurIPS FOMO is real 🫠 wish I could teleport..

14.12.2024 16:03 · 👍 0    🔁 0    💬 0    📌 0
Deep-Learning-papers/transformers/bert at main · kabir2505/Deep-Learning-papers: Deep learning paper implementations.

Built π—•π—˜π—₯𝗧 from scratch in pytorch. Took a bit to understand π— Μ²π—ŸΜ²π— Μ²(Masked Language Modeling) and 𝗑̲𝗦̲𝗣̲ (Next Sentence Prediction) but totally worth the grind.
Code: github.com/kabir2505/De...
Notes: kabir25.notion.site/BERT-1533fc0...
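For MLM, the BERT paper selects 15% of token positions as prediction targets; of those, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged, and the loss is computed only at the selected positions. A minimal pure-Python sketch over token ids; the mask id, vocabulary size, and the -100 "ignore" label (the default `ignore_index` of PyTorch's cross-entropy loss) are assumptions for illustration, not values from the linked repo:

```python
import random
from typing import List, Optional, Tuple

IGNORE = -100  # label for positions that contribute no loss


def mask_tokens(tokens: List[int], mask_id: int, vocab_size: int,
                mlm_prob: float = 0.15,
                rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """BERT-style 80/10/10 masking. Returns (corrupted inputs, labels)."""
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [IGNORE] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mlm_prob:
            continue  # not selected: input unchanged, no loss here
        labels[i] = tok  # the model must recover the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token as input


    return inputs, labels


# Hypothetical ids: vocabulary of 1000 tokens, [MASK] = 103.
inp, lab = mask_tokens(list(range(200, 220)), mask_id=103, vocab_size=1000,
                       rng=random.Random(42))
```

Keeping some selected tokens unchanged (and randomizing others) prevents the model from learning that only [MASK] positions matter, since [MASK] never appears at fine-tuning time.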

13.12.2024 12:56 · 👍 1    🔁 0    💬 0    📌 0

bad recs = zero vibes = productivity tanked!

06.12.2024 09:30 · 👍 1    🔁 0    💬 0    📌 0

tackling my first NLP Kaggle competition, any suggestions or references?

06.12.2024 04:55 · 👍 0    🔁 0    💬 0    📌 0

notes on BERT: kabir25.notion.site/BERT-1533fc0...
Still a work in progress...

05.12.2024 16:28 · 👍 0    🔁 0    💬 0    📌 0

Diving into BERT today :)

05.12.2024 12:31 · 👍 0    🔁 0    💬 0    📌 1

@kabir25 is following 19 prominent accounts