@kabir25.bsky.social
20 | ML · https://github.com/kabir2505/Deep-Learning-papers
Built a *tiny-Mixtral model* (~172M, 8 experts) from scratch
with
- Grouped Query Attention
- Rolling Buffer KV Cache
- Sparse MoEs (routing sketch below)
- Rotary Positional Embeddings
Trained it on TinyStories.
github.com/kabir2505/ti...
Logged the full summary here: www.notion.so/kabir25/LLMS...
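To make the "Sparse MoEs" bullet concrete: a minimal sketch of Mixtral-style top-2 expert routing in pytorch. Names and sizes (`SparseMoE`, `dim`, `n_experts`) are illustrative, not necessarily what the repo uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Mixtral-style sparse MoE: each token is routed to its top-2 of 8 experts."""
    def __init__(self, dim=512, hidden=1024, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts, bias=False)   # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                        # x: (batch, seq, dim)
        tokens = x.view(-1, x.size(-1))          # (n_tokens, dim)
        weights, idx = self.gate(tokens).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen k
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            rows, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to e
            if rows.numel():
                out[rows] += weights[rows, slot, None] * expert(tokens[rows])
        return out.view_as(x)
```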
08.04.2025 15:09 · 👍 0 🔁 0 💬 0 📌 0
Read a super interesting paper recently: "LLMs Know More Than They Show" (openreview.net/forum?id=KRn...). It dives into how large language models actually encode way more truthfulness internally than they let on in their outputs.
Takeaways:
• Truth is token-specific: Truthfulness signals are concentrated in certain tokens; probing those can significantly boost error detection.
• Generalization is tough: These probing techniques don't generalize across datasets, which means LLMs hold multiple fragmented notions of truth.
• Different errors, different signals: Internal states can even help predict what kind of error a model will make (factual, reasoning, etc.).
• Hidden knowledge: Sometimes, models internally know the right answer… but still generate the wrong one externally.
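In that spirit, a minimal sketch of a linear probe for error detection: train a classifier on the hidden state of one chosen token per generation. The file names here are placeholders for pre-collected activations and correctness labels; the paper's exact setup differs in its details.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholders: hidden states of one chosen token (e.g. an exact-answer token)
# per generation, shape (n_samples, d_model), plus 0/1 correctness labels.
X = np.load("hidden_states.npy")
y = np.load("is_correct.npy")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("error-detection accuracy:", probe.score(X_te, y_te))
```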
08.04.2025 15:09 · 👍 0 🔁 0 💬 1 📌 0
which city in Maharashtra?
23.03.2025 11:25 · 👍 1 🔁 0 💬 1 📌 0
hello! are you hiring undergrad ml research interns?
20.03.2025 12:53 · 👍 1 🔁 0 💬 1 📌 0
implemented the Llama architecture from scratch in pytorch
github.com/kabir2505/De...
let's implement llama today
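Two Llama-specific pieces, sketched minimally (module names and sizes are mine, not necessarily the repo's): RMSNorm and the SwiGLU feed-forward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Llama normalizes by RMS only: no mean-centering, no bias
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * x * rms

class SwiGLU(nn.Module):
    # Llama's gated feed-forward: silu(w1 x) * (w3 x), projected back by w2
    def __init__(self, dim, hidden):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)
        self.w3 = nn.Linear(dim, hidden, bias=False)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```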
09.03.2025 07:12 · 👍 0 🔁 0 💬 0 📌 0
Not enough ml/dl folks on my feed
06.03.2025 05:59 · 👍 1 🔁 0 💬 0 📌 0
hahahaha same
06.03.2025 03:20 · 👍 0 🔁 0 💬 0 📌 0
implemented wgan & wgan-gp in torch
github.com/kabir2505/De...
github.com/kabir2505/De...
onto some more gan models & vaes :)
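For reference, the piece that separates wgan-gp from plain wgan's weight clipping is the gradient penalty; a minimal sketch, assuming an image-shaped critic input:

```python
import torch

def gradient_penalty(critic, real, fake):
    # WGAN-GP: push the critic's gradient norm toward 1 at points
    # interpolated between real and fake samples
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads, = torch.autograd.grad(
        outputs=critic(interp).sum(), inputs=interp, create_graph=True
    )
    grads = grads.view(grads.size(0), -1)
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()
```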
Spent the day revisiting dropout, so I figured I'd turn it into a blog - kabir25.notion.site/Dropout-16e3...
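For context, the standard (inverted) variant fits in a few lines; a minimal sketch:

```python
import torch

def dropout(x, p=0.5, training=True):
    # Inverted dropout: scale kept units by 1/(1-p) at train time
    # so inference is a plain identity pass
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()
    return mask * x / (1.0 - p)
```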
01.01.2025 13:52 · 👍 0 🔁 0 💬 0 📌 0
Implemented *instruction fine-tuning* on a GPT-2 model on a small dataset & mine claimed Robert Frost wrote Pride and Prejudice
github.com/kabir2505/pr...
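A common way to serialize such a dataset is an Alpaca-style prompt template; a minimal sketch, not necessarily the exact template the repo uses:

```python
def format_example(ex):
    # ex: dict with "instruction", optional "input", and "output"
    prompt = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{ex['instruction']}\n\n"
    )
    if ex.get("input"):
        prompt += f"### Input:\n{ex['input']}\n\n"
    return prompt + f"### Response:\n{ex['output']}"
```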
today's agenda..
21.12.2024 04:27 · 👍 1 🔁 0 💬 0 📌 0
my notes on the gpt-3 paper: kabir25.notion.site/GPT3-1603fc0...
20.12.2024 15:37 · 👍 1 🔁 0 💬 1 📌 0
good morning!!
20.12.2024 15:36 · 👍 2 🔁 0 💬 0 📌 0
this is insane, huge congrats!
19.12.2024 12:09 · 👍 1 🔁 0 💬 0 📌 0
If you are into ML theory (RL or not) with a proven track record, and you are interested in an industry research position, PM me. Feel free to spread the word.
19.12.2024 00:55 · 👍 74 🔁 31 💬 2 📌 0
today's read :)
18.12.2024 14:33 · 👍 0 🔁 0 💬 0 📌 1
NeurIPS FOMO is real 😫 wish I could teleport..
14.12.2024 16:03 · 👍 0 🔁 0 💬 0 📌 0
Built *BERT* from scratch in pytorch. Took a bit to understand MLM (Masked Language Modeling) and NSP (Next Sentence Prediction) but totally worth the grind.
Code: github.com/kabir2505/De...
Notes: kabir25.notion.site/BERT-1533fc0...
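The MLM corruption rule from the BERT paper (pick 15% of tokens; of those, 80% → [MASK], 10% → a random token, 10% → unchanged) in a minimal sketch. `mask_id` and shapes are illustrative, and special tokens aren't excluded here for brevity:

```python
import torch

def mask_tokens(input_ids, mask_id, vocab_size, p=0.15):
    # input_ids: (batch, seq) token ids
    labels = input_ids.clone()
    selected = torch.rand(input_ids.shape) < p
    labels[~selected] = -100                 # loss only on selected positions
    r = torch.rand(input_ids.shape)
    corrupted = input_ids.clone()
    corrupted[selected & (r < 0.8)] = mask_id               # 80% -> [MASK]
    swap = selected & (r >= 0.8) & (r < 0.9)                # 10% -> random
    corrupted[swap] = torch.randint(vocab_size, input_ids.shape)[swap]
    return corrupted, labels                 # remaining 10% stay unchanged
```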
bad recs = zero vibes = productivity tanked!
06.12.2024 09:30 · 👍 1 🔁 0 💬 0 📌 0
tackling my first nlp kaggle competition, any suggestions or references?
06.12.2024 04:55 · 👍 0 🔁 0 💬 0 📌 0
notes on bert: kabir25.notion.site/BERT-1533fc0...
Still a work in progress..
Diving into BERT today :)
05.12.2024 12:31 · 👍 0 🔁 0 💬 0 📌 1