
Badr AlKhamissi

@bkhmsi.bsky.social

PhD at EPFL 🧠💻 | Ex @MetaAI, @SonyAI, @Microsoft | Egyptian 🇪🇬

219 Followers  |  355 Following  |  61 Posts  |  Joined: 22.11.2024

Latest posts by bkhmsi.bsky.social on Bluesky

Post image

Looking forward to speaking at IndabaX Sudan on Building Responsible and Ethical LLMs!

📅 Saturday, December 13th
⏰ 2:00 PM (GMT+2)

Register here: docs.google.com/forms/d/e/1F...

See you all there! :)

07.12.2025 15:43 · 👍 2  🔁 0  💬 0  📌 0
Preview
Badr AlKhamissi's Website I am a PhD candidate at EPFL, co-advised by Antoine Bosselut and Martin Schrimpf. My research lies at the intersection of machine learning, neuroscience and cognitive science. Prior to EPFL, I was an ...

Not attending NeurIPS this year, but very much looking to connect.

I'm seeking a PhD research internship next summer in AI for Science, especially where AI meets brain and cognitive sciences. 🧠

If you're hiring, I'd love to connect!

bkhmsi.github.io

02.12.2025 17:38 · 👍 2  🔁 1  💬 0  📌 0
Preview
Egyptians in AI Research A website dedicated to showcasing the profiles of prominent Egyptian researchers in the field of Artificial Intelligence.

I finally found time to update the Egyptians in AI Research website, apologies for the delay!

Super excited to share that we now feature 227 incredible Egyptian researchers!! 🤯

Link: bkhmsi.github.io/egyptians-in...

24.11.2025 06:35 · 👍 2  🔁 0  💬 0  📌 0
From Language to Cognition Large language models (LLMs) exhibit remarkable similarity to neural activity in the human language network. However, the key properties of language underlying this alignment---and how brain-like repr...

You can learn more about our work here: language-to-cognition.epfl.ch

Thanks to all my co-authors @gretatuckute.bsky.social, @davidtyt.bsky.social, @neurotaha.bsky.social and my advisors @abosselut.bsky.social and @mschrimpf.bsky.social!

02.11.2025 12:06 · 👍 0  🔁 0  💬 0  📌 0
Post image

On my way to #EMNLP2025 🇨🇳

I'll be presenting our work (Oral) on Nov 5, Special Theme session, Room A106-107, at 14:30.

Let's talk brains 🧠, machines 🤖, and everything in between :D

Looking forward to all the amazing discussions!

02.11.2025 12:06 · 👍 9  🔁 0  💬 1  📌 0

10/
๐Ÿ™ Huge thanks to my incredible co-authors @cndesabbata.bsky.social, @gretatuckute.bsky.social, @eric-zemingchen.bsky.social

and my advisors @mschrimpf.bsky.social and @abosselut.bsky.social!

20.10.2025 12:05 · 👍 0  🔁 0  💬 0  📌 0
Preview
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization Human cognitive behavior arises from the interaction of specialized brain networks dedicated to distinct functions, such as language, logic, and social reasoning. Inspired by this organization, we pro...

9/
🔗 Explore MiCRo:

🌐 Website: cognitive-reasoners.epfl.ch
📄 Paper: arxiv.org/abs/2506.13331
🤗 HF Space (interactive): huggingface.co/spaces/bkhmsi/cognitive-reasoners
🧠 HF Models: huggingface.co/collections/bkhmsi/mixture-of-cognitive-reasoners-684709a0f9cdd7fa180f6678

20.10.2025 12:05 · 👍 0  🔁 0  💬 1  📌 0
Preview
Mixture of Cognitive Reasoners - a bkhmsi Collection https://arxiv.org/abs/2506.13331

8/
We now have a collection of 10 MiCRo models on HF that you can try out yourself!

🧠 HF Models: huggingface.co/collections/bkhmsi/mixture-of-cognitive-reasoners-684709a0f9cdd7fa180f6678

20.10.2025 12:05 · 👍 0  🔁 0  💬 1  📌 0
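For anyone who wants to poke at the checkpoints programmatically, here is a minimal loading sketch using the standard transformers API. The repo id is hypothetical (check the collection linked above for the real model names), and trust_remote_code is an assumption in case the modular layers ship as custom modeling code.

```python
# Minimal loading sketch (illustrative): the repo id below is a guess at the naming
# scheme -- check the HF collection linked above for the exact model ids.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bkhmsi/MiCRo-Llama-3.2-1B"  # hypothetical id; see the collection page

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,  # assumption: the modular layers may ship as custom modeling code
)

prompt = "Sally puts her keys in the drawer and leaves. Where will she look for them?"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```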
Preview
Mixture of Cognitive Reasoners - a Hugging Face Space by bkhmsi Enter a prompt and select a model to see how tokens are routed across Language, Logic, Social, and World experts. Optionally, disable experts to see how routing changes.

7/
We built an interactive HF Space where you can see how MiCRo routes tokens across specialized experts for any prompt, and even toggle experts on/off to see how behavior changes.

🤗 Try it here: huggingface.co/spaces/bkhms...
(Check the example prompts to get started!)

20.10.2025 12:05 · 👍 0  🔁 0  💬 1  📌 0
Post image

6/
We also wondered: if neuroscientists use functional localizers to map networks in the brain, could we do the same for MiCRo's experts?

The answer: yes! The very same localizers successfully recovered the corresponding expert modules in our models!

20.10.2025 12:05 · 👍 1  🔁 1  💬 1  📌 0
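For intuition, here is a toy sketch of the localizer logic (not the paper's exact procedure): contrast unit activations across localizer conditions, keep the most selective units, and check which expert they land in. The selectivity statistic and all names are illustrative.

```python
# Toy sketch of the functional-localizer idea applied to model units
# (illustrative only; the paper's exact procedure may differ).
import numpy as np

def localize_units(act_a, act_b, top_frac=0.01):
    """Return indices of units most selective for condition A over condition B,
    mimicking e.g. a sentences-vs-nonwords language localizer.
    act_a, act_b: arrays of shape (n_stimuli, n_units)."""
    diff = act_a.mean(0) - act_b.mean(0)
    pooled = np.sqrt(act_a.var(0) + act_b.var(0) + 1e-8)    # t-like selectivity per unit
    k = max(1, int(top_frac * diff.size))
    return np.argsort(diff / pooled)[-k:]

# Toy check: if we know which expert each unit belongs to, we can ask whether
# the localized units concentrate in the Language expert.
rng = np.random.default_rng(0)
n_units = 4000
expert_of_unit = rng.integers(0, 4, size=n_units)           # 0=Lang, 1=Logic, 2=Social, 3=World
sentences = rng.normal(size=(100, n_units))
sentences[:, expert_of_unit == 0] += 1.0                    # fake language-selective signal
nonwords = rng.normal(size=(100, n_units))

localized = localize_units(sentences, nonwords)
print(np.bincount(expert_of_unit[localized], minlength=4))  # counts per expert
```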
Post image

5/
One result I was particularly excited about is the emergent hierarchy we found across MiCRo layers:

🔺 Earlier layers route tokens to Language experts.
🔻 Deeper layers shift toward domain-relevant experts.

This emergent hierarchy mirrors patterns observed in the human brain 🧠

20.10.2025 12:05 · 👍 0  🔁 0  💬 1  📌 0
Post image

4/
We find that MiCRo matches or outperforms baselines on reasoning tasks (e.g., GSM8K, BBH) and aligns better with human behavior (CogBench), while maintaining interpretability!!

20.10.2025 12:05 · 👍 0  🔁 0  💬 1  📌 0

3/
✨ Why it matters:

MiCRo bridges AI and neuroscience:

🤖 ML side: Modular architectures make LLMs more interpretable and controllable.
🧠 Cognitive side: Provides a testbed for probing how the relative contributions of different brain networks shape complex behavior.

20.10.2025 12:05 · 👍 2  🔁 0  💬 1  📌 0
Post image

2/
🧩 Recap:
MiCRo takes a pretrained language model and post-trains it to develop distinct, brain-inspired modules aligned with four cognitive networks:

🗣️ Language
🔢 Logic / Multiple Demand
🧍‍♂️ Social / Theory of Mind
🌍 World / Default Mode Network

20.10.2025 12:05 · 👍 2  🔁 0  💬 1  📌 0
Post image

🚀 Excited to share a major update to our "Mixture of Cognitive Reasoners" (MiCRo) paper!

We ask: What benefits can we unlock by designing language models whose inner structure mirrors the brain's functional specialization?

More below 🧠👇
cognitive-reasoners.epfl.ch

20.10.2025 12:05 · 👍 29  🔁 9  💬 2  📌 1

Excited to be part of this cool work led by Melika Honarmand!

We show that selectively targeting VLM units that mirror the brain's visual word form area induces dyslexic-like reading impairments, while leaving other abilities intact!! 🧠🤖

Details in the 🧵👇

02.10.2025 13:27 · 👍 1  🔁 0  💬 0  📌 0
Preview
From Language to Cognition: How LLMs Outgrow the Human Language Network Large language models (LLMs) exhibit remarkable similarity to neural activity in the human language network. However, the key properties of language shaping brain-like representations, and their evolu...

Huge thanks to my amazing collaborators: @gretatuckute.bsky.social, @davidtyt.bsky.social, @neurotaha.bsky.social & advisors @abosselut.bsky.social and @mschrimpf.bsky.social!

You can find more about our paper on the project's website: language-to-cognition.epfl.ch

Paper: arxiv.org/abs/2503.01830

25.09.2025 14:56 · 👍 4  🔁 1  💬 0  📌 0

Now that the ICLR deadline is behind us, happy to share that From Language to Cognition has been accepted as an Oral at #EMNLP2025! 🎉

Looking forward to seeing many of you in Suzhou 🇨🇳

25.09.2025 14:56 · 👍 20  🔁 3  💬 1  📌 0
Post image

1/ 🚨 New preprint

How do #LLMs' inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints, opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability

25.09.2025 14:02 · 👍 14  🔁 6  💬 2  📌 0
Post image

NEW PAPER ALERT: Recent studies have shown that LLMs often lack robustness to distribution shifts in their reasoning. Our paper proposes a new method, AbstRaL, to augment LLMs' reasoning robustness by promoting their abstract thinking with granular reinforcement learning.

23.06.2025 14:32 · 👍 7  🔁 3  💬 1  📌 1

Check out @bkhmsi.bsky.social 's great work on mixture-of-expert models that are specialized to represent the behavior of known brain networks.

18.06.2025 10:46 · 👍 3  🔁 1  💬 0  📌 0
Preview
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization Human intelligence emerges from the interaction of specialized brain networks, each dedicated to distinct cognitive functions such as language processing, logical reasoning, social understanding, and ...

11/ ๐ŸŒ Links

Paper: arxiv.org/abs/2506.13331
Project Page: bkhmsi.github.io/mixture-of-c...
Code: github.com/bkhmsi/mixtu...
Models: huggingface.co/collections/...

In collaboration with: @cndesabbata.bsky.social, @eric-zemingchen.bsky.social, @mschrimpf.bsky.social, & @abosselut.bsky.social

17.06.2025 15:07 · 👍 1  🔁 0  💬 1  📌 0

10/ 🧾 Conclusion:
MiCRo weaves together modularity, interpretability & brain-inspired design to build controllable and high-performing models, moving toward truly cognitively grounded LMs.

17.06.2025 15:07 · 👍 1  🔁 0  💬 1  📌 0

9/ 💡 Key insights:
1. Minimal data (~3k samples) in Stage 1 can induce lasting specialization
2. Modular structure enables interpretability, control, and scalability (e.g., top-2 routing can boost performance)
3. Approach generalizes across domains & base models

17.06.2025 15:07 · 👍 1  🔁 0  💬 1  📌 0
Post image

8/ 🧬 Brain alignment:
Neuroscience localizers (e.g., for language, multiple-demand) rediscover the corresponding experts in MiCRo, showing functional alignment with brain networks. However, the ToM localizer fails to identify the social expert.

Figures for MiCRo-Llama & MiCRo-OLMo.

17.06.2025 15:07 · 👍 1  🔁 0  💬 1  📌 0
Post image

7/ 🧩 Steering & controllability:
Removing or emphasizing specific experts steers model behavior: ablating the logic expert hurts math accuracy, while suppressing the social expert slightly improves it, showcasing fine-grained control.

17.06.2025 15:07 · 👍 1  🔁 0  💬 1  📌 0
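For readers curious how this kind of ablation can be implemented, here is a minimal sketch that masks the router's logits so a chosen expert can never be selected. It assumes per-token routing logits over the four experts; the names are illustrative, not the released code's API.

```python
# Sketch of expert ablation at the router, assuming each layer produces per-token
# routing logits of shape (batch, seq, n_experts). Names are illustrative.
import torch

EXPERTS = ["language", "logic", "social", "world"]

def route_top1(router_logits: torch.Tensor, disabled: set = frozenset()) -> torch.Tensor:
    """Top-1 expert index per token, with selected experts switched off."""
    logits = router_logits.clone()
    for name in disabled:
        logits[..., EXPERTS.index(name)] = float("-inf")   # expert can never win the argmax
    return logits.argmax(dim=-1)

# Toy usage: ablating the logic expert forces its tokens onto the remaining experts.
logits = torch.randn(1, 8, len(EXPERTS))
print(route_top1(logits))
print(route_top1(logits, disabled={"logic"}))
```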
Post image

6/ 🔄 Interpretable routing:
Early layers route most tokens to the language expert; deeper layers route to domain-relevant experts (e.g., logic expert for math), matching task semantics.

17.06.2025 15:07 · 👍 1  🔁 0  💬 1  📌 0
Post image

5/ 📈 Performance gains:
Across 6 reasoning benchmarks (MATH, GSM8K, MMLU, BBH, ...), MiCRo outperforms both dense and "general-expert" baselines, i.e., modular models with random specialist assignment in Stage 1.

17.06.2025 15:07 · 👍 1  🔁 0  💬 1  📌 0
Post image

4/ 📚 Training curriculum (3 stages):
• Stage 1: Expert training on small curated domain-specific datasets (~3k samples)
• Stage 2: Router training, with experts frozen
• Stage 3: End-to-end finetuning on a large instruction corpus (939k samples)
This seeds specialization effectively (rough sketch below).

17.06.2025 15:07 · 👍 1  🔁 0  💬 1  📌 0
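Here is a rough sketch of that three-stage schedule as a freeze/unfreeze loop. It assumes a model object exposing its experts and per-layer routers as submodules and any standard finetuning loop; all attribute names are illustrative.

```python
# Rough sketch of the three-stage schedule. The attribute names (model.experts,
# model.routers) and train_fn are illustrative, not the released code's API.
def set_trainable(params, flag):
    for p in params:
        p.requires_grad = flag

def three_stage_curriculum(model, domain_data, mixed_data, instruction_corpus, train_fn):
    """domain_data: dict mapping expert name -> small curated dataset (~3k samples);
    train_fn(model, dataset): any standard supervised finetuning loop."""
    # Stage 1: train each expert on its own domain data, everything else frozen.
    for name, expert in model.experts.items():     # language / logic / social / world
        set_trainable(model.parameters(), False)
        set_trainable(expert.parameters(), True)
        train_fn(model, domain_data[name])

    # Stage 2: freeze the experts, train only the per-layer routers.
    set_trainable(model.parameters(), False)
    for router in model.routers:
        set_trainable(router.parameters(), True)
    train_fn(model, mixed_data)

    # Stage 3: unfreeze everything, finetune end-to-end on the instruction corpus (939k samples).
    set_trainable(model.parameters(), True)
    train_fn(model, instruction_corpus)
```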

3/ โš™๏ธ Architecture:
We start with a pretrained model (e.g. Llamaโ€‘3.2โ€‘1B). Clone each layer into four experts. Then, a light router assigns tokens dynamically to a single expert (topโ€‘1 routing) per layer. Keeping a comparable number of active parameters to the base model.

17.06.2025 15:07 · 👍 1  🔁 0  💬 1  📌 0
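To make the design concrete, here is a toy sketch of such a per-layer mixture, with a simple feed-forward block standing in for a full transformer layer; it is not the released implementation.

```python
# Toy sketch of a per-layer mixture: each layer is cloned into four experts and a
# small linear router picks one expert per token (top-1), so the number of active
# parameters stays close to the base model. Not the released implementation.
import copy
import torch
import torch.nn as nn

EXPERTS = ["language", "logic", "social", "world"]

class MixtureLayer(nn.Module):
    def __init__(self, base_layer: nn.Module, d_model: int):
        super().__init__()
        # Experts start as identical copies of the pretrained layer.
        self.experts = nn.ModuleList(copy.deepcopy(base_layer) for _ in EXPERTS)
        self.router = nn.Linear(d_model, len(EXPERTS))       # lightweight per-token router

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model)
        choice = self.router(hidden).argmax(dim=-1)          # top-1 expert index per token
        out = torch.zeros_like(hidden)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                out[mask] = expert(hidden[mask])             # only one expert runs per token
        return out

# Toy usage: a feed-forward block stands in for a full transformer layer.
d = 64
ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
layer = MixtureLayer(ffn, d)
print(layer(torch.randn(2, 10, d)).shape)   # torch.Size([2, 10, 64])
```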
