Alessandro Sordoni

@murefil.bsky.social

ML Team MSR Montreal. Adjunct Prof UdeM MILA. Modularity & reasoning.

87 Followers  |  131 Following  |  4 Posts  |  Joined: 18.11.2024

Latest posts by murefil.bsky.social on Bluesky

Today is the International Day for the Elimination of Violence against Women. According to the UN, more than 50 000 women were killed by a partner or family member in 2023 (news.un.org/en/story/202...). This number is an underestimate, given that only 37 countries reported in 2023.

26.11.2024 06:16 | 👍 8    🔁 3    💬 0    📌 0

Yeah I suspected that, interesting!

25.11.2024 13:36 | 👍 0    🔁 0    💬 0    📌 0

distributed learning for LLM?

recently, @primeintellect.bsky.social have announced finishing their 10B distributed learning, trained across the world.

what is it exactly?

🧡

25.11.2024 12:02 | 👍 23    🔁 6    💬 1    📌 2

Instead of averaging outer gradients, would fancier model merging techniques (e.g. TIES) apply here?

25.11.2024 12:50 | 👍 1    🔁 0    💬 1    📌 0
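
To make the question above concrete, here is a minimal sketch contrasting a plain average of per-worker outer gradients with a TIES-style merge (trim small entries, elect a sign per coordinate, average only the entries that agree with it). Everything here is an assumption for illustration: the function names, the toy numbers, the `keep_ratio` value, and the reading of "outer gradient" as a flattened per-worker parameter delta. It is not the method used in the training run mentioned in the thread.

```python
# Hypothetical sketch: each "outer gradient" is one worker's parameter delta,
# flattened to a list of floats. Names and values are illustrative only.

def average_merge(deltas):
    """Plain element-wise mean across workers."""
    n = len(deltas)
    return [sum(col) / n for col in zip(*deltas)]

def trim(delta, keep_ratio):
    """Zero all but the largest-magnitude entries of one worker's delta.

    Entries tied with the threshold are all kept, so slightly more than
    keep_ratio of the entries can survive.
    """
    k = max(1, round(len(delta) * keep_ratio))
    threshold = sorted((abs(v) for v in delta), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in delta]

def ties_merge(deltas, keep_ratio=0.5):
    """TIES-style merge: trim, elect a sign per coordinate, average agreeing entries."""
    trimmed = [trim(d, keep_ratio) for d in deltas]
    merged = []
    for col in zip(*trimmed):
        # Elected sign is the sign of the summed (magnitude-weighted) entries.
        sign = 1.0 if sum(col) >= 0 else -1.0
        agreeing = [v for v in col if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged

# Three toy workers, four parameters each.
deltas = [
    [0.9, -0.1, 0.4, 0.0],
    [1.1, 0.2, -0.5, 0.1],
    [1.0, -0.3, 0.6, -0.1],
]
avg = average_merge(deltas)
ties = ties_merge(deltas, keep_ratio=0.5)
print("average:", avg)
print("ties:   ", ties)
```

In this toy case the third coordinate shows the difference: the plain average is pulled toward zero by the one disagreeing worker, while the TIES-style merge drops that entry and averages only the two that share the elected sign. Whether this helps for outer gradients in practice is exactly the open question in the post.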

having better tools for reviewer and AC assignment would definitely help. ultimately, reducing the number of reviews per paper while striving for relevance and quality could increase reviewer / AC engagement and free up their time to do better reviews

24.11.2024 22:23 | 👍 2    🔁 0    💬 0    📌 0
Informatics: ILCC: Language Processing, Speech Technology, Information Retrieval, Cognition
Study Informatics: ILCC: Language Processing, Speech Technology, Information Retrieval, Cognition at the University of Edinburgh. Our postgraduate degree programmes focus on natural language processin...

Last 5 days to apply for a PhD at #EdinburghNLP!

Deadline: November 25

www.ed.ac.uk/studying/pos...

If you are passionate about:

- adaptive tokenization and memory in foundation models
- modular deep learning
- computational typology

please message me or meet me at #NeurIPS2024!

21.11.2024 13:41 | 👍 20    🔁 8    💬 0    📌 0
A sparse mask of attention scores based on VerticalAndSlashAttention and a plot of loss vs sparsity ratio for various methods.

Another nano gem from my amazing student Piotr Nawrot!

A repo & notebook on sparse attention for efficient LLM inference: github.com/PiotrNawrot/...

This will also feature in my #NeurIPS 2024 tutorial "Dynamic Sparsity in ML" with André Martins: dynamic-sparsity.github.io Stay tuned!

20.11.2024 12:51 | 👍 43    🔁 8    💬 2    📌 3

Explore zero-shot routing of parameter-efficient experts with Phatgoose arxiv.org/abs/2402.05859 and Arrow arxiv.org/abs/2405.11157 w. github.com/microsoft/mttl

👉 github.com/sordonia/pg_mb…

Part of "Dynamic Sparsity in ML" tuto #neurips2024, feedback welcome and join for discussions! 😊

21.11.2024 15:47 | 👍 5    🔁 1    💬 0    📌 0