Mohammed Hamdy

@mmhamdy.bsky.social

A curious explorer of human and machine learning 🧠🤝🤖

166 Followers  |  337 Following  |  35 Posts  |  Joined: 21.11.2024  |  2.4497

Latest posts by mmhamdy.bsky.social on Bluesky

Ok, I'll confess! I too like Roland Emmerich's Godzilla. I even like the creature design in this film!

08.06.2025 18:36 | 👍 1    🔁 0    💬 0    📌 0

The one Frankenstein film to rule them all!

Thank you, @realgdt.bsky.social 🙏

02.06.2025 20:11 | 👍 1    🔁 0    💬 0    📌 0

The Consciousness API: What if consciousness isn't contained within us, but rather we are temporary antennas, tuning into a vast, universal broadcast of awareness?

31.05.2025 14:56 | 👍 0    🔁 0    💬 0    📌 0
Pandemonium: The Transformers Story. A blog post by Mohammed Hamdy on Hugging Face

huggingface.co/blog/mmhamdy...

30.03.2025 11:37 | 👍 0    🔁 0    💬 0    📌 0

In this article, I explore the story behind some of the ideas introduced in the Transformer paper.

It covers everything from the fundamental attention mechanism at its heart to the surprisingly simple explanation for its name.

You may find it interesting! 🙂

👇 Link below

30.03.2025 11:37 | 👍 1    🔁 0    💬 1    📌 0

We're particularly proud to release Aya Vision 8B - it's compact 🐭 and efficient 🐎, outperforming models up to 11x its size 📈.

Releasing open weights helps to make breakthroughs in VLMs accessible to the research community.

05.03.2025 17:56 | 👍 14    🔁 4    💬 1    📌 0
Join the Mozilla AI Discord Server! A global space for sharing and advancing open-source AI. | 3757 members

📅 Event on Mozilla AI Discord: discord.gg/QTCRfefF?eve...

📄 ProGen paper: www.biorxiv.org/content/10.1...

27.01.2025 12:29 | 👍 0    🔁 0    💬 0    📌 0

🧬 Join us this Wednesday on the @mozilla.ai Discord server for the second session of the Biological Representation Learning series, where we discuss landmark papers in the field!

We will be presenting the ProGen protein language model paper from Salesforce. See you there! 😃

27.01.2025 12:29 | 👍 0    🔁 0    💬 1    📌 0
Join the Mozilla AI Discord Server! A global space for sharing and advancing open-source AI. | 3695 members

📢 Join us on Discord for our first Blueprints Hub event 📢

Discover Blueprints and learn how to transform text into podcast-style conversations using entirely open source tools.

🗓️ Wednesday, Jan. 22nd
⏰ 1:30-2:00 PM EST
🔗 Event: discord.gg/BaYFBaeh?eve...

#OpenSource #AI #Blueprints #MozillaAI

20.01.2025 12:15 | 👍 6    🔁 1    💬 0    📌 0

As @cohereforai.bsky.social joins the Bluesky family, we will be sharing paper gems from when we first started as a lab.

This paper is part of a larger research agenda where we have focused on how to better represent the long tail = making AI work for almost all real world distributions.

18.01.2025 04:57 | 👍 25    🔁 3    💬 1    📌 0
kyutai/helium-1-preview-2b · Hugging Face: We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today!
huggingface.co/kyutai/heliu...

13.01.2025 17:50 | 👍 16    🔁 5    💬 1    📌 5

And lastly, big thanks to you for making it this far 🤗. Don’t forget to read the paper!

www.dataprovenance.org/Multimodal_D...

11/n

19.12.2024 16:34 | 👍 0    🔁 0    💬 0    📌 0
This is where the data to build AI comes from. New findings show how the sources of data are concentrating power in the hands of the most powerful tech companies.

Big thanks to Melissa Heikkilä for featuring our work in MIT Tech Review.

www.technologyreview.com/2024/12/18/1...

19.12.2024 16:34 | 👍 0    🔁 0    💬 1    📌 0

Xuhui Zhou, Caiming Xiong, Luis Villa,
@stellaathena.bsky.social, Alex Pentland,
@sarahooker.bsky.social, Jad Kabbara

9/n

19.12.2024 16:34 | 👍 1    🔁 0    💬 1    📌 0

An Dinh, Shrestha Mohanty, Deividas Mataciunas, Tobin South, Jianguo Zhang, @arielnlee.bsky.social, Campbell S. Lund, Christopher Klamm, Damien Sileo, Diganta Misra, Enrico Shippole, Kevin Klyman, Lester JV Miranda, Niklas Muennighoff, Seonghyeon Ye, Seungone Kim, Vipul Gupta, Vivek Sharma

8/n

19.12.2024 16:34 | 👍 1    🔁 0    💬 1    📌 0

🎉 Big thanks to all the contributors to this huge and magnificent effort. I'm truly honored to have had the chance to work alongside all of you: Manan Dey, Nayan Saxena, Ahmad Mustafa Anis, Emad A. Alghamdi, Vu Minh Chien, Naana Obeng-Marnu, Da Yin, Kun Qian, Yizhi Li, Minnie Liang

7/n

19.12.2024 16:34 | 👍 0    🔁 0    💬 1    📌 0

This work was supported by the Mozilla Foundation Data Futures Lab, and was led by: @shaynelongpre.bsky.social, Nikhil Singh, Manuel Cherep, Kushagra Tiwary, Joanna Materzynska, William Brannon, and Robert Mahari

6/n

19.12.2024 16:34 | 👍 0    🔁 0    💬 1    📌 0

4️⃣ Linguistic representation has not improved by most measures: Gini coefficients for text and speech datasets show significant concentration, indicating limited progress in diversifying data sources.

5/n

19.12.2024 16:34 | 👍 0    🔁 0    💬 1    📌 0

3️⃣ Geographical representation has not improved for a decade: Datasets from African and South American organizations account for < 0.2% of all modality content, while North American or European organizations span 93% of text tokens and 60%+ hours of speech and video.

4/n

19.12.2024 16:34 | 👍 0    🔁 0    💬 1    📌 0

2️⃣ Inconsistent dataset licenses: While ~30% of datasets have permissive licenses, 78%+ of their sources carry hidden anti-crawling or licensing restrictions, making compliance a minefield.

3/n

19.12.2024 16:34 | 👍 0    🔁 0    💬 1    📌 0

📌 Key Findings

1️⃣ The web is still the primary source: The internet, social media platforms, and synthetically generated data are increasingly becoming the predominant sources for multimodal data, compared to curated sources.

2/n

19.12.2024 16:34 | 👍 0    🔁 0    💬 1    📌 0

✨ Excited to share our latest work from The Data Provenance Initiative ☸️

This is the most comprehensive audit of multimodal training data, auditing ~4000 datasets between 1990 and 2024, and covering more than 400 unique tasks in 608 languages!

🧵 1/n

19.12.2024 16:34 | 👍 3    🔁 1    💬 1    📌 0

EPIC! 🤗

11.12.2024 12:02 | 👍 1    🔁 0    💬 0    📌 0
GitHub - johko/computer-vision-course: This repo is the homebase of a community driven course on Computer Vision with Neural Networks. Feel free to join us on the Hugging Face discord: hf.co/join/discord

🌟 500! 🌟

Our Community Computer Vision Course Repo just reached 500 stars on GitHub: github.com/johko/comput... 🤩

I'm really proud of all the amazing content people from the community have contributed here and that they still keep on adding very cool and helpful material 💪

01.12.2024 20:41 | 👍 12    🔁 2    💬 0    📌 2

The Hudsucker Proxy is the most underrated Coen Brothers film!

29.11.2024 12:26 | 👍 3    🔁 0    💬 2    📌 0
Post Training in Deep Learning with Last Kernel: One of the main challenges of deep learning methods is the choice of an appropriate training strategy. In particular, additional steps, such as unsupervised pre-training, have been shown to greatly im...

It can go even further...
arxiv.org/abs/1611.04499

27.11.2024 04:22 | 👍 2    🔁 0    💬 0    📌 0

Funny thought: if "post-training" refers mostly to supervised instruction-tuning and alignment of a "pre-trained" model, then where does the actual "training" happen? 😀

27.11.2024 03:53 | 👍 2    🔁 0    💬 0    📌 0

Super excited to announce our best open-source language models yet. OLMo 2.

These instruct models are hot off the press -- finished training with our new RL method this morning and vibes are very good.

26.11.2024 20:57 | 👍 93    🔁 12    💬 5    📌 2

Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.

SmolVLM can be fine-tuned on Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!

26.11.2024 15:57 | 👍 104    🔁 22    💬 4    📌 4

Nice ones! 😀 They were probably created after this post. Someone has to create a new super mega starter pack! 😅

27.11.2024 00:14 | 👍 2    🔁 0    💬 0    📌 0

@mmhamdy is following 19 prominent accounts