
Mohamad Wahba

@m-wahba.bsky.social

CS grad striving to master SWE & ML engineering 💻 | Passionate about Arabic NLP & advancing knowledge 🌍 | Exploring data, MLOps, math, science, and languages to build impactful solutions 🚀 | Lifelong learner 📚

20 Followers  |  443 Following  |  1 Post  |  Joined: 22.11.2024

Latest posts by m-wahba.bsky.social on Bluesky

Just enrolled in the 2025 Data Engineering Zoomcamp by DataTalksClub! 🚀

Can't wait to explore data engineering and grow with an amazing cohort. Big shoutout to DataTalksClub for this awesome opportunity!

#DataEngineering #LearningInPublic

14.01.2025 03:12 · 👍 1    🔁 0    💬 0    📌 0

For those who donโ€™t feel like they fit into my Grumpy Machine Learners list (which I still need to update based on 100+ requests) Iโ€™ve created another starter pack:

go.bsky.app/Js7ka12

(Self) nominations welcome.

22.11.2024 18:40 · 👍 79    🔁 13    💬 37    📌 1
Link preview: The Ultimate Guide to PyTorch Contributions (pytorch/pytorch: Tensors and dynamic neural networks in Python with strong GPU acceleration)

For those wondering about the best way to start contributing to PyTorch or open-source projects in general, here are the top three pointers I'd share:

1. The Ultimate Guide to PyTorch Contributions github.com/pytorch/pyto...
For PyTorch core, that should be the number-one item on your list.

23.11.2024 14:13 · 👍 18    🔁 3    💬 2    📌 0

Training variance is a thing and no one measures it because research models get trained once to beat the benchmark by 0.2 AP or whatever and then never trained again.

In prod one of the first things we do is train (the same model) a ton over different shuffled splits of the data in order to… 1/3

22.11.2024 22:00 · 👍 46    🔁 6    💬 2    📌 3
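The thread is truncated here, but the technique it describes (retraining the same model many times over re-shuffled splits and reporting the spread of the metric) is easy to sketch. A minimal illustration, assuming a scikit-learn-style setup; the dataset, model, and run count below are placeholders, not the poster's actual pipeline:

```python
# A minimal sketch of measuring training variance: retrain the same
# model over differently shuffled splits and report mean +/- std.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in dataset (the thread does not name one).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

scores = []
for seed in range(10):
    # Re-shuffle the train/test split on every run, as the thread describes.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, shuffle=True, random_state=seed
    )
    model = RandomForestClassifier(random_state=seed)
    model.fit(X_tr, y_tr)
    scores.append(accuracy_score(y_te, model.predict(X_te)))

# The spread across runs is the "training variance" the post refers to.
print(f"accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```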
Post image

You know the "🔹AI Overview" you get on Google Search?

I discovered today that it's repeating as fact something I made up 7 years ago as a joke.

"Kyloren syndrome" is a fictional disease I invented as part of a sting operation to prove that you can publish any nonsense in predatory journals...

22.11.2024 16:06 · 👍 4623    🔁 1752    💬 124    📌 109
YouTube video by Jeremy Howard: Using Excel for optimization problems

Here's a walk-through of a general-purpose approach to solving many types of optimization problems. It's often not the most efficient way, but it's usually fast enough, and it doesn't require using different methods for different problems.
youtu.be/U2b5Cacertc

19.11.2024 23:13 · 👍 128    🔁 5    💬 2    📌 1
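The video works through this in a spreadsheet; as a rough analogue of the same general-purpose idea in code, you can hand an arbitrary objective to a generic numerical optimizer and let it iterate. A minimal sketch, where the objective and data points are invented for illustration:

```python
# General-purpose optimization: define an objective, hand it to a
# generic solver. The least-squares line fit below is a made-up example.
import numpy as np
from scipy.optimize import minimize

def objective(params):
    # Fit y = a*x + b to some toy data by minimizing squared error.
    a, b = params
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 2.9, 5.2, 6.8])
    return np.sum((a * x + b - y) ** 2)

# The solver needs only the objective and a starting guess.
result = minimize(objective, x0=[0.0, 0.0])
print(result.x)  # fitted [a, b], roughly [1.94, 1.09]
```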
Book outline


Over the past decade, embeddings (numerical representations of machine learning features used as input to deep learning models) have become a foundational data structure in industrial machine learning systems. TF-IDF, PCA, and one-hot encoding have always been key tools in machine learning systems as ways to compress and make sense of large amounts of textual data. However, traditional approaches were limited in the amount of context they could reason about with increasing amounts of data. As the volume, velocity, and variety of data captured by modern applications has exploded, creating approaches specifically tailored to scale has become increasingly important. Google's Word2Vec paper made an important step in moving from simple statistical representations to semantic meaning of words. The subsequent rise of the Transformer architecture and transfer learning, as well as the latest surge in generative methods, has enabled the growth of embeddings as a foundational machine learning data structure. This survey paper aims to provide a deep dive into what embeddings are, their history, and usage patterns in industry.
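As a toy illustration of the shift the abstract describes, here is a minimal sketch contrasting a sparse one-hot representation with a dense embedding lookup. It is not from the book; the vocabulary, dimension, and random table are assumptions for demonstration only:

```python
# One-hot vectors vs. dense embeddings, in miniature. Real embeddings
# (e.g. Word2Vec) are learned so that similar words land close together;
# the random table here only shows the shape of the data structure.
import numpy as np

vocab = ["cat", "dog", "car"]
idx = {w: i for i, w in enumerate(vocab)}

# One-hot: sparse, as wide as the vocabulary, no notion of similarity.
one_hot = np.eye(len(vocab))
print(one_hot[idx["cat"]])  # [1. 0. 0.]

# Embedding: a dense (vocab_size, dim) table; lookup is just a row read.
rng = np.random.default_rng(0)
dim = 4
embedding_table = rng.normal(size=(len(vocab), dim))
print(embedding_table[idx["cat"]])  # dense 4-d vector
```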

Cover image

Just realized Bluesky allows sharing valuable stuff because it doesn't punish links. 🤩

Let's start with "What are embeddings" by @vickiboykis.com

The book is a great summary of embeddings, from history to modern approaches.

The best part: it's free.

Link: vickiboykis.com/what_are_emb...

22.11.2024 11:13 · 👍 653    🔁 101    💬 22    📌 6
