Taha Yassine

@tahayassine.me.bsky.social

Independent researcher working on NLP/LLMs · PhD in AI & Wireless Comms tahayassine.me

51 Followers  |  110 Following  |  32 Posts  |  Joined: 30.12.2023

Latest posts by tahayassine.me on Bluesky

This comes at the right time, especially with vscode retiring their data viewer

22.12.2024 10:42 — 👍 2    🔁 0    💬 0    📌 0


[1] arxiv.org/abs/2110.03742
[2] arxiv.org/abs/2206.04674
[3] arxiv.org/abs/2205.12701
[4] arxiv.org/abs/2405.11157

16.12.2024 21:33 — 👍 0    🔁 0    💬 0    📌 0

[3] is perhaps the most thorough work I could find exploring this setup for learning multiple tasks. They also investigate soft routing. [4] seems interesting too: they train LoRAs on the same base for different tasks and train the router to select the correct LoRA for a given input.

16.12.2024 21:33 — 👍 0    🔁 0    💬 1    📌 0

On the other hand, I think what you're looking for in your use case is a task-level MoE rather than a token-level one. For example, both [1] and [2] find that a task-MoE is better than a token-MoE for language-related tasks.

16.12.2024 21:33 — 👍 0    🔁 0    💬 1    📌 0

In the case of Mixtral they don't mention any special auxiliary loss to incentivize the router to push experts to specialize. In general, an auxiliary term may be added to encourage an even assignment of tokens across experts for better load balancing.
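To make the load-balancing term concrete, here is a minimal sketch. The exact form is an assumption on my part: it follows the Switch Transformer style of auxiliary loss (number of experts times the sum over experts of the token fraction times the mean router probability), not anything stated for Mixtral. The function name and the toy inputs are hypothetical.

```python
def load_balancing_loss(router_probs, num_experts):
    """Assumed Switch-Transformer-style auxiliary loss: N * sum_i f_i * P_i.

    router_probs: one list of per-expert probabilities per token.
    f_i: fraction of tokens whose top-1 expert is i.
    P_i: mean router probability assigned to expert i.
    """
    num_tokens = len(router_probs)
    counts = [0] * num_experts
    prob_sums = [0.0] * num_experts
    for probs in router_probs:
        # Top-1 assignment for this token.
        top1 = max(range(num_experts), key=lambda i: probs[i])
        counts[top1] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    f = [c / num_tokens for c in counts]
    P = [s / num_tokens for s in prob_sums]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


# Toy router outputs for two tokens and two experts.
balanced = [[0.9, 0.1], [0.1, 0.9]]   # each expert gets one token
collapsed = [[0.9, 0.1], [0.9, 0.1]]  # every token goes to expert 0

loss_balanced = load_balancing_loss(balanced, 2)
loss_collapsed = load_balancing_loss(collapsed, 2)
```

In this toy case the collapsed assignment scores higher than the balanced one, so adding the term (scaled by a small coefficient) to the training loss penalizes sending every token to the same expert.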

16.12.2024 21:33 — 👍 0    🔁 0    💬 1    📌 0

Sorry I'm only responding now.
I'm no expert when it comes to MoEs (no pun intended), but I believe what you're referring to is the specialization of experts under no explicit domain conditioning.

16.12.2024 21:33 — 👍 0    🔁 0    💬 2    📌 0

Maybe you could train an MoE? Your aux model would be the router and part of the main model, and you'd train it with a corresponding loss term to route to the correct expert at training time. This obviously means you'd have as many experts as you have modes in your data dist if you do hard routing.
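A rough sketch of that idea, assuming a linear router, a cross-entropy routing loss against a known mode label, and hard (top-1) routing. All names, weights, and the two toy experts are hypothetical and only meant to illustrate the shape of the setup.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def router_logits(x, weights):
    # One weight row per expert; logits are plain dot products.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def routing_loss(x, weights, mode_label):
    # Cross-entropy between the router's distribution and the known mode.
    probs = softmax(router_logits(x, weights))
    return -math.log(probs[mode_label])


def hard_route(x, weights, experts):
    # Hard routing: only the single highest-scoring expert runs.
    logits = router_logits(x, weights)
    chosen = max(range(len(experts)), key=lambda i: logits[i])
    return chosen, experts[chosen](x)


# Hypothetical setup: two modes in the data, hence two experts.
experts = [
    lambda x: [2 * v for v in x],
    lambda x: [-v for v in x],
]
weights = [[1.0, 0.0], [0.0, 1.0]]
x = [3.0, 0.5]

chosen, output = hard_route(x, weights, experts)
loss_if_mode_0 = routing_loss(x, weights, mode_label=0)
loss_if_mode_1 = routing_loss(x, weights, mode_label=1)
```

During training, the routing loss would be added to the main task loss with some weighting, so the router learns to send each input to the expert for its mode.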

14.12.2024 09:54 — 👍 0    🔁 0    💬 1    📌 0
GitHub - matplotlib/viscm: A tool for visualizing and designing colormaps using colorspacious and matplotlib

github.com/matplotlib/v...

03.12.2024 22:06 — 👍 0    🔁 0    💬 0    📌 0

These madlads also made a tool that allows you to create a colormap and shows you advanced metrics to help you

03.12.2024 22:06 — 👍 0    🔁 0    💬 1    📌 0
A Better Default Colormap for Matplotlib | SciPy 2015 | Nathaniel Smith and Stéfan van der Walt
YouTube video by Enthought

TIL an insane amount of r&d went into making Matplotlib's colormaps
www.youtube.com/watch?v=xAol...

03.12.2024 22:06 — 👍 0    🔁 0    💬 2    📌 0

The developer changing his mind while writing the docs and letting us know

02.12.2024 19:54 — 👍 0    🔁 0    💬 0    📌 0

"network graph" seems to work as a workaround

01.12.2024 13:28 — 👍 0    🔁 0    💬 0    📌 0

It's so annoying that "graph" is used to refer both to plots in general and to a specific type of plot. Searching "interactive graph" on Google brings up line plots when I'm really looking for graphs with nodes and edges...

01.12.2024 13:23 — 👍 0    🔁 0    💬 1    📌 0

Wow, TIL. Now it's gonna sound weird when I use it in French.

01.12.2024 13:17 — 👍 1    🔁 0    💬 1    📌 0

Lofi for reading papers and synthwave for coding

30.11.2024 04:38 — 👍 1    🔁 0    💬 0    📌 0

Nice to know, will give it a try

29.11.2024 15:36 — 👍 1    🔁 0    💬 0    📌 0

Have you considered using an eGPU?

29.11.2024 15:21 — 👍 0    🔁 0    💬 1    📌 0
evborjnvioerjnvuowsetngboetgjbeigjaweuofjf/bluesky-298-million-Posts · Datasets at Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Classic Streisand effect. Now someone released a dataset of 298M posts!
huggingface.co/datasets/evb...

29.11.2024 14:39 — 👍 0    🔁 0    💬 0    📌 0

When I run a job on the GPU sitting on my desk, I always whisper words of encouragement ("it's ok, you've got this") just in case models are actually sentient. You never know.

28.11.2024 17:51 — 👍 0    🔁 0    💬 0    📌 0

Any reason you went this route rather than using something like Ansible?

27.11.2024 08:13 — 👍 0    🔁 0    💬 1    📌 0

TIL you can do print(f"{big_number:,}") to display a comma separated number. It's so much easier to read this way.
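A quick sketch of that format spec in action. `big_number` is the name from the post; the values and the other variable names are just illustrative.

```python
big_number = 1234567890

# The "," format spec inserts a comma as the thousands separator.
with_commas = f"{big_number:,}"
print(with_commas)       # 1,234,567,890

# It combines with a precision for floats.
float_commas = f"{1234567.891:,.2f}"
print(float_commas)      # 1,234,567.89

# "_" is the other grouping character the format spec accepts.
with_underscores = f"{big_number:_}"
print(with_underscores)  # 1_234_567_890
```

The same spec works with `format(big_number, ",")` and `str.format`, since f-strings use the standard format specification mini-language.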

26.11.2024 00:30 — 👍 1    🔁 0    💬 0    📌 0

- vscode works really well with the remote extension, so no need to use the browser client imo
- nix shells are great if you use them outside of Python, cf my 1st point. I use them with direnv and really like the dx

25.11.2024 20:36 — 👍 0    🔁 0    💬 0    📌 0

This is almost my current setup but here are a few points:
- I use an eGPU but a dedicated server is cool too
- you really want to use docker on top of NixOS: Nix by itself is a disaster with python because packages are not always available, up to date, or working; inside containers, use traditional pip/uv

25.11.2024 20:36 — 👍 0    🔁 0    💬 1    📌 0

I'm sure there was a spike in downloads at some point around January of this year and they were all me

25.11.2024 04:08 — 👍 0    🔁 0    💬 0    📌 0
PEP 541 Request: dotenv · Issue #2568 · pypi/support Project to be claimed dotenv: https://pypi.org/project/dotenv Your PyPI username theskumar: https://pypi.org/user/theskumar Reasons for the request dotenv package has been abandoned for many years,...

Many attempts were made in the past
github.com/pypi/support...

25.11.2024 03:14 — 👍 5    🔁 0    💬 1    📌 0

My curse is wanting to spend a week trying to optimize my GPU utilization when the job could finish by the morning if I let it run

24.11.2024 04:01 — 👍 0    🔁 0    💬 0    📌 0

Still waiting for that fruit fly brain they mapped to drop on @huggingface.bsky.social

23.11.2024 23:37 — 👍 0    🔁 0    💬 0    📌 0
Post image 16.11.2024 23:27 — 👍 0    🔁 0    💬 0    📌 0

I don't see the og deep learning goodfellow

16.11.2024 23:26 — 👍 0    🔁 0    💬 0    📌 0

Not macOS specific but here is a non-exhaustive list of CLI tools I really like:
- zsh: shell; maybe will move to fish at some point
- btop: (way) better htop
- atuin: better history
- yazi: tui file manager
- starship: terminal prompt
- zellij: better tmux
- kitty: terminal; until ghostty's release

15.11.2024 02:17 — 👍 3    🔁 0    💬 1    📌 0

@tahayassine.me is following 20 prominent accounts