This comes at the right time, especially with vscode retiring their data viewer
22.12.2024 10:42 — 👍 2 🔁 0 💬 0 📌 0
@tahayassine.me.bsky.social
Independent researcher working on NLP/LLMs · PhD in AI & Wireless Comms tahayassine.me
[1] arxiv.org/abs/2110.03742
[2] arxiv.org/abs/2206.04674
[3] arxiv.org/abs/2205.12701
[4] arxiv.org/abs/2405.11157
[3] is perhaps the most thorough work I could find exploring this setup for learning multiple tasks. They also investigate soft-routing. [4] seems interesting too, they train LoRAs on the same base for different tasks and train the router to select the correct LoRA to use for a given input.
16.12.2024 21:33 — 👍 0 🔁 0 💬 1 📌 0
On the other hand, I think for your use case what you're looking for is training a task-level MoE rather than a token-level one. For example, both [1] and [2] find that a task-MoE is better than a token-MoE for language-related tasks.
In the case of Mixtral they don't mention any special auxiliary loss to incentivize the router to push experts to specialize. In general, an auxiliary term may be added to encourage an even assignment of tokens across experts for better load balancing.
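The load-balancing term mentioned above can be sketched in a few lines. This is a minimal pure-Python version of the common auxiliary loss (the function name and the two-expert toy input are illustrative, not from any specific codebase):

```python
# Sketch of an auxiliary load-balancing loss for a token-level MoE router.
# For each expert i: f_i = fraction of tokens hard-assigned to expert i,
#                    p_i = mean router probability given to expert i.
# The term N * sum_i f_i * p_i is minimized when routing is uniform.

def aux_load_balancing_loss(router_probs):
    """router_probs: list of per-token probability vectors over N experts."""
    n_tokens = len(router_probs)
    n_experts = len(router_probs[0])
    # f_i: fraction of tokens whose argmax expert is i (hard assignment)
    counts = [0] * n_experts
    for probs in router_probs:
        counts[probs.index(max(probs))] += 1
    f = [c / n_tokens for c in counts]
    # p_i: mean router probability mass on expert i (soft assignment)
    p = [sum(probs[i] for probs in router_probs) / n_tokens
         for i in range(n_experts)]
    return n_experts * sum(fi * pi for fi, pi in zip(f, p))

# Perfectly balanced hard + soft routing gives the minimum value of 1.0:
print(aux_load_balancing_loss([[1.0, 0.0], [0.0, 1.0]]))  # 1.0
```

Skewed routing (e.g. every token going to expert 0) pushes the value above 1, so adding this term to the training loss nudges the router toward an even split.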
16.12.2024 21:33 — 👍 0 🔁 0 💬 1 📌 0
Sorry I'm only responding now.
I'm no expert when it comes to MoEs (no pun intended), but I believe what you're referring to is the specialization of experts under no explicit domain conditioning.
Maybe you could train an MoE? Your aux model would be the router and part of the main model, and you'd train it with a corresponding loss term to route to the correct expert at training time. This obviously means you'd have as many experts as you have modes in your data distribution if you do hard routing.
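The suggestion above — hard routing with a router supervised by known task labels — could look something like this toy sketch. Everything here (linear experts, softmax-regression router, the synthetic data) is illustrative, not a real implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS = 4, 3

# Experts: one linear map each (stand-ins for per-task subnetworks / LoRAs).
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]
W_router = np.zeros((D, N_EXPERTS))  # linear router, trained below

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def forward(x, expert_idx):
    # Hard routing: exactly one expert processes the input.
    return experts[expert_idx] @ x

def train_router_step(x, task_label, lr=0.1):
    # At training time the task label is known, so the router is just a
    # classifier trained with cross-entropy (one SGD step per example).
    probs = softmax(W_router.T @ x)
    grad = np.outer(x, probs)       # d CE / d W for softmax regression
    grad[:, task_label] -= x
    W_router[...] -= lr * grad

def route(x):
    # At inference the router picks the expert on its own.
    return int(np.argmax(W_router.T @ x))

# Synthetic data: task t's inputs point mostly along axis t.
for _ in range(300):
    t = int(rng.integers(N_EXPERTS))
    x = rng.normal(scale=0.3, size=D)
    x[t] += 3.0
    train_router_step(x, t)
```

After this loop, `route(x)` recovers the task on fresh samples, and `forward(x, route(x))` dispatches each input to its specialized expert.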
14.12.2024 09:54 — 👍 0 🔁 0 💬 1 📌 0
These madlads also made a tool that allows you to create a colormap and shows you advanced metrics to help you
03.12.2024 22:06 — 👍 0 🔁 0 💬 1 📌 0
TIL an insane amount of R&D went into making Matplotlib's colormaps
www.youtube.com/watch?v=xAol...
The developer changing his mind while writing the docs and letting us know
02.12.2024 19:54 — 👍 0 🔁 0 💬 0 📌 0
"network graph" seems to work as a workaround
01.12.2024 13:28 — 👍 0 🔁 0 💬 0 📌 0
It's so annoying that "graph" is used both to refer to plots in general and to a specific type of plot. Searching "interactive graph" on Google brings up line plots when I'm really looking for graphs with nodes and edges...
01.12.2024 13:23 — 👍 0 🔁 0 💬 1 📌 0
Wow, TIL. Now it's gonna sound weird when I use it in French.
01.12.2024 13:17 — 👍 1 🔁 0 💬 1 📌 0
Lofi for reading papers and synthwave for coding
30.11.2024 04:38 — 👍 1 🔁 0 💬 0 📌 0
Nice to know, will give it a try
29.11.2024 15:36 — 👍 1 🔁 0 💬 0 📌 0
Have you considered using an eGPU?
29.11.2024 15:21 — 👍 0 🔁 0 💬 1 📌 0
Classic Streisand effect. Now someone released a dataset of 298M posts!
huggingface.co/datasets/evb...
When I run a job on the GPU sitting on my desk, I always whisper words of encouragement "it's ok, you've got this" just in case models are actually sentient. You never know.
28.11.2024 17:51 — 👍 0 🔁 0 💬 0 📌 0
Any reason you went this route rather than using something like Ansible?
27.11.2024 08:13 — 👍 0 🔁 0 💬 1 📌 0
TIL you can do print(f"{big_number:,}") to display a comma-separated number. It's so much easier to read this way.
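For the curious, the `:,` format spec also composes with other specifiers; a quick sketch:

```python
big_number = 1_234_567_890

# Thousands separator on integers:
print(f"{big_number:,}")          # 1,234,567,890

# It composes with other format specs, e.g. floats with 2 decimals:
print(f"{big_number / 3:,.2f}")   # 411,522,630.00

# Underscores work as a grouping separator too:
print(f"{big_number:_}")          # 1_234_567_890
```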
26.11.2024 00:30 — 👍 1 🔁 0 💬 0 📌 0
- vscode works really well with the remote extension, so no need to use the browser client imo
- nix shells are great if you use them outside of Python, cf my 1st point. I use them with direnv and really like the dx
This is almost my current setup but here are a few points:
- I use an eGPU but a dedicated server is cool too
- you really want to use Docker on top of NixOS, since Python there is a disaster (packages are not always available/up to date/working); in containers, use traditional pip/uv
I'm sure there was a spike of downloads at some point around January of this year, and they were all me
25.11.2024 04:08 — 👍 0 🔁 0 💬 0 📌 0
Many attempts were made in the past
github.com/pypi/support...
My curse is wanting to spend a week trying to optimize my GPU utilization when the job could finish by the morning if I let it run
24.11.2024 04:01 — 👍 0 🔁 0 💬 0 📌 0
Still waiting for that fruit fly brain they mapped to drop on @huggingface.bsky.social
23.11.2024 23:37 — 👍 0 🔁 0 💬 0 📌 0
I don't see the OG deep learning book by Goodfellow
16.11.2024 23:26 — 👍 0 🔁 0 💬 0 📌 0
Not macOS-specific, but here is a non-exhaustive list of CLI tools I really like:
- zsh: shell; maybe will move to fish at some point
- btop: (way) better htop
- atuin: better history
- yazi: tui file manager
- starship: terminal prompt
- zellij: better tmux
- kitty: terminal; until ghostty's release