ICLR Poster: No Need to Talk: Asynchronous Mixture of Language Models (ICLR 2025)
3/3
Mixture of experts on high-latency networks with No Need to Talk: iclr.cc/virtual/2025... (Thu Apr 24, 3 pm).
Joint work with @matpagliardini.bsky.social, Anastasiia Filippova, @pierreablin.bsky.social, Simin Fan, Skyler Seto, Angelos Katharopoulos, and Ronan Collobert.
21.04.2025 23:54
ICLR Poster: Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling (ICLR 2025)
2/3
Importance sampling for a better pretraining distribution with CRISP: iclr.cc/virtual/2025... (Sat Apr 26, 10 am).
21.04.2025 23:54
#ICLR #TrainBetterLM I am at ICLR. Come to our posters for improved language model training!
Recycle gradients for faster neural net training with AdEMAMix: iclr.cc/virtual/2025... (Fri Apr 25, 10 am).
1/3
21.04.2025 23:54