I am attending @naaclmeeting.bsky.social this week to present our paper. Come check out our poster at 14:00, Apr 30 in Hall 3. @defnecirci.bsky.social and Hale Sirin will also be there to answer your questions!
30.04.2025 00:34 — 👍 3 🔁 2 💬 0 📌 0
@samblouir.bsky.social
Thanks for coming to “Foundation Models for Biological Discoveries” (FMs4Bio) @ AAAI 2025!
#ICLR2025 & #ACMHCI are a wrap! 🎉 Couldn't ask for better vibes, conversations, and food.
🧬 BirdieDNA
🗣️ SLP Sidekick
🤖 Agents for Genomics
Big thanks to my co-authors @defnecirci.bsky.social, Flavia Negrete, Celeste Watkins, Asher Moldwin & Amarda Shehu, absolute rockstars 🤝🙏
Hey Marc! Could you add me to this student list? I can’t seem to DM you.
29.11.2024 18:51 — 👍 2 🔁 0 💬 0 📌 0
Haha, cool list! Can you add me?
25.11.2024 06:11 — 👍 1 🔁 0 💬 0 📌 0
Hi! Thanks for making this. Could you add me, please? :)
23.11.2024 16:29 — 👍 0 🔁 0 💬 0 📌 0
Hah! Can you add me please?
23.11.2024 06:17 — 👍 0 🔁 0 💬 0 📌 0
Sent a DM!
21.11.2024 22:59 — 👍 0 🔁 0 💬 0 📌 0
Please add me :) PhD student here in a multilingual-focused NLP lab.
20.11.2024 20:18 — 👍 0 🔁 0 💬 0 📌 0
Hi, can you add me? Thank you.
20.11.2024 14:36 — 👍 0 🔁 0 💬 1 📌 0
Hi! Could I please join this group? Thank you.
20.11.2024 10:57 — 👍 0 🔁 0 💬 0 📌 0
Hi, could you please add me? Thank you.
19.11.2024 17:51 — 👍 1 🔁 0 💬 1 📌 0
Hi! Can I join this group? Working on several AI for Science research projects :)
19.11.2024 17:49 — 👍 1 🔁 0 💬 1 📌 0
Definitely D
19.11.2024 05:22 — 👍 0 🔁 0 💬 0 📌 0
Huge thanks to the George Mason University NLP Lab (@GMNLP), Stanford AI Lab (@StanfordAILab), and all of our collaborators! 🙏
18.11.2024 17:28 — 👍 3 🔁 0 💬 0 📌 0
Table of the Story Infilling Task, where the model is given a causal story with 3-7 entries. One entry is masked out, and the model is asked to choose the most likely option. Hawk with Birdie gets 42.5% accuracy; Hawk with a causal version of Birdie gets 41.5%; Hawk with Next Token Prediction gets only 33.1%. That is an enormous performance boost for Hawk trained with Birdie: 42.5% vs. 33.1% accuracy. A Transformer trained with Birdie gets 42.2% accuracy, and with Next Token Prediction gets 41.9%. The performance difference is more muted for the Transformer on this task, in contrast to the generative SQuAD V2 results, where the Transformer with Birdie pulled ahead strongly.
General benchmark scores remain intact across 21 tasks on the EleutherAI LM Eval harness, while scores on our new infilling task improve dramatically.
💡 With smarter training, we maintain SSMs’ efficiencies while dramatically enhancing their capabilities.
🔑 What's new?
• Dynamic Pre-training Curriculum: Optimized via Reinforcement Learning.
• Specialized Training Objectives: Tailored to SSMs' unique strengths.
• Bidirectional Processing: Maximizes fixed state capacity for extra performance.
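The "Dynamic Pre-training Curriculum" bullet above can be sketched as a reward-driven bandit over training objectives. This is a hypothetical illustration, not the paper's actual algorithm: the objective names, the EXP3-style update, and the reward signal (e.g., recent loss improvement) are all assumptions here.

```python
import math
import random

# Illustrative objective names; the paper's actual objective set may differ.
OBJECTIVES = ["next_token", "prefix_lm", "span_infilling", "copying"]

class ObjectiveSampler:
    """Bandit-style sampler that shifts probability mass toward
    training objectives that currently yield higher reward."""

    def __init__(self, objectives, lr=0.1):
        self.objectives = objectives
        self.weights = {o: 1.0 for o in objectives}
        self.lr = lr

    def probs(self):
        # Normalize weights into a sampling distribution.
        total = sum(self.weights.values())
        return {o: w / total for o, w in self.weights.items()}

    def sample(self):
        p = self.probs()
        return random.choices(
            self.objectives, weights=[p[o] for o in self.objectives]
        )[0]

    def update(self, objective, reward):
        # EXP3-style importance-weighted exponential update:
        # objectives that pay off get sampled more often.
        p = self.probs()[objective]
        self.weights[objective] *= math.exp(self.lr * reward / p)
```

In practice the reward would come from the training loop (e.g., how much each objective reduced validation loss), so the mixture adapts over the course of pre-training rather than being fixed up front.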
Graph of the SQuAD V2 question-answering task. The X-axis shows the context length (the length of the tokenized Wikipedia articles used as context), and the Y-axis shows "Response Contains Labels", the percentage of generated model responses that contain an acceptable answer to the question. The SQuAD V2 task entails the model reading a Wikipedia article, then being immediately asked a question about what it just read. The information is always found in the article. Training Hawk using Birdie strongly outperforms using Next Token Prediction. With Next Token Prediction, performance declines sharply as the Wikipedia article length increases to about 500 tokens: in this scenario, Hawk retrieves the exact label less than 10% of the time, while the Birdie procedure yields over 55% accuracy. When the article is only 100 tokens long, Birdie retrieves the correct answer more than 40% of the time, versus less than 30% for the Next Token Prediction model. With Birdie, Hawk matches the "context length vs. performance" curve of the Transformer trained with Next Token Prediction, with slightly worse performance. The Transformer trained with Birdie outperforms all models, averaging about 75% accuracy, compared to 60% for the Next Token Prediction Transformer. Hawk trained with Birdie gets around 50%; Hawk trained with Next Token Prediction gets around 15%.
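A minimal sketch of the "Response Contains Labels" metric described above, assuming simple case-insensitive substring matching; the exact normalization used in the paper is not specified here.

```python
def response_contains_labels(response, acceptable_answers):
    """Return True if any acceptable answer string appears in the
    generated response (case-insensitive substring check).

    This is an illustrative assumption about the metric; the actual
    implementation may normalize whitespace or punctuation differently.
    """
    response = response.lower()
    return any(ans.lower() in response for ans in acceptable_answers)
```

The reported percentage is then just the fraction of questions for which this check returns True.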
🌟 Stellar Results:
• Multi-Phone Number Retrieval: Birdie SSMs achieve 100% accuracy on single lookups; outperform standard SSMs even more as tasks become more complex.
• SQuAD V2: We match a Transformer's performance curve across sequence lengths, while standard SSMs fall behind.
The multi-number phonebook retrieval task entails retrieving several phone numbers at once from a phonebook, given a list of names. Hawk trained using Birdie strongly outperforms Hawk trained using Next Token Prediction on this task. Hawk trained using Next Token Prediction performs just above random guessing when retrieving 1 to 4 phone numbers, and falls to random performance when retrieving more than 4. In contrast, Hawk trained using Birdie gets 100% accuracy when retrieving 1 phone number; that score slowly decays to about 80% accuracy when retrieving up to 32 phone numbers simultaneously. Two Transformers are included, one trained using Birdie and the other using Next Token Prediction. Both always achieve about 100% accuracy, even when retrieving 32 phone numbers.
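A hypothetical generator for the task described above. The `Name: Number` phonebook format and the question template are assumptions for illustration, not the paper's actual prompt.

```python
import random

def make_phonebook_example(num_entries=64, num_queries=4, seed=0):
    """Build one synthetic multi-number phonebook retrieval example:
    a phonebook of `num_entries` name/number pairs, a question asking
    for `num_queries` of them at once, and the expected answers."""
    rng = random.Random(seed)
    names = [f"Person_{i}" for i in range(num_entries)]
    book = {name: f"555-{rng.randint(1000, 9999)}" for name in names}
    queries = rng.sample(names, num_queries)

    # Phonebook listing followed by a multi-lookup question.
    prompt = "\n".join(f"{name}: {book[name]}" for name in names)
    prompt += "\n\nWhat are the phone numbers of " + ", ".join(queries) + "?"
    answers = [book[q] for q in queries]
    return prompt, answers
```

Scaling `num_queries` from 1 up to 32 reproduces the difficulty axis of the plot: each added lookup is one more item the model's fixed-size state has to carry.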
Meet Birdie 🐤!
Our EMNLP 2024 paper boosts SSMs like Mamba and Hawk on long-range, context-heavy tasks, closing the gap with Transformers.
Proud to work with @jimmysmith1919.bsky.social, @antonisa.bsky.social, & Amarda Shehu.
📄 Paper: arxiv.org/abs/2411.01030
💻 Code: github.com/samblouir/bi...
Hawk (SSM) trained using Birdie strongly outperforms Hawk trained using Next Token Prediction on the SQuAD V2 question-answering task, which entails the model reading a Wikipedia article, then being immediately asked a question about what it just read. Hawk trained using Next Token Prediction declines strongly in performance when the Wikipedia article length increases to about 500 tokens; in this scenario, Hawk retrieves the exact label less than 10% of the time. When the article was only 96 tokens long, it was correct about 25% of the time. Hawk trained using Birdie matches the performance curves of the Transformer trained with Next Token Prediction, with slightly worse performance. The Transformer trained with Birdie outperforms all models, averaging about 75% accuracy, compared to 60% for the Next Token Prediction Transformer. Hawk trained with Birdie gets around 50%; Hawk trained with Next Token Prediction gets around 15%.
Would like to be added to this :)
18.11.2024 16:27 — 👍 1 🔁 0 💬 0 📌 0