Graham Neubig

@gneubig.bsky.social

Associate professor at CMU, studying natural language processing and machine learning. Co-founder of All Hands AI.

839 Followers  |  30 Following  |  2 Posts  |  Joined: 20.11.2024

Posts by Graham Neubig (@gneubig.bsky.social)

Where does one language model outperform the other?

We examine this from first principles, performing unsupervised discovery of "abilities" that one model has and the other does not.

Results show interesting differences between model classes, sizes and pre-/post-training.

09.06.2025 18:33 | 👍 5 | 🔁 2 | 💬 0 | 📌 0
Demystifying Long Chain-of-Thought Reasoning in LLMs Scaling inference compute enhances reasoning in large language models (LLMs), with long chains-of-thought (CoTs) enabling strategies like backtracking and error correction. Reinforcement learning (RL)...

Nice contribution to the understanding of Long CoT induction (arxiv.org/abs/2502.03373) by Edward Yeo and colleagues (advised by @gneubig.bsky.social and @xiangyue96.bsky.social). It's hard not to see this as mostly a negative result on induction at the 8B scale. 👇

08.02.2025 19:29 | 👍 7 | 🔁 2 | 💬 1 | 📌 2

LLM agents can code, but can they ask clarifying questions? 🤖💬
Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀

(New work led by Sanidhya Vijay: www.linkedin.com/in/sanidhya-...)

19.02.2025 19:46 | 👍 7 | 🔁 3 | 💬 1 | 📌 1
Schedule The weekly event schedule.

We are now done with all classes for CMU CS11-711 Advanced NLP!

Slides: phontron.com/class/anlp-f...
Videos: youtube.com/playlist?lis...

Hope this is useful to people 😀

27.11.2024 22:26 | 👍 51 | 🔁 6 | 💬 0 | 📌 0

1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
@uwnlp.bsky.social & Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai

19.11.2024 16:30 | 👍 161 | 🔁 39 | 💬 6 | 📌 8
Screenshot of the paper title "What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length"

💬 Have you or a loved one compared LM probabilities to human linguistic acceptability judgments? You may be overcompensating for the effect of frequency and length!
🌟 In our new paper, we rethink how we should be controlling for these factors 🧵:

20.11.2024 18:07 | 👍 84 | 🔁 19 | 💬 1 | 📌 4