Jason Weston

@jasonweston.bsky.social

Senior Director, Research Scientist @ Meta FAIR + Visiting Prof @ NYU. Pretrain+SFT: NLP from Scratch (2011). Multilayer attention+position encode+LLM: MemNet (2015). Recent (2024): Self-Rewarding LLMs & more!

548 Followers  |  342 Following  |  5 Posts  |  Joined: 21.11.2024

Latest posts by jasonweston.bsky.social on Bluesky

Our new work on continuous chain of thought.

10.12.2024 16:51 — 👍 4    🔁 0    💬 0    📌 0
Post image

Analysis: AD picks high temp for creative & low for fact-seeking prompts, automatically via training.

Our methods AD & Latent Pref Optimization are general & can be applied to train other hyperparams or latent features.

Excited to see how people *adapt* this research!
🧵4/4

22.11.2024 13:06 — 👍 2    🔁 0    💬 0    📌 0
Post image

We train on a mix of tasks:
GSM8K - requires factuality (low temp)
Stories - requires creativity (high temp)
UltraFeedback - general instruction following, requires mix

Results: Adaptive Decoding outperforms any fixed temperature, automatically choosing the temperature via the AD layer.
🧵3/4

22.11.2024 13:06 — 👍 2    🔁 0    💬 2    📌 0
Post image
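
For concreteness, the task mix described in the post above could be assembled along these lines. This is a minimal sketch: the Hugging Face dataset IDs, column names, equal mixing weights, and placeholder story prompts are assumptions, not the paper's actual data pipeline.

```python
# Sketch of a three-way prompt mix: factual (GSM8K), creative (stories), general (UltraFeedback).
# Dataset IDs, column names, and weights are assumptions, not the paper's exact pipeline.
from datasets import Dataset, load_dataset, interleave_datasets

gsm8k = load_dataset("gsm8k", "main", split="train")   # fact-seeking math word problems
ultra = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# Placeholder creative-writing prompts standing in for the "Stories" task.
stories = Dataset.from_dict({"prompt": [
    "Write a short story about a lighthouse keeper who hears a knock at midnight.",
    "Write a story told from the point of view of a forgotten umbrella.",
]})

# Normalise every source to a single "prompt" column so the datasets can be interleaved.
gsm8k = gsm8k.map(lambda ex: {"prompt": ex["question"]}, remove_columns=gsm8k.column_names)
ultra = ultra.remove_columns([c for c in ultra.column_names if c != "prompt"])

# Equal-weight interleave; by default it stops once the smallest source is exhausted.
mix = interleave_datasets([gsm8k, stories, ultra], probabilities=[1/3, 1/3, 1/3], seed=0)
```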

Recipe 👩‍🍳:
Adaptive Decoder (AD) Layer:
- Assigns probability to each hyperparam choice (decoding temp) given hidden state. Given temp, sample a token.

Training (Latent PO):
- Train AD by sampling hyperparams+tokens & using a reward model to form chosen/rejected hyperparam preference pairs
🧵2/4

22.11.2024 13:06 — 👍 1    🔁 0    💬 1    📌 0
Post image
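
A minimal PyTorch sketch of the AD step described in the recipe above: a small head maps the current hidden state to a distribution over candidate temperatures, one temperature is sampled, and the next token is sampled under it. The class name, the candidate temperature grid, and the single-linear-head design are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdaptiveDecoderLayer(nn.Module):
    """Hypothetical AD layer: picks a decoding temperature per token from the hidden state."""

    def __init__(self, hidden_size: int, temperatures=(0.0, 0.4, 0.8, 1.2)):
        super().__init__()
        self.register_buffer("temps", torch.tensor(temperatures))
        # Head mapping the last hidden state to logits over the temperature choices.
        self.head = nn.Linear(hidden_size, len(temperatures))

    def forward(self, hidden, lm_logits):
        # hidden: (batch, hidden_size) last-layer state at the current position.
        # lm_logits: (batch, vocab_size) next-token logits from the LM head.
        temp_probs = torch.softmax(self.head(hidden), dim=-1)    # (batch, n_temps)
        temp_idx = torch.multinomial(temp_probs, num_samples=1)  # sample the latent choice
        temp = self.temps[temp_idx]                               # (batch, 1)

        # Temperature sampling, falling back to greedy when the sampled temp is ~0.
        token_probs = torch.softmax(lm_logits / temp.clamp(min=1e-4), dim=-1)
        sampled = torch.multinomial(token_probs, num_samples=1)
        greedy = lm_logits.argmax(dim=-1, keepdim=True)
        tokens = torch.where(temp <= 1e-4, greedy, sampled)
        return tokens.squeeze(-1), temp_idx.squeeze(-1), temp_probs
```

In this reading, the layer runs once per decoding step, so the temperature can differ token by token, which is what would let low temperatures dominate on fact-seeking spans and high temperatures on creative ones.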

🚨 Adaptive Decoding via Latent Preference Optimization 🚨
- New layer for Transformer, selects decoding params automatically *per token*
- Learnt via new method Latent Preference Optimization
- Outperforms any fixed temperature decoding, choosing creativity or factuality
arxiv.org/abs/2411.09661
🧵1/4

22.11.2024 13:06 — 👍 43    🔁 6    💬 2    📌 0
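
And a correspondingly hedged sketch of one Latent Preference Optimization update for the AD head sketched above: for a given context, two latent temperature choices are sampled, the resulting generations are scored by a reward model, the higher-scoring choice is treated as chosen, and a DPO-style logistic loss shifts the head's probability mass toward it. The reference-free loss form, the beta scale, and all names are assumptions rather than the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def latent_po_step(ad_head: nn.Module, hidden: torch.Tensor,
                   chosen_idx: torch.Tensor, rejected_idx: torch.Tensor,
                   beta: float = 1.0) -> torch.Tensor:
    """One preference update on the latent temperature choice.

    hidden:       (batch, hidden_size) states at the positions where temps were sampled.
    chosen_idx:   (batch,) temperature indices whose generations the reward model scored higher.
    rejected_idx: (batch,) temperature indices whose generations scored lower.
    """
    logp = F.log_softmax(ad_head(hidden), dim=-1)                    # (batch, n_temps)
    logp_chosen = logp.gather(1, chosen_idx.unsqueeze(1)).squeeze(1)
    logp_rejected = logp.gather(1, rejected_idx.unsqueeze(1)).squeeze(1)
    # Reference-free, DPO-style margin loss over the latent hyperparameter choice.
    return -F.logsigmoid(beta * (logp_chosen - logp_rejected)).mean()
```

In a full loop this step would be preceded by generating responses under the sampled temperatures and ranking them with the reward model, as described in 🧵2/4 above.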
