Jason Weston @jasonweston - Bluesky Profile

Jason Weston

@jasonweston.bsky.social

Senior Director, Research Scientist @ Meta FAIR + Visiting Prof @ NYU. Pretrain+SFT: NLP from Scratch (2011). Multilayer attention+position encode+LLM: MemNet (2015). Recent (2024): Self-Rewarding LLMs & more!

548 Followers | 342 Following | 5 Posts | Joined: 21.11.2024

Latest posts by jasonweston.bsky.social on Bluesky

Our new work on continuous chain of thought.

10.12.2024 16:51 — 👍 4 🔁 0 💬 0 📌 0

Analysis: AD picks high temp for creative & low for fact-seeking prompts, automatically via training.

Our methods AD & Latent Pref Optimization are general & can be applied to train other hyperparams or latent features.

Excited how people could *adapt* this research!
🧵4/4

22.11.2024 13:06 — 👍 2 🔁 0 💬 0 📌 0

We train on a mix of tasks:
GSM8K - requires factuality (low temp)
Stories - requires creativity (high temp)
UltraFeedback - general instruction following, requires mix

Results: Adaptive Decoding outperforms any fixed temperature, automatically choosing via the AD layer.
🧵3/4

22.11.2024 13:06 — 👍 2 🔁 0 💬 2 📌 0

Recipe 👩‍🍳:
Adaptive Decoder (AD) Layer:
- Assigns probability to each hyperparam choice (decoding temp) given hidden state. Given temp, sample a token.

Training (Latent PO):
- Train AD by sampling params+tokens & use reward model to build chosen & rejected hyperparam preference pairs
🧵2/4

22.11.2024 13:06 — 👍 1 🔁 0 💬 1 📌 0
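
The recipe above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the paper's implementation: the temperature grid `TEMPERATURES`, the single linear head in `ad_layer`, and the helper names `sample_token`, `adaptive_decode_step`, and `build_preference_pair` are all hypothetical, and `reward_fn` stands in for whatever reward model is used.

```python
import math
import random

# Hypothetical grid of decoding temperatures the AD layer chooses from.
TEMPERATURES = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def ad_layer(hidden, weights):
    """Assumed linear head: one weight row per temperature choice.
    Returns P(temperature | hidden state)."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return softmax(scores)

def sample_token(logits, temperature, rng):
    """Sample a token index at the given temperature (0.0 means greedy)."""
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax([l / temperature for l in logits])
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def adaptive_decode_step(hidden, logits, weights, rng):
    """One decoding step: pick a temperature from the AD layer,
    then sample a token at that temperature."""
    temp_probs = ad_layer(hidden, weights)
    idx = rng.choices(range(len(TEMPERATURES)), weights=temp_probs, k=1)[0]
    temperature = TEMPERATURES[idx]
    return temperature, sample_token(logits, temperature, rng)

def build_preference_pair(candidates, reward_fn):
    """Rank sampled candidates with a reward function and return
    (chosen, rejected) as the highest- and lowest-scoring ones."""
    ranked = sorted(candidates, key=reward_fn)
    return ranked[-1], ranked[0]
```

In this sketch a candidate would be the sequence of (temperature, token) choices from repeated calls to `adaptive_decode_step`; the chosen and rejected temperature choices would then feed a preference loss on the AD layer, which is not shown here.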

🚨 Adaptive Decoding via Latent Preference Optimization 🚨
- New layer for Transformer, selects decoding params automatically *per token*
- Learnt via new method Latent Preference Optimization
- Outperforms any fixed temperature decoding, choosing creativity or factuality
arxiv.org/abs/2411.09661
🧵1/4

22.11.2024 13:06 — 👍 43 🔁 6 💬 2 📌 0

@jasonweston is following 20 prominent accounts

Tanishq Mathew Abraham
@iscienceluvr

@kyunghyuncho

a mediocre combination of a mediocre AI scientist, a mediocre physicist, a mediocre chemist, a mediocre manager and a mediocre professor. see more at https://kyunghyuncho.me/

Astrophotography
@multiverse01

It's all about astrophotography.

Ayoub Bagheri
@ayoubbagheri.nl

Associate Professor @ Utrecht University, NLP & Computational Linguistics. ELLIS Member. Utrecht Young Academy Board Member. CUCo Board Member. Natural Language Processing @ NLTP nlp.sites.uu.nl 🇱🇺

Vaidehi Patil
@vaidehipatil

Ph.D. Student at UNC NLP | Prev: Apple, Amazon, Adobe (Intern) vaidehi99.github.io | Undergrad @IITBombay

Prabin Bhandari
@iamprabin

I do research related to LLMs, their interaction with geospatial data, and leveraging them for information extraction. PhD in Computer Science at George Mason University.

Han Wang
@hwang98

PhD student @unc @unccs @uncnlp; Formerly Intern @AmazonScience @MSFTResearch @NlpWestlake. RT & like ≠ endorsements. Views are my own. He/him hannight.github.io

Nicholas Popovič
@nicpopovic.com

PhD student at KIT in Germany doing research on language models interacting with structured information.

Marek Kubis
@marekkubis

Leader of Conversational Systems Team at the Center for Artificial Intelligence at Adam Mickiewicz University, Poznań. Assistant Professor in the Department of Artificial Intelligence. https://marekkubis.com #AI #NLProc

Maria Antoniak
@mariaa

☀️ Assistant Professor of Computer Science at CU Boulder 👩‍💻 NLP, cultural analytics 🌐 https://maria-antoniak.github.io Previously: Pioneer Centre for AI in Copenhagen, Ai2, Microsoft Research, Twitter, Facebook, Cornell, UW

Emily M. Bender
@emilymbender

Book: https://thecon.ai Web: https://faculty.washington.edu/ebender

Chris Brockett
@chrisbrockett

Data janitor and leftover linguist (retired). Tsundoku expert. Language & Cognition. NLP. Japanese literature. Anti-authoritarian. Pro-science.

Christopher Manning
@chrmanning

Stanford Linguistics and Computer Science. Director, Stanford AI Lab. Founder of @stanfordnlp.bsky.social. #NLP https://nlp.stanford.edu/~manning/

Margaret Mitchell
@mmitchell

Researcher trying to shape AI towards positive outcomes. ML & Ethics +birds. Generally trying to do the right thing. TIME 100 | TED speaker | Senate testimony provider | Navigating public life as a recluse. Former: Google, Microsoft; Current: Hugging Face

David Bamman
@dbamman

Associate Professor, School of Information, UC Berkeley. NLP, computational social science, digital humanities.

David Smith
@dasmiq

Associate professor of computer science at Northeastern University. Natural language processing, digital humanities, OCR, computational bibliography, and computational social sciences. Artificial intelligence is an archival science.

David Jurgens
@davidjurgens

Associate prof at @UMich in SI and CSE working in computational social science and natural language processing. PI of the Blablablab blablablab.si.umich.edu

David Mimno
@dmimno

He teaches information science at Cornell. http://mimno.infosci.cornell.edu

Luca Soldaini 🎀
@soldaini.net

I like tokens! Lead for OLMo data at @ai2.bsky.social (Dolma 🍇) w @kylelo.bsky.social. Open source is fun 🤖☕️🍕🏳️‍🌈 Opinions are sampled from my own stochastic parrot more at https://soldaini.net

Kyle Lo
@kylelo

#nlp #ml #hci research scientist @ai2.bsky.social, Co-lead of Data for OLMo w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him,🧋 kyleclo.com