Transcript of Hard Fork ep 111: Yeah. And I could talk for an hour about transformers and why they are so important.
But I think it's important to say that they were inspired by the alien language in the film Arrival, which had just recently come out.
And a group of researchers at Google, one researcher in particular, who was part of that original team, was inspired by watching Arrival and seeing that the aliens in the movie had this language which represented entire sentences with a single symbol. And they thought, hey, what if we did that inside of a neural network? So rather than processing all of the inputs that you would give to one of these systems one word at a time, you could have this thing called an attention mechanism, which paid attention to all of it simultaneously.
That would allow you to process much more information much faster. And that insight sparked the creation of the transformer, which led to all the stuff we see in AI today.
Did you know that attention across the whole input span was inspired by the time-negating alien language in Arrival? Crazy anecdote from the latest Hard Fork podcast (by @kevinroose.com and @caseynewton.bsky.social). HT nwbrownboi on Threads for the lead.
01.12.2024 14:50 · 247 likes · 53 reposts · 19 replies · 17 quotes
https://huggingface.co/openlm-research
All model checkpoints we used for this research are also available here: t.co/IlSmJ8Na1i
26.11.2024 22:37 · 2 likes · 0 reposts · 0 replies · 0 quotes
Finally, we present a case study of two real-world uses for emergence prediction:
1) cheaply assessing pretraining data quality (left).
2) predicting more complex capabilities, closer to those of future frontier models, using the difficult APPS coding benchmark (right).
26.11.2024 22:37 · 5 likes · 0 reposts · 1 reply · 0 quotes
We validate our emergence law using four standard NLP benchmarks where large-scale open-source LLMs already demonstrate emergence, so we can easily check our predictions.
We find that our emergence law can accurately predict the point of emergence up to 4x the FLOPs in advance.
26.11.2024 22:37 · 3 likes · 0 reposts · 1 reply · 0 quotes
To operationalize this insight, we finetune LLMs on varying amounts of data and fit a parametric function (i.e., "emergence law") which models how the point of emergence shifts with the amount of data. We can then extrapolate a prediction for emergence in the few-shot setting.
26.11.2024 22:37 · 3 likes · 0 reposts · 1 reply · 0 quotes
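Not from the paper: a minimal sketch of what fitting and extrapolating such an emergence law could look like. The logarithmic functional form, the names (finetune_examples, emergence_loss, emergence_law), and all numbers below are assumptions for illustration, not the authors' code or results.

```python
# Illustrative sketch only -- the functional form and data are assumptions,
# not the emergence law or measurements from the paper.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical observations: for each amount of finetuning data, the pretraining
# loss at which finetuned checkpoints first beat random chance on the task.
finetune_examples = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])
emergence_loss = np.array([2.95, 3.02, 3.10, 3.18, 3.25])  # illustrative numbers

def emergence_law(n, loss_fewshot, scale, n0):
    # Assumed form: more finetuning data shifts emergence toward less capable
    # models (higher pretraining loss), saturating logarithmically in n.
    return loss_fewshot + scale * np.log1p(n / n0)

params, _ = curve_fit(emergence_law, finetune_examples, emergence_loss,
                      p0=[2.8, 0.1, 1000.0])
# Extrapolating to n -> 0 recovers a prediction for the few-shot emergence point.
print(f"Predicted few-shot emergence at pretraining loss ~{params[0]:.2f}")
```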
We then discover a simple insight for this problem:
finetuning LLMs on a given task can shift the point in scaling at which emergence occurs towards less capable LLMs, and the magnitude of this shift is modulated by the amount of finetuning data.
26.11.2024 22:37 · 3 likes · 0 reposts · 1 reply · 0 quotes
We first pose the task of emergence prediction:
given access to LLMs that have random few-shot accuracy on a task, can we predict the point in scaling (e.g., pretraining loss) at which performance will jump above random chance?
26.11.2024 22:37 · 5 likes · 0 reposts · 1 reply · 0 quotes
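As a companion illustration, also not the authors' code: one crude way to read off the "point in scaling at which performance jumps above chance" from an existing model family, assuming evaluations keyed by pretraining loss and a chance-plus-margin rule. All names and numbers are hypothetical.

```python
# Illustrative sketch: locating an emergence point from checkpoint evaluations.
# The detection rule (chance + margin) and all numbers are assumptions.
import numpy as np

# Hypothetical evals for a model family; lower pretraining loss = more capable.
pretraining_loss = np.array([3.4, 3.2, 3.0, 2.8, 2.6, 2.4])
accuracy = np.array([0.25, 0.26, 0.25, 0.31, 0.45, 0.62])
chance, margin = 0.25, 0.03  # e.g., 4-way multiple choice

above_chance = accuracy > chance + margin
# Emergence point: the least capable checkpoint (highest loss) already above chance.
emergence_point = pretraining_loss[above_chance].max() if above_chance.any() else None
print(f"Emergence at pretraining loss ~{emergence_point}")
```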
Can we predict emergent capabilities in GPT-N+1 using only GPT-N model checkpoints, which have random performance on the task?
We propose a method for doing exactly this in our paper "Predicting Emergent Capabilities by Finetuning" 🧵
26.11.2024 22:37 · 45 likes · 6 reposts · 3 replies · 1 quote
PhD student at MIT. Trying to make deep neural networks among the best understood objects in the universe.
ericjmichaud.com
From SLAM to Spatial AI; Professor of Robot Vision, Imperial College London; Director of the Dyson Robotics Lab; Co-Founder of Slamcore. FREng, FRS.
San Diego Dec 2-7, 25 and Mexico City Nov 30-Dec 5, 25. Comments to this account are not monitored. Please send feedback to townhall@neurips.cc.
Bringing the sergey posts until he does it himself.
Robotics. Reinforcement learning. AI.
stealth // Gemini RL+inference @ Google DeepMind // Conversational AI @ Meta // RL Agents @ EA // ML+Information Theory @ MIT+Harvard+Duke // Georgia Tech PhD // Woman, Life, Freedom
{NYC, SFO, YYZ}
https://beirami.github.io/
computer vision PhD student at UC Berkeley
Visiting Scientist at Schmidt Sciences. Visiting Researcher at Stanford NLP Group
Interested in AI safety and interpretability
Previously: Anthropic, AI2, Google, Meta, UNC Chapel Hill
NYT tech columnist, Hard Fork co-host, best at 0.8x speed
Searching for the numinous
Australian Canadian, currently living in the US
https://michaelnotebook.com
AI Safety @ xAI | AI robustness, PhD @ UC Berkeley | normanmu.com
We are a research institute investigating the trajectory of AI for the benefit of society.
epoch.ai
PhD student @stanfordnlp.bsky.social. Robotics Intern at the Toyota Research Institute. I like language, robots, and people.
On the academic job market!
NLP / CSS PhD at Berkeley I School. I develop computational methods to study culture as a social language.
The FT's team of reporters, statisticians, illustrators, cartographers, designers, and developers work with colleagues across our newsrooms, using graphics and data to find, investigate and explain stories.
https://www.ft.com/visual-and-data-journalism