NEW blog post: Do modern #LLMs capture the conceptual diversity of human populations? #KempnerInstitute researchers find #alignment reduces conceptual diversity of language models. bit.ly/4hNjtiI
10.02.2025 15:19
NEW in the #KempnerInstitute blog: learn about ProCyon, a multimodal foundation model to model, generate & predict protein phenotypes. Read it here: bit.ly/4fA8xUk
19.12.2024 19:22
Calling college grads interested in intelligence research: the application for the #KempnerInstitute's post-bac program w/ the Harvard Kenneth C. Griffin Graduate School of Arts and Sciences Office for Equity, Diversity, Inclusion & Belonging is now open! Apply by Feb. 1, 2025.
t.co/jdJrzRegL0
NEW in the #KempnerInstitute blog: A method to predict how #LLMs scale w/ compute across different datasets. Read it here:
09.12.2024 20:44
LLM self-improvement has critical implications for synthetic data, post-training, and test-time inference. To understand LLMs' true capability for self-improvement, we perform large-scale experiments with multiple families of LLMs, tasks, and mechanisms. Here is what we found: (1/9)
06.12.2024 18:02
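For readers who want a concrete picture of what a "self-improvement mechanism" can look like, here is a minimal, generic sketch: sample candidate generations, keep the ones a verifier accepts, and fine-tune on what survives. It is purely illustrative, with placeholder function names, and is not the experimental setup used in the thread above.

```python
# Hypothetical sketch of one common LLM self-improvement loop.
# generate, verify, and finetune are placeholder callables, not a real API.

def self_improvement_round(model, prompts, generate, verify, finetune, k=8):
    """Run one round: sample k candidates per prompt, filter, retrain."""
    kept = []
    for prompt in prompts:
        candidates = [generate(model, prompt) for _ in range(k)]
        # Keep only generations the verifier (e.g. unit tests, a reward
        # model, or majority vote) judges acceptable.
        kept.extend((prompt, c) for c in candidates if verify(prompt, c))
    if kept:
        # Post-train on the model's own filtered generations.
        model = finetune(model, kept)
    return model, len(kept)
```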
NEW: we have an exciting opportunity for a tenure-track professor at the #KempnerInstitute and the John A. Paulson School of Engineering and Applied Sciences (SEAS). Read the full description & apply today: academicpositions.harvard.edu/postings/14362
#ML #AI
(5/n) Shoutout to some great collaborators:
@hanlin_zhang, @depen_morwani, @vyasnikhil96, @uuujingfeng, @difanzou, @udayaghai
#AI #ML #ScalingLaws
(4/n) Want theory? We provide rigorous justifications, identify the critical hyperparameters, and characterize learning-rate decay in the overtraining regime.
Check out the details here:
Paper: arxiv.org/abs/2410.21676
Blog: tinyurl.com/ysufbwsr
(3/n) From our controlled experiments on language models:
- CBS increases as dataset size grows
- CBS remains weakly dependent on model size
Data size, not model size, drives parallel efficiency for large-scale pre-training.
(2/n) How does CBS scale with model size and data size in pre-training? We find that CBS scales with data size and is largely invariant to model size. Prior beliefs that CBS scales with model size may have stemmed from Chinchilla's coupled N-D scaling.
22.11.2024 20:19
(1/n) How can we speed up the serial runtime of long pre-training runs? Enter Critical Batch Size (CBS): the tipping point where the gains of data parallelism balance with diminishing efficiency. Doubling the batch size halves the number of optimization steps, until we hit CBS; beyond it, returns diminish.
22.11.2024 20:19
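For intuition, here is a tiny sketch of the standard steps-vs.-batch-size trade-off from McCandlish et al. (2018), which motivates the notion of a critical batch size. It is illustrative only, assuming that particular functional form, and is not the definition or fitting procedure used in the paper linked above.

```python
# Illustrative only: classic "steps to reach a target loss" curve as a
# function of batch size; below b_crit doubling the batch ~halves steps,
# above it steps flatten near s_min (diminishing returns).
import numpy as np

def steps_to_target(batch_size, s_min, b_crit):
    """Optimization steps needed to hit a fixed loss target."""
    return s_min * (1.0 + b_crit / batch_size)

batch_sizes = np.array([32, 64, 128, 256, 512, 1024, 2048, 4096])
for b, s in zip(batch_sizes, steps_to_target(batch_sizes, s_min=1e4, b_crit=256)):
    print(f"batch {b:5d} -> ~{s:,.0f} steps")
```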
How does test loss change as we change the training data? And how does this interact with scaling laws?
We propose a methodology to approach these questions by showing that we can predict performance across datasets and losses with simple shifted power-law fits.
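As a rough illustration of what a shifted power-law fit can look like in practice, here is a minimal sketch using scipy's curve_fit on toy (compute, loss) pairs; the exact parameterization and fitting procedure in the paper may differ.

```python
# Minimal sketch: fit loss(C) = E + A * C**(-alpha) to toy (compute, loss) data.
import numpy as np
from scipy.optimize import curve_fit

def shifted_power_law(compute, A, alpha, E):
    # E is the irreducible ("shift") term; A and alpha set the decay.
    return E + A * compute ** (-alpha)

# Toy data standing in for measured points (compute in units of 1e18 FLOPs).
compute = np.array([1.0, 3.0, 10.0, 30.0, 100.0])
loss = np.array([3.10, 2.85, 2.62, 2.45, 2.31])

params, _ = curve_fit(shifted_power_law, compute, loss,
                      p0=[1.0, 0.3, 2.0], maxfev=10000)
A, alpha, E = params
print(f"A={A:.3g}, alpha={alpha:.3g}, irreducible loss E={E:.3g}")
```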