Hanlin Zhang's Avatar

Hanlin Zhang

@hlzhang109.bsky.social

CS PhD student @Harvard https://hanlin-zhang.com

21 Followers  |  43 Following  |  11 Posts  |  Joined: 11.12.2024  |  1.4583

Latest posts by hlzhang109.bsky.social on Bluesky

Preview
EvoLM: In Search of Lost Language Model Training Dynamics Modern language model (LM) training has been divided into multiple stages, making it difficult for downstream developers to evaluate the impact of design choices made at each stage. We present EvoLM, ...

Dive in πŸ“‘: arxiv.org/abs/2506.16029

Blog Post πŸ“: zhentingqi.github.io/internal/pro...

Thread 🧡: x.com/_hanlin_zhan...

Work by Zhenting Qi, and the team Fan Nie, Alexandre Alahi, @jameszou.bsky.social, Himabindu Lakkaraju, Yilun Du, Eric Xing, @shamkakade.bsky.social

02.07.2025 20:05 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Preview
EvoLM: In Search of Lost Language Model Training Dynamics Modern language model (LM) training has been divided into multiple stages, making it difficult for downstream developers to evaluate the impact of design choices made at each stage. We present EvoLM, ...

βœ… Open-source everything β€” models, data, training, and evaluation pipeline

βœ… Maintain the EvoLM model family with clear data provenance

βœ… Support the community in extending this foundation for future LLM research

02.07.2025 20:05 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
EvoLM: In Search of Lost Language Model Training Dynamics Modern language model (LM) training has been divided into multiple stages, making it difficult for downstream developers to evaluate the impact of design choices made at each stage. We present EvoLM, ...

We seek to:

βœ… Build a fully transparent and reproducible model suite for studying LM training

βœ… Quantify how each training phase contributes to upstream cloze task performance and downstream generative task performance, considering both in-domain and out-of-domain settings

02.07.2025 20:05 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
EvoLM: In Search of Lost Language Model Training Dynamics Modern language model (LM) training has been divided into multiple stages, making it difficult for downstream developers to evaluate the impact of design choices made at each stage. We present EvoLM, ...

Introducing EvoLM, a model suite with 100+ decoder-only LMs (1B/4B) trained from scratch, across four training stages β€”

🟦 Pre-training
🟩 Continued Pre-Training (CPT)
🟨 Supervised Fine-Tuning (SFT)
πŸŸ₯ Reinforcement Learning (RL)

02.07.2025 20:05 β€” πŸ‘ 2    πŸ” 1    πŸ’¬ 1    πŸ“Œ 0
Post image

New work [JSKZ25] w/ Jikai, Vasilis,
@shamkakade.bsky.social .

We introduce new formulations and tools for evaluating LM capabilities, which help explain observations of post-training behaviors of Qwen-series models.

More details:

- hanlin-zhang.com/causal-capab...
- x.com/_hanlin_zhan...

18.06.2025 18:02 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Preview
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-I...

[4/4] Prompt injection can extract private datastore contentβ€”verbatimβ€”from RAG:

– Black-box attack can leak 41% of a book with just 100 queries
– Vulnerability grows with model size and instruction tuning
– Mitigation: eliminate position bias (via PINE)+system prompts

(arxiv.org/abs/2402.17840)

23.04.2025 01:35 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Preview
Eliminating Position Bias of Language Models: A Mechanistic Approach Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpecte...

[3/4] LMs can suffer from position biasβ€”they favor content based on where it appears. This can hurt reasoning and evaluation.
We introduce PINE, a training-free method that eliminates position bias via bidirectional attention+reordering docs by attention scores.
(arxiv.org/abs/2407.01100)

23.04.2025 01:35 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models Self-improvement is a mechanism in Large Language Model (LLM) pre-training, post-training and test-time inference. We explore a framework where the model verifies its own outputs, filters or reweights...

[2/4] Can LLMs self-improve by verifying their own outputs? This paper says yesβ€”with a twist. The key lies in a measure: the Generation-Verification Gap (GV-Gap) that scales with pretraining FLOPs in a log-linear trend.
Oral @yus167.bsky.social 6A: Sat 26 Apr 4:18-4:30.
(arxiv.org/abs/2412.02674)

23.04.2025 01:35 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
Hanlin Zhang on X: "Critical batch size is crucial for reducing the wall-clock time of large-scale training runs with data parallelism. We find that it depends primarily on data size. 🧡 [1/n] Paper πŸ“‘: https://t.co/LFAPtzRkD9 Blog πŸ“: https://t.co/tGhR6HDgnE" / X Critical batch size is crucial for reducing the wall-clock time of large-scale training runs with data parallelism. We find that it depends primarily on data size. 🧡 [1/n] Paper πŸ“‘: https://t.co/LFAPtzRkD9 Blog πŸ“: https://t.co/tGhR6HDgnE

[1/4]
This work:
- Shows that CBS scales with data size, not model size
- Provides theory + empirical scaling laws
- Suggests more data β†’ higher CBS β†’ more efficient data-parallel
Learn more: x.com/_hanlin_zhan...
Poster at Hall 3 #376, Thu 24 Apr 10-12:30.

23.04.2025 01:35 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
How the von Neumann bottleneck is impeding AI computing The von Neumann architecture, which separates compute and memory, is perfect for conventional computing. But it creates a data traffic jam for AI.

[1/4] Modern large-scale LM training is limited not just by compute, but by data movementβ€”a classic Von Neumann bottleneck (research.ibm.com/blog/why-von...).

Scaling batch size reduces optimization steps, but only up to a pointβ€”the Critical Batch Size (CBS).

23.04.2025 01:35 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Highlights from #ICLR2025 β€” a brief thread 🧡

23.04.2025 01:35 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

I want to reshare @brandfonbrener.bsky.social's @NeurIPSConf 2024 paper on CoLoR-Filter: A simple yet powerful method for selecting high-quality data for language model pre-training!

With @hlzhang109.bsky.social @schwarzjn.bsky.social @shamkakade.bsky.social

05.04.2025 12:04 β€” πŸ‘ 17    πŸ” 8    πŸ’¬ 2    πŸ“Œ 1
Post image

(1/n) πŸ’‘How can we speed up the serial runtime of long pre-training runs? Enter Critical Batch Size (CBS): the tipping point where the gains of data parallelism balance with diminishing efficiency. Doubling batch size halves the optimization stepsβ€”until we hit CBS, beyond which returns diminish.

22.11.2024 20:19 β€” πŸ‘ 16    πŸ” 4    πŸ’¬ 2    πŸ“Œ 0
Post image

LLM self-improvement has critical implications in synthetic data, post-training and test-time inference. To understand LLMs' true capability of self-improvement, we perform large-scale experiments with multiple families of LLMs, tasks and mechanisms. Here is what we found: (1/9)

06.12.2024 18:02 β€” πŸ‘ 12    πŸ” 4    πŸ’¬ 1    πŸ“Œ 1

@hlzhang109 is following 19 prominent accounts