
Shangshang Wang

@shangshang-wang.bsky.social

https://shangshang-wang.github.io/ PhD student in CS + AI at @usc.edu. CS undergrad and master's at ShanghaiTech. LLM reasoning, RL, AI4Science.

127 Followers  |  14 Following  |  26 Posts  |  Joined: 13.11.2024

Latest posts by shangshang-wang.bsky.social on Bluesky

Curious about the details behind these efficiency claims?

We open-source everything for full reproducibility:

Paper: arxiv.org/abs/2506.09967
Blog: shangshangwang.notion.site/resa
Code: github.com/shangshang-w...
Model: huggingface.co/Resa-Yi
Training Logs: wandb.ai/upup-ashton-...

12.06.2025 17:02 – 👍 2    🔁 0    💬 0    📌 0

SAE-Tuning trains models that match RL-trained counterparts' performance while reducing costs by >2000x and time by >450x.

The trained model is transparent, revealing where reasoning abilities hide; it is also generalizable and modular, enabling transfer across datasets and models.

12.06.2025 17:02 – 👍 0    🔁 0    💬 1    📌 0

Such efficiency stems from our novel SAE-Tuning method, which expands the use of SAEs beyond test-time steering.

In SAE-Tuning, the SAE first "extracts" latent reasoning features and then guides a standard supervised fine-tuning process to "elicit" reasoning abilities.
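
For intuition, here is a minimal sketch of what such a two-stage pipeline can look like; the module names, the auxiliary-loss formulation, and all hyperparameters are illustrative assumptions rather than the official SAE-Tuning implementation (see the linked code and paper for the real method):

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Maps hidden states into an overcomplete sparse feature space and back."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, h):
        f = torch.relu(self.encoder(h))   # sparse latent features
        return f, self.decoder(f)         # features and reconstruction

def train_sae(sae, hidden_states, steps=1000, l1_coeff=1e-3, lr=1e-4):
    """Stage 1: "extract" latent features from a source model's activations."""
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(steps):
        h = hidden_states[torch.randint(0, hidden_states.size(0), (256,))]
        f, h_hat = sae(h)
        loss = nn.functional.mse_loss(h_hat, h) + l1_coeff * f.abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()

def guided_sft_loss(model, batch, sae, layer_idx, alpha=0.1):
    """Stage 2: "elicit" reasoning via standard SFT plus an SAE-guided auxiliary term.
    batch is a standard SFT batch (input_ids, attention_mask, labels)."""
    out = model(**batch, output_hidden_states=True)
    h = out.hidden_states[layer_idx]
    _, h_hat = sae(h)
    aux = nn.functional.mse_loss(h_hat, h)  # keep activations close to the SAE's feature manifold
    return out.loss + alpha * aux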

12.06.2025 17:02 – 👍 1    🔁 0    💬 1    📌 0

Sparse autoencoders (SAEs) can be used to elicit strong reasoning abilities with remarkable efficiency.

Using only 1 hour of training at a cost of $2, without any reasoning traces, we find a way to train 1.5B models via SAEs to score 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23.

12.06.2025 17:02 – 👍 1    🔁 0    💬 1    📌 0

Check out more about Tina via the links below.

Paper: arxiv.org/abs/2504.15777
Notion Blog: shangshangwang.notion.site/tina
Code: github.com/shangshang-w...
Model: huggingface.co/Tina-Yi
Training Logs: wandb.ai/upup-ashton-...

Tina's avatar is generated by GPT-4o based on KYNE's girls.

23.04.2025 17:10 – 👍 2    🔁 0    💬 0    📌 0

We also want to express our gratitude to the broader open-source community. This research was made possible by leveraging numerous publicly available resources from DeepScaleR, STILL, OpenThoughts @bespokelabs.bsky.social, OpenR1 @hf.co, LIMR, and OpenRS.

23.04.2025 17:10 – 👍 1    🔁 0    💬 1    📌 0

This is an amazing collaboration with Julian, Omer, Enes, and Oliver @oliu-io.bsky.social in the course taught by Willie @willieneis.bsky.social (both the teacher and the advisor). Thanks everyone!

23.04.2025 17:10 – 👍 1    🔁 0    💬 1    📌 0

[9/9] 🚀 We thus hypothesize that LoRA's effectiveness and efficiency stem from rapidly adapting the reasoning format under RL while preserving base model knowledge, a likely more compute-efficient process than the deep knowledge integration of full-parameter training.
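
For reference, the structure this hypothesis rests on can be written down in a few lines; a minimal LoRA layer sketch (illustrative, not the Tina training code), where the frozen base weight carries the preserved knowledge and only the small low-rank factors adapt under RL:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                              # base knowledge stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at the start
        self.scale = alpha / r

    def forward(self, x):
        # frozen map plus a cheap low-rank delta that RL can adapt quickly
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)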

23.04.2025 17:10 – 👍 1    🔁 0    💬 1    📌 0

[8/9] 💡 Observation 2) We consistently observe a training phase transition in format-related metrics (format reward, completion length) but NOT in accuracy-related metrics across most Tina models. The best-performing checkpoint is always found around this transition point.

23.04.2025 17:10 – 👍 1    🔁 0    💬 1    📌 0

[7/9] 💡 Observation 1) We observe that in Tina models, increased training compute inversely affects performance, in contrast to full-parameter models. This observation highlights a "less compute can yield more performance" phenomenon.

23.04.2025 17:10 – 👍 1    🔁 0    💬 1    📌 0

🤔 But why? Where does this effectiveness and efficiency come from?

💡 We further provide insights based on our observations while post-training Tina models.

23.04.2025 17:10 – 👍 1    🔁 0    💬 1    📌 0

[6/9] 😋 And it costs only $9 to reproduce the best Tina checkpoint, and $526 to reproduce all our experiments from scratch!

23.04.2025 17:10 – 👍 1    🔁 0    💬 1    📌 0

[5/9] 🤩 We validate this across multiple open-source reasoning datasets and various ablation settings with a single, fixed set of hyperparameters, confirming the effectiveness and efficiency of LoRA-based RL.

23.04.2025 17:10 – 👍 1    🔁 0    💬 1    📌 0

[4/9] 😍 With minimal post-training compute, the best Tina checkpoint achieves a >20% performance increase over the base model and 43% Pass@1 on AIME24.

23.04.2025 17:10 – 👍 1    🔁 0    💬 1    📌 0

[3/9] 🚀 Our Tina models compete with, and sometimes surpass, SOTA models built on the same base model, with surprisingly high cost efficiency.

23.04.2025 17:10 – 👍 1    🔁 0    💬 1    📌 0

[2/9] 👩 We release the Tina family of models, created by post-training the DeepSeek-R1-Distill-Qwen-1.5B base model using low-rank adaptation (LoRA) during reinforcement learning (RL), on open-source reasoning datasets.
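
As a rough starting point, LoRA-based RL post-training of this kind can be set up with the peft and trl libraries roughly as sketched below; the reward function, dataset file, and hyperparameters are placeholders rather than the exact Tina recipe (see the linked code and training logs for that):

from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def accuracy_reward(completions, **kwargs):
    # Placeholder verifiable reward: 1.0 if the reference answer appears in the completion.
    return [1.0 if ans in c else 0.0 for c, ans in zip(completions, kwargs["answer"])]

peft_config = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear", task_type="CAUSAL_LM")
args = GRPOConfig(output_dir="tina-lora-grpo", num_generations=8, max_completion_length=1024)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=accuracy_reward,
    args=args,
    # Hypothetical local file with "prompt" and "answer" columns; swap in an
    # open-source reasoning dataset of your choice.
    train_dataset=load_dataset("json", data_files="reasoning_prompts.json", split="train"),
    peft_config=peft_config,
)
trainer.train()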

23.04.2025 17:10 – 👍 2    🔁 0    💬 1    📌 0

😃 Want strong LLM reasoning without breaking the bank? We explored just how cost-effectively RL can enhance reasoning using LoRA!

[1/9] Introducing Tina: A family of tiny reasoning models with strong performance at low cost, providing an accessible testbed for RL reasoning. 🧵

23.04.2025 17:10 – 👍 7    🔁 3    💬 1    📌 0
Link preview: LLM Reasoning: Curated Insights
Reasoning ability is gained via post-training and is scaled via test-time compute.

Click on the link/card below to see the full set of spreadsheets:
shangshangwang.notion.site/llm-reasoning

19.02.2025 18:01 – 👍 0    🔁 0    💬 0    📌 0
Link preview: Feedback on "LLM Reasoning: Curated Insights"

6/6 This is an ongoing collection on LLM reasoning. Please feel free to send new materials or any feedback here or via the Google Form.
docs.google.com/forms/d/e/1F...

19.02.2025 18:01 – 👍 0    🔁 0    💬 1    📌 0

5/6 Other Artifacts

We then collect survey, evaluation, benchmark, and application papers, as well as online resources like blogs, posts, videos, code, and data.

19.02.2025 18:01 – 👍 0    🔁 0    💬 1    📌 0

4/6 Verification: The Key to Reasoning

Verifiers serve as a key component in both post-training (e.g., as reward models) and test-time compute (e.g., as signals to guide search). Our fourth section collects thoughts on various verification methods.
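
As a toy example of what a verifier can look like in the simplest verifiable-reward setting (a sketch under the assumption of boxed final answers, not any specific paper's verifier):

import re
from typing import Optional

def extract_boxed(text: str) -> Optional[str]:
    """Pull the last boxed final answer out of a model completion."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def verify(completion: str, reference: str) -> float:
    """1.0 if the predicted final answer matches the reference, else 0.0.
    Usable as an RL reward in post-training or as a scoring signal for test-time search."""
    pred = extract_boxed(completion)
    return 1.0 if pred is not None and pred == reference.strip() else 0.0

# Example: verify("... so the answer is \\boxed{42}.", "42") -> 1.0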

19.02.2025 18:01 – 👍 0    🔁 0    💬 1    📌 0

3/6 Test-Time Compute: Scaling Reasoning Ability

Test-time compute is an emerging field where folks are trying different methods (e.g., search) and using extra components (e.g., verifiers). Our third section classifies these methods based on their optimization targets for LLMs.
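
One of the simplest such methods, written out so the pieces are visible: best-of-N sampling with a verifier-style scorer (generate_fn and score_fn are placeholders for any sampler and any verifier or reward model):

from typing import Callable, List

def best_of_n(prompt: str,
              generate_fn: Callable[[str], str],      # placeholder: any LLM sampler
              score_fn: Callable[[str, str], float],  # placeholder: any verifier / reward model
              n: int = 16) -> str:
    """Spend extra test-time compute by sampling N candidates and keeping the best-scored one."""
    candidates: List[str] = [generate_fn(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score_fn(prompt, c))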

19.02.2025 18:01 – 👍 0    🔁 0    💬 1    📌 0

2/6 Post-Training: Gaining Reasoning Ability

Our second section collects thoughts on post-training methods for LLM reasoning, including the currently popular RL-based methods as well as SFT-based methods.

19.02.2025 18:01 – 👍 0    🔁 0    💬 1    📌 0

1/6 OpenAI, DeepSeek, and More

The discussion of reasoning ability went viral with the release of the OpenAI o-series and DeepSeek R1 models. Our first section collects thoughts on these and other SOTA reasoning models.

19.02.2025 18:01 – 👍 0    🔁 0    💬 1    📌 0
Link preview: LLM Reasoning: Curated Insights
Reasoning ability is gained via post-training and is scaled via test-time compute.

Click on the link/card below to see the full set of spreadsheets, and check out the thread below for an overview of each section:
shangshangwang.notion.site/llm-reasoning

19.02.2025 18:01 – 👍 0    🔁 0    💬 1    📌 0

πŸ” Diving deep into LLM reasoning?

From OpenAI's o-series to DeepSeek R1, from post-training to test-time compute β€” we break it down into structured spreadsheets. 🧡

19.02.2025 18:01 – 👍 4    🔁 2    💬 1    📌 0

Introducing METAGENE-1 🧬, an open-source 7B-parameter metagenomics foundation model pretrained on 1.5 trillion base pairs. Built for pandemic monitoring, pathogen detection, and biosurveillance, with SOTA results across many genomics tasks.
🧵 1/
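
For the curious, loading it with Hugging Face transformers looks roughly like the sketch below; the repository id and the pooling choice are assumptions here, so check the official release for the exact ids and recommended usage:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "metagene-ai/METAGENE-1"  # assumed repo id; verify against the official release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Embed a metagenomic read (nucleotide sequence) for a downstream task such as pathogen detection.
seq = "ACGTACGTAGCTAGCTTACGATCGATCGTACG"
inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]
embedding = hidden.mean(dim=1)  # simple mean-pooled sequence embedding (illustrative pooling)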

06.01.2025 17:04 – 👍 27    🔁 6    💬 2    📌 0
