
Yining Lu

@yininglu.bsky.social

Second-year CS PhD student @notredame.bsky.social | Intern: Amazon | Prev: @jhuclsp.bsky.social https://yining610.github.io/

32 Followers  |  187 Following  |  14 Posts  |  Joined: 20.11.2024

Latest posts by yininglu.bsky.social on Bluesky

Work done during an internship at @amazon. Huge thanks to my mentor, @zlwang_cs, and advisor, @Meng_CS, for their support in making this work possible, and to collaborators @ShiyangLi5, Xin Liu, Changlong Yu, @YinQingyu, Zhan Shi, and @zhangzxUIUC for their valuable feedback!

16.09.2025 18:15 — 👍 0    🔁 0    💬 0    📌 0
Post image

8/8 [Convergence rate]
The gradient-based method consistently has a higher convergence rate, reducing the required steps by 6.1 on average across RL algorithms.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
Post image

7/8 [Generalizability]
We further extend experiments to different math datasets and model families. Our two methods yield superior Pareto fronts compared to the baseline, with the gradient-based weighting showing the best overall performance.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
Post image

6/8 [Gradient-based weight optimization]
Our method generates superior Pareto fronts that dominate all baseline approaches under both GRPO and REINFORCE training.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
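(Not part of the thread: the paper's exact update rule isn't given here, so the following is a rough sketch of one way gradient-based reward weighting could work, upweighting objectives whose policy-gradient signal is still large. The softmax rule, the `grad_norms` input, and the temperature `tau` are all assumptions for illustration.)

```python
import math

def gradient_based_weights(grad_norms, tau=1.0):
    """Hypothetical rule: softmax over per-objective policy-gradient norms,
    so objectives with more remaining learning signal get more weight.

    grad_norms: dict mapping objective name -> L2 norm of that objective's
                policy-gradient estimate at the current step.
    tau: temperature; smaller values make the weighting more aggressive.
    Returns weights that sum to 1.
    """
    scores = {k: g / tau for k, g in grad_norms.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(s - m) for k, s in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

# Accuracy still has a strong gradient signal; conciseness has mostly
# converged, so accuracy receives the larger weight.
print(gradient_based_weights({"accuracy": 2.4, "conciseness": 0.3}))
```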
Post image

5/8 [Hypervolume-guided weight adaptation]
Across all three online RL algorithms, there is consistently at least one weight configuration for which our method outperforms the baselines on all objectives.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
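(Again not from the thread: as a toy illustration of the hypervolume idea, the sketch below computes the hypervolume of a two-objective Pareto front against a reference point. A weight-adaptation scheme could then favor the objective whose improvement grows this quantity the most; the function and its assumptions are mine, not the paper's algorithm.)

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by a two-objective Pareto front, measured against a
    reference point. Assumes maximization and non-dominated input points
    (when sorted by the first objective, the second objective decreases)."""
    pts = sorted(p for p in points if p[0] > ref[0] and p[1] > ref[1])
    hv, prev_x = 0.0, ref[0]
    for x, y in pts:  # sweep left to right, adding each new slab of area
        hv += (x - prev_x) * (y - ref[1])
        prev_x = x
    return hv

# Toy front: two points forming a staircase above the origin.
front = [(0.6, 0.8), (0.9, 0.5)]
print(hypervolume_2d(front))  # 0.6*0.8 + 0.3*0.5 = 0.63
```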
Post image

4/8
Dynamic reward weights show that objectives learn differently. For example, accuracy is a more challenging objective that requires continual learning, while conciseness quickly converges to a weight of 0.2.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
Post image

3/8 [Preliminary finding]
Objectives vary in learning difficulty: each reaches saturation at a different training stage.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0

Question: How can we redirect learning effort toward the objectives with the greatest potential for improvement?

Answer:
- If the user's preference over objectives is given, use our hypervolume-based method.
- If the user's preference is unknown, use our gradient-based method.
2/8

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
Post image

✴️ Pleased to introduce our new paper yining610.github.io/dynamic-rew...

- Rebalance multiple objectives during training through dynamic reward weighting
- Build Pareto-dominant fronts over static baselines across online RL algorithms, datasets, and model families
- Achieve faster convergence

1/8

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
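(For readers new to the setup, and not part of the original thread: multi-objective RL here scalarizes several reward signals into one. A static baseline fixes the weights for the whole run; dynamic reward weighting re-estimates them during training. A minimal sketch of the loop shape, with `evaluate_objectives`, `estimate_weights`, and `rl_update` as hypothetical placeholders, not the paper's API:)

```python
def scalarize(rewards, weights):
    """Combine per-objective rewards into one scalar: sum_i w_i * r_i."""
    return sum(weights[k] * rewards[k] for k in rewards)

# Static baseline: one fixed weight vector for the whole run.
static_w = {"accuracy": 0.5, "conciseness": 0.5}
print(scalarize({"accuracy": 1.0, "conciseness": 0.4}, static_w))  # 0.7

# Dynamic weighting (the thread's idea, loop shape only; every name
# below is a placeholder):
# for step in range(num_steps):
#     rewards = evaluate_objectives(policy, batch)    # per-objective scores
#     weights = estimate_weights(rewards, history)    # hypervolume- or
#                                                     # gradient-based rule
#     rl_update(policy, scalarize(rewards, weights))  # GRPO / REINFORCE / ...
```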
ACL2025: Optimizing Decomposition for Optimal Claim Verification

This is our teaser video 😀
youtu.be/TgloG4Oefeg

25.07.2025 22:11 — 👍 0    🔁 0    💬 0    📌 0
Post image

Can't make it to #ACL2025 this year, but for people interested in RL for factuality and textual decomposition, please check out our paper!

TL;DR: We found a mismatch between the decomposition policy and LLM verifier, and propose a dynamic training paradigm to bridge the gap.

25.07.2025 22:11 — 👍 1    🔁 0    💬 1    📌 0
Preview
Optimizing Decomposition for Optimal Claim Verification Current research on the Decompose-Then-Verify paradigm for evaluating the factuality of long-form text typically treats decomposition and verification in isolation, overlooking their interact...

Pleased to share that two papers were accepted to #ACL2025 main! Huge congratulations to all collaborators for the hard work and time we put in together!

1. Dynamic Decomposition: arxiv.org/abs/2503.15354
2. RATIONALYST: arxiv.org/abs/2410.01044

Both works study multi-model collaboration!

16.05.2025 05:29 — 👍 0    🔁 0    💬 0    📌 0
Post image

Quick reminder that our paper, Benchmarking Language Model Creativity: A Case Study on Code Generation, will be presented today!

📅 11AM-12:30PM, Fri, May 2
📍 Hall 3
📝 arxiv.org/abs/2407.09007
🎥 www.youtube.com/watch?v=v1c...

02.05.2025 13:11 — 👍 0    🔁 0    💬 0    📌 0

Highlighting our #NAACL2025 papers 🧵🧵🧵

28.04.2025 12:30 — 👍 1    🔁 1    💬 1    📌 0

I will be at #NAACL2025 to present our LLM creativity benchmark. Drop by if interested (Poster Session 8, Fri, May 2)!

I'd love to chat about RL and its interpretability, data influence for post-training, and CogSci for LLMs. Feel free to reach out and let's have some coffee together ☕!

28.04.2025 19:53 — 👍 2    🔁 1    💬 0    📌 0
Benchmarking Language Model Creativity: A Case Study on Code Generation --- NAACL 2025 (Yining Lu)
Yining Lu: https://yining610.github.io/ Based on the following paper: https://arxiv.org/abs/2407.09007 As LLMs become increasingly prevalent, it is interesti...

A video teaser of @Yining__Lu's paper:
www.youtube.com/watch?v=v1c...

28.04.2025 12:30 — 👍 1    🔁 1    💬 1    📌 0
Midwest Speech and Language Days 2025

Midwest Speech and Language Days will be held Apr 15-16 at @NotreDame! Abstract submissions are due Mar 20, and the registration deadline is Mar 27. Financial assistance for students (lodging, poster printing) is available. nlp.nd.edu/msld25

08.03.2025 18:35 — 👍 0    🔁 2    💬 1    📌 0

A starter pack for #NLP #NLProc researchers! 🎉

go.bsky.app/SngwGeS

04.11.2024 10:01 — 👍 253    🔁 100    💬 45    📌 13
