
Yining Lu

@yininglu.bsky.social

Second-year CS PhD student @notredame.bsky.social | Intern: Amazon | Prev: @jhuclsp.bsky.social https://yining610.github.io/

32 Followers  |  187 Following  |  14 Posts  |  Joined: 20.11.2024

Latest posts by yininglu.bsky.social on Bluesky

Work done during an internship at @amazon. Huge thanks to my mentor, @zlwang_cs, and advisor, @Meng_CS, for their support in making this work possible, and to collaborators @ShiyangLi5, Xin Liu, Changlong Yu, @YinQingyu, Zhan Shi, and @zhangzxUIUC for their valuable feedback!

16.09.2025 18:15 — 👍 0    🔁 0    💬 0    📌 0
Post image

8/8 [Convergence rate]
The gradient-based method consistently has a higher convergence rate, reducing the required steps by 6.1 on average across RL algorithms.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
Post image

7/8 [Generalizability]
We further extend experiments to different math datasets and model families. Our two methods yield superior Pareto fronts compared to the baseline, with the gradient-based weighting showing the best overall performance.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
Post image

6/8 [Gradient-based weight optimization]
Our method generates superior Pareto fronts that dominate all baseline approaches under both GRPO and REINFORCE training.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
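(Not part of the thread: the paper's exact update rule isn't given here, so the following is a rough sketch of one way gradient-based reward weighting could work, upweighting objectives whose policy-gradient signal is still large. The softmax rule, the `grad_norms` input, and the temperature `tau` are all assumptions for illustration.)

```python
import math

def gradient_based_weights(grad_norms, tau=1.0):
    """Hypothetical rule: softmax over per-objective policy-gradient norms,
    so objectives with more remaining learning signal get more weight.

    grad_norms: dict mapping objective name -> L2 norm of that objective's
                policy-gradient estimate at the current step.
    tau: temperature; smaller values make the weighting more aggressive.
    Returns weights that sum to 1.
    """
    scores = {k: g / tau for k, g in grad_norms.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(s - m) for k, s in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

# Accuracy still has a strong gradient signal; conciseness has mostly
# converged, so accuracy receives the larger weight.
print(gradient_based_weights({"accuracy": 2.4, "conciseness": 0.3}))
```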
Post image

5/8 [Hypervolume-guided weight adaptation]
Across all three online RL algorithms, there is consistently at least one weight configuration for which our method outperforms the baselines on all objectives.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
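(Again not from the thread: as a toy illustration of the hypervolume idea, the sketch below computes the hypervolume of a two-objective Pareto front against a reference point. A weight-adaptation scheme could then favor the objective whose improvement grows this quantity the most; the function and its assumptions are mine, not the paper's algorithm.)

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by a two-objective Pareto front, measured against a
    reference point. Assumes maximization and non-dominated input points
    (when sorted by the first objective, the second objective decreases)."""
    pts = sorted(p for p in points if p[0] > ref[0] and p[1] > ref[1])
    hv, prev_x = 0.0, ref[0]
    for x, y in pts:  # sweep left to right, adding each new slab of area
        hv += (x - prev_x) * (y - ref[1])
        prev_x = x
    return hv

# Toy front: two points forming a staircase above the origin.
front = [(0.6, 0.8), (0.9, 0.5)]
print(hypervolume_2d(front))  # 0.6*0.8 + 0.3*0.5 = 0.63
```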
Post image

4/8
Dynamic reward weights show that objectives learn differently. For example, accuracy is a more challenging objective that requires continual learning, while conciseness quickly converges to a weight of 0.2.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
Post image

3/8 [Preliminary finding]
Objectives vary in learning difficulty: each reaches saturation at a different training stage.

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0

Question: How can we redirect learning effort toward the objectives with the greatest potential for improvement?

Answer:
- If the user's preference over objectives is given, use our hypervolume-based method.
- If the user's preference is unknown, use our gradient-based method.
2/8

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
Post image

✴️ Pleased to introduce our new paper yining610.github.io/dynamic-rew...

- Rebalance multiple objectives during training through dynamic reward weighting
- Build Pareto-dominant fronts over static baselines across online RL algorithms, datasets, and model families
- Achieve faster convergence

1/8

16.09.2025 18:15 — 👍 0    🔁 0    💬 1    📌 0
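(For readers new to the setup, and not part of the original thread: multi-objective RL here scalarizes several reward signals into one. A static baseline fixes the weights for the whole run; dynamic reward weighting re-estimates them during training. A minimal sketch of the loop shape, with `evaluate_objectives`, `estimate_weights`, and `rl_update` as hypothetical placeholders, not the paper's API:)

```python
def scalarize(rewards, weights):
    """Combine per-objective rewards into one scalar: sum_i w_i * r_i."""
    return sum(weights[k] * rewards[k] for k in rewards)

# Static baseline: one fixed weight vector for the whole run.
static_w = {"accuracy": 0.5, "conciseness": 0.5}
print(scalarize({"accuracy": 1.0, "conciseness": 0.4}, static_w))  # 0.7

# Dynamic weighting (the thread's idea, loop shape only; every name
# below is a placeholder):
# for step in range(num_steps):
#     rewards = evaluate_objectives(policy, batch)    # per-objective scores
#     weights = estimate_weights(rewards, history)    # hypervolume- or
#                                                     # gradient-based rule
#     rl_update(policy, scalarize(rewards, weights))  # GRPO / REINFORCE / ...
```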
ACL2025: Optimizing Decomposition for Optimal Claim Verification

This is our teaser video 😀
youtu.be/TgloG4Oefeg

25.07.2025 22:11 — 👍 0    🔁 0    💬 0    📌 0
Post image

Can't make it to #ACL2025 this year, but for people interested in RL for factuality and textual decomposition, please check out our paper!

TL;DR: We found a mismatch between the decomposition policy and LLM verifier, and propose a dynamic training paradigm to bridge the gap.

25.07.2025 22:11 — 👍 1    🔁 0    💬 1    📌 0
Preview
Optimizing Decomposition for Optimal Claim Verification Current research on the Decompose-Then-Verify paradigm for evaluating the factuality of long-form text typically treats decomposition and verification in isolation, overlooking their interact...

Pleased to share that two papers were accepted to #ACL2025 main! Huge congratulations to all collaborators for the hard work and time we put in together!

1. Dynamic Decomposition: arxiv.org/abs/2503.15354
2. RATIONALYST: arxiv.org/abs/2410.01044

Both works study multi-model collaboration!

16.05.2025 05:29 — 👍 0    🔁 0    💬 0    📌 0
Post image

Quick reminder that our paper, Benchmarking Language Model Creativity: A Case Study on Code Generation, will be presented today!

📅 11AM-12:30PM, Fri, May 2
📍 Hall 3
📝 arxiv.org/abs/2407.09007
🎥 www.youtube.com/watch?v=v1c...

02.05.2025 13:11 — 👍 0    🔁 0    💬 0    📌 0

Highlighting our #NAACL2025 papers 🧵🧵🧵

28.04.2025 12:30 — 👍 1    🔁 1    💬 1    📌 0

I will be at #NAACL2025 to present our LLM creativity benchmark. Drop by if interested (Poster Session 8, Fri, May 2)!

I'd love to chat about RL and its interpretability, data influence for post-training, and CogSci for LLMs. Feel free to reach out and let's have some coffee together ☕!

28.04.2025 19:53 — 👍 2    🔁 1    💬 0    📌 0
Benchmarking Language Model Creativity: A Case Study on Code Generation --- NAACL 2025 (Yining Lu)
Yining Lu: https://yining610.github.io/ Based on the following paper: https://arxiv.org/abs/2407.09007 As LLMs become increasingly prevalent, it is interesti...

A video teaser of @Yining__Lu's paper:
www.youtube.com/watch?v=v1c...

28.04.2025 12:30 — 👍 1    🔁 1    💬 1    📌 0
Midwest Speech and Language Days 2025

Midwest Speech and Language Days will be held Apr 15-16 at @NotreDame! Abstract submissions are due Mar 20, and the registration deadline is Mar 27. Financial assistance for students (lodging, poster printing) is available. nlp.nd.edu/msld25

08.03.2025 18:35 — 👍 0    🔁 2    💬 1    📌 0

A starter pack for #NLP #NLProc researchers! 🎉

go.bsky.app/SngwGeS

04.11.2024 10:01 — 👍 253    🔁 100    💬 45    📌 13
