One-line command for easy deployment: github.com/yining610/Re...
12.11.2025 05:10
My first system paper! We built a decentralized RAG system that solves data reliability challenges in real-world settings. Sources provided by each data owner are securely managed and scored on the blockchain.
Paper link: arxiv.org/abs/2511.07577
Work done during an internship at @amazon. Huge thanks to my mentor, @zlwang_cs, and advisor, @Meng_CS, for their support in making this work possible, and to collaborators @ShiyangLi5, Xin Liu, Changlong Yu, @YinQingyu, Zhan Shi, and @zhangzxUIUC for their valuable feedback!
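The post doesn't detail the on-chain mechanics, but the core idea of tamper-evident source scoring can be illustrated with a minimal hash-chained ledger. Everything below (ScoreLedger, append_score, the example scores) is a hypothetical sketch, not the paper's actual protocol:

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class LedgerBlock:
    """One append-only entry recording a reliability score for a source."""
    owner_id: str
    source_id: str
    score: float    # reliability score assigned to this owner's source
    prev_hash: str  # hash of the previous entry, chaining the log together

    def digest(self) -> str:
        payload = json.dumps(self.__dict__, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

class ScoreLedger:
    """Tamper-evident score log (a stand-in for the on-chain component)."""
    def __init__(self) -> None:
        self.chain: list[LedgerBlock] = []

    def append_score(self, owner_id: str, source_id: str, score: float) -> None:
        prev = self.chain[-1].digest() if self.chain else "genesis"
        self.chain.append(LedgerBlock(owner_id, source_id, score, prev))

    def verify(self) -> bool:
        # Recompute the hash chain; any retroactive edit breaks a link.
        prev = "genesis"
        for block in self.chain:
            if block.prev_hash != prev:
                return False
            prev = block.digest()
        return True

ledger = ScoreLedger()
ledger.append_score("owner-a", "doc-17", 0.92)  # hypothetical scores
ledger.append_score("owner-b", "doc-03", 0.41)
assert ledger.verify()
```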
16.09.2025 18:15
8/8 [Convergence rate]
The gradient-based method consistently converges faster, reducing the required training steps by 6.1 on average across RL algorithms.
7/8 [Generalizability]
We further extend experiments to different math datasets and model families. Our two methods yield superior Pareto fronts compared to the baseline, with the gradient-based weighting showing the best overall performance.
6/8 [Gradient-based weight optimization]
Our method generates superior Pareto fronts that dominate all baseline approaches under both GRPO and REINFORCE training.
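The thread doesn't spell out the update rule, so here is a toy sketch of what gradient-based reward weighting could look like: weights shift toward objectives whose rewards still show the steepest recent improvement, then renormalize onto the simplex. The function update_weights, the window, and the softmax parameterization are assumptions, not the paper's method:

```python
import numpy as np

def update_weights(weights: np.ndarray,
                   reward_history: np.ndarray,
                   lr: float = 0.1) -> np.ndarray:
    """Toy gradient-style re-weighting (hypothetical, not the paper's rule).

    reward_history has shape (window, num_objectives): recent mean rewards
    per objective. Objectives whose rewards are still climbing get more
    weight; the result is renormalized onto the probability simplex.
    """
    slopes = reward_history[-1] - reward_history[0]  # finite-difference trend
    logits = np.log(weights + 1e-8) + lr * slopes    # multiplicative update
    exp = np.exp(logits - logits.max())              # stable softmax
    return exp / exp.sum()

# Example: accuracy still improving, conciseness plateaued.
w = np.array([0.5, 0.5])
history = np.array([[0.40, 0.70],
                    [0.55, 0.71]])  # rows: older -> newer
print(update_weights(w, history))   # weight shifts toward objective 0
```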
5/8 [Hypervolume-guided weight adaptation]
Across all three online RL algorithms, there is consistently at least one weight configuration where our method outperforms the baselines on all objectives.
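Hypervolume is the standard multi-objective indicator: the volume of objective space dominated by the current front relative to a reference point. Below is a minimal two-objective implementation plus a hypothetical selection step that keeps the candidate weight configuration enlarging the front the most; the paper's actual adaptation rule may differ:

```python
def hypervolume_2d(points, ref):
    """Hypervolume dominated by a two-objective front w.r.t. reference point
    `ref` (both objectives maximized; ref lies below/left of every point)."""
    pts = sorted(points, key=lambda p: p[0], reverse=True)  # by obj-1, desc
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:                        # this point extends the front
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

# Hypothetical adaptation step: among candidate weight configurations,
# keep the one whose evaluated objective vector grows the front the most.
front = [(0.60, 0.40)]                        # (accuracy, conciseness), toy
candidates = {"w_a": (0.80, 0.30), "w_b": (0.50, 0.60), "w_c": (0.20, 0.95)}
best = max(candidates,
           key=lambda k: hypervolume_2d(front + [candidates[k]], (0.0, 0.0)))
print(best)  # configuration with the largest hypervolume gain
```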
Dynamic reward weights show that objectives learn differently. For example, accuracy is a more challenging objective that requires continual learning, while the weight for conciseness quickly converges to 0.2.
4/8
3/8 [Preliminary finding]
Different objectives vary in learning difficulty. Each objective reaches saturation at different training stages.
Question: how can we redirect learning effort toward the objectives with the greatest potential for improvement?
Answer:
- If the user preference over objectives is given, use our hypervolume-based method.
- If the user preference is unknown, use our gradient-based method.
2/8
Pleased to introduce our new paper: yining610.github.io/dynamic-rew...
- Rebalances multiple objectives during training through dynamic reward weighting (see the sketch after this post)
- Builds Pareto-dominant fronts over static baselines across online RL algorithms, datasets, and model families
- Converges faster
1/8
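Concretely, dynamic reward weighting scalarizes the per-objective rewards into the single signal the RL algorithm optimizes, with the weights re-balanced during training. A toy loop, assuming the hypothetical update_weights sketch shown earlier in this thread; the policy update itself (GRPO, REINFORCE, etc.) is elided:

```python
import numpy as np

def scalarize(rewards: np.ndarray, weights: np.ndarray) -> float:
    """Scalar reward handed to the online RL algorithm (e.g. GRPO/REINFORCE)."""
    return float(np.dot(weights, rewards))

num_objectives = 3                                  # e.g. accuracy, conciseness, ...
weights = np.ones(num_objectives) / num_objectives  # start uniform
history = []
for step in range(100):
    rewards = np.random.rand(num_objectives)        # stand-in for rollout rewards
    r = scalarize(rewards, weights)
    # ... policy update with r via the chosen online RL algorithm ...
    history.append(rewards)
    if len(history) >= 10:                          # periodic re-balancing
        # weights = update_weights(weights, np.array(history[-10:]))
        history = history[-10:]
```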
This is our teaser video:
youtu.be/TgloG4Oefeg
Can't make it to #ACL2025 this year, but if you're interested in RL for factuality and textual decomposition, please check out our paper!
TL;DR: we found a mismatch between the decomposition policy and the LLM verifier, and we propose a dynamic training paradigm to bridge the gap.
Pleased to share that two papers were accepted to #ACL2025 main! Huge congratulations to all collaborators for the hard work and time we put in together!
1. Dynamic Decomposition: arxiv.org/abs/2503.15354
2. RATIONALYST: arxiv.org/abs/2410.01044
Both works study multi-model collaboration!
Quick reminder that our paper, Benchmarking Language Model Creativity: A Case Study on Code Generation, will be presented today!
Time: 11AM-12:30PM, Fri, May 2
Location: Hall 3
Paper: arxiv.org/abs/2407.09007
Video: www.youtube.com/watch?v=v1c...
Highlighting our #NAACL2025 papers!
28.04.2025 12:30
I will be at #NAACL2025 to present our LLM creativity benchmark. Drop by if interested (Poster Session 8, Fri, May 2)!
I'd love to chat about RL and its interpretability, data influence for post-training, and CogSci for LLMs. Feel free to reach out, and let's grab a coffee together!
A video teaser of @Yining__Lu's paper:
www.youtube.com/watch?v=v1c...
Midwest Speech and Language Days will be held Apr 15-16 at @NotreDame! Abstract submissions are due Mar 20, and the registration deadline is Mar 27. Financial assistance for students (lodging, poster printing) is available. nlp.nd.edu/msld25
A starter pack for #NLP #NLProc researchers!
go.bsky.app/SngwGeS