Rushiv has been doing this neat work trying to understand how to bring language into multi-task learning.
09.10.2025 13:34

@rushivarora.bsky.social
ML Research Scientist at Dell AI by day, RL Researcher at night. https://rushivarora.github.io
Special thanks to @eugenevinitsky.bsky.social for being an amazing mentor!
08.10.2025 15:19

Paper Highlights:
1. Qualitative analysis of LEXPOL (end-to-end learning) and of frozen pre-trained single-task policies. LEXPOL successfully disentangles tasks into fundamental skills and learns to combine them without decomposing them into primitive actions.
2. LEXPOL's performance benchmarks against previous methods on MetaWorld.
3. A combination of LEXPOL with the earlier natural-language state-embedding algorithm, giving a joint method combining state and action factorization.
LEXPOL is inspired by our ability to combine different sub-skills to solve larger tasks based on context. It works by factorizing the complexity of multi-task reinforcement learning into smaller learnable pieces.
08.10.2025 15:19

This helps because multi-task RL is hard: a single monolithic policy must entangle many skills. LEXPOL instead factorizes control into smaller learnable pieces and uses language as the router for composition.
08.10.2025 15:19

The idea: give the agent a natural-language task description ("push the green button") and let a learned language gate blend or select among several sub-policies (skills) based on context. One shared state; multiple policies; a gating MLP guided by language embeddings chooses the action.
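A minimal sketch of this kind of language-gated mixture of sub-policies, assuming a soft blend and a frozen sentence encoder upstream (module names and sizes are my own, not the paper's exact architecture):

    import torch
    import torch.nn as nn

    class LanguageGatedPolicy(nn.Module):
        # K sub-policies read the shared state; a gating MLP reads the
        # language embedding and outputs mixture weights over the skills.
        def __init__(self, state_dim, action_dim, lang_dim, num_skills=4):
            super().__init__()
            self.sub_policies = nn.ModuleList([
                nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                              nn.Linear(64, action_dim))
                for _ in range(num_skills)])
            self.gate = nn.Sequential(nn.Linear(lang_dim, 64), nn.ReLU(),
                                      nn.Linear(64, num_skills))

        def forward(self, state, lang_emb):
            weights = torch.softmax(self.gate(lang_emb), dim=-1)   # (B, K)
            actions = torch.stack([p(state) for p in self.sub_policies],
                                  dim=1)                           # (B, K, A)
            return (weights.unsqueeze(-1) * actions).sum(dim=1)    # (B, A)

    # Usage (dims assumed, e.g. MetaWorld-like state/action sizes):
    policy = LanguageGatedPolicy(state_dim=39, action_dim=4, lang_dim=384)
    action = policy(torch.randn(8, 39), torch.randn(8, 384))

A hard gate (argmax over the weights) would select a single skill instead of blending.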
08.10.2025 15:19

I'm very excited to share my new paper introducing LEXical POLicy Networks (LEXPOL) for Multi-Task Reinforcement Learning!
In LEXPOL, language acts as a gate that routes among reusable sub-policies (skills) to solve diverse tasks.
Paper: arxiv.org/abs/2510.06138
Excited to share that an abstract of this paper was accepted at RLDM!
I'm genuinely excited to attend the conference; it is one of my favorite venues for interdisciplinary discussions!
This is a cause that is very close to my heart, and I am glad to have found the NOCC. They do a lot of important work for patients and their caregivers.
12.02.2025 03:21

My grandmother passed away from ovarian cancer in 1995. I never got the chance to meet her, but I have always felt extremely connected with her through the countless stories and family heirlooms my parents have shared with me.
12.02.2025 03:21

I am thrilled to announce that I am running the 2025 New York City Marathon for the National Ovarian Cancer Coalition.
I would be grateful if you would consider donating or sharing the message to help me fundraise for this very important cause! Every share counts!
p2p.onecause.com/nycmarathon2...
This paper is personally meaningful to me since it is my first solo-authored paper (and 7th overall!).

I got the idea in 2023, in the final months of my master's, but didn't have the chance to work on it until last year. I am thrilled to finally publish it!
13.01.2025 16:04

Read the paper to explore how H-UVFAs advance scalable and reusable skills in RL! #ReinforcementLearning #MachineLearning #AI
13.01.2025 16:04

Core Contributions:
- Hierarchical Embeddings: We show that hierarchical value functions can be broken down into their core elements by leveraging higher-order mathematical decomposition methods such as the Tucker decomposition (see the sketch below).
- Zero-shot generalization: H-UVFAs can extrapolate to new goals!
- Learning in both supervised and reinforcement learning contexts.
- Outperforming UVFAs: In hierarchical settings, H-UVFAs show better performance and generalization than UVFAs; in fact, UVFAs failed to learn in some settings.
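To make the Tucker idea concrete, here is a minimal sketch (my illustration, not code from the paper; the tensor axes and ranks are assumptions): treat a table of hierarchical values indexed by (state, goal, option) as a 3-way tensor and factor it into a small core plus one embedding matrix per axis.

    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import tucker

    # Toy value tensor V[s, g, o]: 20 states x 10 goals x 5 options.
    V = tl.tensor(np.random.rand(20, 10, 5))

    # Tucker decomposition: V ~= core x1 S x2 G x3 O, yielding a
    # low-rank embedding matrix for states, goals, and options.
    core, (S, G, O) = tucker(V, rank=[4, 3, 2])

    # Reconstruct a single entry from the embeddings alone:
    s, g, o = 3, 7, 1
    approx = np.einsum('ijk,i,j,k->', core, S[s], G[g], O[o])
    print(float(approx), float(V[s, g, o]))

In an H-UVFA-style setup the embeddings would be produced by learned networks rather than read off a fixed table, which is what enables extrapolation to unseen goals.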
We extend Universal Value Function Approximators (UVFAs) to hierarchical RL, enabling zero-shot generalization across new goals in multi-task settings while retaining the benefits of temporal abstraction.
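For background: a UVFA (Schaul et al., 2015) approximates the goal-conditioned value function with separate state and goal embeddings. A hedged sketch of what a hierarchical, Tucker-style extension could look like, with an extra embedding for an option o and a core tensor C (my notation, not necessarily the paper's exact decomposition):

    V(s, g) \approx \phi(s)^{\top} \psi(g)                                      % standard UVFA
    Q(s, g, o) \approx \sum_{i,j,k} C_{ijk}\, \phi_i(s)\, \psi_j(g)\, \xi_k(o)  % hierarchical variant

Generalizing to a new goal g' then only requires computing its embedding \psi(g'); the other factors are reused.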
13.01.2025 16:04

First post on here and it's an exciting one! I am happy to share my new paper "Hierarchical Universal Value Function Approximators" (H-UVFAs): arxiv.org/abs/2410.08997
13.01.2025 16:04