PQN, a recently introduced value-based method (bsky.app/profile/matt...), has a similar data-collection scheme to PPO. We see a similar trend as with PPO, but it is much less pronounced. It is possible our findings are more correlated with policy-based methods.
9/
05.06.2025 14:27
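A minimal sketch of the shared data-collection pattern the post refers to, assuming a gym-style vectorised environment; vec_env, act_fn, and rollout_len are illustrative names, not PQN's or PPO's actual code:

```python
# Both PPO and (per the post) PQN collect short synchronous rollouts from many
# parallel environments and consume them immediately, with no large replay buffer.
def collect_rollout(vec_env, act_fn, obs, rollout_len=128):
    """Gather a fixed-length on-policy rollout from a vectorised env."""
    batch = []
    for _ in range(rollout_len):
        actions = act_fn(obs)  # e.g. epsilon-greedy over Q-values for PQN
        obs_next, rewards, dones, infos = vec_env.step(actions)
        batch.append((obs, actions, rewards, obs_next, dones))
        obs = obs_next
    return batch, obs  # used for one round of updates, then discarded
```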
2/2 Our new paper below tackles two major issues in offline RL: high online sample complexity and the lack of online performance guarantees. It obtains accurate regret estimation and achieves performance competitive with the best online hyperparameter tuning methods, both using only offline data!
30.05.2025 08:39
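Illustrative only, not the paper's actual algorithm: one generic way to tune hyperparameters offline is to estimate each candidate policy's value from the dataset with off-policy evaluation and rank candidates by estimated regret. Here ope_estimate is an assumed, user-supplied evaluator (e.g. fitted Q evaluation):

```python
def estimated_regret(candidate_policies, ope_estimate, dataset):
    """Rank candidate policies by regret estimated purely from offline data.

    candidate_policies: dict mapping name -> policy
    ope_estimate: assumed off-policy evaluation routine, (policy, dataset) -> float
    """
    values = {name: ope_estimate(pi, dataset) for name, pi in candidate_policies.items()}
    best = max(values.values())
    return {name: best - v for name, v in values.items()}  # smaller regret is better
```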
1/2 Offline RL has always bothered me. It promises that, by exploiting offline data, an agent can learn to behave near-optimally once deployed. In real life it breaks this promise, requiring large amounts of online samples for tuning and offering no guarantees of behaving safely to achieve desired goals.
30.05.2025 08:39
TeXstudio - A LaTeX editor
www.texstudio.org
14.05.2025 09:34
If you're struggling with the bs Overleaf outage, you can try going to www.overleaf.com/project/[PROJECTID]/download/zip to download the zip. It seems to sometimes work after a few minutes.
14.05.2025 09:03
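A hedged sketch of scripting the same trick, assuming you copy your logged-in Overleaf session cookie from the browser (the cookie name is an assumption and may differ); it simply retries the endpoint every minute:

```python
import time
import requests

def download_project_zip(project_id, session_cookie, out_path="project.zip", tries=5):
    url = f"https://www.overleaf.com/project/{project_id}/download/zip"
    cookies = {"overleaf_session2": session_cookie}  # assumed cookie name
    for _ in range(tries):
        resp = requests.get(url, cookies=cookies)
        if resp.status_code == 200:
            with open(out_path, "wb") as f:
                f.write(resp.content)
            return True
        time.sleep(60)  # "seems to sometimes work after a few minutes"
    return False
```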
Excited to be presenting our spotlight ICLR paper Simplifying Deep Temporal Difference Learning today! Join us in Hall 3 + Hall 2B Poster #123 from 3pm :)
25.04.2025 22:56
The techniques used by our work and Bhandari et al. are standard in the analysis of stochastic approximation algorithms and have been around for a long time. Moreover, the point of the blog was to be an expositional tool that acts as a complete analysis of TD. But sure, I'll add even more references...
21.03.2025 10:19
In our paper we quite clearly state at several points, including 'convergence of TD methods has been studied extensively (Watkins & Dayan, 1992; Tsitsiklis & Van Roy, 1997; Dalal et al., 2017; Bhandari et al., 2018; Srikant & Ying, 2019)' and 'our proof is similar to Bhandari et al. (2018).'
21.03.2025 10:12
Crucially, techniques that study linear function approximation could not be used to understand things like LayerNorm
21.03.2025 09:13
As far as I'm aware, and please correct me if I'm wrong, I've never seen the derivation of the path mean Jacobian before, which really is a key contribution of our analysis, as it allows us to study nonlinear systems (i.e. the ACTUAL neural nets used in practice) that many papers like Bhandari et al.'s can't.
21.03.2025 09:11
we cite said papers several times in our work and the blogs...
21.03.2025 09:06
PQN puts Q-learning back on the map and now comes with a blog post + Colab demo! Also, congrats to the team for the spotlight at #ICLR2025
20.03.2025 11:51
Simplifying Deep Temporal Difference Learning
A modern implementation of Deep Q-Network without target networks and replay buffers.
PQN blog 3/3: take a look at Matteo's 5-minute blog covering PQN's key features, plus a Colab demo with JAX & PyTorch implementations: mttga.github.io/posts/pqn/
For a deeper dive into the theory:
blog.foersterlab.com/fixing-td-pa...
blog.foersterlab.com/fixing-td-pa...
See you in Singapore! 🇸🇬
20.03.2025 10:28
There are so many great places in the world; if anything, it would be a positive to regularly see more conferences in countries other than the US/Austria/Canada.
20.03.2025 09:47
Fixing TD Pt II: Overcoming the Deadly Triad
PQN Blog 2/3: In this blog we show how to overcome the 'deadly triad' and stabilise TD using regularisation techniques such as LayerNorm and/or l_2 regularisation, deriving a provably stable deep Q-learning update WITHOUT ANY REPLAY BUFFER OR TARGET NETWORKS. @jfoerst.bsky.social @flair-ox.bsky.social
20.03.2025 09:01
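A minimal PyTorch sketch of the recipe described in the post, not the paper's exact code: a Q-network with LayerNorm after each hidden layer, a TD loss that bootstraps from the same network (no target network), and an explicit l_2 penalty:

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def td_loss(q_net, obs, act, rew, next_obs, done, gamma=0.99, l2_coef=1e-4):
    q = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # bootstrap from the SAME network: no target net
        target = rew + gamma * (1 - done) * q_net(next_obs).max(dim=1).values
    l2 = sum((p ** 2).sum() for p in q_net.parameters())  # l_2 regularisation
    return ((q - target) ** 2).mean() + l2_coef * l2
```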
Are academic conferences in the US a thing of the past?
19.03.2025 18:48
Fixing TD Pt I: Why is Temporal Difference Learning so Unstable?
PQN Blog 1/3: TD methods are the bread and butter of RL, yet they can have convergence issues when used in practice. This has always annoyed me. Find out below why TD is so unstable and how we can understand this instability better using the TD Jacobian. @flair-ox.bsky.social @jfoerst.bsky.social
19.03.2025 08:36
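For a concrete feel of the instability, here is a tiny runnable version of the classic two-state divergence example (a standard illustration, not taken from the blog): with features 1 and 2 and updates only from the first state, the expected TD update has Jacobian alpha*(2*gamma - 1) > 0, so the fixed point is unstable:

```python
gamma, alpha = 0.99, 0.1
phi_s, phi_next = 1.0, 2.0  # feature of the updated state and its successor
theta = 1.0
for _ in range(50):
    td_error = 0.0 + gamma * phi_next * theta - phi_s * theta  # reward is 0
    theta += alpha * td_error * phi_s  # expected TD(0) update
print(theta)  # grows without bound: TD diverges despite linear features
```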
Super excited to share that our paper, Simplifying Deep Temporal Difference Learning, has been accepted as a spotlight at ICLR! My fab collaborator Matteo Gallici and I have written a three-part blog on the work, so stay tuned for that! :)
@flair-ox.bsky.social
arxiv.org/pdf/2407.04811
18.03.2025 11:48
On it
15.11.2024 07:31
If you're an RL researcher or RL adjacent, pipe up to make sure I've added you here!
go.bsky.app/3WPHcHg
09.11.2024 16:42
Feel free to add me!
15.11.2024 07:28