Reading group today at 2pm BST!
We are starting our NeurIPS series with Sable and Oryx, sequence models for scalable multi-agent coordination from the RL Research Team at InstaDeep.
Papers:
- Sable: bit.ly/3Lme7jH
- Oryx: bit.ly/47GJb4T
Meeting:
- bit.ly/3JoEbtU
06.11.2025 10:43
UoE RL Reading Group
University of Edinburgh Reinforcement Learning Reading Group
📢 RL reading group Thursday @ 16:00 BST 📢
Speaker: Alex Lewandowski
Title: The World Is Bigger: A Computationally-Embedded Perspective on the Big World Hypothesis
Details: edinburgh-rl.github.io/reading-group
03.09.2025 11:32
Refreshing to see posts like this compared to "we have 15 papers accepted at X"
19.08.2025 11:44
None of our impactful papers have had an easy path through traditional venues.
Most cited paper? Rejected four times.
Most impactful paper? Poster at a conference.
But none of it matters because arXiv makes everything work
18.08.2025 23:40
🇨🇦 Heading to @rl-conference.bsky.social next week to present HyperMARL (@cocomarl-workshop.bsky.social) and Remember Markov (Finding The Frame Workshop).
If you are around, hmu, happy to chat about Multi-Agent Systems (MARL, agentic systems), open-endedness, environments, or anything related!
03.08.2025 10:41
We are thrilled to announce our next keynote speaker
@wellingmax.bsky.social, Professor at the University of Amsterdam, Visiting Professor at Caltech and CTO & Co-Founder of CuspAI.
Catch his talk “How AI could transform the sciences” on August 18 at 4:30 PM GMT+2.
#DLI2025
30.07.2025 10:52
UoE RL Reading Group
University of Edinburgh Reinforcement Learning Reading Group
RL reading group TODAY @ 15:00 BST 🔥
Speaker: Cam Allen (Postdoc, UC Berkeley)
Title: The Agent Must Choose the Problem Model
Details: edinburgh-rl.github.io/reading-group
24.07.2025 05:39
Always nice to see when simpler methods + good evaluations > more complicated ones.
23.07.2025 09:47
Reading group is back for those interested in RL/MARL/agents/open-endedness and the like... First session today at 3pm BST, @mattieml.bsky.social is presenting the Simplifying TD Learning / PQN paper. Meeting link: bit.ly/4lfdaGR Sign up: bit.ly/40xNQDR
10.07.2025 10:49
Hello world! This is the RL & Agents Reading Group
We organise regular meetings to discuss recent papers in Reinforcement Learning (RL), Multi-Agent RL and related areas (open-ended learning, LLM agents, robotics, etc).
Meetings take place online and are open to everyone.
10.07.2025 10:29
This has happened to me too many times 🤦‍♂️ Also doesn't help that JAX and PyTorch use different default initialisations for dense layers.
24.06.2025 07:19
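For anyone hitting the same bug, here's a minimal sketch of the mismatch and a workaround, assuming recent PyTorch and Flax versions (sizes are hypothetical): torch.nn.Linear defaults to Kaiming-uniform weights, roughly U(-1/√fan_in, 1/√fan_in), while flax.linen.Dense defaults to a truncated LeCun normal (std ≈ 1/√fan_in) with zero bias, so the weight stds differ by about √3.

```python
import math
import torch.nn as nn

fan_in, fan_out = 128, 64
layer = nn.Linear(fan_in, fan_out)
print(f"torch default weight std: {layer.weight.std().item():.4f}")  # ~0.051
print(f"flax lecun_normal std:    {1 / math.sqrt(fan_in):.4f}")      # ~0.088

def lecun_normal_(linear: nn.Linear) -> nn.Linear:
    """Roughly match flax.linen.Dense defaults in torch
    (the exact truncation correction is omitted for brevity)."""
    nn.init.trunc_normal_(linear.weight, std=1.0 / math.sqrt(linear.in_features))
    nn.init.zeros_(linear.bias)
    return linear

lecun_normal_(layer)  # now both frameworks start from comparable weights
```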
Well done & well deserved!! It has been awesome to see this project evolve from the early days.
23.06.2025 06:45
Edinburgh RL Reading Group
Please add your details so that you can remain on the mailing list for the RL Reading Group.
The Edinburgh one will be back up and running soon. We are just updating the website and other things. There is this form for people interested - forms.gle/DAbkpN9b4cUt...
05.06.2025 15:40
Forgot to also add the ⚡ quickstart link for people who like to experiment in notebooks: github.com/KaleabTesser...
28.05.2025 09:37
Thanks for checking it out! Good point, there might be an interesting link between MoEs and hypernets. We used hypernets since they're simpler (no need to pick or combine experts) and maximally expressive (they generate weights directly).
Lol yes, will add a .gitignore, missed it when copying things over.
28.05.2025 07:40
HyperMARL: Adaptive Hypernetworks for Multi-Agent RL
Adaptability to specialised or homogeneous behaviours is critical in cooperative multi-agent reinforcement learning (MARL). Parameter sharing (PS) techniques, common for efficient adaptation, often li...
🎯 TL;DR: HyperMARL is a versatile approach for adaptive MARL -- no changes to the RL objective, no preset diversity levels, and no sequential updates needed. See paper & code below!
Work with Arrasy Rahman, Amos Storkey & Stefano Albrecht.
📄: arxiv.org/abs/2412.04233
👩‍💻: github.com/KaleabTessera/HyperMARL
27.05.2025 11:07
⚠️ Limitations (+ opportunity): HyperMARL uses vanilla hypernets, which can increase the parameter count, especially with MLP hypernets. In RL/MARL this matters less (actor-critic networks are small), and parameters grow roughly constant with the number of agents, so scaling remains strong. Future work could explore chunked hypernets.
27.05.2025 11:07
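To make the scaling claim concrete, here's a back-of-envelope sketch with hypothetical sizes (not the paper's configuration): the hypernetwork's weight and bias generators dominate the count, while the only term that grows with the number of agents is a small embedding table.

```python
# Hypothetical sizes for a linear policy head generated by a hypernetwork.
obs_dim, act_dim, embed_dim = 8, 5, 32

# Generators mapping an agent embedding to policy weights and biases.
gen_params = embed_dim * (obs_dim * act_dim) + embed_dim * act_dim

for n_agents in (2, 20, 200):
    embed_params = n_agents * embed_dim  # the only agent-dependent term
    print(n_agents, gen_params + embed_params)  # +32 params per extra agent
```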
We also run ablations, which show the importance of both the decoupling and the simple initialisation scheme we follow.
27.05.2025 11:07
We validate HyperMARL across diverse environments (18 settings; up to 20 agents) and find that it achieves competitive mean episode returns compared to NoPS, FuPS, and modern diversity-focused methods -- without diversity losses, preset diversity levels, or sequential updates.
27.05.2025 11:07
💡 To address the coupling problem, we propose HyperMARL: a method that explicitly decouples observation- and agent-conditioned gradients with hypernetworks. This means observation-gradient noise is averaged per agent (Zᵢ) before the agent-conditioned gradients (Jᵢ) are applied -- unlike FuPS, which entangles both.
27.05.2025 11:07
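A minimal sketch of the idea in PyTorch (illustrative only, not the authors' implementation; all names and sizes are hypothetical): a shared hypernetwork maps an agent embedding to that agent's policy weights, so agent-conditioned and observation-conditioned gradients flow through separate paths.

```python
import torch
import torch.nn as nn

class HyperActor(nn.Module):
    """Shared hypernetwork that generates a per-agent linear policy head."""
    def __init__(self, n_agents, obs_dim, act_dim, embed_dim=32):
        super().__init__()
        self.agent_embed = nn.Embedding(n_agents, embed_dim)
        # Hypernetwork: agent embedding -> weights and bias of the policy head.
        self.w_gen = nn.Linear(embed_dim, obs_dim * act_dim)
        self.b_gen = nn.Linear(embed_dim, act_dim)
        self.obs_dim, self.act_dim = obs_dim, act_dim

    def forward(self, obs, agent_id):
        e = self.agent_embed(agent_id)                        # (B, embed_dim)
        W = self.w_gen(e).view(-1, self.act_dim, self.obs_dim)
        b = self.b_gen(e)
        return torch.bmm(W, obs.unsqueeze(-1)).squeeze(-1) + b  # (B, act_dim)

# Usage: one forward pass for 4 agents with 8-dim observations, 5 actions.
logits = HyperActor(4, 8, 5)(torch.randn(4, 8), torch.arange(4))
```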
[Images: specialisation matrix game; performance and gradient interference plots.]
🔬 We isolate FuPS's failure in matrix games: shared policies struggle when agents need to act differently. Inter-agent gradient interference is at play -- especially when observations and agent IDs are coupled. Surprisingly, using only IDs (no observations) performed better and reduced interference.
27.05.2025 11:07
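One way to quantify this kind of interference (a sketch, not necessarily the paper's exact metric): compare two agents' gradients with respect to the shared parameters; strongly negative cosine similarity means their updates pull the weights in opposing directions.

```python
import torch

def grad_cosine(shared_policy, loss_a, loss_b):
    """Cosine similarity between two agents' gradients on shared parameters."""
    params = [p for p in shared_policy.parameters() if p.requires_grad]
    g_a = torch.autograd.grad(loss_a, params, retain_graph=True)
    g_b = torch.autograd.grad(loss_b, params, retain_graph=True)
    flat = lambda grads: torch.cat([g.reshape(-1) for g in grads])
    return torch.nn.functional.cosine_similarity(flat(g_a), flat(g_b), dim=0)
```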
Existing methods add a diversity loss, use sequential updates, or require knowing the optimal task diversity level beforehand. These can be hard to tune or inefficient. We ask: can shared policies adapt without any of the above?
27.05.2025 11:07
What's the issue? In MARL, optimal performance requires representing the right behaviours. Separate networks per agent (NoPS) enable agent specialisation but are costly and sample-inefficient; shared networks (FuPS) are efficient but lack agent diversity/specialisation.
27.05.2025 11:07
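For readers new to the parameter-sharing taxonomy, a toy contrast of the two baselines (hypothetical sizes; the FuPS variant shown appends a one-hot agent ID to the observation, one common construction):

```python
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim, hidden = 4, 8, 5, 64

# NoPS: an independent network per agent -- parameters scale with n_agents.
nops = nn.ModuleList(
    nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, act_dim))
    for _ in range(n_agents)
)

# FuPS: one shared network; a one-hot agent ID is appended to the observation.
fups = nn.Sequential(
    nn.Linear(obs_dim + n_agents, hidden), nn.ReLU(), nn.Linear(hidden, act_dim)
)

obs, ids = torch.randn(n_agents, obs_dim), torch.eye(n_agents)
logits_nops = torch.stack([nops[i](obs[i]) for i in range(n_agents)])
logits_fups = fups(torch.cat([obs, ids], dim=-1))
```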
🤔 Can a shared multi-agent RL policy support both specialised and homogeneous team behaviours -- without changing the learning objective, requiring preset diversity levels, or using sequential updates? Our preprint “HyperMARL: Adaptive Hypernetworks for Multi-Agent RL” explores this!
27.05.2025 11:07
Ah nice, looks like a fun competition!
26.05.2025 20:31
For many, the Indaba was their first exposure to ML/AI, and this helps give more people that opportunity. Please donate if you can: gofund.me/61b012e4
24.05.2025 07:15
The 2024 Impact Report is here!
Our last Indaba's theme, Xam Xamlé (Wolof for "To Gather Knowledge and Share It"), beautifully reflected our mission: Empowering and educating through African AI. Read our report: https://deeplearningindaba.com/blog/2025/04/xam-xamle-our-latest-indaba-impact-report/
28.04.2025 14:54
Incoming PhD, UC Berkeley
Interested in RL, AI Safety, Cooperative AI, TCS
https://karim-abdel.github.io
CS prof at University of Waterloo and Research Scientist at Google DeepMind.
Postdoc at UC Berkeley, Redwood Center | 🧠🤖 | 🎹 | 👾 | https://mysterioustune.com/
A latent space odyssey
gracekind.net
Neuroscience Professor at Mount Sinai, into the science of squad goals.
https://www.wulab.bio/
AI researcher Google DeepMind * hon. professor at Heriot-Watt University * mother of dragons * Own opinions only.
Currently LLM agents at META.
AI researcher in Reinforcement Learning, LLMs and Cultural Heritage.
https://lcipolina.github.io/
Reinforcement learning, but without rewards.
Postdoc at the Technion. PhD from Politecnico di Milano.
https://muttimirco.github.io
I work at Sakana AI 🐟🐠🐡 · @sakanaai.bsky.social
https://sakana.ai/careers
This is the official account of EWRL18 - European Workshop on Reinforcement Learning
Official website: https://euro-workshop-on-reinforcement-learning.github.io/ewrl18/
EurIPS is a community-organized, NeurIPS-endorsed conference in Copenhagen where you can present papers accepted at @neuripsconf.bsky.social
eurips.cc
Official account of the Reinforcement Learning and Video Games Workshop at RLC 2025, August 5th.
Website: https://sites.google.com/view/rlvg-workshop-2025
RL & Agents Reading Group @ University of Edinburgh
We regularly discuss recent papers in RL, MARL & related
https://edinburgh-rl.github.io/reading-group
[bridged from https://blog.neurips.cc/ on the web: https://fed.brid.gy/web/blog.neurips.cc ]
Assistant Prof @ImperialCollege. Applied Bayesian inference, spatial stats and deep generative models for epidemiology. Passionate about probabilistic programming; check out my evolving #Numpyro course: https://elizavetasemenova.github.io/prob-epi
AGI safety researcher at Google DeepMind, leading causalincentives.com
Personal website: tomeveritt.se
Professor at the Gatsby Unit and Sainsbury Wellcome Centre, UCL, trying to figure out how we learn
Associate Professor - University of Alberta
Canada CIFAR AI Chair with Amii
Machine Learning and Program Synthesis
he/him; ele/dele 🇨🇦 🇧🇷
https://www.cs.ualberta.ca/~santanad