 
                        
                OpenAI releases a free GPT model that can run right on your laptop
                GPT-OSS is OpenAI's first open-weight model in six years.
            
        
    
    
            NEW: OpenAI is releasing two free open models today, ahead of the GPT-5 launch. One of the open-weight "GPT-OSS" models is small enough to run on a laptop. More from @alexeheath.com: www.theverge.com/openai/71878...
               
            
            
                05.08.2025 17:01
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            According to new research by Waymo, the neural nets in self-driving cars follow power-law scaling laws: more data and compute = better performance. waymo.com/blog/2025/06...
               
            
            
                14.06.2025 01:31
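A minimal sketch of what "power-law scaling" means in the Waymo post above, assuming the loss follows loss = a * compute**b with b < 0. The numbers are synthetic and made up for illustration (they are not Waymo's measurements), and `fit_power_law` is a hypothetical helper name.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the straight line through the log-log points is the exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    # The intercept in log space is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Synthetic data (assumption): loss = 5 * compute**-0.3
compute = [1e3, 1e4, 1e5, 1e6]
loss = [5.0 * c ** -0.3 for c in compute]
a, b = fit_power_law(compute, loss)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A power law is a straight line on a log-log plot, so the fit reduces to ordinary linear regression on the logarithms. Real measurements would be noisy, and the fitted exponent would only be an estimate.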
            
         
            
        
            
        
            
            
            
            
            
    
    
    
    
            True, but looking for data to back up what "you believe" is already a good sign, right? It seems better than making claims with nothing to back them up. And at least you can then argue with people who disagree on a scientific basis, using the data.
               
            
            
                05.06.2025 11:11
            
         
            
        
            
            
            
            
            
    
    
    
    
            Interested in more insights about the progress of AI? Check out these two sources:
www.bondcap.com/report/tai
www.ben-evans.com/presentations
               
            
            
                04.06.2025 14:27
            
         
            
        
            
        
            
            
            
            
            
    
    
    
    
            Not long ago, people laughed at the idea of AI generating minutes-long realistic videos. Now it's reality with tools like Sora and Veo 3 leading the way. Full movies in cinemas soon, generated from just a few prompts...
               
            
            
                25.05.2025 14:21
            
         
            
        
            
            
            
            
            
    
    
    
    
            Shoutout to the creators of PQN and of the CleanRL baselines.
               
            
            
                22.05.2025 11:39
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            My co-authors: Jacob Kooi and Zhao Yang
Paper: arxiv.org/abs/2505.15345
Codebase: github.com/Jacobkooi/Ha...
               
            
            
                22.05.2025 11:38
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Directly implementing the Hadamax encoder in other algorithms such as C51 also yields improvements of over 60%.
               
            
            
                22.05.2025 11:35
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            The Hadamax architecture can be implemented in any pixel-based encoder. The most important design choices are:
1. Convolutional Hadamard Representations.
2. Max-pooling instead of convolutional down-sampling.
3. Gaussian Error Linear Unit activations.
               
            
            
                22.05.2025 11:34
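The three design choices listed in the post above can be sketched in plain Python for a single-channel image. This is an illustration under stated assumptions, not the authors' implementation: it assumes a "convolutional Hadamard representation" is the element-wise product of two parallel, independently parameterized convolution branches, and the exact ordering of activation, product, and pooling in the paper may differ. The names `hadamax_block`, `conv2d_same`, and `max_pool2x2` are hypothetical.

```python
import math

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def conv2d_same(img, kernel):
    """Single-channel 'same' convolution (cross-correlation) with zero padding."""
    h, w = len(img), len(img[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += img[y][x] * kernel[di + k][dj + k]
            out[i][j] = total
    return out

def max_pool2x2(img):
    """Non-overlapping 2x2 max-pooling (used instead of a strided convolution)."""
    h, w = len(img), len(img[0])
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

def hadamax_block(img, kernel_a, kernel_b):
    """Two parallel conv branches -> GELU -> element-wise product -> max-pool."""
    branch_a = conv2d_same(img, kernel_a)
    branch_b = conv2d_same(img, kernel_b)
    product = [[gelu(u) * gelu(v) for u, v in zip(row_a, row_b)]
               for row_a, row_b in zip(branch_a, branch_b)]
    return max_pool2x2(product)

# Toy usage: a 4x4 image of ones and identity kernels in both branches.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
image = [[1.0] * 4 for _ in range(4)]
features = hadamax_block(image, identity, identity)
print(len(features), len(features[0]))
```

A real encoder would stack several such blocks over multi-channel feature maps with learned kernels, in a deep-learning framework rather than nested loops.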
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Without changing any algorithmic hyperparameters, this encoder substitution places Hadamax-PQN among state-of-the-art model-free reinforcement learning, while remaining an order of magnitude faster than Rainbow.
               
            
            
                22.05.2025 11:34
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            📢 New paper on arXiv: Hadamax Encoding: Elevating Performance in Model-Free Atari. (arxiv.org/abs/2505.15345)
Our Hadamax (Hadamard max-pooling) encoder architecture improves the recent PQN algorithm's Atari performance by 80%, allowing it to significantly surpass Rainbow-DQN!
               
            
            
                22.05.2025 11:33
            
         
            
        
            
            
            
            
            
    
    
    
    
            Making stock market predictions (especially short/medium term) is tempting but unless you have privileged information, you might as well try predicting random noise. Financial markets are self-adapting systems where any predictable pattern tends to be exploited and arbitraged away by participants.
               
            
            
                17.05.2025 12:10
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Just shared a new article on "The State of Reinforcement Learning for LLM Reasoning"!
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc)
Also, I cover 15 recent articles focused on RL & Reasoning.
magazine.sebastianraschka.com/p/the-state-...
               
            
            
                19.04.2025 13:48
            
         
            
        
            
            
            
            
            
    
    
    
    
            This is indeed a great position paper, I like it a lot:
- pre-training w next token prediction creates local minima in reasoning we can't escape => pre-training should also be done with RL
- long context windows lead to exploitation of spurious correlations
- disentangle reasoning and knowledge
               
            
            
                17.04.2025 07:04
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            The funny thing about multimodal image generation as released in the last week by Google and OpenAI is that now LLM image generation works like how most people using LLMs for the past two years always thought LLM image generation works.
               
            
            
                26.03.2025 01:17
            
         
            
        
            
        
            
            
            
            
            
    
    
            
            
            
                YouTube video by Amii
                TURING AWARD WINNER Richard S. Sutton in Conversation with Cam Linke | No Authorities in Science
            
         
    
    
            www.youtube.com/watch?v=9_Pe... An interview with Rich. The humility of Rich is truly inspiring: "There are no authorities in science". I wish people would listen and live by this.
               
            
            
                06.03.2025 20:50
            
         
            
        
            
        
            
            
            
            
            
    
    
            
                             
                        
                Ramon Llull AIRA Open Calls
                Open Calls  
In our inaugural call scheduled for December 2024, we aim to select up to 17 exceptional postdoctoral fellows, with an additional 16 to be chosen in Call 2 in 2025.  
20 December 2024 ...
            
        
    
    
            check this out: new postdoc program for AI-related research in Catalunya!
our group is looking to hire within this program, ideally to work on topics related to RL theory. in case you're interested, pls DM or email me.
(retweets appreciated!)
ramonllull-aira.eu/application
               
            
            
                22.01.2025 16:55
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            How DeepSeek R1's Multi-round Conversation works.
api-docs.deepseek.com/guides/reaso...
               
            
            
                20.01.2025 17:04
            
         
            
        
            
            
            
            
                                                 
                                                
    
    
    
    
            Bombshell from DeepSeek: the R1 family of models. Incredibly, it's MIT licensed and they encourage us to distill from it.
The core of the approach is reinforcement learning from verifiable rewards. No PRMs / MCTS. R1-zero doesn't even use SFT to start.
               
            
            
                20.01.2025 15:35
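To make "verifiable rewards" in the post above concrete, here is a minimal sketch of a rule-based reward that checks a model's final answer against a known reference, with no learned reward model involved. The `\boxed{...}` answer convention and the function names are assumptions chosen for illustration, not DeepSeek's actual implementation.

```python
import re

def extract_final_answer(completion):
    """Return the text inside the last boxed answer, or None if there is none."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion, reference):
    """1.0 if the extracted answer exactly matches the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0

# Toy usage with made-up completions.
good = r"Adding the terms gives 40 + 2, so the answer is \boxed{42}."
bad = "I think the answer is probably 41."
print(verifiable_reward(good, "42"), verifiable_reward(bad, "42"))
```

Because the check is a deterministic program, the reward cannot be gamed the way a learned reward model can, although exact string matching is stricter than a real grader would likely be.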
            
         
            
        
            
            
            
            
            
    
    
            
                             
                        
                2024 Robotics Year in Review
                Robotics finally feels like it's happening
            
        
    
    
            I probably don't need to tell you that 2024 was a huge year for robotics. As a long-time robotics researcher, it's been amazing to watch; some of the things that I always dreamed about actually seem to be happening.
For me, there are three big stories: itcanthink.substack.com/p/2024-robot...
               
            
            
                02.01.2025 18:15
            
         
            
        
            
            
            
            
            
    
    
    
    
            Super happy to reveal our new paper!
We trained a model to play four games, and its performance in each improves with both "external search" (MCTS using a learned world model) and "internal search", where the model outputs the whole plan on its own!
               
            
            
                05.12.2024 09:09
            
         
            
        
            
            
            
            
            
    
    
    
    
            RLDM will be held next year in Dublin!
A reminder that the call for workshops is out: rldm.org/call-for-wor...
The workshops are one of my favourite parts of the conference :) please get in touch if you have any questions!
               
            
            
                22.11.2024 09:57
            
         
            
        
            
            
            
            
            
    
    
    
    
            Hello, world! You seem a bit wilder than I expected, but here we are.
               
            
            
                18.11.2024 19:15
            
         
    
         
        
            
        
                            
                    
                    
                                            Researcher on MDPs and RL. Retired prof. #orms #rl
                                     
                            
                    
                    
                                            Postdoc @ University of Amsterdam | world models and sensorimotor abstractions for RL and cognitive modeling | 👾🤖🧠
https://cgumbsch.github.io
                                     
                            
                    
                    
                                            Biostatistician @IDEXX formerly at harvardmed, @BIDMChealth, @nasa. Big data, clinical trials, and medical diagnostics. Mainer. Opinions are my own. he/him
                                     
                            
                    
                    
                                            This is the official account of EWRL18 - European Workshop on Reinforcement Learning 
Official website: https://euro-workshop-on-reinforcement-learning.github.io/ewrl18/
                                     
                            
                    
                    
                                            Anthropologist + Data Scientist. Cofounder at aampe.com
                                     
                            
                    
                    
                                            Professor for Machine Learning, University of Tübingen, Germany
                                     
                            
                    
                    
                                            Assistant Professor in Robotics and Autonomous Systems Heriot-Watt University & The National Robotarium
                                     
                            
                    
                    
                                            Assistant Professor at the Department of Computer Science, University of Liverpool.
https://lutzoe.github.io/
                                     
                            
                    
                    
                                            PhD student at INRIA (FLOWERS team) working on LLM4code | Prev. (MVA) ENS ParisSaclay
                                     
                            
                    
                    
                                            Empowering Businesses Through Tech | Software Development | Digital Marketing & Growth Hacking | AI-driven Web Dev Enthusiast | ML Research Tinkerer
                                     
                            
                    
                    
                                            Assistant Professor at UW and Staff Research Scientist at Google DeepMind. Social Reinforcement Learning in multi-agent and human-AI interactions. PhD from MIT. Check out https://socialrl.cs.washington.edu/ and https://natashajaques.ai/.
                                     
                            
                    
                    
                                            Principal Researcher in AI/ML/RL Theory @ Microsoft Research NE/NYC. Previously @ MIT, Cornell. http://dylanfoster.net
RL Theory Lecture Notes: https://arxiv.org/abs/2312.16730
                                     
                            
                    
                    
                                            Assistant Professor at Maastricht University.
Research interests: AI, RL, games. Tic-Tac-Toe aficionado. Opinions my own, but should be everyone's.
Anon feedback: admonymous.co/dennis-soemers
                                     
                            
                    
                    
                                            Research Scientist @ Idiap Research Institute. @idiap.bsky.social
Adjunct lecturer @ Australian Institute for ML. @aimlofficial.bsky.social
Occasionally cycling across continents.
https://www.damienteney.info
                                     
                            
                    
                    
                                            Associate Professor - University of Alberta
Canada CIFAR AI Chair with Amii
Machine Learning and Program Synthesis
he/him; ele/dele 🇨🇦 🇧🇷
https://www.cs.ualberta.ca/~santanad
                                     
                            
                    
                    
                                            Savorer of NaN | machine learning, data, code | here for the preprints | research scientist at NVIDIA, ex-Supercell, ex-Nokia | opinions mine
                                     
                            
                    
                    
                                            Machine Learning Researcher
https://alexalemi.com
https://blog.alexalemi.com
                                     
                            
                    
                    
                                            Postdoc at Utrecht University, previously PhD candidate at the University of Amsterdam
Multimodal NLP, Vision and Language, Cognitively Inspired NLP 
https://ecekt.github.io/
                                     
                            
                    
                    
                                            Assistant prof at TU Graz, formerly assistant prof at TU Eindhoven, Marie-Curie Fellow at University of Cambridge. Probabilistic Machine Learning.
                                     
                            
                    
                    
                                            Postdoc at University of California, Riverside