Willem Röpke

@willemropke.bsky.social

PhD student | Interested in all things decision-making and learning

1,333 Followers  |  414 Following  |  59 Posts  |  Joined: 06.08.2023

Posts by Willem Röpke (@willemropke.bsky.social)

Post image 13.05.2025 14:20 — 👍 0    🔁 0    💬 0    📌 0

I think the Qwen team is missing out on a huge opportunity to basically be the default model in all NeurIPS submissions by not releasing Qwen3

22.04.2025 17:59 — 👍 1    🔁 0    💬 1    📌 0

Using LLMs to come up with prompts for LLMs to then ask the LLMs to then train the LLMs to then ....

10.04.2025 09:49 — 👍 2    🔁 0    💬 0    📌 0

Manifesting Qwen 3

09.04.2025 10:20 — 👍 0    🔁 0    💬 0    📌 0

RIP to my investments from the past few years, it was nice seeing the green while it lasted

04.04.2025 10:57 — 👍 0    🔁 0    💬 0    📌 0

The people demand Qwen3!

02.04.2025 16:08 — 👍 0    🔁 0    💬 0    📌 0

I've been bashing my head against a wall trying to make TRL and their new vllm-serve work and holy moly it's just an infinite pain

why must i suffer

24.03.2025 21:49 — 👍 0    🔁 0    💬 0    📌 0

Why does reading a book feel so much more satisfying than watching a TV show? Both are ways of consuming content so I don't get the difference

22.03.2025 19:11 — 👍 0    🔁 0    💬 0    📌 0

Bought a cherry coke by accident today.

Horrible things happening everywhere apparently

12.03.2025 13:57 — 👍 1    🔁 0    💬 0    📌 0

This is actually insanely clever, I would've never thought about this. Seems very interesting and important to fix!

12.03.2025 10:09 — 👍 0    🔁 0    💬 0    📌 0

I don't recall seeing a video in the recent past that depressed me as much as what I just watched unfolding in the Oval Office

28.02.2025 19:58 — 👍 2    🔁 0    💬 0    📌 0

Awesome!

20.02.2025 19:56 — 👍 1    🔁 0    💬 0    📌 0

5/ This was a collaborative effort across multiple universities that began over a year ago. A huge thanks to my co-authors for seeing it through with me and everyone who shared valuable insights along the way.

If you're interested in our work, I'd love to hear from you!

17.02.2025 13:22 — 👍 1    🔁 0    💬 0    📌 0

4/ Beyond RL, IPRO has applications in other domains like multi-objective path planning, which we’ve recently added support for to the codebase! If you work on decision-making under trade-offs, this might be relevant to you.

17.02.2025 13:22 — 👍 1    🔁 0    💬 1    📌 0

3/ By incorporating oracles with theoretical guarantees, we can leverage these for the multi-objective problem. At the same time, we can adapt strong RL algorithms such as DQN, A2C, and PPO, making IPRO both practical and theoretically sound.

17.02.2025 13:22 — 👍 0    🔁 0    💬 1    📌 0

2/ IPRO decomposes the multi-objective problem into a sequence of single-objective problems. By solving each step efficiently, it systematically explores the search space while keeping track of what remains.

17.02.2025 13:22 — 👍 0    🔁 0    💬 1    📌 0
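The decomposition idea in 2/ can be sketched in a few lines. This is a hypothetical toy illustration of scalarisation-based decomposition in general, not the actual IPRO algorithm from the paper; the candidate names and value vectors are made up.

```python
# Toy illustration: decompose a multi-objective problem into a sequence of
# single-objective subproblems. Each candidate policy has a 2-dim value
# vector; each weight vector defines one scalarised subproblem whose
# solution is a point on the trade-off front. (Assumed toy values.)

candidates = {
    "policy_a": (3.0, 1.0),  # (speed, safety)
    "policy_b": (2.0, 2.5),
    "policy_c": (1.0, 3.0),
}

def solve_single_objective(weights):
    """Solve one scalarised subproblem: argmax of the weighted sum."""
    def score(values):
        return sum(w * v for w, v in zip(weights, values))
    return max(candidates, key=lambda name: score(candidates[name]))

# Sweep over weight vectors; each subproblem recovers one trade-off point.
front = {solve_single_objective((w, 1.0 - w)) for w in (0.0, 0.5, 1.0)}
print(sorted(front))  # → ['policy_a', 'policy_b', 'policy_c']
```

IPRO itself uses referent-based subproblems with a bookkeeping step over the unexplored region rather than a plain weight sweep; see the linked paper and code for the real method.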

1/ In many real-world problems, agents must balance multiple conflicting objectives—think of self-driving cars optimising speed vs. safety or AI assistants trading off response quality vs. efficiency.

How can we design efficient RL algorithms for such settings?

17.02.2025 13:22 — 👍 0    🔁 0    💬 1    📌 0

Exciting news! My paper on multi-objective reinforcement learning was accepted at AAMAS 2025!

We introduce IPRO (Iterated Pareto Referent Optimisation)—a principled approach to solving multi-objective problems.

🔗 Paper: arxiv.org/abs/2402.07182
💻 Code: github.com/wilrop/ipro

17.02.2025 13:22 — 👍 26    🔁 5    💬 2    📌 2
Post image

This is unholy

12.02.2025 12:52 — 👍 3    🔁 0    💬 0    📌 0

How can I stop ChatGPT from talking to me with emojis? This is just the worst update I've ever experienced.

I've put it in its memory, in my details, and I even repeat it in the chat, but it's just replying like 👉🥺👈

12.02.2025 09:52 — 👍 0    🔁 0    💬 0    📌 0

Macron is the goat

French people don't appreciate true genius

11.02.2025 11:58 — 👍 1    🔁 0    💬 1    📌 0

Why did OpenAI update ChatGPT to use emojis in its responses? I hate it, and even when I explicitly say this it just keeps doing it.

11.02.2025 11:01 — 👍 0    🔁 0    💬 0    📌 0

To whoever put my email on some spam list: I fart in your general direction

05.02.2025 10:57 — 👍 0    🔁 0    💬 0    📌 0

The fact that in the year 2025 we are still dealing with the stupid "make the paper fit in an arbitrary format for the camera ready submission" minigame is killing me.

Either let me group authors or let me put acknowledgements after the main text. This isn't hard.

04.02.2025 08:14 — 👍 4    🔁 0    💬 2    📌 0

Does anyone have any good hacks for making the AAMAS template not suck for people with multiple affiliations? I lose a gazillion lines for basically no reason...

31.01.2025 08:29 — 👍 0    🔁 0    💬 1    📌 0

I found a very promising open problem in AI

Computing a MEDIAN over a list of rows where one of the elements is just an empty array

29.01.2025 14:45 — 👍 1    🔁 0    💬 0    📌 0
I think this is the best paper I’ve ever read: arxiv.org/abs/2404.03715

A strong emphasis on theoretically principled algorithms for RLHF followed by motivated practical implementations. Well-written and a clear overview of the relevant background and related work.

10/10 no comments

27.01.2025 18:50 — 👍 4    🔁 0    💬 0    📌 0

Deepseek making my day just a little better

20.01.2025 15:07 — 👍 2    🔁 0    💬 0    📌 0

I realise I'm woefully unqualified on this topic, but can someone please explain why we still don't have personal carrier drones? This seems like an obvious next step in transportation and given the state of our tech tree shouldn't be that hard?

20.01.2025 09:19 — 👍 1    🔁 0    💬 1    📌 0

I think we should do congestion pricing in a lot more places

15.01.2025 15:35 — 👍 5    🔁 0    💬 0    📌 0