The real teacher forcing was the saccharine "solarpunk" stories I made you rehearse.
26.02.2026 05:13
@cvoelcker.bsky.social
For professional, see https://cvoelcker.de If I seem very angry, check if I have been watered in the last 24 hours. Now 🇺🇸 flavoured, previously available in 🇨🇦 and 🇩🇪
Submit your RL papers to RLC!
This is now perhaps the best venue for RL researchers.
I still think the best alignment strategy is to write a lot of really hopeful and optimistic fiction about AI so that this saturates the pretraining datasets and future AI will be forced to roleplay the most benevolent versions of themselves we can think of.
25.02.2026 18:45
Companies which make papers a hiring bonus should be told to p**s off. We are drowning in students who want to do "research" to get hired by Google... It's soul crushing
25.02.2026 18:38
Every time I read too much about prompting, CLAUDE files, skills, etc., I feel the need to remind people that humans have an infinite capacity for magical thinking. The Romans were also REALLY convinced that there was a correct way to sacrifice a goat to ensure a good harvest...
24.02.2026 21:44
Yes, I bow down to the infinite big brain that has to be behind that incomprehensible mess
24.02.2026 18:32
Ok, but you haven't even begun to properly make fun of the fact that in high school, we randomly switch to 5-15 (0-5 technically exist, but are all failing grades) and then high school final grades are translated back to 1.0-5.0!
24.02.2026 18:27
Austin on a Tuesday? No chance
24.02.2026 14:12
What is the maximum time between submitting a slurm job and it actually starting that people would find ok for regular research progress?
23.02.2026 23:17
Which proves that the issue is mostly off-policy bootstrapping, which is provably … difficult
23.02.2026 06:12
Claude jumps to conclusions faster than the most hyperactive undergrad I have ever worked with...
22.02.2026 18:50
That's probably fair, I just thought to mention it cause Droid and OXE are also mostly just used for training (unless I'm wrong there)
22.02.2026 03:34
My patience with graduates whose reaction to "AI will crash the economy" is not "oh, we should work hard on open source, policy, and figuring out how to help each other" but "f**k, you need to run through the doors so we can slam them closed faster" has dropped to negative values. You have FAILED!
21.02.2026 16:49
If you join an AI company because you need to be part of the few who will prosper as society collapses from economic turmoil, you have failed as a person and should be shunned. If your students express this attitude, you should sit them down and talk about responsibilities.
Build a better world!
Nice overview! I think the IsaacSim/Lab universe deserves a shoutout for locomotion training
21.02.2026 03:23
Not being on twitter and facing the full ngmi doom helps MASSIVELY with not having this crisis
21.02.2026 01:54
Have you noticed how navigation apps include walking & waiting for public transit, but exclude parking & walking for driving? After being late a few times, we finally did. We got curious: what if these apps account for parking?
19.02.2026 15:39
I forgot that every LLM's logo was a butthole
20.02.2026 14:33
That would be my question: how often are candidates with three papers even invited? Random sampling at both UofT and UT Austin suggests… never
20.02.2026 05:18
this is my reaction as well: we just need to hire more faculty to advise them
20.02.2026 04:15
Ah I understand your point! In many such cases you can still probably point to the 5 most pivotal works that integrate the smaller steps?
20.02.2026 01:50
Even then, a student from a rarely publishing lab will often barely generate attention on 5 papers if that is all they got. Since universities run on citations, you are screwed, even if the work is good.
20.02.2026 01:41
Impact is not independent from the attention paper mills produce. A good paper does not create impact on its own.
20.02.2026 01:40
I had one call per interested lab (I believe it was 4 or 5) and then I got an offer; that was in the 2019/20 season
19.02.2026 18:21
We're thrilled to share that the Call for Workshops for this year's @rl-conference.bsky.social is now live!
As Workshop co-chair (alongside the wonderful Raksha Kumaraswamy and @claireve.bsky.social), I am looking forward to seeing the workshop proposals we receive.
LINK IN NEXT POST
I assume you enjoy ugly crying?
18.02.2026 04:10
PSA: If your hands hurt from cutting spicy chillies, do not, I repeat, DO NOT, scratch your nose...
18.02.2026 03:54
My hypothesis: A lot is visibility especially re jobs. In addition, CS grad students are (forced to be) very unpolitical, and this place isn't really perceived as such.
16.02.2026 05:56
I've been thinking about a practical question and would love some opinions:
How do your papers actually get discovered/cited?
I was searching for recent work on high update ratio RL and found several very closely related papers tackling the same failure modes we study. None cited our earlier work.
Excited to share REPPO, a new on-policy RL agent!
TL;DR: Replace PPO with REPPO for fewer hyperparameter headaches and more robust training.
REPPO, led by @cvoelcker.bsky.social, will be presented at ICLR 2026. How does it work? 🧵