
@sacha2.bsky.social

183 Followers  |  428 Following  |  1,668 Posts  |  Joined: 18.10.2024

Latest posts by sacha2.bsky.social on Bluesky

Currently, I know more negative than positive examples. At least, you are in a position to change that

28.10.2025 22:18 — 👍 1    🔁 0    💬 1    📌 0

Mmmm

28.10.2025 22:10 — 👍 0    🔁 0    💬 1    📌 0

Why do you think all the tasks aren't aggregated in one place, like the Prime Intellect hub or a Kaggle arena?

28.10.2025 21:10 — 👍 0    🔁 0    💬 1    📌 0
BattleshipQA: Shoot First, Ask Questions Later?

Battleship arena for LLMs

another cool imperfect information task

www.gabegrand.com/battleship/

cc @sharky6000.bsky.social

28.10.2025 21:09 — 👍 2    🔁 0    💬 2    📌 0

bluesky is for the American type of people

uncomfortable

28.10.2025 20:53 — 👍 0    🔁 0    💬 0    📌 0
Grokipedia (@Grokipedia) / X

same type beat
x.com/Grokipedia

28.10.2025 20:09 — 👍 0    🔁 0    💬 0    📌 0

Can you share a screenshot, please? I have never seen one either

28.10.2025 13:54 — 👍 1    🔁 0    💬 0    📌 0

le chat

28.10.2025 13:07 — 👍 1    🔁 0    💬 0    📌 0

Congratulations

28.10.2025 12:44 — 👍 1    🔁 0    💬 0    📌 0

everyone has recently started writing code so fast; entire projects are being done over a weekend

interesting to think about how to get better at this

27.10.2025 18:46 — 👍 0    🔁 0    💬 0    📌 0

Large Action Models: From Inception to Implementation

Lu Wang, Fangkai Yang, Chaoyun Zhang et al.

Action editor: Edward Grefenstette

https://openreview.net/forum?id=bYdKtf0Q31

#agent #agents #ai

27.10.2025 16:18 — 👍 2    🔁 1    💬 0    📌 0

Thank you for the clarification

27.10.2025 13:00 — 👍 0    🔁 0    💬 0    📌 0

First time I've seen someone call the Simplex algorithm fast, but OK. It's like calling a hashmap or A* fast. Let's check out the paper.

27.10.2025 06:15 — 👍 1    🔁 0    💬 1    📌 0

Nowadays, we say "chopped"

26.10.2025 16:51 — 👍 0    🔁 0    💬 0    📌 0

31k submissions to AAAI is fucking crazy

26.10.2025 15:25 — 👍 1    🔁 0    💬 0    📌 0
Replicable Reinforcement Learning with Linear Function Approximation: Replication of experimental results has been a challenge faced by many scientific disciplines, including the field of machine learning. Recent work on the theory of machine learning has formalized rep...

I think I posted about it before, but never with a thread. We recently put a new preprint on arXiv.

📖 Replicable Reinforcement Learning with Linear Function Approximation

🔗 arxiv.org/abs/2509.08660

In this paper, we study formal replicability in RL with linear function approximation. The... (1/6)

26.10.2025 14:16 — 👍 20    🔁 6    💬 2    📌 1

still true, I reckon

25.10.2025 17:02 — 👍 1    🔁 1    💬 0    📌 0

I now understand what you're up to. However, I was mistakenly thinking about PPO the whole time, which is slightly different and may require additional assumptions for convergence and, thus, a deliberate notion of policy.

Sorry.

25.10.2025 15:18 — 👍 1    🔁 0    💬 1    📌 0

Not necessary, afaik

25.10.2025 15:15 — 👍 0    🔁 0    💬 1    📌 0

However, if you formulate the problem as a more general stochastic optimisation over a sequence of tokens, then maybe the notion of a policy is unnecessary

25.10.2025 15:10 — 👍 0    🔁 0    💬 0    📌 0

an RL problem, if it can be denoted as such. For example, if one wants to fine-tune their LLM with RL, the solution of the problem would be the policy, or a function of the model

25.10.2025 15:07 — 👍 0    🔁 0    💬 1    📌 0
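(A minimal sketch of the identification in the post above, assuming the standard autoregressive setup: the fine-tuned LLM with weights θ is itself the policy, with the prompt x plus the tokens generated so far as the state and the next token as the action.)

\pi_\theta(a_t \mid s_t) = p_\theta\left(y_t \mid x,\, y_{<t}\right), \qquad s_t = (x,\, y_{<t}), \quad a_t = y_t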

When you present the results of any RL-based experiment, you need to evaluate your model to get the evaluation return. But with respect to what policy do you get the return? That's why the notion of the policy, even for language generation, should be clear

25.10.2025 15:03 — 👍 0    🔁 0    💬 1    📌 0

Yes, but you still need to understand what the policy of the LLM is, because the optimisation objective is to find a policy that maximises the cumulative return...

25.10.2025 14:57 — 👍 0    🔁 0    💬 1    📌 0
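(For concreteness, a standard statement of the objective under discussion, assuming an episodic setting with horizon T and discount factor γ; whichever parametrisation π_θ one picks for the LLM, this is the quantity the optimiser targets, and the policy that the evaluation return must refer to.)

J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \right], \qquad \theta^{\star} \in \arg\max_{\theta} J(\pi_\theta)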

Maybe it's upsetting due to the choice of words, but a language model can indeed act as a policy or strategy. Refer to the paper "Strings as strategies...". However, to be fair, the definition could be written better.

25.10.2025 14:51 — 👍 0    🔁 0    💬 1    📌 0

PPO is still more than a bag of heuristics, I would argue. It may have been proposed as a heuristic approximation of the trust-region method, but over time it has become a theoretically grounded procedure.

25.10.2025 14:45 — 👍 3    🔁 0    💬 1    📌 0
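(For reference, the clipped surrogate objective from the PPO paper, Schulman et al., 2017: r_t(θ) is the probability ratio against the old policy and Â_t an advantage estimate; the clip keeps updates close to π_θ_old, which is exactly the trust-region connection the post above mentions.)

r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}, \qquad L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right]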

Btw, have you got any opinion about "just a band" Dead Poets Society?

25.10.2025 14:21 — 👍 0    🔁 0    💬 0    📌 0

67

25.10.2025 12:35 — 👍 1    🔁 0    💬 1    📌 0
Bob Vylan - He's a Man (YouTube video by Bob Vylan)

btw every masterpiece has its own cheap copy, like
youtu.be/anlghGgWuAs?...

24.10.2025 20:05 — 👍 1    🔁 0    💬 0    📌 0

what I won't add to the list: Ivan Dorn, lol

24.10.2025 19:59 — 👍 0    🔁 0    💬 0    📌 0
Thou Shalt Always Kill (Original Video) Higher Quality - dan le sac Vs Scroobius Pip (YouTube video)

-- dan le sac Vs Scroobius Pip, "Thou Shalt Always Kill"

youtu.be/CWrMGXwhFLk?...

24.10.2025 19:41 — 👍 1    🔁 1    💬 2    📌 0
