what's the difference in your view?
12.12.2025 17:17

i discuss this in the gist text. this is the more correct way to frame it imo (the env provides observations, which the agent interprets as rewards based on its goals), and it also opens up possible variations in how to think about learning from the env.
06.12.2025 00:44

I complain a lot about RL lately, and here we go again.
The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment.
More at length here:
gist.github.com/yoavg/3eb3e7...
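To make the setup-level point concrete, here is a minimal toy sketch (all names are invented for illustration, not taken from the gist): in the standard formulation the environment computes and returns the reward; in the alternative framing the environment returns only observations, and the agent scores them against its own goals.

```python
# Toy sketch of the two setups; names are illustrative only.

class EnvWithReward:
    """Standard CS/RL setup: the reward function lives in the environment."""
    def step(self, action):
        observation = {"light": "green" if action == "press" else "red"}
        reward = 1.0 if action == "press" else 0.0  # the env decides what is good
        return observation, reward

class EnvObservationsOnly:
    """Alternative setup: the environment only emits observations."""
    def step(self, action):
        return {"light": "green" if action == "press" else "red"}

class Agent:
    """The agent interprets observations as rewards, based on its own goals."""
    def __init__(self, goal):
        self.goal = goal  # e.g. "green"

    def reward(self, observation):
        # the same observation can score differently for agents with other goals
        return 1.0 if observation["light"] == self.goal else 0.0

env, agent = EnvObservationsOnly(), Agent(goal="green")
obs = env.step("press")
r = agent.reward(obs)  # reward is computed inside the agent, not the env
```

In the second setup, changing the agent's goals changes the reward signal without touching the environment at all, which is where the possible variations in how to learn from the env come in.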

yes it sucks to be the ICLR organizers today, totally agree
28.11.2025 00:01

given that the data is already out and a large jsonl file is rumored to be floating around (which seems very plausible to me), i think the moral thing to do now would be to make the breached data publicly available for all rather than trying to hide it.
27.11.2025 23:32

RL is ok. but the jump from
A) people can be thought of as agents who observe an environment, act, observe the outcome, and update their beliefs
to:
B) let's model all things as a POMDP with a numeric reward function!
is just way too big for me
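For reference, the formal object that (B) commits to is the standard POMDP tuple, with a numeric reward function baked into the problem definition itself:

$$\mathcal{M} = \langle S, A, T, R, \Omega, O, \gamma \rangle, \qquad T(s' \mid s, a), \quad R : S \times A \to \mathbb{R}, \quad O(o \mid s', a), \quad \gamma \in [0, 1)$$

everything that stays informal in (A) (beliefs, outcomes, what counts as a good outcome) has to be squeezed into $R$ and $O$ for (B) to go through.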

the fascinating (to me) quality of hard-core RL researchers (e.g. Sutton) is the ability to hold an all-encompassing view of RL as the basis of intelligence while at the same time working on super low-level stuff like tabular TD algorithms, and to strongly believe these are actually the same thing
27.11.2025 16:32

totally agree
19.11.2025 02:07

[…] (i'm not […] keeping official photos of […]), does that come across as […]? (i'm mostly surprised i hadn't thought of it myself)
18.11.2025 17:58

as someone who sees himself […], and who never claimed […], can someone […] explain to me what […] here?
18.11.2025 16:24

i didn't get the direct reference […], but i […] see the […] actually being referenced there.. i mean, […], though i don't think […]
18.11.2025 15:39

[…] the main […]? i hear it in his other tweets, really. i mean, does he even care […] what he's […]? i personally don't get it, […] that it will improve […], […] that was cut short in an unclear way over […].
18.11.2025 15:37

i don't agree […]. […] not something separate. there was an explicit goal […] Israel's existence as a Zionist entity, on the part of the Arab states as well as the Palestinian […].
18.11.2025 15:11

but actually, this seems to me a bit […] and also a new kind of […]. […]? or is it just me?
18.11.2025 15:10

what's the latest-and-greatest attempt to reverse-engineer and document the inner workings of claude-code?
17.11.2025 10:23

(hmm i guess we can amend to "increase in the proportion of knowledge we believe to be true")
17.11.2025 07:00

i think memory is never "free", in the sense that the real bottleneck is not storage but the ability to retrieve the right thing while not retrieving a wrong (out-of-date) thing by mistake.
but assuming we do delete facts, is deleting considered learning in your definition?
is "increase" necessary? or is "change" enough? (although i guess that in an ideal form, you dont "forget" a wrong fact but add the fact that it is wrong, so you may consider it as increasing...)
16.11.2025 20:11

yes, following instructions in the prompt is not learning. but if a wrapping system stores items to inject into future prompts, then you can consider the system as learning.
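A sketch of such a wrapping system (hypothetical API, with `toy_llm` standing in for any frozen prompt-to-text model): the model itself never changes, but because the wrapper stores items and injects them into future prompts, the system as a whole adapts its behavior to past interactions.

```python
# Hypothetical wrapper: the stored notes, not the frozen model,
# are what makes the overall system "learn".

class LearningWrapper:
    def __init__(self, llm):
        self.llm = llm   # any prompt -> text function; the model stays frozen
        self.notes = []  # the learned state lives here

    def remember(self, note):
        self.notes.append(note)  # this storing step is the learning part

    def ask(self, user_message):
        prompt = "\n".join(self.notes + [user_message])
        return self.llm(prompt)  # inject stored items into the new prompt

def toy_llm(prompt):
    # stand-in model, just to make the sketch runnable
    return "yes" if "prefers short answers" in prompt else "a long answer..."

system = LearningWrapper(toy_llm)
system.remember("the user prefers short answers")
print(system.ask("should you keep it brief?"))  # behavior changed: -> yes
```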
16.11.2025 20:00

it will be in-context induction, and the storing and retention in external memory would be the learning.
16.11.2025 19:34

the storage, if it happens, is the learning part. the inference process is not learning.
16.11.2025 19:14

or as i wrote two years ago:
gist.github.com/yoavg/59d174...

i don't think it is a very useful view. at the very minimum we see extremely elaborate neighbor-matching and interpolation mechanisms, so the "glorified" part should be elaborated on and studied.
16.11.2025 17:25

i agree, where is "storing" in the above case?
16.11.2025 17:04

ah, cool!
16.11.2025 15:54

indeed kNN is also not learning. it's just a classification method. if you want to consider kNN as a learning method, then the learning part is just "store these pairs as is".
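A bare-bones sketch of that reading of kNN: the "training" step literally just stores the pairs, and all the actual work happens at classification time.

```python
# kNN where the "learning" is just storing the labeled pairs as-is.

def knn_fit(pairs):
    return list(pairs)  # the entire learning step: store, unchanged

def knn_predict(stored, x, k=3):
    # all the actual work happens here, at classification time
    by_dist = sorted(stored, key=lambda p: abs(p[0] - x))
    top_labels = [label for _, label in by_dist[:k]]
    return max(set(top_labels), key=top_labels.count)  # majority vote

model = knn_fit([(1, "a"), (2, "a"), (10, "b"), (11, "b"), (12, "b")])
print(knn_predict(model, 1.5))   # -> a
print(knn_predict(model, 10.5))  # -> b
```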
16.11.2025 15:54

i am not sure that it is (or rather, if everything is retrieval, then this term is useless)
16.11.2025 15:20

if we want to study the phenomenon, a non-misleading name may be better than a misleading one
16.11.2025 15:10

to me "learning" *requires* that something is stored for later use. again i don't care *where* it is stored, but *that* it is stored.
16.11.2025 15:09

this term is more accurate, but does not help with what i have in mind, which is to have a better name for the process that happens in ICL. some suggested "induction", which is OK but also not perfect (because the model both induces and applies).
16.11.2025 15:07