play this
www.puzzlescript.net/play.html?p=...
in the trailer, the mentions use the full names for "Mirror Isle" and "Skipping Stones to Lonely Homes", and "Heroes" for Heroes of Sokoban, so I'd be surprised if the intention is to hide anything.
14.12.2025 10:04 — 👍 1 🔁 0 💬 0 📌 0
for the record i don't think language is "solved". the parts i cared about solving, though, are to a large extent "solved", to the extent that the remaining "non-solved" parts are imo not linguistic
13.12.2025 06:46 — 👍 2 🔁 0 💬 1 📌 0
why?
12.12.2025 19:28 — 👍 0 🔁 0 💬 0 📌 0
whats the difference in your view?
12.12.2025 17:17 — 👍 0 🔁 0 💬 0 📌 0
i discuss this in the gist text. this is the more correct way to frame it imo (env provides observations, which agent interprets as rewards based on its goals), and it also opens up possible variations in how to think about learning from the env.
06.12.2025 00:44 — 👍 0 🔁 0 💬 0 📌 0
I complain a lot about RL lately, and here we go again.
The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment.
More at length here:
gist.github.com/yoavg/3eb3e7...
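A minimal sketch of the framing in the post above, assuming a toy setup that is not taken from the gist: the hypothetical `GridEnv` returns observations only, and each hypothetical `Agent` turns an observation into a reward according to its own goal. All names here are illustrative.

```python
from dataclasses import dataclass


class GridEnv:
    """Hypothetical 1-D world. It reports observations and knows nothing about rewards."""

    def __init__(self, size: int = 5) -> None:
        self.size = size
        self.pos = 0

    def step(self, action: int) -> dict:
        # Move left (-1) or right (+1), clamped to the grid, and report the new state.
        self.pos = max(0, min(self.size - 1, self.pos + action))
        return {"pos": self.pos}


@dataclass
class Agent:
    """Hypothetical agent. The reward computation lives here, not in the environment."""

    goal_pos: int

    def reward(self, obs: dict) -> float:
        # The agent interprets the observation in light of its own goal.
        return 1.0 if obs["pos"] == self.goal_pos else 0.0


env = GridEnv()
agent_a = Agent(goal_pos=2)
agent_b = Agent(goal_pos=4)

env.step(+1)
obs = env.step(+1)  # the environment is now at position 2

# Same observation, different rewards, because the goals differ.
rewards = (agent_a.reward(obs), agent_b.reward(obs))
```

With this split, two agents can share one environment and disagree about how good an outcome is, which is hard to express when the reward is part of the environment's return value.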
yes it sucks to be the ICLR organizers today, totally agree
28.11.2025 00:01 — 👍 2 🔁 0 💬 0 📌 0
given that the data is already out and a large jsonl file is rumored to be floating around (which seems very plausible to me), i think the moral thing to do now would be to make the breached data publicly available for all rather than trying to hide it.
27.11.2025 23:32 — 👍 2 🔁 0 💬 1 📌 0
RL is ok. but the jump from
A) people can be thought of as agents who observe an environment, act, observe the outcome and update their beliefs
to:
B) lets model all things as a POMDP with a numeric reward function!
is just way too big for me
the fascinating (to me) quality of hard-core RL researchers (e.g Sutton) is the ability to have an all encompassing view of RL as the basis of intelligence, while at the same time working on super low level stuff like tabular TD algorithms, and yet strongly believe these are actually the same thing
27.11.2025 16:32 — 👍 20 🔁 0 💬 1 📌 0
totally agree
19.11.2025 02:07 — 👍 1 🔁 0 💬 0 📌 0
and this email (i assume it's for official photos, for some purpose of some organization) is perceived as censorship? (i'm just surprised because i wouldn't have thought of it that way)
18.11.2025 17:58 — 👍 0 🔁 0 💬 1 📌 0
as someone who sees himself as likely to write something like this by mistake and doesn't understand what the issue is, i'd be glad if you could explain what is so upsetting here?
18.11.2025 16:24 — 👍 1 🔁 0 💬 1 📌 0
i didn't understand what the reference to a banknote is, and i also didn't see the green as a reference to the IDF in particular.. but as i said, maybe that's because i'm really not a designer, so i don't think in those terms
18.11.2025 15:39 — 👍 3 🔁 0 💬 1 📌 0
do you believe in the principle? because from other tweets it sounds like not really, and then, why do you actually care how accurate it is? i personally do believe in it, and would indeed be happy if this improves next time, and i believe it will, because it's just a bullet point that was shortened in an unclear way on a poster.
18.11.2025 15:37 — 👍 0 🔁 0 💬 2 📌 0
not the ideal slogan, but also really not that far-fetched. explicit recognition of Israel's right to exist as a Zionist entity, alongside the Arab states, including the Palestinian one.
18.11.2025 15:11 — 👍 3 🔁 0 💬 1 📌 0
as a non-designer, it looks a bit amateurish to me but overall totally fine. what's the problem? what is a "wrong green"?
18.11.2025 15:10 — 👍 5 🔁 0 💬 1 📌 0
what's the latest-and-greatest attempt to reverse-engineer and document the inner workings of claude-code?
17.11.2025 10:23 — 👍 1 🔁 0 💬 0 📌 0
(hmm i guess we can amend to "increase in the proportion of knowledge we believe to be true")
17.11.2025 07:00 — 👍 0 🔁 0 💬 1 📌 0
i think memory is never "free", in the sense that the real bottleneck is not storage, but the ability to retrieve the right thing, while not retrieving a wrong (out of date) thing by mistake.
but assuming we do delete facts, is deleting considered learning in your definition?
is "increase" necessary? or is "change" enough? (although i guess that in an ideal form, you dont "forget" a wrong fact but add the fact that it is wrong, so you may consider it as increasing...)
16.11.2025 20:11 — 👍 1 🔁 0 💬 1 📌 0
yes, following instructions in prompt is not learning. but if a wrapping system stores items to inject in future prompts, then you can consider the system as learning.
16.11.2025 20:00 — 👍 0 🔁 0 💬 1 📌 0
it will be in-context-induction, and the storing and retention from external memory would be learning.
16.11.2025 19:34 — 👍 1 🔁 0 💬 1 📌 0
the storage, if it happens, is the learning part. the inference process is not learning.
16.11.2025 19:14 — 👍 0 🔁 0 💬 1 📌 0
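The distinction in the replies above (the inference call is not learning, while storing items for future prompts is) can be sketched roughly as follows. This is an illustration under assumptions: `MemoryWrapper` and `frozen_model` are hypothetical names, and `frozen_model` is a trivial stand-in for a fixed model, not a real LLM API.

```python
from typing import Callable, List


def frozen_model(prompt: str) -> str:
    """Hypothetical stand-in for a fixed model: its behavior never changes."""
    return "blue" if "favorite color is blue" in prompt else "unknown"


class MemoryWrapper:
    """Hypothetical wrapping system that stores items and injects them into later prompts."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self.model = model
        self.memory: List[str] = []

    def remember(self, item: str) -> None:
        # This is the only step that changes future behavior, i.e. the "learning" part.
        self.memory.append(item)

    def ask(self, prompt: str) -> str:
        # Inference: the stored items are prepended, the model itself is untouched.
        context = "\n".join(self.memory)
        full_prompt = f"{context}\n{prompt}" if context else prompt
        return self.model(full_prompt)


system = MemoryWrapper(frozen_model)
before = system.ask("what is my favorite color?")
system.remember("the user's favorite color is blue")
after = system.ask("what is my favorite color?")
```

The model function is identical before and after; only the wrapper's stored state differs, which is where the system-level change lives.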
or as i wrote two years ago:
gist.github.com/yoavg/59d174...
i dont think it is a very useful view. at a very minimum we see extremely elaborate neighbor-matching and interpolation mechanisms, so the "glorified" part should be elaborated on and studied.
16.11.2025 17:25 — 👍 0 🔁 0 💬 2 📌 0
i agree, where is "storing" in the above case?
16.11.2025 17:04 — 👍 0 🔁 0 💬 0 📌 0
ah, cool!
16.11.2025 15:54 — 👍 0 🔁 0 💬 0 📌 0
indeed kNN is also not learning. its just a classification method. if you want to consider kNN as a learning method, then the learning part is just "store these pairs as is".
16.11.2025 15:54 — 👍 1 🔁 0 💬 0 📌 0
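To make the kNN point concrete, here is a minimal sketch assuming 1-D numeric inputs and majority voting. The class name `KNN` and the toy data are illustrative only. Note that `fit` does nothing except store the pairs as is; all the work happens at prediction time.

```python
from collections import Counter
from typing import List, Tuple


class KNN:
    """Toy k-nearest-neighbors classifier over 1-D inputs (illustrative only)."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.pairs: List[Tuple[float, str]] = []

    def fit(self, xs: List[float], ys: List[str]) -> None:
        # The entire "learning" step: store these pairs as is.
        self.pairs = list(zip(xs, ys))

    def predict(self, x: float) -> str:
        # Classification: find the k closest stored inputs and take a majority vote.
        nearest = sorted(self.pairs, key=lambda pair: abs(pair[0] - x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


model = KNN(k=3)
model.fit([1.0, 1.2, 0.8, 5.0, 5.3], ["a", "a", "a", "b", "b"])
pred_low = model.predict(1.1)
pred_high = model.predict(5.1)
```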