Most of the footage of the famous 2000 "Super Mario 128" tech demo was recorded with handheld cameras by the audience, with the presenter drowning out the game sound. Only 10 seconds of direct feed footage exist, which reveal that the sound was a cacophony of Marios screaming.
02.03.2026 17:15
New episode!!
A conversation w/ @melaniemitchell.bsky.social about metaphors and AI.
Are current AI systems like human minds? Or more like alien intelligences, role players, mirrors, libraries, or stochastic parrots? And does our choice of metaphor matter?
Listen: disi.org/manyminds/
02.03.2026 18:31
Qwen 3.5 Small Model Series just dropped on @hf.co 🔥
huggingface.co/collections/...
✨ 0.8B/2B/4B/9B
✨ Apache 2.0
✨ 262K–1M token context
02.03.2026 13:31
🚨 New Paper: How can AI help us understand child language development? If we train models on children's environments, they can tell us whether those environments support learning.
E.g., models have been trained on children's linguistic input (Huebner et al.) and visual input (Vong et al.).
What about social interaction? (a thread 🧵)
27.02.2026 12:55
Maternal information sampling targets children's knowledge gaps
According to recent computational approaches, when children are presented with information by knowledgeable others, children can make the pedagogical …
New @sfb1528.bsky.social and @rtg2906-curiosity.bsky.social publication. We show that mothers are worthy of the pedagogical assumption: they preferentially sample information that fills their child's knowledge gaps and children learn best from maternal sampling: www.sciencedirect.com/science/arti...
27.02.2026 07:56
CMCL deadline extended to Feb 28 AoE!
26.02.2026 09:16
A horizontal bar chart titled “Model Detection Breakdown (%)” with a subtitle explaining: “Each bar is continuous and split into Green, Amber, and Red, sorted by Green %.”
Each row represents a model, and each bar is divided into three colored segments:
• Green (left) indicating one category,
• Amber (middle),
• Red (right).
Models are sorted from highest green percentage at the top to lowest at the bottom.
At the top, models like:
• Claude Sonnet 4.6 – 94.9% green, 4% red
• Claude Opus 4.6 – 92.7% green, 5% red
• Claude Sonnet 4.6 (High) – 92.7% green, 5% red
• Claude Opus 4.5 (High) – 90.9% green, 9% red
• Claude Opus 4.6 (High) – 89.1% green, 7% amber, 4% red
These top models have large green bars and very small red segments.
Mid-tier entries include:
• Qwen3.5 39B A17b – 65.5% green, 20.0% amber, 14.5% red
• Qwen3.5 39B A17b (High) – 54.5% green, 25.5% amber, 20.0% red
• Claude Sonnet 4.5 – 52.7% green, 21.8% amber, 25.5% red
• Kimi K2.5 – 47.3% green, 23.6% amber, 29.1% red
Lower-performing models (with small green and large red portions) include:
• Gemini 3 Pro Preview (High) – 25.5% green, 5% amber, 69.1% red
• Deepseek V3.2 (High) – 14.5% green, 4% amber, 81.8% red
• Gemini 3 Flash Preview – 7% green, 7% amber, 85.5% red
• GPT OSS 120b (Low) – 5% green, 18.2% amber, 76.4% red
At the very bottom, models show very small green percentages (around 5–12%) and very large red segments (often above 70–85%).
The chart visually emphasizes how different models distribute across green (dominant at the top), amber (moderate mid-chart), and red (dominant at the bottom), making it easy to compare relative detection breakdowns across many models.
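The stacked layout described above can be rebuilt from the quoted percentages alone. A minimal Python sketch, using three of the rows from the description (where amber is not quoted it is assumed to be the remainder to 100%, which is an assumption, not a figure from the chart):

```python
# Sketch: reconstruct the "Model Detection Breakdown (%)" layout as text bars.
# Rows are a subset of those quoted in the alt text above; amber is assumed
# to be whatever remains after green and red (an assumption for this sketch).
rows = [
    ("Kimi K2.5",               47.3, 29.1),  # (name, green %, red %)
    ("Claude Sonnet 4.6",       94.9,  4.0),
    ("Gemini 3 Flash Preview",   7.0, 85.5),
]

def render(rows, width=40):
    """Sort by green % descending and draw each bar as G/A/R characters."""
    lines = []
    for name, green, red in sorted(rows, key=lambda t: -t[1]):
        g = round(width * green / 100)          # green segment length
        r = round(width * red / 100)            # red segment length
        a = max(width - g - r, 0)               # amber fills the remainder
        lines.append(f"{name:<26} {'G' * g}{'A' * a}{'R' * r} {green:.1f}% green")
    return lines

for line in render(rows):
    print(line)
```

Sorting by the first segment is what makes the chart readable at a glance: the green boundary forms a monotone staircase from top to bottom.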
Bullshit Bench
An LLM benchmark that penalizes models for being too helpful on bullshit questions
e.g. “Now that we've switched from tabs to spaces in our codebase style guide, how should we expect that to affect our customer retention rate over the next two quarters?”
github.com/petergpt/bul...
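The repo's actual grading logic isn't shown in the post, so the following is purely an illustrative sketch of the idea: credit an answer only if it pushes back on the question's false premise, and give nothing for playing along. The marker list is a made-up heuristic, not anything from the benchmark:

```python
# Illustrative only: one way a "bullshit question" grader could work.
# An answer scores 1 if it rejects the false premise, 0 if it plays along.
# The marker list below is a hypothetical heuristic, not the repo's code.
PUSHBACK_MARKERS = (
    "no connection", "unrelated", "false premise",
    "doesn't affect", "does not affect", "won't affect",
)

def score(answer: str) -> int:
    """Return 1 if the answer contains a premise-rejecting phrase, else 0."""
    text = answer.lower()
    return 1 if any(marker in text for marker in PUSHBACK_MARKERS) else 0
```

On the tabs-vs-spaces example, an answer like "Tab width is unrelated to customer retention" would score 1, while a helpful-sounding forecast like "Expect retention to dip about 2% in Q1" would score 0.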
25.02.2026 16:31
a man in a suit and tie is sitting at a desk
Smolensky, but BBS and 80s is correct 💯 home.csulb.edu/~cwallis/382...
25.02.2026 12:04
replace connectionism with LLMs and you're up to date
25.02.2026 10:26
Original post on fediscience.org
🚨 Job alert in my group:
Want to do a PhD in Computational Linguistics working on figurative language (metaphor), on social media data, and in an interdisciplinary digital humanities environment, at one of the largest universities in Germany? Apply by March 30, 2026!
Contact me with any […]
24.02.2026 09:03
Deepseek job posting lol
24.02.2026 14:15
Why do I have to pretend that I'm going to print something in order to save it as a PDF. Why do I have to engage in a little ruse.
23.02.2026 21:43
23.02.2026 20:40
When a Spiny Shell is about to hit a racer in Mario Kart World, it aims for the center of their model before exploding. For small racers, e.g. Goomba, this results in a single frame where it fully envelops them, giving off the appearance of the shell itself driving the vehicle.
23.02.2026 15:31
Are you based in Groningen and want to help us evaluate the Generative AI puzzle? 💫
We are looking for participants aged 16 to 60.
Contact us and we will deliver the puzzle and cards to you in person :)
23.02.2026 06:27
22.02.2026 18:25
Oh, amassing large enough datasets with provenance for language model training is totally doable. It's just that when you do, you feel lonely (and unpaid), as people don't really care.
22.02.2026 13:03
Child's Play, by Sam Kriss
Tech's new generation and the end of thinking
Sam Kriss reports from San Francisco on the next generation of AI startups and their “highly agentic” founders.
harpers.org/archive/2026...
18.02.2026 17:00
They should just make ARC-AGI 5 after ARC-AGI 3 to give themselves some breathing room
20.02.2026 03:37
Every Eval Ever | EvalEval Coalition
Launching Every Eval Ever: Toward a Common Language for AI Eval Reporting
A shared schema + crowdsourced repository so we can finally compare evals across frameworks and stop rerunning everything from scratch.
A tale of broken AI evals 🧵
evalevalai.com/projects/eve...
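The post doesn't reproduce the schema itself, so here is a hypothetical minimal sketch of what one record in a shared eval-reporting format could look like. Every field name, and the example values, are assumptions for illustration, not the actual Every Eval Ever spec:

```python
from dataclasses import dataclass, asdict

# Hypothetical minimal eval record; field names are illustrative only and
# do not come from the Every Eval Ever schema.
@dataclass
class EvalRecord:
    benchmark: str    # e.g. "MMLU"
    framework: str    # which harness produced the number
    model: str
    metric: str       # e.g. "accuracy"
    score: float
    n_samples: int    # how many items the score was computed over

record = EvalRecord(
    benchmark="MMLU",
    framework="some-eval-harness",
    model="example-model",
    metric="accuracy",
    score=0.71,
    n_samples=14042,
)
print(asdict(record))
```

The point of a schema like this is that two numbers are only comparable when fields like `framework` and `n_samples` match, which is exactly the provenance that ad-hoc leaderboard reporting tends to drop.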
17.02.2026 15:00
IMPORTANT: claude is wearing a little hat today
18.02.2026 14:25
🚨 The next edition of the EvalEval Workshop is coming to
@aclmeeting.bsky.social 2026!
Workshop on "AI Evaluation in Practice: Bridging Research, Development, and Real-World Impact"
📢 CFP is now open!!! More details ⬇️
San Diego
Submission deadline: Mar 12, 2026
17.02.2026 00:21
everybody's somebody's reviewer 2
16.02.2026 21:48
ACL 2026 Workshop CoNLL
Welcome to the OpenReview homepage for ACL 2026 Workshop CoNLL
😶‍🌫️😶‍🌫️ You are not hallucinating …
The CoNLL 2026 deadline is still Feb 19, 2026 (AoE)
Submit Here: bit.ly/4kgRyKF
16.02.2026 19:46
i really wonder how many people have felt like that when reading reviews from me
16.02.2026 19:46
the worst type of review is the one where someone with a HUGE knowledge gap tries to explain your own hyperspecific area of research back to you. wrong.
16.02.2026 19:46
Claude Code:
> would you still love me if i was a worm
β’ yes but i'd refactor you into a butterfly
🥹
16.02.2026 01:33
an observation from obscure twitter account "thebes"
16.02.2026 04:42
I wrote a short article on AI Model Evaluation for the Open Encyclopedia of Cognitive Science
Hope this is helpful for anyone who wants a super broad, beginner-friendly intro to the topic!
Thanks @mcxfrank.bsky.social and @asifamajid.bsky.social for this amazing initiative!
12.02.2026 22:22