brendan chambers

@societyoftrees.bsky.social

Ithaca | prev Chicago | interested in interconnected systems and humans+computers | currently: gardening

629 Followers  |  386 Following  |  161 Posts  |  Joined: 18.10.2023  |  1.9911

Latest posts by societyoftrees.bsky.social on Bluesky

working on a seven thousand layer model of extended claugenition

08.02.2026 22:46 | 👍 67    🔁 6    💬 5    📌 0

Are LLMs Smarter Than Chimpanzees? An Evaluation on Perspective Taking and Knowledge State Estimation
Cognitive anthropology suggests that the distinction of human intelligence lies in the ability to infer other individuals' knowledge states and understand their intentions. In comparison, our closest ...

New work by my former PhD student, Boyang Li

His team produced 500 stories of less than 100 words. LLMs were basically chance-level at answering binary questions about the stories

arxiv.org/abs/2601.12410

04.02.2026 00:36 | 👍 117    🔁 15    💬 6    📌 14

This is a real banger of a paper. The example of a model being weirdly focused on jasmine (lol) makes me increasingly think that single-point-of-access models don't really consider who their audience is. Jasmine is a super legible cultural marker for people outside, but is so, _so_ generic.

03.02.2026 16:41 | 👍 12    🔁 4    💬 2    📌 0

From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence
Can we learn more from data than existed in the generating process itself? Can new and useful information be constructed from merely applying deterministic transformations to existing data? Can the le...

This was a colossal multi-year effort driven by an incredible team that gave this everything: Marc Finzi, Shikai Qiu, Yiding Jiang, Pavel Izmailov, Zico Kolter. Much more in the paper! arxiv.org/abs/2601.03220 7/7

07.01.2026 17:27 | 👍 22    🔁 1    💬 1    📌 1

Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, these model...

Well this is exciting: arxiv.org/abs/2512.20605

06.01.2026 19:53 | 👍 54    🔁 7    💬 1    📌 0

the reason I'd follow Cat Hicks into hell is this unswerving humanist conviction that actually

people are going to do the best they can

we can help them do even better

and neither avenue is served by thinking less of people

03.01.2026 23:13 | 👍 79    🔁 9    💬 3    📌 0

39C3 - From Silicon to Darude Sand-storm: breaking famous synthesizer DSPs
YouTube video by media.ccc.de

i think we are about to experience an explosion of the possibilities in reverse engineering

02.01.2026 19:38 | 👍 48    🔁 3    💬 2    📌 0

we’re at a fascinating moment where I am still ~better at programming than Claude at a medium-horizon difficulty task, but Claude has me absolutely beat in terms of cognitive fatigue so we’re able to ship so much more stuff I never would’ve gotten around to before

02.01.2026 20:56 | 👍 99    🔁 4    💬 2    📌 0

Great list of models in 2025 👍🏽

02.01.2026 17:14 | 👍 3    🔁 1    💬 0    📌 0

arXiv AI/ML Catch-Up
Was your New Year's resolution to keep up with arXiv AI/ML preprints? Browse the past week's new uploads in 30 mins.

I uh, made this. It was supposed to be a joke / concept-art thing that scrolls through the torrent of new AI/ML arXiv uploads too fast to read. But I think I iterated too much and made it almost usable.

01.01.2026 23:45 | 👍 79    🔁 13    💬 7    📌 3

Everyone’s favorite feed is running on one person’s gaming system. I love how hackable this site is, it makes it much more fun.

26.12.2025 18:47 | 👍 22    🔁 1    💬 3    📌 0

If you’re working on a non-fiction research/writing project that isn’t journalism and you don’t have an academic affiliation, how do you find other people who are doing the same thing? Ideally locally (I’m in NY).

22.12.2025 01:53 | 👍 4    🔁 1    💬 0    📌 0

Owning group data
Thinking about how communities can manage shared data on and off ATProto

local first vs atproto!! what should the source of truth for group data be?

20.12.2025 12:45 | 👍 37    🔁 11    💬 1    📌 1

I am late to the game but I finally read the NeurIPS 2025 best paper on gating in LLMs, it is great.

Qiu et al.
Alibaba, U Edinburgh, Stanford, MIT, Tsinghua U
arxiv.org/abs/2505.06708

1/3

15.12.2025 16:42 | 👍 12    🔁 3    💬 1    📌 0

Olmo 3.1 is here. We extended our strongest RL run and scaled our instruct recipe to 32B, releasing Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B, our most capable models yet. 🧵

12.12.2025 17:14 | 👍 14    🔁 3    💬 1    📌 1

+1 for mentioning AI as structuralism

13.12.2025 02:24 | 👍 2    🔁 0    💬 1    📌 0

New Talk: Building Olmo 3 Think
Re-recording my NeurIPS talks in one mega-take.

Post on Interconnects: www.interconnects.ai/p/building-o...
Slides: docs.google.com/presentation...
YouTube: youtu.be/uaZ3yRdYg8A

10.12.2025 19:36 | 👍 6    🔁 2    💬 0    📌 1

i think i can officially say i preferred my arch linux desktop over macOS

the best thing about macOS is the flow between computer, phone, airpods

everything else feels like 10% off the mark and all these paper cuts don't feel good

07.12.2025 14:10 | 👍 25    🔁 1    💬 2    📌 0

Built a little AT Protocol playground - a single HTML file that lets you watch the firehose, create records, and browse any repo with a dynamic form UI. Changes sync directly back to your PDS. #atproto

at.selem.im

06.12.2025 01:17 | 👍 74    🔁 10    💬 5    📌 0

A figure demonstrating the different aspects of the corpus described in the tweet. There is a main isometric 3D view of a level in the Portal 2 co-op game, with some portals, lasers, and the blue and orange players. Inset, there are first-person captures of the blue and orange player views. There is also a box containing the transcribed dialogue with timestamps and labels for the discursive acts. Finally, there is a box containing a task and a list of subtasks. Some subtasks are already crossed out, with the time that they have been completed. The last subtask ("Player 2 places portal 4 on wall 4") is marked incomplete.

The dialogue is as follows:

Blue: Can you put your other portal up here? (tagged as directive)
Orange: Where? (tagged as request for clarification)
Blue: On uh, on this wall. (tagged as directive)
Blue: So that it uh points at the circle. (tagged as directive)
Orange: Okay. (tagged as commit)

The full list of subtasks is:

Task: Redirect lasers
Subtask: Player 1 places portal 1 on wall 1. (completed)
Subtask: Player 1 places portal 2 on wall 2 or 3. (completed)
Subtask: Player 2 places portal 3 opposite of portal 2. (completed)
Subtask: Player 2 places portal 4 on wall 4. (incomplete)

A couple years (!) in the making: we’re releasing a new corpus of embodied, collaborative problem solving dialogues. We paid 36 people to play Portal 2’s co-op mode and collected their speech + game recordings.

Paper: arxiv.org/abs/2512.03381
Website: berkeley-nlp.github.io/portal-dialo...

1/n

05.12.2025 18:54 | 👍 102    🔁 30    💬 3    📌 8

Makes me wonder how viable a presentation app that uses SVGs as its native format would be.

LLMs tend to do fairly well with vector formats and it would solve the mutability problem here.

04.12.2025 04:41 | 👍 2    🔁 1    💬 1    📌 0

I've been having a bunch of fun hacking on my Bluesky thread viewing HTML+JS app using Claude Code - here's a video demo of the most recent version, you can try it out here: tools.simonwillison.net/bluesky-thre...

28.11.2025 19:24 | 👍 66    🔁 4    💬 5    📌 3

tldr of Andy’s back of envelope math: in Morrow County data centers may actually be accounting for only ~1% of local wastewater

26.11.2025 06:40 | 👍 0    🔁 0    💬 1    📌 0

'The Precedent Is Flint': How Oregon's Data Center Boom Is Supercharging a Water Crisis
Amazon data centers constructed in eastern Oregon's farmland have worsened a water pollution problem that’s been linked to cancer and miscarriages.

www.rollingstone.com/culture/cult...

26.11.2025 06:03 | 👍 0    🔁 0    💬 1    📌 0

the local agriculture was highly polluting but drew water primarily from a river

while the data centers drew from the (poisoned) water table,

competing with residents for the deepest wells and sending outputs into processing ponds that couldn’t handle the capacity

26.11.2025 06:03 | 👍 0    🔁 0    💬 1    📌 0

so Morrow County Oregon seems to be an example of a drinking water crisis accelerating b/c of data center buildouts

the original crisis was caused by agriculture, but the scale of the issue worsened because of data center waste water handling

26.11.2025 06:03 | 👍 1    🔁 0    💬 1    📌 0

⚠️ Update on Deep Research Tulu (DR Tulu), our post-training recipe for deep research agents: we’re releasing an upgraded version of our example agent, DR Tulu-8B (RL), that matches or beats systems like Gemini 3 Pro & Tongyi DeepResearch-30B-A3B on core benchmarks. 🧵

25.11.2025 19:37 | 👍 22    🔁 5    💬 1    📌 1

Test-time reasoning guidance: up to 66.7% improvement 💡

We scaffold cognitive structures from successful traces to guide reasoning.

Major gains on ill-structured problems🌟

Models possess latent capabilities; they just don't deploy them adaptively without explicit guidance.

25.11.2025 18:25 | 👍 3    🔁 1    💬 1    📌 0

We analyzed 1,598 LLM reasoning papers:

Research concentrates on easily quantifiable behaviors: sequential organization (55%), decomposition (60%)

Neglects meta-cognitive controls (8-16%) and alternative representations (10-27%) that correlate with success⚠️

25.11.2025 18:25 | 👍 5    🔁 1    💬 1    📌 0

Our taxonomy bridges cognitive science → LLM eval:

28 elements across 4 dimensions: reasoning invariants (compositionality, logical coherence), meta-cognitive controls (self-awareness), representations (hierarchical, causal), and operations (backtracking, verification)

25.11.2025 18:25 | 👍 3    🔁 1    💬 1    📌 0

@societyoftrees is following 20 prominent accounts