Arjun Guha @guha-anderson.com

I’ll take this opportunity to say that UMass Amherst is amazing in many ways. But, their “8x #1 in dining by Princeton Review” dining halls are not as good as Mount Holyoke dining.

26.07.2025 23:36 — 👍 2 🔁 0 💬 2 📌 0

I still don’t understand why anyone cares about food quality. How else is a 21-year-old boy to learn to cook if the dining hall food doesn’t suck?

26.07.2025 20:49 — 👍 3 🔁 0 💬 2 📌 0

I know folks in college PR and I heard when a Reddit thread about my exam was reported up to the college president.

25.07.2025 20:27 — 👍 5 🔁 0 💬 0 📌 0

I learned things about containers I didn’t know from her containers book. Very annoying. I always get annoyed the more I learn about containers, independent of source.

25.07.2025 15:08 — 👍 1 🔁 0 💬 0 📌 0

New zine: Bite Size Command Line!

jvns.ca/blog/2018/08...

24.07.2025 20:00 — 👍 3 🔁 0 💬 1 📌 0

1850s baseball.

22.06.2025 18:37 — 👍 2 🔁 0 💬 0 📌 0

The recent *Your Brain on ChatGPT* paper seems cool.

When a ugrad approaches me to do research, I still have them read a prefix of PLAI (1st ed.) and demonstrate that they understand it.

I wonder what would happen if I asked them to self-study with an LLM exclusively. Has anyone tried this?

19.06.2025 18:11 — 👍 0 🔁 0 💬 0 📌 0

Now that I've written more Python than I care to admit, I'm getting tired of duplicating abstractions: one for sync code and another for async. Effect polymorphism wanted.

21.05.2025 16:06 — 👍 1 🔁 0 💬 0 📌 0

Looking forward to this.

15.04.2025 19:13 — 👍 0 🔁 0 💬 0 📌 0

Are you hiring new grads (BS) for this kind of work? I can suggest some people.

29.03.2025 07:01 — 👍 0 🔁 0 💬 1 📌 0

Water for Boston, part 3 - Lost Towns of the Swift River Valley: Drowned by the Quabbin, with Elena Palladino (episode 322) - HUB History: Boston history podcast This week, we’re speaking with Elena Palladino, the author of the recent book Lost Towns of the Swift River Valley: Drowned by the Quabbin. This book outlines the 20th century development of Boston’s...

In this episode of HUB History, Elena Palladino discusses the creation of the Quabbin Reservoir, the four towns that were sacrificed for its construction, and her book Lost Towns of the Swift River Valley. Listen now!

www.hubhistory.com/episodes/wat...

10.03.2025 13:55 — 👍 2 🔁 3 💬 0 📌 0

I distinctly remember the moment in grad school when I realized I was not going to learn any more PL by taking classes. I felt bad for an instant, and then moved on.

08.03.2025 13:50 — 👍 3 🔁 0 💬 1 📌 0

What does “PhD-level” AI mean? OpenAI’s rumored $20,000 agent plan explained. Silicon Valley may value imperfect virtual PhDs more than universities pay real ones.

There seems to be a fundamental misunderstanding here. I don't think PhD students complete assigned tasks.

arstechnica.com/ai/2025/03/w...

08.03.2025 13:47 — 👍 6 🔁 0 💬 2 📌 0

Yes. Still there. Also the pinball machine, the PDP, and @shriram.bsky.social .

05.03.2025 21:07 — 👍 4 🔁 0 💬 1 📌 0

Photo taken today at @browncsdept.bsky.social. I'm glad to see that the PhD students (@genevievemp.bsky.social), furniture, and faculty seem to have not changed in 10+ years.

05.03.2025 20:06 — 👍 4 🔁 0 💬 1 📌 0

Andrew Barto and Richard Sutton are the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning. In a series of papers beginning...

Congrats to Andy and Rich. A well-deserved recognition of their work and reinforcement learning in general!

awards.acm.org/about/2024-t...

05.03.2025 15:25 — 👍 15 🔁 1 💬 0 📌 0

GitHub - deepseek-ai/3FS: A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

The real lesson from DeepSeek is the importance of good old-fashioned computer science. Every day this week, they've been doing open source releases. The latest is their in-house distributed file system. github.com/deepseek-ai/...

28.02.2025 10:07 — 👍 16 🔁 4 💬 1 📌 0

I think language devs can help in a few ways. Benchmarking is the easiest for us to do and necessary to guide LLM development. I’ve been meaning to write up my experience being the only PL person in the room for the StarCoder LLM development process. It was very informative.

26.02.2025 22:16 — 👍 12 🔁 0 💬 2 📌 0

Please help amplify ARBOR, a fantastic new research opportunity! If you’d like to start contributing, NDIF is now hosting DeepSeek R1 8B and 70B, open for all researchers to experiment on via our API.

Sign up for API access here: login.ndif.us

20.02.2025 22:35 — 👍 4 🔁 3 💬 0 📌 0

Or, ask these products to write a 2 page ICFP workshop paper in one’s area of expertise. OK if it’s incremental, just has to be novel for 2025 and clearly positioned wrt related work. I know PhD students who can do this.

12.02.2025 22:05 — 👍 0 🔁 0 💬 0 📌 0

PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models

Our tech report has more fun examples of short prompts that make reasoning models crunch for several minutes or longer: khoury.northeastern.edu/~arjunguha/m...

04.02.2025 02:37 — 👍 2 🔁 0 💬 0 📌 0

Puzzle Reasoning Challenge - a Hugging Face Space by nuprl Discover amazing ML apps made by the community

If you want to read some deranged thoughts from frustrated models (R1 and Gemini Thinking), check them out here: huggingface.co/spaces/nuprl...

04.02.2025 02:37 — 👍 2 🔁 0 💬 1 📌 0

We believe our benchmark is out-of-domain for DeepSeek-style models: RL with verifiable rewards on math and programming. It’s remarkable that they generalize to this type of verbal reasoning. But, perhaps there are limits to what can be done with verifiable rewards exclusively.

04.02.2025 02:37 — 👍 2 🔁 0 💬 1 📌 0

However, many problems are so hard that reasoning models “give up” – they output solutions that they know are wrong or argue that the problem is impossible to solve. In some cases, R1 gets stuck “thinking forever”. (See this example of R1 getting “frustrated.”)

04.02.2025 02:37 — 👍 2 🔁 0 💬 1 📌 0

Our benchmark reveals capability gaps and failure modes that are not evident in existing benchmarks. E.g., we find that o1 is significantly better at these tasks than other reasoning models.

04.02.2025 02:37 — 👍 2 🔁 0 💬 1 📌 0

In short, we turn the weekly puzzles from the NPR Sunday Puzzle Challenge into a machine-checkable benchmark. These are hard problems, typically solved by a few hundred people a week. But, the answers are obvious when revealed (to U.S. adults).

04.02.2025 02:37 — 👍 3 🔁 0 💬 1 📌 0

O1, R1, etc. are so good that we evaluate them on “PhD-level” benchmarks. But, these benchmarks are so hard that most people can’t even understand what they are testing. We’ve built a benchmark with problems that are hard to solve but easy to verify: for both humans and models.

04.02.2025 02:37 — 👍 9 🔁 4 💬 1 📌 1

Last one: there are a LOT of people to blame for this one. I think @jasvir.bsky.social is to blame for this problem in "Humanity's Last Exam".

28.01.2025 18:29 — 👍 6 🔁 0 💬 2 📌 0

Ugh, who did this? @joepolitz.bsky.social ? Wait, was it @dbp.bsky.social ? Someone else from @shriram.bsky.social's group?

Also from "Humanity's Last Exam".

28.01.2025 15:50 — 👍 5 🔁 0 💬 2 📌 1

OK, who is responsible for this? Is it @natefoster.bsky.social?

Source: "Humanity's Last Exam" www.nytimes.com/2025/01/23/t...

28.01.2025 15:47 — 👍 5 🔁 0 💬 1 📌 0

Latest posts by guha-anderson.com on Bluesky
