
@anishathalye.bsky.social

46 Followers  |  3 Following  |  19 Posts  |  Joined: 18.11.2024

Latest posts by anishathalye.bsky.social on Bluesky

If you have suggestions for topics to cover in the next iteration of the course, please share them in this thread!

05.08.2025 17:43 — 👍 0    🔁 0    💬 0    📌 0
Missing Semester: Classes teach you all about advanced topics within CS, from operating systems to machine learning, but there's one critical subject that's rarely covered and is instead left to students to figure out...

Lecture videos: www.youtube.com/@MissingSeme..., Notes: missing.csail.mit.edu

05.08.2025 17:43 — 👍 0    🔁 0    💬 0    📌 0

Missing Semester has grown past 100K subscribers on YouTube. Appreciate all the engagement and support!

We plan to teach another iteration of the course in January 2026, revising the curriculum and covering new topics like AI IDEs and vibe coding.

05.08.2025 17:42 — 👍 9    🔁 4    💬 2    📌 0
GitHub - anishathalye/neural-style: Neural style in TensorFlow! 🎨

Incidentally, this is how I first got interested in ML. github.com/anishathalye...

21.06.2025 15:19 — 👍 1    🔁 0    💬 0    📌 0

My favorite way to measure progress in AI: finding papers obsoleted by ChatGPT prompts

21.06.2025 15:16 — 👍 0    🔁 0    💬 1    📌 0
GitHub - anishathalye/lumen: Magic auto brightness based on screen contents 💡

Code/binary here: github.com/anishathalye...

17.06.2025 17:07 — 👍 1    🔁 0    💬 0    📌 0

Ever get blinded writing code late at night when you alt-tab from your dark-mode terminal to your browser? I made a little macOS utility to solve this problem, and just updated it for the latest macOS.

No thanks to AI for hallucinating BrightnessKit.framework.

17.06.2025 17:07 — 👍 0    🔁 0    💬 1    📌 0
GitHub - cleanlab/aiuc-workshop: AI User Conference 2025 - Developer Day workshop

We did a workshop at AIUC that: (1) implements a RAG app on top of Cursor's docs, (2) reproduces the widely-publicized failure from last week, and (3) shows how to automatically catch and reproduce this failure. All slides/code are open-sourced here: github.com/cleanlab/aiu... (5/5)

24.04.2025 18:21 — 👍 0    🔁 0    💬 0    📌 0

What's the solution? I believe that one ingredient will be intelligent systems that evaluate the output of these LLMs in real time and keep them in check, building on and combining techniques like LLM-as-a-judge, per-token logprobs, and statistical methods. (4/5)
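The per-token logprob idea mentioned above can be sketched in a few lines: many completion APIs can return a log-probability for each generated token, and the geometric mean of the token probabilities gives a crude confidence score for the whole response. A minimal sketch, assuming you already have the logprobs (the function name and the sample values are illustrative, not from any particular product):

```python
import math

def confidence_from_logprobs(token_logprobs):
    """Aggregate per-token log-probabilities into one score in (0, 1]:
    the geometric mean of the token probabilities, i.e. exp of the
    mean logprob. Low values flag responses worth double-checking."""
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)

# A confident answer: every token had probability ~0.99.
confident = [math.log(0.99)] * 10
# A shakier answer: half the tokens had probability 0.2.
shaky = [math.log(0.99)] * 5 + [math.log(0.2)] * 5

print(confidence_from_logprobs(confident))  # ≈ 0.99
print(confidence_from_logprobs(shaky))      # ≈ 0.45
```

This is only one signal; in practice it would be combined with judges and statistical checks, since a model can be confidently wrong.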

24.04.2025 18:21 — 👍 0    🔁 0    💬 1    📌 0

Why do such failures occur? These next-token-prediction models are nondeterministic and can be fragile. And they're not getting consistently better over time: OpenAI's latest models like o3 and o4-mini show higher hallucination rates than previous versions. (3/5)

24.04.2025 18:21 — 👍 0    🔁 0    💬 1    📌 0

It's been over a year since the well-publicized failures of Air Canada's support bot and NYC's MyCity bot. And these AIs are still failing spectacularly in production, with the most recent debacle being Cursor's AI going rogue and triggering a wave of cancellations. (2/5)

24.04.2025 18:21 — 👍 0    🔁 0    💬 1    📌 0

We reproduced (and fixed!) Cursor's rogue customer support AI. (1/5)

24.04.2025 18:20 — 👍 0    🔁 0    💬 1    📌 0

I wonder if there's anything special in the Cursor Tab completion model or system prompt that induces this behavior.

16.04.2025 22:04 — 👍 0    🔁 0    💬 0    📌 0

Coincidence, or genius growth hack? Cursor self-propagating through developer set-up instructions.

16.04.2025 22:02 — 👍 0    🔁 0    💬 1    📌 0
Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best? A comprehensive benchmark of evaluation models to automatically catch incorrect responses across five RAG applications.

2/2
It works surprisingly well in practice.

cleanlab.ai/blog/rag-eva...

Hoping to see more of these real-time reference-free evaluations to give end users more confidence in the outputs of AI applications.

07.04.2025 23:04 — 👍 1    🔁 0    💬 0    📌 0

Is AI any good at evaluating AI? Is it turtles all the way down? We benchmarked evaluation models like LLM-as-a-judge, HHEM, Prometheus across 6 RAG applications. 1/2
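For readers unfamiliar with LLM-as-a-judge, one of the evaluation models benchmarked above: the idea is to prompt a second model to grade the first model's answer against the retrieved context. A minimal sketch with a stubbed model call (`JUDGE_PROMPT`, `call_llm`, and the 1–5 scale are hypothetical illustrations, not the benchmark's actual setup):

```python
# Minimal LLM-as-a-judge sketch. `call_llm` stands in for any
# chat-completion API; it is stubbed here so the example runs offline.
JUDGE_PROMPT = """You are grading a RAG answer.
Context: {context}
Question: {question}
Answer: {answer}
Reply with a single integer from 1 (unsupported) to 5 (fully supported)."""

def call_llm(prompt):
    # Stub: a real judge would send `prompt` to an LLM and return its reply.
    return "4"

def judge(context, question, answer):
    """Return the judge model's 1-5 support score for the answer."""
    reply = call_llm(JUDGE_PROMPT.format(
        context=context, question=question, answer=answer))
    return int(reply.strip())

score = judge("Cursor supports login on multiple devices.",
              "Can I log in on two machines?", "Yes.")
print(score)  # 4 (from the stub)
```

The benchmark's point is that this judge is itself an LLM, so its reliability has to be measured too.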

07.04.2025 23:04 — 👍 1    🔁 0    💬 1    📌 0

And some repos are even organically suggested by ChatGPT. (3/3)

17.02.2025 18:03 — 👍 0    🔁 0    💬 0    📌 0

Some of this might be through web search / tool use, but for at least some, knowledge about the projects is actually baked into the LLM weights. (2/3)

17.02.2025 18:03 — 👍 0    🔁 0    💬 1    📌 0

A substantial portion of traffic for some of my open-source projects comes from ChatGPT these days. Sometimes even a majority, beating traffic from Google. Time to prioritize LLM optimization over search engine optimization. (1/3)

17.02.2025 18:02 — 👍 1    🔁 0    💬 1    📌 0
