BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Large language models (LLMs) have recently shown strong performance on mathematical benchmarks. At the same time, they are prone to hallucination and sycophancy, often providing convincing but flawed ...
Sycophancy in bots is an inimical part of AI in Teaching and Learning. When the bot wants to tell you that you are right, high dependence almost certainly means you will internalize incorrect knowledge. Love papers like this that explore sycophancy in the discipline arxiv.org/abs/2510.04721
14.10.2025 18:24
I also brought them up multiple times, to multiple students in class and in reflection feedback.
13.10.2025 20:50
I was surprisingly disappointed (after I put landmark SciFi books on AI on library reserve, and even bought some, as suggested readings for my AI in Teaching and Learning course) when not a single student bothered with Murderbot, The Moon Is a Harsh Mistress or others. Need to assign them next time. :(
13.10.2025 20:50
How well does it work when prompted only to assess or evaluate the root cause? Is tuning or exemplar code helpful at all? Are the edge cases simply so varied that an LLM as root causal evaluator is doomed to fail?
13.10.2025 16:52
Try Copilot 365 if that is not sufficiently inaccurate or opaque for you. That $30/user/month fee pays for itself when your goal is to project existential dread into rage against a machine. Bonus points - it chides you for expletive use in emails :)
18.09.2025 17:24
Balatro is the answer
17.09.2025 19:15
what did it sound like?
16.09.2025 13:15
So, we can just keep on thinking Isak and Wirtz just won't work out for some reason, right???
01.09.2025 06:42
Love it. But I think you misspelled "Quadruple"
20.08.2025 18:49
1) AI claims fast learning; but learning is slow
2) Inauthentic or irrelevant work -> students outsourcing thinking work
3) We have MUCH less institutional infrastructure conveying value, authenticity and utility of what we teach
4) Chatbots do not care if we learn
5) AI detectors are horrid judges
19.08.2025 18:38
Our five big assumptions that shaped the week /
19.08.2025 18:35
19.08.2025 18:34
Purdue's AI Academy finished with 70+ instructors creating projects, plans, tools or critical approaches around and in response to AI. I was particularly enthused when multiple participants said "I thought I was gonna learn the tech, but I learned about learning"
19.08.2025 18:33
Today on the podcast: Study Hall! @leaton01.bsky.social @michellemillerphd.bsky.social and @thedavenelson.bsky.social and I discuss three recent studies exploring the intersection of AI and teaching. Cognitive offloading, chatbot sycophancy, & more! intentionalteaching.buzzsprout.com/2069949/epis...
19.08.2025 17:22
Democratizing prompt for LLMs:
Read and review new terms of service for X company. Compare and contrast with previous versions. What should I be aware of? What might any consumer be wary of or concerned about?
17.08.2025 15:09
Reminds me of the specter of internet throttling before net neutrality. Don't want to pay us extra for our tech? Fine. You just might not like what you get. No transparency, varying quality of an information commodity on every use. I'd guess an upswing in paid subs.
09.08.2025 16:58
Likely Microsoft wants to gain benefits of Chat but not replace any of its own software that could then hinder enterprise level negotiations
07.08.2025 17:41
I love this work. Biggest takeaways - AI Agents "often fail at effectively guiding students toward mastery" and "students prioritize scores over feedback, leading to off-task behavior that can hinder growth." Authenticity and relatedness are needed for these efforts. More humans in the process.
05.08.2025 16:35
VArsity: Can Large Language Models Keep Power Engineering Students in Phase?
This paper provides an educational case study regarding our experience in deploying ChatGPT Large Language Models (LLMs) in the Spring 2025 and Fall 2023 offerings of ECE 4320: Power System Analysis a...
As models become tuned to specialized academic content, the gap between novice and expert ability to critically evaluate outputs will grow. This paper demonstrates the change in student error recognition from GPT4 to o3. More longitudinal studies like this please. arxiv.org/abs/2507.20995
01.08.2025 16:55
Yep. The learning is definitely happening too slowly. That's what little kids need for brain development. Speed. Jesus Christ these people.
28.07.2025 16:37
I lived it - in the 7th week I realized it, begged and was granted a gracious W from the professor - but that was a very scary phone call
16.07.2025 15:50
Why can't we just have nice things! Why do our brains make us doubt this way?
15.07.2025 19:59
A Large Language Model-Based Digital Twin Patient System Enhances Clinical Questioning Skills in Medical Education: A Randomized Controlled Trial | IOVS | ARVO Journals
Increasingly convinced that simulation practice in clinical settings is one of the biggest "killer app" prospects for AI in education. Generate at scale, add nuance + complexity easily, personalize, etc.
iovs.arvojournals.org/article.aspx...
10.07.2025 14:48
Uses machine learning to study literary imagination, and vice-versa. Likely to share news about AI & computational social science / Sozialwissenschaft / 社会科学
Information Sciences and English, UIUC. Distant Horizons (Chicago, 2019). tedunderwood.com
Complex beings as us humans can not be summarised in a few lines but I am here for #AI #climate #EUtech #EU_politics. Lecturer in AI & IT. Some posts in Dutch. 🇪🇺
Professor. Sociologist. NYTimes Opinion Columnist. Books: THICK, LowerEd. Forthcoming: 1)Black Mothering & Daughtering and 2)Mama Bears.
Beliefs: C.R.E.A.M. + the internet ruined everything good + bring back shame.
"I'm just here so I don't get fined."
EPFL Professor, Co-Director EPFL AI Center, digital epidemiologist, CH++
The AI community building the future!
⚽️ writing and analysis, usually in the form of giant Arsenal long-reads. I also write for SCOUTED. Here's my newsletter:
https://billycarpenter.substack.com/
Cognitive neuroscientist at CSIRO
Current: Human-AI collaboration
Previous: cognitive control and theta oscillations, non-instrumental information, curiosity, EEG
Other: Mind-controlled video games
He/Him
Writer: johngasaway.substack.com (it's free!)
Author
Election of 1840 revisionist
Taught basketball analytics
linguist / cognitive scientist / writing professor obsessed with all things relating to language, learning, and the mind. And knitting and sewing and music (especially violin and ukulele). And cats. Also a UU. neurodivergent. she/they.
Asst. Professor @usouthflorida. GenAI in education. AR/VR. Algorithms. Computational Research. Digital Writing Technologies. UI/UX. Rhetoric, writing studies, technical communication.
Prof. of English at University of Kansas. Tech and culture. AI ethics. Critical AI literacy. Pandora's Bot on Substack. #aiethics #criticalailiteracy
Department Chair of Integrated Studies; Writing and Rhetoric, American Lit; Higher-Ed Pedagogy; OER advocate
Educator, author, consultant. Associate director at the University of Virginia Center for Teaching Excellence. Author of Intentional Tech from West Virginia University Press. Host of the Intentional Teaching podcast. Birder.
CS PhD candidate at Princeton. I study the societal impact of AI.
Website: cs.princeton.edu/~sayashk
Book/Substack: aisnakeoil.com
Professor at Wharton, studying AI and its implications for education, entrepreneurship, and work. Author of Co-Intelligence.
Book: https://a.co/d/bC2kSj1
Substack: https://www.oneusefulthing.org/
Web: https://mgmt.wharton.upenn.edu/profile/emollick
Psychologist. Sr Assoc Director Teaching & Learning @simmonsuniversity.bsky.social. Author of SPARK OF LEARNING, HIVEMIND, & MIND OVER MONSTERS
Writes about emotions, teaching, brains. Enjoys speculative fiction, Halloween, and the sea. sarahrosecav.com
Researcher trying to shape AI towards positive outcomes. ML & Ethics +birds. Generally trying to do the right thing. TIME 100 | TED speaker | Senate testimony provider | Navigating public life as a recluse.
Former: Google, Microsoft; Current: Hugging Face
Assistant Director of Innovation at the University of Mississippi training faculty in AI literacy. Teaching, #OER, #OpenPedagogy, #AIED marcwatkins.org