David Nelson @thedavenelson

Invisible Saboteurs: Sycophantic LLMs Mislead Novices in Problem-Solving Tasks Sycophancy, the tendency of LLM-based chatbots to express excessive enthusiasm, agreement, flattery, and a lack of disagreement, is emerging as a significant risk in human-AI interactions. However, th...

This is an awesome small-scale look at how sycophantic LLMs lead learners astray in problem-solving. Next, I'd love to look at what type of sycophancy actually attracts students if given a choice of bots. arxiv.org/abs/2510.03667

15.10.2025 20:31 — 👍 2 🔁 0 💬 0 📌 0

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs Large language models (LLMs) have recently shown strong performance on mathematical benchmarks. At the same time, they are prone to hallucination and sycophancy, often providing convincing but flawed ...

Sycophancy in bots is an inimical part of AI in Teaching and Learning. When the bot wants to tell you that you are right, high dependence almost certainly means you will internalize incorrect knowledge. Love papers like this that explore sycophancy in the discipline. arxiv.org/abs/2510.04721

14.10.2025 18:24 — 👍 0 🔁 0 💬 0 📌 0

I also brought them up multiple times, to multiple students in class and in reflection feedback.

13.10.2025 20:50 — 👍 1 🔁 0 💬 0 📌 0

I was surprisingly disappointed (after I put landmark SciFi books on AI on library reserve, and even bought some, as suggested readings for my AI in Teaching and Learning course) when not a single student bothered with Murderbot, The Moon Is a Harsh Mistress, or others. Need to assign them next time. :(

13.10.2025 20:50 — 👍 4 🔁 0 💬 1 📌 0

How well does it work when prompted only to assess or evaluate the root cause? Is tuning or exemplar code helpful at all? Are the edge cases simply so varied that an LLM as root causal evaluator is doomed to fail?

13.10.2025 16:52 — 👍 0 🔁 0 💬 1 📌 0

TutorBench: A Benchmark To Assess Tutoring Capabilities Of Large Language Models As students increasingly adopt large language models (LLMs) as learning aids, it is crucial to build models that are adept at handling the nuances of tutoring: they need to identify the core needs of ...

Skeptical of most third-party IT company analyses of other software, but this Scale AI paper outlines many of the limits of creating and evaluating "AI as Tutor" studies. Too many edge cases. arxiv.org/abs/2510.02663

06.10.2025 14:23 — 👍 0 🔁 0 💬 0 📌 0

Try Copilot 365 if that is not sufficiently inaccurate or opaque for you. That $30/user/month fee pays for itself when your goal is to project existential dread into rage against a machine. Bonus points - it chides you for expletive use in emails :)

18.09.2025 17:24 — 👍 1 🔁 0 💬 0 📌 0

Balatro is the answer

17.09.2025 19:15 — 👍 0 🔁 0 💬 0 📌 0

what did it sound like?

16.09.2025 13:15 — 👍 1 🔁 0 💬 0 📌 0

Investigating Student Interaction Patterns with Large Language Model-Powered Course Assistants in Computer Science Courses Providing students with flexible and timely academic support is a challenge at most colleges and universities, leaving many students without help outside scheduled hours. Large language models (LLMs) ...

Lots of guesses about student use of AI are turning out as expected: students use them to solve/complete homework; they ignore LLM-posted questions guiding them to think deeper; single prompts dominate interactions, with little dynamic engagement.

arxiv.org/abs/2509.08862

15.09.2025 22:34 — 👍 1 🔁 0 💬 0 📌 0

So, we can just keep on thinking Isak and Wirtz just won't work out for some reason, right???

01.09.2025 06:42 — 👍 0 🔁 0 💬 0 📌 0

Using an LLM to Investigate Students' Explanations on Conceptual Physics Questions Analyzing students' written solutions to physics questions is a major area in PER. However, gauging student understanding in college courses is bottlenecked by large class sizes, which limits assessme...

Want more studies on small-level feedback mechanisms around disciplinary understandings. No need to have an LLM measure "everything". Start small, start with a need area.
arxiv.org/abs/2508.14823

21.08.2025 20:37 — 👍 0 🔁 0 💬 0 📌 0

Love it. But I think you misspelled "Quadruple"

20.08.2025 18:49 — 👍 5 🔁 0 💬 0 📌 0

1) AI claims fast learning; but learning is slow
2) Inauthentic or irrelevant work -> students outsourcing thinking work
3) We have MUCH less institutional infrastructure conveying value, authenticity and utility of what we teach
4) Chatbots do not care if we learn
5) AI detectors are horrid judges

19.08.2025 18:38 — 👍 1 🔁 0 💬 0 📌 0

Our five big assumptions that shaped the week /

19.08.2025 18:35 — 👍 0 🔁 0 💬 0 📌 0

19.08.2025 18:34 — 👍 1 🔁 0 💬 0 📌 0

Purdue's AI Academy finished with 70+ instructors creating projects, plans, tools or critical approaches around and in response to AI. I was particularly enthused when multiple participants said "I thought I was gonna learn the tech, but I learned about learning"

19.08.2025 18:33 — 👍 1 🔁 0 💬 3 📌 0

Today on the podcast: Study Hall! @leaton01.bsky.social @michellemillerphd.bsky.social and @thedavenelson.bsky.social and I discuss three recent studies exploring the intersection of AI and teaching. Cognitive offloading, chatbot sycophancy, & more! intentionalteaching.buzzsprout.com/2069949/epis...

19.08.2025 17:22 — 👍 6 🔁 3 💬 0 📌 0

Democratizing prompt for LLMs:

Read and review new terms of service for X company. Compare and contrast with previous versions. What should I be aware of? What might any consumer be wary of or concerned about?

17.08.2025 15:09 — 👍 0 🔁 0 💬 0 📌 0

Reminds me of the specter of internet throttling before net neutrality. Don’t want to pay us extra for our tech? Fine. You just might not like what you get. No transparency, varying quality of an information commodity on every use. I’d guess an upswing in paid subs.

09.08.2025 16:58 — 👍 0 🔁 0 💬 0 📌 0

Likely Microsoft wants to gain benefits of Chat but not replace any of its own software that could then hinder enterprise level negotiations

07.08.2025 17:41 — 👍 1 🔁 0 💬 0 📌 0

A Theory of Adaptive Scaffolding for LLM-Based Pedagogical Agents Large language models (LLMs) present new opportunities for creating pedagogical agents that engage in meaningful dialogue to support student learning. However, the current use of LLM systems like Chat...

Article

arxiv.org/abs/2508.01503

05.08.2025 16:35 — 👍 0 🔁 0 💬 0 📌 0

I love this work. Biggest takeaways - AI Agents "often fail at effectively guiding students toward mastery" and "students prioritize scores over feedback, leading to off-task behavior that can hinder growth." Authenticity and relatedness are needed for these efforts. More humans in the process.

05.08.2025 16:35 — 👍 0 🔁 0 💬 1 📌 0

VArsity: Can Large Language Models Keep Power Engineering Students in Phase? This paper provides an educational case study regarding our experience in deploying ChatGPT Large Language Models (LLMs) in the Spring 2025 and Fall 2023 offerings of ECE 4320: Power System Analysis a...

As models become tuned to specialized academic content, the gap between novice and expert ability to critically evaluate outputs will grow. This paper demonstrates the change in student error recognition from GPT4 to o3. More longitudinal studies like this please. arxiv.org/abs/2507.20995

01.08.2025 16:55 — 👍 0 🔁 0 💬 0 📌 0

Yep. The learning is definitely happening too slowly. That’s what little kids need for brain development. Speed. Jesus Christ these people.

28.07.2025 16:37 — 👍 0 🔁 0 💬 0 📌 0

LLM Agents for Education: Advances and Applications Large Language Model (LLM) agents have demonstrated remarkable capabilities in automating tasks and driving innovation across diverse educational applications. In this survey, we provide a systematic ...

It was relatively easy to keep up with the frontier AI models and a few open source clones. Agents are like electrical appliance manufacturers in the early twentieth century. What is actually useful? What is a niche product?
arxiv.org/abs/2503.11733

17.07.2025 15:56 — 👍 1 🔁 0 💬 0 📌 0

I lived it - in the 7th week I realized it, begged and was granted a gracious W from the professor - but that was a very scary phone call

16.07.2025 15:50 — 👍 13 🔁 0 💬 1 📌 0

Why can't we just have nice things! Why do our brains make us doubt this way?

15.07.2025 19:59 — 👍 0 🔁 0 💬 0 📌 0

Campus AI vs Commercial AI: A Late-Breaking Study on How LLM As-A-Service Customizations Shape Trust and Usage Patterns As the use of Large Language Models (LLMs) by students, lecturers and researchers becomes more prevalent, universities - like other organizations - are pressed to develop coherent AI strategies. LLMs ...

I've been focusing a lot recently on trust and perceptions of trust + sycophancy + bot capability. Appreciate this preliminary approach for gauging Uni-owned bots vs Chat. arxiv.org/abs/2505.10490

15.07.2025 19:03 — 👍 0 🔁 0 💬 0 📌 0

A Large Language Model-Based Digital Twin Patient System Enhances Clinical Questioning Skills in Medical Education: A Randomized Controlled Trial | IOVS | ARVO Journals

Increasingly convinced that simulation practice in clinical settings is one of the biggest "killer app" prospects for AI in education. Generate at scale, add nuance + complexity easily, personalize, etc.
iovs.arvojournals.org/article.aspx...

10.07.2025 14:48 — 👍 0 🔁 0 💬 0 📌 0

Latest posts by thedavenelson.bsky.social on Bluesky