Shiwali Mohan @shiwali - Bluesky Profile

Shiwali Mohan

@shiwali.bsky.social

56 Followers | 42 Following | 10 Posts | Joined: 14.09.2023

Latest posts by shiwali.bsky.social on Bluesky

Can Generative AI Support Patients' & Caregivers' Informational Needs? Towards Task-Centric Evaluation Of AI Systems

Generative AI systems such as ChatGPT and Claude are built upon language models that are typically evaluated for accuracy on curated benchmark datasets. Such evaluation paradigms measure predictive an...

It's a preliminary study, but it shows how we can make #AI #ML evaluations more informative, moving beyond benchmarks curated with minimal insight into what a useful question is and what an appropriate answer looks like. (8/8)

📖 Paper: arxiv.org/abs/2402.00234
🎥 Talk: drive.google.com/file/d/1m79W...

24.03.2025 19:27 — 👍 0 🔁 0 💬 0 📌 0

🤖 Measured how #GenAI systems performed, not only in terms of correctness but also in how similar their answers were to those of an expert answering the same question. (7/8)

24.03.2025 19:27 — 👍 0 🔁 0 💬 1 📌 0

📥 Curated an evaluation question set from observed interactions. The set contains real questions asked by participants as they were attempting to do a specific task. Such datasets are critical to measuring if an #AI system is producing responses that are useful. (6/8)

24.03.2025 19:27 — 👍 0 🔁 0 💬 1 📌 0

🤕👩‍⚕️ Studied how people interact with an expert when one is available. This uncovered the specific needs people have as they make sense of data, and how an expert addresses those needs. (5/8)

24.03.2025 19:27 — 👍 0 🔁 0 💬 1 📌 0

🏥 Identified a specific use case in which people need support from an expert but the expert is not easily accessible: understanding medical scans and reports in order to make good decisions about your treatment. (4/8)

24.03.2025 19:27 — 👍 0 🔁 0 💬 1 📌 0

In our most recent paper, we explore an evaluation approach for #GenAI #GenerativeAI systems. Here are the steps we followed: (3/8)

24.03.2025 19:27 — 👍 0 🔁 0 💬 1 📌 0

As a science, we have to adopt rigorous evaluations that identify what #IntelligentSystem #AIAgent #Agent behavior should be & measure if it works as intended. Move beyond a 𝘱𝘳𝘰𝘣𝘭𝘦𝘮-𝘢𝘨𝘯𝘰𝘴𝘵𝘪𝘤 metric (accuracy) on a 𝘵𝘢𝘴𝘬-𝘢𝘨𝘯𝘰𝘴𝘵𝘪𝘤 benchmark. Adopt practices from #HCI, #psychology, #economics. (2/8)

24.03.2025 19:27 — 👍 0 🔁 0 💬 1 📌 0

#AI #ML evals measure accuracy on benchmarks, telling us how algorithms compare with each other, but not much about how an #IntelligentSystem should be built. How do we make evals more informative? (1/8)

📖 Paper: arxiv.org/abs/2402.00234
🎥 Talk: drive.google.com/file/d/1m79W...

24.03.2025 19:27 — 👍 0 🔁 0 💬 1 📌 0

An architecture showing how a planning agent can be extended with metareasoning.

At #AAAI2025? Looking for #AI #ML research beyond #GenAI hype & doom? Excited about #AI running on your laptop? Listen to my colleague Wiktor Piotrowski talk about #OpenWorldLearning #OWL at 9:30 am on Feb 28th (Journal Track).

arxiv.org/abs/2306.06272
#AIPlanning #MBR #CognitiveSystems #KRR

26.02.2025 20:14 — 👍 1 🔁 0 💬 0 📌 0

Wait, are the AnthropicAI people seriously claiming to “unlock a rich theoretical landscape” for AI evaluation by proposing the use of… error bars? And this secret trove of deep statistical insight starts with “use the Central Limit Theorem”?

Befuddling

27.11.2024 22:09 — 👍 98 🔁 16 💬 11 📌 1

When your experiments show that your #AI is more human than humans, it is not that you have built #AGI or #SuperIntelligence, it is that you don't know how to evaluate, experiment, and measure.

27.11.2024 21:28 — 👍 2 🔁 0 💬 0 📌 0

@shiwali is following 19 prominent accounts

Upol Ehsan | hiring PhD students Fall'26
@upolehsan

🎯 Making AI less evil= human-centered + explainable + responsible AI 💼 Harvard Berkman Klein Fellow | CS Prof. @Northeastern | Data & Society 🏢 Prev-Georgia Tech, {Google, IBM, MSFT}Research 🔬 AI, HCI, Philosophy ☕ F1, memes 🌐 upolehsan.com

Sanjay Srivastava
@sanjaysrivastava.com

I’m a reasonable man, get off my case. sanjaysrivastava.com

Mimansa Jaiswal
@mimansaj

Robustness, Data & Annotations, Evaluation & Interpretability in LLMs http://mimansajaiswal.github.io/

Mila - Institut québécois d'IA
@mila-quebec

Le plus grand centre de recherche universitaire en apprentissage profond — The world's largest academic research center in deep learning.

Ian Goodfellow
@ian-goodfellow

Research Scientist at DeepMind. Opinions my own. Inventor of GANs. Lead author of http://www.deeplearningbook.org . Founding chairman of www.publichealthactionnetwork.org

Anil Ananthaswamy
@anilananth

Journalist with bylines in Nature, Quanta, Scientific American, New Scientist, and many more; former deputy news editor at New Scientist. Author of 4 popular science books, including WHY MACHINES LEARN: The Elegant Math Behind Modern AI. TED speaker.

Dr. Fei-Fei Li
@drfeifei

Prof (CS @Stanford), Co-Director @StanfordHAI, Cofounder/CEO @theworldlabs, CoFounder @ai4allorg #AI #computervision #robotics #AI-healthcare

Yoshua Bengio
@yoshuabengio

Working towards the safe development of AI for the benefit of all at Université de Montréal, LawZero and Mila. A.M. Turing Award Recipient and most-cited AI researcher. https://lawzero.org/en https://yoshuabengio.org/profile/

Gary Marcus
@garymarcus

AI and cognitive science, Founder and CEO (Geometric Intelligence, acquired by Uber). 8 books including Guitar Zero, Rebooting AI and Taming Silicon Valley. Newsletter (50k subscribers): garymarcus.substack.com

Thomas Dietterich
@tdietterich

Safe and robust AI/ML, computational sustainability. Former President AAAI and IMLS. Distinguished Professor Emeritus, Oregon State University. https://web.engr.oregonstate.edu/~tgd/

Jessica Hullman
@jessicahullman

Ginni Rometty Prof @NorthwesternCS | Fellow @NU_IPR | Uncertainty + decisions | Humans + AI/ML | Blog @statmodeling

Michael Katz
@ctpelok

AI Planning and SDM research @ IBM Research

Vahid Behzadan
@behzadan

Professionally curious about the science of making bad decisions; AI safety and security researcher; Assistant Professor of CS and Data Science & Director of the Secure and Assured Intelligent Learning (SAIL) lab @ University of New Haven.

@sachingrover

Arun
@arunbahl

Partner, friend, son, brother, dog dad. CEO at https://aloe.inc. Cognitive science and AI. Advocate for reason in both humans and machines. Specialization is for insects.

Christian Muise
@cjmuise

Assistant Professor @ Queen's University AI Planning ⋅ Dialogue Agents ⋅ Model Understanding 🔗 https://mulab.ai/ 🔗 https://haz.ca/ 📍Kingston, Ontario, Canada

Naman Shah
@naman-shah

Robotics postdoc @ Brown University

Pulkit Verma
@pulkitverma

Postdoc @csail.mit.edu, Ph.D. from @scai-asu.bsky.social. Working on AI Safety, AI Assessment, Automated Planning, Interpretability, Robotics. Previously: Masters from IITGuwahati, Research Intern at MetaAI. https://pulkitverma.net

Charles Isbell
@isbellhfh

Computationalist. Professor. Provost. I also am and do a lot of other stuff.