Measured how #GenAI systems performed: not only correctness, but also how similar their answers were to an expert answering the same question. (7/8)
24.03.2025 19:27
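A minimal sketch of the scoring idea in the post above: grade each answer for correctness and for similarity to the expert's answer over the same question set. Everything here is a hypothetical stand-in — the bag-of-words cosine, the `is_correct` judgment, and the function names are illustrative assumptions, not the paper's actual metric.

```python
# Hypothetical sketch: score a GenAI answer for correctness AND for how
# similar it is to an expert's answer to the same question. Bag-of-words
# cosine is a stand-in for whatever similarity metric the paper uses.
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words token counts of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def evaluate(questions, system_answer, expert_answer, is_correct):
    """Per-question (correct?, expert-similarity) pairs for one system."""
    results = []
    for q in questions:
        ans = system_answer(q)
        results.append((is_correct(q, ans),
                        cosine_similarity(ans, expert_answer[q])))
    return results
```

Averaging the two columns of `evaluate`'s output gives the two headline numbers the thread describes: accuracy and mean expert-similarity.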
Curated an evaluation question set from observed interactions. The set contains real questions asked by participants as they attempted a specific task. Such datasets are critical to measuring whether an #AI system is producing responses that are useful. (6/8)
24.03.2025 19:27
Studied how people interact with an expert when one is available. This uncovered specific needs people have as they make sense of data, and how an expert addresses those needs. (5/8)
24.03.2025 19:27
Identified a specific use case in which people need support from an expert but the expert is not easily accessible: understanding medical scans and reports in order to make good decisions about one's treatment. (4/8)
24.03.2025 19:27
In our most recent paper, we explore an evaluation approach for #GenAI #GenerativeAI systems. Here are the steps we followed: (3/8)
24.03.2025 19:27
As a science, we have to adopt rigorous evaluations that identify what #IntelligentSystem #AIAgent #Agent behavior should be & measure if it works as intended. Move beyond a problem-agnostic metric (accuracy) on a task-agnostic benchmark. Adopt practices from #HCI, #psychology, #economics. (2/8)
24.03.2025 19:27
#AI #ML evals measure accuracy on benchmarks, telling us how algorithms compare with each other, but not much about how an #IntelligentSystem should be built. How do we make evals more informative? (1/8)
Paper: arxiv.org/abs/2402.00234
Talk: drive.google.com/file/d/1m79W...
24.03.2025 19:27
An architecture showing how a planning agent can be extended with metareasoning.
At #AAAI2025? Looking for #AI #ML research beyond #GenAI hype & doom? Excited about #AI running on your laptop? Listen to my colleague Wiktor Piotrowski talk about #OpenWorldLearning #OWL at 9:30 am on Feb 28th (Journal Track).
arxiv.org/abs/2306.06272
#AIPlanning #MBR #CognitiveSystems #KRR
26.02.2025 20:14
Wait, are the AnthropicAI people seriously claiming to "unlock a rich theoretical landscape" for AI evaluation by proposing the use of… error bars? And this secret trove of deep statistical insight starts with "use the Central Limit Theorem"?
Befuddling
27.11.2024 22:09
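For context, the technique being ridiculed above is the textbook CLT-based confidence interval on benchmark accuracy. A minimal sketch of what that amounts to (standard normal approximation; the function name and numbers here are illustrative, not any lab's actual code):

```python
# The "error bars" in question: a 95% confidence interval on benchmark
# accuracy via the Central Limit Theorem (normal approximation to the
# binomial). Textbook statistics, sketched here for reference.
import math

def accuracy_ci(outcomes, z=1.96):
    """Sample accuracy and CI half-width for a list of 0/1 outcomes."""
    n = len(outcomes)
    p = sum(outcomes) / n
    half = z * math.sqrt(p * (1 - p) / n)  # z times the standard error
    return p, half

# Hypothetical benchmark run: 80 correct out of 100 questions.
p, half = accuracy_ci([1] * 80 + [0] * 20)
print(f"accuracy = {p:.2f} ± {half:.2f}")  # prints: accuracy = 0.80 ± 0.08
```

Note the normal approximation degrades for accuracies near 0 or 1 or for small benchmarks, which is exactly where Wilson or bootstrap intervals are usually preferred.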
When your experiments show that your #AI is more human than humans, it is not that you have built #AGI or #SuperIntelligence; it is that you don't know how to evaluate, experiment, and measure.
27.11.2024 21:28
Making AI less evil = human-centered + explainable + responsible AI
Harvard Berkman Klein Fellow | CS Prof. @Northeastern | Data & Society
Prev: Georgia Tech; {Google, IBM, MSFT} Research
AI, HCI, Philosophy
F1, memes
upolehsan.com
I'm a reasonable man, get off my case.
sanjaysrivastava.com
Robustness, Data & Annotations, Evaluation & Interpretability in LLMs
http://mimansajaiswal.github.io/
The world's largest academic research center in deep learning.
Research Scientist at DeepMind. Opinions my own. Inventor of GANs. Lead author of http://www.deeplearningbook.org . Founding chairman of www.publichealthactionnetwork.org
Journalist with bylines in Nature, Quanta, Scientific American, New Scientist, and many more; former deputy news editor at New Scientist; author of 4 popular science books, including WHY MACHINES LEARN: The Elegant Math Behind Modern AI; TED speaker.
Prof (CS @Stanford), Co-Director @StanfordHAI, Cofounder/CEO @theworldlabs, CoFounder @ai4allorg #AI #computervision #robotics #AI-healthcare
Working towards the safe development of AI for the benefit of all at Université de Montréal, LawZero and Mila.
A.M. Turing Award Recipient and most-cited AI researcher.
https://lawzero.org/en
https://yoshuabengio.org/profile/
AI and cognitive science. Founder and CEO of Geometric Intelligence (acquired by Uber). Author of 8 books, including Guitar Zero, Rebooting AI, and Taming Silicon Valley.
Newsletter (50k subscribers): garymarcus.substack.com
Safe and robust AI/ML, computational sustainability. Former President AAAI and IMLS. Distinguished Professor Emeritus, Oregon State University. https://web.engr.oregonstate.edu/~tgd/
Ginni Rometty Prof @NorthwesternCS | Fellow @NU_IPR | Uncertainty + decisions | Humans + AI/ML | Blog @statmodeling
AI Planning and SDM research @ IBM Research
Professionally curious about the science of making bad decisions; AI safety and security researcher; Assistant Professor of CS and Data Science & Director of the Secure and Assured Intelligent Learning (SAIL) lab @ University of New Haven.
Partner, friend, son, brother, dog dad. CEO at https://aloe.inc. Cognitive science and AI. Advocate for reason in both humans and machines. Specialization is for insects.
Assistant Professor @ Queen's University
AI Planning
Dialogue Agents
Model Understanding
https://mulab.ai/
https://haz.ca/
Kingston, Ontario, Canada
Robotics postdoc @ Brown University
Postdoc @csail.mit.edu, Ph.D. from @scai-asu.bsky.social
Working on AI Safety, AI Assessment, Automated Planning, Interpretability, Robotics
Previously: Master's from IIT Guwahati; Research Intern at Meta AI
https://pulkitverma.net
Computationalist. Professor. Provost. I also am and do a lot of other stuff.