Vidhisha Balachandran

@vidhishab.bsky.social

AI Evaluation and Interpretability @MicrosoftResearch, Prev PhD @CMU.

693 Followers  |  107 Following  |  7 Posts  |  Joined: 19.11.2024

Latest posts by vidhishab.bsky.social on Bluesky

All our evaluation logs and reasoning traces for open source models are now released! We hope this can be useful for the community for further research and analysis!

27.05.2025 19:20 | 👍 0    🔁 0    💬 0    📌 0
Eureka Inference-Time Scaling Insights: Where We Stand and What Lies Ahead - Microsoft Research
Understanding and measuring the potential of inference-time scaling for reasoning. The new Eureka study tests nine state-of-the-art models on eight diverse reasoning tasks.

All Eureka inference-time scaling insights are now available here: www.microsoft.com/en-us/resear... It was fun sharing these and more together with Vidhisha Balachandran @vidhishab.bsky.social and Vibhav Vineet at #ICLR2025.

29.04.2025 15:36 | 👍 3    🔁 2    💬 0    📌 0

🎉 The Phi-4 reasoning models have landed on HF and Azure AI Foundry. The new models are competitive with, and often outperform, much larger frontier models. It is exciting to see the reasoning capabilities extend to more domains beyond math, including algorithmic reasoning, calendar planning, and coding.

01.05.2025 00:50 | 👍 21    🔁 8    💬 1    📌 1

Come see us in any of the following sessions on model understanding and evaluation! 🔬 #ICLR2025 @msftresearch.bsky.social

24.04.2025 01:38 | 👍 1    🔁 1    💬 0    📌 0

Our paper "Improving Instruction-Following in Language Models through Activation Steering" has been accepted to #ICLR2025!

We're also excited to share that our public GitHub repo is now live.
Code: github.com/microsoft/ll...
Camera-ready: arxiv.org/abs/2410.12877

15.04.2025 16:35 | 👍 7    🔁 2    💬 1    📌 2

🚀 Excited to share our new Eureka report!

We studied inference-time scaling across 9 models (conventional & reasoning) on 8 tough tasks, from math & STEM reasoning to navigation, calendar planning, NP-hard problems & spatial planning.

Full Report: aka.ms/eureka-ml-in...

10.04.2025 20:46 | 👍 2    🔁 0    💬 0    📌 0

Asking the right questions can make or break decisions in fields like medicine, law, and beyond✴️
Our new framework ALFA (ALignment with Fine-grained Attributes) teaches LLMs to PROACTIVELY seek information by asking better questions, using **structured rewards** 🏥❓
(co-led with @jiminmun.bsky.social)
👉🏻🧵

21.02.2025 16:00 | 👍 24    🔁 7    💬 1    📌 3

Effective decision-making starts with asking the right questions. Our new framework, ALFA, teaches LLMs to ask questions through fine-grained attributes in expert domains.

Excited to see where this takes the next generation of effective LLM assistants and agents!

24.02.2025 22:26 | 👍 2    🔁 1    💬 0    📌 0

Excited to share our December updates on the state of progress in AI! @msftresearch.bsky.social

Detailed report coming early next year ✨

15.12.2024 05:34 | 👍 8    🔁 0    💬 0    📌 0

Stoked to share our new work on scaling training data attribution (TDA) toward LLM pretraining - and great insights we found along the way!

medium.com/people-ai-re... and more in the thread below from most excellent student researcher @tylerachang.bsky.social

14.12.2024 18:16 | 👍 12    🔁 1    💬 0    📌 0

Are you ready for an early Christmas present from our team at Microsoft Research?

Introducing the most powerful smol model ever built in the world!

Welcome to Phi-4! 👇

13.12.2024 03:37 | 👍 12    🔁 1    💬 1    📌 1

The Phi-4 technical report is now available on arXiv (arxiv.org/abs/2412.08905) and on Azure AI. Congratulations to the Phi team on the release and on the major milestone in scaling data quality processes! 🎉 @msftresearch.bsky.social @sbubeck.bsky.social @suriyag.bsky.social @sytelus.bsky.social

13.12.2024 15:17 | 👍 4    🔁 1    💬 0    📌 0

Come talk to us about model evaluation! 4:30 pm today at West Meeting Room 301

Also come to see @besmiranushi.bsky.social's cool demos 🍁

12.12.2024 00:08 | 👍 8    🔁 1    💬 0    📌 0

We will be presenting Eureka, our model evaluation framework, and sharing in-depth insights at NeurIPS this week! Come join us on Wednesday (Dec 11) at 4:30 pm at West Meeting Room 301 to hear what we've been up to the past few months! :)

neurips.cc/Expo/Confere...

microsoft.github.io/eureka-ml-in...

09.12.2024 23:11 | 👍 4    🔁 0    💬 0    📌 0

🚨 NeurIPS 2024 Spotlight
Did you know we lack standards for AI benchmarks, despite their role in tracking progress, comparing models, and shaping policy? 🤯 Enter BetterBench, our framework with 46 criteria to assess benchmark quality: betterbench.stanford.edu 1/x

25.11.2024 19:02 | 👍 139    🔁 25    💬 5    📌 7

Today is the International Day for the Elimination of Violence against Women. According to the UN, more than 50 000 women were killed by a partner or family member in 2023 news.un.org/en/story/202... This number is an underestimate given that only 37 countries reported in 2023.

26.11.2024 06:16 | 👍 8    🔁 3    💬 0    📌 0

Would love to be added as well!

22.11.2024 17:05 | 👍 1    🔁 0    💬 0    📌 0

Would love to be added, thanks!

19.11.2024 02:46 | 👍 1    🔁 0    💬 0    📌 0

@vidhishab is following 20 prominent accounts