All our evaluation logs and reasoning traces for open source models are now released! We hope this can be useful for the community for further research and analysis!
27.05.2025 19:20
@vidhishab.bsky.social
AI Evaluation and Interpretability @MicrosoftResearch, Prev PhD @CMU.
All Eureka inference-time scaling insights are now available here: www.microsoft.com/en-us/resear... It was fun sharing these and more together with Vidhisha Balachandran @vidhishab.bsky.social and Vibhav Vineet at #ICLR2025.
29.04.2025 15:36
The Phi-4 reasoning models have landed on HF and Azure AI Foundry. The new models are competitive with, and often outperform, much larger frontier models. It is exciting to see the reasoning capabilities extend beyond math to more domains, including algorithmic reasoning, calendar planning, and coding.
01.05.2025 00:50
Come see us in any of the following sessions on model understanding and evaluation! #ICLR2025 @msftresearch.bsky.social
24.04.2025 01:38
Our paper "Improving Instruction-Following in Language Models through Activation Steering" has been accepted to #ICLR2025!
We're also excited to share that our public GitHub repo is now live. 
Code: github.com/microsoft/ll...
Camera-ready: arxiv.org/abs/2410.12877
Excited to share our new Eureka report!
We studied inference-time scaling across 9 models (conventional & reasoning) on 8 tough tasks, from math & STEM reasoning to navigation, calendar planning, NP-hard problems & spatial planning.
Full Report: aka.ms/eureka-ml-in...
Asking the right questions can make or break decisions in fields like medicine, law, and beyond.
Our new framework ALFA (ALignment with Fine-grained Attributes) teaches LLMs to PROACTIVELY seek information by asking better questions, guided by **structured rewards**.
(co-led with @jiminmun.bsky.social)
Effective decision-making starts with asking the right questions. Our new framework, ALFA, teaches LLMs to ask better questions by aligning them with fine-grained attributes in expert domains.
Excited to see where this takes the next generation of effective LLM assistants and agents!
Excited to share our December updates on the state of progress in AI! @msftresearch.bsky.social
Detailed report coming early next year.
Stoked to share our new work on scaling training data attribution (TDA) toward LLM pretraining - and great insights we found along the way! 
medium.com/people-ai-re... and more in the thread below from our most excellent student researcher @tylerachang.bsky.social
Are you ready for an early Christmas present from our team at Microsoft Research?
Introducing the most powerful smol model ever built in the world!
Welcome to Phi-4!
The phi-4 technical report is now available on arxiv arxiv.org/abs/2412.08905 and on Azure AI. Congratulations to the phi team on the release and on the major milestone in scaling data quality processes! @msftresearch.bsky.social @sbubeck.bsky.social @suriyag.bsky.social @sytelus.bsky.social
13.12.2024 15:17
Come talk to us about model evaluation! 4:30 pm today at West Meeting Room 301
Also come see @besmiranushi.bsky.social's cool demos!
We will be presenting Eureka, our model evaluation framework, and sharing in-depth insights at NeurIPS this week! Come join us on Wednesday (Dec 11) 4:30pm at West Meeting Room 301 to hear what we've been up to the past few months! :)
neurips.cc/Expo/Confere...
microsoft.github.io/eureka-ml-in...
NeurIPS 2024 Spotlight
Did you know we lack standards for AI benchmarks, despite their role in tracking progress, comparing models, and shaping policy? Enter BetterBench, our framework with 46 criteria for assessing benchmark quality: betterbench.stanford.edu 1/x
Today is the International Day for the Elimination of Violence against Women. According to the UN, more than 50 000 women were killed by a partner or family member in 2023 news.un.org/en/story/202... This number is an underestimate given that only 37 countries reported data in 2023.
26.11.2024 06:16
Would love to be added as well!
22.11.2024 17:05
Would love to be added, thanks!
19.11.2024 02:46