
Nishant Subramani @ ACL

@nsubramani23.bsky.social

PhD student @CMU LTI - working on model #interpretability, student researcher @google; prev predoc @ai2; intern @MSFT nishantsubramani.github.io

1,365 Followers  |  507 Following  |  23 Posts  |  Joined: 27.10.2023

Latest posts by nsubramani23.bsky.social on Bluesky

Every Language Model Has a Forgery-Resistant Signature The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying...

We discovered that language models leave a natural "signature" on their API outputs that's extremely hard to fake. Here's how it works πŸ”

πŸ“„ arxiv.org/abs/2510.14086 1/

17.10.2025 17:59 β€” πŸ‘ 87    πŸ” 24    πŸ’¬ 3    πŸ“Œ 6

At @colmweb.org all week πŸ₯―🍁! Presenting 3 mechinterp + actionable interp papers at @interplay-workshop.bsky.social

1. BERTology in the Modern World w/ @bearseascape.bsky.social
2. MICE for CATs
3. LLM Microscope w/ Jiarui Liu, Jivitesh Jain, @monadiab77.bsky.social

Reach out to chat! #COLM2025

06.10.2025 22:08 β€” πŸ‘ 9    πŸ” 2    πŸ’¬ 0    πŸ“Œ 0

Excited to be attending NEMI in Boston today to present 🐁 MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools and co-moderate the model steering and control roundtable! Come find me to connect and chat about steering and actionable interp

22.08.2025 12:28 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

At #ACL2025 in Vienna πŸ‡¦πŸ‡Ή till next Saturday! Love to chat about anything #interpretability πŸ”Ž, understanding model internals πŸ”¬, and finding yummy vegan food πŸ₯¬

25.07.2025 21:53 β€” πŸ‘ 5    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

At #ICML2025 πŸ‡¨πŸ‡¦ till Sunday! Love to chat about #interpretability, understanding model internals, and finding yummy vegan food in Vancouver πŸ₯¬πŸœ

14.07.2025 17:33 β€” πŸ‘ 5    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Congrats πŸ₯³πŸ₯³πŸ₯³πŸ₯³

13.06.2025 19:08 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

🚨New #interpretability paper with @nsubramani23.bsky.social: πŸ•΅οΈ Model Internal Sleuthing: Finding Lexical Identity and Inflectional Morphology in Modern Language Models

04.06.2025 17:19 β€” πŸ‘ 1    πŸ” 1    πŸ’¬ 1    πŸ“Œ 1

🚨 Check out our new #interpretability paper: πŸ•΅πŸ½ Model Internal Sleuthing led by the amazing @bearseascape.bsky.social who is an undergrad at @scsatcmu.bsky.social @ltiatcmu.bsky.social

04.06.2025 17:41 β€” πŸ‘ 4    πŸ” 1    πŸ’¬ 0    πŸ“Œ 0

Excited to announce that I started at @googleresearch.bsky.social on the cloud team as a student researcher last month, working with Hamid Palangi on actionable #interpretability πŸ” to build better tool-using #agents βš’οΈπŸ€–

02.06.2025 16:35 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Presenting this today at the poster session at #NAACL2025!

Come chat about interpretability, trustworthiness, and tool-using agents!

πŸ—“οΈ - Thursday May 1st (today)
πŸ“ - Hall 3
πŸ•‘ - 2:00-3:30pm

01.05.2025 15:28 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

At #NAACL2025 🌡till Sunday! Love to chat about interpretability, understanding model internals, and finding vegan food πŸ₯¬

30.04.2025 15:03 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Come to our poster in Albuquerque on Thursday 2:00-3:30pm in the interpretability & analysis section!

Paper: aclanthology.org/2025.naacl-l...
Code (coming soon): github.com/microsoft/mi...

🧡/🧡

29.04.2025 13:41 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

MICE 🐭:
🎯 - significantly beats baselines on expected tool-calling utility, especially in high-risk scenarios
βœ… - matches expected calibration error of baselines
βœ… - is sample efficient
βœ… - generalizes zeroshot to unseen tools

5/🧡

29.04.2025 13:41 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Calibration is not sufficient: both an oracle and a model that just predicts the base rate are perfectly calibrated πŸ€¦πŸ½β€β™‚οΈ

We develop a new metric, expected tool-calling utility πŸ› οΈ, to measure the utility of deciding whether or not to execute a tool call via a confidence score!

4/🧡

29.04.2025 13:41 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
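The post names the metric but not its exact definition; a toy version can still show why calibration alone is not enough. In this sketch the thresholded decision rule and all payoff values (`u_good`, `u_bad`, `u_abstain`) are illustrative assumptions, not the paper's formulation:

```python
def expected_tool_calling_utility(confidences, correct, threshold=0.5,
                                  u_good=1.0, u_bad=-5.0, u_abstain=0.0):
    """Mean payoff when a tool call is executed iff confidence >= threshold.

    A correct executed call earns u_good, a wrong executed call costs
    u_bad, and abstaining (not calling the tool) earns u_abstain.
    """
    total = 0.0
    for c, ok in zip(confidences, correct):
        if c >= threshold:
            total += u_good if ok else u_bad
        else:
            total += u_abstain
    return total / len(confidences)
```

Under these assumed payoffs, a predictor that always outputs the 50% base rate can be perfectly calibrated yet score poorly, because it never abstains from bad calls, while a sharp predictor that separates good from bad calls scores well.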

We propose 🐭 MICE to better assess confidence when calling tools:

1️⃣ decode from each intermediate layer of an LM
2️⃣ compute similarity scores between each layer's generation and the final output
3️⃣ train a probabilistic classifier on these features

3/🧡

29.04.2025 13:41 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
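The three steps above can be sketched in a minimal, self-contained form. This is an illustrative reconstruction, not the paper's implementation: the similarity function (unigram F1), the classifier (a tiny SGD-trained logistic regression), and every name here are assumptions.

```python
# Illustrative MICE-style pipeline (all details assumed):
# 1) take one decoded generation per intermediate layer,
# 2) score each against the final output (here: unigram-overlap F1),
# 3) train a small probabilistic classifier on those scores.
from collections import Counter
import math

def unigram_f1(a: str, b: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    ca, cb = Counter(a.split()), Counter(b.split())
    overlap = sum((ca & cb).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / sum(ca.values()), overlap / sum(cb.values())
    return 2 * p * r / (p + r)

def layer_features(layer_generations: list[str], final_output: str) -> list[float]:
    """One similarity feature per intermediate layer."""
    return [unigram_f1(g, final_output) for g in layer_generations]

class TinyLogReg:
    """Minimal logistic regression via SGD, standing in for any probabilistic classifier."""
    def __init__(self, dim: int, lr: float = 0.5, epochs: int = 500):
        self.w, self.b, self.lr, self.epochs = [0.0] * dim, 0.0, lr, epochs

    def _p(self, x: list[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, X: list[list[float]], y: list[int]) -> None:
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                err = self._p(x) - t  # gradient of log-loss w.r.t. the logit
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err

    def predict_proba(self, x: list[float]) -> float:
        return self._p(x)
```

The intuition the sketch captures: when intermediate layers already "agree" with the final answer, the similarity features are high and the classifier can map that agreement to a higher confidence score than the raw output probabilities provide.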

1️⃣ Tool-using agents need to be useful and safe as they take actions in the world
2️⃣ Language models are poorly calibrated

πŸ€” Can we use model internals to better calibrate language models to make tool-using agents safer and more useful?

2/🧡

29.04.2025 13:41 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸš€ Excited to share a new interp+agents paper: 🐭🐱 MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools appearing at #NAACL2025

This was work done @msftresearch.bsky.social last summer with Jason Eisner, Justin Svegliato, Ben Van Durme, Yu Su, and Sam Thomson

1/🧡

29.04.2025 13:41 β€” πŸ‘ 12    πŸ” 8    πŸ’¬ 1    πŸ“Œ 2

Congrats!!

24.04.2025 04:30 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Congrats! πŸ₯³

27.03.2025 03:10 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
A Test So Hard No AI System Can Pass It β€” Yet The creators of a new test called β€œHumanity’s Last Exam” argue we may soon lose the ability to create tests hard enough for A.I. models.

Have these people met … society? Read a book? Listened to music? Regurgitating esoteric facts isn’t intelligence.

This is more like humanity’s last stand at jeopardy

www.nytimes.com/2025/01/23/t...

25.01.2025 18:15 β€” πŸ‘ 50    πŸ” 13    πŸ’¬ 3    πŸ“Œ 2

πŸ‘πŸ½ looks good to me!

14.12.2024 01:27 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

πŸ‘πŸ½ Intro

πŸ’Ό PhD student @ltiatcmu.bsky.social

πŸ“œ My research is in model interpretability πŸ”Ž, understanding the internals of LLMs to build more controllable and trustworthy systems

🫡🏽 If you are interested in better understanding of language technology or model interpretability, let's connect!

10.12.2024 15:53 β€” πŸ‘ 7    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

πŸ™‹πŸ½

21.11.2024 14:25 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

πŸ™‹πŸ½

19.11.2024 14:45 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

1) I'm working on using intermediate model generations from LLMs to better calibrate tool-using agents βš’οΈπŸ€– than the output probabilities themselves! Turns out you can πŸ₯³

2) There's gotta be a nice geometric understanding of what's going on within LLMs when we tune them πŸ€”

18.11.2024 00:11 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Love to be added too!

17.11.2024 17:44 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

Utah is hiring tenure-track/tenured faculty & a priority area is NLP!

Please reach out over email if you have questions about the school and Salt Lake City; happy to share my experience so far.

utah.peopleadmin.com/postings/154...

27.10.2023 17:48 β€” πŸ‘ 4    πŸ” 3    πŸ’¬ 0    πŸ“Œ 0
