
Koyena Pal

@koyena.bsky.social

CS Ph.D. Candidate @ Northeastern | Interpretability + Data Science | BS/MS @ Brown | koyenapal.github.io

175 Followers  |  53 Following  |  14 Posts  |  Joined: 23.11.2024

Posts by Koyena Pal (@koyena.bsky.social)


In this amazing multidisciplinary collaboration, we report our early experience with the @openclaw-x.bsky.social ->

23.02.2026 23:32 — 👍 40    🔁 21    💬 1    📌 9

Good question! We measure CoT generalizability: can Model A's reasoning guide Model B to the same conclusion? Whether real reasoning or shared hallucination, both appear "consistent." We test if explanations generalize across models (good explainer), regardless of internal faithfulness.

23.01.2026 04:04 — 👍 0    🔁 0    💬 0    📌 0
Do explanations generalize across large reasoning models? Large reasoning models (LRMs) produce a textual chain of thought (CoT) in the process of solving a problem, which serves as a potentially powerful tool to understand the problem by surfacing a human-r...

What does this mean for RLHF and post-training?

Cross-model consistency could be a valuable training signal for developing more interpretable reasoning models.

See more:
📄 arXiv: arxiv.org/abs/2601.11517
🌐 Website: genex.baulab.info

Thanks to CBAI for pairing me with Chandan! 🙏

🧵(7/7)

22.01.2026 21:58 — 👍 2    🔁 0    💬 0    📌 0

What do humans actually prefer?

Participants rated CoTs on clarity, ease of following, and confidence. We find that Consistency >> Accuracy for predicting preferences. Humans trust explanations that multiple models agree on!

🧵(6/7)

22.01.2026 21:58 — 👍 1    🔁 0    💬 1    📌 0

Testing on MedCalc-Bench & Instruction-Induction (we added new tasks):
📈 Transfer & ensemble → much higher consistency than baselines
⚠️ Model choice matters! The same transfer approach works great with Model A's CoT but poorly with Model B's.
Some models are better "explainers" than others.

🧵(5/7)

22.01.2026 21:58 — 👍 1    🔁 0    💬 1    📌 0

Practical implication: CoT Monitorability

Cross-model consistency provides a complementary signal: explanations that generalize across models may be more monitorable and trustworthy for oversight.

bsky.app/profile/ai-f...

🧵(4/7)

22.01.2026 21:58 — 👍 2    🔁 0    💬 1    📌 0

We evaluate 4 approaches:
🔹 Empty CoT: No/empty reasoning
🔹 Default: Model's own CoT
🔹 Transfer CoT: Model A's CoT → Model B
🔹 Ensemble CoT: Combine multiple models' thoughts

We measure cross-model consistency: How often do model pairs reach the same answer (including same wrong answers)?
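The agreement rate described above can be sketched in a few lines. The function name and data layout here are hypothetical illustrations, not taken from the paper:

```python
from itertools import combinations

def cross_model_consistency(answers):
    """Fraction of (model pair, problem) cases where both models give the
    same final answer -- matching wrong answers count as agreement too."""
    agree = total = 0
    for m1, m2 in combinations(sorted(answers), 2):
        for a1, a2 in zip(answers[m1], answers[m2]):
            total += 1
            agree += (a1 == a2)
    return agree / total if total else 0.0

# Toy example: three models answering four problems (made-up data).
answers = {
    "A": ["7", "12", "3", "9"],
    "B": ["7", "12", "5", "9"],   # disagrees with A on problem 3
    "C": ["7", "10", "3", "9"],   # disagrees with A on problem 2
}
print(cross_model_consistency(answers))  # 8 of 12 pairwise comparisons agree
```

Scoring a single model as an "explainer" follows the same idea, comparing answers other models reach when conditioned on that model's transferred CoT.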

🧵(3/7)

22.01.2026 21:58 — 👍 1    🔁 0    💬 1    📌 0

Why does this matter? Faithfulness research shows CoT doesn't always reflect internal reasoning.

What if explanations serve a different purpose?

If Model B follows Model A's reasoning to the same conclusion, maybe these explanations capture something generalizable.

🧵(2/7)

22.01.2026 21:58 — 👍 1    🔁 0    💬 2    📌 0

Can models understand each other's reasoning? 🤔

When Model A explains its Chain-of-Thought (CoT), do Models B, C, and D interpret it the same way?

Our new preprint with @davidbau.bsky.social and @csinva.bsky.social explores CoT generalizability 🧵👇

(1/7)

22.01.2026 21:58 — 👍 27    🔁 7    💬 1    📌 0

Can you solve this algebra puzzle? 🧩

cb=c, ac=b, ab=?

A small transformer can learn to solve problems like this!

And since the letters don't have inherent meaning, this lets us study how context alone imparts meaning. Here's what we found: 🧵⬇️
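For readers who want to poke at the puzzle, a brute-force check shows how the constraints alone pin down the answer. Here the letters are interpreted as elements of Z_5 under addition, purely an assumption for illustration (the paper's actual task construction may differ):

```python
from itertools import product

N = 5  # assumed group size, for illustration only

def consistent(a, b, c):
    # The puzzle's constraints: cb = c and ac = b.
    return (c + b) % N == c and (a + c) % N == b

# Does every assignment satisfying the constraints force ab == a?
results = {(a + b) % N == a
           for a, b, c in product(range(N), repeat=3)
           if consistent(a, b, c)}
print(results)  # {True}: cb = c forces b to act as the identity, so ab = a
```

The same conclusion holds symbolically: cb = c makes b an identity element, so ab must equal a regardless of which symbols are used.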

22.01.2026 16:09 — 👍 48    🔁 10    💬 2    📌 2

How can a language model find the veggies in a menu?

New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options.

Spoiler: turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from Python)! 🧵
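The analogy in the spoiler can be made concrete with Python's built-in `filter`; the menu and keyword list below are invented for illustration:

```python
menu = ["carrot soup", "steak", "grilled zucchini", "chicken wrap", "beet salad"]
veggie_keywords = {"carrot", "zucchini", "beet"}

def is_veggie(item):
    # Predicate: does the item mention any veggie keyword?
    return any(word in item for word in veggie_keywords)

# filter() applies the predicate to each option and keeps the matches.
veggie_items = list(filter(is_veggie, menu))
print(veggie_items)  # ['carrot soup', 'grilled zucchini', 'beet salad']
```

The preprint's claim is that LLM internals implement something like this predicate-then-select pattern over the in-context list of options.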

04.11.2025 17:48 — 👍 24    🔁 9    💬 1    📌 2

What's the right unit of analysis for understanding LLM internals? We explore in our mech interp survey (a major update from our 2024 ms).

We've added more recent work and more immediately actionable directions for future work. Now published in Computational Linguistics!

01.10.2025 14:03 — 👍 41    🔁 14    💬 2    📌 2
NEMI 2024 (Last Year)


🚨 Registration is live! 🚨

The New England Mechanistic Interpretability (NEMI) Workshop is happening Aug 22nd 2025 at Northeastern University!

A chance for the mech interp community to nerd out on how models really work 🧠🤖

🌐 Info: nemiconf.github.io/summer25/
📝 Register: forms.gle/v4kJCweE3UUH...

30.06.2025 22:55 — 👍 10    🔁 8    💬 0    📌 1

[📄] Are LLMs mindless token-shifters, or do they build meaningful representations of language? We study how LLMs copy text in-context, and physically separate out two types of induction heads: token heads, which copy literal tokens, and concept heads, which copy word meanings.

07.04.2025 13:54 — 👍 76    🔁 19    💬 1    📌 6
Model Lakes Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how models are different one from another. Currently, practitioners re...

The other key tasks are model search and benchmarking, with important applications like document generation and auditing.

Read more in our paper (with @davidbau.bsky.social and Renée Miller) here: arxiv.org/abs/2403.02327

Excited to share that this is accepted to #EDBT2025! 🎉

🧵5/5

05.03.2025 18:28 — 👍 1    🔁 0    💬 0    📌 0

The second one is model versioning — where the aim is to map a model's position within a lake of models, capturing these relationships using directed model graphs.

Other tasks, like model tree heritage recovery and differentiating outputs from various LLMs, are part of model versioning.

🧵4/5

05.03.2025 18:28 — 👍 0    🔁 0    💬 1    📌 0

We see four major tasks for Model Lakes.

The first is model attribution — tracing & understanding a model's output through attack techniques like model inversion (recovering user inputs) and interpretability methods like reverse engineering to analyze model behavior.

bsky.app/profile/srus...

🧵3/5

05.03.2025 18:28 — 👍 0    🔁 0    💬 1    📌 0

A Model Lake is a system containing numerous heterogeneous pre-trained models and related data in their natural formats. The concept is inspired by data lakes, which collect raw, unstructured data at scale.

By addressing shared challenges across research, we can unlock meaningful solutions. 👇

🧵2/5

05.03.2025 18:28 — 👍 0    🔁 0    💬 1    📌 0
Model Lakes Design. A model lake stores models and processes them using techniques like inference, interpretability, weight-space modeling, and indexing to support various user interactions. It generates outputs like version graphs, model cards, and ranked models, refining them into human-readable results, as shown on the figure's right side.


🚀 How would you know what model to use? 🤗

With millions of models emerging rapidly, how do we verify, track, and find the right one?

We survey and formalize Model Lakes 🌊🤖 — a framework to structure, navigate, and make sense of this landscape.

Website: lakes.baulab.info

#AI #Database

🧵1/5

05.03.2025 18:28 — 👍 9    🔁 3    💬 1    📌 0
PhD Apply - Khoury College of Computer Sciences

PhD Applicants: remember that the Northeastern Computer Science PhD application deadline is Dec 15.

It's a terrific time to do a PhD, with so many interesting things happening in AI.

Apply here:

www.khoury.northeastern.edu/apply/phd-ap...

07.12.2024 10:31 — 👍 33    🔁 5    💬 0    📌 0

More big news! Applications are open for the NDIF Summer Engineering Fellowship — an opportunity to work on cutting-edge AI research infrastructure this summer in Boston! 🚀

10.12.2024 21:59 — 👍 9    🔁 6    💬 1    📌 2