Kanishka Misra

@kanishka.bsky.social

Assistant Professor of Linguistics, and Harrington Fellow at UT Austin. Works on computational understanding of language, concepts, and generalization. 🕸️👁️: https://kanishka.website

2,528 Followers · 288 Following · 298 Posts · Joined Jul 2023
13 hours ago

⚛️ Introducing CREATE, a benchmark for creative associative reasoning in LLMs.

Making novel, meaningful connections is key to scientific & creative work.

We objectively measure how well LLMs can do this. 🧵👇

1 day ago

Martin = one of the kindest people I know! Don’t miss this opportunity to learn from one of the best in their field!

1 day ago
Laboratory Coordinator - 138788 | Careers at UC San Diego

I'm hiring a new lab manager for my lab @ UCSD! For more info on the lab, check out our website: lillab.ucsd.edu

Target start date is June 1 (flexible) and application deadline is March 26. Please share with anyone you think might be a good fit!

Apply here: employment.ucsd.edu/laboratory-c...

3 days ago

📢 PhD position in Developmental Language Modelling
(PLZ RT)

What can human language acquisition teach us about training language models? Join us as a PhD student!
mpi.nl/career-education/vacancies/vacancy/fully-funded-4-year-phd-position-developmental-language @carorowland.bsky.social
@mpi-nl.bsky.social

3 days ago

Thanks to everyone who gave us feedback: @lampinen.bsky.social, Ellie Pavlick, @glupyan.bsky.social, @phillipisola.bsky.social, and others!

Work with Tianyang Xu, @mudtriangle.com, Karen Livescu, and Greg Shakhnarovich!

3 days ago

This relates more broadly to literature reconciling how meaning obtained from relational grounding in language interacts with that obtained from other forms of grounding (see Mollo and Millière/@raphaelmilliere.com), and lays out a research program on the role of category coherence in learning!

11/

3 days ago

This suggests that representations learned from language are structured to expect incoming category information to cohere in a specific way before cross-modal generalization can emerge!

10/

3 days ago
Results from counterfactual shuffling experiments. Models tend to generalize equally well when coherence is preserved, and poorly when it is disrupted, even in the absence of all hypernyms.

If models were generalizing arbitrarily, then we shouldn’t see any differences in their performance across these settings (i.e., no matter what, crow == bird). However, we find that models seem to generalize only when the training data preserves category coherence!

9/

3 days ago
Macro F1 scores on unseen images vs. visual coherence across the 53 hypernym categories for the Qwen3-1.7B backbone (at 100% ablation). r (Pearson’s correlation) = .43, indicating a positive relationship.

By coherence we mean the visual similarity between members of the same category, which we calculate using the DINOv2 embeddings used in our VLM training. Even in the original configuration, we found that models performed better on categories that were visually more coherent.

8/
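One plausible reading of "visual coherence" is the mean pairwise cosine similarity among a category's DINOv2 image embeddings. A minimal sketch under that assumption (the paper's exact formula may differ):

```python
import numpy as np

def visual_coherence(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity among the image embeddings of one
    category (rows of `embeddings`, e.g., DINOv2 vectors). Higher means
    the category's members look more alike."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = e @ e.T                        # (n, n) cosine similarity matrix
    n = len(e)
    # Average the off-diagonal entries (exclude each image's self-similarity).
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))

# The reported r = .43 would then come from correlating per-category
# coherence with per-category macro F1, e.g.:
# np.corrcoef(coherence_per_category, f1_per_category)[0, 1]
```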

3 days ago
Examples of image-leaf mappings resulting from our counterfactual shuffles, in comparison with the original configuration (top). VC indicates the visual coherence of the category under the data configuration. VC for birds in the original set: .30; for the within-category shuffle: .30; for the across-category shuffle: .12.

To test this, we created counterfactual data: 1) where category-label pairings were shuffled across categories (🪛 = “robin”; 🎸 = “crow”), and 2) where they were shuffled within categories (🦅 = “robin”; 🦜 = “crow”). These swaps also manipulate the categories’ visual coherence.

7/
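A rough sketch of the two relabelings, assuming each training example is an (image, leaf label, hypernym) triple; the names and data layout are illustrative, not the paper's code:

```python
import random
from collections import defaultdict

def shuffle_labels(examples, mode, seed=0):
    """Counterfactual relabeling sketch.

    examples: list of (image_id, leaf_label, hypernym) triples.
    mode="within": permute leaf labels among images whose leaves share a
      hypernym (eagle -> "robin"), roughly preserving visual coherence.
    mode="across": permute (leaf, hypernym) pairs over all images
      (screwdriver -> "robin"), disrupting visual coherence."""
    rng = random.Random(seed)
    if mode == "across":
        pairs = [(leaf, hyp) for _, leaf, hyp in examples]
        rng.shuffle(pairs)
        return [(img, leaf, hyp)
                for (img, _, _), (leaf, hyp) in zip(examples, pairs)]
    if mode == "within":
        by_hypernym = defaultdict(list)
        for i, (_, _, hyp) in enumerate(examples):
            by_hypernym[hyp].append(i)
        relabeled = list(examples)
        for idxs in by_hypernym.values():
            leaves = [examples[i][1] for i in idxs]
            rng.shuffle(leaves)
            for i, leaf in zip(idxs, leaves):
                relabeled[i] = (examples[i][0], leaf, examples[i][2])
        return relabeled
    raise ValueError(f"unknown mode: {mode!r}")
```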

3 days ago
Figure depicting two hypotheses that models might entertain: 1) arbitrary prediction of hypernyms regardless of what the input looks like during supervision; 2) sensitivity to the fact that the category (e.g., birds) is not visually coherent.

Are LMs simply executing something like “IF crow THEN bird” regardless of what the image shows? E.g., if during supervision we label images of kayaks as “crow”, would the model still generalize to birds, or does the model expect categories to have some level of coherence?

6/

3 days ago
Main results (see Fig. 4 in the paper). Salient result: models tend to generalize to hypernyms without any hypernym evidence encountered during training, suggesting that they show cross-modal generalization.

Having established these preconditions for our task, we then find that models are also able to generalize (non-trivially) to hypernyms without ever having “seen” them explicitly, suggesting that LM representations support cross-modal generalization!

5/
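Generalization here is scored with macro F1 over hypernym categories (as in the macro F1 numbers above); a toy scikit-learn sketch, where the arrays are hypothetical:

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical held-out evaluation: rows = unseen images, columns =
# hypernym categories (here: bird, animal), entries = presence (1/0).
y_true = np.array([[1, 1], [0, 1], [0, 0]])
y_pred = np.array([[1, 1], [0, 0], [0, 0]])

# Macro F1 averages per-category F1, so rare hypernyms count equally.
print(f1_score(y_true, y_pred, average="macro"))  # (1.0 + 0.667) / 2 ≈ 0.83
```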

3 days ago
Left: plot showing that models using the DINOv2 encoder, which has never seen text, tend to generalize similarly to those using the SigLIP encoder, which has seen text. Right: table showing that both Qwen3 LMs demonstrate non-trivial hypernymy knowledge.

We establish that this paradigm works in the first place with a vision encoder that has never been trained on language data (i.e., ❌ SigLIP, ✅ DINO), that the models learn the task on the lower-level categories themselves, and that the LMs indeed have taxonomic knowledge.

4/

3 days ago
Three papers on hypernym acquisition in models (Hearst, 1992; Geffet and Dagan, 2005) and humans (Wilson et al., 2023); see the paper for details.

Taxonomic knowledge is interesting because of a number of hypotheses about the learnability of category knowledge from linguistic cues, for both computational models and humans. Evidence of cross-modal generalization would lend strong support to these hypotheses!

3/

3 days ago
Figure depicting an instance of our experiments. During training, the projector is deprived of explicit supervision on high-level categories (hypernyms, e.g., animal) to varying degrees, and is trained to detect the presence (and absence) of lower-level categories (e.g., koala), keeping the image encoder and the LM backbone frozen. After training, the VLM is tested for generalization to hypernym categories, given previously unseen images.

We use a VLM-training paradigm (frozen vision encoder w/o language training, mapped to a frozen LM) where we partially supervise on lower-level categories during training, and then test whether the LM recovers hypernymy knowledge from what it has seen in language data.

2/
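A minimal PyTorch sketch of how such a setup could look, assuming a linear projector and binary presence/absence supervision against frozen embeddings of the leaf-category names; all names here are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class ProjectorOnlyVLM(nn.Module):
    """Frozen vision encoder -> trainable projector -> frozen LM backbone.
    Only the projector receives gradient updates; supervision covers leaf
    categories (e.g., "koala"), never hypernyms (e.g., "animal")."""

    def __init__(self, vision_encoder, lm_backbone, d_vision, d_lm):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g., DINOv2; frozen
        self.lm_backbone = lm_backbone        # e.g., a Qwen3 LM, treated here
                                              # as an embedding-to-embedding
                                              # map; frozen
        for module in (self.vision_encoder, self.lm_backbone):
            for p in module.parameters():
                p.requires_grad = False
        self.projector = nn.Linear(d_vision, d_lm)  # the only trainable part

    def forward(self, images, label_embeddings):
        # label_embeddings: (n_leaf_labels, d_lm) frozen embeddings of the
        # leaf-category names.
        with torch.no_grad():
            v = self.vision_encoder(images)      # (B, d_vision)
        h = self.lm_backbone(self.projector(v))  # params frozen, but grads
                                                 # still flow to the projector
        return h @ label_embeddings.T            # (B, n_leaf_labels) logits

def train_step(model, optimizer, images, targets, label_embeddings):
    # targets: (B, n_leaf_labels) floats marking presence/absence.
    logits = model(images, label_embeddings)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, targets)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Since only the projector is trained, any hypernym generalization at test time has to come from structure already present in the frozen LM's representations.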

3 days ago
title section of the paper: “Cross-Modal Taxonomic Generalization in (Vision) Language Models” by Tianyang Xu, Marcelo Sandoval-Castañeda, Karen Livescu, Greg Shakhnarovich, Kanishka Misra.

What is the interplay between representations learned from (language) surface forms alone, and those learned from more grounded evidence (e.g., vision)?

Excited to share new work understanding “Cross-modal taxonomic generalization” in (V)LMs

arxiv.org/abs/2603.07474

1/

3 days ago

I want to unwatch this

4 days ago

@tylerachang.bsky.social and I will be presenting the Goldfish as an oral at #LREC2026 in Mallorca! 🌴

6 days ago
The no-magic approach to understanding intelligent systems: "Today I want to write a bit about the philosophy I think underlies much of the work that my collaborators and I (as well as many other researchers that I respect) have done on understanding artificial..."

Short post on what I call the "no-magic approach to understanding intelligent systems" — the philosophy I think of as motivating our work on understanding intelligence without resorting to magical thinking about AI or humans!
infinitefaculty.substack.com/p/the-no-mag...

1 week ago

🚨New Paper!🚨 How do reasoning LLMs handle inferences that have no deterministic answer? We find that they diverge from humans in some significant ways, and fail to reflect human uncertainty… 🧵(1/10)

1 week ago

Check out our special theme: new missions for NLP research!

1 week ago

What’s a paper that made you think that way 👀

1 month ago

I wrote a short article on AI Model Evaluation for the Open Encyclopedia of Cognitive Science 📕👇

Hope this is helpful for anyone who wants a super broad, beginner-friendly intro to the topic!

Thanks @mcxfrank.bsky.social and @asifamajid.bsky.social for this amazing initiative!

1 week ago

Congratulations Andreas!!

1 week ago

Some days you finish 5 meta-reviews in ~one go, and some days you take 1.5 days to complete one meta-review. Such is the AC life!

1 week ago

Woohoo, will be in touch soon!

1 week ago

Wow!! Good luck with whatever it is you do next — so excited for you!!

1 week ago

Watch Slow Horses already!!

1 week ago

Japonaise and Jahunger mentioned in the same thread 😍 my fav places in Boston!

1 week ago
South by Semantics Workshop:
"New horizons in evaluating pragmatic competence in language models",  Jennifer Hu (Johns Hopkins University), March 6, 2026.

I'm looking forward to @jennhu.bsky.social's South by Semantics talk next week at UT Austin! She'll discuss "micro-pragmatic" inferences and world modeling in language models 🤖
