James Michaelov @ NeurIPS2025🌴

@jamichaelov.bsky.social

Postdoc at MIT. Research: language, the brain, NLP. jmichaelov.com

4,223 Followers  |  524 Following  |  40 Posts  |  Joined: 20.08.2023

Latest posts by jamichaelov.bsky.social on Bluesky

Presenting this at the poster session this morning (11-2pm) at #5109

04.12.2025 18:42 | 👍 2    🔁 0    💬 0    📌 0

Looking forward to #NeurIPS25 this week 🏝️! I'll be presenting at Poster Session 3 (11-2 on Thursday). Feel free to reach out!

01.12.2025 22:12 | 👍 10    🔁 3    💬 0    📌 1

I'll also be presenting this paper with @catherinearnett.bsky.social at #CogInterp!

25.11.2025 14:32 | 👍 3    🔁 0    💬 0    📌 0
Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale We show that across architecture (Transformer vs. Mamba vs. RWKV), training dataset (OpenWebText vs. The Pile), and scale (14 million parameters to 12 billion parameters), autoregressive language mode...

Preprint: www.arxiv.org/abs/2510.24963

25.11.2025 14:27 | 👍 6    🔁 0    💬 1    📌 0

Excited to announce that I’ll be presenting a paper at #NeurIPS this year! Reach out if you’re interested in chatting about LM training dynamics, architectural differences, shortcuts/heuristics, or anything at the CogSci/NLP/AI interface in general! #Neurips2025

25.11.2025 14:27 | 👍 25    🔁 2    💬 2    📌 1

I’m in Vienna all week for @aclmeeting.bsky.social and I’ll be presenting this paper on Wednesday at 11am (Poster Session 4 in HALL X4 X5)! Reach out if you want to chat about multilingual NLP, tokenizers, and open models!

27.07.2025 15:29 | 👍 18    🔁 1    💬 0    📌 0
Not quite Sherlock Holmes: Language model predictions do not reliably differentiate impossible from improbable events Can language models reliably predict that possible events are more likely than merely improbable ones? By teasing apart possibility, typicality, and contextual relatedness, we show that despite the re...

See the full paper here: arxiv.org/abs/2506.06808
3/3

12.06.2025 17:54 | 👍 2    🔁 0    💬 0    📌 0

In the most extreme case, LMs assign sentences such as ‘the car was given a parking ticket by the explorer’ (unlikely but possible event) a lower probability than ‘the car was given a parking ticket by the brake’ (animacy-violating event, semantically related final word) over half of the time. 2/3

12.06.2025 17:54 | 👍 1    🔁 0    💬 1    📌 0

New paper accepted at ACL Findings! TL;DR: While language models generally assign higher probabilities to sentences describing possible events than to impossible (animacy-violating) ones, this is not robust for generally unlikely events and is affected by semantic relatedness. 1/3

12.06.2025 17:54 | 👍 21    🔁 3    💬 1    📌 1
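
A minimal sketch of the kind of measurement behind this thread: scoring whole sentences by summing token log-probabilities under a causal LM. GPT-2 and the exact scoring details here are illustrative assumptions, not necessarily the paper's setup.

```python
# Minimal sketch: score whole sentences by total log-probability under a
# causal LM. GPT-2 is an assumption for illustration; the paper
# evaluates its own models and stimuli.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Sum of log-probabilities of each token given the preceding ones."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # Position t predicts token t+1: drop the last position and
        # align against the tokens from position 1 onward.
        log_probs = torch.log_softmax(model(ids).logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    return log_probs[torch.arange(targets.size(0)), targets].sum().item()

possible = "The car was given a parking ticket by the explorer."
impossible = "The car was given a parking ticket by the brake."
# The possible event should get the higher score; the finding above is
# that this comparison often fails when the impossible sentence's final
# word is semantically related to the context.
print(sentence_logprob(possible), sentence_logprob(impossible))
```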

My paper with @tylerachang.bsky.social and @jamichaelov.bsky.social will appear at #ACL2025NLP! The updated preprint is available on arXiv. I look forward to chatting about bilingual models in Vienna!

05.06.2025 14:18 | 👍 8    🔁 2    💬 1    📌 1

✨New pre-print✨ Crosslingual transfer allows models to leverage their representations for one language to improve performance on another language. We characterize the acquisition of shared representations in order to better understand how and when crosslingual transfer happens.

07.03.2025 16:34 | 👍 36    🔁 7    💬 2    📌 2

I’ve had success using the infini-gram API for this (though it can get overloaded with user requests at times): infini-gram.io

08.02.2025 12:40 | 👍 1    🔁 0    💬 0    📌 0
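
For anyone trying the same thing, a rough sketch of a query from Python. The endpoint, payload fields, and index name follow the public documentation at infini-gram.io as I understand it and may have changed, so treat them as assumptions and check the docs.

```python
# Rough sketch of an n-gram count query against the infini-gram API.
# Endpoint, payload fields, and the index name are taken from the public
# docs at infini-gram.io and may have changed; verify before relying on
# this.
import requests

payload = {
    "index": "v4_rpj_llama_s4",     # a RedPajama index; other corpora exist
    "query_type": "count",          # raw corpus count of the n-gram
    "query": "natural language processing",
}
response = requests.post("https://api.infini-gram.io/", json=payload)
result = response.json()
# Successful count queries return a "count" field; errors come back
# under "error".
print(result.get("count", result))
```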

I don’t think this is quite what you’re looking for, but @camrobjones.bsky.social recently ran some Turing-test-style studies and found that some people believed ELIZA to be a human (and participants were asked to give reasons for their responses)

03.12.2024 13:09 | 👍 5    🔁 0    💬 1    📌 0

With all the new people here on Bluesky, I think it’s a good time to (re-)introduce myself. I’m a postdoc at MIT carrying out research at the intersection of the cognitive science of language and AI. Here are some of the things I’ve worked on in the last year 🧡:

10.11.2024 19:34 | 👍 25    🔁 3    💬 1    📌 0

Seems like a great initiative to have some of these location-based ones! I’d love to be added if possible!

19.11.2024 16:17 | 👍 1    🔁 0    💬 0    📌 0

Excited to be at #EMNLP #EMNLP2024 this year! Especially interested in chatting about the intersection of cognitive science/psycholinguistics and AI/NLP, training dynamics, robustness/reliability, meaning, and evaluation

11.11.2024 19:03 | 👍 11    🔁 0    💬 1    📌 0

If there’s still space (and you accept postdocs), could I be added?

11.11.2024 18:54 | 👍 1    🔁 0    💬 0    📌 0

Thanks for creating this list - looks great! I’d love to be added if there’s still room

11.11.2024 18:45 | 👍 1    🔁 0    💬 0    📌 0

Thank you!

11.11.2024 12:15 | 👍 1    🔁 0    💬 0    📌 0

If there’s still room, is there any chance you could add me to this list?

11.11.2024 11:40 | 👍 1    🔁 0    💬 1    📌 0

Also, I’m going to be attending EMNLP next week - reach out if you want to meet/chat

10.11.2024 19:34 | 👍 4    🔁 0    💬 0    📌 1

Anyway, excited to learn and chat about research along these lines and beyond here on Bluesky!

10.11.2024 19:34 | 👍 3    🔁 0    💬 1    📌 0

Of course, none of this work would have been possible without my amazing PhD advisor Ben Bergen, and my other great collaborators: Seana Coulson, @catherinearnett.bsky.social, Tyler Chang, Cyma Van Petten, and Megan Bardolph!

10.11.2024 19:34 | 👍 3    🔁 0    💬 1    📌 0
Revenge of the Fallen? Recurrent Models Match Transformers at... Transformers have generally supplanted recurrent neural networks as the dominant architecture for both natural language processing tasks and for modelling the effect of predictability on online...

5: Recurrent models like RWKV and Mamba have recently emerged as viable alternatives to transformers. They are intuitively more cognitively plausible, but how do they compare to transformers when used to model human language processing? We find that they perform about the same overall:

10.11.2024 19:34 | 👍 4    🔁 0    💬 1    📌 0
Ignoring the alternatives: The N400 is sensitive to stimulus preactivation alone The N400 component of the event-related brain potential is a neural signal of processing difficulty. In the language domain, it is widely believed to …

4: Is the N400 sensitive only to the predicted probability of the stimuli encountered, or also the predicted probability of alternatives? We revisit this question with state-of-the-art NLP methods, with the results supporting the former hypothesis:

10.11.2024 19:34 | 👍 3    🔁 0    💬 1    📌 0
Strong Prediction: Language Model Surprisal Explains Multiple N400 Effects Abstract. Theoretical accounts of the N400 are divided as to whether the amplitude of the N400 response to a stimulus reflects the extent to which the stimulus was predicted, the extent to which the s...

3: The N400, a neural index of language processing, is highly sensitive to the contextual probability of words. But to what extent can lexical prediction explain other N400 phenomena? Using GPT-3, we show that it can implicitly account for both semantic similarity and plausibility effects:

10.11.2024 19:34 | 👍 3    🔁 0    💬 1    📌 0
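
The linking quantity in this line of work is word-level surprisal, which a causal LM supplies directly; here is a minimal sketch. GPT-2 stands in for the paper's models (an assumption), and the example pair is the classic high- vs low-predictability N400 contrast.

```python
# Minimal sketch: per-word surprisal (negative log2 probability) from a
# causal LM, the quantity linked to N400 amplitude. GPT-2 stands in for
# the paper's models (an assumption).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def word_surprisal(context: str, word: str) -> float:
    """Surprisal in bits of `word` given the preceding context."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    word_ids = tokenizer(" " + word, return_tensors="pt").input_ids
    ids = torch.cat([ctx_ids, word_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    # Sum log-probs over the (possibly multi-token) target word only,
    # then convert nats to bits.
    return -token_lp[ctx_ids.size(1) - 1:].sum().item() / math.log(2)

# Expected vs anomalous continuation from the N400 literature:
print(word_surprisal("He spread the warm bread with", "butter"))
print(word_surprisal("He spread the warm bread with", "socks"))
```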
Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models James Michaelov, Catherine Arnett, Tyler Chang, Ben Bergen. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

2: Do multilingual language models learn that different languages can have the same grammatical structures? We use the structural priming paradigm from psycholinguistics to provide evidence that they do:

10.11.2024 19:34 | 👍 4    🔁 0    💬 1    📌 0
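
The measure behind this is simple to state: the target sentence's probability conditioned on a structurally congruent prime versus an incongruent one. A minimal monolingual sketch, with GPT-2 and the sentences as illustrative assumptions; the paper tests multilingual models with primes and targets across languages.

```python
# Minimal sketch of the structural priming measure: is a target sentence
# more probable after a prime sharing its structure? Model and sentences
# are illustrative assumptions, not the paper's crosslingual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def target_logprob(prime: str, target: str) -> float:
    """Log-probability of `target` conditioned on `prime`."""
    n_prime = tokenizer(prime, return_tensors="pt").input_ids.size(1)
    ids = tokenizer(prime + " " + target, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    return token_lp[n_prime - 1:].sum().item()  # target tokens only

target = "The girl gave the dog a bone."         # double-object structure
prime_do = "The man sent the boss a letter."     # congruent (double-object)
prime_po = "The man sent a letter to the boss."  # incongruent (prepositional)
# A priming effect predicts the congruent prime yields the higher score.
print(target_logprob(prime_do, target) > target_logprob(prime_po, target))
```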
Bigger, Not Necessarily Better The inverse scaling issue means larger LLMs sometimes handle things less well.

If you’re interested in hearing more of my thoughts on this topic, check out this article at Communications of the ACM by Sandrine Ceurstemont that includes quotes from an interview with me and my co-author Ben Bergen:

10.11.2024 19:34 | 👍 3    🔁 0    💬 1    📌 0
Emergent Inabilities? Inverse Scaling Over the Course of Pretraining James Michaelov, Ben Bergen. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023.

1: Training language models on more data generally improves their performance, but is this always the case? We show that inverse scaling can occur not just across models of different sizes, but also in individual models over the course of training:

10.11.2024 19:34 | 👍 4    🔁 0    💬 1    📌 0
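
For anyone wanting to poke at this themselves: Pythia publishes intermediate pretraining checkpoints as HuggingFace revisions, so any metric can be tracked over training. A minimal sketch; the toy metric (mean token loss on one sentence) and the chosen steps are stand-ins, not the paper's benchmark.

```python
# Minimal sketch of a checkpoint-level analysis: Pythia exposes
# intermediate pretraining checkpoints as HuggingFace revisions, so a
# metric can be tracked over training steps. The metric here is a toy
# stand-in for a real evaluation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-70m"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
ids = tokenizer("The capital of France is Paris.", return_tensors="pt").input_ids

for step in [1000, 16000, 64000, 143000]:
    model = AutoModelForCausalLM.from_pretrained(MODEL, revision=f"step{step}").eval()
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token cross-entropy
    # Inverse scaling over training = a score that *worsens* as step grows.
    print(f"step {step}: loss = {loss.item():.3f}")
```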
