
Siyuan Song

@siyuansong.bsky.social

Senior undergrad @ UTexas Linguistics. Looking for a Ph.D. position (Fall '26) in Comp Psycholing & CogSci, human-like AI; rock 🎸. @growai.bsky.social. Prev: summer research visits @ MIT BCS (2025), Harvard Psych (2024); undergrad @ SJTU (2022-24). Opinions are my own.

163 Followers  |  331 Following  |  37 Posts  |  Joined: 19.11.2024

Latest posts by siyuansong.bsky.social on Bluesky

Figure 1 showing alignment pipeline using CLIP models on BabyView data.

Figure 2: human judgments are correlated with CLIP scores.

Can we use VLMs to quantify multimodal alignment in children's experiences? We analyze a large corpus of headcam videos to find out!

New preprint from our BabyView project, led by @alvinwmtan.bsky.social and Jane Yang: arxiv.org/abs/2511.18824

01.12.2025 18:05 — 👍 25    🔁 5    💬 1    📌 0
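
The preprint's actual pipeline isn't reproduced here, but as a rough sketch of the kind of measurement in Figure 1, the snippet below scores how well a single video frame aligns with a co-occurring utterance using an off-the-shelf CLIP model. The checkpoint name, frame path, and utterance are illustrative placeholders, not details taken from the paper.

```python
# Sketch: image-text alignment score from a generic CLIP checkpoint.
# "openai/clip-vit-base-patch32", the frame path, and the utterance are
# placeholder choices for illustration only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("frame_0001.jpg")        # one headcam frame (placeholder path)
text = "the baby is holding a red ball"     # utterance heard around that frame (placeholder)

inputs = processor(text=[text], images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Cosine similarity between the normalized image and text embeddings serves
# as the alignment score for this frame-utterance pair.
img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
print(f"alignment score: {(img @ txt.T).item():.3f}")
```

Aggregated over many frame-utterance pairs, a score like this is one candidate corpus-level measure of multimodal alignment; the human-judgment correlation in Figure 2 is the kind of validation that would justify reading it that way.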

Looking forward to #NeurIPS25 this week 🏝️! I'll be presenting at Poster Session 3 (11-2 on Thursday). Feel free to reach out!

01.12.2025 22:12 — 👍 10    🔁 3    💬 0    📌 1

I'm excited to present SimpleStories at EurIPS!

Also, if anyone at #EurIPS is interested in chatting about LLM data efficiency, interpretability, model inconsistency, or other topics, feel free to DM me.

Dataset and models: lnkd.in/e_VGWqhP
Code: lnkd.in/eEidmv74
Paper: lnkd.in/eH6jS9uY

01.12.2025 03:40 — 👍 18    🔁 4    💬 2    📌 0

String probability might be the best tool for assessing LMs' grammatical knowledge, yet it does not directly tell you 'how grammatical' a string is. Here's why and how we should use string probability and minimal pairs:
Excited to see this out - it's my great honor to be part of this amazing team!

10.11.2025 23:10 — 👍 3    🔁 0    💬 0    📌 0
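
The minimal-pairs point above can be made concrete with a small sketch. GPT-2 and the agreement pair below are illustrative choices of mine, not examples from the thread: the raw log-probability of either string is not readable as "how grammatical" it is, but the sign of the difference within a matched pair is.

```python
# Minimal-pair sketch: compare total log-probabilities of two strings under one LM.
# GPT-2 is used only as an illustrative model choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def string_logprob(text: str) -> float:
    """Sum of token log-probabilities of `text` under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Predict token t from tokens < t; the first token has no context and is skipped,
    # which cancels in the difference because both strings start identically.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    return logprobs.gather(2, targets.unsqueeze(-1)).sum().item()

grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."

# The absolute numbers are not interpretable as "how grammatical" each string is;
# the informative quantity is the contrast within the minimal pair.
print(string_logprob(grammatical) - string_logprob(ungrammatical))
```

Because the two strings differ only in the critical region, length and lexical-frequency confounds largely cancel in the difference.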

Oh cool! Excited this LM + construction paper was SAC-Highlighted! Check it out to see how LM-derived measures of statistical affinity separate out constructions with similar words like "I was so happy I saw you" vs "It was so big it fell over".

10.11.2025 16:27 — 👍 17    🔁 4    💬 0    📌 0

Delighted Sasha's (first year PhD!) work using mech interp to study complex syntax constructions won an Outstanding Paper Award at EMNLP!

Also delighted the ACL community continues to recognize unabashedly linguistic topics like filler-gaps... and the huge potential for LMs to inform such topics!

07.11.2025 18:22 — 👍 33    🔁 8    💬 1    📌 0

Interested in doing a PhD at the intersection of human and machine cognition? ✨ I'm recruiting students for Fall 2026! ✨

Topics of interest include pragmatics, metacognition, reasoning, & interpretability (in humans and AI).

Check out JHU's mentoring program (due 11/15) for help with your SoP 👇

04.11.2025 14:44 — 👍 27    🔁 15    💬 0    📌 1

🧠 New at #NeurIPS2025!
🎵 We're far from the shallow now 🎵
TL;DR: We introduce the first "reasoning embedding" and uncover its unique spatio-temporal pattern in the brain.

🔗 arxiv.org/abs/2510.228...

30.10.2025 22:25 — 👍 8    🔁 4    💬 1    📌 0

Introducing Global PIQA, a new multilingual benchmark for 100+ languages. This benchmark is the outcome of this year's MRL shared task, in collaboration with 300+ researchers from 65 countries. This dataset evaluates physical commonsense reasoning in culturally relevant contexts.

29.10.2025 15:50 — 👍 21    🔁 10    💬 1    📌 4

Very excited to be going to Chicago for
@agnescallard.bsky.social's famous Night Owls next week! I'll be discussing my essay "ChatGPT and the Meaning of Life". Hope to see you there if you're local!

24.10.2025 16:01 — 👍 4    🔁 1    💬 1    📌 0
Title of our paper: "Hey, wait a minute: on at-issue sensitivity in Language Models" by Sanghee Kim and Kanishka Misra.

Below: A person says "Sue, Max's girlfriend, was a tennis champ!"; a second person responds with "What racket does she use?" (which targets at-issue content); a third person replies with "They're dating?" (which targets not-at-issue content)

If I spill the tea—"Did you know Sue, Max's gf, was a tennis champ?"—but then if you reply "They're dating?!" I'd be a bit puzzled, since that's not the main point! Humans can track what's 'at issue' in conversation. How sensitive are LMs to this distinction?

New paper w/ @sangheekim.bsky.social!

21.10.2025 14:02 — 👍 35    🔁 4    💬 3    📌 2

I will be recruiting PhD students via Georgetown Linguistics this application cycle! Come join us in the PICoL (pronounced "pickle") lab. We focus on psycholinguistics and cognitive modeling using LLMs. See the linked flyer for more details: bit.ly/3L3vcyA

21.10.2025 21:52 — 👍 28    🔁 14    💬 2    📌 0
Title page of the paper: WUGNECTIVES: Novel Entity Inferences of Language Models from Discourse Connectives, with two figures at the bottom

Left: Our figure 1 -- comparing previous work, which usually predicted the connective given the arguments (grounded in the world); our work flips this premise by getting models to use their knowledge of connectives to predict something about the world.

Right: Our main results across 7 types of connective senses. Models are especially bad at Concession connectives.

"Although I hate leafy vegetables, I prefer daxes to blickets." Can you tell if daxes are leafy vegetables? LM's can't seem to! πŸ“·

We investigate if LMs capture these inferences from connectives when they cannot rely on world knowledge.

New paper w/ Daniel, Will, @jessyjli.bsky.social

16.10.2025 15:27 — 👍 32    🔁 10    💬 2    📌 1
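
As a toy illustration of the setup (not the paper's evaluation; GPT-2, the prompt wording, and the continuations are placeholder choices), one can ask whether conditioning on the concessive connective makes a model prefer the continuation the connective licenses:

```python
# Toy illustration: does a causal LM use the concessive connective to infer a
# property of the novel entity "daxes"? GPT-2 and the sentences are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Log-probability of `continuation` conditioned on `context`."""
    # Assumes the context tokenization is a prefix of the full-string tokenization,
    # which holds here because the continuation starts with a space.
    ctx_ids = tok(context, return_tensors="pt").input_ids
    full_ids = tok(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the positions belonging to the continuation.
    n_ctx = ctx_ids.shape[1]
    return token_lp[:, n_ctx - 1:].sum().item()

context = "Although I hate leafy vegetables, I prefer daxes to blickets."
yes = " So daxes are leafy vegetables."
no = " So daxes are not leafy vegetables."

# A model that picks up the concessive inference should assign the first
# continuation a higher conditional log-probability than the second.
print(continuation_logprob(context, yes) - continuation_logprob(context, no))
```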

Gonna keep updating it regularly and have some fun with the resources we've got to grow & test Chinese BabyLMs 🐣 stay tuned!

15.10.2025 17:52 — 👍 0    🔁 0    💬 0    📌 0

Honored to get the chance to contribute to the Chinese dataset! And had a great time working with all the awesome collaborators!

15.10.2025 17:52 — 👍 0    🔁 0    💬 1    📌 0

Excited to present this at COLM tomorrow! (Tuesday, 11:00 AM poster session)

06.10.2025 15:21 — 👍 3    🔁 2    💬 0    📌 0

I will be giving a short talk on this work at the COLM Interplay workshop on Friday (also to appear at EMNLP)!

Will be in Montreal all week and excited to chat about LM interpretability + its interaction with human cognition and ling theory.

06.10.2025 12:05 — 👍 8    🔁 5    💬 0    📌 0
Preview: Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models
Language models (LMs) tend to show human-like preferences on a number of syntactic phenomena, but the extent to which these are attributable to direct exposure to the phenomena or more general propert...

Traveling to my first @colmweb.org🍁

Not presenting anything but here are two posters you should visit:

1. @qyao.bsky.social on Controlled rearing for direct and indirect evidence for datives (w/ me, @weissweiler.bsky.social and @kmahowald.bsky.social), W morning

Paper: arxiv.org/abs/2503.20850

06.10.2025 15:22 — 👍 13    🔁 5    💬 1    📌 0

On my way to #COLM2025 🍁

Check out jessyli.com/colm2025

QUDsim: Discourse templates in LLM stories arxiv.org/abs/2504.09373

EvalAgent: retrieval-based eval targeting implicit criteria arxiv.org/abs/2504.15219

RoboInstruct: code generation for robotics with simulators arxiv.org/abs/2405.20179

06.10.2025 15:50 — 👍 12    🔁 4    💬 0    📌 0
Preview: Language Models Fail to Introspect About Their Knowledge of Language
There has been recent interest in whether large language models (LLMs) can introspect about their own internal states. Such abilities would make LLMs more interpretable, and also validate the use of s...

I'm at #COLM2025 from Wed with:

@siyuansong.bsky.social Tue am introspection arxiv.org/abs/2503.07513

@qyao.bsky.social Wed am controlled rearing: arxiv.org/abs/2503.20850

@sashaboguraev.bsky.social INTERPLAY ling interp: arxiv.org/abs/2505.16002

I'll talk at INTERPLAY too. Come say hi!

06.10.2025 15:57 — 👍 20    🔁 6    💬 1    📌 0

Heading to #COLM2025 to present my first paper w/ @jennhu.bsky.social @kmahowald.bsky.social !

When: Tuesday, 11 AM – 1 PM
Where: Poster #75

Happy to chat about my work and topics in computational linguistics & cogsci!

Also, I'm on the PhD application journey this cycle!

Paper info 👇:

06.10.2025 16:05 — 👍 7    🔁 3    💬 0    📌 0
Illustration of the blog post's main argument, summarized as: "Theory of Mind as a Central Skill for Researchers: Research involves many skills. If each skill is viewed separately, each one takes a long time to learn. These skills can instead be connected via theory of mind – the ability to reason about the mental states of others. This allows you to transfer your abilities across areas, making it easier to gain new skills."

🤖 🧠 NEW BLOG POST 🧠 🤖

What skills do you need to be a successful researcher?

The list seems long: collaborating, writing, presenting, reviewing, etc.

But I argue that many of these skills can be unified under a single overarching ability: theory of mind

rtmccoy.com/posts/theory...

30.09.2025 15:14 — 👍 20    🔁 2    💬 2    📌 2
Picture of the UT Tower with "UT Austin Computational Linguistics" written in bigger font, and "Humans processing computers processing human processing language" in smaller font

The compling group at UT Austin (sites.utexas.edu/compling/) is looking for PhD students!

Come join me, @kmahowald.bsky.social, and @jessyjli.bsky.social as we tackle interesting research questions at the intersection of ling, cogsci, and ai!

Some topics I am particularly interested in:

30.09.2025 16:17 — 👍 18    🔁 10    💬 3    📌 2

Can AI aid scientists amidst their own workflows, when they do not know step-by-step workflows and may not know, in advance, the kinds of scientific utility a visualization would bring?

Check out @sebajoe.bsky.social's feature on ✨AstroVisBench:

25.09.2025 20:52 — 👍 8    🔁 3    💬 0    📌 0
Preview: Simon Goldstein & Harvey Lederman, What Does ChatGPT Want? An Interpretationist Guide - PhilPapers
This paper investigates LLMs from the perspective of interpretationism, a theory of belief and desire in the philosophy of mind. We argue for three conclusions. First, the right object of study ...

Simon Goldstein and I have a new paper, "What does ChatGPT want? An interpretationist guide".

The paper argues for three main claims.

philpapers.org/rec/GOLWDC-2 1/7

24.09.2025 12:37 — 👍 24    🔁 6    💬 2    📌 5

I did a Q&A with Quanta about interpretability and training dynamics! I got to talk about a bunch of research hobby horses and how I got into them.

24.09.2025 13:57 — 👍 66    🔁 12    💬 2    📌 0
Preview: Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences
When do machine learning systems fail to generalize, and what mechanisms could improve their generalization? Here, we draw inspiration from cognitive science to argue that one weakness of machine lear...

Why does AI sometimes fail to generalize, and what might help? In a new paper (arxiv.org/abs/2509.16189), we highlight the latent learning gap — which unifies findings from language modeling to agent navigation — and suggest that episodic memory complements parametric learning to bridge it. Thread:

22.09.2025 04:21 — 👍 47    🔁 10    💬 1    📌 1

Announcing the first (and perhaps only) Multilingual Minds and Machines Meeting! Come join us in Nijmegen, June 22-23, 2026, if you are interested in computational models of human multilingualism: mmmm2026.github.io

19.09.2025 11:27 — 👍 13    🔁 8    💬 0    📌 0

Did you know?

❌77% of language models on @hf.co are not tagged for any language
📈For 95% of languages, most models are multilingual
🚨88% of models with tags are trained on English

In a new blog post, @tylerachang.bsky.social and I dig into these trends and why they matter! 👇

19.09.2025 14:53 — 👍 13    🔁 2    💬 1    📌 0

Our new lab for Human & Machine Intelligence is officially open at Princeton University!

Consider applying for a PhD or Postdoc position, either through Computer Science or Psychology. You can register interest on our new website lake-lab.github.io (1/2)

08.09.2025 13:59 — 👍 54    🔁 15    💬 1    📌 0
