I'm excited for #CogSci2025 in SF this week! I would love to meet more people thinking about language and cognition, especially signed languages! Please feel free to reach out :)
29.07.2025 22:46
@kayoyin.bsky.social
PhD student at UC Berkeley. NLP for signed languages and LLM interpretability. kayoyin.github.io
Happy to announce the first workshop on Pragmatic Reasoning in Language Models: PragLM @ COLM 2025!
How do LLMs engage in pragmatic reasoning, and what core pragmatic capacities remain beyond their reach?
sites.google.com/berkeley.edu/praglm/
Submit by June 23rd
More evidence of the importance of training analysis for interp! Induction heads might serve as *preliminary* function vector heads (which directly compute in-context learning tasks). Ultimately, LMs rely on FV heads more than IH heads for ICL. from @kayoyin.bsky.social
03.03.2025 16:51
On Nov. 7th, I told my sociolinguistics students this was going to happen. They did an activity on language policy and one of the lessons was that repressive regimes always try to criminalize languages that are not ideologically associated with nationalism. I'm sorry I was right.
28.02.2025 17:22
We speculate that induction heads help models learn the more complex FV mechanism, which ultimately drives in-context learning.
Paper: arxiv.org/abs/2502.14010
How to reconcile this with previous studies on ICL?
Key difference is that previous works:
- measure ICL using differences between token losses, which we find behaves differently to few-shot ICL accuracy
- don't control for overlap between induction and FV
- focus on small models
Other interesting findings:
- FV heads have relatively high induction scores and vice versa compared to other heads
- FV heads emerge later in training than induction heads
- ICL accuracy rises around the same time induction emerges during training, but increases more gradually
We also find evidence of induction heads that evolve into FV heads.
Several instances of FV heads have a high induction score earlier in training (around when induction heads first emerge). However, the reverse (induction heads with high FV scores earlier) does not occur.
2 mechanisms have been proposed to explain ICL: induction heads that find and copy relevant tokens, and FV heads that compute a latent encoding of the task from examples.
Our ablations show that FV heads are crucial for few-shot ICL, whereas induction heads are not necessary.
Induction heads are commonly associated with in-context learning, but are they the primary driver of ICL at scale?
We find that recently discovered "function vector" heads, which encode the ICL task, are the actual primary mechanisms behind few-shot ICL!
arxiv.org/abs/2502.14010
I'm visiting pittsburgh this Sunday until next Tuesday, I'd love to catch up with friends and meet new people! DM me :)
this will be my first time back since I graduated
I'm presenting this virtually at #TISLR15 very soon at 2:35pm today Ethiopia time (3:35am Pacific)!
17.01.2025 09:18
Phase Transition xkcd.com/3025
16.12.2024 20:01
Thanks for the kind words, Seth, glad you joined the dinner!
17.12.2024 01:56
A small white bichon frisé dog running on the water at a beach, the sky behind is late sunset colors
took my dog to the beach today. His name is apollo and he's 14 years old :)
15.12.2024 12:35
sad I'm not in town for this, looks super exciting!!
04.12.2024 22:07
oof yeah I was afraid something like that was maybe going on. I hope she gets the help she needs...
04.12.2024 21:26
My latest talk on NLP for signed languages presented at the Japanese NLP colloquium doesn't have an English version yet, maybe you can fix this by inviting me to give a talk :D
03.12.2024 22:36
Just watched @kayoyin.bsky.social's fantastic talk at the Japanese NLP colloquium on sign language processing. It broadened my perspective on what language is and how AI can support Deaf and hard-of-hearing communities. Highly recommended! youtu.be/7dD8wu-Chbo
02.12.2024 21:38
ahh yes this is it thank you!! I hallucinated the end haha
26.11.2024 23:20
glad to know at least I didn't just make this up. I think I heard it recently too but can't remember at alll
26.11.2024 12:05
this is driving me crazy, does anyone recognize this song?? It's stuck in my head but I can't remember where I heard it
26.11.2024 09:51
Photo from window outside my office in berkeley where the sky is white from clouds and it's raining
multiple people told me I seem to be in a visibly better mood the past 2 days which coincides perfectly with Berkeley finally getting rainy weather
22.11.2024 22:53
Overall, handshapes in native ASL signs reflect communicative efficiency, but *not in signs borrowed from English*!
Check out our paper+code (w/ Terry Regier & Dan Klein) for more details and why we think that's the case: aclanthology.org/2024.acl-lon...
See you at TISLR in Ethiopia! 8/8
The scatter plot is titled "Fingerspelling" and illustrates the relationship between handshape similarity and English letter confusability. - **X-axis:** English letter confusability - **Y-axis:** Handshape similarity Points on the plot are labeled with pairs of letters representing different handshapes. There is a positive correlation line with the label (r=0.19, p=0.00), indicating a slight positive correlation between English letter confusability and handshape similarity.
What about perceptual effort - could it be correlated with English usage?
Perceptual effort to distinguish between 2 handshapes is very weakly correlated with how often the 2 letters appear in similar contexts in English, and in the "wrong" direction for efficiency. 7/8
The scatter plot is titled "Fingerspelling" and depicts the relationship between finger independence and English letter frequency. X-axis: English letter frequency Y-axis: Finger independence Points on the plot are labeled with letters representing different handshapes. There is a negative correlation line with the label (r=-0.31, p=0.15), indicating a slight negative correlation between English letter frequency and finger independence.
We also look at handshapes in ASL fingerspelling (used to spell out English words, 1 handshape = 1 letter) and their correlation with letter frequency in English text.
No significant correlation between fingerspelling handshapes and English letter frequency! 6/8
The image contains two scatter plots comparing finger independence and handshape frequency. Left Plot: Core signs Title: "Core signs" X-axis: Handshape frequency Y-axis: Finger independence Points are labeled with letters representing different handshapes. A negative correlation line is shown with the label (r=-0.46, p=0.04). Right Plot: Initialized & loan signs Title: "Initialized & loan signs" X-axis: Handshape frequency Y-axis: Finger independence Points are labeled with letters representing different handshapes. A nearly flat correlation line is shown with the label (r=-0.06, p=0.81).
We compute the correlation between articulatory effort and handshape frequency in a lexicon of ASL signs (ASL-LEX).
In core signs native to ASL (left), frequent handshapes are easier to produce!
In initialized and loan signs borrowed from English (right), no correlation! 5/8
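For readers who want to see what the correlation in 5/8 amounts to, here is a minimal Python sketch of a Pearson correlation between handshape frequency and an effort score. The function name `pearson_r` and the numbers are made up for illustration; they are not ASL-LEX data, and the paper's exact statistical procedure may differ.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two deviation norms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Hypothetical values, NOT real data: one entry per handshape.
handshape_frequency = [120, 95, 60, 30, 12]
articulatory_effort = [0.1, 0.3, 0.2, 0.6, 0.9]

r = pearson_r(handshape_frequency, articulatory_effort)
print(f"r = {r:.2f}")
```

A negative r here would mean that frequent handshapes tend to have lower effort scores, which is the pattern reported for core signs.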
The image is divided into two sections. On the left side, with a light green background, the title reads "Low handshape similarity (Low perceptual effort)." Below this title, there are two pairs of handshapes, each pair connected by a double-headed arrow to indicate low similarity. (B and X, H and S) On the right side, with a light red background, the title reads "High handshape similarity (High perceptual effort)." Below this title, there are two pairs of handshapes, each pair connected by a double-headed arrow to indicate high similarity. (R and U, N and M)
For perceptual effort, we measure handshape similarity.
When two handshapes have similar finger joint angles, they appear more alike, making it harder to distinguish between them perceptually. 4/8
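One way to picture the similarity idea in 4/8 is the small Python sketch below. It assumes each handshape is a flat vector of finger joint angles in degrees and turns the Euclidean distance between two vectors into a similarity score; the function name, the distance-to-similarity mapping, and the angle values are all assumptions for illustration, not the paper's measure.

```python
import math


def handshape_similarity(angles_a, angles_b):
    """Similarity in (0, 1]: 1.0 for identical joint angles, lower as they diverge."""
    if len(angles_a) != len(angles_b):
        raise ValueError("both handshapes need the same number of joint angles")
    distance = math.dist(angles_a, angles_b)  # Euclidean distance
    return 1.0 / (1.0 + distance)


# Hypothetical joint-angle vectors (degrees), invented for this sketch.
shape_1 = [0, 0, 0, 90, 100, 70]
shape_2 = [0, 0, 0, 85, 95, 70]    # differs only slightly from shape_1
shape_3 = [90, 100, 70, 0, 0, 0]   # very different from shape_1

print(handshape_similarity(shape_1, shape_2))
print(handshape_similarity(shape_1, shape_3))
```

Under this sketch, pairs with nearly identical angles score close to 1, which corresponds to the "high perceptual effort" case.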
The image is divided into two sections. On the left side, with a light green background, the title reads "Low finger independence (Low articulatory effort)." Below this title, there are four illustrations of handshapes that require low finger independence (B, C, S, A). On the right side, with a light red background, the title reads "High finger independence (High articulatory effort)." Below this title, there are four illustrations of handshapes that require high finger independence (W, R, P, H).
For articulatory effort, we measure finger independence.
The more variation there is in finger joint angles within a handshape, the more difficult it is to produce that handshape. 3/8
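As a rough illustration of the finger independence idea in 3/8, here is a minimal Python sketch. It assumes a handshape is given as a few flexion angles per finger and scores independence as the variance of the per-finger mean angle; the function name, the angle values, and this exact formula are assumptions for illustration, not the paper's implementation.

```python
from statistics import pvariance


def finger_independence(joint_angles):
    """Variance of the mean flexion angle across fingers.

    joint_angles maps a finger name to a tuple of joint angles in degrees.
    Fingers that all bend alike give a score near 0; fingers that do
    different things give a higher score.
    """
    per_finger_mean = [sum(a) / len(a) for a in joint_angles.values()]
    return pvariance(per_finger_mean)


# Hypothetical angles (degrees), invented for this sketch.
all_fingers_curled = {
    "index": (90, 100, 70),
    "middle": (90, 100, 70),
    "ring": (90, 100, 70),
    "pinky": (90, 100, 70),
}
one_finger_curled = {
    "index": (0, 0, 0),
    "middle": (0, 0, 0),
    "ring": (0, 0, 0),
    "pinky": (90, 100, 70),
}

print(finger_independence(all_fingers_curled))
print(finger_independence(one_finger_curled))
```

In this sketch the second handshape scores higher because one finger moves differently from the rest, matching the intuition that such handshapes take more articulatory effort.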
We analyze handshapes used in native ASL signs and in signs borrowed from English to compare efficiency pressures from both ASL and English usage.
To do so, we quantify the articulatory effort needed to produce handshapes and the perceptual effort needed to recognize them. 2/8