
Auditory-Visual Speech Association (AVISA)

@avsp.bsky.social

The official(ish) account of the Auditory-VIsual Speech Association (AVISA). AV 👄 👓 speech references, but mostly what interests me. avisa.loria.fr

1,143 Followers  |  941 Following  |  529 Posts  |  Joined: 12.09.2023

Latest posts by avsp.bsky.social on Bluesky

Facial gestures are enacted through a cortical hierarchy of dynamic and stable codes Facial gestures are one fundamental set of communicative behaviors in primates, generated through the dynamic arrangement of many fine muscles. Anatomy shows that facial muscles are under direct contr...

Facial gestures are enacted through a cortical hierarchy of dynamic and stable codes | Science www.science.org/doi/10.1126/...

#neuroskyence

16.02.2026 08:17 - 👍 13   🔁 4   💬 0   📌 0

Some lips are red
Some eyes are blue
Seeing you speak
means I can identify more of the key words you said in noise

15.02.2026 05:32 - 👍 3   🔁 0   💬 0   📌 0

Watching Yourself Talk: Motor Experience Sharpens Sensitivity to Gesture-Speech Asynchrony

Tiziana Vercillo, Judith Holler, Uta Noppeney

www.biorxiv.org/content/10.6...

13.02.2026 10:40 - 👍 7   🔁 2   💬 0   📌 0

📚 Citation Classic

"Phonetic and phonological representation of stop consonant voicing"
Patricia Keating (1984)
Citations: 859+

Structured view of [voice] feature to phonetic implement...

🔗 https://www.jstor.org/stable/pdf/413642.pdf

#SpeechScience

12.02.2026 12:19 - 👍 2   🔁 1   💬 0   📌 0
How does a deep neural network look at lexical stress in English words? Despite their success in speech processing, neural networks often operate as black boxes, prompting the following questions: What informs their decisions, and h

How does a deep neural network look at lexical stress in English words? pubs.aip.org/asa/jasa/art... CNNs trained to predict stress position from a spectrographic representation of disyllabic words → 92% accuracy on held-out tests; interpretability analysis → the stressed vowel's 1st & 2nd formants are key

11.02.2026 22:16 - 👍 2   🔁 0   💬 0   📌 0
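A minimal sketch of the kind of model described in that post: a small CNN that takes a (mel-)spectrogram of a disyllabic word and classifies which syllable carries stress. Everything here (input shape, layer sizes, the spectrograms/labels arrays) is an illustrative assumption, not the paper's actual architecture or data.

```python
# Illustrative sketch only: a small CNN mapping a spectrogram of a disyllabic
# word to a stress-position label (0 = first syllable, 1 = second syllable).
# Shapes and hyperparameters are assumptions, not those of the JASA paper.
import torch
import torch.nn as nn

class StressCNN(nn.Module):
    def __init__(self, n_mels: int = 64, n_frames: int = 120):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # -> (16, n_mels/2, n_frames/2)
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                              # -> (32, n_mels/4, n_frames/4)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (n_mels // 4) * (n_frames // 4), 64), nn.ReLU(),
            nn.Linear(64, 2),                             # two classes: initial vs. final stress
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, n_frames) spectrogram treated as an image
        return self.classifier(self.features(x))

# Toy training step on random data, just to show the intended input/output.
model = StressCNN()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

spectrograms = torch.randn(8, 1, 64, 120)   # stand-in for real mel-spectrograms
labels = torch.randint(0, 2, (8,))          # stand-in stress-position labels

loss = loss_fn(model(spectrograms), labels)
loss.backward()
optimiser.step()
```

Interpretability analyses like the one mentioned (e.g. occlusion or saliency maps over the spectrogram) would then ask which time-frequency regions, such as the stressed vowel's formants, drive the classifier's decision.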
Visual language models show widespread visual deficits on neuropsychological tests - Nature Machine Intelligence Tangtartharakul and Storrs use standardized neuropsychological tests to compare human visual abilities with those of visual language models (VLMs). They report that while VLMs excel in high-level obje...

Our latest paper, "Visual language models show widespread visual deficits on neuropsychological tests", is now out in Nature Machine Intelligence: www.nature.com/articles/s42...

Non-paywalled version:
arxiv.org/abs/2504.10786

Tweet thread below from first author @genetang.bsky.social...

09.02.2026 02:40 - 👍 67   🔁 33   💬 1   📌 2
Cross-modal processing of auditory and visual symbol representations in the temporo-parietal cortex Numeracy and literacy are fundamental cognitive skills that rely on associating visual symbols with their spoken representations. Prior research has identified the posterior temporal-parietal cortex a...

X-modal processing of auditory & visual symbol representations in the temporo-parietal cortex
www.researchsquare.com/article/rs-8...
Slow event-related 3T fMRI with a passive listening/viewing task (auditory/visual letters & numbers); overlapping activation in auditory cortex for auditory letters/numbers

09.02.2026 01:49 - 👍 2   🔁 0   💬 0   📌 0
Attention decoding at the cocktail party: Preserved in hearing aid users, reduced in cochlear implant users Users of hearing aids (HAs) and cochlear implants (CIs) experience significant difficulty understanding a target speaker in multi-talker environments …

Attention decoding at the cocktail party: Preserved in hearing aid users, reduced in cochlear implant users www.sciencedirect.com/science/arti... 29 HA users, 24 CI users & 29 age-matched TH listeners; EEG recorded while attending 1 of 2 talkers (female/male) in free field; EEG <-> envelope linear backward & forward models

08.02.2026 21:27 - 👍 3   🔁 0   💬 0   📌 0
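A minimal sketch of the generic backward-model (stimulus-reconstruction) approach used in auditory attention decoding, not the authors' pipeline: time-lagged EEG is regressed onto the attended speech envelope, and held-out reconstructions are correlated with each talker's envelope to decide which one was attended. The eeg, env_attended and env_ignored arrays, the sampling rate and the lag window are all assumptions.

```python
# Illustrative backward (stimulus-reconstruction) model for attention decoding.
# eeg: (n_samples, n_channels); env_attended / env_ignored: (n_samples,).
# Toy random data stand in for preprocessed, time-aligned recordings.
import numpy as np
from sklearn.linear_model import Ridge

def lagged(eeg: np.ndarray, max_lag: int) -> np.ndarray:
    """Stack time-lagged copies (0..max_lag samples) of each EEG channel."""
    X = np.concatenate([np.roll(eeg, lag, axis=0) for lag in range(max_lag + 1)], axis=1)
    X[:max_lag] = 0.0            # zero out samples contaminated by wrap-around
    return X

rng = np.random.default_rng(0)
fs = 64                                    # assumed envelope/EEG sampling rate (Hz)
n = 60 * fs                                # one minute of toy data
eeg = rng.standard_normal((n, 32))
env_attended = rng.standard_normal(n)
env_ignored = rng.standard_normal(n)

X = lagged(eeg, max_lag=int(0.25 * fs))    # ~250 ms integration window

# Train the decoder on the first half, decode attention on the second half.
half = n // 2
decoder = Ridge(alpha=1e3).fit(X[:half], env_attended[:half])
recon = decoder.predict(X[half:])

r_att = np.corrcoef(recon, env_attended[half:])[0, 1]
r_ign = np.corrcoef(recon, env_ignored[half:])[0, 1]
print("decoded as attended" if r_att > r_ign else "decoded as ignored")
```

A forward model runs the regression the other way (envelope features predicting each EEG channel); with random toy data the two correlations here will of course be near zero.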
NYAS Publications From birth, respiration constitutes an intrinsic rhythm. We suggest that vocalizations and bodily movements are interactively coordinated with this respiratory rhythm, providing a temporal framework ....

Toward Fuller Integration of Respiratory Rhythms Into Research on Infant Vocal & Motor Development nyaspubs.onlinelibrary.wiley.com/doi/10.1111/... Surveys motor control, physiology, and speech & language acquisition; proposes respiration as core to early rhythmic coordination linking vocalization & movement

07.02.2026 03:23 - 👍 1   🔁 1   💬 0   📌 0
Human newborns form musical predictions based on rhythmic but not melodic structure The ability to anticipate musical structure is a fundamental human trait, but whether it exists at birth is unclear. This study shows that newborns encode rhythmic expectations based on statistical re...

Human newborns form musical predictions based on rhythmic but not melodic structure journals.plos.org/plosbiology/... TRF analyses showed high inter-individual variability in overall neural tracking of the musical stimuli; note-by-note predictability was tracked for the intact (not shuffled) structure, an effect driven by rhythm rather than melody

06.02.2026 10:00 - 👍 15   🔁 6   💬 0   📌 0
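A compact sketch of the logic behind that kind of TRF-style model comparison: regress neural data on a time-lagged note-by-note predictability regressor and ask whether the intact regressor explains held-out activity better than a shuffled control. All arrays below are random stand-ins, not the study's data or pipeline.

```python
# Toy TRF-style forward-model comparison: intact vs. shuffled predictability
# regressor predicting a single "neural" channel. Purely illustrative.
import numpy as np
from sklearn.linear_model import Ridge

def lagged(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Time-lagged copies (0..max_lag samples) of a 1-D regressor."""
    X = np.stack([np.roll(x, lag) for lag in range(max_lag + 1)], axis=1)
    X[:max_lag] = 0.0
    return X

rng = np.random.default_rng(1)
fs, n = 64, 64 * 120
predictability = rng.standard_normal(n)                 # note-by-note predictability series
neural = 0.3 * np.roll(predictability, 10) + rng.standard_normal(n)  # toy response at ~156 ms lag
shuffled = rng.permutation(predictability)              # control regressor
half = n // 2

def heldout_r(feature: np.ndarray) -> float:
    X = lagged(feature, max_lag=int(0.4 * fs))          # ~400 ms of lags
    model = Ridge(alpha=10.0).fit(X[:half], neural[:half])
    return np.corrcoef(model.predict(X[half:]), neural[half:])[0, 1]

print("intact   r =", round(heldout_r(predictability), 3))
print("shuffled r =", round(heldout_r(shuffled), 3))
```

The intact regressor should predict the toy response well above the shuffled one, which is the shape of the contrast the post describes (run separately for rhythm- and melody-based predictability).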
NYAS Publications Musical experience enhances speech-in-noise (SIN) perception, yet the mechanisms remain unclear. We tested 62 young adults using continuous measures of musical engagement, auditory and cognitive skil...

Explaining the Musical Advantage in Speech Perception Through Beat Perception and Working Memory nyaspubs.onlinelibrary.wiley.com/doi/10.1111/... "Our findings clarify the cognitive and temporal foundations of the musician advantage and highlight the value of considering musical engagement"

05.02.2026 22:47 - 👍 3   🔁 0   💬 0   📌 0
Individuals with congenital amusia show degraded performance in a nonword repetition task with lexical tones Congenital amusia is a disorder characterized by abnormal pitch processing, including pitch encoding and pitch memory. Individuals with amusia were im…

Individuals with congenital amusia show degraded performance in a nonword repetition task with lexical tones www.sciencedirect.com/science/arti... Nonword repetition task on syllable-tone combinations, with nonword length gradually increased from 1 to 7 syllables; accuracy & errors analysed

05.02.2026 07:21 - 👍 2   🔁 0   💬 0   📌 0

Early multimodal behavioral cues in autism: a micro-analytical exploration of actions, gestures and speech during naturalistic parent-child interactions https://pubmed.ncbi.nlm.nih.gov/41631016/

03.02.2026 14:49 - 👍 1   🔁 2   💬 0   📌 0

Speech reading - "get me outta here"

02.02.2026 23:59 - 👍 3   🔁 0   💬 0   📌 0

In 1961, physicist John Kelly programmed an IBM 704 to sing 'Daisy Bell' - the first song ever sung by a computer. This inspired HAL 9000's song in 2001: A Space Odyssey!

🎵 Historic: youtube.com/watch?v=41U78QP8nBk

#SpeechScience #Technology

01.02.2026 10:58 - 👍 2   🔁 1   💬 0   📌 0

"We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done" scholar.google.com.au/scholar?oi=b... 😲

01.02.2026 00:42 - 👍 1   🔁 0   💬 0   📌 0

Thank you. What a wonderful representation of HARPY's knowledge graph, how it was based on an expert formal lexical grammar, rules for phone junctures & how the frequency contrast & matching worked. Superb!

31.01.2026 23:27 - 👍 0   🔁 0   💬 0   📌 0
Harpy Search (Long Version)
YouTube video by Raj Reddy

Check out HARPY, a 1971 DARPA project:
A speech understanding system with these goals (November 1971):
Accept connected speech
Many cooperative speakers
Use a 1000-word vocabulary
Task-oriented grammar
Constraining task
Less than 10% semantic errors
Requiring ~300 MIPSS
www.youtube.com/watch?v=NiiD...

31.01.2026 22:36 - 👍 1   🔁 1   💬 1   📌 0

Cutaneous alternating current stimulation can cause a phasic modulation of speech perception https://pubmed.ncbi.nlm.nih.gov/41617605/

31.01.2026 05:48 - 👍 2   🔁 2   💬 0   📌 0
Introducing Causion: A web app for playing with DAGs | Peder M. Isager Personal website of Dr. Peder M. Isager

pedermisager.org/blog/causion...

29.01.2026 05:35 - 👍 1   🔁 0   💬 0   📌 0

something, something, phone ... er, ring?

28.01.2026 10:50 - 👍 1   🔁 0   💬 1   📌 0

@speechpapers.bsky.social I see you're posting AV speech stuff - 👍

28.01.2026 07:27 - 👍 0   🔁 0   💬 1   📌 0


The cortical contribution to the speech-FFR is not modulated by visual information

https://www.biorxiv.org/content/10.64898/2026.01.26.701703v1

28.01.2026 07:24 - 👍 1   🔁 1   💬 1   📌 0

Audio-visual speech-in-noise tests for evaluating speech reception thresholds: A scoping review https://pubmed.ncbi.nlm.nih.gov/41592005/

28.01.2026 02:34 - 👍 3   🔁 1   💬 0   📌 0

For context, see Karadöller, D. Z., Sümer, B., & Özyürek, A. (2025). First-language acquisition in a multimodal framework: Insights from speech, gesture, and sign. journals.sagepub.com/doi/pdf/10.1... Miles et al. argue "An embodied multi-articulatory multimodal framework is needed"

26.01.2026 21:22 - 👍 0   🔁 0   💬 0   📌 0
An embodied multi-articulatory multimodal language framework: A commentary on Karadöller, Sümer and Özyürek - Rachel Miles, Shai Lynne Nielson, Deniz İlkbaşaran, Rachel I Mayberry, 2025 While many researchers working in spoken languages have used modality to distinguish language and gesture, this is not possible for sign language researchers. W...

An embodied multi-articulatory multimodal language framework: A commentary on Karadöller et al.
journals.sagepub.com/doi/10.1177/...
"we believe it shows that our understanding of the role of gesture in language is incomplete and lacks crucial insight when co-sign gesture is not accounted for"

26.01.2026 21:22 - 👍 0   🔁 0   💬 1   📌 0
Preview
The involvement of endogenous brain rhythms in speech processing Endogenous brain rhythms are at the core of oscillation-based neurobiological theories of speech. These brain rhythms have been proposed to play a cru…

The involvement of endogenous brain rhythms in speech processing www.sciencedirect.com/science/arti... Reviews oscillation-based theories (dynamic attending, active sensing, asymmetric sampling in time, segmentation theories) & the evidence; argues naturalistic paradigms and resting-state data are key to progress

23.01.2026 21:03 - 👍 5   🔁 0   💬 0   📌 0

Children Sustain Their Attention on Spatial Scenes When Planning to Describe Spatial Relations Multimodally in Speech & Gesture onlinelibrary.wiley.com/doi/10.1111/... "How do children allocate visual attention to scenes as they prepare to describe them multimodally in speech and co-speech gesture?"

20.01.2026 22:55 - 👍 0   🔁 0   💬 0   📌 0
The Effects of Visual Input in Virtual Reality on Voice Production: Comparing Trained Singers and Untrained Speakers This study examined whether visual spatial cues presented in immersive virtual reality (IVR) – room size and speaker-to-listener distance – are associated with changes in vocal production, and whether the...

Effects of Visual Input in Virtual Reality on Voice Production: Comparing Trained Singers & Untrained Speakers www.jvoice.org/article/S089... Study examined if visual spatial cues in immersive virtual reality (room size, speaker-to-listener distance) are associated with changes in vocal production 🗣️

19.01.2026 12:28 - 👍 0   🔁 0   💬 0   📌 0

Distinct Temporal Dynamics of Speech & Gesture Processing: Insights From ERP Across L1 and L2 psycnet.apa.org/fulltext/202... "results point to potentially distinct neural and temporal dynamics in processing speech versus gestures" → speech processed earlier, with gestures recruiting later stages (?)

17.01.2026 09:28 - 👍 1   🔁 0   💬 0   📌 0
