Shikhar

@shikharb.bsky.social

PhD student, WAVLab@LTI, CMU. Multimodality and multilinguality. Prev. predoc, Google DeepMind.

419 Followers  |  1,022 Following  |  9 Posts  |  Joined: 18.11.2024  |  2.008

Latest posts by shikharb.bsky.social on Bluesky


Shikhar Bharadwaj, Chin-Jou Li, Yoonjae Kim, Kwanghee Choi, Eunjung Yeo, Ryan Soh-Eun Shim, Hanyu Zhou, Brendon Boldt, Karen Rosero Jacome, Kalvin Chang, Darsh Agrawal, Keer Xu, ...
PRiSM: Benchmarking Phone Realization in Speech Models
https://arxiv.org/abs/2601.14046

21.01.2026 09:30 | 👍 0    🔁 1    💬 0    📌 0

Bharadwaj, Li, Kim, Choi, Yeo, Shim, Zhou, Boldt, Jacome, Chang, Agrawal, Xu, Yang, Zhu, Watanabe, Mortensen: PRiSM: Benchmarking Phone Realization in Speech Models https://arxiv.org/abs/2601.14046 https://arxiv.org/pdf/2601.14046 https://arxiv.org/html/2601.14046

21.01.2026 06:32 | 👍 0    🔁 2    💬 0    📌 0

Can we make discrete speech units lightweight🪶 and streamable🏎? Excited to share our new #Interspeech2025 paper: On-device Streaming Discrete Speech Units arxiv.org/abs/2506.01845 (1/n)

15.08.2025 20:44 | 👍 1    🔁 1    💬 2    📌 0

Meows, music, murmurs and more - we trained a general purpose audio encoder and open sourced the code, checkpoint and evaluation toolkit.

22.07.2025 03:36 | 👍 3    🔁 0    💬 0    📌 0

📢 We've open-sourced NatureLM-audio, the first audio-language foundation model for #bioacoustics.

Trained on large-scale animal vocalization, human speech & music datasets, the model enables zero-shot classification, detection & querying across diverse species & environments 👇🏽

24.04.2025 15:54 | 👍 27    🔁 12    💬 2    📌 0

🔗 Resources for ESPnet-SDS:
📂 Codebase (part of ESPnet): github.com/espnet/espnet
📖 README & User Guide: github.com/espnet/espne...
🎥 Demo Video: www.youtube.com/watch?v=kI_D...

17.03.2025 14:29 | 👍 1    🔁 1    💬 0    📌 0

New #NAACL2025 demo! Excited to introduce ESPnet-SDS, a new open-source toolkit for building unified web interfaces for both cascaded & end-to-end spoken dialogue systems, providing real-time evaluation and more!
📜: arxiv.org/abs/2503.08533
Live Demo: huggingface.co/spaces/Siddh...

17.03.2025 14:29 | 👍 7    🔁 5    💬 1    📌 0

🚀 New #ICLR2025 Paper Alert! 🚀

Can Audio Foundation Models like Moshi and GPT-4o truly engage in natural conversations? 🗣️🔊

We benchmark their turn-taking abilities and uncover major gaps in conversational AI. 🧵👇

📜: arxiv.org/abs/2503.01174

05.03.2025 16:03 | 👍 9    🔁 6    💬 1    📌 0

Wait I thought the rock was named Dwayne Johnson

06.02.2025 13:29 | 👍 0    🔁 0    💬 0    📌 0

gpu poverty is real

28.01.2025 05:10 | 👍 2    🔁 0    💬 1    📌 0

Happy New Year

02.01.2025 23:21 | 👍 23832    🔁 4476    💬 386    📌 313

Philip Whittington, Gregor Bachmann, Tiago Pimentel
Tokenisation is NP-Complete
https://arxiv.org/abs/2412.15210

20.12.2024 05:18 | 👍 2    🔁 1    💬 0    📌 0

Today, we're introducing NatureLM-audio: the first large audio-language model tailored for understanding animal sounds. arxiv.org/abs/2411.07186 🧵👇

05.12.2024 00:45 | 👍 15    🔁 8    💬 2    📌 4

Announcing 🥂 FineWeb2: A sparkling update with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other datasets.

08.12.2024 09:19 | 👍 76    🔁 19    💬 1    📌 0
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
The video-language (VL) pretraining has achieved remarkable improvement in multiple downstream tasks. However, the current VL pretraining framework is hard to extend to multiple modalities (N modaliti...

LanguageBind: arxiv.org/abs/2310.01852
Language as the pivoting modality instead of images. Different training dataset.

08.12.2024 14:24 | 👍 2    🔁 0    💬 1    📌 0

WAVLab is up in bsky!

06.12.2024 19:15 | 👍 8    🔁 2    💬 0    📌 0

We are excited to announce the launch of ML SUPERB 2.0 (multilingual.superbbenchmark.org) as part of the Interspeech 2025 official challenge! We hope this upgraded version of ML SUPERB advances universal access to speech processing worldwide. Please join it!

#Interspeech2025

04.12.2024 14:45 | 👍 20    🔁 9    💬 1    📌 1

🙋‍♂️

30.11.2024 16:55 | 👍 0    🔁 0    💬 0    📌 0

I've started putting together a starter pack with people working on Speech Technology and Speech Science: go.bsky.app/BQ7mbkA

(Self-)nominations welcome!

19.11.2024 11:13 | 👍 82    🔁 34    💬 44    📌 3
Examples from dataset, a world map surrounded by spectrograms showing animal sounds from different regions of the world

Scatter plot where points are sound data sets, x axis is number of categories in dataset and y axis is duration of dataset in hours

iNatSounds is shown as the largest dataset on both axes

iNatSounds: new dataset from folks @inaturalist.bsky.social & co-authors; looks to be one of the largest public datasets of animal sounds

openreview.net/forum?id=QCY...

github.com/visipedia/in...

#prattle 💬
#bioacoustics

29.11.2024 03:30 | 👍 30    🔁 14    💬 1    📌 5

πŸ™‹β€β™‚οΈπŸ™

24.11.2024 23:49 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

πŸ™‹β€β™‚οΈπŸ™

24.11.2024 23:44 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

πŸ™‹β€β™‚οΈ

23.11.2024 00:36 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

We're here too now! πŸ₯³

22.11.2024 14:42 β€” πŸ‘ 8    πŸ” 6    πŸ’¬ 0    πŸ“Œ 0

Me (@shikharb.bsky.social) and our lab bsky.app/profile/wavl...

22.11.2024 23:09 | 👍 1    🔁 0    💬 0    📌 0
