
Ben Litterer

@blitt.bsky.social

PhD student interested in computational approaches to language, politics, and media. Iowa | Michigan

95 Followers  |  24 Following  |  10 Posts  |  Joined: 16.02.2024  |  2.232

Latest posts by blitt.bsky.social on Bluesky

Thanks to @dallascard.bsky.social and @davidjurgens.bsky.social for their help on this project! We also received great feedback from members of the Blablablab and CLC lab

14.11.2024 22:36 | 👍 3    🔁 0    💬 1    📌 0
blitt/SPoRC · Datasets at Hugging Face: We're on a journey to advance and democratize artificial intelligence through open source and open science.

Interested in working with SPoRC? Our data, paper, and code for creating data and doing the analysis are freely available!

data: huggingface.co/datasets/bli...
paper: arxiv.org/abs/2411.07892
processing code: github.com/blitt2018/SP...
analysis code: github.com/blitt2018/SP...

14.11.2024 22:36 | 👍 9    🔁 1    💬 2    📌 0

We're excited for people to use this data to explore the dynamics of long-form conversation, linguistic style matching, diffusion of information, understanding power and prestige within the podcast ecosystem, and more!

14.11.2024 22:36 | 👍 1    🔁 0    💬 1    📌 0

What about the audio aspect of podcasts? We provide speaker turn information, along with audio features that capture this information, such as pitch, allowing future research to consider elements like emotion, humor, or sarcasm

14.11.2024 22:36 | 👍 1    🔁 0    💬 1    📌 0

...Discussion of George Floyd was widespread across categories, with 21% of podcasts saying his name in at least one of their episodes in our time-period. Furthermore, discussion of racial justice peaked quickly around George Floyd but transitioned to a longer-lasting focus on Black Lives Matter

14.11.2024 22:36 | 👍 1    🔁 0    💬 1    📌 0

How does the podcast ecosystem react to major events? As a case study, we consider collective attention in the podcast ecosystem following the murder of George Floyd in 2020...

14.11.2024 22:36 | 👍 1    🔁 0    💬 1    📌 0
A network figure where podcasts are connected by edges if they have hosted the same guest. Color is assigned based on self-ascribed podcast category labels. Layout is determined with the force-directed Yifan-Hu algorithm. Podcasts in the same category appear closer.


How do the creators of podcast content exchange ideas and form communities? We find that the Business, Sports, and News categories form communities through shared guests, whereas other large categories such as Religion and Society do not

14.11.2024 22:36 | 👍 2    🔁 0    💬 1    📌 1
A figure where podcast episodes are projected such that distance indicates topical similarity. Color is assigned based on the self-ascribed podcast category label.


Podcasts have categories, but how similar are podcasts within categories in terms of what they talk about? In our content analysis, we find it's mixed! Some topics belong to distinct categories, but other topics like "racial justice" or "spirituality" cut across many categories!

14.11.2024 22:36 | 👍 2    🔁 0    💬 2    📌 0

SPoRC covers nearly all English episodes during May-June 2020, with transcripts + host/guest inferences for over 1M episodes, and audio features + speaker turns for over 370K episodes. Using this data, we study the content, structure, and responsiveness of the podcast ecosystem

14.11.2024 22:36 | 👍 2    🔁 0    💬 1    📌 0
Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus: Podcasts provide highly diverse content to a massive listener base through a unique on-demand modality. However, limited data has prevented large-scale computational analysis of the podcast ecosystem....

Podcasts are a popular medium, but data for computational research is limited! We introduce the Structured Podcast Research Corpus (SPoRC - huggingface.co/datasets/bli...), a large, multimodal dataset of English podcasts 🧵

arxiv.org/abs/2411.07892

14.11.2024 22:36 | 👍 69    🔁 26    💬 4    📌 3
