The 50th Penn Linguistics Conference (PLC) is Feb 28-Mar 1. PLC brings together students, faculty & researchers interested in languages & linguistics to share new work and connect with peers. We wish everyone a great and productive conference. @pennlinguistics.bsky.social tinyurl.com/verswp3z
27.02.2026 13:42
More LDC data in the LORELEI series: LORELEI Russian Representative Language Pack features monolingual and parallel text, annotations, software tools and more for human language technology development to address emergent situations bit.ly/3MHDr4v
24.02.2026 16:02
International Mother Language Day
21 February
Happy International #MotherLanguageDay! This year's theme celebrates youth voices on multilingual education, emphasizing that language is central to identity, learning, well-being and participation in society. Let's celebrate every language, every voice www.unesco.org/en/days/moth...
20.02.2026 17:41
KAIROS Schema Learning Background Source Data: 14K English & Spanish multimodal resources collected by LDC for a Schema Learning Corpus; schemas were used with event extraction to characterize & make predictions about real-world events in the corpus bit.ly/4tPVeYa
19.02.2026 15:22
2022 NIST Language Recognition Evaluation Test and Development Sets: 222 hours of telephone speech and broadcast narrowband speech in 14 languages, plus turnkey evaluation documentation, emphasizing African languages and related English and French dialects bit.ly/4rIEJLs
18.02.2026 14:29
Catch up on 2026 membership discounts, spring data scholarship awards and the release of three new publications in LDC's February newsletter ldc-upenn.blogspot.com
17.02.2026 15:03
MATERIAL Swahili-English Language Pack has 112 hours of Swahili conversational telephone speech, transcripts, English translations, annotations and queries designed to support cross language information retrieval bit.ly/49SWG3R
23.01.2026 16:46
CALLHOME Japanese Lexicon Second Edition: morphological, phonological and stress information for 80,688 Japanese words from transcripts of telephone conversations between native Japanese speakers, along with a pronunciation dictionary and G2P tools bit.ly/3NlxvhC
22.01.2026 15:28
CALLHOME Japanese Second Edition brings original speech and transcript datasets up to date with new transcripts and revised directories, file formats and documentation bit.ly/49kSdqz
21.01.2026 15:28
LDC welcomes 2026 with its January newsletter featuring three publications and membership renewal information ldc-upenn.blogspot.com
20.01.2026 15:21
LORELEI Sinhala Incident Language Pack: monolingual and parallel text, annotations, software tools and more for human language technology development in this under-resourced language bit.ly/4iVnJP1
18.12.2025 15:47
2021 NIST SRE Test Set: 447 hours of Cantonese, Mandarin, and English conversational telephone speech, audio from video, and selfie image data for development and test, along with answer keys, enrollment, trial files and documentation bit.ly/4q35JV4
17.12.2025 15:43
Check out LDC's December newsletter for the latest news and publications and join us in celebrating the release of our 1000th corpus! ldc-upenn.blogspot.com
16.12.2025 15:38
#18.9 Interspeech 2025 Impressions - Denise DiPersio
Meet Denise DiPersio, Associate Director at Linguistic Data Consortium, sharing her experience with us. Host: Pascal Hecker. Post-production: Wei Xue
Check out ISCA-SAC's Speech Pitch podcast, episode #18.9, to hear from LDC's Denise DiPersio. This session was recorded during Interspeech 2025. Listen to Denise talk about LDC's past, present and future and LDC's involvement in Interspeech since the 2009 conference in Brighton. tinyurl.com/488rske4
05.12.2025 15:00
LORELEI Ilocano Incident Language Pack: monolingual and parallel text, annotations, software tools and more for human language technology development in this under-resourced language bit.ly/43moVEw
20.11.2025 14:58
AnnoDIFP CTS Audio and Transcripts: 242.52 hours of English telephone audio and transcripts from 1179 calls involving 327 participants, paired with scores from two self-reported personality assessments bit.ly/47J6JHX
19.11.2025 15:13
LDC's November newsletter has details on 2026 membership renewal, the spring data scholarship deadline and two new publications ldc-upenn.blogspot.com
18.11.2025 14:46
BOLT CTS CALLFRIEND CALLHOME Egyptian Arabic Transcripts and Translations: transcripts and English translations for 116 hours of BOLT CTS telephone recordings; all speech was transcribed; 99% of the transcripts were translated bit.ly/4ockuEo
21.10.2025 13:30
BOLT CTS CALLFRIEND CALLHOME Egyptian Arabic Audio: 116 hours of telephone speech from 274 conversations between native speakers; developed by LDC for the DARPA BOLT program; contains previously unexposed calls from the CF/CH collections bit.ly/42rsg4S
20.10.2025 14:32
KAIROS Phase 2 Quizlet contains English and Spanish web data annotated for events, relations and arguments, a reference knowledge graph and a knowledge base; quizlets were defined tasks to explore evaluation objectives before the full program evaluation bit.ly/3WqvYYR
17.10.2025 14:47
See LDC's October newsletter for a preview of 2026 publications, fall data scholarship recipients and three new publications ldc-upenn.blogspot.com
16.10.2025 15:09
More LDC data in the LORELEI series: LORELEI Hindi Representative Language Pack features monolingual and parallel text, annotations, software tools and more for human language technology development to address emergent situations bit.ly/4nCp3ar
22.09.2025 20:37
AIDA Scenario 1 Evaluation Topic Source Data, Annotation & Assessment: 10k+ English, Russian & Ukrainian web docs on political relations between Russia & Ukraine in the 2010s annotated for entities & cross-reference, w/ judgments for scoring submissions bit.ly/3K7ynoA
22.09.2025 16:01
Mixer 7 English Speech has 12,321 hours of telephone conversations, interviews and transcript readings from 222 English speakers, some collected using a 14-microphone array; speaker metadata is included bit.ly/4nvSYkG
19.09.2025 15:24
Check out our September newsletter for three new LDC publications: Mixer 7 English Speech, AIDA Scenario 1 Evaluation Topic Source Data, Annotation and Assessment, and LORELEI Hindi Representative Language Pack ldc-upenn.blogspot.com
18.09.2025 15:08
KAIROS Phase 1 Quizlet contains English and Spanish web data annotated for events, relations and arguments and a reference knowledge graph; quizlets were defined tasks to explore evaluation objectives before the full program evaluation bit.ly/3HvDU7k
26.08.2025 18:39
Abstract Meaning Representation 2.0 - Machine Translations translates 1,371 English sentences from LDC's AMR 2.0 corpus into Spanish, German, Italian and Mandarin Chinese using Google Translate bit.ly/4n1m8bp
26.08.2025 14:50
Mixer 6 - CHiME 8 Transcribed Calls and Interviews: 80 hours of Mixer 6 English interviews and telephone speech across 13 channels (1063 hours) with transcripts divided into training, development and test sets bit.ly/4oyUCn5
25.08.2025 18:33
LDC's August newsletter has the last call for fall data scholarship applications and details on new publications: Mixer 6 - CHiME 8 Transcribed Calls and Interviews, Abstract Meaning Representation 2.0 - Machine Translations and KAIROS Phase 1 Quizlet ldc-upenn.blogspot.com
25.08.2025 13:09
What a great conference #Interspeech2025! There is still time to stop by our booth and grab a limited-edition TIMIT word poetry magnet. Also don't miss our colleague's oral session on TELVID: A multilingual, multi-modal corpus for speaker recognition at 13:30, A04, Port 1A @interspeech.bsky.social
21.08.2025 09:40