BOLT CTS CALLFRIEND CALLHOME Egyptian Arabic Transcripts and Translations: transcripts and English translations for 116 hours of BOLT CTS telephone recordings; all speech was transcribed; 99% of the transcripts were translated bit.ly/4ockuEo
21.10.2025 13:30
BOLT CTS CALLFRIEND CALLHOME Egyptian Arabic Audio: 116 hours of telephone speech from 274 conversations between native speakers; developed by LDC for the DARPA BOLT program; contains previously unexposed calls from the CF/CH collections bit.ly/42rsg4S
20.10.2025 14:32
KAIROS Phase 2 Quizlet contains English and Spanish web data annotated for events, relations and arguments, a reference knowledge graph and a knowledge base; quizlets were defined tasks to explore evaluation objectives before the full program evaluation bit.ly/3WqvYYR
17.10.2025 14:47
See LDC's October newsletter for a preview of 2026 publications, fall data scholarship recipients and three new publications ldc-upenn.blogspot.com
16.10.2025 15:09
More LDC data in the LORELEI series: LORELEI Hindi Representative Language Pack features monolingual and parallel text, annotations, software tools and more for human language technology development to address emergent situations bit.ly/4nCp3ar
22.09.2025 20:37
AIDA Scenario 1 Evaluation Topic Source Data, Annotation & Assessment: 10k+ English, Russian & Ukrainian web docs on political relations between Russia & Ukraine in the 2010s annotated for entities & cross-reference, w/ judgments for scoring submissions bit.ly/3K7ynoA
22.09.2025 16:01
Mixer 7 English Speech has 12,321 hours of telephone conversations, interviews and transcript readings from 222 English speakers, some collected using a 14-microphone array; speaker metadata is included bit.ly/4nvSYkG
19.09.2025 15:24
Check out our September newsletter for three new LDC publications: Mixer 7 English Speech, AIDA Scenario 1 Evaluation Topic Source Data, Annotation and Assessment, and LORELEI Hindi Representative Language Pack ldc-upenn.blogspot.com
18.09.2025 15:08
KAIROS Phase 1 Quizlet contains English and Spanish web data annotated for events, relations and arguments and a reference knowledge graph; quizlets were defined tasks to explore evaluation objectives before the full program evaluation bit.ly/3HvDU7k
26.08.2025 18:39
Abstract Meaning Representation 2.0 - Machine Translations translates 1,371 English sentences from LDC's AMR 2.0 corpus into Spanish, German, Italian and Mandarin Chinese using Google Translate bit.ly/4n1m8bp
26.08.2025 14:50
Mixer 6 - CHiME 8 Transcribed Calls and Interviews: 80 hours of Mixer 6 English interviews and telephone speech across 13 channels (1063 hours) with transcripts divided into training, development and test sets bit.ly/4oyUCn5
25.08.2025 18:33
LDC's August newsletter has the last call for fall data scholarship applications and details on new publications: Mixer 6 - CHiME 8 Transcribed Calls and Interviews, Abstract Meaning Representation 2.0 - Machine Translations and KAIROS Phase 1 Quizlet ldc-upenn.blogspot.com
25.08.2025 13:09
What a great conference #Interspeech2025! There is still time to stop by our booth and grab a limited-edition TIMIT word poetry magnet. Also don't miss our colleague's oral session on TELVID: A multilingual, multi-modal corpus for speaker recognition at 13:30, A04, Port 1A @interspeech.bsky.social
21.08.2025 09:40
Good morning #Interspeech2025 Stop by our booth during the coffee breaks today to say hello. Also don't miss today's special session co-organized by LDC on Challenges in Speech Collection, Curation and Annotation in two parts beginning at 13:30, Dock 15. @interspeech.bsky.social
20.08.2025 07:11
Good morning Interspeech. It's a great second day. Come by and grab one of our limited giveaways. @interspeech.bsky.social
#Interspeech2025
19.08.2025 07:22
We are excited to be here at Interspeech 2025 @interspeech.bsky.social Come see us at the first coffee break today to learn more about the latest developments at LDC. #Interspeech2025
18.08.2025 08:11
LDC will be exhibiting at #Interspeech2025, August 17-21 in Rotterdam. Stop by our booth to say hello and learn the latest developments at the Consortium. LDC work will also be featured in presentations, posters and a special session. We look forward to seeing you there. www.interspeech2025.org
12.08.2025 15:51
From the LORELEI companion project: LoReHLT Uzbek Representative Language Pack features monolingual and parallel text, annotations, audio recordings, software tools and more for human language technology development to address emergent situations bit.ly/4lL0zuL
22.07.2025 14:08
Penn Parsed Corpora of Historical English Second Release: POS-tagged & syntactically annotated British English text (1100 CE - 1914 CE); updates the 2020 release with new annotation, revised guidelines, philological information & the Corpus2 search tool bit.ly/46zR1hR
18.07.2025 14:34
AnnoDIFP Session Audio and Transcripts: 438.34 hours of English audio and transcripts from in-person interviews of 366 participants paired with scores from two self-reported personality assessments bit.ly/4nEYQJr
17.07.2025 15:16
Check out the July newsletter for Fall 2025 data scholarship application deadlines & 3 new publications: AnnoDIFP Session Audio and Transcripts, Penn Parsed Corpora of Historical English Second Release & LoReHLT Uzbek Representative Language Pack ldc-upenn.blogspot.com
16.07.2025 14:29
KAIROS Schema Learning Complex Event Annotation has English and Spanish web text, audio, video and image data labeled for 93 real-world complex events with event, relation and argument annotations linking to document provenance bit.ly/4jNrDIq
25.06.2025 13:07
IWSLT 2022 - 2023 Shared Task Training, Development and Test Set: 210 hours of Tunisian Arabic conversational telephone speech, transcripts, English translations, speaker metadata, and documentation used in IWSLT dialectal speech and low resource tracks bit.ly/3HEO4lL
24.06.2025 14:24
Chinese Sentence Pattern Structure Treebank contains 5,016 sentences and 119,627 tokens from modern and ancient Chinese works annotated for lexical sense, syntactic structure and inter-clause relations bit.ly/4kZVGh3
23.06.2025 13:57
LDC's June newsletter has the latest on three new publications: Chinese Sentence Pattern Structure Treebank, IWSLT 2022-2023 Shared Task Training, Development and Test Set, and KAIROS Schema Learning Complex Event Annotation ldc-upenn.blogspot.com
17.06.2025 13:39
BOLT CTS CALLFRIEND CALLHOME Mandarin Chinese Transcripts and Translations: transcripts and English translations for 93 hours of BOLT CTS telephone recordings; all speech was transcribed; 89% of the transcripts were translated bit.ly/4jKul2j
20.05.2025 13:29
BOLT CTS CALLFRIEND CALLHOME Mandarin Chinese Audio: 93 hours of telephone speech from 236 conversations between native speakers; developed by LDC for the DARPA BOLT program; contains previously unexposed calls from the CF/CH collections bit.ly/4kbsBPy
19.05.2025 14:12
Check out LDC's May newsletter for two new companion releases developed by LDC to support the DARPA BOLT program, BOLT CTS CALLFRIEND CALLHOME Mandarin Chinese Audio and BOLT CTS CALLFRIEND CALLHOME Mandarin Chinese Transcripts and Translations ldc-upenn.blogspot.com
16.05.2025 14:07
MATERIAL Kazakh-English Language Pack has 57 hours of Kazakh conversational telephone speech, transcripts, English translations, annotations and queries designed to support cross language information retrieval bit.ly/42cwe01
18.04.2025 13:53
DEFT Spanish Light and Rich ERE Annotation: 158 Latin American discussion forum and Spanish newswire documents annotated for entities, relations and events, including coreference (light) and event hoppers (rich), developed by LDC for the DARPA DEFT program bit.ly/3YcGCnd
17.04.2025 14:39