Linguistic Data Consortium

@ldcupenn.bsky.social

LDC creates and distributes language resources to universities, labs, companies and libraries for linguistic education, research and technology development.

26 Followers  |  1 Following  |  14 Posts  |  Joined: 12.03.2025

Latest posts by ldcupenn.bsky.social on Bluesky

From the LORELEI companion project: LoReHLT Uzbek Representative Language Pack features monolingual and parallel text, annotations, audio recordings, software tools and more for human language technology development to address emergent situations bit.ly/4lL0zuL

22.07.2025 14:08 | 👍 1    🔁 0    💬 0    📌 0

Penn Parsed Corpora of Historical English Second Release: POS-tagged & syntactically annotated British English text (1100 CE - 1914 CE); updates the 2020 release with new annotation, revised guidelines, philological information & the Corpus2 search tool bit.ly/46zR1hR

18.07.2025 14:34 | 👍 0    🔁 0    💬 0    📌 0

AnnoDIFP Session Audio and Transcripts: 438.34 hours of English audio and transcripts from in-person interviews of 366 participants paired with scores from two self-reported personality assessments bit.ly/4nEYQJr

17.07.2025 15:16 | 👍 0    🔁 0    💬 0    📌 0

Check out the July newsletter for Fall 2025 data scholarship application deadlines & 3 new publications: AnnoDIFP Session Audio and Transcripts, Penn Parsed Corpora of Historical English Second Release & LoReHLT Uzbek Representative Language Pack ldc-upenn.blogspot.com

16.07.2025 14:29 | 👍 0    🔁 0    💬 0    📌 0

KAIROS Schema Learning Complex Event Annotation has English and Spanish web text, audio, video and image data labeled for 93 real-world complex events with event, relation and argument annotations linking to document provenance bit.ly/4jNrDIq

25.06.2025 13:07 | 👍 0    🔁 0    💬 0    📌 0

IWSLT 2022 - 2023 Shared Task Training, Development and Test Set: 210 hours of Tunisian Arabic conversational telephone speech, transcripts, English translations, speaker metadata, and documentation used in IWSLT dialectal speech and low resource tracks bit.ly/3HEO4lL

24.06.2025 14:24 | 👍 0    🔁 0    💬 0    📌 0

Chinese Sentence Pattern Structure Treebank contains 5,016 sentences and 119,627 tokens from modern and ancient Chinese works annotated for lexical sense, syntactic structure and inter-clause relations bit.ly/4kZVGh3

23.06.2025 13:57 | 👍 0    🔁 0    💬 0    📌 0

LDC’s June newsletter has the latest on three new publications: Chinese Sentence Pattern Structure Treebank, IWSLT 2022-2023 Shared Task Training, Development and Test Set, and KAIROS Schema Learning Complex Event Annotation ldc-upenn.blogspot.com

17.06.2025 13:39 | 👍 0    🔁 0    💬 0    📌 0

BOLT CTS CALLFRIEND CALLHOME Mandarin Chinese Transcripts and Translations: transcripts and English translations for 93 hours of BOLT CTS telephone recordings; all speech was transcribed; 89% of the transcripts were translated bit.ly/4jKul2j

20.05.2025 13:29 | 👍 0    🔁 0    💬 0    📌 0

BOLT CTS CALLFRIEND CALLHOME Mandarin Chinese Audio: 93 hours of telephone speech from 236 conversations between native speakers; developed by LDC for the DARPA BOLT program; contains previously unexposed calls from the CF/CH collections bit.ly/4kbsBPy

19.05.2025 14:12 | 👍 0    🔁 0    💬 0    📌 0

Check out LDC’s May newsletter for two new companion releases developed by LDC to support the DARPA BOLT program: BOLT CTS CALLFRIEND CALLHOME Mandarin Chinese Audio and BOLT CTS CALLFRIEND CALLHOME Mandarin Chinese Transcripts and Translations ldc-upenn.blogspot.com

16.05.2025 14:07 | 👍 0    🔁 0    💬 0    📌 0

MATERIAL Kazakh-English Language Pack has 57 hours of Kazakh conversational telephone speech, transcripts, English translations, annotations and queries designed to support cross language information retrieval bit.ly/42cwe01

18.04.2025 13:53 | 👍 0    🔁 0    💬 0    📌 0

DEFT Spanish Light and Rich ERE Annotation: 158 Latin American discussion forum and Spanish newswire documents annotated for entities, relations and events, including coreference (light) and event hoppers (rich), developed by LDC for the DARPA DEFT program bit.ly/3YcGCnd

17.04.2025 14:39 | 👍 0    🔁 0    💬 0    📌 0

The April newsletter introduces LDC’s upgraded website, welcomes Bluesky to our social media channels and has the latest on LDC’s two new publications, DEFT Spanish Light and Rich ERE Annotation and MATERIAL Kazakh-English Language Pack ldc-upenn.blogspot.com

16.04.2025 14:27 | 👍 0    🔁 0    💬 0    📌 0
