CIS, LMU Munich

@cislmu.bsky.social

Center for Information and Language Processing (CIS): NLP research group at LMU Munich led by Hinrich Schuetze and @barbaraplank.bsky.social

106 Followers  |  33 Following  |  7 Posts  |  Joined: 02.02.2025

Latest posts by cislmu.bsky.social on Bluesky

Paper title ("A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation") and a map of the German state Bavaria showing where the Franconian, Bavarian, and Alemannic dialect groups are spoken

At #Interspeech2025 I'm going to present Betthupferl, a dataset for German dialect ASR & dialect-to-standard speech translation! We analyze differences between dialectal & Standard German transcriptions, benchmark ASR models, and examine shortcomings of current ASR models & evaluation metrics.

07.08.2025 08:46 · 👍 16  🔁 4  💬 1  📌 0

I’ll be at @icmlconf.bsky.social next week presenting NoLiMa!
Poster on Tue July 15, 4:30–7pm (E-2312).

Happy to grab a coffee and chat about long-context, memory, research, or just to catch up.

I’ll be in Toronto for a couple of days after the conference, let me know if you’re around!

09.07.2025 13:53 · 👍 4  🔁 2  💬 1  📌 0

New paper: How does pretraining on programming languages + English shape LLMs' concept space?
🔍 Do LLMs use English or a programming language as a kind of pivot language?
🧠 Are neurons language-specific or shared across programming languages and English?
🔗 arxiv.org/abs/2506.01074

03.06.2025 17:22 · 👍 6  🔁 1  💬 1  📌 0

📄 Collapse of Dense Retrievers

Accepted to #ACL2025 main conference 🎉🎉

In this paper we uncover major vulnerabilities in dense retrievers like Contriever, showing they favor:
📌 Shorter docs
📌 Early positions
📌 Repeated entities
📌 Literal matches
...all while ignoring the answer's presence!

17.05.2025 20:28 · 👍 9  🔁 2  💬 1  📌 1

🗨️ Beyond “noisy” text: How (and why) to process dialect data
🔎 Keynote talk at WNUT @ NAACL
👥 @verenablaschke.bsky.social
📍 Workshop on noisy and user-generated text (May 3)
The full workshop programme is here: noisy-text.github.io/2025/
bsky.app/profile/vere...

29.04.2025 15:03 · 👍 2  🔁 1  💬 0  📌 0

📝 Privacy-Preserving Federated Learning for Hate Speech Detection
🔎 We present a federated learning system with differential privacy and fine-tuned ALBERT models for low-resource hate speech detection.
👥 Ivo Júnior, @htyeh1, Axel Wisiorek, @HinrichSchuetze
📍 SRW - Long

29.04.2025 15:03 · 👍 1  🔁 0  💬 1  📌 0

📝 Linguistic Features in German BERT: The Role of Morphology, Syntax, and Semantics in Multi-Class Text Classification
🔎 Analysis of linguistic features used by German BERT in a classification task.
👥 Henrike Beyer (University of Dundee), Diego Frassinelli
📍 SRW - Short

29.04.2025 15:03 · 👍 0  🔁 0  💬 1  📌 0

📝 XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
🔎 A simple yet effective method to retrieve cross-lingual few-shot examples for multilingual in-context learning
👥 @lpq29743, @andre_t_martins, @HinrichSchuetze
🔗 arxiv.org/abs/2405.05116
📍 Findings - Short

29.04.2025 15:03 · 👍 0  🔁 0  💬 1  📌 0

📝 Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum
🔎 We predict speech-to-text model performance on dialect continua with geostatistics.
👥 Ryan Soh-Eun Shim, Barbara Plank
🔗 arxiv.org/abs/2410.14589
📍 Findings - Long

29.04.2025 15:03 · 👍 0  🔁 0  💬 1  📌 0

📝 A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
🔎 An investigation of the impact of parallel corpora, ... on the performance of multilingual LLMs.
👥 @lpq29743, @andre_t_martins, @HinrichSchuetze
🔗 arxiv.org/abs/2407.00436
📍 Findings - Long

29.04.2025 15:03 · 👍 1  🔁 0  💬 1  📌 0

🥳 We are happy to share that CIS will be presenting 6 papers and talks at #NAACL2025!
Find out about each of them below in the 🧵

29.04.2025 15:03 · 👍 9  🔁 0  💬 1  📌 1

On my way to #NAACL2025 where I'll give a keynote at the noisy text workshop (WNUT), presenting some of the challenges & methods for dialect NLP + also discussing dialect speakers' perspectives!

🗨️ Beyond “noisy” text: How (and why) to process dialect data
🗓️ Saturday, May 3, 9:30–10:30

29.04.2025 09:17 · 👍 27  🔁 7  💬 1  📌 1
