CIS, LMU Munich @cislmu - Bluesky Profile

Paper title ("A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation") and a map of the German state of Bavaria showing where the Franconian, Bavarian, and Alemannic dialect groups are spoken

At #Interspeech2025 I'm going to present Betthupferl, a dataset for German dialect ASR & dialect-to-standard speech translation! We analyze differences between dialectal & Standard German transcriptions, benchmark ASR models, and examine shortcomings of current ASR models & evaluation metrics.

07.08.2025 08:46 — 👍 16 🔁 4 💬 1 📌 1

I’ll be at @icmlconf.bsky.social next week presenting NoLiMa!
Poster on Tue July 15, 4:30–7pm (E-2312).

Happy to grab a coffee and chat about long-context, memory, research, or just to catch up.

I’ll be in Toronto for a couple of days after the conference, let me know if you’re around!

09.07.2025 13:53 — 👍 4 🔁 2 💬 1 📌 0

New paper: How does pretraining on programming languages + English shape LLMs' concept space?
🔍 Do LLMs use English or a programming language as a kind of pivot language?
🧠 Are neurons language-specific or shared across programming languages and English?
🔗 arxiv.org/abs/2506.01074

03.06.2025 17:22 — 👍 6 🔁 1 💬 1 📌 0

📄 Collapse of Dense Retrievers

Accepted to #ACL2025 main conference 🎉🎉

In this paper we uncover major vulnerabilities in dense retrievers like Contriever, showing they favor:
📌 Shorter docs
📌 Early positions
📌 Repeated entities
📌 Literal matches
...all while ignoring the answer's presence!

17.05.2025 20:28 — 👍 9 🔁 2 💬 1 📌 1

🗨️ Beyond “noisy” text: How (and why) to process dialect data
🔎 Keynote talk at WNUT @ NAACL
👥 @verenablaschke.bsky.social
📁 Workshop on noisy and user-generated text (May 3)
The full workshop programme is here: noisy-text.github.io/2025/
bsky.app/profile/vere...

29.04.2025 15:03 — 👍 2 🔁 1 💬 0 📌 0

📝 Privacy-Preserving Federated Learning for Hate Speech Detection
🔎 We present a federated learning system with differential privacy and fine-tuned ALBERT models for low-resource hate speech detection.
👥 Ivo Júnior, @htyeh1, Axel Wisiorek, @HinrichSchuetze
📁 SRW - Long

29.04.2025 15:03 — 👍 1 🔁 0 💬 1 📌 0

📝 Linguistic Features in German BERT: The Role of Morphology, Syntax, and Semantics in Multi-Class Text Classification
🔎 Analysis of linguistic features used by German BERT in a classification task.
👥 Henrike Beyer (University of Dundee), Diego Frassinelli
📁 SRW - Short

29.04.2025 15:03 — 👍 0 🔁 0 💬 1 📌 0

XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
Recent studies indicate that leveraging off-the-shelf or fine-tuned retrievers, capable of retrieving relevant in-context examples tailored to the input query, enhances few-shot in-context learning of...

📝 XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
🔎 A simple yet effective method to retrieve cross-lingual few-shot examples for multilingual in-context learning.
👥 @lpq29743, @andre_t_martins, @HinrichSchuetze
🔗 arxiv.org/abs/2405.05116
📁 Findings - Short

29.04.2025 15:03 — 👍 0 🔁 0 💬 1 📌 0

Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum
There is increasing interest in looking at dialects in NLP. However, most work to date still treats dialects as discrete categories. For instance, evaluative work in variation-oriented NLP for English...

📝 Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum
🔎 We predict speech-to-text model performance on dialect continua with geostatistics.
👥 Ryan Soh-Eun Shim, Barbara Plank
🔗 arxiv.org/abs/2410.14589
📁 Findings - Long

29.04.2025 15:03 — 👍 0 🔁 0 💬 1 📌 0

A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
Recent studies have highlighted the potential of exploiting parallel corpora to enhance multilingual large language models, improving performance in both bilingual tasks, e.g., machine translation, an...

📝 A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
🔎 An investigation of the impact of parallel corpora, ... on the performance of multilingual LLMs.
👥 @lpq29743, @andre_t_martins, @HinrichSchuetze
🔗 arxiv.org/abs/2407.00436
📁 Findings - Long

29.04.2025 15:03 — 👍 1 🔁 0 💬 1 📌 0

🥳 We are happy to share that CIS will be presenting 6 papers and talks at #NAACL2025!
Find out about each of them below in the 🧵

29.04.2025 15:03 — 👍 10 🔁 0 💬 1 📌 1

On my way to #NAACL2025 where I'll give a keynote at the noisy text workshop (WNUT), presenting some of the challenges & methods for dialect NLP + also discussing dialect speakers' perspectives!

🗨️ Beyond “noisy” text: How (and why) to process dialect data
🗓️ Saturday, May 3, 9:30–10:30

29.04.2025 09:17 — 👍 27 🔁 7 💬 1 📌 1

