Paper title ("A multi-dialectal dataset for German dialect ASR and dialect-to-standard speech translation") and a map of the German state of Bavaria showing where the Franconian, Bavarian, and Alemannic dialect groups are spoken
At #Interspeech2025 I'm going to present Betthupferl, a dataset for German dialect ASR & dialect-to-standard speech translation! We analyze differences between dialectal & Standard German transcriptions, benchmark ASR models, and examine shortcomings of current ASR models & evaluation metrics.
07.08.2025 08:46
I'll be at @icmlconf.bsky.social next week presenting NoLiMa!
Poster on Tue July 15, 4:30–7pm (E-2312).
Happy to grab a coffee and chat about long-context, memory, research, or just to catch up.
I'll be in Toronto for a couple of days after the conference, let me know if you're around!
09.07.2025 13:53
New paper: How does pretraining on programming languages + English shape LLMs' concept space?
Do LLMs use English or a programming language as a kind of pivot language?
Are neurons language-specific or shared across programming languages and English?
arxiv.org/abs/2506.01074
03.06.2025 17:22
Collapse of Dense Retrievers
Accepted to #ACL2025 main conference
In this paper we uncover major vulnerabilities in dense retrievers like Contriever, showing they favor:
Shorter docs
Early positions
Repeated entities
Literal matches
...all while ignoring the answer's presence!
17.05.2025 20:28
Beyond “noisy” text: How (and why) to process dialect data
Keynote talk at WNUT @ NAACL
@verenablaschke.bsky.social
Workshop on noisy and user-generated text (May 3)
The full workshop programme is here: noisy-text.github.io/2025/
bsky.app/profile/vere...
29.04.2025 15:03
Privacy-Preserving Federated Learning for Hate Speech Detection
We present a federated learning system with differential privacy and fine-tuned ALBERT models for low-resource hate speech detection.
Ivo Júnior, @htyeh1, Axel Wisiorek, @HinrichSchuetze
SRW - Long
29.04.2025 15:03
Linguistic Features in German BERT: The Role of Morphology, Syntax, and Semantics in Multi-Class Text Classification
Analysis of linguistic features used by German BERT in a classification task.
Henrike Beyer (University of Dundee), Diego Frassinelli
SRW - Short
29.04.2025 15:03
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
Recent studies indicate that leveraging off-the-shelf or fine-tuned retrievers, capable of retrieving relevant in-context examples tailored to the input query, enhances few-shot in-context learning of...
XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples
A simple yet effective method to retrieve cross-lingual few-shot examples for multilingual in-context learning
@lpq29743, @andre_t_martins, @HinrichSchuetze
arxiv.org/abs/2405.05116
Finding - Short
29.04.2025 15:03
A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
Recent studies have highlighted the potential of exploiting parallel corpora to enhance multilingual large language models, improving performance in both bilingual tasks, e.g., machine translation, an...
A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models
An investigation of the impact of parallel corpora, ... on the performance of multilingual LLMs.
@lpq29743, @andre_t_martins, @HinrichSchuetze
arxiv.org/abs/2407.00436
Finding - Long
29.04.2025 15:03
🥳 We are happy to share that CIS will be presenting 6 papers and talks at #NAACL2025!
Find out about each of them below in the 🧵
29.04.2025 15:03
On my way to #NAACL2025 where I'll give a keynote at the noisy text workshop (WNUT), presenting some of the challenges & methods for dialect NLP + also discussing dialect speakers' perspectives!
Beyond “noisy” text: How (and why) to process dialect data
Saturday, May 3, 9:30–10:30
29.04.2025 09:17
PhD student in NLP @MaiNLPlab, @CIS, @LMU
#nlp @ Sorbonne Université & CNRS - https://fyvo.github.io
Assistant Professor at @cs.ubc.ca and @vectorinstitute.ai working on Natural Language Processing. Book: https://lostinautomatictranslation.com/
The Official Bluesky Account of the Department of Language Science and Technology @uni-saarland.de
Imprint: https://www.uni-saarland.de/en/department/lst/about-us/contact.html
Homepage: https://www.lst.uni-saarland.de/
The largest workshop on analysing and interpreting neural networks for NLP.
BlackboxNLP will be held at EMNLP 2025 in Suzhou, China
blackboxnlp.github.io
Researchers on natural language processing / computational linguistics in the beautiful city of Augsburg, Germany (near Munich). Find out more about us on https://hlt-augsburg.github.io.
PhD student, NLP Researcher at @cislmu.bsky.social | Prev. Intern @Adobe.com
PhD student @ TU Munich, Human-centered AI, Computational Social Science
https://sxu3.github.io/
PhD student @ LMU Munich
I like dialects
#NLProc research group @itu.dk (Copenhagen, Denmark)
nlpnorth.github.io
ELLIS PhD student in NLP @MaiNLPlab, @CisLmu, @LMU_Muenchen
https://mckysse.github.io/
Postdoc – Aalborg University (CPH) 🇩🇰
#NLPxEducation #NLPxHR #NLP
Past:
🇩🇰 IT University of Copenhagen
🇨🇭 Swiss Federal Institute of Technology Lausanne
🇸🇬 National University of Singapore
🇩🇪 NEC
🇳🇱 University of Groningen
https://jjzha.github.io/
Research scientist @ sme.do working on ML for remote vital sign sensing. Human-centric and biomedical NLP in my academic past. Nuremberg. https://leonweber.me
Postdoc in NLP @milanlp.bsky.social (Milan) and @nlpnorth.bsky.social (Copenhagen) | affiliated @aicentre.dk | past @mainlp.bsky.social, Amazon Alexa
elisabassignana.github.io
Postdoc AI Researcher (NLP) @ ITU Copenhagen
https://mxij.me
PhD student @MaiNLP (Munich AI & NLP lab), @LMU.
Working on reasoning in large language models.
ELLIS PhD Student at MaiNLP
@ellis.eu @mainlp.bsky.social @munichcenterml.bsky.social
Semi-serious runner for Berlin Track Club and my sanity
PhD student @mainlp.bsky.social (@cislmu.bsky.social, LMU Munich). Interested in language variation & change, currently working on NLP for dialects and low-resource languages.
verenablaschke.github.io
Research Intern @Apple MLR • PhD Student @Uni Vienna • prev: @CisLMU, alumna @DAAD_Germany
#NLProc
asst. prof. of compling at university of pittsburgh
past:
postdoc @mainlp.bsky.social, LMU Munich
PhD in CompLing from Georgetown
x2 intern @Spotify @SpotifyResearch
https://janetlauyeung.github.io/