Anna Wegmann

@annawegmann.bsky.social

Postdoctoral Researcher at Utrecht University | Including different styles in NLP | she/her https://annawegmann.github.io/

875 Followers  |  411 Following  |  46 Posts  |  Joined: 18.11.2024

Latest posts by annawegmann.bsky.social on Bluesky

aclanthology.org/2025.finding...

07.11.2025 10:19 | 👍 0  🔁 0  💬 0  📌 0

If you're attending #EMNLP2025, we'll be presenting virtually in Gather Session 1 on Nov 5 at 4pm PT. Come say hello!

w/ the wonderful:
@mellymeldubs.bsky.social
Anna Preus,
@mariaa.bsky.social

Paper: arxiv.org/abs/2510.16713
Code/Data: github.com/darthbhyrava/wisp
Dash: poetry.darthbhyrava.com

31.10.2025 15:36 | 👍 8  🔁 1  💬 1  📌 0

What if a single model could recognize an author's writing style no matter what language they wrote in? 🌍✍️ Our new #EMNLP2025 paper explores multilingual authorship representation, showing how training across 36 languages can sharpen stylistic signals and reduce topic bias.
👇🧵

06.11.2025 05:42 | 👍 18  🔁 2  💬 1  📌 0
Preview: We Need to Measure Data Diversity in NLP – Better and Broader. Dong Nguyen, Esther Ploeger. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.

New opinion paper out with Esther Ploeger (Aalborg University): "We Need to Measure Data Diversity in NLP – Better and Broader" at #EMNLP2025 (main) aclanthology.org/2025.emnlp-m...

04.11.2025 15:43 | 👍 13  🔁 4  💬 1  📌 0

aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.finding...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.finding...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...
aclanthology.org/2025.emnlp-m...

04.11.2025 13:42 | 👍 0  🔁 0  💬 1  📌 0

Lots of exciting work on linguistic style this year at #EMNLP2025 #EMNLP! Including work on machine-text detection, authorship representation and more.

🧵 with anthology links below
📣 with an open call to everyone to add style work that's missing

04.11.2025 13:42 | 👍 9  🔁 1  💬 1  📌 0

I have a new blog post about the so-called “tokenizer-free” approach to language modeling and why it's not tokenizer-free at all. I also talk about why people hate tokenizers so much!

25.09.2025 15:14 | 👍 60  🔁 15  💬 5  📌 2

I successfully defended my PhD in Dutch fashion and acquired a PhD certificate in Latin. Thank you to the amazing people that got me here, a.o. @dongng.bsky.social and the ones I blur here.

22.10.2025 14:20 | 👍 33  🔁 2  💬 1  📌 1

Come join next Wednesday if you want to rant about society's love-hate relationship with LLMs!

16.10.2025 09:32 | 👍 13  🔁 7  💬 0  📌 0

one of the other entrances was closed off yesterday, increasing my commute from front door to office by another 10 minutes

20.08.2025 06:29 | 👍 0  🔁 0  💬 0  📌 0

Is this the Dutch budget cuts or does Utrecht uni really not want me to come to the office? My highlight is the door that has been broken for weeks, with the only change being a laminated piece of paper saying I should enter the uni maze through two other buildings.

20.08.2025 06:28 | 👍 0  🔁 0  💬 1  📌 0

No trains are running between Mönchengladbach and Venlo. The timetable is being maintained by a bus. The bus code-switches: a monitor that reads „de bus hält“ ("the bus stops", Dutch mixed with German).

01.08.2025 08:48 | 👍 18  🔁 3  💬 0  📌 0

work with and by @yupeidu.bsky.social

05.08.2025 15:40 | 👍 1  🔁 0  💬 0  📌 0
Preview: Tokenization is Sensitive to Language Variation. Variation in language is ubiquitous and often systematically linked to regional, social, and contextual factors. Tokenizers split texts into smaller units and might behave differently for less common ...

Tokenization is Sensitive to Language Variation arxiv.org/abs/2502.15343

05.08.2025 15:37 | 👍 0  🔁 0  💬 0  📌 0

Disentangling the Roles of Representation and Selection in Data Pruning. arxiv.org/abs/2507.03648

On Support Samples of Next Word Prediction. arxiv.org/abs/2506.04047

05.08.2025 15:37 | 👍 1  🔁 0  💬 2  📌 0

VAQUUM: Are Vague Quantifiers Grounded in Visual Data? arxiv.org/pdf/2502.11874

05.08.2025 15:37 | 👍 0  🔁 0  💬 1  📌 0

Utrecht is back from #ACL2025! We had a blast.

I should have posted this before but here are some papers from people in our group that were presented at ACL.

05.08.2025 15:37 | 👍 3  🔁 0  💬 1  📌 0

I'm sadly not at #ACL2025, but the work on tokenization seems to continue to explode. Here are the tokenization-related papers I could find, in no particular order. Let me know if I missed any.

30.07.2025 14:03 | 👍 11  🔁 4  💬 2  📌 0

Since people at #ACL2025 are very interested in tokenization, a reminder to join the discussion on Discord set up by @mcognetta.bsky.social

29.07.2025 12:52 | 👍 9  🔁 2  💬 0  📌 0

Anyone tried the kiss the cook lunch place at #ACL2025?

28.07.2025 12:57 | 👍 0  🔁 0  💬 0  📌 0

I think accepted

28.07.2025 12:56 | 👍 1  🔁 0  💬 0  📌 0

I will present our #ACL2025 paper Tokenization is Sensitive to Language Variation in the poster session after Tuesday's keynote, 10.30 - 12.00 in Hall 4/5

28.07.2025 05:43 | 👍 10  🔁 0  💬 0  📌 0

@philipwitti.bsky.social will be presenting our paper "Tokenisation is NP-Complete" at #ACL2025 😁 Come to the language modelling 2 session (Wednesday morning, 9h~10h30) to learn more about how challenging tokenisation can be!

27.07.2025 09:41 | 👍 7  🔁 3  💬 0  📌 0

We are presenting this paper at #ACL2025 😁 Find us at poster session 4 (Wednesday morning, 11h~12h30) to learn more about tokenisation bias!

27.07.2025 11:59 | 👍 11  🔁 2  💬 0  📌 0

I'm at #ACL2025 this week.

Happy to chat about measuring linguistic style, data diversity, creating synthetic data for analyzing (L)LMs, authorship attribution, paraphrases and tokenizers.

Let's chat if you're around.

27.07.2025 17:30 | 👍 1  🔁 0  💬 0  📌 0

The #ACL2025 #ACL2025NLP feed is up and running! It matches both hashtags and any posts from or mentions of @aclmeeting.bsky.social

Pin it to your home 📌 and enjoy!

bsky.app/profile/did:...

17.07.2025 11:15 | 👍 48  🔁 14  💬 2  📌 0

Who's presenting on subjectivity in annotation (human label variation, learning from disagreement, perspectivism) at #ACL2025?

papers by e.g. @liweijiang.bsky.social @tiancheng.bsky.social @gabriellalapesa.bsky.social @romanklinger.de

keynote @verenarieser.bsky.social

link to full list below ⤵️

24.07.2025 16:57 | 👍 12  🔁 2  💬 2  📌 0

I love it.

24.07.2025 15:48 | 👍 2  🔁 0  💬 0  📌 0

I'll be attending ACL 2025 in Vienna! Looking forward to seeing people there! 😊🇦🇹 We are going to present 'LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks' aclanthology.org/2025.acl-sho... #acl2025 #acl2025nlp

24.07.2025 12:34 | 👍 10  🔁 2  💬 0  📌 0

My fault for liking to prepare early in our last-minute, deadline-driven community ...

22.07.2025 12:34 | 👍 2  🔁 0  💬 0  📌 0