Myrthe Reuver

@myrthereuver.bsky.social

PhD #NLProc from CLTL, Vrije Universiteit Amsterdam || Interests: Computational Argumentation, Responsible AI, interdisciplinarity, cats || I express my own views

2,120 Followers  |  882 Following  |  59 Posts  |  Joined: 07.11.2023

Latest posts by myrthereuver.bsky.social on Bluesky

For folks considering grad school in ML, my advice is to explore programs that mix ML with a domain interest. ML programs are wildly oversubscribed while a lot of the fun right now is in figuring out what you can do with it

25.09.2025 03:25 - 👍 153    🔁 17    💬 8    📌 7

So, what *is* the @ecir2026.eu Information Retrieval for Good track? by Maria Heuss and Bhaskar Mitra:

https://bhaskar-mitra.github.io/posts/2025/09/01/what-is-ir-for-good/

23.09.2025 05:52 - 👍 5    🔁 2    💬 0    📌 0

Super important paper and what a nice interdisciplinary group of co-authors!!! 😍

12.09.2025 12:31 - 👍 5    🔁 0    💬 0    📌 0
We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation".
We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks.
For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations.
Then, we collect 13 million LLM annotations across plausible LLM configurations.
These annotations feed into 1.4 million regressions testing the hypotheses.
For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions.
Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking -- incorrect conclusions due to annotation errors.
Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models.
Since minor configuration changes can flip scientific conclusions, from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.

🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

12.09.2025 10:33 - 👍 268    🔁 96    💬 6    📌 21

Curious about my PhD research?
▶️ Watch a 10-min talk + my defense: lnkd.in/ej_MWDtt
📘 Read the dissertation: lnkd.in/efBW97WB
📰 Or read the short news article: lnkd.in/eizZg5VN

09.09.2025 17:56 - 👍 1    🔁 0    💬 0    📌 0

Amazing co-authors broadened my perspective and made me a better scientist. Thank you so much for that! 🙏

Also to my doctoral committee: @damiantrilling.net, Annette Hautli-Janisz, Reshmi G Pillai, @Khalid Al Khatib & Antal van den Bosch: thank you for your thoughtful (and fun!) questions.

09.09.2025 17:56 - 👍 1    🔁 0    💬 1    📌 0

And huge thanks to my incredible paranymphs @urjakh.bsky.social and Selene Baez Santamaria 👯‍♀️. From Zoom rooms to the stage, our journey has been full of growth, laughter, and mutual support. ❤️

In fact, all PhDs from @cltl.bsky.social were a great community of support. 💖

09.09.2025 17:56 - 👍 1    🔁 0    💬 1    📌 0

Last week, I defended my dissertation "A Puzzle of Perspectives: Interdisciplinary Language Technology for Responsible News Recommendation" at the Vrije Universiteit Amsterdam. *the* moment: #PhDone! 🎓✨🎉

I couldn't have asked for better supervisors than Antske Fokkens & @suzanv.bsky.social 💖

09.09.2025 17:56 - 👍 21    🔁 1    💬 1    📌 0

It's the final countdown 🎶🎤 (I am re-reading my dissertation for my defense next week), and actually I realized I had some fun findings hidden in some papers that I myself forgot about! 😂 I don't know if that's a good or bad sign for my defense.. 😂

28.08.2025 21:20 - 👍 8    🔁 0    💬 0    📌 0

But then working as a (university) researcher also comes with a lot of downsides, including insecurity and pressure in random "which grant or paper wins" arenas which I do not vibe well with.

But what then? What do?

01.07.2025 15:32 - 👍 1    🔁 0    💬 0    📌 0

Btw I'm serious about this career change comment.

I'm having a sort of post-PhD career reflection where I realize that these kinds of things don't spark joy for me but seem to be a big part of being an AI dev in industry.

01.07.2025 15:31 - 👍 0    🔁 0    💬 1    📌 0

I mean, I have heard people say they enjoy the puzzling aspect and feeling accomplished when they fix it.

Personally, for me that never weighs up against the annoyance and what feels like endless wasted time.

01.07.2025 13:06 - 👍 1    🔁 0    💬 0    📌 0

Also, I realize some people really love the "puzzle" aspect but I don't like these kinds of puzzles. It makes me stressed and annoyed. Maybe I should find another field to work in. 😛

01.07.2025 12:31 - 👍 1    🔁 0    💬 2    📌 0

I also really hate it when people who do not work in NLP/LLMs then say "oh no but with conda and a requirements.txt it's easy, right?", not realizing the morass of ever-new models and architectures I live in.

01.07.2025 12:30 - 👍 2    🔁 0    💬 1    📌 0

Realization: I really, really, really hate the part of my job that is managing conda environments and going through a deep deep cave of issue reports trying to find why something randomly doesn't work.

01.07.2025 12:28 - 👍 4    🔁 0    💬 3    📌 0

Chatbots (LLMs) do not know facts and are not designed to be able to accurately answer factual questions. They are designed to find and mimic patterns of words, probabilistically. When they're "right" it's because correct things are often written down, so those patterns are frequent. That's all.

19.06.2025 11:21 - 👍 36983    🔁 11418    💬 640    📌 967

Deadline approaching! Workshop on Computational Linguistics for the Political and Social Sciences #KONVENS2025: archival long and short papers (ACL Anthology) & non-archival abstracts and PhD project descriptions (get feedback from a great community!). Deadline: June 13th.

01.06.2025 16:07 - 👍 4    🔁 2    💬 0    📌 2

Yay, so happy to host CLIN in Leuven this year! It'll take place on September 12th. Abstract submission deadline on June 13th!

02.06.2025 07:09 - 👍 3    🔁 1    💬 0    📌 0

My love language is sending my academic friends the papers/datasets/posts on social media that I know align with their research interest. 💖

13.05.2025 11:35 - 👍 3    🔁 0    💬 0    📌 0
GESIS Workshop
Adapters: Lightweight Machine Learning for Social Science Research
02 to 04 June 2025 | Hybrid (Cologne | Online)
Julia Romberg, Vigneshwaran Shankaran, Maximilian Maurer (all GESIS)

Unlock the power of large language models for your research!
Join this #GESISworkshop with Julia Romberg, @vigneshwaran-s.bsky.social, and @mmmaurer.bsky.social to explore adapters, an efficient alternative to fine-tuning your models.

🔗 Book now ➡️ t1p.de/adapters-lig...

@gesis.org

07.05.2025 13:32 - 👍 5    🔁 4    💬 0    📌 2

While I am not at #NAACL, I gave a talk about this paper (and more work in my dissertation) last Friday at @annarogers.bsky.social's lab, very nice discussion there! 😃

Paper: lnkd.in/eBBSi6_p
Code: lnkd.in/ezwRGpjP
Slides: lnkd.in/erPP5fpV

Want to know more? Message me!

29.04.2025 11:40 - 👍 0    🔁 0    💬 0    📌 0

💡 We find that:
- Experts use **different strategies** to assess the LLM;
- Surprisingly, **longer and more nuanced definitions of sexism** are developed via LLM-human collaboration;
- Some experts improve zero-shot performance with their improved definition.

#NLProc #CSS #computationalsocialscience

29.04.2025 11:40 - 👍 1    🔁 0    💬 1    📌 0

Our study consisted of four components:

1) a survey of sexism researchers;
2) an interactive experiment on expert-LLM interaction: assessing the LLM;
3) a second interactive experiment: co-creating sexism definitions with the LLM;
4) using these definitions in zero-shot detection with LLMs on five sexism datasets. 👩‍🔬 + 🤖

29.04.2025 11:40 - 👍 0    🔁 0    💬 1    📌 0

This work was the outcome of my Junior Research Visit grant at @gesis.org last year, and is the final chapter of my dissertation! 🤩

Our method allowed us to measure connections between experts, sexism definition, dataset, & classification performance in zero-shot sexism classification. 🔍🔬

29.04.2025 11:40 - 👍 0    🔁 0    💬 1    📌 0
A visual description of how our expert survey led to two interactive experiments and finally to definitions that were used in zero-shot sexism detection.

Expert + LLM = Better Sexism Detection? ✨

Paper:
Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection

w: @indiiigo.bsky.social, @matteo-mls.bsky.social & @gabriellalapesa.bsky.social

@ Findings #NAACL2025 ! 🤩

29.04.2025 11:40 - 👍 10    🔁 3    💬 1    📌 2

Oh it is super common in Amsterdam! I see it all the time.

And even in Mexico I have seen it, so it is definitely a worldwide phenomenon, an international vibe working trend.

28.04.2025 08:21 - 👍 1    🔁 0    💬 0    📌 0

I am now doing a lot of stuff locally with M1 on the Mac, and while it is an interesting challenge, it also has very obvious limitations. 😅

13.03.2025 08:52 - 👍 1    🔁 0    💬 1    📌 0
Call For Papers The 9th Workshop on Online Abuse and Harms (WOAH) at ACL 2025.

🚨 Deadline Extended! 🚨

We've extended the submission deadline to Friday, April 18, 2025 (AoE)!

Please share widely!

www.workshopononlineabuse.com/cfp.html

01.03.2025 08:37 - 👍 13    🔁 7    💬 0    📌 0

ACL Rolling Review and the EMNLP PCs are seeking input on the current state of reviewing for *CL conferences. We would love to get your feedback on the current process and how it could be improved. To contribute your ideas and opinions, please follow this link! forms.office.com/r/P68uvwXYqfemn

27.02.2025 17:01 - 👍 11    🔁 13    💬 1    📌 0

Join us at the VU Amsterdam's Master's Event, Saturday, March 8, 10:30-15:00!

Learn about our two Master's in Linguistics programs from faculty and students: Language and AI (1 year) and Human Language Technology (2 years).

Programs: home.cltl.labs.vu.nl
Location & details: vu.nl/en/education...

27.02.2025 16:08 - 👍 3    🔁 3    💬 1    📌 0