
Dirk Hovy

@dirkhovy.bsky.social

Professor @milanlp.bsky.social for #NLProc, compsocsci, #ML Also at http://dirkhovy.com/

605 Followers  |  324 Following  |  46 Posts  |  Joined: 29.11.2024

Latest posts by dirkhovy.bsky.social on Bluesky


Table titled "Taxonomy for evaluation of AI in mental health applications," organized into columns for quality criteria (validity and reliability) and real-world use (implementation and maintenance). Rows distinguish support types: assessment, intervention, and information synthesis. Each cell lists detailed evaluation questions, such as construct and criterion validity, consistency across populations and time, feasibility, effectiveness, usability, acceptability, safety, and unintended consequences, providing a structured framework for assessing AI systems in mental health contexts.


🔎🧩 Beyond Benchmarks: How to Evaluate Mental Health AI Responsibly
AI for mental health is a high-stakes area: its evaluation needs to meet the highest expectations.

The new preprint "Responsible Evaluation of AI for Mental Health", written by an interdisciplinary team spanning AI [...]

19.02.2026 09:46 — 👍 3    🔁 3    💬 1    📌 0

Honored to give my first keynote at #IRCDL2026 on February 19th.

I'll talk about how LLMs have shifted from productivity tools to everyday sources of info & personal guidance, and what that means for risk, trust, bias, and alignment.

ircdl2026.unimore.it

17.02.2026 10:22 — 👍 14    🔁 2    💬 0    📌 0
The image displays the words "Politics & Gender" in yellow text on a green background, with the hashtag "#OpenAccess" below it. A vertical yellow stripe is on the left side.


#OpenAccess from @politicsgenderj.bsky.social -

Male Agency? Analyzing Fatherhood Roles in Swedish Parliamentary Documents, 1993–2021 - https://cup.org/40el36q

- Lena Wängnerud, Elin Naurin, @dirkhovy.bsky.social, Lorenzo Lupo & Oscar Magnusson

#FirstView

17.02.2026 05:20 — 👍 6    🔁 2    💬 0    📌 0

#MemoryModay #NLProc
@gattanasio.cc et al. ask 'Is It Worth the (Environmental) Cost?', analyzing continuous training for language models and weighing its benefits against its environmental impact for responsible use. #AI #Sustainability

arxiv.org/pdf/2210.07365

26.01.2026 17:10 — 👍 7    🔁 3    💬 0    📌 1
Tutorials and Resources – CSS @ IP-Paris. Website of the computational social science group at CREST-CNRS. Courses and tutorials for analyzing digital data in the social sciences.

What are the main issues discussed in a set of documents?

We've just released a step-by-step BERTopic tutorial.

We've also launched a new page gathering various NLP tutorials for social scientists.
👉 www.css.cnrs.fr/tutorials-an...
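If you just want a feel for what the library does before working through the tutorial, here is a minimal sketch (not taken from the tutorial; `docs` is a placeholder for your own list of document strings):

```python
# Minimal BERTopic sketch: fit a topic model on raw documents and inspect the topics.
from bertopic import BERTopic

docs = ["..."]  # placeholder: replace with your own list of document strings

topic_model = BERTopic(language="multilingual")  # drop this argument for English-only corpora
topics, probs = topic_model.fit_transform(docs)

# One row per discovered topic: id, size, and the top keywords that characterize it
print(topic_model.get_topic_info())
```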

27.01.2026 15:16 — 👍 48    🔁 21    💬 3    📌 4

Citation is the foundation of academic promotion. It's noisy, sure, but its integrity is worth fighting for. Hallucinated citations should be a desk reject.

22.01.2026 01:16 — 👍 27    🔁 5    💬 1    📌 0
CSE 598-004 - Building Small Language Models

The second new class I'm teaching is a very experimental graduate-level seminar in CSE: "Building Small Language Models". I taught the grad-level NLP class last semester (so fun!), but students wanted more: which of these new ideas work, and which work for SLMs? jurgens.people.si.umich.edu/CSE598-004/

19.01.2026 21:29 — 👍 32    🔁 9    💬 2    📌 1

🎉 MilaNLP 2025 Wrapped 🎉
Lots of learning, building, sharing, and growing together 🌱

#NLProc

20.01.2026 11:15 — 👍 10    🔁 4    💬 0    📌 0

Found and added under data/

20.01.2026 11:21 — 👍 5    🔁 2    💬 0    📌 0

I included some test cases on GitHub; I'll check whether I still have the ones we used in the paper.

20.01.2026 11:11 — 👍 4    🔁 2    💬 0    📌 0

โณ Deadline approaching! Weโ€™re hiring 2 fully funded postdocs in #NLP.

Join the MilaNLP team and contribute to our upcoming research projects (SALMON & TOLD)

🔗 Details + how to apply: milanlproc.github.io/open_positio...

โฐ Deadline: Jan 31, 2026

19.01.2026 17:24 — 👍 11    🔁 10    💬 0    📌 1

If you are curious about the theoretical background, see

Hovy, D., Berg-Kirkpatrick, T., Vaswani, A., & Hovy, E. (2013). Learning Whom to Trust With MACE. In: Proceedings of NAACL-HLT. ACL.

aclanthology.org/N13-1132.pdf

And for even more details:

aclanthology.org/Q18-1040.pdf

N/N

20.01.2026 10:20 — 👍 8    🔁 2    💬 1    📌 0

I always wanted to revisit it, port it from Java to Python, and extend it to continuous data, but never found the time.
Last week, I played around with Cursor and got it all done in ~1 hour. 🤯

If you work with any response data that needs aggregation, give it a try, and let me know what you think!

4/N

20.01.2026 10:17 — 👍 12    🔁 2    💬 1    📌 0

MACE estimates:
1. Annotator reliability (who's consistent?)
2. Item difficulty (which examples spark disagreement?)
3. The most likely aggregate label (the latent "best guess")

That "side project" ended up powering hundreds of annotation projects over the years.
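As a rough, hypothetical sketch of the underlying idea (reliable annotators tend to reproduce the latent true label, unreliable ones answer closer to random), a toy EM loop could look like the code below. This is an illustration only, not MACE's actual model, code, or API; those are in the paper and the dirkhovy/MACE repository.

```python
# Toy sketch of the idea only, assuming categorical labels.
import numpy as np

def aggregate(labels, n_classes, n_iter=50):
    """labels: (n_items, n_annotators) int array; -1 marks a missing answer."""
    n_items, n_annotators = labels.shape
    competence = np.full(n_annotators, 0.8)                      # initial reliability guess
    posteriors = np.full((n_items, n_classes), 1.0 / n_classes)

    for _ in range(n_iter):
        # E-step: posterior over each item's latent label, given current competences.
        log_post = np.zeros((n_items, n_classes))
        for j in range(n_annotators):
            for i in range(n_items):
                y = labels[i, j]
                if y < 0:
                    continue
                for c in range(n_classes):
                    # a reliable annotator copies the true label, otherwise guesses uniformly
                    p = competence[j] * (y == c) + (1 - competence[j]) / n_classes
                    log_post[i, c] += np.log(p)
        posteriors = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        posteriors /= posteriors.sum(axis=1, keepdims=True)

        # M-step: competence = expected agreement with the currently inferred labels.
        for j in range(n_annotators):
            seen = labels[:, j] >= 0
            if seen.any():
                agree = posteriors[np.where(seen)[0], labels[seen, j]].mean()
                competence[j] = np.clip(agree, 1e-6, 1 - 1e-6)

    best = posteriors.argmax(axis=1)           # 3. aggregate label per item
    difficulty = 1 - posteriors.max(axis=1)    # 2. rough disagreement/difficulty score
    return competence, best, difficulty        # 1. competence ~ annotator reliability
```

Here `competence`, `difficulty`, and `best` stand in for the three estimates above; MACE's actual model additionally learns how each annotator behaves when not answering reliably.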

3/N

20.01.2026 10:15 — 👍 10    🔁 2    💬 1    📌 0

However, disagreement isn't just noise: it's information. It can mean an item is genuinely hard, or that someone wasn't paying attention. If only you knew whom to trust…

That summer, Taylor Berg-Kirkpatrick, Ashish Vaswani, and I built MACE (Multi-Annotator Competence Estimation).

2/N

20.01.2026 10:14 — 👍 13    🔁 2    💬 1    📌 0
GitHub - dirkhovy/MACE: Multi-Annotator Competence Estimation tool

🚨 (Software) Update:

In my PhD, I had a side project to fix an annoying problem: when you ask 5 people to label the same thing, you often get different answers. But in ML (and lots of other analyses), you still need a single aggregated answer. Using the majority vote is easy, but often wrong.
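As a tiny, hypothetical illustration of that failure mode (made-up labels, not data from the project):

```python
# Hypothetical example: majority voting over 5 annotators on one item.
# If careless annotators happen to outnumber the careful ones, their
# guesses win the vote, which is exactly the problem described above.
from collections import Counter

votes = ["A", "A", "B", "B", "B"]  # made-up labels; the two "A"s come from the careful annotators

label, count = Counter(votes).most_common(1)[0]
print(label, count)  # prints: B 3
```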

1/N

20.01.2026 10:12 — 👍 74    🔁 13    💬 6    📌 0
Postdoctoral Researcher – NLP (2 positions) | MilaNLP Lab @ Bocconi University. Two Postdoctoral Researcher positions – Deadline January 31st, 2026

New year, new job? If that is your current mantra, check the open postdoc positions with Debora Nozza and me at our lab. Deadline is January 31st.

milanlproc.github.io/open_positio...

19.01.2026 16:13 — 👍 11    🔁 10    💬 0    📌 1

🚀 We're opening 2 fully funded postdoc positions in #NLP!

Join the MilaNLP team and contribute to our upcoming research projects.

🔗 More details: milanlproc.github.io/open_positio...

โฐ Deadline: Jan 31, 2026

18.12.2025 15:29 — 👍 19    🔁 13    💬 0    📌 2

Happy to have contributed to this

23.12.2025 13:55 — 👍 3    🔁 0    💬 0    📌 0
Countering Hateful and Offensive Speech Online - Open Challenges. Flor Miriam Plaza-del-Arco, Debora Nozza, Marco Guerini, Jeffrey Sorensen, Marcos Zampieri. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts. …

#MemoryModay #NLProc 'Countering Hateful and Offensive Speech Online - Open Challenges' by Plaza-del-Arco, @debora_nozza, Guerini, Sorensen, Zampieri (2024) is a tutorial on the challenges and solutions for detecting and mitigating hate speech.

22.12.2025 16:03 — 👍 4    🔁 2    💬 0    📌 0

#MemoryModay #NLProc Uma, A. N. et al. examine AI model training in 'Learning from Disagreement: A Survey'. Disagreement-handling methods' performance is shaped by evaluation methods & dataset traits.

15.12.2025 16:02 — 👍 4    🔁 2    💬 0    📌 0
Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large... Training large language models to follow instructions makes them perform better on a wide range of tasks and generally become more helpful. However, a perfectly helpful model will follow even the most...

#TBT #NLProc #MachineLearning #SafetyFirst 'Safety-Tuned LLaMAs: Improving LLM Safety' by Bianchi et al. explores training LLMs for safe refusals and warns against over-tuning.

18.12.2025 16:02 — 👍 4    🔁 2    💬 0    📌 0

Come work with @deboranozza.bsky.social, me, and the lab in Milan!

19.12.2025 10:58 — 👍 6    🔁 3    💬 0    📌 0

We don't actually trust AI.
We trust the companies behind it.

As Maria Antoniak notes, every "private" chat flows through corporate systems with long histories of data misuse. If we care about AI ethics, we need to name power, not anthropomorphize models.

15.12.2025 17:04 — 👍 54    🔁 13    💬 1    📌 5
Research Intern - Computational Social Science | Microsoft Careers Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world's best researchers, Research Interns learn, collaborate, and network for life. Researc...

We're hiring interns in the Computational Social Science group at Microsoft Research NYC!

If you're interested in designing AI-based systems and understanding their impact at both individual and societal scales, apply here by Jan 9, 2026: apply.careers.microsoft.com/careers/job/...

15.12.2025 16:33 — 👍 21    🔁 18    💬 0    📌 0
How to Write Gooder | Dirk Hovy. After publishing "How to professor", several people said they found it helpful and asked whether I had a similar post on writing. Luckily, we have held an annual writing workshop in the lab for the last few years, so there already was a presentation.

After I shared "How to professor" last year, some people asked for a similar post on writing. Now I finally got around to typing up our lab's writing workshop slides.
It covers basic advice for research papers and grant applications.
Curious? Read it here: dirkhovy.com/post/2025_11...

12.12.2025 11:49 — 👍 12    🔁 3    💬 1    📌 0
Respectful or Toxic? Using Zero-Shot Learning with Language Models to Detect Hate Speech. Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy. The 7th Workshop on Online Abuse and Harms (WOAH). 2023.

#TBT #NLProc 'Respectful or Toxic?' by Plaza-del-Arco, @debora & @dirkhovy.bsky.social (2023) explores zero-shot learning for multilingual hate speech detection and highlights the importance of prompt & model choice for accuracy. #AI #LanguageModels #HateSpeechDetection

11.12.2025 16:03 — 👍 2    🔁 2    💬 0    📌 0

#MemoryModay #NLProc 'Leveraging Social Interactions to Detect Misinformation on Social Media' by Fornaciari et al. (2023) uses combined text and network analysis to spot unreliable threads.

08.12.2025 16:03 — 👍 3    🔁 2    💬 0    📌 0

The Center for Information Technology Policy at Princeton invites applications for a Postdoctoral Fellow to work with Andy Guess (Politics/SPIA), Brandon Stewart (Sociology), and me (CS).

puwebp.princeton.edu/AcadHire/app...

Please apply before Sunday, the 13th of December!

09.12.2025 20:51 — 👍 16    🔁 10    💬 0    📌 0
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models. Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, Bertie Vidgen. Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH). 2022.

#MemoryModay #NLProc 'Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models' by @paul-rottger.bsky.social et al. (2022). A suite of tests for 10 languages.

01.12.2025 16:03 — 👍 3    🔁 2    💬 0    📌 0
