Yu (Hope) Hou

@houyu0930.bsky.social

PhD-ing Clip@UMD https://houyu0930.github.io/

354 Followers  |  348 Following  |  1 Posts  |  Joined: 08.11.2024

Latest posts by houyu0930.bsky.social on Bluesky

An Interdisciplinary Approach to Human-Centered Machine Translation Machine Translation (MT) tools are widely used today, often in contexts where professional translators are not present. Despite progress in MT technology, a gap persists between system development and...

What should Machine Translation research look like in the age of multilingual LLMs?

Here’s one answer from researchers across NLP/MT, Translation Studies, and HCI.
"An Interdisciplinary Approach to Human-Centered Machine Translation"
arxiv.org/abs/2506.13468

18.06.2025 12:08 | 👍 16    🔁 7    💬 1    📌 0

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at UMD CS @univofmaryland.bsky.social this August.

I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)

13.06.2025 18:20 | 👍 65    🔁 3    💬 13    📌 1

🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new -- while copying 90% of its output from those texts?

🧟 You get what we call a Frankentext!

💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.

03.06.2025 15:09 | 👍 31    🔁 7    💬 1    📌 1
Book cover - Lost in Automatic Translation: Navigating Life in English in the Age of Language Technologies. By Vered Shwartz. Publisher: Cambridge University Press.

I guess that now that I have 1% of my Twitter followers follow me here 😅, I should announce it here too for those of you no longer checking Twitter: my nonfiction book, "Lost in Automatic Translation" is coming out this July: lostinautomatictranslation.com. I'm very excited to share it with you!

27.05.2025 19:16 | 👍 76    🔁 16    💬 1    📌 2

1/ How can a monolingual English speaker 🇺🇸 decide if an automatic French translation 🇫🇷 is good enough to be shared?

Introducing ❓AskQE❓, an #LLM-based Question Generation + Answering framework that detects critical MT errors and provides actionable feedback 🗣️

#ACL2025

21.05.2025 17:48 | 👍 1    🔁 2    💬 1    📌 0

We introduce a super simple yet effective strategy to improve video-language alignment (+18%): add hallucination correction in your training objective 👌
Excited to share our accepted paper at ACL: Can Hallucination Correction Improve Video-language Alignment?
Link: arxiv.org/abs/2502.15079

20.05.2025 21:12 | 👍 5    🔁 3    💬 1    📌 0

Please help us spread the word! 📣

FATE is hiring a pre-doc research assistant! We're looking for candidates who will have completed their bachelor's degree (or equivalent) by summer 2025 and want to advance their research skills before applying to PhD programs.

20.05.2025 14:34 | 👍 39    🔁 28    💬 0    📌 0
Wisconsin-Madison's tree-filled campus, next to a big shiny lake

A computer render of the interior of the new computer science, information science, and statistics building. A staircase crosses an open atrium with visibility across multiple floors

I'm joining Wisconsin CS as an assistant professor in fall 2026!! There, I'll continue working on language models, computational social science, & responsible AI. 🌲🧀🚣🏻‍♀️ Apply to be my PhD student!

Before then, I'll postdoc for a year in the NLP group at another UW 🏔️ in the Pacific Northwest

05.05.2025 19:54 | 👍 145    🔁 14    💬 16    📌 3
Can you map it to English? The Role of Cross-Lingual Alignment in Multilingual Performance of LLMs Large language models (LLMs) pre-trained predominantly on English text exhibit surprising multilingual capabilities, yet the mechanisms driving cross-lingual generalization remain poorly understood. T...

🔈 NEW PAPER 🔈
Excited to share my paper that analyzes the effect of cross-lingual alignment on multilingual performance
Paper: arxiv.org/abs/2504.09378 🧵

18.04.2025 15:00 | 👍 0    🔁 2    💬 1    📌 0

🚨 New Paper 🚨

1/ We often assume that well-written text is easier to translate ✍️

But can #LLMs automatically rewrite inputs to improve machine translation?

Here’s what we found 🧵

17.04.2025 01:32 | 👍 8    🔁 4    💬 1    📌 0

A bit of a mess around the conflict of COLM with the ARR (and to lesser degree ICML) reviews release. We feel this is creating a lot of pressure and uncertainty. So, we are pushing our deadlines:

Abstracts due March 22 AoE (+48hr)
Full papers due March 28 AoE (+24hr)

Plz RT 🙏

20.03.2025 18:20 | 👍 37    🔁 31    💬 3    📌 2

Nice modern NLP (Ai) intro talk slides isabelleaugenstein.github.io/slides/2025_... Isabelle Augenstein

11.03.2025 07:31 | 👍 14    🔁 5    💬 1    📌 0

🚨 Our team at UMD is looking for participants to study how #LLM agent plans can help you answer complex questions

💰 $1 per question
🏆 Top-3 fastest + most accurate win $50
⏳ Questions take ~3 min => $20/hr+

Click here to sign up (please join, reposts appreciated 🙏): preferences.umiacs.umd.edu

11.03.2025 14:30 | 👍 2    🔁 3    💬 0    📌 0

Our FATE MTL team has been working on a series of projects on anthropomorphic AI systems for which we recently put out a few pre-prints I’m excited about. While working on these we tried to think carefully not only about key research questions but also how we study and write about these systems

05.03.2025 19:55 | 👍 24    🔁 2    💬 3    📌 5

New synthetic benchmark for multilingual long-context LLMs! Surprisingly, English and Chinese are not the top-performing languages (it's Polish!). We also observe a widening gap between high and low-resource languages as context size increases. Check out the paper for more 👇

05.03.2025 18:44 | 👍 4    🔁 1    💬 0    📌 0

🚨 New Position Paper 🚨

Multiple choice evals for LLMs are simple and popular, but we know they are awful 😬

We complain they're full of errors, saturated, and test nothing meaningful, so why do we still use them? 🫠

Here's why MCQA evals are broken, and how to fix them 🧵

24.02.2025 21:03 | 👍 45    🔁 13    💬 2    📌 0

⚠️ Current methods for generating instruction-following data fall short for long-range reasoning tasks like narrative claim verification.

We present CLIPPER ✂️, a compression-based pipeline that produces grounded instructions for ~$0.5 each, 34x cheaper than human annotations.

21.02.2025 16:25 | 👍 21    🔁 8    💬 1    📌 2
Screenshot of top half of first page of paper. The paper is titled: "When People are Floods: Analyzing Dehumanizing Metaphors in Immigration Discourse with Large Language Models". The authors are Julia Mendelsohn (University of Chicago) and Ceren Budak (University of Michigan). The top right corner contains a visual showing the sentence "They want immigrants to pour into and infest this country". The caption says: Figure 1: Dehumanizing sentence likening immigrants to the source domain concepts of Water and Vermin via the words "pour" and "infest". 

The abstract text on the left reads: Metaphor, discussing one concept in terms of another, is abundant in politics and can shape how people understand important issues. We develop a computational approach to measure metaphorical language, focusing on immigration discourse on social media. Grounded in qualitative social science research, we identify seven concepts evoked in immigration discourse (e.g. "water" or "vermin"). We propose and evaluate a novel technique that leverages both word-level and document-level signals to measure metaphor with respect to these concepts. We then study the relationship between metaphor, political ideology, and user engagement in 400K US tweets about immigration. While conservatives tend to use dehumanizing metaphors more than liberals, this effect varies widely across concepts. Moreover, creature-related metaphor is associated with more retweets, especially for liberal authors. Our work highlights the potential for computational methods to complement qualitative approaches in understanding subtle and implicit language in political discourse.

New preprint!
Metaphors shape how people understand politics, but measuring them (& their real-world effects) is hard.

We develop a new method to measure metaphor & use it to study dehumanizing metaphor in 400K immigration tweets Link: bit.ly/4i3PGm3

#NLP #NLProc #polisky #polcom #compsocialsci

20.02.2025 19:59 | 👍 182    🔁 64    💬 6    📌 11

New open source reasoning model!

Huginn-3.5B reasons implicitly in latent space 🧠

Unlike O1 and R1, latent reasoning doesn’t need special chain-of-thought training data, and doesn't produce extra CoT tokens at test time.

We trained on 800B tokens 👇

10.02.2025 15:58 | 👍 11    🔁 4    💬 1    📌 2

I have learned a lot in this project! If you are interested in how NLI can be used in VLMs to complement its representation, check it out!

23.01.2025 18:45 | 👍 1    🔁 0    💬 0    📌 0

Accepted at #ICLR2025 ✨

🧐 Which languages benefit the most from vocabulary adaptation?

We introduce VocADT, a new vocabulary adaptation method with a vocabulary adapter.
We explore the impact of various adaptation strategies on languages with diverse scripts and fragmentation to answer this question

22.01.2025 19:27 | 👍 6    🔁 1    💬 2    📌 1
COLM 2025 Program Committee Volunteer Form This form is to volunteer to serve on the COLM 2025 (https://colmweb.org/) program committee as a reviewer or an area chair.

We are looking for volunteers for reviewing and AC roles! Please sign up here:
forms.gle/rZp67YvMn1hn...

13.01.2025 17:05 | 👍 9    🔁 9    💬 0    📌 0
Announcing the 2024 TMLR Outstanding Certification By the 2024 TMLR Outstanding Paper Committee: Michael Bowling, Brian Kingsbury, Andreas Kirsch, Yingzhen Li, and Eleni Triantafillou

🎉 Announcing... the 2024 TMLR Outstanding Certifications! (aka, our "best paper" awards!)

Are you bursting with anticipation to see what they are? Check out this blog post, and read down-thread!! 🎉🧵👇 1/n
medium.com/@TmlrOrg/ann...

08.01.2025 17:41 | 👍 20    🔁 7    💬 1    📌 1

📣 📣 Interested in an internship on human-centred AI, human agency, AI evaluation & the impacts of AI systems? Our team/FATE MLT (Su Lin Blodgett, @qveraliao.bsky.social & I) is looking for a few summer interns 🎉 Apply by Jan 10 for full consideration: jobs.careers.microsoft.com/global/en/jo...

05.12.2024 20:11 | 👍 22    🔁 10    💬 0    📌 2
Expanding the Toolkit: Large Language Models in Humanities Research Call for Papers: Expanding the Toolkit: Large Language Models in Humanities Research.

Reminder that we are looking for papers using LLMs for humanities research, for a special issue of the Computational Humanities Research Journal.

Deadline January 31st!

#NLP #DigitalHumanities #CulturalAnalytics

08.01.2025 17:32 | 👍 106    🔁 58    💬 6    📌 1
Accessibility guide for authors Accessible submissions are essential to make your work readable by the greatest number of readers. This includes taking steps as you author your document, and making your submitted PDF accessible.

Is your #FAccT2025 paper draft accessible? This is important for both peer review and camera ready publications.

Check out this guide from CHI with tips for both Word and LaTeX users!

sigchi.org/resources/gu...

06.01.2025 21:37 | 👍 9    🔁 4    💬 0    📌 0
meme with three rows.

"this human-ai decision making leads to unfair outcomes" --> "panik"

"let's show explanations to help people be more fair" --> "kalm"

"those explanations are based on proxy features" --> "panik"

The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features

Despite hopes that explanations improve fairness, we see that when biases are hidden behind proxy features, explanations may not help.

Navita Goyal, Connor Baumler +al IUI’24
hal3.name/docs/daume23...

09.12.2024 11:41 | 👍 21    🔁 6    💬 1    📌 0

starter pack for the Computational Linguistics and Information Processing group at the University of Maryland - get all your NLP and data science here!

go.bsky.app/V9qWjEi

10.12.2024 17:14 | 👍 29    🔁 12    💬 1    📌 1
Drake meme.

Top panel (dismissive): "Helping people make sense of model errors just by highlighting confabulations."

Bottom panel (happy): "Also showing them potential alternatives to those confabulations."

Successfully Guiding Humans with Imperfect Instructions by Highlighting Potential Errors and Suggesting Corrections

When generating instructions for people, we can help them by highlighting potential confabs, AND by suggesting alternatives.

by Lingjun Zhao EMNLP’24

hal3.name/docs/daume24...

02.12.2024 09:36 | 👍 9    🔁 2    💬 1    📌 0

ASL STEM Wiki: Dataset and Benchmark for Interpreting STEM Articles

We develop a continuous signing dataset for ASL on a STEM subset of Wikipedia; challenges suggest problems related to fingerspelling detection, sign linking, & translation.

by Kayo Yin et al EMNLP’24
hal3.name/docs/daume24...

27.11.2024 09:00 | 👍 8    🔁 3    💬 1    📌 1

@houyu0930 is following 20 prominent accounts