A diagram illustrating pointwise scoring with a large language model (LLM). At the top is a text box containing instructions: 'You will see the text of a political advertisement about a candidate. Rate it on a scale ranging from 1 to 9, where 1 indicates a positive view of the candidate and 9 indicates a negative view of the candidate.' Below this is a green text box containing an example ad text: 'Joe Biden is going to eat your grandchildren for dinner.' An arrow points down from this text to an illustration of a computer with 'LLM' displayed on its monitor. Finally, an arrow points from the computer down to the number '9' in large teal text, representing the LLM's scoring output. This diagram demonstrates how an LLM directly assigns a numerical score to text based on given criteria
LLMs are often used for text annotation, especially in social science. In some cases, this involves placing text items on a scale: e.g., 1 for liberal and 9 for conservative.
There are a few ways to accomplish this task. Which work best? Our new EMNLP paper has some answers 🧵
arxiv.org/pdf/2507.00828
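For concreteness, the pointwise setup in the figure maps onto a few lines of code against any chat-style LLM API. A minimal sketch, assuming an OpenAI-style client; the model name and response parsing here are illustrative assumptions, not the paper's exact setup:

```python
# Pointwise scoring sketch: one ad in, one 1-9 score out.
import re
from openai import OpenAI  # any chat-style client would do

client = OpenAI()

INSTRUCTIONS = (
    "You will see the text of a political advertisement about a candidate. "
    "Rate it on a scale ranging from 1 to 9, where 1 indicates a positive view "
    "of the candidate and 9 indicates a negative view of the candidate. "
    "Respond with a single number."
)

def pointwise_score(ad_text: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": INSTRUCTIONS},
            {"role": "user", "content": ad_text},
        ],
        temperature=0,
    )
    # Pull the first integer out of the reply and clamp it to the 1-9 scale.
    match = re.search(r"\d+", response.choices[0].message.content)
    return min(max(int(match.group()), 1), 9)

# The figure's example ad scores a 9 (most negative).
print(pointwise_score("Joe Biden is going to eat your grandchildren for dinner."))
```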
27.10.2025 14:59 · 26 likes · 8 reposts · 1 reply · 0 quotes
Glad to hear ❤️
14.10.2025 18:36 · 0 likes · 0 reposts · 0 replies · 0 quotes
Paper: arxiv.org/abs/2505.19299
Code: github.com/lingjunzhao/PE…
Huge thanks to my advisor @haldaume3.bsky.social and everyone who shared insights!
14.10.2025 16:28 · 0 likes · 0 reposts · 0 replies · 0 quotes
New #EMNLP2025 (main) paper!
LLMs often produce inconsistent explanations (62–86% of the time), hurting faithfulness and trust in explainable AI.
We introduce PEX consistency, a measure of explanation consistency, and show that optimizing it via DPO improves faithfulness by up to 9.7%.
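For intuition only, a consistency rate of this general flavor can be computed by checking whether the answer implied by an explanation matches the model's actual answer. This is a hypothetical sketch, not the paper's PEX definition; `answer_from_explanation` is an assumed helper (e.g., a second LLM call):

```python
# Hypothetical explanation-consistency rate (NOT the paper's PEX measure).
def consistency_rate(examples, answer_from_explanation):
    """examples: list of (model_answer, explanation) pairs.
    answer_from_explanation: maps an explanation back to the answer it implies."""
    consistent = sum(answer_from_explanation(expl) == ans for ans, expl in examples)
    return consistent / len(examples)

# With 62-86% of explanations inconsistent, this rate would land around 0.14-0.38.
```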
14.10.2025 16:28 · 3 likes · 1 repost · 2 replies · 0 quotes
QANTA Logo: Question Answering is not a Trivial Activity
[Humans and computers competing on a buzzer]
Do you like trivia? Can you spot when AI is feeding you BS? Or can you make AIs turn themselves inside out? Then on June 14 at College Park (or June 21 online), we have a competition for you.
05.06.2025 16:17 · 0 likes · 1 repost · 1 reply · 0 quotes
Super thankful for my wonderful collaborators: @pcascanteb.bsky.social @haldaume3.bsky.social Mingyang Xie, Kwonjoon Lee
20.05.2025 21:12 · 1 like · 0 reposts · 0 replies · 0 quotes
We introduce a super simple yet effective strategy to improve video-language alignment (+18%): add hallucination correction to your training objective.
Excited to share our paper accepted at ACL: Can Hallucination Correction Improve Video-language Alignment?
Link: arxiv.org/abs/2502.15079
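Schematically, "add hallucination correction to the training objective" could look like a standard contrastive video-text loss plus a caption-correction term. This is a sketch under my own assumptions; the losses, temperature, and weighting are illustrative, not the paper's recipe:

```python
# Illustrative combined objective: alignment loss + hallucination-correction loss.
import torch
import torch.nn.functional as F

def training_loss(video_emb, caption_emb, corrected_logits, corrected_targets,
                  lambda_corr=1.0):
    # InfoNCE-style video-text contrastive alignment (embeddings are [batch, dim]).
    logits = (video_emb @ caption_emb.T) / 0.07  # temperature is a placeholder
    labels = torch.arange(len(video_emb))
    align_loss = (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels)) / 2
    # Auxiliary term: token-level loss for generating the corrected caption from
    # a hallucinated one; logits are [batch, seq, vocab], targets [batch, seq].
    corr_loss = F.cross_entropy(corrected_logits.transpose(1, 2), corrected_targets)
    return align_loss + lambda_corr * corr_loss
```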
20.05.2025 21:12 · 6 likes · 3 reposts · 1 reply · 0 quotes
For the ACL ARR review cycle, I've heard complaints about the workload: some reviewers have 16 papers. Even though I only need to write 1 rebuttal and respond to 4, it still feels substantial. For those managing more (thank you!), it can be difficult to thoroughly engage with every rebuttal.
30.03.2025 20:56 · 2 likes · 0 reposts · 0 replies · 0 quotes
[Images: pages 1–3 of the diff]
There is a new version of the Research Plan for NIST's AI Safety Institute Consortium (AISIC) in response to executive orders. I did a diff.
Out: safety, responsibility, sociotechnical, fairness, working with federal agencies, authenticating content, watermarking, RN of CBRN, autonomous replication, control of physical systems
>
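(For anyone wanting to reproduce the diff itself: a generic sketch using Python's difflib, assuming the two plan versions have already been extracted to plain text; the file names are hypothetical.)

```python
# Unified diff of two text versions of the research plan.
import difflib

with open("aisic_plan_old.txt") as f1, open("aisic_plan_new.txt") as f2:
    old, new = f1.readlines(), f2.readlines()

for line in difflib.unified_diff(old, new, fromfile="old", tofile="new"):
    print(line, end="")  # lines already carry their newlines
```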
04.03.2025 12:30 · 24 likes · 13 reposts · 2 replies · 0 quotes
Causal Effect of Group Diversity on Redundancy and Coverage in Peer-Reviewing
A large host of scientific journals and conferences solicit peer reviews from multiple reviewers for the same submission, aiming to gather a broader range of perspectives and mitigate individual biase...
This is my first time serving as an AC for a big conference.
Just read this great work by Goyal et al. arxiv.org/abs/2411.11437
I'm optimizing for high coverage and low redundancy; assigning reviewers based on relevant topics or affinity scores alone feels off. Seniority and diversity matter!
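As a toy illustration of that trade-off, here is a greedy picker that maximizes newly covered topics, which implicitly penalizes redundancy with already-chosen reviewers; the reviewer names and topic sets are made up:

```python
# Greedy reviewer selection for coverage (toy sketch, hypothetical data).
def pick_reviewers(candidates, k=3):
    """candidates: dict mapping reviewer -> set of topics they can cover."""
    candidates = dict(candidates)  # don't mutate the caller's dict
    chosen, covered = [], set()
    for _ in range(k):
        # Pick whoever adds the most topics not yet covered.
        best = max(candidates, key=lambda r: len(candidates[r] - covered))
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen

reviewers = {
    "A": {"nlp", "evaluation"},
    "B": {"nlp", "fairness"},       # redundant with A on nlp
    "C": {"vision", "robotics"},
    "D": {"evaluation", "vision"},  # redundant with A and C
}
print(pick_reviewers(reviewers))  # ['A', 'C', 'B']
```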
05.12.2024 00:44 · 5 likes · 2 reposts · 1 reply · 0 quotes
Interdisciplinary community advancing language science through research & training in science, education, tech, & health • linktr.ee/umd_lsc
AI Medical Imaging Research...
Johns Hopkins University
Axle Informatics
Postdoc @ Brown DSI
VP R&D @ ClearMash
Passionate about high-fidelity numerical representations of reality, aligned with human perception.
https://omri.alphaxiv.io/
#nlp #multimodality #retrieval #hci #multi-agent
On the 2025/2026 job market! Machine learning, healthcare, and robustness
postdoc @ Cornell Tech, phd @ MIT
dmshanmugam.github.io
CSE PhD Student @ University of Michigan | BA Computer Science/Linguistics @ Columbia University | National Tsing Hua University | emlinking.github.io
Assistant Professor at UCLA | Alum @MIT @Princeton @UC Berkeley | AI+Cognitive Science+Climate Policy | https://ucla-cocopol.github.io/
Professor @milanlp.bsky.social for #NLProc, compsocsci, #ML
Also at http://dirkhovy.com/
#nlp researcher interested in evaluation including: multilingual models, long-form input/output, processing/generation of creative texts
previous: postdoc @ umass_nlp
phd from utokyo
https://marzenakrp.github.io/
associate prof at UMD CS researching NLP & LLMs
PhD in progress @ UMD | https://lilakk.github.io/
PhD student at CMU. I do research on applied NLP ("alignment", "synthetic data"). he/him
Robustness, Data & Annotations, Evaluation & Interpretability in LLMs
http://mimansajaiswal.github.io/
CS PhD student at UT Austin in #NLP
Interested in language, reasoning, semantics and cognitive science. One day we'll have more efficient, interpretable and robust models!
Other interests: math, philosophy, cinema
https://www.juandiego-rodriguez.com/
PhD Student, Ex- U.S. Federal Govโt Data Scientist
PhD student @umdcs | Long-form Narrative Generation & Analysis | Intern @AdobeResearch @MSFTResearch | https://chtmp223.github.io
PhD student @ Univ of Maryland
NLP, Question Answering, Human AI, LLMs
More at mgor.info
PhD-ing at CLIP @ UMD
https://houyu0930.github.io/