Come talk with us today about the evaluation of long-form multilingual generation at the second poster session #COLM2025
📍4:30–6:30 PM / Room 710 – Poster #8
@markar.bsky.social
#nlp researcher interested in evaluation including: multilingual models, long-form input/output, processing/generation of creative texts previous: postdoc @ umass_nlp phd from utokyo https://marzenakrp.github.io/
Off to #COLM! The fake Fuji looks really good today.
I've only ever seen the real one from below, but today I'm happy to at least see the fake one from above.
I feel like it was worth waking up early
06.10.2025 14:35
Wait how come, I'm flying direct at 7am..
06.10.2025 12:00
When reading AI reasoning text (aka CoT), we (humans) form a narrative about the underlying computation process, which we take as a transparent explanation of model behavior. But what if our narratives are wrong? We measure this and find that they usually are.
Now on arXiv: arxiv.org/abs/2508.16599
📊 Preliminary ranking of WMT 2025 General Machine Translation benchmark is here!
But don't draw conclusions just yet: automatic metrics are biased toward systems that use techniques like metric-as-reward-model or MBR decoding. The official human ranking will be part of the General MT findings at WMT.
arxiv.org/abs/2508.14909
Happy to see this work accepted to #EMNLP2025! 🎉🎉🎉
20.08.2025 20:49
✨We are thrilled to announce that over 3200 papers have been accepted to #EMNLP2025 ✨
This includes over 1800 main conference papers and over 1400 papers in findings!
Congratulations to all authors!! 🎉🎉🎉
The Echoes in AI paper showed quite the opposite, also with a story continuation setup.
Additionally, we present evidence that both *syntactic* and *discourse* diversity measures show strong homogenization that the lexical and cosine-similarity measures used in this paper do not capture.
Definitely!
16.08.2025 17:46
At the same time, I wish that whoever sparked this interest in data distribution would also help them with the design...
16.08.2025 03:24
Absolutely! Looking forward to seeing QUDsim at COLM!
16.08.2025 03:19
The issue is always: what, which humans, and in what circumstances.
15.08.2025 05:33
I think there are quite a few undergraduate students on this preprint, and maybe there was a need for a bit more mentoring. The comparison to WritingPrompts is just one of the issues (amateur writers in very different conditions from normal writing + very short outputs).
15.08.2025 05:31
Check out the full leaderboard here: novelchallenge.github.io
We'll be updating the dataset with new books and claims within the next few months!
[Image: screenshot of the benchmark leaderboard with GPT-5 on top at 68.46% accuracy.]
GPT-5 lands first place on NoCha, our long-context book understanding benchmark.
That said, this is a tiny improvement (~1%) over o1-preview, which was released almost one year ago. Have long-context models hit a wall?
Accuracy of human readers is >97%... Long way to go!
🗓️29 July, 4 PM: Automated main concept generation for narrative discourse assessment in aphasia. w/
@marisahudspeth.bsky.social, Polly Stokes, Jacquie Kurland, and @brenocon.bsky.social
📍Hall 4/5.
Come by to chat about argumentation, narrative texts, policy & law, and beyond! #ACL2025NLP
Excited to present two papers at #ACL2025!
🗓️30 July, 11 AM: 𝛿-Stance: A Large-Scale Real World Dataset of Stances in Legal Argumentation. w/ Douglas Rice and @brenocon.bsky.social
📍At Hall 4/5. 🧵👇
Kaiserslautern, Germany
📣 Life update: Thrilled to announce that I’ll be starting as faculty at the Max Planck Institute for Software Systems this Fall!
I’ll be recruiting PhD students in the upcoming cycle, as well as research interns throughout the year: lasharavichander.github.io/contact.html
Congratulations 👏🎉
23.07.2025 01:11
For EMNLP 2025's special theme of "Advancing our Reach: Interdisciplinary Recontextualization of NLP", we are organizing a panel of experts, and would like input from the community at large as we prepare. Please take a moment to fill in this survey: forms.office.com/r/pWFFA0Gss1
17.07.2025 20:24
A new definition for AGI just dropped, and it is a bad one.
12.07.2025 18:04
Now accepted to #COLM2025 @colmweb.org
🇨🇦🎉
I always had to apply for IRB approval in Japan (UTokyo), though the process was much longer than in the US: the committee met only a few times a year, and you were almost guaranteed to be asked to make corrections, which extended the process. It could easily take 2-3 months.
07.07.2025 23:00
What should Machine Translation research look like in the age of multilingual LLMs?
Here’s one answer from researchers across NLP/MT, Translation Studies, and HCI.
"An Interdisciplinary Approach to Human-Centered Machine Translation"
arxiv.org/abs/2506.13468
Extremely interesting new task: give a model a literary text, plus a critical essay about it with one quotation masked. Can the model figure out which quotation from the original work would support these claims? Best-performing models exceed human readers. #MLSky arxiv.org/abs/2506.030...
04.06.2025 15:50
Tired of AI slop? Our work on "Frankentexts" shows how LLMs can stitch together random fragments of human writing into coherent, relevant responses to arbitrary prompts.
Frankentexts are weirdly creative, and they also pose problems for AI detectors: are they AI? human? More 👇
🤔 What if you gave an LLM thousands of random human-written paragraphs and told it to write something new, while copying 90% of its output from those texts?
🧟 You get what we call a Frankentext!
💡 Frankentexts are surprisingly coherent and tough for AI detectors to flag.
Interested in crosslingual memorization? Check out our new work :) Congrats to Emir, Alisha, and Minh on putting together their first research paper 🎉
30.05.2025 15:40
LLMs memorize novels 📚 in English. But what about existing translations? Or translations into new languages?
Our 🦉OWL dataset (31K / 10 languages) shows GPT-4o recognizes books:
92% English
83% official translations
69% unseen translations
75% as audio (EN)