Mohit Iyyer miyyer - Bluesky Statics

Well, this is sure to be a blockbuster AI article... @jennarussell.bsky.social et al. are kicking ass and taking names in journalism, both individuals and organizations.

"AI use in American newspapers is widespread, uneven, and rarely disclosed"
arxiv.org/abs/2510.18774

23.10.2025 13:53 — 👍 22 🔁 8 💬 3 📌 0

AI is already at work in American newsrooms.

We examine 186k articles published this summer and find that ~9% are either fully or partially AI-generated, usually without readers having any idea.

Here's what we learned about how AI is influencing local and national journalism:

22.10.2025 15:24 — 👍 55 🔁 29 💬 5 📌 2

Tired of AI slop? Our work on "Frankentexts" shows how LLMs can stitch together random fragments of human writing into coherent, relevant responses to arbitrary prompts.

Frankentexts are weirdly creative, and they also pose problems for AI detectors: are they AI? human? More 👇

03.06.2025 16:16 — 👍 15 🔁 3 💬 0 📌 0

Llama 4's massive context window is impressive! However, the best Llama model for long-context understanding over books is still Llama 3.1 405B. Llama 4 Scout is especially bad at our NoCha benchmark, performing below random chance.

08.04.2025 01:48 — 👍 25 🔁 6 💬 0 📌 1

Thinking about paying $20k/month for a "PhD-level AI agent"? You might want to wait until their web browsing skills are on par with those of human PhD students 😛 Check out our new BEARCUBS benchmark, which shows web agents struggle to perform simple multimodal browsing tasks!

12.03.2025 16:08 — 👍 6 🔁 1 💬 0 📌 0

New synthetic benchmark for multilingual long-context LLMs! Surprisingly, English and Chinese are not the top-performing languages (it's Polish!). We also observe a widening gap between high and low-resource languages as context size increases. Check out the paper for more 👇

05.03.2025 18:44 — 👍 4 🔁 1 💬 0 📌 0

How can we generate synthetic data for a task that requires global reasoning over a long context (e.g., verifying claims about a book)? LLMs aren't good at *solving* such tasks, let alone generating data for them. Check out our paper for a compression-based solution!

21.02.2025 16:37 — 👍 17 🔁 4 💬 0 📌 0

Lots of recent work focuses on 𝐚𝐮𝐭𝐨𝐦𝐚𝐭𝐢𝐜 detection of LLM-generated text. But how well do 𝐡𝐮𝐦𝐚𝐧𝐬 fare? TLDR: ppl who frequently use ChatGPT for writing tasks are elite at spotting AI text! See our paper for more (and congrats to @jennarussell.bsky.social on her first paper!!)

28.01.2025 15:12 — 👍 4 🔁 0 💬 0 📌 0

Posts by Mohit Iyyer (@miyyer.bsky.social)