Cyberpunk!
03.02.2026 01:03 โ ๐ 1 ๐ 0 ๐ฌ 0 ๐ 0Cyberpunk!
03.02.2026 01:03 โ ๐ 1 ๐ 0 ๐ฌ 0 ๐ 0
๐ญ How do LLMs (mis)represent culture?
๐งฎ How often?
๐ง Misrepresentations = missing knowledge? spoiler: NO!
At #CHI2026 we are bringing โจTALESโจ a participatory evaluation of cultural (mis)reps & knowledge in multilingual LLM-stories for India
๐ arxiv.org/abs/2511.21322
1/10
๐ Apply to CMU LTIโs Summer 2026 โLanguage Technology for Allโ internship! ๐ Open to preโdoctoral students new to language tech (nonโCS backgrounds welcome). ๐ฌ 12โ14 weeks inโperson in Pittsburgh โ travel + stipend paid. ๐ธ Deadline: Feb 20, 11:59pm ET. Apply โ forms.gle/cUu8g6wb27Hs...
02.02.2026 15:41 โ ๐ 14 ๐ 12 ๐ฌ 2 ๐ 0New paper to appear at EACL 2026 main conference, and it's now up on arxiv: arxiv.org/pdf/2505.17536. The character limit here is insane, so I'll let the screenshots speak for themselves. We put together a new dataset for conversational role attribution & thread disentanglement.
29.01.2026 21:14 โ ๐ 7 ๐ 3 ๐ฌ 1 ๐ 0This was joint work with my wonderful collaborators @mariaa.bsky.social, @stevewilson.bsky.social, @zxma.bsky.social, Achyut Ganti, @andrewpiper.bsky.social, and advisor @maartensap.bsky.social, at CU Boulder, UM-Flint, @uconn.bsky.social, and @ltiatcmu.bsky.social!
19.12.2025 23:05 โ ๐ 3 ๐ 0 ๐ฌ 0 ๐ 0
Paper: arxiv.org/abs/2512.15925
Code: github.com/joel-mire/so...
SSF-Corpus: huggingface.co/datasets/joe...
SSF-Generator: huggingface.co/joelmire/lla...
SSF-Classifier: huggingface.co/joelmire/lla...
SocialStoryFrames helps illuminate the social topography of narrative practices and reader response in online communities, opening avenues for future research in computational social science and cultural analytics!
19.12.2025 23:05 โ ๐ 0 ๐ 0 ๐ฌ 1 ๐ 0Scatterplot with x-axis for 'reader-centric normalized entorpy' and y-axis for 'author-centric normalized entropy'. Each point represents a subreddit and is color-coded according to a legend of topics (e.g., fashion, hobby).
Lastly, we quantify the diversity (entropy) of author goals and reader reactions using SSF-Taxonomy.
Quadrants suggest distinct patterns: r/Frugal & r/techsupport have predictable scripts for authors and readers; r/Fitness has varied authorial approaches, predictable reader reactions.
Scatterplot comparing subreddit-pair similarity according to two similarity measures: semantic similarity and ssf-sim. The plot shows clusters of labels at regions of high and low agreement between the two measures. For example, both measures score the CFB-nfl highly, while only the ssf-sim measure scores apple-books highly.
SocialStoryFrames provides a new wayโbeyond semantic similarityโto compare social functions of storytelling across communities.
Even topically different communities (e.g., r/MakeupAddiction vs. r/buildapc) can show surprisingly similar social storytelling dynamics.
Bar plots for the overall_goal and narrative_intent dimensions of SSF-Taxonomy. Each bar plot shows relative proportion of dimensions sub-labels. For example, 'provide_info_support' has the highest proportion of overall_goal labels, followed by provide_experiential_accounts, and persuade_debate.
Using the framework, we analyze narrative intents across Reddit. Most common: justify/challenge belief (40%), clarify (14%), vent (14%), show identity (10%).
Association tests cast particular forms of storytelling (e.g., conveying similar experience) as mechanisms for empathy.
Table reporting inference classification performance for GPT-4.1 (k-shot) and SSF-Classifier across all SSF-Taxonomy dimensions.
For inference classification, we build a zero-shot SSF-Classifier using the same distillation approach, guided by a k-shot GPT-4.1 teacher.
Expert human evaluation shows strong performance, with an average Micro F1 of 0.85 and Macro F1 of 0.79 across SSF-Taxonomy dimensions.
Stacked bar plot comparing human plausibility ratings for SSF-Classifier and GPT-4o across all SSF-Taxonomy dimensions.
For inference generation, we create SSF-Generator via SFT distillation of GPT-4o on a Llama3.1 base.
We validate inference plausibility through a human survey (N=382).
94% of inferences were deemed plausible, with 78% deemed very/somewhat likely.
We define 2 tasks:
1. Generate plausible reader-response inferences given a story and its community/conversational context
2. Classify inferences onto taxonomy subdimensions.
We also curate SSF-Corpus: 6,140 storytelling contexts from 50+ Reddit communities.
Diagram of SSF-Taxonomy. Shows information flow from the author and community + conversational context into readers, who then have many different forms of reader response. These range from author-oriented inferences (overall goal, narrative intent, author emotional response), causal inferences (causal explanation, prediction), value judgments (character appraisal, moral, stance), and feelings (narrative feeling, aesthetic feeling).
We introduce SocialStoryFrames, a framework for contextual reasoning about narrative intent and reader response for social media stories.
Its foundation is SSF-Taxonomy, a taxonomy of dimensions of reader response, grounded in narrative theory, pragmatics, and psychology.
Screenshot of paper title and authors. Title: Social Story Frames: Contextual Reasoning about Narrative Intent and Reception Authors: Joel Mire, Maria Antoniak, Steven R. Wilson, Zexin Ma, Achyutarama R. Ganti, Andrew Piper, Maarten Sap
Reading social media stories evokes a wide range of contextual reader reactionsโinferential, affective, evaluativeโyet we lack methods to study these at scale.
Excited to share our new paper that builds a framework for analyzing storytelling practices across online communities!
A staircase in the new School of Computer, Data & Information Sciences building at Wisconsin Madison. Tan wood structures surround tapestry art and a small indoor garden.
A view from above of the staircases in the Wisconsin CDIS building
An shot from below of winding wooden staircases and a glass atrium rooftop. The new School of Computer, Data & Information Sciences building at Wisconsin Madison.
A bicolor white cat with seal-colored markings, looking upwards with big wide dark eyes.
It's the season for PhD apps!! ๐ฅง ๐ฆ โ๏ธ โ๏ธ
Apply to Wisconsin CS to research
- Societal impact of AI
- NLP โโ CSS and cultural analytics
- Computational sociolinguistics
- Human-AI interaction
- Culturally competent and inclusive NLP
with me!
lucy3.github.io/prospective-...
Performance of a sweep of models on Oolong-synth and Oolong-real. Performance decreases with increasing context length, sometimes steeply.
Can LLMs accurately aggregate information over long, information-dense texts? Not yetโฆ
We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
I'm recruiting multiple PhD students for Fall 2026 in Computer Science at @hopkinsengineer.bsky.social ๐
Apply to work on AI for social sciences/human behavior, social NLP, and LLMs for real-world applied domains you're passionate about!
Learn more at kristinagligoric.com & help spread the word!
How and when should LLM guardrails be deployed to balance safety and user experience?
Our #EMNLP2025 paper reveals that crafting thoughtful refusals rather than detecting intent is the key to human-centered AI safety.
๐ arxiv.org/abs/2506.00195
๐งต[1/9]
10 years after the initial idea, Artificial Humanities is here! Thanks so much to all who have preordered it. I hope you enjoy reading it and find this research approach as generative as I do. More to come!
27.09.2025 02:28 โ ๐ 38 ๐ 13 ๐ฌ 5 ๐ 3Academic paper titled un-straightening generative ai: how queer artists surface and challenge the normativity of generative ai models The piece is written by Jordan Taylor, Joel Mire, Franchesca Spektor, Alicia DeVrio, Maarten Sap, Haiyi Zhu, and Sarah Fox. As an image titled 24 attempts at intimacy showing 24 ai generated images with the word intimacy, none of which seems to include same gender couples
๐ณ๏ธโ๐๐จ๐ป๐ข Happy to share our workshop study on queer artistsโ experiences critically engaging with GenAI
Looking forward to presenting this work at #FAccT2025 and you can read a pre-print here:
arxiv.org/abs/2503.09805
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:
๐งต1/9
An overview of the work โResearch Borderlands: Analysing Writing Across Research Culturesโ by Shaily Bhatt, Tal August, and Maria Antoniak. The overview describes that We survey and interview interdisciplinary researchers (ยง3) to develop a framework of writing norms that vary across research cultures (ยง4) and operationalise them using computational metrics (ยง5). We then use this evaluation suite for two large-scale quantitative analyses: (a) surfacing variations in writing across 11 communities (ยง6); (b) evaluating the cultural competence of LLMs when adapting writing from one community to another (ยง7).
๐๏ธ Curious how writing differs across (research) cultures?
๐ฉ Tired of โculturalโ evals that don't consult people?
We engaged with interdisciplinary researchers to identify & measure โจcultural normsโจin scientific writing, and show thatโLLMs flatten themโ
๐ arxiv.org/abs/2506.00784
[1/11]
This looks incredible! Thanks for sharing the syllabus!
04.06.2025 17:23 โ ๐ 1 ๐ 0 ๐ฌ 0 ๐ 0Iโm thrilled to share RewardBench 2 ๐โ We created a new multi-domain reward model evaluation that is substantially harder than RewardBench, we trained and released 70 reward models, and we gained insights about reward modeling benchmarks and downstream performance!
02.06.2025 23:41 โ ๐ 22 ๐ 6 ๐ฌ 2 ๐ 1Wisconsin-Madison's tree-filled campus, next to a big shiny lake
A computer render of the interior of the new computer science, information science, and statistics building. A staircase crosses an open atrium with visibility across multiple floors
I'm joining Wisconsin CS as an assistant professor in fall 2026!! There, I'll continue working on language models, computational social science, & responsible AI. ๐ฒ๐ง๐ฃ๐ปโโ๏ธ Apply to be my PhD student!
Before then, I'll postdoc for a year in the NLP group at another UW ๐๏ธ in the Pacific Northwest
When interacting with ChatGPT, have you wondered if they would ever "lie" to you? We found that under pressure, LLMs often choose deception. Our new #NAACL2025 paper, "AI-LIEDAR ," reveals models were truthful less than 50% of the time when faced with utility-truthfulness conflicts! ๐คฏ 1/
28.04.2025 20:36 โ ๐ 25 ๐ 9 ๐ฌ 1 ๐ 3A bar plot comparing the storytelling rates for different topics in the example dataset of congressional speeches. There are often large differences between storytelling and non-storytelling for individual topics. For example, the topic whose top words read "NUM, years, service, great, state" has much more storytelling that non-storytelling.
The top five congressional speeches for the topic "NUM, years, service, great state." All of the documents honor the lives of important people.
I updated our ๐ญStorySeeker demo. Aimed at beginners, it briefly walks through loading our model from Hugging Face, loading your own text dataset, predicting whether each text contains a story, and topic modeling and exploring the results. Runs in your browser, no installation needed!
โณ
New work on multimodal framing! ๐ซ
Some fun results: comparisons of the same frame when expressed in images vs texts. When the "crime" frame is expressed in the article text, there are more political words in the text, but when the frame is expressed in the article image, more police words.
This was joint work with my co-author Zubin Aysola; collaborators @dchechel.bsky.social, Nick Deas, and @chryssazrv.bsky.social; and advisor @maartensap.bsky.social at @ltiatcmu.bsky.social @scsatcmu.bsky.social @columbiauniversity.bsky.social, and the @istecnico.bsky.social! (10/10)
06.03.2025 19:49 โ ๐ 4 ๐ 0 ๐ฌ 0 ๐ 0