Really excited to finally see our paper in print: “Web scraping for research: Legal, ethical, institutional, and scientific considerations”
A great interdisciplinary effort with @m-dot-brown.bsky.social, @orangechair.org, Gabe Maldoff, @solmg.bsky.social, @zevesanderson.com
doi.org/10.1177/2053...
Grateful for the collaboration with excellent researchers for this paper @orangechair.org, Gabe Maldoff, @solmg.bsky.social, @zevesanderson.com, & @michaelzimmer.bsky.social
Thrilled to have a new article published in @bigdatasoc.bsky.social! 🥳🎉📊
With scraping becoming a more common data collection strategy for internet researchers, we cover the legal, ethical, institutional, and scientific ramifications researchers should consider. doi.org/10.1177/2053...
they do be posting
🚨New publication in Social Media + Society🚨
Candidates Be Posting: Multi-Platform Strategies and Partisan Preferences in the 2022 U.S. Midterm Elections
And it's open access!
journals.sagepub.com/doi/full/10....
7/ 📄 You can find our full paper here: arxiv.org/abs/2503.23243
w/ Shubham Atreja, @libbyh.bsky.social, & @patrickwu.bsky.social
6/ This has implications for both research and industry:
⚠️ Fairness evaluations should be context-specific
🤔 Model choice alone will not solve bias
🔍 Human disagreements are part of the complexity—not a flaw
5/ We also find that the difficulty of the labeling task is most predictive of LLM agreement with human annotators.
4/ Key finding: While LLMs disagree with human annotators along demographic lines, they tend to disagree in the same directions on the same demographic categories within a given dataset. In other words, the direction of bias is not LLM-specific, but dataset-specific.
3/ This study evaluates LLM annotations across 4 datasets and tasks, analyzing whether these models disproportionately reflect majority group opinions.
2/ Prior research praises LLMs for their high accuracy, precision, recall, and F1 scores in labeling tasks—but also raises concerns about bias, especially around sensitive or polarizing content (e.g., toxicity).
Can large language models (LLMs) fairly annotate data on contentious topics?
Our new paper dives into this question—looking at whether LLM-generated labels reflect diverse viewpoints or skew toward majority perspectives. The results are surprisingly nuanced. 🧵
I am thrilled to share a new article in Sociological Methods & Research, “Quantifying Narrative Similarity Across Languages”. My co-first author Sol Messing and our collaborators developed a new approach to measuring “narrative similarity” between texts: journals.sagepub.com/doi/10.1177/...
[New WP] With the closure of major social media APIs and the new data access mandates under the DSA, we enter what we call the "post-post-API" era. But have researchers obtained the data they need? Our recent survey (n=180) + interview (n=19) study suggests a stark reality.
🔗 arxiv.org/abs/2505.09877
1/3
So thrilled to have worked on this important piece with @yang3kc.bsky.social @m-dot-brown.bsky.social and Kayo Mimizuka. Data access for independent researchers is at such a critical juncture
Special thanks to Mango Brown and Taylor Swift's 'Mr. Perfectly Fine' for their help in getting this paper over the finish line
So excited to finally see this out! It was the first paper I started during my postdoc at @csmapnyu.org
It's well known that politicians take more extreme positions during primaries. In @electoralstudies.bsky.social, we find this shift is much more likely when incumbents in safe seats face a well-funded primary challenger.
🧵👇
authors.elsevier.com/a/1kn5KxRaZn...
@adambonica.bsky.social showed ideology predicts which agencies experience DOGE layoffs. But what other factors could be driving this?
Using a generative LLM-derived measure, I find agencies perceived as knowledge institutions are more likely to experience layoffs, even controlling for ideology. 🧵
🚨New Publication in New Media & Society🚨
Co-First-Authored w/ @m-dot-brown.bsky.social @meredithpruden.bsky.social & @markriedl.bsky.social
Making academia suck less: Supporting early career researchers studying harmful content online through a feminist ethics of care.
jlukito.com/s/brown-et-a...
They don’t need an excuse. They’ll claim we gave them an excuse no matter what we do. “Be careful what you say” is precisely how authoritarians achieve compliance w/o lifting a finger. And yet, even w/ compliance, they will still attack.
Instead, may I suggest taking a look at researchersupport.org
Excited to share a pre-print about web scraping for research! We're happy to receive feedback on how we frame this issue and try to build some paths forward for researchers. w/ @orangechair.org, Gabe Maldoff, @solmg.bsky.social, Zeve Sanderson, & @michaelzimmer.bsky.social arxiv.org/abs/2410.23432
I’ll keep working with academics, civil society researchers, and journalists—including via the Coalition for Independent Technology Research—to continue these important accountability efforts. 12/ (independenttechresearch.org/introducing-...)
🚨Our new paper in Political Analysis presents a novel, cross-platform method for estimating the ideology of YouTube videos.
What we found: it is possible to do this at scale with an efficient, automated method!
🧵1/
Great coverage of the ongoing challenge of researcher data access on Twitter! Thanks @sheiladang.bsky.social for the piece!
www.reuters.com/technology/e...