New paper out at AJPS: "The limits of AI for authoritarian control." The more repression there is, the less information exists in AI's training data, and the worse the AI performs.
19.02.2026 19:48 @eddieyang.bsky.social
7/
See www.eddieyang.net/software/loc... for a more detailed tutorial and all other functionalities.
We're really hoping to make this a useful tool for people, so please let us know if you have suggestions!
6/
Getting started in just three lines:
install.packages("localLLM")
library(localLLM)
install_localLLM()
Then:
quick_llama("Classify this tweet as Positive or Negative...")
That's it. No Python dependencies, no API keys.
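A rough sketch of how this could scale to a whole column of texts; whether quick_llama() accepts a vector directly is an assumption I haven't checked, so the sketch loops explicitly with sapply():

library(localLLM)

# Hypothetical batch annotation over a character vector of tweets.
# Assumes quick_llama() takes a single prompt string and returns a single label.
tweets <- c("I love this new policy!", "Worst decision they ever made.")
labels <- sapply(tweets, function(x) {
  quick_llama(paste0("Classify this tweet as Positive or Negative: ", x))
})
table(labels)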
5/
NEW: validate() - one-shot evaluation of your annotation results.
Computes confusion matrices (vs. gold standard) and intercoder reliability (Cohen's Kappa, Krippendorff's Alpha) in a single function.
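A minimal sketch of what a call might look like; the argument order and the exact fields returned here are assumptions for illustration, not the documented interface:

library(localLLM)

# Hypothetical inputs: LLM labels plus a hand-coded gold standard.
llm_labels  <- c("Positive", "Negative", "Negative", "Positive", "Negative")
gold_labels <- c("Positive", "Negative", "Positive", "Positive", "Negative")

# Assumed call and argument order: returns a confusion matrix plus
# reliability statistics (Cohen's Kappa, Krippendorff's Alpha).
res <- validate(llm_labels, gold_labels)
print(res)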
4/
NEW: explore() - compare multiple LLMs side by side in one call.
Pass a list of models and prompts; get back the annotations from each LLM.
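A hedged sketch of the kind of call this describes; the argument names, the model identifiers, and the shape of the return value are placeholders, so check the package documentation for the real signature:

library(localLLM)

# Hypothetical side-by-side comparison; model names below are placeholders.
prompts <- c(
  "Classify this tweet as Positive or Negative: I love this new policy!",
  "Classify this tweet as Positive or Negative: Worst decision ever."
)
models <- c("llama-3.2-1b", "qwen2.5-3b")  # placeholder model identifiers

out <- explore(models = models, prompts = prompts)  # assumed signature
print(out)  # assumed: annotations organized by model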
3/
NEW: Auto-documentation for reproducibility.
Wrap your analysis in document_start() / document_end() and localLLM automatically logs timestamps, model metadata, decoding parameters, and run summaries to a text report with SHA-256 hashes for all inputs and outputs.
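A sketch of how the wrapper might be used; the zero-argument form shown here is an assumption, and everything between the two calls is ordinary analysis code:

library(localLLM)

# Hypothetical reproducibility log wrapped around an annotation run.
document_start()  # assumed zero-argument form: begins logging metadata and parameters

labels <- sapply(
  c("Great turnout today.", "This debate was a disaster."),
  function(x) quick_llama(paste0("Classify this tweet as Positive or Negative: ", x))
)

document_end()    # assumed: writes the text report with SHA-256 hashes of inputs/outputs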
2/
By default, localLLM ensures that LLM output is reproducible, even when temperature > 0.
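A small check you could run to see this in action; whether quick_llama() exposes a temperature argument under that name is an assumption here:

library(localLLM)

# Hypothetical check: same prompt twice with sampling temperature above zero.
prompt <- "Classify this tweet as Positive or Negative: What a great day for democracy."
run1 <- quick_llama(prompt, temperature = 0.7)  # 'temperature' argument name assumed
run2 <- quick_llama(prompt, temperature = 0.7)
identical(run1, run2)  # should be TRUE if output is reproducible by default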
We've updated the localLLM package on CRAN (cran.r-project.org/package=loca...).
It allows you to run LLMs locally and natively in R. Everything is reproducible. And it's free.
Some new functionality for reproducibility and validation 🧵
Russia, Venezuela, Iran, China, the Sahel region, the United States ...
Want to know why state agents carry out brutal repression - or participate in illegal coups?
Our new book "Making a Career in Dictatorship" provides answers - it just got published by @academic.oup.com:
tinyurl.com/ystwm3tf
Why the slow uptake in political science?
02.02.2026 17:45
So, randomization is not a *sufficient* condition for good research. Far from it.
The best experimental social science is the work where the theory, the operationalization, or both are the emphasis.
Randomization is "easy" - the challenge is what you randomize and why.
Can Large Multimodal Models (LMMs) extract features of urban neighborhoods from street-view images?
New with Paige Bollen (OSU) and @joehigton.bsky.social (NYU): Sometimes, but the models recover national assessments better than local ones, even with additional prompting (which can make things worse!)
We also developed a new R package, localLLM (cran.r-project.org/package=loca...), that enables reproducible annotation with LLMs directly in R. More functionality to follow!
20.10.2025 13:56
Based on these findings (and more in the paper), we offer recommendations for best practices. We also summarized the recommendations in a checklist to facilitate a more principled procedure.
20.10.2025 13:56
Finding 4: Bias-correction methods like DSL can reduce bias, but they introduce a trade-off: corrected estimates often have larger standard errors, requiring a large ground-truth sample (600-1000+) to be beneficial without losing too much precision.
20.10.2025 13:56
Finding 3: In-context learning (providing a few annotated examples in the prompt) offers only marginal improvements in reliability, with benefits plateauing quickly. Changes to prompt format have a small effect (smaller models and reasoning models are more sensitive).
20.10.2025 13:56
Finding 2: This disagreement has significant downstream consequences. Re-running the original analyses with LLM annotations produced highly variable coefficient estimates, often altering the conclusions of the original studies.
20.10.2025 13:56
There is also an interesting linear relationship between LLM-human and LLM-LLM annotation agreement: when LLMs agree more with each other, they also tend to agree more with humans and supervised models! We offer some suggestions on which annotation tasks are well suited to LLMs.
20.10.2025 13:56
Finding 1: LLM annotations show pretty low intercoder reliability with the original annotations (coded by humans or supervised models). Perhaps surprisingly, reliability among the different LLMs themselves is only moderate (larger models do better).
20.10.2025 13:56
The LLM annotations also allowed us to present results on:
1. effectiveness of in-context learning
2. model sensitivity to changes in prompt format
3. bias-correction methods
We re-annotated data from 14 published papers in political science with 15 different LLMs (300 million annotations!). We compared them with the original annotations. We then re-ran the original analyses to see how much variation in coefficient estimates these LLMs give us.
20.10.2025 13:56
New paper: LLMs are increasingly used to label data in political science. But how reliable are these annotations, and what are the consequences for scientific findings? What are best practices? Some new findings from a large empirical evaluation.
Paper: eddieyang.net/research/llm_annotation.pdf