
Eddie Yang

@eddieyang.bsky.social

102 Followers  |  136 Following  |  33 Posts  |  Joined: 20.09.2023

Latest posts by eddieyang.bsky.social on Bluesky


Post image

New paper out at AJPS: "The limits of AI for authoritarian control." The more repression there is, the less information exists in AI's training data, and the worse the AI performs.

19.02.2026 19:48 | 👍 10  🔁 1  💬 0  📌 0
Preview
localLLM - Local Large Language Models in R: Run LLMs locally using the llama.cpp backend, with comprehensive support for text generation, annotation, and reliability metrics.

7/
See www.eddieyang.net/software/loc... for a more detailed tutorial and all other functionalities.

We're really hoping to make this a useful tool for people, so please let us know if you have suggestions!

17.02.2026 22:35 | 👍 0  🔁 0  💬 0  📌 0

6/
Getting started in just three lines:

install.packages("localLLM")
library(localLLM)
install_localLLM()

Then:

quick_llama("Classify this tweet as Positive or Negative...")

That's it. No Python dependency, no API keys.
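
For a more realistic workflow, here is a minimal sketch of batch annotation with quick_llama(). The toy tweets, the prompt wording, and the sapply() loop are mine for illustration, not from the package docs; see ?quick_llama for the actual interface (it may also accept a vector of prompts directly).

library(localLLM)

# Toy tweets to annotate (illustrative data)
tweets <- c(
  "Loving the new transit line, so fast!",
  "Stuck in delays again. Unbelievable."
)

# Build one classification prompt per tweet and annotate each locally
labels <- sapply(tweets, function(x) {
  quick_llama(paste0("Classify this tweet as Positive or Negative. Tweet: ", x))
})

labels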

17.02.2026 22:35 | 👍 0  🔁 0  💬 1  📌 0
Post image Post image

5/
NEW: validate() - one-shot evaluation of your annotation results.

Computes confusion matrices (vs. gold standard) and intercoder reliability (Cohen's Kappa, Krippendorff's Alpha) in a single function.
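
A sketch of what a call might look like on toy labels; the argument names and order here are assumptions, not the documented signature (see ?validate):

# Toy example: LLM labels vs. a gold standard (argument names are assumptions)
llm_labels  <- c("Pos", "Neg", "Neg", "Pos", "Neg", "Pos")
gold_labels <- c("Pos", "Neg", "Pos", "Pos", "Neg", "Neg")

validate(llm_labels, gold_labels)   # confusion matrix + Kappa / Alpha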

17.02.2026 22:35 | 👍 0  🔁 0  💬 1  📌 0
Post image

4/
NEW: explore() - compare and explore multiple LLMs side by side in one call.

Pass a list of models and prompts. Get back annotations from each LLM.
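
Roughly, usage might look like the sketch below; the model identifiers are placeholders and the argument names are assumptions (see ?explore):

# Sketch only: model names are placeholders, argument names are assumptions
models  <- c("Llama-3.2-3B-Instruct", "Qwen2.5-3B-Instruct")
prompts <- c("Classify this tweet as Positive or Negative...")

results <- explore(models = models, prompts = prompts)
results   # annotations from each LLM, side by side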

17.02.2026 22:35 | 👍 0  🔁 0  💬 1  📌 0
Post image Post image

3/
NEW: Auto-documentation for reproducibility.

Wrap your analysis in document_start() / document_end() and localLLM automatically logs timestamps, model metadata, decoding parameters, and run summaries to a text report with SHA-256 hashes for all input and output.
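
In use it might look like this sketch; whether document_start() takes a report path as an argument is my assumption (see the package docs):

document_start("annotation_report.txt")   # report path argument is an assumption

labels <- quick_llama("Classify this tweet as Positive or Negative...")

document_end()   # writes timestamps, model metadata, parameters, SHA-256 hashes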

17.02.2026 22:35 | 👍 0  🔁 0  💬 1  📌 0
Post image

2/
By default, localLLM ensures that LLM output is reproducible, even when temperature > 0.
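
Presumably this is achieved by fixing the sampling seed by default. A sketch of how one might check it; the temperature and seed argument names are assumptions (see ?quick_llama):

# Sketch only: argument names are assumptions
out1 <- quick_llama("Summarize this paragraph...", temperature = 0.7, seed = 42)
out2 <- quick_llama("Summarize this paragraph...", temperature = 0.7, seed = 42)
identical(out1, out2)   # should be TRUE if sampling is seeded deterministically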

17.02.2026 22:35 | 👍 0  🔁 0  💬 1  📌 0
Preview
localLLM: Running Local LLMs with 'llama.cpp' Backend Provides R bindings to the 'llama.cpp' library for running large language models. The package uses a lightweight architecture where the C++ backend library is downloaded at runtime rather than bundled...

We've updated the localLLM package on CRAN (cran.r-project.org/package=loca...).

It allows you to run LLMs locally and natively in R. Everything is reproducible. And it's free.

Some of the functionality for reproducibility and validation 🧵👇

17.02.2026 22:35 | 👍 0  🔁 1  💬 1  📌 0
Post image

Russia, Venezuela, Iran, China, the Sahel region, the United States ...

Want to know why state agents carry out brutal repression, or participate in illegal coups?

Our new book "Making a Career in Dictatorship" provides answers. It was just published by @academic.oup.com:

tinyurl.com/ystwm3tf

16.02.2026 11:09 | 👍 114  🔁 61  💬 10  📌 9
Post image Post image

Why the slow uptake in political science?

02.02.2026 17:45 | 👍 1  🔁 0  💬 0  📌 0

So, randomization is not a *sufficient* condition for good research. Far from it.

The best experimental social science being done is work where either the theory or the operationalization (or both) is the emphasis.

Randomization is "easy" - the challenge is what you randomize and why.

21.12.2025 13:50 | 👍 23  🔁 3  💬 3  📌 1

Can Large Multimodal Models (LMMs) extract features of urban neighborhoods from street-view images?

New with Paige Bollen (OSU) and @joehigton.bsky.social (NYU): Sometimes, but the models recover national assessments better than local ones, even with additional prompting (which can make things worse!)

11.12.2025 20:32 | 👍 22  🔁 5  💬 1  📌 2
Preview
localLLM: Running Local LLMs with 'llama.cpp' Backend The 'localLLM' package provides R bindings to the 'llama.cpp' library for running large language models. The package uses a lightweight architecture where the C++ backend library is downloaded at runt...

We also developed a new R package, localLLM (cran.r-project.org/package=loca...), that enables reproducible annotation using LLMs directly in R. More functionalities to follow!

20.10.2025 13:56 | 👍 2  🔁 0  💬 0  📌 0
Post image

Based on these findings (and more in the paper), we offer recommendations for best practices. We also summarized the recommendations in a checklist to facilitate a more principled procedure.

20.10.2025 13:56 | 👍 1  🔁 0  💬 1  📌 0
Post image

Finding 4: Bias-correction methods like DSL can reduce bias, but they introduce a trade-off: corrected estimates often have larger standard errors, requiring a large ground-truth sample (600-1000+) to be beneficial without losing too much precision.
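
For intuition, design-based corrections in this family combine the LLM labels with a small random gold-standard subsample. A stylized version (my summary, not the exact DSL estimator from the paper) of the corrected mean of an outcome Y is:

\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{Y}_i + \frac{R_i}{\pi_i} \left( Y_i - \hat{Y}_i \right) \right]

where \hat{Y}_i is the LLM label, Y_i the gold-standard label (observed only when R_i = 1), and \pi_i the probability of being gold-labeled. The correction term removes the bias from imperfect LLM labels, but its variance grows when few units are gold-labeled, which is the precision trade-off described above.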

20.10.2025 13:56 | 👍 0  🔁 0  💬 1  📌 0
Post image

Finding 3: In-context learning (providing a few annotated examples in the prompt) offers only marginal improvements in reliability, with benefits plateauing quickly. Changes to prompt format have a small effect (smaller models and reasoning models are more sensitive).

20.10.2025 13:56 | 👍 0  🔁 0  💬 1  📌 0
Post image

Finding 2: This disagreement has significant downstream consequences. Re-running the original analyses with LLM annotations produced highly variable coefficient estimates, often altering the conclusions of the original studies.

20.10.2025 13:56 | 👍 1  🔁 0  💬 1  📌 0
Post image

There is also an interesting linear relationship between LLM-human and LLM-LLM annotation agreement: when LLMs agree more with each other, they also tend to agree more with humans and supervised models! We gave some suggestions on what annotation tasks are good for LLMs.

20.10.2025 13:56 | 👍 1  🔁 0  💬 1  📌 0
Post image

Finding 1: LLM annotations show fairly low intercoder reliability with the original annotations (coded by humans or supervised models). Perhaps surprisingly, reliability among the different LLMs themselves is only moderate (larger models do better).
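
For readers who want to compute these agreement statistics themselves, here is a toy sketch using the general-purpose irr package (an illustration, not the paper's replication code); 1 = Positive, 0 = Negative:

library(irr)   # install.packages("irr") if needed

# Toy labels for 6 documents from an LLM and the original (human) coder
ratings <- cbind(
  llm   = c(1, 0, 0, 1, 0, 1),
  human = c(1, 0, 1, 1, 0, 0)
)

kappa2(ratings)                              # Cohen's Kappa
kripp.alpha(t(ratings), method = "nominal")  # Krippendorff's Alpha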

20.10.2025 13:56 | 👍 0  🔁 0  💬 1  📌 0

The LLM annotations also allowed us to present results on:
1. effectiveness of in-context learning
2. model sensitivity to changes in prompt format
3. bias-correction methods

20.10.2025 13:56 | 👍 0  🔁 0  💬 1  📌 0

We re-annotated data from 14 published papers in political science with 15 different LLMs (300 million annotations!). We compared them with the original annotations. We then re-ran the original analyses to see how much variation in coefficient estimates these LLMs give us.

20.10.2025 13:56 | 👍 1  🔁 0  💬 1  📌 0
Post image

New paper: LLMs are increasingly used to label data in political science. But how reliable are these annotations, and what are the consequences for scientific findings? What are best practices? Some new findings from a large empirical evaluation.
Paper: eddieyang.net/research/llm_annotation.pdf

20.10.2025 13:56 | 👍 40  🔁 14  💬 1  📌 5
