Alexey Koshevoy @alexeykoshevoy

thanks a lot Martin!

05.10.2025 16:41 — 👍 0 🔁 0 💬 0 📌 0

thanks a lot Oleg!!

30.09.2025 13:09 — 👍 1 🔁 0 💬 0 📌 0

thanks Mason!!

26.09.2025 11:13 — 👍 0 🔁 0 💬 0 📌 0

thanks a lot Natalia!

25.09.2025 23:07 — 👍 0 🔁 0 💬 0 📌 0

thanks Jakub! everyone has enjoyed our paper

25.09.2025 17:07 — 👍 2 🔁 0 💬 1 📌 0

Huge congratulations to @alexeykoshevoy.bsky.social who defended his PhD thesis today!! with his co-supervisor @sblldtrch.bsky.social and jury members @simonkirby.bsky.social @gboleda.bsky.social Paula Rubio Fernandez & Benjamin Spector.

25.09.2025 16:42 — 👍 17 🔁 2 💬 5 📌 0

We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation". We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks. For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations. Then, we collect 13 million LLM annotations across plausible LLM configurations. These annotations feed into 1.4 million regressions testing the hypotheses. For a hypothesis with no true effect (ground truth $p > 0.05$), different LLM configurations yield conflicting conclusions. Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking -- incorrect conclusions due to annotation errors. Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models. Since minor configuration changes can flip scientific conclusions, from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.

🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

12.09.2025 10:33 — 👍 259 🔁 94 💬 5 📌 19

Image of labubu doll labeled labubu next to image of spiky labubu doll labeled lakiki

::slowly stands while clapping::

10.09.2025 23:07 — 👍 1014 🔁 257 💬 6 📌 11

Congrats, well deserved!

04.09.2025 14:56 — 👍 1 🔁 0 💬 0 📌 0

A global database on blowguns with links to geography and language | Evolutionary Human Sciences | Cambridge Core A global database on blowguns with links to geography and language - Volume 7

New paper! ⚡ With Gabriel Aguirre and Marcelo Sánchez, looking at patterns of blowgun types and use across societies of the world. We find areal patterns, similarities mediated by cultural connections, and specific types characterizing distinct branches of the Austronesian language tree. 🎯

27.08.2025 21:37 — 👍 22 🔁 8 💬 0 📌 0

I am on a 6 hour train journey without air conditioning, but it’s worth it because I am heading to #SLE2025! This is my first linguistics conference in a while.

25.08.2025 14:32 — 👍 5 🔁 0 💬 0 📌 0

Models as Prediction Machines: How to Convert Confusing Coefficients into Clear Quantities. Abstract: Psychological researchers usually make sense of regression models by interpreting coefficient estimates directly. This works well enough for simple linear models, but is more challenging for more complex models with, for example, categorical variables, interactions, non-linearities, and hierarchical structures. Here, we introduce an alternative approach to making sense of statistical models. The central idea is to abstract away from the mechanics of estimation, and to treat models as “counterfactual prediction machines,” which are subsequently queried to estimate quantities and conduct tests that matter substantively. This workflow is model-agnostic; it can be applied in a consistent fashion to draw causal or descriptive inference from a wide range of models. We illustrate how to implement this workflow with the marginaleffects package, which supports over 100 different classes of models in R and Python, and present two worked examples. These examples show how the workflow can be applied across designs (e.g., observational study, randomized experiment) to answer different research questions (e.g., associations, causal effects, effect heterogeneity) while facing various challenges (e.g., controlling for confounders in a flexible manner, modelling ordinal outcomes, and interpreting non-linear models).

Figure illustrating model predictions. On the X-axis the predictor, annual gross income in Euro. On the Y-axis the outcome, predicted life satisfaction. A solid line marks the curve of predictions on which individual data points are marked as model-implied outcomes at incomes of interest. Comparing two such predictions gives us a comparison. We can also fit a tangent to the line of predictions, which illustrates the slope at any given point of the curve.

A figure illustrating various ways to include age as a predictor in a model. On the x-axis age (predictor), on the y-axis the outcome (model-implied importance of friends, including confidence intervals). Illustrated are 1. age as a categorical predictor, resulting in the predictions bouncing around a lot with wide confidence intervals 2. age as a linear predictor, which forces a straight line through the data points that has a very tight confidence band and 3. age splines, which lie somewhere in between as they smoothly follow the data but have more uncertainty than the straight line.

Ever stared at a table of regression coefficients & wondered what you're doing with your life?

Very excited to share this gentle introduction to another way of making sense of statistical models (w @vincentab.bsky.social)
Preprint: doi.org/10.31234/osf...
Website: j-rohrer.github.io/marginal-psy...

25.08.2025 11:49 — 👍 941 🔁 283 💬 49 📌 19

Congrats!

07.07.2025 14:59 — 👍 1 🔁 0 💬 0 📌 0

Experimentology cover: title and curves for distributions.

Experimentology is out today!!! A group of us wrote a free online textbook for experimental methods, available at experimentology.io - the idea was to integrate open science into all aspects of the experimental workflow from planning to design, analysis, and writing.

01.07.2025 18:25 — 👍 534 🔁 228 💬 10 📌 15

Want to easily scrape data from news media sites?

There's an R package for that!

paperboy

"paperboy offers writers of web scrap[ers] a clear path to publish their code & earn co-authorship on the package, while deliver[ing] news media data from many websites in a consistent format."

26.06.2025 13:46 — 👍 122 🔁 35 💬 6 📌 2

For some reason bluesky doesn’t work on Firefox anymore, even after I updated it. Is it the case for anyone else?

11.06.2025 09:50 — 👍 0 🔁 0 💬 0 📌 0

The pivot penalty in research - Nature An analysis of millions of scientific papers and patents reveals a ‘pivot penalty’ when researchers shift direction, with the impact of studies decreasing rapidly the further they move from their prev...

:: (not really) Trigger warning::

Stay in your lane, or pay a career penalty

www.nature.com/articles/s41...

31.05.2025 04:49 — 👍 49 🔁 14 💬 9 📌 4

Language depends on copying (e.g. of words, signs). And language in turn is needed for many other things.

When and why did our ancestors gain this ability to copy? Our (Ron, Elisa & me) archaeological reanalysis says: in the last million years. Just out:

dx.doi.org/10.1093/oxfo...

26.05.2025 10:32 — 👍 44 🔁 16 💬 2 📌 0

Corpus-based approaches to evolutionary dynamics in language Abstract: Pragmatic-interactional aspects of present-day language use as well as historical language change have come to be regarded as an important source o

🚨New publication alert!
"Corpus-based approaches to evolutionary dynamics in language" (w/ @stefanhartmann.bsky.social) out now in the Oxford Handbook of Approaches to Language Evolution. Kudos to @limorraviv.bsky.social & @cedricboeckx.bsky.social for putting this great volume together!

25.05.2025 15:56 — 👍 25 🔁 9 💬 2 📌 1

A Natural Language Interface to ggplot2 The ggplot2 package is the state-of-the-art toolbox for creating and formatting graphs. However, it is easy to forget how certain formatting commands are named and sometimes users find themselves aski...

you know all gen 1 pokémon, of course you could.
reminded of brandmaier.github.io/ggx/

23.05.2025 10:19 — 👍 12 🔁 1 💬 1 📌 0

How are humans able to make sense of time? Not with special biology but with “time tools”—ideas, practices, and artifacts that render time more concrete.

My new paper explores this vast, varied toolkit—one that makes use of knots, nuts, hands, flowers, mountains, shadows, and much more.

(link 👇)

02.05.2025 16:51 — 👍 89 🔁 34 💬 3 📌 1

Out now in @pnas.org! 🌹Is a rose by any other name still as roselike?🌹

We study the prevalence of iconicity (does a word look/sound like what it means?) and systematicity (are pronunciation/meaning relationships shared across multiple words?) in large datasets of ASL, English, and Spanish.

🧵1/N

23.04.2025 17:47 — 👍 18 🔁 9 💬 1 📌 3

Knowledge transmission, culture and the consequences of social disruption in wild elephants | Philosophical Transactions of the Royal Society B: Biological Sciences Cultural knowledge is widely presumed to be important for elephants. In all three elephant species, individuals tend to congregate around older conspecifics, creating opportunities for social transmission. However, direct evidence of social learning and ...

Knowledge transmission, culture and the consequences of social disruption in wild elephants royalsocietypublishing.org/doi/10.1098/...

05.05.2025 12:28 — 👍 17 🔁 11 💬 0 📌 1

Carbon majors and the scientific case for climate liability - Nature A transparent and reproducible scientific framework is introduced to formalize how trillions in economic losses are attributable to the extreme heat caused by emissions from fossil fuel companies, whi...

"Emissions linked to Chevron, the highest-emitting investor-owned company in our data, for example, very likely caused between US $791 billion and $3.6 trillion in heat-related losses over the period 1991–2020"

www.nature.com/articles/s41...

23.04.2025 19:37 — 👍 259 🔁 145 💬 6 📌 13

Apply for our PhD position in language acquisition / computational linguistics in Groningen until 24 April! Job ad is here:
www.rug.nl/about-ug/wor...

26.03.2025 17:22 — 👍 5 🔁 4 💬 0 📌 1

Psych-DS A specification for psychological datasets. JSON metadata, predictable directory structure, and machine-readable specifications for tabular datasets.

Psych-DS is (1) spellcheck for your datasets and (2) a pathway to standardizing data in our academic fields that *everyone* can learn.

And it's live RIGHT NOW!

psych-ds.github.io

(This is the announcement post I've been leading up to)

09.04.2025 19:37 — 👍 133 🔁 60 💬 9 📌 12

A Dataset on Linguistic Connectivity Across and Within Countries - Scientific Data Scientific Data - A Dataset on Linguistic Connectivity Across and Within Countries

This looks useful -- A Dataset on Linguistic Connectivity Across and Within Countries
#linguistics #languages
www.nature.com/articles/s41...

08.04.2025 20:33 — 👍 18 🔁 5 💬 1 📌 0

New paper on misperceptions out in PNAS @pnas.org

www.pnas.org/doi/10.1073/...

Why do people overestimate the size of politically relevant groups (immigrant, LGBTQ, Jewish) and quantities (% of budget spent on foreign aid, % of refugees that are criminals)?🧵👇

07.04.2025 12:00 — 👍 270 🔁 98 💬 12 📌 21

Extensive compositionality in the vocal system of bonobos Compositionality, the capacity to combine meaningful elements into larger meaningful structures, is a hallmark of human language. Compositionality can be trivial (the combination’s meaning is the sum ...

Game changing study in @science.org by @berthetmelissa.bsky.social and co.

www.science.org/doi/10.1126/...

03.04.2025 21:50 — 👍 17 🔁 9 💬 1 📌 2

Latest posts by alexeykoshevoy.bsky.social on Bluesky