Oleg Sobchuk 🇺🇦

@sobchuk.bsky.social

I use big data to research the cultural evolution of arts @ Max Planck Institute for Evolutionary Anthropology More: https://www.sobch.uk/

1,649 Followers  |  854 Following  |  540 Posts  |  Joined: 18.03.2023  |  2.0612

Latest posts by sobchuk.bsky.social on Bluesky

Why does Western Paleolithic cave art strongly prefer animal side views and often use abbreviations? Our new paper in Topics in Cognitive Science challenges long-held assumptions about these artistic choices using cognitive science experiments. A thread 1/n
onlinelibrary.wiley.com/doi/10.1111/...

15.09.2025 14:41 – 👍 71    🔁 21    💬 1    📌 0

Congratulations from me too 🥳 🎓

26.09.2025 23:15 – 👍 1    🔁 0    💬 1    📌 0
Assistant Professor - Information - School of Information University of California, Berkeley is hiring. Apply now!

The UC Berkeley School of Information is hiring an assistant professor in the broad field of Information--including areas of info seeking/retrieval, digital humanities, cultural analytics, info viz, & philosophy of information (among others). Deadline Nov 1! aprecruit.berkeley.edu/JPF05014

23.09.2025 14:43 – 👍 75    🔁 74    💬 1    📌 1

I agree. One of my favorite movies. I learned about it from Zizek's Pervert's Guide to Cinema

21.09.2025 22:42 – 👍 1    🔁 0    💬 0    📌 0
From Questions to Knowledge I am delighted to announce the publication of my statistics and data analysis book, ‘From Questions to Knowledge: Data Analysis for Psychology and Behavioural Science Using R’. This b…

Delighted to announce the publication of 'From Questions to Knowledge', my new and updated statistics and data analysis handbook.
www.danielnettle.eu/2025/09/14/f...

14.09.2025 17:41 – 👍 10    🔁 5    💬 0    📌 0

Cool new paper (and a thread about it) by @babeheim.bsky.social on the cultural evolution of Go games! Check out these colourful decision trees 🔥 doi.org/10.1017/ehs....

16.09.2025 15:02 – 👍 13    🔁 3    💬 0    📌 1
We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation".
We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks.
For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations.
Then, we collect 13 million LLM annotations across plausible LLM configurations.
These annotations feed into 1.4 million regressions testing the hypotheses. 
For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions.
Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking -- incorrect conclusions due to annotation errors.
Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models.
Since minor configuration changes can flip scientific conclusions, from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.

🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

12.09.2025 10:33 – 👍 259    🔁 94    💬 5    📌 19
Front cover of the Oxford Handbook of Cultural Evolution

The chapter "Evolution of Modern Literature and Film" in this volume

My copy of the Oxford Handbook of Cultural Evolution has arrived! Looks gorgeous – and massive. And somewhere on page 554 (which is roughly the middle of the book 😅) you can find my chapter.

Big thanks to the editors for organizing this!

15.09.2025 15:18 – 👍 22    🔁 1    💬 0    📌 1
How competition propels scientific risk-taking
Kevin Gross∗
Department of Statistics
North Carolina State University
Raleigh, NC USA
Carl T. Bergstrom†
Department of Biology
University of Washington
Seattle, WA USA
(Dated: September 9, 2025)
In science as elsewhere, attention is a limited resource and scientists compete with one another
to produce the most exciting, novel and impactful results. We develop a game-theoretic model to
explore how such competition influences the degree of risk that scientists are willing to embrace in
their research endeavors. We find that competition for scarce resources – for example, publications
in elite journals, prestigious prizes, and faculty jobs – motivates scientific risk-taking and may be
important in counterbalancing other incentives that favor cautious, incremental science. Even small
amounts of competition induce substantial risk-taking. Moreover, we find that in an “opt-in” contest,
increasing the stakes induces increased participation – which crowds the contest and further impels
entrants to pursue higher-risk, higher-return investigations. The model also illuminates a source of
tension in academic training and collaboration. Researchers at different career stages differ in their
need to amass accomplishments that distinguish them from their peers, and therefore may not agree
on what degree of risk to accept.

1. What does a Cold War-era game theory problem known as the silent duel have to do with high-risk research strategies, publication in Cell/Nature/Science glamor journals, and the academic job market?

Kevin Gross and I tackle these questions in our latest arXiv preprint: arxiv.org/abs/2509.06718

14.09.2025 13:49 – 👍 176    🔁 53    💬 5    📌 4
The countdown starts for #CESRabat! Follow @ces2026.bsky.social and join us May 11-13 next year for an exciting meeting in Rabat, Morocco.

Massive thanks to the #CESRabat organising committee:
Sarah Alami (co-chair)
Mathieu Charbonneau (co-chair)
Zachary Garfield
Edmond Seabright

13.09.2025 03:14 – 👍 61    🔁 45    💬 2    📌 3
Is it getting harder to make a hit? Evidence from 65 years of US music chart history - EPJ Data Science Since the creation of the Billboard Hot 100 music chart in 1958, the chart has been a window into the music consumption of Americans. Since its introduction, the chart has documented music consumption...

"Is it getting harder to make a hit?" – asks this new paper. Short answer: yes.

"The inequality in song lifetimes has increased rapidly since the turn of the millennium. Top-10 songs are lingering on the chart for longer and longer"

epjdatascience.springeropen.com/articles/10....

11.09.2025 15:13 – 👍 4    🔁 0    💬 0    📌 1
For decades, linguists assumed kids drive language change through ‘imperfect’ learning. New research by Raviv, Blasi & Kempe (Psychological Review) shows that instead, adolescents and young adults are more likely to spread, normalize, and cement linguistic shifts. www.mpi.nl/news/young-c...

09.09.2025 10:45 – 👍 31    🔁 16    💬 0    📌 7

Haha, thanks Olivier!

04.09.2025 14:50 – 👍 0    🔁 0    💬 0    📌 0

Thank you, Monika!

04.09.2025 12:15 – 👍 1    🔁 0    💬 0    📌 0
Charting the evolution of European literature Oleg Sobchuk has received an ERC Starting Grant to study 200 years of European literary evolution

Great news from the @erc.europa.eu today: I received an ERC Starting Grant to study the cultural evolution of literature 🔥

www.eva.mpg.de/press/news/a...

04.09.2025 11:45 – 👍 121    🔁 10    💬 31    📌 3
Figure 6. Map showing projectile types and the use of toxins in the world sample (a). Grey points represent unavailable data. Poison is only used in association with darts within this data set (not in pellets). The term ‘darts’ does not exclude an association with the use of poison, but may rather reflect a lack of information. The eastern USA is generally believed to use darts without toxins, but this has never been systematically studied and is therefore regarded as ambiguous. Our results show that some North American groups are reported to use toxins. The two ‘hotspots’ for blowguns are located in South East Asia (b) and South America (c). Darts are much more prevalent than pellets and pellets are more strongly associated with the ‘single’ type (d).

Humans are in fact a venomous species (Figure 6 from "A global database on blowguns with links to geography and language" Aguirre-Fernández et al. doi.org/10.1017/ehs....)

29.08.2025 12:55 – 👍 35    🔁 5    💬 1    📌 1
An exploration of basic human values in 38 million obituaries over 30 years | PNAS How societies remember the dead can reveal what people value in life. We analyzed 38 million obituaries from the United States to examine how perso...

🪦 New in @pnas.org: we analyzed 38 million U.S. obituaries to ask what signals a life well lived:

What values are people most remembered for?

How do legacies shift with cultural events?

How do age and gender shape what it means to have lived well?

www.pnas.org/doi/10.1073/...

27.08.2025 02:39 – 👍 36    🔁 12    💬 3    📌 1

Substance was amazing in theatre (some people had very loud reactions to it, some left the room), but I'm a fan of body horror and enjoyed all of it: well written, well shot, tense and humorous at the same time. A lot of body horror, but never becomes cruel or disturbing

25.08.2025 17:27 – 👍 1    🔁 0    💬 0    📌 0

My favorites from recent years: How to Blow Up a Pipeline, Substance, Companion, Bones and All, Anatomy of a Fall, Zone of Interest

25.08.2025 16:32 – 👍 1    🔁 0    💬 1    📌 0
Models as Prediction Machines: How to Convert Confusing Coefficients into Clear Quantities

Abstract
Psychological researchers usually make sense of regression models by interpreting coefficient estimates directly. This works well enough for simple linear models, but is more challenging for more complex models with, for example, categorical variables, interactions, non-linearities, and hierarchical structures. Here, we introduce an alternative approach to making sense of statistical models. The central idea is to abstract away from the mechanics of estimation, and to treat models as “counterfactual prediction machines,” which are subsequently queried to estimate quantities and conduct tests that matter substantively. This workflow is model-agnostic; it can be applied in a consistent fashion to draw causal or descriptive inference from a wide range of models. We illustrate how to implement this workflow with the marginaleffects package, which supports over 100 different classes of models in R and Python, and present two worked examples. These examples show how the workflow can be applied across designs (e.g., observational study, randomized experiment) to answer different research questions (e.g., associations, causal effects, effect heterogeneity) while facing various challenges (e.g., controlling for confounders in a flexible manner, modelling ordinal outcomes, and interpreting non-linear models).

Figure illustrating model predictions. On the X-axis the predictor, annual gross income in Euro. On the Y-axis the outcome, predicted life satisfaction. A solid line marks the curve of predictions on which individual data points are marked as model-implied outcomes at incomes of interest. Comparing two such predictions gives us a comparison. We can also fit a tangent to the line of predictions, which illustrates the slope at any given point of the curve.

A figure illustrating various ways to include age as a predictor in a model. On the x-axis age (predictor), on the y-axis the outcome (model-implied importance of friends, including confidence intervals).

Illustrated are 
1. age as a categorical predictor, resulting in predictions that bounce around a lot with wide confidence intervals,
2. age as a linear predictor, which forces a straight line through the data points that has a very tight confidence band, and
3. age splines, which lie somewhere in between: they follow the data smoothly but have more uncertainty than the straight line.

Ever stared at a table of regression coefficients & wondered what you're doing with your life?

Very excited to share this gentle introduction to another way of making sense of statistical models (w @vincentab.bsky.social)
Preprint: doi.org/10.31234/osf...
Website: j-rohrer.github.io/marginal-psy...

25.08.2025 11:49 – 👍 941    🔁 283    💬 49    📌 19
I happened to accidentally find this edited volume on quantitative approaches to literature, from 1969. Never heard of it - even though it was co-edited by one of the big figures in narratology: Lubomir Doležel (another surprise). The minimalist plots here are ⚡️

25.08.2025 13:23 – 👍 15    🔁 1    💬 0    📌 1
🧠 Want to integrate cultural evolution into your course using award winning materials created by the field's experts, and get paid $2000 to do it? 💵

🚨 The Cultural Evolution Society is seeking applications for the ACE Teaching Innovation Awards.

🔖 Apply here: vuw.qualtrics.com/jfe/form/SV_...

21.08.2025 15:15 – 👍 18    🔁 23    💬 1    📌 2
Overused Academic Book Cover Art (with images, tweets) · DrLeonJ In March 2014, I snagged a copy of *Transatlantic Literary Studies.* As soon as I had unwrapped it, I noted that it had the same cover art as Michael Kammen's *A Season of Youth.* It got me to wonderi...

I was looking through an old laptop and came across a thing on overused academic book cover art I put together years ago that I thought might raise a chuckle or two 🗃️

web.archive.org/web/20180222...

05.08.2025 13:32 – 👍 55    🔁 7    💬 7    📌 3
Unmasking the Denisovans The Harbin cranium, linked to Denisovans via mitochondrial DNA, broadens their known range and provides the first insights into Denisovan morphology. …

Vanessa Villalba-Mouco and I wrote a little preview piece for Cell on the new mtDNA and proteomics results from the Harbin skull. It is free for 50 days at this link: www.sciencedirect.com/science/arti...

24.07.2025 21:26 – 👍 24    🔁 7    💬 1    📌 0
RStudio + SageMaker + LLMs? Yes! 🚀

Discover how to seamlessly integrate Amazon Bedrock LLMs using the gander R package – all within a managed RStudio environment on AWS.

Say goodbye to on-prem headaches!

Check out the blog post here: tinyurl.com/yv5ndpzp

10.07.2025 17:41 – 👍 9    🔁 3    💬 0    📌 0
Book cover by Simon Kirby

Delighted to announce the publication of a collaborative effort, co-led by @limorraviv.bsky.social @mpi-nl.bsky.social, showcasing the ways in which researchers have made language evolution an empirical issue: A handbook of experimental approaches to the fascinating problem of language evolution 🧪

27.05.2025 05:56 – 👍 130    🔁 48    💬 3    📌 3
Learn Stan with brms, Part I | A. Solomon Kurz y ~ 1

New #rstats blog up!

solomonkurz.netlify.app/blog/2025-07...

This is the first in a brief series where we use {brms} to learn {Stan} code.

Many thanks to @fusaroli.bsky.social and @stephenjwild.bsky.social for their helpful reviews.

07.07.2025 15:05 – 👍 105    🔁 36    💬 2    📌 1

I love the visual style of these! Many great ideas, some of which I may steal :D

06.07.2025 22:39 – 👍 1    🔁 0    💬 1    📌 0

Put your manuscript about computational poetics in the trustworthy hands of @artjomshl.bsky.social

05.07.2025 20:49 – 👍 8    🔁 3    💬 0    📌 0

Super cool. Congratulations!

30.06.2025 08:15 – 👍 1    🔁 0    💬 0    📌 0
