Mark Schaffer

@markeschaffer.bsky.social

Professor of Economics, Heriot-Watt University, Edinburgh

549 Followers  |  494 Following  |  32 Posts  |  Joined: 21.09.2023

Latest posts by markeschaffer.bsky.social on Bluesky

Goddamn. What a sobering and poignant piece of conceptual art. What an amazing artist.

06.08.2025 23:36 - 👍 3658    🔁 1454    💬 84    📌 95
Research examines impact of Great Irish Famine on survivors' height The study used the historical data of 14,500 individuals, with different exposures to famine conditions and drawn from two prisons in Dublin and Tipperary, born before, during and after the famine

Never thought I'd get quoted in the Mirror (the paper my grandmother still reads) www.irishmirror.ie/news/irish-n...

06.08.2025 20:46 - 👍 9    🔁 1    💬 1    📌 0

Academia is basically a collection of people who got lucky early on and mistook it for genius. doi.org/10.1073/pnas...

04.08.2025 11:26 - 👍 401    🔁 127    💬 15    📌 16
Commentary: The health crisis in the USSR: looking behind the facade My first visit to the USSR was in February 1981, by coincidence at exactly the same time that Nick Eberstadt published a commentary on 'The health crisis i

In a strange circularity, it seems that all the work I was doing in the early 1990s on concealment/distortion of statistics by USSR & DDR is becoming very relevant now

academic.oup.com/ije/article-...

01.08.2025 22:13 - 👍 335    🔁 116    💬 0    📌 5
Tweets from Lydia DePillis that read 

Great panel! I particularly appreciated acting BLS head Wiatrowski's response to one question: In light of this White House's disrespect for data as well as the very conservative background of its nominee for BLS Commissioner, how we would know if bias had crept into the process?

"I think the first way you would notice that would be if all the staff at the Postal Square building started walking out," Wiatrowski said, underscoring the intense professionalism these folks bring to their jobs. (Each panelist said nobody had tried to skew their output so far.)

These 7-year-old words from Wiatrowski, who is now acting commissioner of BLS after Trump fired commissioner McEntarfer today, are more important than ever.

01.08.2025 22:21 - 👍 494    🔁 106    💬 10    📌 4
These jobs range from coding the ideology of manifestos to identifying types of protest events to detecting certain political valences in speeches. The idea is to precisely calibrate exactly how replicable one can expect machines to be in practice, and where (what types of tasks) we can expect better (lower variance, more replication) or worse performance. The human workers provide a baseline comparison in terms of replicability. At a high level, the news is bad: while it is true that LMs can be (very) accurate relative to a gold standard, they also show considerable variance over time. And this is to say nothing of cases where they simply will not run at all, and thus fail the most basic requirement (see, e.g., Benureau and Rougier, 2018) of computational replication. Contrary to popular belief, the problems do not go away even if one sets "temperatures" (or equivalent tunings) to zero; indeed, this induces new but unpredictable problems with replication. Unsurprisingly, this variance affects the substantive answers we get downstream, that is, in subsequent analysis in which the labels

3.2 The Problem with Language Model Replication
The central problem with replication for Language Models is that, as we will show, the process exhibits the weaknesses of deterministic, stochastic and rule-based replication, without the strengths of any of them. To make this point clear, consider Table 1. There we document replication practices as a typology. What defines the typology is first, whether exact replication is possible; second, whether replication is fragile in the sense we discussed above.

With a 2x2 table showing that LLMs are not exactly replicable and are fragile

Full results for each outcome and run are displayed in Figures 8 to 10 in SI C. We also give descriptions of what we found. For now, we summarize our main observations:
1. For the manifestos, the crowdworkers perform very well (by LM standards) and their variance is generally lower than the LMs.
2. For the protests, crowdworkers are less accurate than the LMs, but very consistent in their performance.
3. Crowdworkers struggle in predictable ways: for example, they are least accurate when manifestos should have 'extreme' codings (far left/far right).
4. LMs struggle in unpredictable ways: for example, GPT made errors on more moderate (liberal) manifestos, but it is hard to know why.
5. Comparing across LMs, errors and performances appear to be idiosyncratic: for example, Llama has recall on some tasks on a par with GPT but generally much lower variance.
6. Open LMs have the best replication performance, at least in terms of low variance. For instance, on the static tasks, Llama has practically zero variance in its coding performance.

3. Consider open models that allow offline versioning. We found that, uniquely, our open-weights implementations were replicable to a high standard if that standard is low variance. That is, if the goal is something approaching the Deterministic 'code and data' replication vision above, then local, versioned models are the way to go. These may not deliver top-of-the-line performance (e.g. accuracy) but should be checked as a first resort. We acknowledge that an open LM may not be "transparent" in the sense that it is "easy" to understand how it produces predictions even if one has the weights. But it is obviously a boon to replication insofar as being able to verify that the original researcher did indeed see the results they reported. What is more, recent research into LM interpretability points the way toward more model understanding and control but only if weights are accessible (Cunningham et al., 2023).

Finally got to read this new paper by @cbarrie.bsky.social & @lexipalmer.bsky.social & Arthur Spirling on the lack of replicability in LLM-based research and polisci and it's so good and concise and well-reasoned! arthurspirling.org/documents/Ba...

31.07.2025 01:02 - 👍 37    🔁 10    💬 2    📌 2
shows such as "DILFS" and "Captured and Bound by my CEO"

other gems such as "Pucked and Pregnant" and "Tricked into Having My Ex-Husband's Baby"

Was doing some reading on the Chinese Microdrama phenomenon (it's mostly a site called ReelShort where you pay ~50 cents to watch 90-second long episodes of slightly pornier Hallmark movies) and I'm losing it at these show titles.

30.07.2025 21:41 - 👍 530    🔁 46    💬 56    📌 35

If you ever find yourself wondering why half of the internet posts and comments written by French native speakers contain more spelling mistakes than words, listen and see why in 20 seconds:

30.07.2025 10:17 - 👍 96    🔁 20    💬 12    📌 12

"The best way to learn is to teach"

So I set off trying to learn #python by adding some pythonic content to my overwhelmingly #rstats courses.

Here are my impressions so far.

24.07.2025 10:54 - 👍 33    🔁 5    💬 3    📌 2
Delusions on the Left and Labour Right I had it with writing about internal Labour party politics at the end of the 2010s, and have written very little on the subject since. Wri...

New post: Delusions on the Left and Labour Right
mainlymacro.blogspot.com/2025/07/delu...
In which I will upset nearly everyone by arguing that the Labour Party only works if it is a broad church that spans left to right, but only if the left does not have control.

29.07.2025 08:00 - 👍 68    🔁 25    💬 45    📌 32

How many knees does the AI allocate per picture?

More to the point, how many knees does the magazine's art director think it's normal to have?

28.07.2025 09:16 - 👍 108    🔁 22    💬 24    📌 1

With Tom Lehrer's passing, I suppose this is a moment to share the story of the prank he played on the National Security Agency, and how it went undiscovered for nearly 60 years.

27.07.2025 21:01 - 👍 8545    🔁 3605    💬 143    📌 715

Ooh this is a nice opportunity to plug our recent TICS article - see box 3 for differences in how human children and LLMs learn. www.cell.com/trends/cogni...

26.07.2025 19:06 - 👍 33    🔁 6    💬 0    📌 0

Researcher: "We let the data speak for itself."

Earlier that day:

02.01.2025 15:31 - 👍 8065    🔁 1028    💬 100    📌 70

People who find French hard are lucky they weren't born 1000 years ago.

Old French had noun cases - yes, like Latin! - but only two were left:

Li om voit le chien. (The man sees the dog.)
Li chiens voit l'ome. (The dog sees the man.)

My graphic tells you all about their origin and their demise:

25.07.2025 19:22 - 👍 208    🔁 41    💬 17    📌 5
Woman explains how the elderly who support Palestine at protests are terrorists.
YouTube video by Rosie Holt

Woman explains how the elderly who attend Pro Palestine protests are terrorists who must be stopped. youtu.be/iy3icAPWguo?...

25.07.2025 10:39 - 👍 564    🔁 170    💬 44    📌 34

Nothing to read for your summer holiday? Then have a look at

"Interactive, Grouped and Non-separable Fixed Effects: A Practitioner's Guide to the New Panel Data Econometrics"

The paper is available here:

papers.ssrn.com/sol3/papers....

25.07.2025 12:09 - 👍 3    🔁 1    💬 1    📌 0
Rachel Reeves mulls stepping in to save car loan providers billions Exclusive: Chancellor could overrule supreme court if it upholds entirety of ruling over commission paid to brokers

It is important to understand what this case is about, so as to follow why retrospective legislation letting the lenders off the hook is such a bad idea.

Part of the responsibility for this farce lies with the UK state.

/1
www.theguardian.com/business/202...

25.07.2025 07:28 - 👍 62    🔁 36    💬 4    📌 10

Martin Scorsese drew these storyboards when he was 11 years old. They are for a Roman epic entitled THE ETERNAL CITY.

24.07.2025 04:57 - 👍 1589    🔁 287    💬 30    📌 68
How archaeologists are missing Pleistocene cultures I propose a “Culture First” way of looking at ancient remains, instead of the “Culture Last” assumption so pervasive in the field.

Ignoring culture has been a major impediment to understanding the material record from deep time. But there are amazing prospects for changing this situation from Culture Last to Culture First.

www.johnhawks.net/p/how-archae...

24.07.2025 02:26 - 👍 29    🔁 10    💬 1    📌 1

top entertainment (you might hope ) i have not followed the qr code

23.07.2025 18:05 - 👍 13    🔁 4    💬 3    📌 6

Of course!

Here's the 2017 exam: s3.amazonaws.com/file.paulgp....

and here's 2024:

s3.amazonaws.com/file.paulgp....

23.07.2025 20:31 - 👍 9    🔁 1    💬 1    📌 2

Gift version of Goldin's NYT column

www.nytimes.com/2025/06/06/o...

22.07.2025 19:09 - 👍 64    🔁 17    💬 0    📌 1

My optimal setup as an applied microeconomist.

22.07.2025 13:18 - 👍 140    🔁 16    💬 2    📌 3
ChatGPT advises women to ask for lower salaries, study finds A new study has found that large language models (LLMs) like ChatGPT consistently advise women to ask for lower salaries than men.

Study finds A.I. LLMs advise women to ask for lower salaries than men. When prompted w/ a user profile of same education, experience & job role, differing only by gender, ChatGPT advised the female applicant to request $280K salary; Male applicant=$400K.
thenextweb.com/news/chatgpt...

20.07.2025 20:15 - 👍 1947    🔁 1031    💬 92    📌 340
The sexiest little cannon you will ever see pouts at the camera. For reasons known best to the inventor, the bore and body of the cannon is elliptical and they've gone the extra mile to add a sexy little cupid's bow to the muzzle.

It's only a brass model and I've no idea if the full size version was ever allowed to seal anything with a loving kiss.

This account is unashamedly francophilic, and has no truck with the Anglo prejudice that the French are obsessed with sex.

The official museum of the French army, on the other hand...

20.07.2025 21:17 - 👍 318    🔁 58    💬 24    📌 7
Workout Programs The most popular bodybuilding message boards!

Who else are you going to debate the eternal question "are there 7 days in a week, or 8?"

(thank goodness people archived this)

web.archive.org/web/20150105...

21.07.2025 00:20 - 👍 4    🔁 1    💬 2    📌 0

Very pleased that our local projections dif-in-dif paper is now out in the Journal of Applied Econometrics. Joint with @dgirardi.bsky.social, Jorda, and Taylor.

It's a tool that we think many applied economists will find useful (indeed many already have).

🧵

20.07.2025 03:07 - 👍 151    🔁 36    💬 1    📌 6

What does it say in the image?

It says 'minimum'.

Hard to read? Medieval scribes thought so too.

That's why they invented the dot on the i.

This way, you could at least see which strokes represented vowels - and that helped a lot.

For similar reasons, the letter j was invented. Two ... 1/

14.07.2025 21:01 - 👍 182    🔁 48    💬 7    📌 9
NEP/RePEc link to paper

Joint Quantile Shrinkage: A State-Space Approach toward Non-Crossing Bayesian Quantile Models: David Kohns; Tibor Szendrei

14.07.2025 21:45 - 👍 1    🔁 1    💬 0    📌 0

@markeschaffer is following 19 prominent accounts