Goddamn. What a sobering and poignant piece of conceptual art. What an amazing artist.
06.08.2025 23:36
@markeschaffer.bsky.social
Professor of Economics, Heriot-Watt University, Edinburgh
Never thought I'd get quoted in the Mirror (the paper my grandmother still reads) www.irishmirror.ie/news/irish-n...
06.08.2025 20:46
Academia is basically a collection of people who got lucky early on and mistook it for genius. doi.org/10.1073/pnas...
04.08.2025 11:26
In a strange circularity, it seems that all the work I was doing in the early 1990s on concealment/distortion of statistics by USSR & DDR is becoming very relevant now
academic.oup.com/ije/article-...
Tweets from Lydia DePillis that read Great panel! I particularly appreciated acting BLS head Wiatrowski's response to one question: In light of this White House's disrespect for data as well as the very conservative background of its nominee for BLS Commissioner, how we would know if bias had crept into the process? "I think the first way you would notice that would be if all the staff at the Postal Square building started walking out," Wiatrowski said, underscoring the intense professionalism these folks bring to their jobs. (Each panelist said nobody had tried to skew their output so far.)
These 7 year old words from Wiatrowski, who is now acting commissioner of BLS after Trump fired commissioner McEntarfer today, are more important than ever.
01.08.2025 22:21
These jobs range from coding the ideology of manifestos to identifying types of protest events to detecting certain political valences in speeches. The idea is to precisely calibrate exactly how replicable one can expect machines to be in practice, and where (what types of tasks) we can expect better (lower variance, more replication) or worse performance. The human workers provide a baseline comparison in terms of replicability. At a high level, the news is bad: while it is true that LMs can be (very) accurate relative to a gold standard, they also show considerable variance over time. And this is to say nothing of cases where they simply will not run at all, and thus fail the most basic requirement (see, e.g., Benureau and Rougier, 2018) of computational replication. Contrary to popular belief, the problems do not go away even if one sets "temperatures" (or equivalent tunings) to zero; indeed, this induces new but unpredictable problems with replication. Unsurprisingly, this variance affects the substantive answers we get downstream, that is, in subsequent analysis in which the labels
3.2 The Problem with Language Model Replication
The central problem with replication for Language Models is that, as we will show, the process exhibits the weaknesses of deterministic, stochastic and rule-based replication, without the strengths of any of them. To make this point clear, consider Table 1. There we document replication practices as a typology. What defines the typology is first, whether exact replication is possible; second, whether replication is fragile in the sense we discussed above.
With a 2x2 table showing that LLMs are not exactly replicable and are fragile
Full results for each outcome and run are displayed in Figures 8 to 10 in SI C. We also give descriptions of what we found. For now, we summarize our main observations: 1. For the manifestos, the crowdworkers perform very well (by LM standards) and their variance is generally lower than the LMs. 2. For the protests, crowdworkers are less accurate than the LMs, but very consistent in their performance. 3. Crowdworkers struggle in predictable ways: for example, they are least accurate when manifestos should have 'extreme' codings (far left/far right). 4. LMs struggle in unpredictable ways: for example, GPT made errors on more moderate (liberal) manifestos, but it is hard to know why. 5. Comparing across LMs, errors and performance appear to be idiosyncratic: for example, Llama has recall on some tasks on a par with GPT but generally much lower variance. 6. Open LMs have the best replication performance, at least in terms of low variance. For instance, on the static tasks, Llama has practically zero variance in its coding performance.
3. Consider open models that allow offline versioning. We found that, uniquely, our open-weights implementations were replicable to a high standard if that standard is low variance. That is, if the goal is something approaching the Deterministic 'code and data' replication vision above, then local, versioned models are the way to go. These may not deliver top of the line performance (e.g. accuracy) but should be checked as a first resort. We acknowledge that an open LM may not be "transparent" in the sense that it is "easy" to understand how it produces predictions even if one has the weights. But it is obviously a boon to replication insofar as being able to verify that the original researcher did indeed see the results they reported. What is more, recent research into LM interpretability points the way toward more model understanding and control but only if weights are accessible (Cunningham et al., 2023).
Finally got to read this new paper by @cbarrie.bsky.social & @lexipalmer.bsky.social & Arthur Spirling on the lack of replicability in LLM-based research and polisci and it's so good and concise and well-reasoned! arthurspirling.org/documents/Ba...
31.07.2025 01:02
shows such as "DILFS" and "Captured and Bound by my CEO"
other gems such as "Pucked and Pregnant" and "Tricked into Having My Ex-Husband's Baby"
Was doing some reading on the Chinese Microdrama phenomenon (it's mostly a site called ReelShort where you pay ~50 cents to watch 90-second long episodes of slightly pornier Hallmark movies) and I'm losing it at these show titles.
30.07.2025 21:41
If you ever find yourself wondering why half of the internet posts and comments written by French native speakers contain more spelling mistakes than words, listen and see why in 20 seconds:
30.07.2025 10:17
"The best way to learn is to teach"
So I set off trying to learn #python by adding some pythonic content to my overwhelmingly #rstats courses.
Here are my impressions so far.
New post: Delusions on the Left and Labour Right
mainlymacro.blogspot.com/2025/07/delu...
In which I will upset nearly everyone by arguing that the Labour Party only works if it is a broad church that spans left to right, but only if the left does not have control.
How many knees does the AI allocate per picture?
More to the point, how many knees does the magazine's art director think it's normal to have?
With Tom Lehrer's passing, I suppose this is a moment to share the story of the prank he played on the National Security Agency, and how it went undiscovered for nearly 60 years.
27.07.2025 21:01
Ooh this is a nice opportunity to plug our recent TICS article - see box 3 for differences in how human children and LLMs learn. www.cell.com/trends/cogni...
26.07.2025 19:06
Researcher: "We let the data speak for itself."
Earlier that day:
People who find French hard are lucky they weren't born 1000 years ago.
Old French had noun cases - yes, like Latin! - but only two were left:
Li om voit le chien. (The man sees the dog.)
Li chiens voit l'ome. (The dog sees the man.)
My graphic tells you all about their origin and their demise:
Woman explains how the elderly who attend Pro Palestine protests are terrorists who must be stopped. youtu.be/iy3icAPWguo?...
25.07.2025 10:39
Nothing to read for your summer holiday? Then have a look at
"Interactive, Grouped and Non-separable Fixed Effects: A Practitioner's Guide to the New Panel Data Econometrics"
The paper is available here:
papers.ssrn.com/sol3/papers....
It is important to understand what this case is about, so as to follow why retrospective legislation letting the lenders off the hook is such a bad idea.
Part of the responsibility for this farce lies with the UK state.
/1
www.theguardian.com/business/202...
Martin Scorsese drew these storyboards when he was 11 years old. They are for a Roman epic entitled THE ETERNAL CITY.
24.07.2025 04:57
Ignoring culture has been a major impediment to understanding the material record from deep time. But there are amazing prospects for changing this situation from Culture Last to Culture First.
www.johnhawks.net/p/how-archae...
top entertainment (you might hope); i have not followed the qr code
23.07.2025 18:05
Of course!
Here's the 2017 exam: s3.amazonaws.com/file.paulgp....
and here's 2024:
s3.amazonaws.com/file.paulgp....
Gift version of Goldin's NYT column
www.nytimes.com/2025/06/06/o...
My optimal setup as an applied microeconomist.
22.07.2025 13:18
Study finds A.I. LLMs advise women to ask for lower salaries than men. When prompted w/ a user profile of same education, experience & job role, differing only by gender, ChatGPT advised the female applicant to request $280K salary; Male applicant=$400K.
thenextweb.com/news/chatgpt...
The sexiest little cannon you will ever see pouts at the camera. For reasons known best to the inventor, the bore and body of the cannon is elliptical and they've gone the extra mile to add a sexy little cupid's bow to the muzzle. It's only a brass model and I've no idea if the full size version was ever allowed to seal anything with a loving kiss.
This account is unashamedly francophilic, and has no truck with the Anglo prejudice that the French are obsessed with sex.
The official museum of the French army, on the other hand...
Who else are you going to debate the eternal question "are there 7 days in a week, or 8?"
(thank goodness people archived this)
web.archive.org/web/20150105...
Very pleased that our local projections dif-in-dif paper is now out in the Journal of Applied Econometrics. Joint with @dgirardi.bsky.social, Jorda, and Taylor.
It's a tool that we think many applied economists will find useful (indeed many already have).
🧵
What does it say in the image?
It says 'minimum'.
Hard to read? Medieval scribes thought so too.
That's why they invented the dot on the i.
This way, you could at least see which strokes represented vowels - and that helped a lot.
For similar reasons, the letter j was invented. Two ... 1/