Home - Somewhere On Earth Productions
SOMEWHERE ON EARTH PRODUCTIONS: We are here to connect technology and business to people and new possibilities.
ICYMI: Listen to @manueltonneau.bsky.social @oii.ox.ac.uk's interview with the SOEP podcast talking about his new research into hate speech, online platforms and disparities in content moderation across different European countries. Available here: bit.ly/4ntsiRU
01.10.2025 13:46
🚨 Hiring: a fully funded (3.5-year) PhD position at the @ldnsocmedobs.bsky.social to research social media and politics. Candidates should have quantitative/computational skills and/or an interest in content curation/moderation. Unfortunately, the position is open to UK home candidates only. www.royalholloway.ac.uk/media/hquftp...
29.09.2025 17:21
📣 New preprint!
Have you ever wondered what political content is in LLMs' training data? What political opinions does it express? What proportion of documents in the pre- and post-training data lean left versus right? And do these proportions correlate with the political biases reflected in the models?
29.09.2025 14:54
Social media feeds today are optimized for engagement, often leading to misalignment between users' intentions and technology use.
In a new paper, we introduce Bonsai, a tool to create feeds based on stated preferences, rather than predicted engagement.
arxiv.org/abs/2509.10776
16.09.2025 13:24
We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation".
We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks.
For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations.
Then, we collect 13 million LLM annotations across plausible LLM configurations.
These annotations feed into 1.4 million regressions testing the hypotheses.
For a hypothesis with no true effect (ground truth $p > 0.05$), different LLM configurations yield conflicting conclusions.
Checkmarks indicate correct statistical conclusions that match the ground truth; crosses indicate LLM hacking, i.e., incorrect conclusions caused by annotation errors.
Across all experiments, LLM hacking occurs in 31% to 50% of cases, even with highly capable models.
Since minor configuration changes can flip scientific conclusions from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.
Paper: arxiv.org/pdf/2509.08825
12.09.2025 10:33
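As a rough illustration of the idea, and not the paper's own code, the sketch below computes an "LLM hacking rate" as the share of LLM-based regressions whose significant/not-significant conclusion disagrees with the conclusion reached using ground-truth annotations. It assumes a single 0.05 threshold and a simplified data layout; `RegressionResult`, `llm_hacking_rate`, and the configuration labels are hypothetical, and the paper may distinguish more error types than this simple disagreement count.

```python
from dataclasses import dataclass

ALPHA = 0.05  # assumed significance threshold, matching the p > 0.05 criterion above


@dataclass
class RegressionResult:
    """Hypothetical record: one regression testing a hypothesis with one LLM configuration."""
    hypothesis_id: str
    config: str  # illustrative label, e.g. "model-A/prompt-1"
    p_value: float


def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    return p_value < alpha


def llm_hacking_rate(ground_truth_p: dict[str, float],
                     llm_results: list[RegressionResult]) -> float:
    """Share of LLM-based tests whose significance conclusion differs from the
    conclusion obtained with ground-truth annotations.
    Simplification: only disagreement in significance is counted."""
    if not llm_results:
        raise ValueError("llm_results must not be empty")
    wrong = sum(
        is_significant(r.p_value) != is_significant(ground_truth_p[r.hypothesis_id])
        for r in llm_results
    )
    return wrong / len(llm_results)


if __name__ == "__main__":
    # Toy p-values, made up for illustration only.
    truth = {"h1": 0.40, "h2": 0.001}  # h1: no true effect, h2: true effect
    results = [
        RegressionResult("h1", "config-1", 0.03),   # false positive
        RegressionResult("h1", "config-2", 0.20),   # correct
        RegressionResult("h2", "config-1", 0.002),  # correct
        RegressionResult("h2", "config-2", 0.09),   # false negative
    ]
    print(llm_hacking_rate(truth, results))  # 0.5
```

With these toy values, two of the four configurations reach the wrong conclusion, which mirrors how the same hypothesis can be "confirmed" or "rejected" depending only on the annotation setup.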
1/ 🚨 Big news 🚨 Today we're launching Tech for Open Minds (TOM) at @DukeU, a global program exploring how technology shapes open-mindedness, humility & polarization
https://sicss.io/stories/2025-08-18
29.08.2025 16:06
Thanks a lot for the repost!
29.08.2025 07:19
Millions of users are posting to social media and other platforms in languages with zero moderators, even within the EU.
That's the topline finding from an impressive new working paper, led by @manueltonneau.bsky.social, that leverages newly mandated transparency data under the DSA: osf.io/preprints/so...
28.08.2025 09:41
I don't have Portuguese roots, but my parents liked the name, and I lived in Lisbon for a few months, so I can speak a little :)
28.08.2025 09:50
Thank you, and great point! We did not, but I suppose we could find the information in the DSA Transparency Database, at least for Spanish and Portuguese. The issue I foresee, though, is that we would only have information on moderation in EU countries and nothing on Latin America. Still, worth a look, thanks again!
28.08.2025 09:48
Thank you very much for the repost :)
28.08.2025 09:38
Thank you very much :)
28.08.2025 09:37
@oii.ox.ac.uk @weizenbauminstitut.bsky.social @umassamherst.bsky.social @umich.edu
28.08.2025 08:44
Finally tagging scholars whose work inspired this piece: @monaelswah.bsky.social @farhana-shahid.bsky.social @nicp.bsky.social @cgoanta.bsky.social Your feedback is most welcome!
28.08.2025 08:44
This work would also not have been possible without the data collection efforts led by @jurgenpfeffer.bsky.social, or without @claesdevreese.bsky.social and @aurman21.bsky.social, who made me aware of the DSA moderator count data here a while back. Thank you all!
28.08.2025 08:44
Had a blast working on this paper with my wonderful coauthors @deeliu97.bsky.social @antisomniac.bsky.social @ze.vin Ralph @ethanz.bsky.social @computermacgyver.bsky.social
28.08.2025 08:44
OII | OII researchers propose recommendations for effective data governance in light of the EU's Digital Services Act
OII researchers propose a series of recommendations for effective data access and data governance in light of the EU's Digital Services Act.
We also issue a recommendation: platforms and regulators should improve transparency by reporting moderator counts with context (e.g., content volume per language), ensure consistent reporting over time, and extend data coverage beyond EU languages.
28.08.2025 08:44
So what? The main implication is that speakers of underserved languages likely receive less protection from online harms. Our analysis also nuances existing concerns: while Global South languages are consistently underserved, allocation for other non-English languages varies widely across platforms.
28.08.2025 08:44
For languages with moderators, we normalize mod counts by content volume per language and find that platforms allocate moderation workforce disproportionately relative to content volume, with languages primarily spoken in the Global South (Spanish, Portuguese, Arabic) consistently underserved.
28.08.2025 08:44
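One way to make "normalizing moderator counts by content volume" concrete, offered as a sketch rather than the working paper's exact method, is to compare each language's share of moderators with its share of content. The `allocation_ratio` function and the toy inputs below are hypothetical; a ratio below 1 would mean a language receives fewer moderators than its content volume alone would suggest. Languages with zero moderators are excluded here, since those are the blind spots counted separately.

```python
def allocation_ratio(mod_counts: dict[str, int],
                     content_volume: dict[str, float]) -> dict[str, float]:
    """Return (moderator share) / (content share) for each language that has
    at least one moderator and nonzero content volume.
    Values below 1 suggest a language is underserved relative to its volume."""
    langs = [
        lang for lang, mods in mod_counts.items()
        if mods > 0 and content_volume.get(lang, 0) > 0
    ]
    total_mods = sum(mod_counts[lang] for lang in langs)
    total_volume = sum(content_volume[lang] for lang in langs)
    return {
        lang: (mod_counts[lang] / total_mods) / (content_volume[lang] / total_volume)
        for lang in langs
    }


if __name__ == "__main__":
    # Made-up toy numbers, not data from the paper.
    mods = {"lang_a": 90, "lang_b": 10, "lang_c": 0}  # lang_c is a blind spot
    volume = {"lang_a": 50.0, "lang_b": 50.0, "lang_c": 20.0}
    for lang, ratio in allocation_ratio(mods, volume).items():
        print(f"{lang}: {ratio:.2f}")  # prints "lang_a: 1.80", then "lang_b: 0.20"
```

Using shares rather than raw moderators per post keeps the measure comparable across platforms of very different sizes, at the cost of hiding absolute staffing levels.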
We also quantify the number of EU-based users whose national language has no moderators, and we're talking about millions of users posting in languages with zero moderators.
28.08.2025 08:44
Taking Twitter/X as an example, we then show that languages subject to moderation blind spots are generally widely spoken on social media, representing an average of 31% of all tweets during a one-day period in countries where they are the official language.
28.08.2025 08:44
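The 31% figure above is an average of per-country shares. A minimal sketch of that arithmetic follows, assuming one blind-spot official language per country and per-language tweet counts for a single day; the function name, country labels, and counts are hypothetical and do not come from the paper.

```python
from statistics import mean


def mean_blind_spot_share(tweet_counts: dict[str, dict[str, int]],
                          blind_spot_language: dict[str, str]) -> float:
    """Average, across countries, of the share of each country's tweets written
    in its official language that has no human moderators on the platform.
    Raises statistics.StatisticsError if no country has any observed tweets."""
    shares = []
    for country, lang in blind_spot_language.items():
        counts = tweet_counts[country]
        total = sum(counts.values())
        if total == 0:
            continue  # skip countries with no observed tweets
        shares.append(counts.get(lang, 0) / total)
    return mean(shares)


if __name__ == "__main__":
    # Toy one-day counts, invented for illustration.
    counts = {
        "country_x": {"lang_x": 30, "en": 70},
        "country_y": {"lang_y": 10, "en": 40},
    }
    official = {"country_x": "lang_x", "country_y": "lang_y"}
    print(mean_blind_spot_share(counts, official))  # 0.25
```

Averaging per-country shares (rather than pooling all tweets) gives each country equal weight, so a small country with a high blind-spot share counts as much as a large one.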
We first look at language coverage and find that while larger platforms such as YouTube and Meta have moderators in most EU languages, smaller platforms such as X and Snapchat have several language blind spots with no human moderators, particularly in Southern, Eastern and Northern Europe.
28.08.2025 08:44
Frances Haugen: "I never wanted to be a whistleblower. But lives were in danger"
The woman whose revelations have rocked Facebook tells how spending time with her mother, a priest, motivated her to speak out
Concerns about underinvestment in non-English moderation have long circulated via whistleblower leaks, but they were never quantified. The EU's Digital Services Act is a turning point, requiring platforms to disclose moderator counts per language, making cross-lingual comparison possible.
28.08.2025 08:44
Social media platforms operate globally, but do they allocate human moderation equitably across languages?
Our new WP shows the answer is no:
- Millions of users post in languages with zero moderators
- Where moderators exist, moderator counts relative to content volume vary widely across languages
osf.io/amfws
28.08.2025 08:44
Very cool piece by my colleague @antisomniac.bsky.social on how YouTube is used differently across languages. Worth a read!
13.08.2025 19:31
Thrilled to share that our HateDay paper has received an Outstanding Paper Award at #ACL2025
Big thanks to my wonderful co-authors: @deeliu97.bsky.social, Niyati, @computermacgyver.bsky.social, Sam, Victor, and @paul-rottger.bsky.social!
Thread below, and data available at huggingface.co/datasets/man...
31.07.2025 08:05
Creators pour years into building a following, but in a growing underground market, you can simply buy accounts and inherit their audience.
In our new pre-print, we find this practice of repurposing accounts to be prevalent and consequential on YouTube!
arxiv.org/abs/2507.16045
30.07.2025 20:29
New! Heading to #ACL2025NLP today? Hear from @oii.ox.ac.uk researchers presenting new research and sharing recent findings which aim to help address inequalities in natural language processing models. 1/4
28.07.2025 09:18
Join @manueltonneau.bsky.social as he presents his co-authored paper "HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter" this afternoon. Mon 28 July, 14.00-15.00. Hall A. 2/4
28.07.2025 09:18
seeks to understand language.
Head of Cohere Labs
@Cohere_Labs @Cohere
PhD from @UvA_Amsterdam
https://marziehf.github.io/
We are a multidisciplinary research centre based @royalholloway.bsky.social. We use computational methods and big data to investigate the role of social media platforms in politics, in partnership with key stakeholders and policymakers.
Postdoc @milanlp.bsky.social / Incoming Postdoc @stanfordnlp.bsky.social / Computational social science, LLMs, algorithmic fairness
Complex systems, Networks, Computational Social Science, Machine Learning
Postdoc at uc3m-IBiDat, Madrid
https://blas-ko.github.io/
Sociologist & Computer Scientist
PI: @dataworkersinquiry.bsky.social
Research Lead @weizenbauminstitut.bsky.social
Research Lead @dairinstitute.bsky.social
milamiceli.com
Executive Vice-President for a Clean, Just and Competitive Transition and Commissioner for Competition.
European Commission (2024-2029)
EU Policy Lead & Applied Researcher @ Hugging Face 🤗
Computer Scientist, PhD
I love Wikipedia & languages
Research for the networked society
https://www.weizenbaum-institut.de/
Assistant Professor at Politecnico di Milano. Capoeira at Sul Da Bahia Milano.
Researching @weizenbauminstitut.bsky.social | Computational Social Science | see you at @ICA25 & IC2S2
DE/EN. Potsdam / Berlin
coordination of @dsa40collaboratory.bsky.social, various research at @weizenbauminstitut.bsky.social
among other things: http://zusammenfuergleichstellung.de
PhD student @ Centre for Digital Governance, Hertie School. What if states can deploy lots of cognitive power very soon?
Tech correspondent for the BBC. I write about how we use technology, and how technology uses us. Contact info on tomgermain.com
PhD candidate at Cornell University. HCI researcher examining trust & safety issues in the Majority World.
The internet, YouTube, Wikipedia, NYC, birds, media... Sr Research Fellow at UMass Amherst Initiative for Digital Public Infrastructure, Media Cloud.
Places:
rhododendrites.com
Rhododendrites @ Wikipedia/Instagram/Threads
Antisomniac @ Mastodon.social
Postdoc at Aalborg University (CPH) 🇩🇰
#NLPxEducation #NLPxHR #NLP
Past:
🇩🇰 IT University of Copenhagen
🇨🇭 Swiss Federal Institute of Technology Lausanne
🇸🇬 National University of Singapore
🇩🇪 NEC
🇳🇱 University of Groningen
https://jjzha.github.io/
Associate Professor of Political Communication and Computational Social Sciences at Royal Holloway University of London. Director of the London Social Media Observatory. UKRI Future Leader Fellow.
Associate Professor, Oxford DPIR
I study the role of technology and conspiracy theories in democratic politics. NYU PhD.
https://www.janzilinsky.com