WikiResearch

@wikiresearch.bsky.social

Mostly a placeholder until we can bring our feed of volunteer-curated Wikipedia/Wikidata/Wikimedia research news to this platform too. For full coverage subscribe to our newsletter: https://meta.wikimedia.org/wiki/Research:Newsletter

437 Followers  |  36 Following  |  13 Posts  |  Joined: 13.05.2024  |  2.0424

Latest posts by wikiresearch.bsky.social on Bluesky

abstract of the paper "What did Elon change? A comprehensive analysis of Grokipedia"

Elon Musk released Grokipedia on 27 October 2025 to provide an alternative to Wikipedia, the crowdsourced online encyclopedia. In this paper, we provide the first comprehensive analysis of Grokipedia and compare it to a dump of Wikipedia, with a focus on article similarity and citation practices. Although Grokipedia articles are much longer than their corresponding English Wikipedia articles, we find that much of Grokipedia's content (including both articles with and without Creative Commons licenses) is highly derivative of Wikipedia. Nevertheless, citation practices between the sites differ greatly, with Grokipedia citing many more sources deemed "generally unreliable" or "blacklisted" by the English Wikipedia community and low quality by external scholars, including dozens of citations to sites like Stormfront and Infowars. We then analyze article subsets: one about elected officials, one about controversial topics, and one random subset for which we derive article quality and topic. We find that the elected official and controversial article subsets showed less similarity between their Wikipedia version and Grokipedia version than other pages. The random subset illustrates that Grokipedia focused rewriting the highest quality articles on Wikipedia, with a bias towards biographies, politics, society, and history. Finally, we publicly release our nearly-full scrape of Grokipedia, as well as embeddings of the entire Grokipedia corpus.

back again to share a new preprint from me and @mantzarlis.com! “What did Elon Change? A comprehensive analysis of Grokipedia” arxiv.org/abs/2511.09685

I had seen many spot analyses of individual grokipedia pages, but I was curious: how was grokipedia made? what did Elon change from wikipedia?

17.11.2025 16:10 – 👍 10    🔁 9    💬 1    📌 1
Grokipedia cites a Nazi forum and fringe conspiracy websites
A site-wide comparison with Wikipedia sheds light on what Elon Musk is trying to do

Key points in new Cornell Tech research:

56% of Grokipedia entries carry the Wikipedia CC license, suggesting wholesale ingestion

Grokipedia’s top 100 sources include fewer news outlets and more UGC (e.g. LinkedIn scraping)

Grokipedia has fewer citations overall, making it harder to check sources

13.11.2025 14:17 – 👍 14    🔁 8    💬 0    📌 0
Wikidata Map in 2025
Another year, another map, and another Birthday for Wikidata. Last generated in 2024 by @tarrow and @outdooracorn, this year I have put the work in just ahead of the 13th Wikidata birthday to have a look at what's changed in terms of items with coordinates this past year on Wikidata. And here it is! But really you need to look at the diff between previous years to see what has changed!

28.10.2025 23:14 – 👍 0    🔁 2    💬 0    📌 0

#Grokipedia set out to “fix” #Wikipedia.
Turns out it mostly rewrites it, longer, slicker, less sourced.
Fluent, but fragile. @wikiresearch.bsky.social

31.10.2025 21:26 – 👍 6    🔁 2    💬 2    📌 0
Investigating extreme cases in Wikipedia talk pages: Some insights on user behaviours
Investigating extreme cases in Wikipedia talk pages: Some insights on user behaviours was published in Exploring digitally-mediated communication with corpora on page 453.

Alternative link: www.degruyterbrill.com/document/doi...

15.10.2025 00:57 – 👍 1    🔁 0    💬 0    📌 0

"Investigating extreme cases in Wikipedia talk pages: Some insights on user behaviours"
uplopen.com/chapters/e…
e.g. "the most prolific users, the longest threads (in terms of total duration, number of posts or number of distinct users involved) and the longest monologues"

15.10.2025 00:31 – 👍 3    🔁 0    💬 1    📌 1
Using a Wikipedia edit-a-thon as a cross-curricular STEM representation assignment - Discover Education
Background: Wikipedia is a highly used, free, online encyclopedia with known gender disparities across its biography content. Editing Wikipedia has entered STEM classrooms as a writing-focused and sometimes equity-focused assignment. This paper presents a Wikipedia edit-a-thon event at the Wentworth Institute of Technology in Boston, Massachusetts focused on improving articles about women in STEM. This edit-a-thon promoted cross-disciplinary collaboration and community building with faculty and undergraduate students across eleven courses and disparate disciplines and offices at the university.
Results: Edit-a-thon attendees edited pages on women in STEM and listened to five-minute lightning talks by women in the university community: students, former faculty, and administrators. The impacts of the event include the addition of more than 15,000 words and 100 references to more than 100 articles on Wikipedia. The event supported a variety of student learning outcomes in participating courses across disciplines in the sciences and humanities.
Conclusions: A Wikipedia edit-a-thon supported student learning across multiple subjects while contributing to underdeveloped biography articles about women in STEM and helping students find a voice in the Wiki space. The edit-a-thon has potential as a cross-curricular touchpoint and to support equity and representation work.

Seredinski, A., Litchock-Morellato, F., Lange, A. et al. Using a Wikipedia edit-a-thon as a cross-curricular STEM representation assignment. Discov Educ 4, 368 (2025). doi.org/10.1007/s442... #OpenAccess

30.09.2025 08:57 – 👍 1    🔁 1    💬 0    📌 0

"Demographic disparity in Wikipedia coverage: a global perspective" (top 12 languages) epjdatascience.springeropen.com/articles/1…
- Women slightly overrepresented (not underrepresented) among living article subjects since ~2015, but still have shorter articles
- Developing countries overrepresented

11.10.2025 05:29 – 👍 4    🔁 1    💬 0    📌 1

"Investigating How LLMs Impact Participation in [Wikipedia]" (interviewing 16 editors) https://arxiv.org/abs/2509.07819v1

ChatGPT etc "enhance contribution quality for experienced editors" & "lower entry barriers for newcomers", but newbies struggle to align LLM outputs w Wikipedia policies

04.10.2025 01:14 – 👍 3    🔁 0    💬 0    📌 0
The Graphic User Interface of WikiTextGraph

New paper alert: WikiTextGraph – an open-source Python package for extracting the text and building multilingual Wikipedia link networks.

With: @gustavoschwartz.bsky.social, Juan Luis Suárez

Paper: openresearchsoftware.metajnl.com/articles/10....

@wikiresearch.bsky.social #wikipedia #software

17.09.2025 13:32 – 👍 1    🔁 2    💬 0    📌 0
Critical Wikimedia Research Bibliography - Meta-Wiki

With the school year approaching, a number of scholars and myself have assembled together a Critical Wikimedia Research Bibliography. If you are teaching a course or doing research, we think you might find some good resources here. meta.wikimedia.org/wiki/Critica...

27.08.2025 21:25 – 👍 3    🔁 2    💬 0    📌 0
A manifesto for Wikimedia research: Critically studying Wikimedia as infrastructure

I am pleased to announce the launch of the Manifesto for Wikimedia Research manifesto.wiki. As my co-authored Big Data & Society commentary explains, the manifesto is dedicated to a humanist and critical tradition of taking Wikipedia's importance seriously. journals.sagepub.com/doi/10.1177/...

08.07.2025 13:17 – 👍 10    🔁 5    💬 1    📌 0
Presenter (Patrick Gildersleve) in front of a screen summarising the WikiReddit Dataset project. The slide describes it as "Every Wikipedia mention and link on Reddit, 2020-2023", includes some example usage, describes the scale of the dataset, and offers suggested use cases.

Had a great time meeting everyone and seeing all the interesting work @icwsm.bsky.social. I presented our study on the Wikireddit dataset - exploring Wikipedia’s role in fact-checking, discussion, and cross-platform attention on the web. Thank you to the organisers!

📄: ojs.aaai.org/index.php/IC...

26.06.2025 10:08 – 👍 8    🔁 3    💬 0    📌 0
The Challenge of Peer-Produced Websites | UW College of Arts & Sciences
Communication professor Benjamin Mako Hill studies why successful peer-produced websites (like Wikipedia) eventually struggle to maintain their openness to new contributors.

UW published this really nice article about my work on governance challenges and lifecycles faced by peer-produced online communities – the work supported by my NSF CAREER grant. Check it out if you want to know what I've been thinking about and working on!

15.06.2025 15:50 – 👍 28    🔁 8    💬 3    📌 0

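"Disambiguation in Wikipedia: exploring the authority control mechanisms of the collaborative encyclopedia" (original title: "Desambiguación en Wikipedia: exploración de los mecanismos de control de autoridades en la enciclopedia colaborativa") by @florenciac.bsky.social and @tsaorin.bsky.social in #revistainfonomy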
doi.org/10.3145/info...

#Controldeautoridades #Vocabularioscontrolados #Wikipedia

19.05.2025 10:15 – 👍 3    🔁 2    💬 0    📌 0

Been a hectic semester for me but made it through 😊 a few updates

Had a blast as a GSI for @dbamman.bsky.social NLP class. Was a wonderful experience 💃

Won the Wikipedia Foundation Research of The Year Award for our CHI paper (doi.org/10.1145/3613...) with @schasins.bsky.social and John Canny

27.05.2025 19:22 – 👍 6    🔁 4    💬 3    📌 1

findings: (1) Wikipedia is most frequently cited by news and science websites for informational purposes, while commercial websites reference it less often. (2) The majority of Wikipedia links appear within the main content rather than in boilerplate [3/5 of https://arxiv.org/abs/2505.15837v1]

23.05.2025 06:00 – 👍 1    🔁 1    💬 1    📌 0
WikiWorkshop 2025 Recap - Rhododendrites
I like the internet

Whipped up a #WikiWorkshop 2025 recap blog post here: rhododendrites.com/posts/WikiWo... @wikiresearch.bsky.social Some really interesting tools, methods, and studies over the last couple days!

23.05.2025 17:34 – 👍 2    🔁 1    💬 0    📌 0
Templates and sovereignty: Wikipedia’s policy development and the reflection of community consensus - Steve Jankowski, Claudio Celis Bueno, Ouejdane Sabbah, Jakko Kemper, 2025
This article examines how Wikipedians embed their sovereign authority within the development of the site’s multilingual policy environment. By drawing on the co...

Well this is good timing. @wikiworkshop.bsky.social starts today and my paper that I presented in previous years has just been published this morning. doi.org/10.1177/1461.... We describe how hatnotes on policy pages are incredibly important techniques for ascribing different forms of authority.

21.05.2025 08:14 – 👍 3    🔁 2    💬 0    📌 0
Is Wikipedia a cesspool of antisemitism? Don't trust the ADL's answer.
The ADL would have us believe Wikipedia is riddled by antisemitism. The reality is more complicated, writes a scholar whom the ADL has cited.

A recent ADL report claimed to find broad, systemic evidence of antisemitism on Wikipedia, prompting two dozen members of Congress to call into question the site's approach to moderating content related to Jews.

Some researchers cited by the ADL say their findings have been misconstrued.

16.05.2025 15:22 – 👍 7    🔁 3    💬 1    📌 1
A table listing the 13 research fund proposals under consideration (screenshot of https://meta.wikimedia.org/wiki/Grants_talk:Start#Alternative_overview_of_the_research_fund_proposals_under_further_consideration )

"Invitation to give feedback to Wikimedia Research Fund 2024-2025 proposals" until May 12 lists.wikimedia.org/hyperkitty/l...
(13 proposals under review, with budgets ranging from $8,571 to $149,976 USD)

06.05.2025 18:57 – 👍 7    🔁 2    💬 0    📌 0
Image of various pills, with text: 'Readers use Wikipedia's health content "to learn more", "to improve decision-making" and "for self-advocacy".' (from https://meta.wikimedia.org/wiki/Research:Newsletter/2025/April )

In the new edition of our monthly newsletter:
* How readers use Wikipedia health content
* Scholars are generally happy with how their papers are cited on Wikipedia
* Several other papers about references on Wikipedia
and more:
meta.wikimedia.org/wiki/Researc...

04.05.2025 16:16 – 👍 4    🔁 1    💬 0    📌 1
Research citations building trust in Wikipedia: Results from a survey of published authors
The use of Wikipedia citations in scholarly research has been the topic of much inquiry over the past decade, however little is known regarding perceived Researchers trustworthiness of Wikipedia citations and representation of their work. This cross-publisher study (Taylor & Francis and University of Michigan Press) aimed to investigate author sentiment towards Wikipedia as a source of trusted information.
Methods: A short survey was distributed to 40,402 authors of papers cited in Wikipedia (n=21,854 surveys sent, n=750 complete responses received). The survey gathered responses from published authors in relation to their views on Wikipedia’s trustworthiness in relation to the citations to their published works. The unique findings of the survey were analysed using a mix of quantitative and qualitative methods using Python, Google BigQuery and Looker Studio.
Results: Overall, authors expressed positive sentiment towards research citation in Wikipedia and researcher engagement practices (mean scores >7/10). Sub-analyses revealed significant differences in sentiment based on publication type (articles vs. books) and discipline (Humanities and Social Sciences vs. Science, Technology, and Medicine), but not access status (open vs. closed access).
Conclusions: This study provides unique insights into author perceptions of Wikipedia’s trustworthiness. Further research is needed to deepen the understanding of the benefits for researchers and publishers including academic citations in Wikipedia.

Research citations building trust in Wikipedia: Results from a survey of published authors | PLOS One https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0320334

20.04.2025 16:40 – 👍 4    🔁 2    💬 0    📌 0
https://w.wiki/CumQ

📢📢📢 Second Call for Contributions!
Join us at the 2nd WikiNLP Workshop on NLP for Wikipedia, co-located with #ACL2025 in Vienna. We welcome both in-person and virtual attendees, and have both archival and non-archival tracks!

🗓️ deadline: April 30
Details: meta.wikimedia.org/wiki/NLP_for...

17.04.2025 09:14 – 👍 5    🔁 5    💬 0    📌 1
Wiki Workshop 2025
May 21st – 22nd, 2025

Registration is now open for #WikiWorkshop2025!

Don’t miss this opportunity to connect and collaborate with the @wikiresearch.bsky.social community.

📅 May 21-22, 2025
💻 Virtual
🔗 pretix.eu/wikimedia/wi...

27.03.2025 10:09 – 👍 4    🔁 3    💬 0    📌 0
Plot showing evidence for a decrease in pageviews for LLM-similar Wikipedia articles (including a figure from Liang Lyu, James Siderius, Hannah Li, Daron Acemoglu, Daniel Huttenlocher, Asuman Ozdaglar, "Wikipedia Contributions in the Wake of ChatGPT", CC BY 4.0)

Illustration of the challenges between different stakeholders (regarding Flagged Revisions on Wikipedia)
from "Challenges in Restructuring Community-based Moderation", by Chau Tran, Kejsi Take, Kaylea Champion, Benjamin Mako Hill, Rachel Greenstadt, CC BY-SA 4.0

In the March issue of our research newsletter:
* Flagged Revisions: Explaining the disappointing history of a community-requested software feature
* A roundup of several recent papers investigating the impact of ChatGPT on Wikipedia so far
and more: meta.wikimedia.org/wiki/Researc...

24.03.2025 16:51 – 👍 1    🔁 1    💬 0    📌 0
Grants:Programs/Wikimedia Research & Technology Fund/Wikimedia Research Fund - Meta

New funding opportunity for Wikimedia-related research! Grants up to $150K available. Apply by April 16th:
meta.wikimedia.org/wiki/Grants:...

16.03.2025 15:18 – 👍 14    🔁 10    💬 0    📌 0

We received 64 submissions for #WikiWorkshop2025 🙌

A huge thank you to the @wikiresearch.bsky.social community for such an amazing engagement. Our reviewers will be diving into them in the coming weeks... stay tuned! 🔍📖

14.03.2025 09:55 – 👍 5    🔁 2    💬 0    📌 0
chart "Fraction of OA citations by publication date of citation" (figure 3 from Puyu Yang, Ahad Shoaib, Robert West, Giovanni Colavizza, "Open access improves the dissemination of science: insights from Wikipedia", https://doi.org/10.1007/s11192-024-05163-4 , CC BY 4.0)

In the new issue of our monthly newsletter:
▸ What's known about how readers navigate Wikipedia
▸ Italian Wikipedia is the hardest to read
▸ "open access articles are extensively and increasingly more cited in Wikipedia" than those behind a paywall
meta.wikimedia.org/wiki/Researc...

02.03.2025 10:54 – 👍 5    🔁 2    💬 0    📌 1
WikiNLP workshop flyer inviting contributions for provocations, datasets, and ongoing work on NLP+Wikimedia

The call for papers for the second edition of the #WikiNLP workshop at @aclmeeting.bsky.social is out!

We welcome contributions on #NLProc + Wikimedia, especially on datasets, and ideas to advance its mission.

More details: meta.wikimedia.org/wiki/NLP_for...

06.02.2025 17:14 – 👍 11    🔁 3    💬 0    📌 3

@wikiresearch is following 20 prominent accounts