
Front Matter

@index.blog.front-matter.io.ap.brid.gy

The Front Matter Blog covers the intersection of science and technology since 2007. [bridged from https://blog.front-matter.io/ on the fediverse by https://fed.brid.gy/ ]

11 Followers  |  0 Following  |  21 Posts  |  Joined: 18.03.2025

Latest posts by index.blog.front-matter.io.ap.brid.gy on Bluesky

Reflections on the social web My biggest point of uncertainty about Ghost 6.0 was whether people were going to "get" the social web integration. The technology is wonderful, but complex. For many people, terms like ActivityPub, Fediverse, bridge, protocol, server, toot, boost, and Webfinger are alienating and confusing. They subtly imply that unless you understand what all these words mean, this might not be the place for you; in the same way crypto terms—blockchain, web3, wallet, keypair, nonce—are a wall of jargon that scream "you don't belong here" to normal people. The work of a product team, when working with new technology, is to abstract away as much of this complexity as possible, so that it feels friendly and approachable to new people. To send an email, you don't need to know what SMTP, IMAP, POP, DKIM, SPF, or DMARC are. To browse the web, there's no requirement to understand HTTP, DNS, servers, SSL, TTL, load balancing, or caches. The most significant impact these protocols have is perhaps that users never have to think about them. So while building the social web integration for Ghost, we weren't just reasoning about how to make it work and what it should do—we were thinking deeply about how to frame it. What words to use. What to compare it to. How to explain it. How to make it not need explaining at all. Will people "get it"? This question consumed more of my mental energy than anything else, right up until the moment we finally hit launch this past Monday. My personal nightmare would have been if the response to the launch was another chorus of "I don't understand what the point of this is"—"this is too complicated"—"what does [x] even mean?" I've seen it happen so many times before when people try to figure out this tech and how it relates to their lives. The graveyard of technically superior but user-hostile products is vast. But, I'm thrilled to see—at least so far—that hasn't been the case. 
To be sure, there are still points of functional confusion. Chief among them: Why doesn't post X from platform Y show up on platform Z right away? But for the most part, I've been really encouraged by how many people have just jumped right in and started using it, without getting stuck and needing more explanation. They're just... publishing. And connecting. And it's working. My strongest belief about the social web is that if we want it to succeed, we have to keep lowering the barrier to entry. We have to keep minimizing the need for arcane language. We have to keep solving the things that people expect to work, but don't, rather than endlessly explaining how the underlying technology works. We have to create more familiarity with concepts people already know. Let's not forget that email, as a technology, was based on the humble letter. To/from, subject, inbox, outbox—these were all words based on sending physical memos. The metaphor made the transition accessible. The interface and format of a new technology can often be the single biggest factor in determining its adoption. After all, for over a decade, we've had artificial intelligence capable of performing some pretty incredible tasks. The moment it really caught fire, though, was the moment it became a chatbox. Not when it got smarter. Not when it got more powerful. When it got simpler. I think we've taken a big step in the right direction with the social web in Ghost 6.0. And now we need to keep going.
07.08.2025 18:02 — 👍 24    🔁 34    💬 2    📌 1
Rogue Scholar moves beyond passwords The Rogue Scholar science blog archive has introduced authentication with passkeys and will disable local accounts on September 15. Reading Rogue Scholar content has always been free and never required user accounts or cookie permissions. This is true both for web and API usage. ### User accounts Science blogs participating in Rogue Scholar also don't require user accounts, but only a contact email, and the initial registration of the blog is via a web form. The import of new or updated blog posts into Rogue Scholar happens automatically via RSS feed or JSON API integration, and there is no ability to update blog posts archived in Rogue Scholar outside of that workflow. Rogue Scholar offers user accounts that everyone can sign up for. They are currently only intended for blog authors, and two weeks ago their functionality was expanded. Rogue Scholar blog posts are organized around communities, and there are three community types: * **blog** : all blog posts of a given blog * **topic** : all blog posts about a given topic, often used as tag or keyword * **subject area** : all blog posts in a given OECD Field of Science and Technology The blog community includes basic information about the blog, such as name, description and logo. This information is automatically extracted from the blog feed, but in some cases manual curation is desired, e.g. to include a description or logo that the feed does not provide. Blog authors with a Rogue Scholar account, and after they have received an invitation as _community manager,_ can now manage this information themselves: Going forward Rogue Scholar will also allow blog authors to provide more information, such as the blog default OECD Field of Science (currently provided in the blog registration form) and International Standard Serial Number (ISSN), currently provided via email. 
Topic communities aggregate blog posts from multiple blogs around a common topic, typically referred to as a _tag_ or _category_ in blogs. These _tags/categories_ have naturally evolved as a folksonomy rather than a centrally defined taxonomy, and fully automatic classification of blog posts into topics is neither desired nor easily achievable. For these reasons, manual curation of topics is needed, and currently this requires user accounts. Blog authors who are community managers of their blog can submit blog posts to topic communities, or manage the communities in which a blog post is included. The creation of topic communities currently requires an admin account, but if people are interested in becoming Rogue Scholar community managers, creating new topic communities, or adding blog posts to topic communities where they are not the author, please reach out to me.

Subject communities are automatically added to blog posts based on the blog subject area, and user accounts are not relevant here. Going forward, I want to enable automatic classification based on blog post content, and inclusion in more than one subject community. We can take advantage of the new collections feature introduced in InvenioRDM v13 earlier this month.

### Authentication

Now that I have explained that user accounts are not required to use Rogue Scholar, but that there are important use cases where they are needed, we can talk about how user accounts are managed in Rogue Scholar. Users can register for a user account with Rogue Scholar in one of two ways:

* ORCID
* Rogue Scholar passkeys

Authentication via ORCID is a widely used authentication workflow in the scholarly community that is free (no ORCID organizational membership required) and built into the InvenioRDM platform. Depending on a single external authentication service is not desirable, as not all users will have an ORCID (e.g.
group or admin accounts), and this would make it impossible to sign in to Rogue Scholar if the ORCID service is temporarily down. ORCID is primarily a service to provide unique identifiers to scholars, and not an identity provider (IdP) that provides single sign-on services. One option is to provide authentication via local accounts with username/password, but that is both potentially insecure without additional measures such as multi-factor authentication (MFA) and inconvenient without a password manager. The InvenioRDM platform that Rogue Scholar uses has built-in local account management, but not built-in multi-factor authentication or other more advanced functionalities. Many InvenioRDM instances use an external authentication service that integrates with InvenioRDM. ORCID authentication is implemented via the OpenID Connect (OIDC) authentication protocol, and OIDC can be used with many other third-party logins, including GitHub, which is built into InvenioRDM. For Rogue Scholar I wanted to implement a local OIDC provider that I can configure and control. The most popular authentication service for InvenioRDM for this use case is probably Keycloak. Keycloak is a powerful and well-tested open-source solution for identity management, but it is almost too complex for smaller instances such as Rogue Scholar. For these reasons, Rogue Scholar went with Pocket ID, > A simple and easy-to-use OIDC provider that allows users to authenticate with their passkeys to your services. Pocket ID depends on passkeys, a secure and user-friendly authentication method that is (slowly) becoming a new authentication standard. Pocket ID authentication for Rogue Scholar was launched three weeks ago, and the service is hosted at https://auth.rogue-scholar.org. The setup was straightforward, and Pocket ID supports some advanced features currently not needed by Rogue Scholar, e.g. restricting user groups, LDAP integration, or a REST API. 
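The OIDC flow described above can be sketched in a few lines. This is a minimal illustration, not the Rogue Scholar or Pocket ID implementation: the discovery path `/.well-known/openid-configuration` and the request parameters come from the OpenID Connect specifications, while the client ID, redirect URI, and authorization endpoint shown here are hypothetical placeholders (a real client would read the endpoint from the discovery document).

```python
from urllib.parse import urlencode

# Issuer taken from the post; everything else below is illustrative.
ISSUER = "https://auth.rogue-scholar.org"


def discovery_url(issuer: str) -> str:
    """Location of the provider metadata defined by OpenID Connect Discovery."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def authorization_url(
    authorization_endpoint: str,
    client_id: str,
    redirect_uri: str,
    state: str,
    scope: str = "openid profile email",
) -> str:
    """Build an authorization-code request URL for an OIDC provider."""
    params = {
        "response_type": "code",  # authorization code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,  # opaque value echoed back to protect against CSRF
    }
    return f"{authorization_endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    print(discovery_url(ISSUER))
    print(
        authorization_url(
            authorization_endpoint="https://idp.example.org/authorize",  # hypothetical
            client_id="example-client",  # hypothetical
            redirect_uri="https://repository.example.org/callback",  # hypothetical
            state="random-state-value",
        )
    )
```

The provider handles the passkey ceremony itself; the relying service only sees the standard OIDC redirect and token exchange, which is why swapping ORCID, GitHub, or a self-hosted provider requires configuration rather than new code.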
With the launch of Pocket ID authentication, registration for new local accounts has been disabled, and login via username/password will be disabled on September 15. Users with existing local accounts can link their accounts to ORCID and/or passkeys until that date. Please use Slack, email, Mastodon, or Bluesky if you have any questions or comments regarding Rogue Scholar authentication. ## References 1. Fenner, M. (2025, July 30). Rogue Scholar Updates: Full-text search as default and basic blog self-management. _Front Matter_. https://doi.org/10.53731/jeatk-t8t07 2. Fenner, M. (2025, July 23). Rogue Scholar relaunches today. _Front Matter_. https://doi.org/10.53731/8dg89-gnc10
12.08.2025 08:01 — 👍 0    🔁 0    💬 0    📌 0
Rogue Scholar citation tracking launches to production

The Rogue Scholar science blog archive uses DOIs to uniquely identify blog posts with meaningful metadata. This enables tracking citations of scholarly blog posts in the scholarly literature using traditional citation tracking methods rather than altmetrics. Initially launched as a Rogue Scholar service six months ago, citation tracking has launched to production this week. The following features are now available:

* Automatically fetch citations of Rogue Scholar posts from the Crossref Cited-by service once a week,
* Automatically add these citations to Rogue Scholar posts as custom metadata (in the same format as references),
* Make these citation metadata available via the Rogue Scholar API,
* Add a Citations search facet to filter Rogue Scholar posts,
* Show the number of citations in the Rogue Scholar dashboard.

[Figures: Citations in the Rogue Scholar API; Citations shown in a Rogue Scholar record; Citations Search Facet; Citations in the Rogue Scholar Dashboard]

The Altmetrics manifesto stated in 2010 that

> Altmetrics are fast, using public APIs to gather data in days or weeks. They’re open – not just the data, but the scripts and algorithms that collect and interpret it.

In the past 15 years we have seen that the tools and services around altmetrics have taken a different path: altmetrics data from sources like Twitter, Facebook or Mendeley never really became open, and neither did the services aggregating and showing them, or the algorithms aggregating and interpreting them. Efforts like PLOS Article-Level Metrics (where I was the technical lead from 2012 to 2015) or Crossref and DataCite Event Data tried hard but ultimately failed because the data sources were not open, and social media feeds were increasingly driven by algorithms. Cory Doctorow coined the term _enshittification_ for this social media platform decay in November 2022.

Rogue Scholar is trying a different approach.
Instead of altmetrics as a filter to make sense of the scholarly literature, it helps scholarly blogs to become part of the scholarly literature. Rogue Scholar applies the concepts of persistent identifiers (DOI, ORCID, ROR), standardized metadata (Crossref, DataCite), Open Data and Open Access (CC-BY licenses for all content, OSI-approved licenses for all software), and even experiments with peer review. And it now adds citation tracking.

Blog posts are rarely cited in the scholarly literature, but that is more about current citation practices than the ability to do so. Crossref is smart enough to figure out that links to blog post URLs in reference lists are conceptually the same as citing a DOI, and that is why Rogue Scholar can find citations to Rogue Scholar blog posts published long before these blogs joined Rogue Scholar. Open Access News was published 2003-2010 and joined Rogue Scholar in 2025, and Crossref finds 22 citations in the scholarly literature to its blog posts.

Blog posts are typically published much faster than scholarly articles or even preprints. That is why the post on the Crossref blog announcing that the blog started assigning DOIs to all its posts, published in June, already has two citations two weeks later, both from Rogue Scholar blogs.

Citation tracking of course has limitations. I am not talking about using citation counts as a proxy for scientific impact, but rather about technical limitations of automated citation tracking. The main challenge is that only citations that show up in reference lists with an identifier (DOI or URL) and where the reference list is openly made available to Crossref (I4OC) are found. Another challenge is that automated workflows make mistakes – for this reason, Rogue Scholar can exclude falsely linked citations, and has occasionally done so. Please reach out if you find a citation in Rogue Scholar that is not found in the reference list of the citing scholarly work.
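To make the citation lookup more concrete, here is a small sketch. It does not use the Crossref Cited-by service that Rogue Scholar relies on; as a stand-in it reads the `is-referenced-by-count` field that the public Crossref REST API returns for a work. The sample response is hand-written for illustration (trimmed to the fields used) and is not real API output.

```python
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works/"


def works_url(doi: str) -> str:
    """Build the public REST API URL for one DOI (slashes stay as path separators)."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def citation_count(payload: dict) -> int:
    """Read the citation count from a decoded /works/{doi} JSON response."""
    return int(payload.get("message", {}).get("is-referenced-by-count", 0))


if __name__ == "__main__":
    # Illustrative response only; a real response carries many more fields.
    sample = {
        "status": "ok",
        "message": {"DOI": "10.64000/552ec-b8g03", "is-referenced-by-count": 2},
    }
    print(works_url("10.64000/552ec-b8g03"))
    print(citation_count(sample))
```

A count alone says nothing about which works cite a post; listing the citing works is what the Cited-by service adds, and it is what allows falsely linked citations to be inspected and excluded.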
Rogue Scholar registers Crossref DOIs for blog posts but also includes blogs that register their DOIs using DataCite. Fewer than 10 of the currently 168 Rogue Scholar blogs do so, and I currently have no easy way to collect citations for these blog posts. DataCite has a similar service to Crossref Cited-by, but the citing scholarly literature is currently mostly in Crossref. Third-party services such as OpenAlex also show the citations of the Crossref blog post mentioned above, but OpenAlex currently has only limited support for scholarly blog posts registered by DataCite.

I hope that the new citation tracking service encourages Rogue Scholar bloggers to publish more blog posts with references (currently only 4.98% of posts), as references and citations are closely connected (how to include references in blog posts is documented here). And I hope that readers of Rogue Scholar blog posts discover interesting scholarly works through the citation tracking service. 460 of the 1370 Rogue Scholar citations are also blog posts, but the most popular citing content type is journal article:

* journal article: 648
* blog post: 460
* preprint: 117
* book chapter: 117
* proceedings article: 12
* book: 7
* other: 9

Please use Slack, email, Mastodon, or Bluesky if you have any questions or comments regarding the Rogue Scholar citation tracking service.

## References

1. Priem, J., Taraborelli, D., Groth, P., & Neylon, C. (2010). _Altmetrics: A manifesto_. https://doi.org/10.5281/zenodo.12684249
2. Fenner, M. (2025, February 3). Rogue Scholar now shows citations of science blog posts. _Front Matter_. https://doi.org/10.53731/4bvt3-hmd07
3. Fenner, M. (2013). What Can Article-Level Metrics Do for You? _PLoS Biology_, _11_(10), e1001687. https://doi.org/10.1371/journal.pbio.1001687
4. Doctorow, C. (2022, November 17). Social Quitting. _Medium_. https://doctorow.medium.com/social-quitting-1ce85b67b456
5. Marcum, C. S. (2025, April 8). Peer-Review for a Blog Post? My Experience with MetaROR. _Upstream_. https://doi.org/10.54900/bymaz-4fw37
6. Stoll, L., Vale, P., & Clark, R. M. (2025, June 24). Scholarly blogs and their place in the research nexus. _Crossref Blog_. https://doi.org/10.64000/552ec-b8g03
7. Shotton, D. M. (2017, April 6). The Initiative for Open Citations. _OpenCitations Blog_. https://doi.org/10.59350/jdwj8-at997
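The content-type breakdown given in this post can be checked with a few lines of arithmetic. The counts below are copied from the post; the `share` helper is only an illustrative name.

```python
# Citation counts per citing content type, as listed in the post.
counts = {
    "journal article": 648,
    "blog post": 460,
    "preprint": 117,
    "book chapter": 117,
    "proceedings article": 12,
    "book": 7,
    "other": 9,
}

total = sum(counts.values())


def share(kind: str) -> float:
    """Percentage of all citations that come from the given content type."""
    return 100 * counts[kind] / total


if __name__ == "__main__":
    print(f"total citations: {total}")
    for kind, n in sorted(counts.items(), key=lambda item: -item[1]):
        print(f"{kind:20s} {n:5d} {share(kind):5.1f}%")
```

The per-type counts add up to the 1370 citations stated in the post, with journal articles accounting for a little under half and blog posts for about a third.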
04.08.2025 11:13 — 👍 1    🔁 5    💬 0    📌 1
Rogue Scholar Newsletter July 2025 This is the July issue of the monthly newsletter from the Rogue Scholar science blog archive. The newsletter reports on new blogs that have joined the platform, important technical updates in Rogue Scholar infrastructure, community updates, and other news relevant to Rogue Scholar users. ## Blogs added to Rogue Scholar Thirteen blogs have been added in July, making it one of the busiest months yet for Rogue Scholar. Welcome everybody! This brings the number of participating blogs to 168, and the number of archived posts to 46,178. ### Existential Crunch Thoughts about existential risk, history, climate, food security and societal collapse. _Social and economic geography, English._ https://existentialcrunch.substack.com ### The Bibliomagician Comment & practical guidance from the LIS-Bibliometrics community. _Computer and information sciences, English_ https://thebibliomagician.wordpress.com/ ### Open Access Network _Other social sciences, German._ https://open-access.network/ ### geocompx geocompx hosts free resources on reproducible geographic data analysis, modelling and visualization with open source software. _Earth and related environmental sciences, English._ https://geocompx.org/ ### Public Knowledge Project _Social science, English._ https://pkp.sfu.ca/news/ ### Imperfect notes on an imperfect world Japan-based scholar Christopher Hobson reflects on how we can live and act in conditions that are constantly changing and challenging us. Pursuing open thinking. _Philosophy, ethics and religion, English._ https://imperfectnotes.substack.com/ ### Oxford iHealth Fostering innovation, research, and education in the field of computational sciences for health. _Health sciences, English._ https://oxford-ihtm.io/blog ### WiNoDa Knowledge Lab Journal en – WiNoDa Knowledge Lab Wissenslabor für naturwissenschaftliche Sammlungen und objektzentrierte Daten. 
_Other natural sciences, German._ https://winoda.de/

### Væl Space

_Social science, English._ https://jofrhwld.github.io/blog/

### Dr. Joaquin Barroso's Blog

Scientific log of a computational chemist - "Make like a molecule and React!" _Chemical sciences, English._ https://joaquinbarroso.com/

### Adapt Research Ltd

Health, technology, and global catastrophic risk. _Other social sciences, English._ https://adaptresearchwriting.com/

### The 20% Statistician

A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences. _Psychology, English._ https://daniellakens.blogspot.com/

### SciComp Blog

_Natural sciences, English._ https://mpievolbio-scicomp.pages.gwdg.de/blog/

As always, the blogs cover a variety of disciplines, use a diverse set of blogging platforms, and not all write in English. The Rogue Scholar dashboard has the breakdown of the numbers, including the number of posts (1285) published in 2025 so far.

## Technical Updates

In July, Rogue Scholar saw a major update in both software and hardware, upgrading to the next major release (v13.0) of the InvenioRDM repository software, and migrating to dedicated server hardware. Also part of this update was the launch of a new authentication option, login via passkeys, using a self-hosted Pocket ID service. This week Rogue Scholar made full-text search the default search option and added basic blog self-management functionality.

## Community Update

Earlier this week Peter Suber announced that Rogue Scholar has archived Open Access News:

> I'm very happy to announce that he's now captured my old blog, 𝙊𝙥𝙚𝙣 𝘼𝙘𝙘𝙚𝙨𝙨 𝙉𝙚𝙬𝙨 -- more than 16.4k posts, 2002-2010.
> https://rogue-scholar.org/communities/oan

You probably noticed that the Public Knowledge Project (PKP) blog was also added this month.
PKP is behind Open Journal Systems (OJS), the most popular open source journal publishing platform, and joins a growing list of open scholarly infrastructure organizations (including OpenCitations, rOpenSci, Journal of Open Source Software, Research Software Alliance, Liberate Science, Research Graph, DataCite, ROR, Make Data Count, Crossref) that have joined Rogue Scholar. Open Infrastructure is one of the Rogue Scholar topic communities that aggregate blog posts by topic.

Please use Slack, email, Mastodon, or Bluesky if you have any questions or comments regarding this monthly newsletter.

Rogue Scholar is a scholarly infrastructure that is free for all authors and readers. You can support Rogue Scholar with a one-time or recurring donation or by becoming a sponsor.

## References

1. Fenner, M. (2025, July 23). Rogue Scholar relaunches today. _Front Matter_. https://doi.org/10.53731/8dg89-gnc10
2. Fenner, M. (2025, July 30). Rogue Scholar Updates: Full-text search as default and basic blog self-management. _Front Matter_. https://doi.org/10.53731/jeatk-t8t07
31.07.2025 20:08 — 👍 0    🔁 1    💬 0    📌 0
Rogue Scholar Updates: full-text search as default and basic blog self-management

This week the Rogue Scholar science blog archive has received two important updates: full-text search becomes the default search configuration, and blog authors can now self-manage basic settings of their Rogue Scholar blog community.

### Full-text search as default

Rogue Scholar has long supported full-text search of all its content. With this update, users no longer have to specify that they want to search in the full-text content using the `content:` prefix. All queries now automatically search the full text, e.g. this query for the term xanadu: https://rogue-scholar.org/search?q=xanadu. You can still specify `content:` if you want to search only in the full text, similar to how you can specify other fields to search in. And you can search either all of Rogue Scholar or a specific community. See the Rogue Scholar search guide for details.

### Basic blog self-management

With the latest update, the basic settings of Rogue Scholar blog communities can be managed by blog authors. They need a Rogue Scholar account and have to accept an invitation (sent by email) as _manager_ of the blog community. The basic blog settings (name, short description, website, and profile picture) are automatically extracted from the blog RSS feed. Still, they can be overridden in the blog community form, for example to add a profile picture if that is not included in the feed.

Registration of new blogs still requires filling out a separate form (found here), and the automatic blog post extraction can't be configured in the community form. Blog community managers also can't update their posts in Rogue Scholar, but they can submit their blog posts to one or more topic communities (such as R or Book Review) using the Communities sidebar settings.

Please use Slack, email, Mastodon, or Bluesky if you have any questions or comments regarding these updates.
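The two query styles described in this post can be sketched as a small URL builder. The base URL and the `content:` prefix come from the post; the `search_url` helper and its `field` parameter are illustrative names, not part of any Rogue Scholar client.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_BASE = "https://rogue-scholar.org/search"


def search_url(term: str, field: Optional[str] = None) -> str:
    """Build a search URL; pass field="content" to restrict to the full text."""
    query = f"{field}:{term}" if field else term
    return f"{SEARCH_BASE}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # Default query: searches metadata and full text.
    print(search_url("xanadu"))
    # Restricted query: the colon in "content:xanadu" is percent-encoded.
    print(search_url("xanadu", field="content"))
```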
30.07.2025 16:24 — 👍 1    🔁 1    💬 0    📌 0
Rogue Scholar relaunches today

The science blog archive Rogue Scholar relaunched today with a number of exciting new features, including a major new software version, new hardware, a new look and feel, and new authentication.

### Major new software version

Version v13.0 of the InvenioRDM open source repository platform was released today. After running release candidate versions for a few weeks, Rogue Scholar was relaunched today with version v13.0. There are numerous changes in this new version, described in detail in the release notes, and this will facilitate additional new features and bug fixes going forward. Many of the major changes in v13.0 happened in the backend and are only visible to administrators, including an improved administration panel and audit logs. Other improvements, e.g. subcommunities and collections, have to be enabled, and this will happen in the next few months.

### New hardware

Rogue Scholar is running on new hardware, which makes the service faster, easier to update, and cheaper to run. Using the Kamal deployment tool, Rogue Scholar now runs on dedicated hardware rented from Hetzner and located in Germany instead of via the cloud provider Fly.io.

### New look and feel

With this relaunch I fixed several long-standing issues with Rogue Scholar blog post list views:

* Show the DOI to allow users to jump directly to the blog post instead of needing to go to the Rogue Scholar archived version first,
* Show the language (15% of Rogue Scholar posts are in languages other than English); you can also filter search results by language,
* Show optional feature images, as is common for RSS feed readers,
* Support (a subset of) HTML in titles, e.g. superscript, bold or italic.

### New authentication

No authentication is required to read Rogue Scholar content, and blog posts are automatically imported from participating blogs.
Rogue Scholar user accounts currently have limited functionality, mainly allowing blog authors to submit blog posts to one or more topic communities, such as R (the programming language), book review, or interviews. Going forward, I will work with the Rogue Scholar community to improve what you can do with user accounts, including registering a new blog with the platform – currently still requiring an external form. For this functionality, Rogue Scholar user accounts have to be easy to manage and secure. The built-in functionality of the InvenioRDM platform for local accounts allows users to self-manage their accounts and reset their passwords. And it allows administrators to block accounts that misbehave. Rogue Scholar has for a while supported login via ORCID accounts. Today I have launched another authentication option, login via passkeys, using a self-hosted Pocket ID service. Passkeys are both easier to use and safer than usernames/passwords, and after a transition period to allow users to link their existing local accounts to ORCID and/or passkeys, Rogue Scholar will disable local accounts on September 15. Please use Slack, email, Mastodon, or Bluesky if you have any questions or comments regarding this major update. ## References 1. Fenner, M. (2025, July 7). Upgrading to InvenioRDM v13. _Front Matter_. https://doi.org/10.53731/dd5h7-z5y55 2. Fenner, M. (2025, June 27). Kamal deploys InvenioRDM Starter to production. _Front Matter_. https://doi.org/10.53731/m7gng-jmm19
23.07.2025 15:48 — 👍 0    🔁 4    💬 0    📌 0
Upgrading to InvenioRDM v13

Ten days ago, I reported on a new deployment strategy for the InvenioRDM repository software. Using the Kamal deployment tool, I deployed both a staging instance of the Rogue Scholar service and a demo instance of the InvenioRDM Starter package. Over the last few days I have updated both instances to the latest release candidate (v13.0.0rc3) of the next major InvenioRDM software version.

The upgrade was fairly painless, although it took me some time to update the customizations that I had made. The major issue working with Kamal that I reported before, specifying the hostname of the instance for security reasons (`APP_ALLOWED_HOSTS`), is still there, and my workaround still works (but `ALLOWED_HOSTS` was renamed to `TRUSTED_HOSTS` in v13). I ran into one major issue with the v13 upgrade: the recommended configuration for using externally registered DOIs no longer worked. It took me two patches in two different Invenio packages to fix this, and additional work is probably needed.

In addition to upgrading to the v13 release candidate, I made one big change in my InvenioRDM configuration: I stopped using S3 object storage and configured the local file storage. Neither the InvenioRDM Starter demo nor Rogue Scholar currently needs to store large files (mostly community logos), and local storage seems to be better aligned with the Kamal deployment philosophy of deploying everything on one or more virtual machines running Docker.

For InvenioRDM Starter I have shifted from using demo data to importing existing metadata records from Crossref, DataCite, or other InvenioRDM instances, using the commonmeta library. For the current demo instance I decided to import about 1000 records of thesis metadata from both Crossref and DataCite, filtering by records that contain ROR metadata. This approach worked well, helped by first importing the complete ROR vocabulary in InvenioRDM YAML format for funders and affiliations.
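The hostname setting mentioned in this post could be handled along the following lines. This is a sketch under stated assumptions, not the workaround actually used: the setting names `APP_ALLOWED_HOSTS` and `TRUSTED_HOSTS` come from the post, while the helper functions, the comma-separated input format, and the version switch are hypothetical.

```python
def parse_hosts(raw: str) -> list:
    """Split a comma-separated host string into a clean list of hostnames."""
    return [host.strip() for host in raw.split(",") if host.strip()]


def host_settings(raw: str, major_version: int) -> dict:
    """Return the host allow-list under the setting name for the given release.

    Assumption: v13 and later use TRUSTED_HOSTS, earlier releases use
    APP_ALLOWED_HOSTS, following the rename described in the post.
    """
    key = "TRUSTED_HOSTS" if major_version >= 13 else "APP_ALLOWED_HOSTS"
    return {key: parse_hosts(raw)}


if __name__ == "__main__":
    # Hostname from the post plus localhost, which is an illustrative addition.
    raw = "staging.rogue-scholar.org, localhost"
    print(host_settings(raw, major_version=12))
    print(host_settings(raw, major_version=13))
```

Keeping the hostnames in a single string makes it easy to pass them in from the deployment environment, so the same configuration file can serve staging and production.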
More work is needed for some new functionalities in v13, but I am confident that I can deploy Rogue Scholar and the InvenioRDM Starter demo instance to production as soon as InvenioRDM v13 is officially released. The ease of deploying updates and the initial performance looks really promising, and I look forward to working with Kamal. It is a promising platform for those smaller InvenioRDM instances that don't need Kubernetes. Please reach out to me if you have questions regarding Kamal and InvenioRDM, or if you want to deploy an InvenioRDM v13 instance with Kamal. ## References 1. Fenner, M. (2025, June 27). Kamal deploys InvenioRDM Starter to production. _Front Matter_. https://doi.org/10.53731/m7gng-jmm19 2. Fenner, M. (2025, April 21). Working with the Research Organization Registry (ROR) Data Dump. _Front Matter_. https://doi.org/10.53731/f0g5b-68326
07.07.2025 15:53 — 👍 0    🔁 0    💬 0    📌 0
Rogue Scholar Newsletter June 2025

This is the June issue of the monthly newsletter from the Rogue Scholar science blog archive. The newsletter reports on new blogs that have joined the platform, important technical updates in Rogue Scholar infrastructure, community updates, and other news relevant to Rogue Scholar users.

## Blogs added to Rogue Scholar

Two blogs were added in June. Welcome everybody! This brings the number of participating blogs to 150, a big milestone for Rogue Scholar.

### carrier-bag.net

_Arts, English._ https://carrier-bag.net/

### ACM SIGCSE Journal Club

Better teaching and learning, one paper at a time... _Computer and information sciences, English._ https://sigcse.cs.manchester.ac.uk/

## Technical Updates

Starting June 2, Rogue Scholar experienced major issues with its search index that were finally resolved on June 18. The underlying cause was a failed update of the InvenioRDM software, as reported here. No data were lost, but some Rogue Scholar functionality, e.g. listing all posts by a given blog, was temporarily unavailable. One consequence was to set up additional Rogue Scholar infrastructure for more extensive testing of major new software versions. The new Rogue Scholar staging server is available at staging.rogue-scholar.org, and the underlying technology (the Kamal tool) is described here. Kamal makes deploying Rogue Scholar simpler and cheaper, and will be used for the migration to the next major InvenioRDM release (v13; release candidate v13.0.0rc3 was published today) over the coming weeks.

## Community Update

In collaboration with the Infra Wiss Blogs project, Rogue Scholar started a webinar series on best practices for science blogs. The first webinar (in German) on June 11 focused on WordPress, with participation by the DINI and CSTOnline blogs. The webinar is summarized here, and the presentation slides have also been made available.
On June 24 the Crossref Blog published a blog post about the roles Crossref sees for blogs in the scholarly ecosystem, and the work that Crossref has done with Rogue Scholar to assign DOIs to posts in the Crossref blog, and to archive the content with Rogue Scholar and the Internet Archive. Please use Slack, email, Mastodon, or Bluesky if you have any questions or comments regarding this monthly newsletter. Rogue Scholar is a scholarly infrastructure that is free for all authors and readers. You can support Rogue Scholar with a one-time or recurring donation or by becoming a sponsor. ## References Fenner, M. (2025, June 6). Rogue Scholar upgrade pains. _Front Matter_. https://doi.org/10.53731/9nam9-w9k29 Fenner, M. (2025, June 27). Kamal deploys InvenioRDM Starter to production. _Front Matter_. https://doi.org/10.53731/m7gng-jmm19 Höfting, J., Ochsner, C., & Pampel, H. (2025, May 17). Zusammenfassung: Infra Wiss Blogs Webinar zu Rogue Scholar. _Infra Wiss Blogs_. https://doi.org/10.59350/h7rh7-jb575 Stoll, L., Vale, P., & Clark, R. M. (2025, June 24). Scholarly blogs and their place in the research nexus. _Crossref Blog_. https://doi.org/10.64000/552ec-b8g03
02.07.2025 16:16 — 👍 0    🔁 2    💬 0    📌 0
The longformers Ghost<>WordPress<>Flipboard<>Fediverse
01.07.2025 14:44 — 👍 9    🔁 7    💬 1    📌 0
Kamal deploys InvenioRDM Starter to production InvenioRDM is the open source turn-key research data management platform, with detailed documentation available here. InvenioRDM Starter facilitates deployment and configuration of InvenioRDM, allowing you to run InvenioRDM on your local computer within 15 min. This is achieved by providing a) a prebuilt Invenio-App-RDM Docker image, and b) a Docker Compose configuration file with sensible defaults. Starting this week, InvenioRDM starter can also be used to deploy InvenioRDM to production, using the Kamal tool. Kamal is similar to Docker Compose, but adds important functionality, including automatic remote builds, zero-downtime deployments, and deployments to multiple servers. Kamal is a command-line utility with a YAML configuration file, and much simpler to use than Kubernetes or commercial Docker container orchestration services such as Amazon Elastic Container Service (Amazon ECS). Kamal can deploy InvenioRDM to your hardware or to a virtual machine provided by your organization or a cloud provider. Whereas Kubernetes is a good option for large InvenioRDM installations, smaller InvenioRDM instances benefit from simpler deployment tools both in terms of cost and required maintenance. The science blog archive Rogue Scholar managed by Front Matter is a good example of an InvenioRDM repository that can benefit from simpler deployment options. As the next major release of InvenioRDM (v13.0) will happen in the next few weeks, Rogue Scholar needs to prepare for the upgrade, and I have this week launched a Rogue Scholar staging instance at https://staging.rogue-scholar.org using Kamal and a virtual machine provisioned by Hetzner and located in Germany. The setup was mostly straightforward, except for the integration with the Kamal proxy server, which turned out to be very painful. 
In the end I had to set the InvenioRDM `APP_ALLOWED_HOSTS` ENV variable to `None` and patch the REST API cross-site request forgery (CSRF) check to not check the request host. This needs more discussion but appears safe, as all requests must go through the Kamal proxy, where the host header is already checked. More work is needed on the staging server, including regular automatic backups of the database, and setting up monitoring (logs and metrics). The instance is running the latest stable release (v12.1.0), but I will soon be able to install the latest v13 release candidate – v13.0.0rc2 was released three days ago. Kamal was released in 2023 by 37signals, the company behind the Basecamp and Hey services, and one of the major contributors to the Rails platform. Kamal is installed as a Ruby gem, but is not specific to Rails or Ruby. Kamal can be seen as the successor to the Capistrano deployment tool, also originally written by 37signals, but Kamal works with Docker containers. When I was the technical lead of the Article Level Metrics project at the publisher PLOS from 2012 to 2015 (at the time Docker was not yet adopted for production deployments), I made heavy use of Capistrano. InvenioRDM Starter now includes a Kamal configuration option, and I deployed an InvenioRDM instance to https://demo.front-matter.io using Kamal. Feel free to play around, but only admin accounts can create records – use the official InvenioRDM demo instance (also linked in the footer) if you want to create and/or update records. I will spend the next few weeks refining the Kamal setup and documentation, so that InvenioRDM Starter is ready for Kamal deployments when InvenioRDM v13.0 is officially released. ## References Fenner, M. (2024, June 17). Announcing InvenioRDM Starter Beta. _Front Matter_. https://doi.org/10.53731/jxecm-0me48 Fenner, M. (2015, July 29). Thank you PLOS. _Front Matter_. https://doi.org/10.53731/r294649-6f79289-8cvzn
27.06.2025 15:51 — 👍 0    🔁 2    💬 0    📌 0
Rogue Scholar upgrade pains This week the Rogue Scholar science blog archive experienced major upgrade pains, and Rogue Scholar search became unavailable from Tuesday until Thursday. I tried to upgrade to a pre-release version (13.0.0b4.dev0) of the InvenioRDM repository software, and ran into multiple issues. Going back to the previously installed v12.1.0 took longer than anticipated, mainly because of issues with the Opensearch index. This morning Rogue Scholar is almost working normally again, except for the blog communities, which will take until Monday to be fixed. The primary reason for upgrading the InvenioRDM software was so that I could integrate Crossref DOI registration, based on work I completed last week. The experience with installing a pre-release version of InvenioRDM taught me a few things: * install InvenioRDM to production only after extensive testing (as I did last September/October), * service stability is more important than new features, and I am adjusting my deployment strategy and tooling, * observability is critical when running infrastructure, and this can be improved for Rogue Scholar. ### Postpone upgrading to InvenioRDM v13.0 Integrating Crossref DOI registration into InvenioRDM requires a current development version of InvenioRDM, which is currently v13.x, ahead of the last released version v12.1. As InvenioRDM v13.0 will be released in a few weeks, I will postpone that work until v13.0 is released and Rogue Scholar is updated to that version. There is additional DOI registration work needed, as Rogue Scholar not only registers DOIs with Crossref, but uses multiple DOI prefixes (not yet supported in InvenioRDM) and also accepts blog posts with DOIs registered externally with DataCite. ### Upgrade Rogue Scholar Infrastructure After growing to more than 40,000 blog posts in recent months, the Rogue Scholar infrastructure, particularly the Opensearch search index, needs a hardware upgrade. 
I will take this opportunity to also change my deployment strategy and tooling, and will start to use Kamal to deploy the InvenioRDM software to a dedicated server (provided by Hetzner). ### Improve Observability Rogue Scholar uses metrics and logging provided by Prometheus/Grafana and error reporting provided by Sentry. More work is needed to improve this observability to better handle incidents such as this week's upgrade issues. ## References Fenner, M. (2025, May 27). Major update on Commonmeta Crossref DOI registration. _Front Matter_. https://doi.org/10.53731/69k7z-w7030
06.06.2025 10:46 — 👍 0    🔁 1    💬 0    📌 0
Rogue Scholar Newsletter May 2025 This is the May issue of the monthly newsletter from the Rogue Scholar science blog archive. The newsletter reports on new blogs that have joined the platform, important technical updates in Rogue Scholar infrastructure, community updates, and other news relevant to Rogue Scholar users. ## Blogs added to Rogue Scholar No new blogs were added in May, but the processing of two blog submissions is being worked on. ## Technical Updates On May 6, the Rogue Scholar registration workflow received a major update in how personal names are handled to better accommodate edge cases such as multiple family names. Starting on May 15, new DOI registrations for all WordPress blogs use a new scheme for the DOI suffix, using the Rogue Scholar _blog identifier_ and WordPress _post_id_ , e.g. https://doi.org/10.59350/rzepa.28773 for the latest post on Henry Rzepa's Blog. This allows WordPress blog authors to know the DOI for their blog posts before publication. On May 27, Rogue Scholar switched the DOI registration workflow to the commonmeta-py Python library. This was an important step towards integrating Crossref DOI registration directly into the InvenioRDM repository platform. Since that switch, several smaller issues have been fixed, and in June, the work on Crossref DOI registration in InvenioRDM can begin. ### Community Update On May 15, I published the Rogue Scholar authorship guidelines to clarify the rights and responsibilities of Rogue Scholar blog post authors, following the same basic guidelines that apply to other scholarly outputs. One particular focus was on the limitations and transparency in reporting when using the help of Artificial Intelligence in writing scholarly blog posts. Please use Slack, email, Mastodon, or Bluesky if you have any questions or comments regarding this monthly newsletter. Rogue Scholar is a scholarly infrastructure that is free for all authors and readers. 
You can support Rogue Scholar with a one-time or recurring donation or by becoming a sponsor. ## References 1. Fenner, M. (2025, May 6). Personal names in science blogs. _Front Matter_. https://doi.org/10.53731/r5fw0-tdd11 2. Fenner, M. (2025, May 27). Major update on Commonmeta Crossref DOI registration. _Front Matter_. https://doi.org/10.53731/69k7z-w7030 3. Fenner, M. (2025). _Commonmeta-py_ (Version 0.113) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.15524711 4. Fenner, M. (2025, May 15). Rogue Scholar Authorship Guidelines. _Front Matter_. https://doi.org/10.53731/fnv8b-qfy78 5. McNutt, M. K., Bradford, M., Drazen, J. M., Hanson, B., Howard, B., Jamieson, K. H., Kiermer, V., Marcus, E., Pope, B. K., Schekman, R., Swaminathan, S., Stang, P. J., & Verma, I. M. (2018). Transparency in authors’ contributions and responsibilities to promote integrity in scientific publication. _Proceedings of the National Academy of Sciences_, _115_(11), 2557–2560. https://doi.org/10.1073/pnas.1715374115
02.06.2025 12:17 — 👍 1    🔁 2    💬 0    📌 0
Major update on Commonmeta Crossref DOI registration Today I released a new version of the commonmeta-py Python library with major improvements in Crossref DOI registration, including refactoring to use the Python marshmallow library, XML schema validation, and API calls to Crossref and InvenioRDM instances via the commonmeta-py command-line interface. ### Using the marshmallow library Marshmallow is a popular Python library for converting complex objects to and from simple Python datatypes. The InvenioRDM repository software heavily uses marshmallow to convert metadata from and to JSON. Marshmallow is not specific to JSON, and writing Crossref metadata in XML requires an additional serialization step. commonmeta-py uses xmltodict to convert XML to Python data structures, and now also uses xmltodict for writing XML. This replaces lxml and the ElementTree API for XML writing. The previous approach worked well but didn't integrate with the rest of commonmeta-py, as the Crossref XML writer is the only place where commonmeta-py currently writes XML. More importantly, this change will make integrating commonmeta-py into InvenioRDM easier for Crossref DOI registration. ### XML schema validation Crossref metadata are fairly complex and have different requirements depending on content type, e.g. International Standard Serial Numbers (ISSN) are only supported for some content types, or the order of metadata elements might be different. For this reason, XML schema validation before submission is critical, and commonmeta-py now supports this, using the recently released schema 5.4.0. A large part of the work for this update was generating and validating XML for the various Crossref content types. I could not cover all use cases, so feedback is appreciated, e.g., by sending me DOIs that are registered with Crossref but do not validate in commonmeta-py. 
Commonmeta (both the Python and Go versions) relies heavily on JSON schema validation, which I greatly prefer over XML Schema Definition (XSD) validation. But until Crossref allows content registration via JSON metadata (similarly to the change DataCite made a few years ago), XML schema validation remains important. The commonmeta Go library does not yet use XML schema validation. ### API calls via the CLI The Rogue Scholar science blogging archive switched to the InvenioRDM repository platform in October 2024 and uses the commonmeta Go library and GitHub Actions for Crossref DOI registration. GitHub Actions are wonderful, but for more complex workflows it is easier to have the logic built into the application running in the GitHub Action. Since May 2024 that was the commonmeta Go library, and commonmeta-py now has similar functionality, including calling the Crossref and InvenioRDM APIs directly. Starting today, the GitHub Actions for Rogue Scholar DOI registrations and updates use commonmeta-py instead of the commonmeta Go library. Over the next two weeks I will carefully monitor them for any issues that might have escaped testing. The next major milestone is integrating Crossref DOI registration directly into InvenioRDM. This will not only simplify the workflows for Rogue Scholar, but also make InvenioRDM a more interesting option for repositories with original textual content, e.g. preprints, reports or dissertations. ## References Fenner, M. (2025). _Commonmeta-py_ (Version 0.113) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.15524711 Feeney, P. (2025, March 19). Version 5.4.0 metadata schema update now available. _Crossref Blog_. https://doi.org/10.13003/325070
27.05.2025 08:35 — 👍 0    🔁 1    💬 0    📌 0
Commonmeta understands OpenAlex Last week I released updated Python and Go versions of the commonmeta library that can now read metadata from OpenAlex. OpenAlex is an open index of over 250 million scholarly works from 250k sources. OpenAlex uses its own identifier for works, people, organizations, sources, and concepts, but also understands common identifiers for works (e.g. DOI or PMID), people (e.g. ORCID), or organizations, including funders (e.g. ROR). Commonmeta can now fetch metadata from the OpenAlex API and convert them into the commonmeta or any other supported format. An example command-line call would look like this: `commonmeta convert https://pubmed.ncbi.nlm.nih.gov/17160063 --from openalex` Or you could fetch a random sample of 100 preprints: `commonmeta list --sample --type preprint -n 100 --from openalex` OpenAlex is an impressive service for the scholarly community, launched three years ago when the Microsoft Academic Graph database stopped being updated. I particularly like the following features: * coverage of a large number of text publications, including content registered via Crossref and DataCite, * links to legal copies of full-text versions of publications, * enrichment of metadata with persistent identifiers, e.g. affiliation information, * rich automated subject area classification into 4500 topics. When working on integrating OpenAlex into commonmeta, I noticed some areas where the service (still only three years old) could be improved upon: * personal names are not treated as a combination of given and family names. This can cause problems in cases of unusual names and formatted citations, which typically split personal names into given and family names, * metadata enrichment should not be done with personal names, as this is very difficult and may have privacy implications. 
My OpenAlex profile – which covers publications over 30 years in different research areas (mainly basic and clinical cancer research and scholarly infrastructure) – contains most of my publications but also publications not written by me, including several papers published before I finished high school in 1983, * license information uses a simple schema that aligns with Creative Commons licenses, but for example doesn't consider different versions (e.g. CC-BY 3.0 vs. CC-BY 4.0). Commonmeta supports the SPDX license list that includes all Creative Commons license versions but also many software licenses. The initial OpenAlex support in commonmeta is the result of a wonderful pull request for the Python version. I mainly added test coverage and added the same functionality to the Go version. Please provide feedback via email, Slack, or GitHub if you discover bugs or missing functionality of the OpenAlex support in commonmeta. ## References Fenner, M. (2025). _Commonmeta-py_ (Version 0.107) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.15465786 Martin Fenner. (2025). _front-matter/commonmeta: V0.25.0_ (Version v0.25.0) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.15461402
19.05.2025 17:06 — 👍 0    🔁 2    💬 0    📌 0
Rogue Scholar Authorship Guidelines Rogue Scholar archives the content of currently more than 150 science blogs with more than 40,000 blog posts. In this blog post, I want to clarify the guidelines that Rogue Scholar tries to follow regarding authorship. Rogue Scholar blog posts are scholarly content and thus follow the same basic guidelines as other scholarly outputs, such as journal articles, preprints, or book chapters. ### Authorship All authors are expected to have made substantial contributions to the submitted work and to be accountable for the work both before and after publication. Those who contributed to the work but do not meet the criteria for authorship can be mentioned in the Acknowledgments. ### Artificial Intelligence (AI) AI tools cannot meet the requirements for authorship, as explained by the Committee on Publication Ethics (COPE): > AI tools cannot meet the requirements for authorship as they cannot take responsibility for the submitted work. As non-legal entities, they cannot assert the presence or absence of conflicts of interest nor manage copyright and license agreements. And COPE recommends that: > Authors who use AI tools in the writing of a manuscript, production of images or graphical elements of the paper, or in the collection and analysis of data, must be transparent in disclosing in the Materials and Methods (or similar section) of the paper how the AI tool was used and which tool was used. Rogue Scholar blogger Mark Dingemanse recently made a strong case for why synthetic text is incompatible with science blogging. ### Contributor Roles For blog posts with multiple authors, Rogue Scholar plans to add support for the Contributor Role Taxonomy (CRediT). And for blog posts handled by an editor or undergoing peer review, Rogue Scholar also wants to add those roles. 
### Possible Actions Rogue Scholar is an archive of science blog posts: the content is originally published elsewhere, and the decision for publication was taken by the blog authors. In rare cases, blog authors might retract a blog post or post a correction, and that information should also be communicated by Rogue Scholar and via the DOI metadata. When blog posts don't follow the above guidelines, e.g. when inappropriately using AI tools, Rogue Scholar staff, after consultation with the Rogue Scholar Advisory Board, will decide on appropriate actions, including retraction. ## References 1. McNutt, M. K., Bradford, M., Drazen, J. M., Hanson, B., Howard, B., Jamieson, K. H., Kiermer, V., Marcus, E., Pope, B. K., Schekman, R., Swaminathan, S., Stang, P. J., & Verma, I. M. (2018). Transparency in authors’ contributions and responsibilities to promote integrity in scientific publication. _Proceedings of the National Academy of Sciences_, _115_(11), 2557–2560. https://doi.org/10.1073/pnas.1715374115 2. _Authorship and AI tools_. (2024). Committee on Publication Ethics. https://doi.org/10.24318/cCVRZBms 3. Holcombe, A. O. (2019). Contributorship, Not Authorship: Use CRediT to Indicate Who Did What. _Publications_, _7_(3), 48. https://doi.org/10.3390/publications7030048 4. Marcum, C. S. (2025, April 8). Peer-Review for a Blog Post? My Experience with MetaROR. _Front Matter_. https://doi.org/10.54900/bymaz-4fw37 5. Dingemanse, M. (2025, May 2). Why synthetic text is incompatible with science blogging. _Front Matter_. https://doi.org/10.59350/63b1y-1js90
15.05.2025 20:01 — 👍 1    🔁 2    💬 0    📌 0

Science is under attack. In a new Upstream post "The resilience of open science in times of crisis" @jeroenbosman and I detail current events around 5 types of threats, show how scientists and others are pushing back, and we propose a resilience model to […]

[Original post on akademienl.social]

13.05.2025 09:00 — 👍 1    🔁 4    💬 0    📌 0
Personal names in science blogs Personal names remain among the hardest scholarly metadata to capture properly, including for science blog posts. This week, the Rogue Scholar science blog archive therefore changed how it stores blog post author names: no longer as **name**, which is the standard in RSS, Atom, and JSON Feeds, but as **name** only for an organizational author, and as **given** and **family** name for personal authors. This follows the best practices established by ORCID, Crossref, and the Citation Style Language. DataCite and InvenioRDM (the repository platform powering Rogue Scholar and based on the DataCite metadata model) use an implementation that was good for transitioning from names to given and family names (keeping the name field for personal names), but created confusion – personal names should be stored as **family name**, **given name** (Doe, John) in the **name** field, but that is not trivial to enforce and led to many organizations still submitting DataCite metadata with **given name family name** (e.g. John Doe) as name. For this reason, Rogue Scholar is following the Crossref model and is dropping **name** for personal names, even if it is a breaking change. Extracting the given and family name from a name cannot be fully automated, as there are important edge cases: a) organization names that look like personal names (Alfred P. Sloan Foundation) and b) personal names with multiple family names (Bastian Greshake Tsovaras) vs. multiple given names (Martin Paul Eve) vs. names with prepositions (Wilma van Weezenbeck). More background info and additional edge cases (e.g. given name without family name) can be found in a 2011 W3C document. To handle these special cases, Rogue Scholar has started a curated list of author names that fall into one of these categories, so that it can correctly split names into given and family names, or not split the name when it is an organizational name. 
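The curated-list approach described above can be sketched in a few lines of Python. This is a minimal illustration and not the Rogue Scholar implementation: the names `ORGANIZATIONS`, `CURATED_SPLITS`, and `parse_author` are hypothetical, the entries are the examples from this post, and the fallback rule (treat the last word as the family name) is an assumption.

```python
# Hypothetical curated overrides, seeded with the examples from the post.
ORGANIZATIONS = {"Alfred P. Sloan Foundation"}
CURATED_SPLITS = {
    "Bastian Greshake Tsovaras": ("Bastian", "Greshake Tsovaras"),
    "Wilma van Weezenbeck": ("Wilma", "van Weezenbeck"),
}


def parse_author(name: str) -> dict:
    """Return an organizational name, or a given/family split for a person."""
    name = " ".join(name.split())  # normalize whitespace
    if name in ORGANIZATIONS:
        # Organizational authors keep a single name field.
        return {"type": "Organization", "name": name}
    if name in CURATED_SPLITS:
        given, family = CURATED_SPLITS[name]
        return {"type": "Person", "given": given, "family": family}
    if ", " in name:
        # "Doe, John" style: family name comes first.
        family, given = name.split(", ", 1)
        return {"type": "Person", "given": given, "family": family}
    # Assumed fallback: the last word is the family name.
    given, _, family = name.rpartition(" ")
    if not given:
        # Single word: assumed here to be a given name without family name.
        return {"type": "Person", "given": family}
    return {"type": "Person", "given": given, "family": family}
```

The fallback handles multiple given names such as Martin Paul Eve without a curated entry, while names with multiple family names or prepositions only split correctly once they are added to the curated list.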
The transition of the more than 40K blog posts into the new format will take some time, but is only urgent for the edge cases mentioned above. There are further issues with personal names and metadata for scholarly blogs, including handling multiple authors (which not all blogging platforms support), author identifiers (ORCID, etc.), and author affiliations (using identifiers such as ROR, etc., affiliation changes over time) – but that is material for another blog post. ## References _Personal names around the world_. (n.d.). Retrieved May 6, 2025, from https://www.w3.org/International/questions/qa-personal-names.en
06.05.2025 11:42 — 👍 1    🔁 0    💬 0    📌 0
Rogue Scholar Newsletter April 2025 This is the April issue of the monthly newsletter from the Rogue Scholar science blog archive. The newsletter reports on new blogs that have joined the platform, important technical updates in Rogue Scholar infrastructure, community updates, and other news relevant to Rogue Scholar users. ## Blogs added to Rogue Scholar Nine blogs from six different subject areas were added in April. Welcome everybody! More information about some of the blogs added will follow in the coming weeks. ### Open Access News _Social science, English._ https://legacy.earlham.edu/~peters/fos/fosblog.html ### Crossref Blog _Computer and information sciences, English._ https://www.crossref.org/blog/ ### Open Bioinformatics Foundation _Biological sciences, English._ https://www.open-bio.org/ ### Research Organization Registry (ROR) _Computer and information sciences, English._ https://ror.org/blog/ ### Netzwerk Fluchtforschung _Social science, German._ https://fluchtforschung.net/de/blog/ ### kfitz _Humanities, English._ https://kfitz.info/ ### Jachère Journal _Other humanities, English._ https://jache.re/ ### Appalachianhistorian.org _History and archaeology, English._ https://appalachianhistorian.org/ ### Bauhinia Genome _Biological sciences, English._ http://bauhiniagenome.hk/ ## Technical Updates Since April 15, Rogue Scholar has shown the full-text of all blog posts in the web interface, after making the full-text available via API since the service's launch. The full-text continues to be included in Rogue Scholar full-text search, allowing readers to find terms not in the metadata, e.g. National Miners Union in this example from a post published today. On April 14, I enabled Rogue Scholar web analytics using the Plausible Analytics platform. 
The platform collects no personally identifiable information and the data are available via a public Plausible Analytics dashboard. It is too early to take a deeper dive into the data, including how the traffic to Rogue Scholar compares to the traffic to participating blogs. Both this blog and the Upstream blog also use the Plausible Analytics platform, so in a few months that comparison can take place. DOI registration has become easier for WordPress blogs, as the DOI is now automatically generated from the blog identifier and the **post_id**, e.g. https://doi.org/10.59350/fluchtforschung.14743 for the post with the **post_id** https://fluchtforschung.net/?p=14743. This makes it easier for blogs to display the DOI for each post, as the information is available before publication. This functionality is available to new WordPress blogs and early adopters now, and becomes the standard setting for WordPress blogs on May 15. Blogs using static site generators (Hugo, Jekyll, Quarto, etc.) have had similar functionality since January. Going forward I will add the same pre-registration functionality to the blogging platforms currently missing, e.g. Blogger or Substack. ### Community Update The Rogue Scholar Advisory Board met virtually on April 16. A summary of the discussions and advice will be posted as a blog post in the coming weeks. The small but growing Rogue Scholar Slack community continues to have interesting conversations that are difficult or impossible over social media or email. Please use Slack, email, Mastodon, or Bluesky if you have any questions or comments regarding this monthly newsletter. Rogue Scholar is a scholarly infrastructure that is free for all authors and readers. You can support Rogue Scholar with a one-time or recurring donation or by becoming a sponsor. ## References 1. Fenner, M. (2025, April 14). Rogue Scholar adds full-text content to all blog post web pages. _Front Matter_. https://doi.org/10.53731/f6tvb-se107 2. Fenner, M. 
(2025, April 24). DOI registration workflow for a science blog (version 2). _Front Matter_. https://doi.org/10.53731/fz73s-sv368
30.04.2025 19:24 — 👍 0    🔁 0    💬 0    📌 0
DOI registration workflow for a science blog (version 2) _This post is an updated version of the_ _DOI registration workflow for a science blog_ _post I published in September 2023. It reflects the best practices used by the Rogue Scholar science blog archive and contains one important announcement._ In previous blog posts such as the one published earlier, I discussed the various elements involved in registering a DOI for a science blog post. Briefly, the Rogue Scholar service takes advantage of the fact that blogs * use RSS feeds (or the Atom or JSON Feed format) and/or JSON APIs to distribute content and metadata at the time of publication, * these feeds contain the most important metadata needed for publication – such as title, authors, publication date, and * additional metadata (such as abstract and references) can be automatically extracted from the full-text content included in the feed. DOI registration itself has technical (generating metadata that conforms to a specific schema) and business (membership in a DOI registration agency such as Crossref) requirements that are not trivial, so ideally, and unless the blog is publishing a lot of content similar to a journal, it is handled by a dedicated service: Rogue Scholar. This basic workflow can be optimized in many ways, such as including funding information, but one fundamental issue remains to be solved: how does the blog learn about the DOI registered for a new post and automatically add it to the blog? There are two basic approaches: a) generate a random DOI and communicate this back to the blog, or b) let the blog pick the DOI, following some basic rules. Most importantly, the DOI must be unique, but ideally it is also a relatively short string without special characters that can easily be copy/pasted, and it is opaque, i.e. contains no meaning that becomes problematic over time. Before January 2025, Rogue Scholar was using the first workflow, i.e. 
generate a random DOI and communicate this back to the blog via the Rogue Scholar API and website. ## Canonical URL As much as possible Rogue Scholar takes advantage of technologies that have existed for a long time and are not specific to scholarly content. That's why the service works with existing blogs that use standard blogging software - currently eleven different platforms, the most popular being Wordpress, Blogger, and Hugo. These platforms don't know about DOIs without extra work, but they all know about a similar concept: canonical URLs. Wikipedia explains: > A **canonical link element** is an HTML element that helps webmasters prevent duplicate content issues in search engine optimization by specifying the "canonical" or "preferred" version of a web page. It is described in RFC 6596, which went live in April 2012. The problem canonical URLs are addressing is duplicate content at different locations that can confuse search engines such as Google or Bing. This is related to the problem persistent identifiers such as DOIs are addressing for the scholarly community: accessing content over long periods of time that may change its location on the web (its URL), with two inter-related strategies: * **URL redirection**. DOIs redirect to a target URL that can be changed by the publisher, * **Persistence**. The publisher of scholarly content makes an extra effort to make sure content doesn't disappear (link rot), or significantly change (content drift). Obviously, canonical URLs are not DOIs, but they provide a standard way for a science blog to add a DOI to a post. ## Backends Science blogs provide a backend to store content and metadata, including the canonical URL. This can either be a database (as in the case of Wordpress or Ghost) or a file (as in the case of Hugo and many other static site generators). 
### WordPress WordPress doesn't know about canonical URLs out of the box, but they can be added via a plugin, the most popular for this being Yoast SEO (which comes in free and paid versions). After installing and activating the plugin you can add a canonical URL in a new Yoast SEO section of the post editor. Alternatively, you can fiddle with your WordPress configuration to add a custom field for the canonical URL. ### Ghost The Ghost blogging platform has a canonical URL field for every post, which you can access from the post settings sidebar. ### Hugo Hugo and other Open Source static site generators give you a lot of flexibility with metadata. If you add a `canonicalUrl` field to the blog post Front Matter, you can reuse it for the canonical URL (with some additional work). The canonical URL or DOI is now stored with the blog post, but also exposed to web crawlers. The format is `<link rel="canonical" href="https://doi.org/10.53731/gvb08-7kc16">`. ## Frontends To display the canonical URL aka DOI on your blog frontend, you have to modify your blog theme. The popular themes for WordPress, Ghost, and Hugo don't really support displaying the canonical URL out of the box, as canonical URLs are primarily intended for web crawlers and not humans. You should follow the Crossref DOI display guidelines when thinking about how to display the DOI for your blog post, i.e. the DOI should always be displayed as a clickable full URL link. Rogue Scholar displays DOIs this way on its website, and this blog (using the Ghost platform) displays them in a sidebar. ## DOI registration workflow The changes to the backend and frontend explained above are good enough for occasional blog posts or to get started with Rogue Scholar. After a blog post is published, Rogue Scholar will register a DOI within 20 minutes and show that DOI on the website or via API. You can then copy/paste that DOI into your new canonical URL field. 
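Once the DOI is stored in the canonical URL field, any tool can read it back from the rendered page. The following is a minimal sketch using only the Python standard library; the names `CanonicalLinkParser` and `canonical_doi` are hypothetical, and the sketch assumes the DOI is expressed as a full `https://doi.org/` URL in the canonical link element, as in the format shown above.

```python
from html.parser import HTMLParser
from typing import Optional

DOI_RESOLVER = "https://doi.org/"


class CanonicalLinkParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values
        rel = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attributes.get("href")


def canonical_doi(html: str) -> Optional[str]:
    """Return the DOI name from the canonical link, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    url = parser.canonical
    if url and url.startswith(DOI_RESOLVER):
        return url[len(DOI_RESOLVER):]
    return None
```

Called with the HTML of a blog post page, `canonical_doi` returns the DOI name without the resolver prefix, and `None` when the canonical URL points to a regular web address.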
A simple improvement would be notifications of new DOI registrations by email, similar to what Crossref is sending to Front Matter as the Crossref member:

    <?xml version="1.0" encoding="UTF-8"?>
    <doi_batch_diagnostic status="completed" sp="ds5">
      <submission_id>1590342900</submission_id>
      <batch_id>8a637b09-fda6-4980-baa1-147497683bd9</batch_id>
      <record_diagnostic status="Success">
        <doi>10.53731/w6nzs-jta75</doi>
        <msg>Successfully added</msg>
        <citations_diagnostic>
          <citation key="ref1" status="resolved_reference">10.53731/gvb08-7kc16</citation>
          <citation key="ref2" status="resolved_reference">Cite to nonCR doi: 10.5281/zenodo.1324300</citation>
          <citation key="ref3" status="resolved_reference">10.1371/journal.pone.0115253</citation>
          <citation key="ref4" status="resolved_reference">10.59350/p000s-pth40</citation>
          <citation key="ref5" status="resolved_reference">10.53731/r79x921-97aq74v-ag5a2</citation>
        </citations_diagnostic>
      </record_diagnostic>
      <batch_data>
        <record_count>1</record_count>
        <success_count>1</success_count>
        <warning_count>0</warning_count>
        <failure_count>0</failure_count>
      </batch_data>
    </doi_batch_diagnostic>

Such a notification could also include a clickable link to the DOI just registered and some basic metadata that were registered (as it takes a few hours until the metadata show up in the Crossref REST API).

For blogs that publish more frequently (e.g. weekly or daily) this workflow should be automated. One important consideration is whether the blog should know the DOI that will be registered in advance, avoiding the round trip with Rogue Scholar and Crossref, and allowing customizations of the DOI name, such as `10.53731/front-matter.2023-09-19`. The biggest advantage would be that the DOI name can be shared in advance of publication, e.g. for press releases, or to reference in other content.
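To illustrate what a friendlier notification could be built from, here is a minimal sketch that pulls the registered DOI, its status, and the batch counts out of a Crossref diagnostic message like the one shown earlier. The element names come from that message; the function name and the shape of the returned summary are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the diagnostic message from the post.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<doi_batch_diagnostic status="completed" sp="ds5">
  <record_diagnostic status="Success">
    <doi>10.53731/w6nzs-jta75</doi>
    <msg>Successfully added</msg>
    <citations_diagnostic>
      <citation key="ref1" status="resolved_reference">10.53731/gvb08-7kc16</citation>
      <citation key="ref3" status="resolved_reference">10.1371/journal.pone.0115253</citation>
    </citations_diagnostic>
  </record_diagnostic>
  <batch_data>
    <record_count>1</record_count>
    <success_count>1</success_count>
    <warning_count>0</warning_count>
    <failure_count>0</failure_count>
  </batch_data>
</doi_batch_diagnostic>
"""


def summarize_diagnostic(xml_text):
    """Return the batch status plus one summary dict per registered record."""
    # Parsing bytes lets the parser honor the encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    records = []
    for record in root.iter("record_diagnostic"):
        doi = record.findtext("doi")
        records.append(
            {
                "doi": doi,
                "url": f"https://doi.org/{doi}",  # clickable link for the email
                "status": record.get("status"),
                "message": record.findtext("msg"),
                "citations": len(record.findall("citations_diagnostic/citation")),
            }
        )
    return {
        "batch_status": root.get("status"),
        "failures": int(root.findtext("batch_data/failure_count", default="0")),
        "records": records,
    }


summary = summarize_diagnostic(SAMPLE)
for item in summary["records"]:
    print(item["status"], item["url"], f"({item['citations']} citations)")
```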
While these considerations are reasonable and not new for DOIs in general, for the science blog use case the workflow should be simple and I want to follow these principles:

* Rogue Scholar DOIs will be generated as a short random 10-character string upon DOI registration. Rogue Scholar users or staff can't modify the DOI names that will be generated. Rogue Scholar DOIs are cool DOIs.
* If you see a Rogue Scholar DOI, it can be used (immediately as a link, accessing the metadata after a few hours). Rogue Scholar is not offering DOIs that are not or not fully registered, i.e. DOIs for pending publications (Crossref) or draft DOIs (DataCite).
* DOI registration happens with the Rogue Scholar service talking to the Crossref API; participating blogs don't need to install or develop functionality to generate Crossref metadata and/or interact with the Crossref API.

While this workflow was a reasonable start, it was overly complicated and required an extra effort by the science blog. So in January 2025, Rogue Scholar started a new workflow: If the blog generated the DOI string containing the same random 10-character string, and added this string to the RSS feed, Rogue Scholar would use that string for DOI registration. Ten blogs are already participating in that workflow and the experience over the past three months has been very positive. As always, the devil is in the details, and on one occasion the checksum of the provided DOI string was not valid.

The limitation of this workflow is that it requires the blog to send the intended DOI string in the RSS feed. This works nicely for static site generators, but for database-driven blogging platforms this may not be possible. So this week Rogue Scholar is launching a new workflow.

### Generating DOI strings from the id/guid in the blog post feed

Blogging platforms that are not static site generators but database-driven use a unique identifier for blog posts provided by the database.
This can be long and complicated, as is the case for Blogger, Substack, or Ghost, but in the case of Wordpress the **post_id** is a simple number that increases with every post. And the feed contains this `id/guid` together with the hostname of the blog as URL, e.g. `https://svpow.com/?p=23496`.

Every blog in Rogue Scholar has a unique identifier, which is used internally and to identify the blog communities, typically based on the domain name, so the **Sauropod Vertebra Picture of the Week** (svpow) blog can be found here. The combination makes a relatively short, globally unique identifier that can be used for the DOI string:

`https://doi.org/10.59350/svpow.23496`

Rogue Scholar added support for this DOI format for Wordpress blogs this week. This feature is currently in beta testing; please reach out if you want to be an early adopter. If there are no surprising issues, I expect this feature to roll out for all Rogue Scholar Wordpress blogs on May 15.

And if your blog uses a static site generator (e.g. Hugo, Jekyll, or Quarto), you can also reach out if you want to pre-assign DOIs in the random format. They still make sense here, as static site generators don't automatically generate unique persistent IDs for posts (they generate permalinks, which depend on the configuration and may change over time).

## References

Fenner, M. (2023, September 22). DOI registration workflow for a science blog. _Front Matter_. https://doi.org/10.53731/w6nzs-jta75

Fenner, M. (2023, September 19). Streamlining the archiving of science blog posts. _Front Matter_. https://doi.org/10.53731/gvb08-7kc16

Fenner, M. (2025, January 16). Persistent identifiers, random strings, and checksums. _Front Matter_. https://doi.org/10.53731/6kfyy-nq280
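As an illustration of the id/guid-based DOI format described in this post, the following sketch derives the DOI string from a Wordpress feed guid and the blog's Rogue Scholar identifier. It assumes the guid always has the `?p=<post_id>` form shown in the example; the function name and error handling are hypothetical and do not describe how Rogue Scholar implements this.

```python
from urllib.parse import parse_qs, urlparse

# DOI prefix taken from the example DOI in the post.
ROGUE_SCHOLAR_PREFIX = "10.59350"


def doi_from_wordpress_guid(guid, blog_slug, prefix=ROGUE_SCHOLAR_PREFIX):
    """Combine the blog identifier and the Wordpress post_id into a DOI URL."""
    query = parse_qs(urlparse(guid).query)
    post_ids = query.get("p", [])
    # Expect exactly one numeric post_id in the guid.
    if len(post_ids) != 1 or not (post_ids[0].isascii() and post_ids[0].isdigit()):
        raise ValueError(f"no numeric post_id found in guid: {guid!r}")
    return f"https://doi.org/{prefix}/{blog_slug}.{post_ids[0]}"


print(doi_from_wordpress_guid("https://svpow.com/?p=23496", "svpow"))
```

Because both the blog identifier and the post_id are stable, the same guid always yields the same DOI string, which is what makes pre-assignment unnecessary for these blogs.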
24.04.2025 16:53 — 👍 1    🔁 1    💬 0    📌 0
Preview
Working with the Research Organization Registry (ROR) Data Dump

The commonmeta Go library has seen a major update this week that dramatically simplifies working with the Research Organization Registry (ROR) data dump, including conversion to other serialization formats (e.g. JSON Lines) and metadata formats (InvenioRDM), and integrated affiliation matching.

### File Download

ROR metadata are updated regularly (typically about once a month) and made available as a file download via the Zenodo repository under a Creative Commons Zero waiver. The single file is a compressed zip archive with the metadata in JSON and CSV formats, each for v1 and v2 of the ROR schema.

There are two challenges with the ROR data dump file download: while there is a stable DOI for the latest version of the data, that DOI resolves to the dataset landing page, and there is no easy way to automatically get to the file download URL for automatic downloads of new versions. The other challenge is that the compressed zip archive contains four archived files that can't be downloaded individually. The complete archive is 58.6 MB, whereas the zipped v2 JSON would be 17.7 MB (the uncompressed file is 256.9 MB).

In the commonmeta library, the download URL of the most recent ROR data dump and the file names in that zip archive are hard-coded. commonmeta can automatically fetch the full zip archive and selectively extract the v2 JSON. When using commands that require ROR metadata, commonmeta looks for a data dump in zipped Avro format (more on Avro below) in the folder where the command is run. If that file isn't found, commonmeta looks for a v2 JSON file from the data dump, and if that file isn't found either, fetches the data dump from Zenodo, extracts the v2 JSON file, and generates the compressed Avro file.
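The lookup order described above can be sketched as follows. This is a Python illustration of the fallback logic only, not commonmeta's Go code: the Avro archive name is a placeholder, and the JSON pattern is inferred from the naming of the ROR dump files.

```python
import tempfile
from pathlib import Path

# Both names are assumptions for illustration; commonmeta's actual names may differ.
AVRO_ARCHIVE = "ror.avro.zip"
JSON_PATTERN = "*ror-data_schema_v2.json"


def locate_ror_data(folder):
    """Return (kind, path): zipped Avro first, then v2 JSON, else a download."""
    folder = Path(folder)
    avro = folder / AVRO_ARCHIVE
    if avro.is_file():
        return ("avro", avro)
    # If several dumps are present, take the last in lexical order (a simplification).
    json_files = sorted(folder.glob(JSON_PATTERN))
    if json_files:
        return ("json", json_files[-1])
    # Nothing local: the caller would fetch the dump and build the Avro file.
    return ("download", None)


with tempfile.TemporaryDirectory() as tmp:
    print(locate_ror_data(tmp)[0])  # nothing stored locally yet
    (Path(tmp) / "v1.63-2025-04-03-ror-data_schema_v2.json").write_text("[]")
    print(locate_ror_data(tmp)[0])  # v2 JSON is present
    (Path(tmp) / AVRO_ARCHIVE).write_bytes(b"")
    print(locate_ror_data(tmp)[0])  # zipped Avro takes precedence
```

Checking the most compact local format first means the network is only touched on the very first run in a folder.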
For example, a local lookup of an organization via its ROR identifier looks like this:

`commonmeta convert https://ror.org/04jvcky17`

Running this command transparently downloads, extracts, and converts the latest ROR data dump and looks up the metadata for https://ror.org/04jvcky17 (Newport Festivals Foundation). This takes only a few seconds (depending on your network connection), and going forward uses ROR data stored locally.

### File Formats

Commonmeta can automatically convert the JSON data of all 115K ROR records into other serialization formats, currently JSON Lines, YAML, CSV, and Avro, and optionally compress them as a zip archive, for example:

`commonmeta list --from ror --file mydata.csv.zip`

This command in a few seconds generates a compressed CSV file (almost) identical to the CSV provided in the ROR data dump. Whereas JSON, JSON Lines, and YAML are straightforward to work with, the CSV format has limitations and shows only a (large) subset of the metadata. Avro is another special format; it is focused on efficient storage and transmission over network connections, but is not human-readable. Avro uses a schema in JSON format, which has advantages over the schema-less JSON, YAML, and CSV, including data validation and smaller file sizes.

These are the file sizes for the ROR data dump (the JSON file is slightly smaller than the original file because null values were omitted).

Format | Size (MB) | Size ZIP (MB)
---|---|---
JSON | 182.3 | 17.0
JSON Lines | 108.1 | 14.5
YAML | 125.5 | 15.3
CSV | 33.5 | 10.3
Avro | 41.6 | 13.2

Other criteria besides file size are the speed of reading and writing files in this format, readability by humans, and supported data types. CSV is very readable, but is only useful as output format, as data types other than text and numbers are not supported. Avro generates the smallest files among the formats that keep all metadata, but is more complicated to work with as it requires a schema.
YAML is very human-readable, whereas JSON Lines works well over network connections as it is easier to stream than JSON. commonmeta allows working with all these formats for ROR data (CSV only as output as it doesn't include all metadata), so you can for example give a JSON Lines file to a colleague and she can use it as commonmeta input:

`commonmeta list mydata.jsonl --from ror --file mydata2.csv`

### Metadata formats and filtering

The ROR schema describes the metadata needed for the affiliation use case in scholarly works. A related metadata schema is used by the InvenioRDM repository platform that the Rogue Scholar blogging platform also uses. Here metadata vocabularies are typically described in YAML and use a subset of the ROR metadata, both for author affiliations and for funders. One small twist is that one metadata field is different: affiliations support acronyms for names, whereas funders support the country code. commonmeta can handle this and also generate a subset of organizations that are of type `funder`:

`commonmeta list --from ror --to inveniordm --file funders.yaml`

One challenge in the current version of the InvenioRDM platform (v12.0) is that importing large vocabularies (e.g. all ROR data) is slow and error-prone. In a previous blog post I suggested importing only the affiliations needed, but that workflow is slow and complicated. The upcoming version v13.0 of InvenioRDM has better handling of large vocabulary imports; in the meantime commonmeta supports the generation of smaller vocabularies that can be imported in batches, e.g.

`commonmeta list --from ror --to inveniordm --file affilations_ror.yaml -n 10000 --page 1`

This command generates a YAML file in a format InvenioRDM understands and containing only 10,000 organizations. By installing commonmeta on the InvenioRDM server and running this command repeatedly and importing the YAML in batches we can overcome the limitations of the v12.0 vocabulary import.
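The batching idea can be made concrete with a little arithmetic. The sketch below assumes that pages are numbered from 1, as the `--page 1` example suggests, and uses a round 115,000 records as a stand-in for the roughly 115K ROR records; the helper names are hypothetical.

```python
def page_count(total, per_page):
    """Number of batches needed to cover all records (ceiling division)."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return -(-total // per_page)


def page_of(records, per_page, page):
    """Return the records of one batch; pages are assumed to start at 1."""
    if per_page < 1 or page < 1:
        raise ValueError("per_page and page must be at least 1")
    start = (page - 1) * per_page
    return records[start:start + per_page]


record_ids = list(range(115_000))  # stand-in for about 115K ROR records
print(page_count(len(record_ids), 10_000))       # batches to import
print(len(page_of(record_ids, 10_000, 12)))      # size of the final batch
```

With 10,000 organizations per file this works out to twelve runs of the command, the last one producing a smaller file.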
commonmeta currently supports two filters to generate subsets of the ROR data: by organization type (e.g. funder or university) and/or by country. Please reach out if you are interested in other metadata formats for organizations and/or filters.

### Queries

commonmeta supports simple queries of the local ROR data by ROR ID or external ID (Crossref Funder ID, GRID, ISNI, Wikidata):

`commonmeta convert Q7713086`

More complex queries are currently not possible with local data, but commonmeta integrates with the ROR API to support affiliation matching.

`commonmeta match --from ror "The Alfred Hospital"`

This will return a single ROR record if a match with a score of 0.9 or higher is found by the ROR API. The affiliation matching can be combined with looking up metadata from Crossref, DataCite, or InvenioRDM, a core functionality of commonmeta. If affiliation names but no ROR ids are provided, commonmeta can automatically merge the information found by affiliation matching. To indicate that the metadata was found by ROR and not the publisher, commonmeta adds an `assertedBy` field with the value `ror` to the response. Affiliation identifiers provided by the publisher will have a `publisher` value in the response.

The following query returns a random sample of 50 publications from Crossref member 31795 (Front Matter) with affiliation matching applied and the results stored in commonmeta schema format.

`commonmeta list --from crossref --member 31795 --sample=true -n 50 --match=true --file matching.json`

### Conclusions

The work on using Go to read, format, and integrate ROR metadata was inspired by a session at the recent InvenioRDM Partner meeting in Hamburg. InvenioRDM is written in Python and Javascript/React, but Go is a great alternative for simple installations (commonmeta is a single 5 MB binary) and performance-critical functions (e.g. converting the ROR data dump into the InvenioRDM YAML format).
Rust is another language that people use in these situations, and more work is needed to compare the relative strengths and weaknesses.

More work is also needed to decide on the best file format for storing scholarly metadata at scale. Avro looks promising, but needs to be compared with JSON in more detail. Another interesting newer format is Parquet. It is a column-oriented file format in contrast to the formats described here, which are row-based. This makes some things harder but other things much easier, and this becomes more critical as the number of metadata records grows from 100K (ROR) to the millions (DataCite, Crossref).

Finally, this work demonstrates that a good proportion of metadata work can be done locally, working with data dumps rather than high frequencies of API calls.

## References

Research Organization Registry. (2025). _ROR Data_ (Version v1.63) [Dataset]. Zenodo. https://doi.org/10.5281/ZENODO.6347574

Martin Fenner. (2025). _front-matter/commonmeta: V0.19.4_ (Version v0.19.4) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.15256488

Fenner, M. (2025, April 7). Where I simplified ROR affiliation metadata handling. _Front Matter_. https://doi.org/10.53731/ymbv8-7jm78

Fenner, M. (2025, March 19). Rogue Scholar meets the InvenioRDM community. _Front Matter_. https://doi.org/10.53731/1aw0b-pr243
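The merge rule from the Queries section of this post (accept a match with a score of 0.9 or higher, and record who asserted the identifier) can be sketched as below. Only the `assertedBy` field and its `ror` and `publisher` values come from the post; the other field names, the shape of the candidate list, and the placeholder identifier are assumptions for illustration.

```python
MATCH_THRESHOLD = 0.9  # minimum score mentioned in the post


def best_match(candidates, threshold=MATCH_THRESHOLD):
    """Return the highest-scoring candidate at or above the threshold, if any."""
    eligible = [c for c in candidates if c["score"] >= threshold]
    return max(eligible, key=lambda c: c["score"]) if eligible else None


def merge_affiliation(affiliation, candidates):
    """Add a ROR ID found by matching, and record who asserted the identifier."""
    if affiliation.get("id"):
        # The identifier was already supplied with the publisher's metadata.
        return {**affiliation, "assertedBy": "publisher"}
    match = best_match(candidates)
    if match is None:
        return dict(affiliation)  # no confident match: leave the name as is
    return {**affiliation, "id": match["id"], "assertedBy": "ror"}


# Placeholder identifier, not a real ROR ID.
candidates = [
    {"id": "https://ror.org/00example0", "score": 0.95},
    {"id": "https://ror.org/00example1", "score": 0.62},
]
print(merge_affiliation({"name": "The Alfred Hospital"}, candidates))
```

Keeping the source of each identifier in the record makes it possible to filter or re-check matched affiliations later without touching publisher-supplied ones.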
21.04.2025 18:39 — 👍 5    🔁 6    💬 0    📌 0
Preview
Rogue Scholar adds full-text content to all blog post web pages

Today, the Rogue Scholar science blog archive launched an important new feature: showing the full-text content (in addition to metadata) of all participating blogs on blog post pages. Rogue Scholar has always stored the full-text internally and made it available via the REST API, as the full-text is needed for archiving and full-text search.

The display of full-text content on Rogue Scholar blog post pages gives blog authors immediate feedback on how their blog posts look outside of their blogging platform and how they will be archived. This is especially important for included images and advanced metadata such as references.

Screenshot of https://rogue-scholar.org/records/macsk-y9124

The references that Rogue Scholar detects and registers with the Crossref metadata for the post are shown on the same page, giving immediate feedback.

Screenshot of https://rogue-scholar.org/records/macsk-y9124

The display of images helps detect broken image links (images are not yet stored with Rogue Scholar) and wrong image sizes. It also shows best practices such as providing figure legends and alt text (both missing in this example):

Screenshot of https://rogue-scholar.org/records/5dgfh-cdh66

The display of the full-text also gives quick feedback on full-text search results, e.g. for the term Xanadu. Future versions might also show the highlights returned by the Opensearch search index.

This new feature requires re-indexing of all blog posts, as full-text is now stored in HTML instead of markdown format (as both RSS feeds and Rogue Scholar use HTML). About 70% of blog posts are already processed; the remaining posts will be stored as HTML full-text by the end of this week. With nearly 150 participating blogs, not all formatting edge cases, mostly around custom CSS, could be addressed initially. I hope to resolve the outstanding issues by the end of May.
Please reach out via Slack or email if you find an issue with the HTML full-text.

Older content such as the first post on my personal blog from August 2007 has seen several platform migrations (four in this case), so archiving content independent of any platform-specific formatting is important. I hope that the display of the full-text helps with that goal. One positive additional outcome could be that migrating to a different blogging platform becomes easier.

## References

Fenner, M. (2014, March 3). Six Misunderstandings about Scholarly Markdown. _Front Matter_. https://doi.org/10.53731/r294649-6f79289-8cw0j

Fenner, M. (2020, August 27). DataCite Commons—Exploiting the Power of PIDs and the PID Graph. _Front Matter_. https://doi.org/10.53731/kx45q-14h82

Fenner, M. (2007, August 3). Open access may become mandatory for NIH-funded research. _Front Matter_. https://doi.org/10.53731/r294649-6f79289-8cw1q
14.04.2025 15:18 — 👍 1    🔁 4    💬 0    📌 0
Preview
Where I simplified ROR affiliation metadata handling

The InvenioRDM project partners met in Hamburg two weeks ago, and we discussed a wide range of topics over five days. InvenioRDM is the open source repository platform that also powers the Rogue Scholar science blog archive, and one priority for me is to keep the repository platform's maintenance simple while adding the features I need. The workshop motivated me to work on author affiliations, inspired by a session on using programming languages other than Python or JavaScript for tasks that require a lot of data processing and network traffic.

One core functionality of InvenioRDM and Rogue Scholar is information about author affiliations, using the Research Organization Registry (ROR) persistent identifier. About 100K organizations relevant for the scholarly community are included in ROR with a persistent identifier and relevant metadata such as names in multiple languages, organization type, geolocation, and other organizational identifiers (disclaimer: I was heavily involved in launching the ROR registry in early 2019).

To make this functionality work, InvenioRDM stores information about affiliations from a YAML file into the InvenioRDM database, allowing users to pick an affiliation via the InvenioRDM user interface. This YAML file contains only a small subset of the metadata made available via ROR, basically only the ROR ID and organization names in one or more languages.

ROR releases an updated data dump once a month, a 59 MB compressed file containing metadata for all records in JSON and CSV formats. To extract all records from the 257 MB uncompressed JSON file and store the metadata needed for InvenioRDM in a YAML file is not difficult, but it requires automation and is a good task for the Go programming language. I updated the commonmeta Go library to do this with version v0.17.5 released today.
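To make the "small subset" concrete, here is a rough Python sketch of reducing one ROR v2 record to the kind of entry an affiliations vocabulary needs. It is not commonmeta's implementation: the input field names follow my understanding of the ROR v2 schema (`names` with `value`, `types`, `lang`), the output keys follow my understanding of the InvenioRDM affiliations vocabulary, and both should be checked against the respective documentation.

```python
import json


def to_vocabulary_entry(record):
    """Reduce a ROR v2 record to ID, display name, titles per language, acronym."""
    ror_id = record["id"].rsplit("/", 1)[-1]  # keep only the short ID
    display_name = None
    titles = {}
    acronym = None
    for name in record.get("names", []):
        types = name.get("types", [])
        if "ror_display" in types:
            display_name = name["value"]
        if "label" in types and name.get("lang"):
            titles.setdefault(name["lang"], name["value"])
        if "acronym" in types and acronym is None:
            acronym = name["value"]
    entry = {
        "id": ror_id,
        "name": display_name,
        "title": titles,
        "identifiers": [{"scheme": "ror", "identifier": ror_id}],
    }
    if acronym:
        entry["acronym"] = acronym
    return entry


# Trimmed sample record; the language tag and the acronym are invented for illustration.
sample = {
    "id": "https://ror.org/04jvcky17",
    "names": [
        {"value": "Newport Festivals Foundation", "types": ["ror_display", "label"], "lang": "en"},
        {"value": "NFF", "types": ["acronym"], "lang": None},
    ],
}
print(json.dumps(to_vocabulary_entry(sample), indent=2))
```

Serializing a list of such entries as YAML would then give a file in the general shape of the vocabulary described above.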
To convert a ROR data dump in JSON format into a YAML file InvenioRDM can understand, run this command:

`commonmeta transform v1.63-2025-04-03-ror-data_schema_v2.json --from ror --to inveniordm --file affiliations_ror.yaml`

This step takes about five seconds. You can also generate a compressed version by adding the `--compress` flag. This brings the size of the YAML file containing all 115K ROR records down from 21 MB to 4.1 MB.

But you can go one step further. commonmeta is a single Go binary without dependencies. In version v0.17 I added the compressed ROR records in InvenioRDM YAML format to the commonmeta binary, increasing the file size from 4 MB to 8 MB. You can now output the same YAML file with this command (i.e., without the ROR JSON input file):

`commonmeta transform --to inveniordm --file affiliations_ror.yaml`

This command runs without any network operations. As Golang programs are distributed as a single binary, embedding files is a convenient way to distribute metadata and/or content together with code.

The process of downloading the latest ROR data dump could also be optimized. One challenge is the direct download of files, as the DOI of the data dump resolves to the record landing page on Zenodo, and more work is needed to get the download links from the API in an automated fashion.

### InvenioRDM Starter

For a smooth start with InvenioRDM, I am maintaining the InvenioRDM Starter project that includes a prebuilt InvenioRDM Docker image and can be started with the included Docker Compose file. The commonmeta Go library can provide data to load into your InvenioRDM Starter instance by querying DataCite, Crossref, or other InvenioRDM instances (not working yet with Zenodo because the API is highly customized to be backward-compatible).
One obvious use case is to load Crossref or DataCite DOI metadata from authors from a particular institution using the ROR identifier, for example 1000 Crossref records from the University of Münster (ROR ID https://ror.org/00pd74e08):

`commonmeta push -f crossref -t inveniordm --ror 00pd74e08 -n 1000 --host localhost --token xxx`

To do the same query with DataCite, replace the `-f` (from) flag:

`commonmeta push -f datacite -t inveniordm --ror 00pd74e08 -n 1000 --host localhost --token xxx`

You first need to create an account and set up a token in your InvenioRDM Starter instance running at https://localhost. And you need to do one additional step: load all ROR affiliations that you need. You have two options: a) load the file with all ROR affiliations created above, or b) only load the ROR affiliations you need for your test data. The former is challenging as the import of more than 100K affiliations via API can be tricky. To do the latter, you can generate a JSON file with your metadata first and generate an `affiliations_ror.yaml` file with the `--vocabulary` flag:

`commonmeta list -f crossref -t inveniordm --ror 00pd74e08 -n 1000 --vocabulary`

There is unfortunately still a bit of work needed to get all this working. But the end result is something like this in your InvenioRDM Starter instance:

This also works with the hundreds of authors and their affiliations common to high-energy physics:

And ROR is also used for the search facets, allowing you to filter records by affiliation (e.g. co-authors from outside the University of Münster):

The improved tooling for working with ROR metadata enables a lot of interesting functionality for institutions running an instance of InvenioRDM. A lot more work is needed to improve the documentation and developer experience, but the InvenioRDM Starter example already nicely demonstrates how to quickly get external metadata into InvenioRDM without intermediaries such as research information systems (CRIS).

## References

_Hear us ROR! Announcing our first prototype and next steps_. (2019, February 10). Research Organization Registry (ROR). https://ror.org/blog/2019-02-10-announcing-first-ror-prototype/

Research Organization Registry. (2025). _ROR Data_ (Version v1.63) [Dataset]. Zenodo. https://doi.org/10.5281/ZENODO.6347574

Martin Fenner. (2025). _front-matter/commonmeta: V0.17.5_ (Version v0.17.5) [Computer software]. Zenodo. https://doi.org/10.5281/ZENODO.15169107
07.04.2025 17:49 — 👍 2    🔁 2    💬 0    📌 0
At the same time, the organization disclosed today in a new blog post, the situation is becoming untenable: AI-driven requests for Wikipedia content are growing exponentially, at significant cost to the foundation, and are not being paid by the companies responsible. Those companies are also serving information from Wikipedia without attribution, greatly reducing the chances that their users will visit Wikipedia and contribute or donate to it.

Moreover, during surges of interest in the site — after the death of a celebrity, for example, or in the wake of a natural disaster — bot traffic to other pages is now so significant that it is causing slower page loading times for human users.


NEW: AI bots caused Wikipedia's bandwidth costs to increase by 50 percent last year, and are still growing exponentially. I wrote about its plan to fight back — and whether it will be enough. https://www.platformer.news/wikipedia-ai-bot-traffic-costs-plan/

02.04.2025 00:27 — 👍 21    🔁 106    💬 3    📌 0
Preview
Rogue Scholar Newsletter March 2025

This is the third issue of the monthly newsletter from the Rogue Scholar science blog archive. The newsletter reports on new blogs that have joined the platform, important technical updates in Rogue Scholar infrastructure, community updates, and other news relevant to Rogue Scholar users.

## Blogs added to Rogue Scholar

Four blogs from four different subject areas were added in February. Welcome everybody!

### John Arundel (Bitfield Consulting)

_Computer and information sciences, English._ https://bitfieldconsulting.com/posts/

### Roger Beecham's blog

_Social and economic geography, English._ https://www.roger-beecham.com

### Análise Quantitativa das Mudanças Sociais

_Social science, Portuguese._ https://aqms.substack.com/

### Blogposts on autosys

_Computer and information sciences, English._ https://autosys.informatik.haw-hamburg.de/blog/

## Technical Updates

In March, I continued work on the statistics page, which was renamed to the Rogue Scholar dashboard. The data for this page comes from Rogue Scholar search facets, which have been greatly expanded, including filtering by publication year:

Work has started to show the full-text content (stored in the database since Rogue Scholar launched and available via API) on record landing pages. This helps with archiving and searching the full-text content. The feature is currently undergoing extensive testing and will launch on April 14. Users with **manager** permissions for communities (blog, subject area, or topic) can see the full-text already. If you do, please provide feedback.

## Community Update

InvenioRDM is the repository platform that powers Rogue Scholar as well as more than 20 other repositories, including Zenodo. Last week, about 40 people met in Hamburg for the annual partner meeting to discuss ongoing development, new features, and the timeline for the release of version v13 of the platform.
On the first day, we had short presentations from about ten InvenioRDM instances in production, highlighting unique functionalities. I shared my slides two weeks ago.

Please use Slack, email, Mastodon, or Bluesky if you have any questions or comments regarding this monthly newsletter.

Rogue Scholar is a scholarly infrastructure that is free for all authors and readers. You can support Rogue Scholar with a one-time or recurring donation or by becoming a sponsor.

## References

Fenner, M. (2025, March 10). Working on the Rogue Scholar dashboard. _Front Matter_. https://doi.org/10.53731/wtvvs-f4h04

Fenner, M. (2025, March 19). Rogue Scholar meets the InvenioRDM community. _Front Matter_. https://doi.org/10.53731/1aw0b-pr243

Fenner, M. (2025). _Rogue Scholar InvenioRDM Workshop 2025_. https://doi.org/10.5281/ZENODO.15050863
31.03.2025 16:41 — 👍 0    🔁 1    💬 0    📌 0
Gone surfing

It's the start of a new week and time for a fresh round of updates from your favorite ~~neighborhood~~ intergalactic pugs. Here, hold my miniature kong.

Last week we covered the opening gambit of our social web beta, along with our new onboarding guide. This week, we're diving headfirst into some of the changes and improvements that have been made based on your feedback, and how we're going to address your _most_ requested feature.

## What's new with ActivityPub?

This week is a veritable grab bag of news, mostly centered around bugfixes and improvements, of which there have been many. Since the launch of the beta, our little team has been carefully watching all the feedback coming in, investigating reports of things that aren't working, and responding as quickly as possible.

In the past seven days, we shipped 47 different fixes and improvements, addressing a combination of things you noticed and things we noticed while we were fixing the things you noticed. Many of the fixes were small, like how sometimes you couldn't see the "following" list when viewing a profile. Others were more substantial, like a permissions bug that meant the ActivityPub screens weren't visible to administrator users. In the meantime, the number of users in the beta is still rapidly climbing!

Thanks to some work from the Threads team, we also made significant progress in interoperability. You can now follow Threads accounts from Ghost, and their posts will appear in your feeds. There's still a long way to go, though: Profile pictures don't work, and posts from Ghost don't yet show up on Threads. These are things that still need to be resolved by the Threads team.

## How to find people to follow

In our last newsletter we encouraged everyone to reply using ActivityPub to help Ghost users discover each other – and you did! There were over 50 replies, and last week's replies became a fantastic way to find and follow Ghost publishers.
If you're ever reading a post on the social web inside Ghost and you see someone interesting in the replies, you can click on their username or profile picture to reach their profile, and follow them.

This is the backbone of organic discoverability: following the rabbit hole of other people responding and engaging with content you're interested in, which often leads to unexpected finds. Back in the olden days, we would call this "surfing." [_cries in millennial_]

We're also using your replies to find more people and publications to add to the **Explore** section inside Ghost – you can expect to see some updates there this week.

## What's coming next

Outside of bugfixes and general improvements, there are two main things on our pug radar screens right now.

The first is announcing the social web beta more widely, which we plan to do this week. Initially, we only announced the beta to subscribers of this newsletter to give us a little time to monitor how it was all going. Now that we're reasonably confident no servers will melt, we'll promote the beta on all of Ghost's official channels.

The second is username customization. We know `@index@www.site.com` isn't the prettiest, and what you really want is `@name@site.com`. We're working on it! The easy part is allowing the username to be changed. The hard part is making sure you don't lose all your followers when you do. Hopefully, we'll have some good news to share on this front in the next couple of weeks.
31.03.2025 06:24 — 👍 7    🔁 4    💬 1    📌 0

Substack rival Ghost is now connected to the fediverse https://techcrunch.com/2025/03/19/substack-rival-ghost-is-now-connected-to-the-fediverse/

19.03.2025 18:48 — 👍 12    🔁 133    💬 5    📌 0
Preview
Rogue Scholar meets the InvenioRDM community

Next week the InvenioRDM community will meet in Hamburg for five days to discuss the open source repository platform. Front Matter has been part of the InvenioRDM community since August 2021, and Rogue Scholar relaunched on the InvenioRDM platform in October 2024. The last workshop of the InvenioRDM partners took place in March 2024 in Münster, so this is the first meetup where I am running a production instance of InvenioRDM. I hope to share some of the things I learned running Rogue Scholar, talk about the customizations that may be of interest to other InvenioRDM instances, and ask many questions.

On the first day of the workshop we will have a session where InvenioRDM instances can present the highlights of what they are doing in 5 minutes, and today I have uploaded my slides to Zenodo, which is of course the repository where it all started.

Here are some of the special Rogue Scholar features I want to highlight:

* All records are scholarly blog posts, in multiple languages and covering all subject areas. This makes Rogue Scholar different from both an institutional repository (research outputs from a specific institution) and a disciplinary repository (research outputs in a specific subject area). Rogue Scholar is similar to preprint servers such as arXiv or bioRxiv, but with a broader scope in the subject area.
* Almost everything in Rogue Scholar is automated, from extracting content and metadata from participating blogs, to uploading content to Rogue Scholar and DOI registration with Crossref.
* Rogue Scholar offers full-text search of all its content, a functionality that is not common in other InvenioRDM instances. The main reason for this is of course that full-text search is difficult to implement if content is not text, or in formats where searching is difficult (e.g. CSV or PDF).
* Rogue Scholar tracks the citations of its content and shows them to users.
This is done differently from other InvenioRDM instances, as Rogue Scholar uses DOIs from Crossref rather than DataCite and takes advantage of the Crossref Cited-by service. * More recently Rogue Scholar has improved the faceting/aggregation of search results, e.g. by affiliation or publication year. This makes it very easy to visualize key repository indicators in a dashboard. Some of the questions I have for the InvenioRDM community include: * further consolidation of Python package management with uv, including support for the _site_ local folder, * improve Javascript bundling by replacing webpack with a modern alternative such as Rspack, * override UI React components. Documented but I am struggling with the workflow, * using other programming languages besides Python and React/Javascript for InvenioRDM, in particular Golang, * simpler ways to collect use COUNTER-compliant usage stats (views and downloads) that don't use log file processing, * interest in simplifying the InvenioRDM stack by replacing Elasticsearch/Opensearch with the pg_search Postgres extension. If you are a Rogue Scholar user and have topics or questions I should discuss in Hamburg, let me know via email or Slack. ## References 1. Fenner, M. (2021, August 5). First InvenioRDM Long-Term Support (LTS) version released today – and Front Matter is joining as a participating partner. _Front Matter_. https://doi.org/10.53731/r8c26t1-97aq74v-ag66m 2. Fenner, M. (2024, October 14). The Rogue Scholar migration to InvenioRDM is taking shape. _Front Matter_. https://doi.org/10.53731/a7v8h-8px31 3. _InvenioRDM Partner Meeting Summary, March 2024—Inveniosoftware.org_. (n.d.). Retrieved March 19, 2025, from https://inveniosoftware.org/blog/2024-04-23-april-project-meeting-update/ 4. Fenner, M. (2025). _Rogue Scholar InvenioRDM Workshop 2025_. https://doi.org/10.5281/ZENODO.15050863 5. Fenner, M. (2025, March 10). Working on the Rogue Scholar dashboard. _Front Matter_. https://doi.org/10.53731/wtvvs-f4h04
19.03.2025 12:50 — 👍 0    🔁 1    💬 0    📌 0