Screenshot that reads:
Introducing the Anthology for Computers and the Humanities
Taylor Arnold, Maria Antoniak, Miguel Escobar Varela, Marie Puren, Mila Oiva, Amanda Regan, Lauren Tilton, and Melanie Walsh
1 Data Science and Statistics, University of Richmond, U.S.A.
2 Computer Science, University of Colorado Boulder, U.S.A.
3 Faculty of Arts and Social Sciences, National University of Singapore
4 Laboratoire de Recherche de l'EPITA, Paris, France
5 History and Archaeology, University of Turku, Finland
6 History and Geography, Clemson University, U.S.A.
7 Rhetoric and Communication Studies, University of Richmond, U.S.A.
8 Information School, University of Washington, U.S.A.
Permanent Link: https://doi.org/10.63744/HHsQG7hNWyxG
Published: 25 September 2025
As DH grows, it's increasingly important to publish conference papers, but there hasn't been a clear venue for that.
So I'm thrilled to share this new home for DH proceedings, which will include CHR papers & more.
Thanks to @taylor-arnold.bsky.social for leading this effort!
bit.ly/ach-anthology
29.10.2025 15:39
The Index and the Vector
Converting ambiguity into precision can help a broader audience discover and learn from collections
New issue of my newsletter: "The Index and the Vector" - Converting ambiguity into precision can help a broader audience discover and learn from collections newsletter.dancohen.org/archive/the-...
20.10.2025 15:29
The Library's New Entryway
An interface that combines the advantages of the traditional index with the power of LLMs is the path forward
New issue of my newsletter: "The Library's New Entryway" - An interface that combines the advantages of the traditional index with the power of LLMs is the path forward newsletter.dancohen.org/archive/the-...
10.10.2025 19:32
highly recommend!
06.10.2025 18:29
Bartz v. Anthropic: A Preliminary Look at What LibGen Books May Be Included in the Class Action
For this post, we relied heavily on the help of Charles Horn, self-described "metadata wrangler," for data analysis. As readers are likely aware, the Bartz v. Anthropic AI law...
Bartz v. Anthropic has had a couple of major developments. Though the lawsuit was initially brought to address the legality of using copyrighted materials for training AI, the suit now focuses on Anthropic's storage, without training use, of copies of books downloaded from LibGen and PiLiMi.
05.09.2025 13:08
Will a Landmark AI Settlement Make Authors Feel Whole?
The remuneration from Bartz v. Anthropic may not provide what writers really want: respect, recognition, and readers
I have updated my in-depth analysis of Bartz v Anthropic to reflect this important and overlooked aspect of the proposed settlement: "In what may be a rude surprise for authors, partial or full payments for many books may go to publishers rather than authors." newsletter.dancohen.org/archive/land...
08.09.2025 13:32
AI and Libraries, Archives, and Museums, Loosely Coupled
A new framework provides a way for cultural heritage institutions to take advantage of the technology with fewer misgivings, and to serve students, scholars, and the public better
New issue of my newsletter: "AI and Libraries, Archives, and Museums, Loosely Coupled" - A new framework provides a way for cultural heritage institutions to take advantage of the tech with fewer misgivings, and to serve students, scholars, and the public better newsletter.dancohen.org/archive/ai-a...
18.08.2025 21:06
A new translation of @espejolento.bsky.social's lesson!
doi.org/10.46430/phe...
We're grateful to Javier Cisneros Brito + Alberto Santiago Martínez for their translation.
Thank you to @betovargas.github.io + Marisol Andrade Muñoz for their reviews, and to @giuliataurino.bsky.social for editing.
09.07.2025 14:21
A grid of eight bold black icons, some mathematical and programming-related, arranged in a 3×3 layout on a white background, with the center-left icon, a bold X inside a red circle, standing out in color. Text says: Wikifunctions lets you explore programming logic, language, and math without writing a line of code. These simple, introductory functions can get you started.
What does a "function" mean? What does it look like on a Wikimedia project? It might be something that checks leap years, tests for prime numbers, or decodes a cipher. These are small, clear examples that you can experiment with easily on Wikifunctions. (1/3)
27.06.2025 14:00
Screenshot showing a document page image on the left with corresponding OCR output on the right of the page.
Everyone's dropping VLM-based OCR models lately...
But are they actually better than traditional OCR engines, which output XML for historical docs?
I built OCR Time Machine to test it!
- Upload image + ALTO/PAGE XML
- Compare outputs side by side
huggingface.co/spaces/davan...
24.06.2025 17:35
The impact of language models on the humanities and vice versa
Nature Computational Science - Many humanists are skeptical of language models and concerned about their effects on universities. However, researchers with a background in the humanities are also...
New this morning, a Comment I contributed to Nature Computational Science on the interaction between large language models and the humanities. #MLSky
rdcu.be/etk07
The link above will be open-access for a month. Plus, I'll reply to this post with a link to a permanently open preprint. +
25.06.2025 12:58
Corpus Analysis with Voyant Tools
In this lesson, you will learn how to organise a set of texts into a corpus and perform some basic linguistic analysis using the Voyant Tools platform.
A new translation of @espejolento.bsky.social's lesson:
doi.org/10.46430/phe...
We're grateful to Javier Cisneros Brito + Alberto Santiago Martínez for their translation.
Thank you to @betovargas.github.io + Marisol Andrade Muñoz for their reviews, and to @giuliataurino.bsky.social for editing.
19.06.2025 11:40
Image of a historic newspaper with bounding box predictions for "photographs" "headline" "illustration" etc.
Finally documented the Beyond Words dataset from the @librarycongress.bsky.social labs / @bcgl.bsky.social for the BigLAM @hf.co org!
- 3.5K annotated historical newspaper pages
- Bounding boxes + category labels
- Photos, ads, headlines, cartoons & more
08.05.2025 08:41
Can you train a performant language model using only openly licensed text?
We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2
06.06.2025 19:18
Screenshot of first page of the FAccT '25 paper "Algorithms in the Stacks: Investigating automated, for-profit diversity audits in public libraries" by Melanie Walsh, Connor Franklin Rey, Chang Ge, Tina Nowak, and Sabina Tomkins.
Abstract: Algorithmic systems are increasingly being adopted by cultural heritage institutions like libraries. In this study, we investigate U.S. public libraries' adoption of one specific automated tool -- automated collection diversity audits -- which we see as an illuminating case study for broader trends. Typically developed and sold by commercial book distributors, automated diversity audits aim to evaluate how well library collections reflect demographic and thematic diversity. We investigate how these audits function, whether library workers find them useful, and what is at stake when sensitive, normative decisions about representation are outsourced to automated commercial systems. Our analysis draws on an anonymous survey of U.S. public librarians (n=99), interviews with 14 librarians, a sample of purchasing records, and vendor documentation. We find that many library workers view these tools as convenient, time-saving solutions for assessing and diversifying collections under real and increasing constraints. Yet at the same time, the audits often flatten complex identities into standardized categories, fail to reflect local community needs, and further entrench libraries' infrastructural dependence on vendors. We conclude with recommendations for improving collection diversity audits and reflect on the broader implications for public libraries operating at the intersection of AI adoption, escalating anti-DEI backlash, and politically motivated defunding.
Many libraries now use automated tools to measure diversity in their collections.
We examined how these tools work and whether library workers find them useful. A complex case study of libraries navigating automation, DEI, & shrinking public funding.
Our new FAccT paper: arxiv.org/abs/2505.14890
26.05.2025 12:10
Paper title "Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory"
I am excited to announce our latest work, "Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory". We review recent works on culture in VLMs and argue for deeper grounding in cultural theory to enable more inclusive evaluations.
Paper: arxiv.org/pdf/2505.22793
02.06.2025 10:36
Archivists Aren't Ready for the "Very Online" Era
The challenge: how to catalog and derive meaning from so much digital clutter
"Ardam was confronting a different, somewhat sensitive question about navigating a person's digital history. When Sontag donated her laptop to the archive, did she realize how much she was giving away?" on archiving digital ephemera and culture -- www.theatlantic.com/culture/arch...
04.06.2025 21:46
Can Language Models Represent the Past without Anachronism?
Before researchers can use language models to simulate the past, they need to understand the risk of anachronism. We find that prompting a contemporary model with examples of period prose does not pro...
New preprint from @lauraknelson.bsky.social, @mattwilkens.bsky.social, and myself tests different ways of simulating the past with LLMs. We don't fully answer the title question here, just show that simple strategies based on prompting and fine-tuning are insufficient. +
02.05.2025 12:47
A watercolor of a thin waterfall in the mountains, and an associated rainbow emerging from the mist
"When Information is Networked" - My tribute to Clifford Lynch, who sadly passed away last week. Cliff saw before anyone else how digital technology would enable new forms of research and learning, and completely transform the production and dissemination of knowledge
14.04.2025 19:00
(*) Project supported by the @nehgov.bsky.social Digital Humanities Advancement Grant via IMLS.
08.04.2025 18:04
UVA's Library research center+community lab for practicing interdisciplinary+experimental scholarship around creative+critical tech, informed by digital humanities, spatial tech, SJ, & more
People > projects.
Research blog & more at ScholarsLab.org!
UC Berkeley - InterpretAI - Artificial Humanities
Book (40% off code PREORDERS25): https://press.umich.edu/Books/A/Artificial-Humanities3
Archives, DH, Museums, African American studies, C19 and early C20. I am, ashamedly, a curator now too
all opinions my own, not my employer's
www.dorothy-berry.com
@ScholarsLab Director (but personal acct)
Just+joyful, critical+creative tech+culture
*DIY scholcom=letterpress+zine+blog+code
*Bookadjacent data+making
*Experimental/DH/library futures+community
3 !s in a hoodie
Dr! They/Them
AmandaVisconti.com
Bringing the history of early globalisation and colonialism to the fingertips of researchers and the wider public. More info: https://globalise.huygens.knaw.nl/.
News, resources, and discussions on provenance research.
Breakthrough AI to solve the world's biggest problems.
Join us: http://allenai.org/careers
Get our newsletter: https://share.hsforms.com/1uJkWs5aDRHWhiky3aHooIg3ioxm
Promoting authorship for the public good by supporting authors who write to be read. authorsalliance.org and authorsalliance.substack.com for updates.
Manuscripts & other old things in the digital age (www.kakapitan.com)
Junior Professor in #digitalhumanities at @ecoledeschartes.bsky.social, @psl-univ.bsky.social. Past @ox.ac.uk, @haskoliislands.bsky.social, @ucph.bsky.social.
past: circus performer; historian of science; librarian; chief data officer at NEH.
present: dad; resident scholar at dartmouth; chief technology officer at the library of virginia.
personal account; views solely my own.
https://scottbot.github.io
The AI community building the future!
I work for the nonprofit organization behind Wikipedia (@wikimediafoundation.org), but not here.
Co-founder of @RLadiesCDMX and PhD candidate at @UniLeipzig
Librarian @ Swarthmore College. No technology is "here to stay."
You are not stuck in traffic, YOU are traffic...
https://betovargas.github.io/
Official account of the world's largest library. Explore collections & plan a visit. All Library accounts: https://loc.gov/connect
Assistant Professor @ the University of Washington iSchool | formerly an Innovator in Residence @ Library of Congress | essays in WIRED, Gawker, The New Republic, Current Affairs, etc.
www.bcglee.com
Using computational methods for modelling, integrating, and interpreting cultural heritage. Assistant Professor at iSchool@Illinois | Formerly UNIGE, UZH, iTatti, CNRS | https://nicola.carboni.me
@J_E_Barr at the other place. City of Angels.
#Provenance with a dash of Pleistocene. Collector of images of dealer stamps and stickers. Bad photos of good art & all opinions strictly my own. #jhuprovenance