Giulia Taurino

@giuliataurino.bsky.social

AI+Cultural Heritage (OCR, CV, LLMs), Digital Humanities, Archival Science, Media & Cultural Studies. Member of NULab & Turing Institute AI+Arts Group / Editor at The Programming Historian.

105 Followers  |  113 Following  |  9 Posts  |  Joined: 01.10.2024

Latest posts by giuliataurino.bsky.social on Bluesky

Screenshot that reads: 

Introducing the Anthology for Computers and the Humanities

Taylor Arnold, Maria Antoniak, Miguel Escobar Varela, Marie Puren, Mila Oiva, Amanda Regan, Lauren Tilton, and Melanie Walsh

1 Data Science and Statistics, University of Richmond, U.S.A.
2 Computer Science, University of Colorado Boulder, U.S.A.
3 Faculty of Arts and Social Sciences, National University of Singapore
4 Laboratoire de Recherche de l'EPITA, Paris, France
5 History and Archaeology, University of Turku, Finland
6 History and Geography, Clemson University, U.S.A.
7 Rhetoric and Communication Studies, University of Richmond, U.S.A.
8 Information School, University of Washington, U.S.A.

Permanent Link: https://doi.org/10.63744/HHsQG7hNWyxG

Published: 25 September 2025

As DH grows, it’s increasingly important to publish conference papers, but there hasn’t been a clear venue for that.

So I’m thrilled to share this new home for DH proceedings, which will include CHR papers & more.

Thanks to @taylor-arnold.bsky.social for leading this effort!

bit.ly/ach-anthology

29.10.2025 15:39 – 👍 112    🔁 61    💬 6    📌 2
The Index and the Vector Converting ambiguity into precision can help a broader audience discover and learn from collections

New issue of my newsletter: "The Index and the Vector" – Converting ambiguity into precision can help a broader audience discover and learn from collections newsletter.dancohen.org/archive/the-...

20.10.2025 15:29 – 👍 3    🔁 4    💬 0    📌 0
The Library’s New Entryway An interface that combines the advantages of the traditional index with the power of LLMs is the path forward

New issue of my newsletter: “The Library’s New Entryway” – An interface that combines the advantages of the traditional index with the power of LLMs is the path forward newsletter.dancohen.org/archive/the-...

10.10.2025 19:32 – 👍 11    🔁 6    💬 1    📌 2

highly recommend!

06.10.2025 18:29 – 👍 3    🔁 2    💬 0    📌 0
Not Every AI Problem Is a Data Problem | Communications of the ACM

dl.acm.org/doi/10.1145/...

26.09.2025 18:04 – 👍 0    🔁 0    💬 0    📌 0
Calvino and the I in the Computer | Jeffrey Schnapp | webpage // blog // log

jeffreyschnapp.com/2025/09/26/c...

26.09.2025 15:48 – 👍 0    🔁 0    💬 0    📌 0
Data Quality May Be All You Need – Communications of the ACM

cacm.acm.org/news/data-qu...

15.09.2025 17:00 – 👍 0    🔁 0    💬 0    📌 0
Apertus: a fully open, transparent, multilingual language model EPFL, ETH Zurich and the Swiss National Supercomputing Centre (CSCS) released Apertus 2 September, Switzerland’s first large-scale, open, multilingual language model – a milestone in generative AI for...

ethz.ch/en/news-and-...

15.09.2025 16:59 – 👍 0    🔁 0    💬 0    📌 0
Bartz v. Anthropic: A Preliminary Look at What LibGen Books May Be Included in the Class Action For this post, we relied heavily on the help of Charles Horn, self-described “metadata wrangler,” for data analysis. As readers are likely aware, the Bartz v. Anthropic AI law…

Bartz v. Anthropic has had a couple of major developments. Though the lawsuit was initially brought to address the legality of using copyrighted materials for training AI, the suit now focuses on Anthropic’s storage – without training use – of copies of books downloaded from LibGen and PiLiMi.

05.09.2025 13:08 – 👍 6    🔁 2    💬 0    📌 0
The Anthropic Settlement – what it is and isn’t (and who could get paid) www.anthropiccopyrightsettlement.com EDIT: On Sunday evening, Judge Alsup granted the motion for a hearing on Monday, September 8th, but expressed disappointment over lack of details, mostly on the…

Anthropic’s copyright settlement is historic, but it’s also not what many authors and publishers think. Check out our latest on what’s inside the proposed settlement:

08.09.2025 11:10 – 👍 7    🔁 8    💬 0    📌 0
Will a Landmark AI Settlement Make Authors Feel Whole? The remuneration from Bartz v. Anthropic may not provide what writers really want: respect, recognition, and readers

I have updated my in-depth analysis of Bartz v Anthropic to reflect this important and overlooked aspect of the proposed settlement: “In what may be a rude surprise for authors, partial or full payments for many books may go to publishers rather than authors.” newsletter.dancohen.org/archive/land...

08.09.2025 13:32 – 👍 14    🔁 5    💬 4    📌 2
Digital Collections Explorer: An Open-Source, Multimodal Viewer for Searching Digital Collections We present Digital Collections Explorer, a web-based, open-source exploratory search platform that leverages CLIP (Contrastive Language-Image Pre-training) for enhanced visual discovery of digital col...

With @yh-huang.bsky.social, I'm excited to share our Digital Collections Explorer, an open-source, multimodal viewer for digital collections! Users can search with both natural language inputs and reverse image search.

Paper: arxiv.org/abs/2507.00961
Public demo: digital-collections-explorer.com

02.07.2025 20:56 – 👍 76    🔁 26    💬 2    📌 3
AI and Libraries, Archives, and Museums, Loosely Coupled A new framework provides a way for cultural heritage institutions to take advantage of the technology with fewer misgivings, and to serve students, scholars, and the public better

New issue of my newsletter: “AI and Libraries, Archives, and Museums, Loosely Coupled” – A new framework provides a way for cultural heritage institutions to take advantage of the tech with fewer misgivings, and to serve students, scholars, and the public better newsletter.dancohen.org/archive/ai-a...

18.08.2025 21:06 – 👍 6    🔁 4    💬 1    📌 1

A new translation of @espejolento.bsky.social's lesson!

doi.org/10.46430/phe...

We’re grateful to Javier Cisneros Brito + Alberto Santiago Martínez for their translation.

Thank you to @betovargas.github.io + Marisol Andrade Muñoz for their reviews, and to @giuliataurino.bsky.social for editing.

09.07.2025 14:21 – 👍 2    🔁 1    💬 0    📌 0
A grid of eight bold black icons, some mathematical and programming-related, arranged in a 3×3 layout on a white background, with the center-left icon – a bold X inside a red circle – standing out in color. Text says: Wikifunctions lets you explore programming logic, language, and math – without writing a line of code. These simple, introductory functions can get you started.

What does a "function" mean? What does it look like on a Wikimedia project? It might be something that checks leap years, tests for prime numbers, or decodes a cipher. These are small, clear examples that you can experiment with easily on Wikifunctions. 🧵⬇️ (1/3)

27.06.2025 14:00 – 👍 18    🔁 7    💬 1    📌 0
Screenshot showing a document page image on the left with corresponding OCR output on the right of the page.

Everyone’s dropping VLM-based OCR models lately…
But are they actually better than traditional OCR engines, which output XML for historical docs?

I built OCR Time Machine to test it!

📄 Upload image + ALTO/PAGE XML
⚖️ Compare outputs side by side
🔗 huggingface.co/spaces/davan...

24.06.2025 17:35 – 👍 30    🔁 9    💬 2    📌 0
The impact of language models on the humanities and vice versa Nature Computational Science - Many humanists are skeptical of language models and concerned about their effects on universities. However, researchers with a background in the humanities are also...

New this morning, a Comment I contributed to Nature Computational Science on the interaction between large language models and the humanities. 🧪 🤖 #MLSky

rdcu.be/etk07

The link above will be open-access for a month – plus, I'll reply to this post with a link to a permanently open preprint. +

25.06.2025 12:58 – 👍 165    🔁 54    💬 14    📌 8
Corpus Analysis with Voyant Tools In this lesson, you will learn how to organise a set of texts into a corpus and perform some basic linguistic analysis using the Voyant Tools platform.

A new translation of @espejolento.bsky.social’s lesson:

doi.org/10.46430/phe...

We’re grateful to Javier Cisneros Brito + Alberto Santiago Martínez for their translation.

Thank you to @betovargas.github.io + Marisol Andrade Muñoz for their reviews, and to @giuliataurino.bsky.social for editing.

19.06.2025 11:40 – 👍 2    🔁 4    💬 0    📌 0
Image of a historic newspaper with bounding box predictions for "photographs" "headline" "illustration" etc.

Finally documented the Beyond Words dataset from the @librarycongress.bsky.social labs / @bcgl.bsky.social for the BigLAM @hf.co org!

- 3.5K annotated historical newspaper pages
- Bounding boxes + category labels
- Photos, ads, headlines, cartoons & more

08.05.2025 08:41 – 👍 35    🔁 11    💬 1    📌 0

Can you train a performant language model using only openly licensed text?

We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2.

06.06.2025 19:18 – 👍 147    🔁 59    💬 2    📌 2
Screenshot of first page of the FAccT '25 paper "Algorithms in the Stacks: Investigating automated, for-profit diversity audits in public libraries" by Melanie Walsh, Connor Franklin Rey, Chang Ge, Tina Nowak, and Sabina Tomkins.

Abstract: Algorithmic systems are increasingly being adopted by cultural heritage institutions like libraries. In this study, we investigate U.S. public libraries' adoption of one specific automated tool -- automated collection diversity audits -- which we see as an illuminating case study for broader trends. Typically developed and sold by commercial book distributors, automated diversity audits aim to evaluate how well library collections reflect demographic and thematic diversity. We investigate how these audits function, whether library workers find them useful, and what is at stake when sensitive, normative decisions about representation are outsourced to automated commercial systems. Our analysis draws on an anonymous survey of U.S. public librarians (n=99), interviews with 14 librarians, a sample of purchasing records, and vendor documentation. We find that many library workers view these tools as convenient, time-saving solutions for assessing and diversifying collections under real and increasing constraints. Yet at the same time, the audits often flatten complex identities into standardized categories, fail to reflect local community needs, and further entrench libraries' infrastructural dependence on vendors. We conclude with recommendations for improving collection diversity audits and reflect on the broader implications for public libraries operating at the intersection of AI adoption, escalating anti-DEI backlash, and politically motivated defunding.

Many libraries now use automated tools to measure diversity in their collections.

We examined how these tools work and whether library workers find them useful. A complex case study of libraries navigating automation, DEI, & shrinking public funding.

Our new FAccT paper: arxiv.org/abs/2505.14890

26.05.2025 12:10 – 👍 81    🔁 32    💬 2    📌 1
Paper title "Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory"

I am excited to announce our latest work 🎉 "Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory". We review recent works on culture in VLMs and argue for deeper grounding in cultural theory to enable more inclusive evaluations.

Paper 🔗: arxiv.org/pdf/2505.22793

02.06.2025 10:36 – 👍 57    🔁 18    💬 3    📌 5
Archivists Aren’t Ready for the ‘Very Online’ Era The challenge: how to catalog and derive meaning from so much digital clutter

"Ardam was confronting a different, somewhat sensitive question about navigating a person’s digital history. When Sontag donated her laptop to the archive, did she realize how much she was giving away?" on archiving digital ephemera and culture -- www.theatlantic.com/culture/arch...

04.06.2025 21:46 – 👍 14    🔁 6    💬 0    📌 0
Can Language Models Represent the Past without Anachronism? Before researchers can use language models to simulate the past, they need to understand the risk of anachronism. We find that prompting a contemporary model with examples of period prose does not pro...

New preprint from @lauraknelson.bsky.social, @mattwilkens.bsky.social, and myself tests different ways of simulating the past with LLMs. We don't fully answer the title question here – just show that simple strategies based on prompting and fine-tuning are insufficient. +

02.05.2025 12:47 – 👍 180    🔁 56    💬 7    📌 3
A watercolor of a thin waterfall in the mountains, and an associated rainbow emerging from the mist

“When Information is Networked” – My tribute to Clifford Lynch, who sadly passed away last week. Cliff saw before anyone else how digital technology would enable new forms of research and learning, and completely transform the production and dissemination of knowledge

14.04.2025 19:00 – 👍 16    🔁 6    💬 1    📌 1
What They Saw: Historical Photobooks by Women, 1843–1999 This pop-up reading room surveys a global history of photobooks by women photographers from the Getty Library.

Remapping the canon: a global history of photobooks by women photographers from the Getty Library. www.getty.edu/exhibitions/...

12.04.2025 18:58 – 👍 0    🔁 0    💬 0    📌 0

(*) Project supported by the @nehgov.bsky.social Digital Humanities Advancement Grant via IMLS.

08.04.2025 18:04 – 👍 0    🔁 0    💬 0    📌 0
Copyright, Privacy, and Public Access in News Archives: a proof of concept on the Boston Globe photograph morgue - AI & SOCIETY Whether supplementing written articles in newspapers or playing a leading role in photo-reporting, photography has achieved an influential role in the delivery of information and framing of narratives...

I recently co-authored a paper on Copyright, Privacy, and Public Access in News Archives (*). Thanks to @lisejaillant.bsky.social for including our work in AI & Society's Special Issue "When data turns into archives: making digital records more accessible with AI."

link.springer.com/article/10.1...

08.04.2025 18:04 – 👍 1    🔁 1    💬 1    📌 0
Opinion | Photos are disappearing, one archive at a time As photographers try to find homes for their work, archives are vanishing, especially in local journalism.

"Photos are disappearing, one archive at a time."

www.washingtonpost.com/opinions/int...

18.03.2025 01:00 – 👍 0    🔁 0    💬 0    📌 0
The Secret Apartments Hidden Above Carnegie Libraries Family quarters were built in dozens of New York City branches for custodians with the grueling job of stoking the coal-fired furnaces.

Library as home

www.nytimes.com/2025/03/05/r...

07.03.2025 09:50 – 👍 1    🔁 1    💬 0    📌 0
