Library Innovation Lab harvardlil

Replication of Government Datasets and the Principles of Provenance | Library Innovation Lab As part of our Public Data Project, LIL recently launched Data.gov Archive Search. In this post, we consider the importance of provenance for large, replicat...

When might digital provenance matter? Could we imagine it being used to right past wrongs, to return objects to their rightful places, to restore justice?

Our Public Data Project's @mollyhardy.bsky.social reflects on copying gov data & principles of provenance

lil.law.harvard.edu/blog/2025/12...

15.12.2025 12:07 — 👍 6 🔁 2 💬 0 📌 2

If you'd like an informative and interesting conversation to enjoy, try "Inside Harvard’s Data.gov Archive – A Conversation with Jack Cushman"

@jed.co interviewing Jack Cushman @harvardlil.bsky.social about Data.gov archiving on @source.coop

Now live www.youtube.com/watch?v=XYMb...

19.11.2025 19:20 — 👍 8 🔁 6 💬 0 📌 0

Our EOT2024 partner @harvardlil.bsky.social was interviewed by @jed.co on @source.coop about archiving government data.

Listen & learn how 300,000+ federal datasets are archived for posterity

Inside Harvard’s Data.gov Archive, A Conversation with Jack Cushman: www.youtube.com/watch?v=XYMb...

19.11.2025 19:28 — 👍 4 🔁 2 💬 0 📌 0

If you missed our conversation with Jack Cushman from @harvardlil.bsky.social, you can catch up now. We discussed the Data.gov Archive and the challenge of preventing federal data loss – it's about more than just web pages.

youtube.com/live/XYMbQru...

20.11.2025 21:06 — 👍 5 🔁 5 💬 1 📌 0

Workshop Report: "Resilience in Times of Crisis - Strengthening Open Science Against Geopolitical Pressures" (via Research Group "Information Management" at Humboldt-Universität zu Berlin) infomgnt.org/posts/2025-1... @datarescueproject.org @harvardlil.bsky.social #libraries #openscience

22.11.2025 18:26 — 👍 3 🔁 1 💬 0 📌 0

Upcoming Episode: Inside Harvard's data.gov Archive How the Harvard Library Innovation Lab is preserving and making Data.gov datasets discoverable using BagIt and static search.

And join Jack live on @source.coop's Great Data Products podcast on 11/19 to hear how we're making cultural memory collections easier to access and harder to delete. greatdataproducts.com/housekeeping...

31.10.2025 13:43 — 👍 4 🔁 3 💬 0 📌 1
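For readers unfamiliar with BagIt: a bag (as specified in RFC 8493) is a directory holding a `bagit.txt` declaration, a `data/` payload folder, and a checksum manifest. The sketch below illustrates that layout with only the Python standard library. It is not the Data.gov archiver's code, and the function names `make_bag` and `verify_bag` are hypothetical.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir, payload):
    """Write a minimal BagIt 1.0 bag. `payload` maps relative file names to bytes."""
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        checksum = hashlib.sha256(content).hexdigest()
        # Manifest paths are relative to the bag root and use forward slashes.
        manifest_lines.append(f"{checksum}  data/{name}")
    # The bag declaration names the BagIt version and tag-file encoding.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-sha256.txt").write_text(
        "".join(line + "\n" for line in manifest_lines), encoding="utf-8"
    )
    return bag


def verify_bag(bag_dir):
    """Return True if every payload file still matches its manifest checksum."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        checksum, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag / rel_path).read_bytes()).hexdigest()
        if actual != checksum:
            return False
    return True
```

The manifest is what lets anyone holding a copy check that its files are unchanged, which is the property that makes a bag useful for replication.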

Pioneers and Pathfinders: Jack Cushman Today, we’re joined by Jack Cushman, director of the Harvard Library Innovation Lab, where he and his team are reimagining how library principles can shape the future of legal technology. Jack is a…

Podcast: Jack Cushman joins “Pioneers & Pathfinders” to discuss libraries shaping legal tech, digital preservation, and realities of legal AI. Listen: www.seyfarth.com/news-insight...

31.10.2025 13:40 — 👍 3 🔁 1 💬 1 📌 0

Century-Scale Storage If you had to store something for 100 years, how would you do it?

This series is part of our work investigating not just the technical, but the human and societal routes to long-term digital preservation. You can also check out @maxy.bsky.social's Century-Scale Storage on our website.

lil.law.harvard.edu/century-scal...

15.10.2025 14:50 — 👍 2 🔁 0 💬 0 📌 0

Frank Cifaldi | Library Innovation Lab Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?

One answer from Frank Cifaldi who is the Founder and Director of the Video Game History Foundation:

"I would have the best copyright lawyers in the country figuring out how we can actually make this work."

lil.law.harvard.edu/generational...

15.10.2025 14:50 — 👍 2 🔁 0 💬 1 📌 0

Rebecca Frank | Library Innovation Lab Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?

One answer from Rebecca Frank who is an Assistant Professor at the University of Michigan School of Information:

"The glib answer is the money itself would be the solution."

lil.law.harvard.edu/generational...

15.10.2025 14:50 — 👍 1 🔁 1 💬 1 📌 0

Amelia Acker | Library Innovation Lab Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?

One answer from Amelia Acker who is an Associate Professor in the School of Communication & Information at Rutgers:

"I probably wouldn't build a system. I'd build a bureaucracy."

lil.law.harvard.edu/generational...

15.10.2025 14:50 — 👍 2 🔁 0 💬 1 📌 0

Che-Wei Wang & Taylor Levy | Library Innovation Lab Interviewer: If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?

One answer from Che-Wei Wang and Taylor Levy, aka CW&T, who are designers, fabricators, and artists:

"Wait, only a hundred years?"

lil.law.harvard.edu/generational...

15.10.2025 14:50 — 👍 0 🔁 0 💬 1 📌 0

Generational Data Interviews | Library Innovation Lab 14 Designs for Digital Preservation in 2025

LIL fellow @maxy.bsky.social asked 14 scholars, archivists, designers, business leaders & engineers: "If you were given unlimited funding to design a system for storing and preserving digital information for at least a century, what would you do?"

Their answers:
lil.law.harvard.edu/generational...

15.10.2025 14:50 — 👍 12 🔁 6 💬 1 📌 3

Guest Post - Rethinking Disciplinary Data Regimes - The Scholarly Kitchen Between a political policy environment focused on defunding and deleting data collections – an environment in which little can be trusted – and an onslaught of new AI tools that feed indiscriminately ...

Our Public Data Project Lead @mollyhardy.bsky.social writes about public data preservation and how it’s complicated by the artificial contemporary distinction between science and the humanities @scholarlykitchen.bsky.social

08.10.2025 14:57 — 👍 0 🔁 2 💬 0 📌 0

Product and Research Manager

Join our team! LIL is looking for a Product and Research Manager to help create, shape, and execute on our portfolio of open knowledge projects. PRMs work across every piece of the LIL ecosystem, from software experimentation to convening of events. Learn more at careers.harvard.edu/job/product-...

03.07.2025 17:26 — 👍 4 🔁 2 💬 0 📌 1

Vote for the best of the internet I just voted in The Webby People's Voice Awards and checked my voter registration.

Thrilled to share that @maxy.bsky.social's "Century-Scale Storage" was nominated for a Webby Award!

You can vote in the "Best Individual Editorial Feature" category here: vote.webbyawards.com/PublicVoting...

01.04.2025 20:59 — 👍 12 🔁 4 💬 1 📌 1

Using AI to Accelerate Digitization at Boston Public Library Today, as part of our mission expansion, we’re announcing a collaboration with BPL to develop AI-driven tools capable of accelerating new digitization of large collections at libraries across the worl...

As the @institutionaldatainitiative.org expands its mission, we’re announcing a collaboration with @bpl.boston.gov to develop AI-driven tools capable of accelerating new digitization at libraries across the world, starting at the Boston Public Library. institutionaldatainitiative.org/posts/using-...

12.03.2025 13:23 — 👍 18 🔁 10 💬 1 📌 1

Expanding Our Mission: An Open Call for Collaborators Today, we’re pleased to announce an open call for institutional collaborators as new support expands the research capacity of the Institutional Data Initiative.

I'm pleased to announce we're expanding our mission at the @institutionaldatainitiative.org with an open call for institutional collaborators, new digitization at Harvard Law School Library, and additional support to advance this work. institutionaldatainitiative.org/posts/open-c...

05.03.2025 15:36 — 👍 11 🔁 9 💬 1 📌 0

Bagging data.gov

Ed Summers at Stanford wrote this great deep dive into how and why we designed our data.gov archiver the way we did. Thanks for digging in, Ed, this is excellent. inkdroid.org/2025/02/17/n...

19.02.2025 19:16 — 👍 18 🔁 6 💬 1 📌 0

Announcing the Data.gov Archive | Library Innovation Lab Today we released our archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complet...

We just launched a 16TB archive of every dataset that has been available on data.gov since November. This will be updated day by day as new datasets appear. It can be freely copied, and we're sharing the code behind it to help others make their own archives of data they depend on.

06.02.2025 21:23 — 👍 1904 🔁 1002 💬 43 📌 66

Data Rescue Efforts Data / Website Rescue Efforts End of Term Crawl - The main coordinated effort to archive websites, but datasets have been more of a challenge. EDGI - They have been focused on environmental data. A ...

Penn is getting a lot of questions about Data Refuge. That effort no longer exists, but several efforts are currently active. I've created a doc from what I & others have suggested. I'll update as I hear more. Feel free to share or suggest: docs.google.com/document/d/1...

03.02.2025 16:13 — 👍 78 🔁 45 💬 3 📌 5

What are we all missing? Anything you can't get by clicking from link to link like EOT, or downloading datasets directly from data.gov. If there are things you care about preserving that fit that description, that's where to focus.

31.01.2025 21:11 — 👍 11 🔁 3 💬 1 📌 0

End of Term Web Archive The End of Term Web Archive is a collaborative initiative that collects, preserves, and makes accessible United States Government websites at the end of presidential administrations.

Another limited but vital effort is @eotarchive.org. They collect a huge amount from .gov domains every four years, and make it discoverable through @archive.org

31.01.2025 21:09 — 👍 20 🔁 6 💬 0 📌 0

Data.gov Home - Data.gov

Our collection from data.gov is limited: if an entry points directly to the data, such as a csv, we have the data. If it points to an html landing page, we just have the landing page. This means many, many datasets are not included. What we have from data.gov adds up to 15 or 20TB.

31.01.2025 21:07 — 👍 19 🔁 3 💬 1 📌 1
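One way to picture the distinction in the post above is a check on the `Content-Type` header that a dataset URL returns. This is a rough sketch under that assumption, not necessarily the rule used for the data.gov collection. The function `classify_resource` and its labels are hypothetical.

```python
def classify_resource(content_type):
    """Label a response as a direct data file or an HTML landing page."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if not media_type:
        return "unknown"
    if media_type in {"text/html", "application/xhtml+xml"}:
        # A shallow crawl stops here: only the page itself is captured.
        return "landing_page"
    # Anything else (CSV, JSON, ZIP, ...) is treated as the dataset itself.
    return "data"
```

A URL labeled `landing_page` is the case where the dataset behind the page would still need a separate, portal-specific download.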

Data.gov Home - Data.gov

Speaking of telling someone, here’s our update: we have copies of all metadata from data.gov, and all of the dataset URLs it points to (shallow crawl); all federal GitHub repositories with issues, comments, etc.; and articles from PubMed.

31.01.2025 21:06 — 👍 17 🔁 1 💬 1 📌 0

Internet Archive: Digital Library of Free & Borrowable Texts, Movies, Music & Wayback Machine

Third, tell someone. Archive.org is one good place to store public data for discovery, and we at LIL will consider storing and signing data in some cases as well. Just posting data somewhere search engines can find is good too.

31.01.2025 21:05 — 👍 14 🔁 3 💬 0 📌 0

FOIA requests are another great way to scale up — check out @muckrock.com to get started.

31.01.2025 21:04 — 👍 11 🔁 2 💬 0 📌 0

Next, scale up. If you’re a programmer (or can team up with one), write a Python script to download a full collection (say, everything from the data portal of a given government website). Run it yourself, and share it so we libraries can use it too.

31.01.2025 21:03 — 👍 12 🔁 3 💬 0 📌 0
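A script like the one described above could start as small as this. It is a minimal sketch, assuming you already have a list of file URLs from the portal you care about (how to get that list varies by site). The function name `download_collection` is hypothetical, and real use would need retries, rate limiting, and error handling.

```python
import hashlib
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_collection(urls, dest_dir):
    """Download each URL into dest_dir and return {file name: SHA-256 hex digest}."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for index, url in enumerate(urls):
        # Prefix with the index so two URLs with the same file name do not collide.
        base = Path(urlparse(url).path).name or "item"
        name = f"{index:04d}-{base}"
        digest = hashlib.sha256()
        with urllib.request.urlopen(url) as response, open(dest / name, "wb") as out:
            # Stream in 1 MiB chunks so large datasets do not have to fit in memory.
            while chunk := response.read(1 << 20):
                out.write(chunk)
                digest.update(chunk)
        manifest[name] = digest.hexdigest()
    return manifest
```

Recording a checksum for every file at download time makes it possible for whoever receives a copy to confirm it matches what was originally fetched.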

If you’re a data scientist, good news — your work isn't just downloading data and publishing about it, but also keeping safe copies!

31.01.2025 21:03 — 👍 10 🔁 0 💬 0 📌 0

ArchiveWeb.page

To keep access to stuff you care about: first just make a copy. Use ArchiveWeb.page to click around and download all the parts of a website you’re interested in. We like the desktop version to avoid capturing login cookies or extensions, but the browser extension is good too.

31.01.2025 21:02 — 👍 13 🔁 5 💬 0 📌 0

Posts by Library Innovation Lab (@harvardlil.bsky.social)