Ben Lee

@bcgl.bsky.social

Assistant Professor @ the University of Washington iSchool | formerly an Innovator in Residence @ Library of Congress | essays in WIRED, Gawker, The New Republic, Longreads, Current Affairs, etc. 🌐 www.bcglee.com

1,998 Followers  |  814 Following  |  74 Posts  |  Joined: 01.09.2023

Latest posts by bcgl.bsky.social on Bluesky

CNI Fall 2025 Membership Meeting - YouTube CNI Fall 2025 Membership Meeting videos at the Hyatt Regency on Capitol Hill Washington, DC. Learn more at: https://www.cni.org/mm/fall-2025

The plenary videos from the CNI meeting are now available:
🤝 Shaping CNI’s Future Together
📚 A Landscape of AI in Libraries @bcgl.bsky.social
🏆 AUPresses Stand UP Award
@brettbobley.bsky.social @aupresses.bsky.social
💰 The State of Funding for US Higher Ed, Science, and Technology in a Time of Change

19.12.2025 18:21 - 👍 7    🔁 3    💬 0    📌 0

Publication day! My article on how to read an 18C newspaper, on digital remediation, and on the unfree press is out in the world. Thanks @andy-schocket.bsky.social @historymatterssyd.bsky.social & M. Karrs for making it all possible. @universitypress.cambridge.org
lnkd.in/eEwbdbSw

17.12.2025 15:54 - 👍 10    🔁 4    💬 0    📌 0

Figured I’d wait to post anything for this, but I can’t tell you how excited I am for us to have the opportunity to be funded to do this work. We really do believe that our project connecting communities, digital archives of the early Black press, and human-AI systems will have a large impact.

12.12.2025 21:39 - 👍 4    🔁 3    💬 1    📌 0

I really appreciate your kind words!

11.12.2025 22:55 - 👍 1    🔁 0    💬 0    📌 0

For "Communities in the Loop: AI for Cultures & Contexts in Multimodal Archives," congrats to: Jim Casey, Christopher Dancy, @snblickhan.bsky.social, Tiffany Smith, Benjamin Lee. (And I must also include @profgabrielle.bsky.social !)

11.12.2025 17:32 - 👍 7    🔁 2    💬 1    📌 0

Seconding all of this: so much fun to work with so many people I deeply admire, and we can’t wait to share out as our project progresses!

11.12.2025 16:13 - 👍 6    🔁 0    💬 0    📌 0

I can confirm, thinking with this crew is honestly wonderful, 10/10, Dream Team experience. I have more to say but a meeting to facilitate! More soon.

11.12.2025 15:59 - 👍 2    🔁 1    💬 0    📌 0

Honored to be giving an opening plenary talk on AI & libraries later today at CNI 2025! I’m excited for the conversation with Kate Zwaard and for the full program by @cni-org.bsky.social!

www.cni.org/events/membe...

11.12.2025 16:10 - 👍 8    🔁 1    💬 1    📌 0
Schmidt Sciences awards $750,000 to UCSB-led team to transform Black press archives with AI | Division of Humanities and Fine Arts English professor Jim Casey leads a national coalition to recover 19th-century African American newspapers using machine learning and public crowdsourcing.

I am beyond thrilled to share some good news:

1. I've moved to a new job at UC Santa Barbara.

2. We have been selected for a 2025 Humanities and AI Virtual
Institute award from @schmidtsciences.bsky.social!

Both, we hope, will allow us to continue building the work! +
hfa.ucsb.edu/news/schmidt...

11.12.2025 15:21 - 👍 62    🔁 8    💬 12    📌 2
Guest Post: GovScape: A Public Search System for 10+ Million Government PDFs This week's guest post is from Benjamin Charles Germain Lee, Assistant Professor at the University of Washington, and Kyle Deeds, Assistant Professor at Boston University. Learn more about their recen...

Guest Post on GovScape. It is a great example of how we can build on each other's systems and work to enable access to government information! Follow @govscape.bsky.social for more!

www.datarescueproject.org/guest-post-g...

02.12.2025 16:44 - 👍 15    🔁 5    💬 0    📌 2

Thank you to @datarescueproject.org for publishing this blog post by @kdeeds.bsky.social and myself on GovScape! Extremely grateful to @datarescueproject.org for all their incredible work!

02.12.2025 17:40 - 👍 5    🔁 2    💬 0    📌 0

Very excited to have a software paper with @yh-huang.bsky.social in the CHR journal on the Digital Collections Explorer, our open-source multimodal viewer for digital collections!

02.12.2025 17:36 - 👍 10    🔁 1    💬 0    📌 0

New Research Tool: GovScape (US Gov PDFs)
govscape.net ||| Research Paper (preprint) About #GovScape arxiv.org/abs/2511.11010 #govdocs @eotarchive.org

19.11.2025 14:20 - 👍 7    🔁 6    💬 0    📌 0

Anyone interested in govt transparency and public access should check out GovScape from @bcgl.bsky.social and his team 👇

It's an incredibly powerful tool that allows visual, semantic text, and keyword search of 10 million U.S. government PDFs (70 million pages!) and counting: www.govscape.net

19.11.2025 18:24 - 👍 51    🔁 26    💬 1    📌 1

Thanks so much! Truly appreciate it!

19.11.2025 19:09 - 👍 1    🔁 0    💬 0    📌 0

Thanks so much!

19.11.2025 03:09 - 👍 0    🔁 0    💬 0    📌 0

We’re live! Search 10 million+ U.S. government PDFs (70 million pages)! GovScape offers visual search, semantic text search, and keyword search. Explore below:

Website: govscape.net
ArXiv link: arxiv.org/abs/2511.11010

18.11.2025 21:16 - 👍 17    🔁 5    💬 0    📌 1

Huge step forward in enabling access and use of content archived from government websites!

18.11.2025 20:27 - 👍 11    🔁 3    💬 0    📌 0

7/ Lastly, we’d love to hear your feedback on GovScape at bcgl@uw.edu! For more updates on GovScape, follow: @govscape.bsky.social

18.11.2025 20:19 - 👍 6    🔁 1    💬 0    📌 0

7/ A particular thank-you to @kdeeds.bsky.social for leading this project with me and for making this possible! And to @yh-huang.bsky.social, who did an incredible job with the front-end and dev-ops!

18.11.2025 20:19 - 👍 4    🔁 1    💬 1    📌 0

6/ GovScape is the result of a multidisciplinary collaboration, co-led by myself and @kdeeds.bsky.social. We’re enormously grateful to the team: Ying-Hsiang Huang, Claire Gong, Shreya Shaji, Alison Yan, Leslie Harka, @tjowens.bsky.social, @vphill.bsky.social, @shannonshen.bsky.social, and SJ Klein!

18.11.2025 20:19 - 👍 5    🔁 1    💬 1    📌 0
GovScape: A Tutorial Video (YouTube video by GovScape)

5/ Interested in learning more? Visit GovScape at: www.govscape.net – try some searches and read the FAQ! You can also watch a demo video here: www.youtube.com/watch?v=mNda...

18.11.2025 20:19 - 👍 5    🔁 1    💬 1    📌 0
A visual search for "redacted documents" showing a number of documents with heavy redactions.

4/ What does visual search do? Here’s a visual search for “redacted documents”

18.11.2025 20:19 - 👍 4    🔁 1    💬 1    📌 0
A diagram showing the GovScape architecture, including the client, server, and databases.

3/ The full GovScape architecture is detailed in this figure, showing how the client interacts with the server, DBs, and indices. We utilize FAISS for text embeddings and for CLIP embeddings, and SQLite FTS5 for keyword indexing.

18.11.2025 20:19 - 👍 5    🔁 1    💬 1    📌 0
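
To make the keyword-indexing piece of post 3/ concrete, here is a minimal sketch of page-level full-text search with SQLite FTS5 through Python's built-in `sqlite3` module. It is not GovScape's actual schema: the table name `pages`, its columns, and the sample rows are hypothetical, and it assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Hypothetical page records: (pdf_id, page_number, extracted_text).
PAGES = [
    ("doc-a", 1, "Annual report on coastal erosion and flood risk"),
    ("doc-a", 2, "Appendix: budget tables for fiscal year 2020"),
    ("doc-b", 1, "Guidance on flood insurance claims after a hurricane"),
]


def build_index(pages):
    """Create an in-memory FTS5 table with one row per PDF page."""
    conn = sqlite3.connect(":memory:")
    # Only `body` is tokenized; the identifier columns are stored but not indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE pages USING fts5(pdf_id UNINDEXED, page UNINDEXED, body)"
    )
    conn.executemany(
        "INSERT INTO pages (pdf_id, page, body) VALUES (?, ?, ?)", pages
    )
    return conn


def keyword_search(conn, query, limit=10):
    """Return (pdf_id, page) pairs matching the query, best match first."""
    rows = conn.execute(
        "SELECT pdf_id, page FROM pages WHERE pages MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [(pdf_id, int(page)) for pdf_id, page in rows]


if __name__ == "__main__":
    conn = build_index(PAGES)
    print(keyword_search(conn, "flood"))
```

Indexing one row per page rather than per PDF lets a hit point straight at the page where the term occurs, which matches the page-level embeddings described in the thread.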
A diagram showing the GovScape PDF pre-processing pipeline, including PDF identification, rendering, and semantification (including embedding generation).

2/ The pre-processing pipeline ingests PDFs, renders them, generates CLIP and BGE embeddings of individual pages, and indexes the text. The total compute cost for GovScape's pre-processing pipeline for 10 million PDFs was approximately $1,500. Our code is available at: github.com/bcglee/govsc....

18.11.2025 20:19 - 👍 8    🔁 1    💬 1    📌 0
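
For a sense of scale, the figures quoted in the thread (approximately $1,500 of compute, 10 million PDFs, 70 million pages) can be turned into rough unit costs. This is back-of-the-envelope arithmetic on rounded numbers, not a cost breakdown from the project; the function name `unit_costs` is made up for illustration.

```python
def unit_costs(total_usd, pdf_count, page_count):
    """Derive rough per-unit figures from rounded totals."""
    return {
        "usd_per_pdf": total_usd / pdf_count,
        "usd_per_1000_pages": 1000 * total_usd / page_count,
        "pages_per_pdf": page_count / pdf_count,
    }


# Rounded totals as quoted in the posts; treat the results as approximate.
costs = unit_costs(total_usd=1_500, pdf_count=10_000_000, page_count=70_000_000)

if __name__ == "__main__":
    for name, value in costs.items():
        print(f"{name}: {value:.5f}")
```

On these rounded inputs, that works out to about 0.015 cents per PDF, roughly 2 cents per thousand pages, and an average of 7 pages per PDF.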
A diagram showing the three central query methods within GovScape: semantic text search, visual search, and keyword search.

2/ GovScape is built on top of the End of Term Web Archive (eotarchive.org) and currently contains all renderable PDFs (50 pages or fewer) from the 2020 crawl, documenting the first Trump administration. An overview of GovScape’s search functionality can be found in this diagram.

18.11.2025 20:19 - 👍 3    🔁 3    💬 1    📌 0
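
Semantic text search and visual search both come down to finding the page embeddings closest to a query embedding. The thread says GovScape uses FAISS for this; the sketch below swaps in a brute-force cosine-similarity scan in plain Python to show the idea. The three-dimensional vectors and page IDs are toy values, not real CLIP or BGE embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Return the k page IDs whose embeddings are most similar to the query."""
    scored = [(cosine(query_vec, vec), page_id) for page_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [page_id for _, page_id in scored[:k]]


# Toy index: page ID -> embedding (hypothetical values).
TOY_INDEX = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.0, 1.0, 0.0],
    "p3": [0.7, 0.7, 0.0],
}

if __name__ == "__main__":
    print(top_k([1.0, 0.0, 0.0], TOY_INDEX, k=2))
```

A linear scan like this is fine for a handful of vectors. At tens of millions of pages, a dedicated vector index is the practical choice, which is the role FAISS plays in the architecture described in the thread.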

1/ Announcing GovScape – a public search system for 10 million U.S. government PDFs (70 million pages)! GovScape offers visual search, semantic text search, and keyword search. Explore below:

Website: www.govscape.net
ArXiv link: arxiv.org/abs/2511.11010

18.11.2025 20:19 - 👍 79    🔁 35    💬 3    📌 4
Uncanny Testimony - Longreads As the last Holocaust survivors approach the end of their lives, an AI scholar grapples with technology that promises to freeze them in time.

Assistant Professor Ben Lee @bcgl.bsky.social's essay on the topic of AI and Holocaust Memory, “Uncanny Testimony,” was published by @longreads.com: longreads.com/2025/09/25/a...

16.10.2025 22:49 - 👍 6    🔁 2    💬 0    📌 0

Thanks so much, I appreciate it!

11.10.2025 21:37 - 👍 1    🔁 0    💬 1    📌 0

Thanks so much for your kind words - I really appreciate them, and I'm glad that the piece resonated with you! And thank you for your work, too, in having volunteered as a docent and being the family archivist!

11.10.2025 21:37 - 👍 2    🔁 0    💬 0    📌 0
