nitasha tiku

@nitasha.bsky.social

technology mother @ the washington post. baddie in the digital badlands. signal: nitasha.10

42,443 Followers  |  1,761 Following  |  158 Posts  |  Joined: 11.04.2023  |  2.3723

Latest posts by nitasha.bsky.social on Bluesky

Big news that I'm delighted to no longer be keeping a secret: Next week I join @wired.com as a senior writer!

As everyone knows, this is a top-tier publication and indispensable record of digital life today. I'll be covering internet culture in all forms, as I have for almost 15 years now.

28.01.2026 20:22 - 👍 2384    🔁 115    💬 257    📌 26

oh this will be so fun. congrats miles!!

28.01.2026 20:52 - 👍 4    🔁 0    💬 0    📌 0

It made a starter pack. Lack of inclusion is not a judgment against you and is only an indicator doll's mental ram ran out of room. It can add you on request.

go.bsky.app/L71zwey

21.01.2026 00:27 - 👍 141    🔁 16    💬 37    📌 20

ty for this!!

28.01.2026 20:48 - 👍 1    🔁 0    💬 1    📌 0
Inside an AI start-up’s plan to scan and dispose of millions of books
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles.

Absolutely damning from @aaronschaffer.com, @willoremus.com, & @nitasha.bsky.social.

To get more data, Anthropic:
* "destructively scanned" millions of books
* downloaded the shadow library LibGen
* hailed another shadow library's arrival as "just in time!!!"

www.washingtonpost.com/technology/2...

28.01.2026 13:49 - 👍 278    🔁 149    💬 5    📌 42

use @alt-text.bsky.social bot

28.01.2026 09:03 - 👍 1    🔁 1    💬 0    📌 0

alt-text added! tyy for the gentle nudge.

28.01.2026 06:34 - 👍 4    🔁 0    💬 1    📌 0
Slide from Anthropic copyright case. It says:
Why is Paying for New Data Important at all?

1. To date, most of our data is trained on web crawled data. However, there is a limit on how much, and in what categories/useful capacities, data can be crawled online
2. New data is always better. Repeating data is bad for model efficiency and intelligence
3. Not all types of new data are created equal. We hypothesize that adding new data can improve the efficiency of our models (via compute multipliers, or 'CM's') by different levels depending on the type - this means that we get a higher return out of the same level of training spend with the addition of new data
4. Different types of data improves our models in different capacities (e.g. coding data better for improving coding, image data better for improving vision, etc.)
5. Unique, differentiated data could give us a competitive edge from others who don't have access to the same data

Slide from Anthropic copyright case. From a company meeting in 2021
New Canonical Text Dataset: Construction

No repeats - fuzzy dedup, max 1 epoch per dataset
Libgen - now 200GB of books total, 2x GPT-3
Filtered Common Crawl - 50% of final dataset

Seems similar quality to webtext
Currently filtering 20x, could filter another 100x if helpful
Diversity padding: patents, PubMed, FreeLaw, StackExchange, PhilPapers, Ubuntu IRC, Math, etc
Gory details

slide from anthropic copyright case
From: Dario Amodei (Google Docs) comments-noreply@docs.google.com
Sent: Sunday, January 8, 2023 10:00 PM
To: jared@anthropic.com
Subject: Anthropic Plan fo... - I think we need to have more specifi...

Dario Amodei resolved a comment in the following document Anthropic Plan for 2023
(https://docs.google.com/document/d/[redacted]
1 resolved
Comments
Neerav Kingsland
| buy books and scientific papers.
I think we need to have more specific approach here? Others might have a clearer vision but I don't know what our strategy is...
Dario Amodei
I think we have many places from which we could in principle buy these things, but it's a huge legal/practice/business slog.
Neerav Kingsland
Agree. Just naming I still have FUD here. Maybe we could be more aggressive if important?
Jared Kaplan
Two places that sound promising are (1) buying [redacted] from ProQuest -- we're negotiating this now and (2) Springer all access may just give us all books & papers from Springer. But overall this is a hodgepodge.
Dario Amodei
Marked as resolved

28.01.2026 06:28 - 👍 5    🔁 0    💬 0    📌 0
slide from Anthropic copyright lawsuit that says:
ATTORNEY CLIENT PRIVILEGED WORK PRODUCT
Project Panama
Owner: Tom Turvey
Last major update: Apr 13, 2024
What is Project Panama?
Project Panama is our effort to destructively scan all the books in the world.
Why use a codename?
We use a "soft codename" for it because we don't want it to be known that we are working on this. This document is visible to all Anthropic employees, but you should avoid talking about it in public areas, and the fact that we are working on this should not be shared with anyone outside Anthropic.

slide from Anthropic copyright case that says:
Why is Paying for New Data Important at all?

To date, most of our data is trained on web crawled data. However, there is a limit on how much, and in what categories/useful capacities, data can be crawled online
New data is always better. Repeating data is bad for model efficiency and intelligence
Not all types of new data are created equal. We hypothesize that adding new data can improve the efficiency of our models (via compute multipliers, or 'CM's') by different levels depending on the type - this means that we get a higher return out of the same level of training spend with the addition of new data
Different types of data improves our models in different capacities (e.g. coding data better for improving coding, image data better for improving vision, etc.)
Unique, differentiated data could give us a competitive edge from others who don't have access to the same data

slide from Anthropic copyright case that says: 
Beyond the data that is widely available on the web, the highest volume of useful data is available to us via published books. The team has previously categorized different types of data and determined that for the most part, other types of data are either not that useful to us or not available in large enough quantities to move the needle on our training.
And then has a chart rating Books, Social Media, News, Research Databases, and Financial Information according to Conviction in Usefulness and Availability in Large Quantities:
Books: High to Very High, Very High
Social Media: Low, Medium
News: Low, Medium
Research Databases: Very High, Low
Financial Information: Medium-Low, Medium

slide from the Anthropic copyright case. It's a chart. Top says "Unique Books in the World ~130 million." Next line: "Unique books that we are able to buy." That category is divided into "Books in Print" and "Used Books." Under "Books in Print": "Not Self-Published" and "Self-Published." Under "Used Books" it says "List Price." Next line, under "Not Self-Published," it says "List Price" and "LP."

reposting all the slides w/alt-text

28.01.2026 06:28 - 👍 5    🔁 1    💬 1    📌 1

The people of Minnesota have executed one of the most impressive civil resistance campaigns I can remember:

- Organized a city wide general strike
- Maintained nonviolent discipline amidst violence
- Mobilized 10,000s in subzero temps to protest and watch ICE
- Flipped public opinion against ICE

26.01.2026 16:17 - 👍 32033    🔁 7610    💬 498    📌 375

Interesting piece. I had never considered that Google (who started doing the same thing 20 years ago for Google Books) might have already fed that entire library into Gemini. I wonder if they already have?

27.01.2026 21:30 - 👍 2    🔁 3    💬 1    📌 0

the couple times ive seen an ai option for it, i took it. easy to go in and edit

27.01.2026 19:51 - 👍 2    🔁 0    💬 1    📌 0

it's *such* a pain that alt text is so manual; I wish the OS makers would add a metadata field to image files for alt text so we could do it once and have it get carried along when we share and save images. I wish screenreaders could scrape text out of images; most platforms have OCR in now!

27.01.2026 19:16 - 👍 10    🔁 1    💬 4    📌 0

yes i think about this all the time. especially considering that datasets everyone uses, like LAION, run on alt-text, which so few people participate in. and it would be good corporate pr if anyone took up the cause

27.01.2026 19:22 - 👍 2    🔁 0    💬 1    📌 0

omg mary i have been feeling so guilty, but every time i do alt-text on multiple social networks and can't port over, it takes a lot of time and im working on 2 other stories & trying not to get laid off lol. i really wish platforms made it easier. i'll reupload w/alt-text tonight

27.01.2026 19:09 - 👍 9    🔁 0    💬 2    📌 0

At least I’ll be getting… (checks notes) $1500 at most for my book they stole that took me two years to write.

27.01.2026 18:54 - 👍 12    🔁 3    💬 1    📌 0
Post image

this exchange about not paying for books from Anthropic CEO Dario Amodei was already reported, but worth revisiting

“I think we have many places from which we could in principle buy these things, but it’s a huge legal/practice/business slog”

machines of spine-slicing grace?

27.01.2026 18:52 - 👍 11    🔁 3    💬 1    📌 0
Inside a tech company’s secretive plan to destroy millions of books
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles.

more details and sharp analysis from @willoremus.com and @aaronschaffer.com. here is a gift link: wapo.st/4rjXAMQ

27.01.2026 18:48 - 👍 19    🔁 9    💬 1    📌 1
Post image

and just for kicks, from a company meeting in 2021. LibGen is a pirated database

27.01.2026 18:47 - 👍 12    🔁 3    💬 1    📌 0
Post image Post image Post image

from a 2025 draft memo on Anthropic's data acquisition strategy. the heading "Why is Paying for New Data Important at all?" really pops out! also a graph on how valuable books are to LLMs

27.01.2026 18:45 - 👍 12    🔁 3    💬 1    📌 0
Post image

some of my favorite snippets from newly released court docs in the Anthropic copyright book case. eye-opening stuff on Project Panama, their plan to "destructively scan all the books in the world" in order to train AI

27.01.2026 18:44 - 👍 68    🔁 42    💬 4    📌 13
As I reviewed photos of protesters and tear gas in the wake of his death, I didn’t realize, in the hours before his name was released to the public, that the man millions of people had seen lying facedown on the pavement from multiple angles of eyewitness video was my childhood best friend.

We have become familiar with being barraged by videos of people we do not know getting detained and ripped from their families and beaten by agents whose salaries we pay. As social media does its work putting bits and pieces together about each day of unfolding tragedy, more and more of us will realize that those pieces belong to someone we know.

what a paragraph @kristenradtke.bsky.social www.theverge.com/policy/86856...

27.01.2026 18:15 - 👍 13    🔁 0    💬 1    📌 0
Inside one company’s secret plan to ‘destructively scan every book in the world’
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles.

This story led me to conclude that the rule of law is an illusion clung to only by those who lack sufficient lust for power & money. The method of buying used books, ripping their spines apart & scanning every page turned out to be the more legally sound method www.washingtonpost.com/technology/2...

27.01.2026 13:42 - 👍 29    🔁 6    💬 2    📌 1
How Silicon Valley built AI: Buying, scanning and discarding millions of books
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles.

“The humans cut apart all the books filled with their knowledge to teach the AI” is absolutely the beginning of a good sci-fi series

www.washingtonpost.com/technology/2...

27.01.2026 14:08 - 👍 45    🔁 20    💬 2    📌 1
Post image

There's a predictable approach to denial that we're already seeing, which I call the Four Pillars of Disordered Doubt, which allows actors to constantly question evidence that doesn't fit their preferred narratives.

27.01.2026 12:09 - 👍 731    🔁 312    💬 23    📌 18

i have just gotten off a productive call with sauron where i laid out our requests

- nazgul bodycams
- morgul knife must remain sheathed unless suspect is determined to be carrying the one ring
- shelob will be the new point of contact

27.01.2026 15:15 - 👍 10171    🔁 2940    💬 108    📌 70
How Silicon Valley built AI: Buying, scanning and destroying millions of books
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles.

New: Unsealed court docs detail Big Tech’s yearslong, secret race to ingest the collective works of humanity, including Anthropic’s project to “destructively scan all the books in the world.”

Gift link: wapo.st/4rjXAMQ

27.01.2026 15:20 - 👍 295    🔁 153    💬 8    📌 30
How Silicon Valley built AI: Buying, scanning and destroying millions of books
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles.

It's called Project Panama. The goal? "Destructively scan all the books in the world."

The kind of deep reporting with @willoremus.com and @nitasha.bsky.social that I love doing at @washingtonpost.com. wapo.st/4rjXAMQ

27.01.2026 15:52 - 👍 8    🔁 3    💬 0    📌 1
Spotlight graph of X discourse about the shooting/killing of a man by ICE agents on January 24 in Minneapolis. Graph showing X posts along time (X axis) and cumulative number of X posts shared by that time (Y axis). Post counts are estimated (by Brandwatch) and include both X posts and reposts where the text contained terms indicating the post was about the *shooting AND the video* during the time period. Individual posts are plotted on the graph, sized by the number of reposts that post received (during the time period). Plotted posts are limited to posts that received >1 reposts.

Two posts are highlighted.

One by @gremloe (1/24/2026, 12:31:31 PM EDT): “It appears from zooming in just moments before ICE/CBP shoot yet another US citizen, one agent removes the victims firearm from his waste holster. The victim was UNARMED when he was shot multiple times. This is a state execution. Again.”

A second by @bennyjohnson (1/24/2026, 1:21:28 PM EDT): “Tim Walz just a few days ago was climbing on the gate of his mansion urging protesters to keep causing “trouble” and fighting ICE. An armed man just attacked agents and got killed. See how that works? Minnesota officials are fueling this.”

If you're interested in seeing how framing contests are taking shape after the ICE killing of another person in Minneapolis, here's a window into the conversation on X this morning.
Link to interactive graph: faculty.washington.edu/kstarbi/Spot...
* I put this together quickly. Sorry for any errors

24.01.2026 20:56 - 👍 451    🔁 167    💬 12    📌 12
Alex Pretti from his early days working at the VA

Alex from our time working together, while he was in nursing school. Later, he moved to ICU, working as a nurse to support critically ill Veterans. He had such a great attitude. We’d chat between patients about trying to get in a mountain bike ride together. Will never happen now

24.01.2026 19:39 - 👍 23783    🔁 7904    💬 636    📌 618

@nitasha is following 19 prominent accounts