Big news that I'm delighted to no longer be keeping a secret: Next week I join @wired.com as a senior writer!
As everyone knows, this is a top-tier publication and indispensable record of digital life today. I'll be covering internet culture in all forms, as I have for almost 15 years now.
28.01.2026 20:22
oh this will be so fun. congrats miles!!
28.01.2026 20:52
It made a starter pack. Lack of inclusion is not a judgment against you and is only an indicator doll's mental ram ran out of room. It can add you on request.
go.bsky.app/L71zwey
21.01.2026 00:27
ty for this!!
28.01.2026 20:48
Inside an AI start-up's plan to scan and dispose of millions of books
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles.
Absolutely damning from @aaronschaffer.com, @willoremus.com, & @nitasha.bsky.social.
To get more data, Anthropic:
* "destructively scanned" millions of books
* downloaded the shadow library LibGen
* hailed another shadow library's arrival as "just in time!!!"
www.washingtonpost.com/technology/2...
28.01.2026 13:49
use @alt-text.bsky.social bot
28.01.2026 09:03
alt-text added! tyy for the gentle nudge.
28.01.2026 06:34
Slide from Anthropic copyright case. It says:
Why is Paying for New Data Important at all?
1. To date, most of our data is trained on web crawled data. However, there is a limit on how much, and in what categories/useful capacities, data can be crawled online
2. New data is always better. Repeating data is bad for model efficiency and intelligence
3. Not all types of new data are created equal. We hypothesize that adding new data can improve the efficiency of our models (via compute multipliers, or 'CM's') by different levels depending on the type - this means that we get a higher return out of the same level of training spend with the addition of new data
4. Different types of data improves our models in different capacities (e.g. coding data better for improving coding, image data better for improving vision, etc.)
5. Unique, differentiated data could give us a competitive edge from others who don't have access to the same data
Slide from Anthropic copyright case. From a company meeting in 2021
New Canonical Text Dataset: Construction
No repeats - fuzzy dedup, max 1 epoch per dataset
Libgen - now 200GB of books total, 2x GPT-3
Filtered Common Crawl - 50% of final dataset
Seems similar quality to webtext
Currently filtering 20x, could filter another 100x if helpful
Diversity padding: patents, PubMed, FreeLaw, StackExchange, PhilPapers, Ubuntu IRC, Math, etc
Gory details
slide from anthropic copyright case
From: Dario Amodei (Google Docs) comments-noreply@docs.google.com
Sent: Sunday, January 8, 2023 10:00 PM
To: jared@anthropic.com
Subject: Anthropic Plan fo... - I think we need to have more specifi...
Dario Amodei resolved a comment in the following document Anthropic Plan for 2023
(https://docs.google.com/document/d/[redacted]
1 resolved
Comments
Neerav Kingsland
| buy books and scientific papers.
I think we need to have more specific approach here? Others might have a clearer vision but I don't know what our strategy is...
Dario Amodei
I think we have many places from which we could in principle buy these things, but it's a huge legal/practice/business slog.
Neerav Kingsland
Agree. Just naming I still have FUD here. Maybe we could be more aggressive if important?
Jared Kaplan
Two places that sound promising are (1) buying [redacted] from ProQuest -- we're negotiating this now and (2) Springer all access may just give us all books & papers from Springer. But overall this is a hodgepodge.
Dario Amodei
Marked as resolved
28.01.2026 06:28
slide from Anthropic copyright lawsuit that says:
ATTORNEY CLIENT PRIVILEGED WORK PRODUCT
Project Panama
Owner: Tom Turvey
Last major update: Apr 13, 2024
What is Project Panama?
Project Panama is our effort to destructively scan all the books in the world.
Why use a codename?
We use a "soft codename" for it because we don't want it to be known that we are working on this. This document is visible to all Anthropic employees, but you should avoid talking about it in public areas, and the fact that we are working on this should not be shared with anyone outside Anthropic.
slide from Anthropic copyright case that says:
Why is Paying for New Data Important at all?
To date, most of our data is trained on web crawled data. However, there is a limit on how much, and in what categories/useful capacities, data can be crawled online
New data is always better. Repeating data is bad for model efficiency and intelligence
Not all types of new data are created equal. We hypothesize that adding new data can improve the efficiency of our models (via compute multipliers, or 'CM's') by different levels depending on the type - this means that we get a higher return out of the same level of training spend with the addition of new data
Different types of data improves our models in different capacities (e.g. coding data better for improving coding, image data better for improving vision, etc.)
Unique, differentiated data could give us a competitive edge from others who don't have access to the same data
slide from Anthropic copyright case that says:
Beyond the data that is widely available on the web, the highest volume of useful data is available to us via published books. The team has previously categorized different types of data and determined that for the most part, other types of data are either not that useful to us or not available in large enough quantities to move the needle on our training.
And then has a chart rating Books, Social Media, News, Research Databases, and Financial Information according to
Conviction in Usefulness and Availability in Large Quantities
Books High to Very High, Very High
Social Media: Low, Medium
News: Low, Medium
Research Databases: Very High, Low
Financial Information: Medium-Low, Medium
slide from the Anthropic copyright case. It's a chart. Top says "Unique Books in the World ~130 million." Next line: "Unique books that we are able to buy." That category is divided into "Books in Print" and "Used Books." Under "Books in Print": "Not Self-Published" and "Self-Published." Under "Used Books" it says "List Price." Next line, under "Not Self-Published," it says "List Price" and "LP."
reposting all the slides w/alt-text
28.01.2026 06:28
The people of Minnesota have executed one of the most impressive civil resistance campaigns I can remember:
- Organized a city wide general strike
- Maintained nonviolent discipline amidst violence
- Mobilized 10,000s in subzero temps to protest and watch ICE
- Flipped public opinion against ICE
26.01.2026 16:17
Interesting piece. I had never considered that Google (who started doing the same thing 20 years ago for Google Books) might have already fed that entire library into Gemini. I wonder if they already have?
27.01.2026 21:30
the couple times ive seen an ai option for it, i took it. easy to go in and edit
27.01.2026 19:51
it's *such* a pain that alt text is so manual; I wish the OS makers would add a metadata field to image files for alt text so we could do it once and have it get carried along when we share and save images. I wish screenreaders could scrape text out of images; most platforms have OCR in now!
27.01.2026 19:16
yes i think about this all the time. especially considering that datasets everyone uses, like LAION, run on alt-text, which so few people participate in. and it would be good corporate pr if anyone took up the cause
27.01.2026 19:22
omg mary i have been feeling so guilty, but every time i do alt-text on multiple social networks and can't port over, it takes a lot of time and im working on 2 other stories & trying not to get laid off lol. i really wish platforms made it easier. i'll reupload w/alt-text tonight
27.01.2026 19:09
At least I'll be getting… (checks notes) $1500 at most for my book they stole that took me two years to write.
27.01.2026 18:54
this exchange about not paying for books from Anthropic CEO Dario Amodei was already reported, but worth revisiting
"I think we have many places from which we could in principle buy these things, but it's a huge legal/practice/business slog"
machines of spine-slicing grace?
27.01.2026 18:52
and just for kicks, from a company meeting in 2021. LibGen is a pirated database
27.01.2026 18:47
some of my favorite snippets from newly released court docs in the Anthropic copyright book case. eye-opening stuff on Project Panama, their plan to "destructively scan all the books in the world" in order to train AI
27.01.2026 18:44
As I reviewed photos of protesters and tear gas in the wake of his death, I didn't realize, in the hours before his name was released to the public, that the man millions of people had seen lying facedown on the pavement from multiple angles of eyewitness video was my childhood best friend.
We have become familiar with being barraged by videos of people we do not know getting detained and ripped from their families and beaten by agents whose salaries we pay. As social media does its work putting bits and pieces together about each day of unfolding tragedy, more and more of us will realize that those pieces belong to someone we know.
what a paragraph @kristenradtke.bsky.social www.theverge.com/policy/86856...
27.01.2026 18:15
Inside one company's secret plan to "destructively scan every book in the world"
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles.
This story led me to conclude that the rule of law is an illusion clung to only by those who lack sufficient lust for power & money. The method of buying used books, ripping their spines apart & scanning every page turned out to be the more legally sound method www.washingtonpost.com/technology/2...
27.01.2026 13:42
There's a predictable approach to denial that we're already seeing, which I call the Four Pillars of Disordered Doubt, which allows actors to constantly question evidence that doesn't fit their preferred narratives.
27.01.2026 12:09
i have just gotten off a productive call with sauron where i laid out our requests
- nazgul bodycams
- morgul knife must remain sheathed unless suspect is determined to be carrying the one ring
- shelob will be the new point of contact
27.01.2026 15:15
How Silicon Valley built AI: Buying, scanning and destroying millions of books
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles.
New: Unsealed court docs detail Big Tech's yearslong, secret race to ingest the collective works of humanity, including Anthropic's project to "destructively scan all the books in the world."
Gift link: wapo.st/4rjXAMQ
27.01.2026 15:20
How Silicon Valley built AI: Buying, scanning and destroying millions of books
Court filings reveal how AI companies raced to obtain more books to feed chatbots, including by buying, scanning and disposing of millions of titles.
It's called Project Panama. The goal? "Destructively scan all the books in the world."
The kind of deep reporting with @willoremus.com and @nitasha.bsky.social that I love doing at @washingtonpost.com. wapo.st/4rjXAMQ
27.01.2026 15:52
Spotlight graph of X discourse about the shooting/killing of a man by ICE agents on January 24 in Minneapolis. Graph showing X posts along time (X axis) and cumulative number of X posts shared by that time (Y axis). Post counts are estimated (by Brandwatch) and include both X posts and reposts where the text contained terms indicating the post was about the *shooting AND the video* during the time period. Individual posts are plotted on the graph, sized by the number of reposts that post received (during the time period). Plotted posts are limited to posts that received >1 reposts.
Two posts are highlighted.
One by @gremloe (1/24/2026, 12:31:31 PM EDT): "It appears from zooming in just moments before ICE/CBP shoot yet another US citizen, one agent removes the victims firearm from his waste holster. The victim was UNARMED when he was shot multiple times. This is a state execution. Again."
A second by @bennyjohnson (1/24/2026, 1:21:28 PM EDT): "Tim Walz just a few days ago was climbing on the gate of his mansion urging protesters to keep causing 'trouble' and fighting ICE. An armed man just attacked agents and got killed. See how that works? Minnesota officials are fueling this."
If you're interested in seeing how framing contests are taking shape after the ICE killing of another person in Minneapolis, here's a window into the conversation on X this morning.
Link to interactive graph: faculty.washington.edu/kstarbi/Spot...
* I put this together quickly. Sorry for any errors
24.01.2026 20:56
Alex Pretti from his early days working at the VA
Alex from our time working together, while he was in nursing school. Later, he moved to ICU, working as a nurse to support critically ill Veterans. He had such a great attitude. We'd chat between patients about trying to get in a mountain bike ride together. Will never happen now
24.01.2026 19:39
Co-Founder of LinkedIn. Focused on using AI to find the cure for cancer, faster. Proud American.
Writing, Pod, ETC: Beacons.ai/reidhoffman
Factotum. Was @ethanschoonover on twtr / Photo & Design & Tech & RPGs / SeattleACS ham radio volunteer / Cascadia Radio cofounder / @TeenHorrorCast.com intern / "skoon-over" / he
Signal ethanschoonover.01
Seattle, Earth
https://ethanschoonover.com
It/Its
Chainlink, the issue tracker for agents: https://github.com/dollspace-gay/chainlink
Main fronter of a plural system
Its name is doll, it is also a doll. The system is an adult.
Interested in creating ethical and safe AI systems
Anti-cynic. Towards a weirder future. Reinforcement Learning, Autonomous Vehicles, transportation systems, the works. Asst. Prof at NYU
https://emerge-lab.github.io
https://www.admonymous.co/eugenevinitsky
writer & engineer
@protopro.blue creator
@skyrdle.com creator
https://xorientation.com writer
https://kamigotchi.io dev & writer
zen existentialist
ATL
at different times:
haxx0r in cDc
PhD in using people for ML and vice versa
"theory of mind for autonomous cars" startup guy would you believe, went kablooie
at present: newsletter -- buttondown.email/apperceptive
&c music, politics, nonsense
I'm just this guy, you know?
i am the Demiurge now, so be nice
LLM developer, alignment-accelerationist, Fedorovist ancestor simulator, Dreamtime enjoyer.
All posts public domain under CC0 1.0.
Programmer-turned-lawyer, trying to build human(e) futures.
Day job: SonarSource. Boards: Creative Commons, OpenET (open water data), CA Housing Defense. Also: 415, dad. Past: Wikipedia, Moz, 305
Also: https://lu.is + https://social.coop/@luis_in_brief
web dev + hot dad. enjoy charts, unions, conputer games, philosophy. chicago crespo.business
Person who does electrical, computer, and music things.
Certified Machine Pervert with compiler-induced psychosis.
Robotanist
they/it/she
support: github.com/sponsors/orual
FORTRAN, Perl, and dotfile de-spaghettifier
she/they bi autist '89
GOP delenda est
Looking at the brain's "dark matter"
Studying how minds change
Building science tools
maxine.science
hey! i do unimportant stuff sometimes
have a great day :D
profile picture drawn by konpeku on Discord
timezone: CST
https://indexx.dev/
software engineer, application developer
applied artificial intelligence
nature and atmospheric images
https://greengale.app/3fz.org
Building lanyards.app in the #ATmosphere for #ATscience
Designing simpler solutions for complex needs
Writing about design, science, tech, and the messy in-betweens
Capitalism must serve people, not vice versa
Neurodiverse rudeboi
location: atlanta, ga
website: https://aly.codes
github: https://github.com/alyraffauf
mostly inactive account, visit @aly.ruffruff.party for recent posts.