@lemonhail.bsky.social
Potential lottery winner
Meanwhile, “did women ruin the workplace?”
13.11.2025 00:13 — 👍 1897 🔁 502 💬 13 📌 6
Tweet. Character from my comic book.
12.11.2025 08:41 — 👍 2226 🔁 212 💬 12 📌 2
the new Steam Machine looks pretty kool
12.11.2025 18:59 — 👍 939 🔁 367 💬 8 📌 8
Yeah I agree
08.11.2025 11:26 — 👍 0 🔁 0 💬 0 📌 0
Mountain of shit is about to turn into gold, says king of shit mountain
06.11.2025 23:32 — 👍 7 🔁 0 💬 0 📌 0
In July 2023, The New York Times sent a notice to Common Crawl asking for the removal of previously scraped Times content. (In their lawsuit against OpenAI, the Times noted that Common Crawl includes “at least 16 million unique records of content” from Times websites.) The nonprofit seemed amenable to the request. In November of that year, a Times spokesperson, Charlie Stadtlander, told Business Insider: “We simply asked that our content be removed, and were pleased that Common Crawl complied.” But as I explored Common Crawl’s archives, I found that many Times articles appear to still be present. When I mentioned this to the Times, Stadtlander told me: “Our understanding from them is that they have deleted the majority of the Times’s content, and continue to work on full removal.”
The Danish Rights Alliance (DRA), an organization that represents publishers and other rights-holders in Denmark, told me about a similar interaction with Common Crawl. Thomas Heldrup, the organization’s head of content protection and enforcement, showed me a redacted email exchange with the nonprofit that began in July 2024, in which the DRA requested that its members’ content be removed from the archive. In December 2024, more than six months after the DRA had initially requested removal, Common Crawl’s attorney wrote: “I confirm that Common Crawl has initiated work to remove your members’ content from the data archive. Presently, approximately 50% of this content has been removed.” I spoke with other publishers who’d received similar messages from Common Crawl. One was told, after multiple follow-up emails, that removal was 50 percent, 70 percent, and then 80 percent complete. By writing code to browse the petabytes of data, I was able to see that large quantities of articles from the Times, the DRA, and these other publishers are still present in Common Crawl’s archives. Furthermore, the files are stored in a system that logs the modification times of every file. The foundation adds a new “crawl” to its archive every few weeks, each containing 1 billion to 4 billion webpages, and it has been publishing these regular installments since 2013. None of the content files in Common Crawl’s archives appears to have been modified since 2016, suggesting that no content has been removed in at least nine years.
In our first conversation, Skrenta told me that removal requests are “a pain in the ass” but insisted that the foundation complies with them. In our second conversation, Skrenta was more forthcoming. He said that Common Crawl is “making an earnest effort” to remove content but that the file format in which Common Crawl stores its archives is meant “to be immutable. You can’t delete anything from it.” (He did not answer my question about where the 50, 70, and 80 percent removal figures come from.) Yet the nonprofit appears to be concealing this from visitors to its website, where a search function, the only nontechnical tool for seeing what’s in Common Crawl’s archives, returns misleading results for certain domains. A search for nytimes.com in any crawl from 2013 through 2022 shows a “no captures” result, when in fact there are articles from NYTimes.com in most of these crawls. I also discovered more than 1,000 other domains that produce this incorrect “no captures” result for at least several of the crawls, and most of these domains belong to publishers, including the BBC, Reuters, The New Yorker, Wired, the Financial Times, The Washington Post, and, yes, The Atlantic. According to my research and Common Crawl’s own disclosures, the companies behind each of these publications have sent legal requests to the nonprofit. At least one publisher I spoke with told me that it had used this search tool and concluded that its content had been removed from Common Crawl’s archives.
Common Crawl says it complies with removal requests—while telling us they are “a pain in the ass”—but also is not actually removing the data in question.
04.11.2025 12:18 — 👍 140 🔁 34 💬 3 📌 4
NEW: Common Crawl, the massive archiver of the web, has gotten cozy with AI companies and is providing paywalled articles for training data. They’re also lying to publishers who have asked for material to be removed. “The robots are people too,” CC’s exec director told us when we asked about this.
04.11.2025 12:15 — 👍 851 🔁 502 💬 24 📌 89
Can't get over this wallpaper Gainax sold in the 90s.
03.11.2025 19:59 — 👍 494 🔁 153 💬 4 📌 3
A print copy of the Onion with comic panels from "Don and Jeff: Time Pedophiles". Clockwise from top left:
Don and Jeff flee a T. Rex. Don: How was I supposed to know the Cretaceous didn't have adolescent girls?
Remodeling the Great Sphinx in Jeff's image. Don: Jeff, what did you do to the Sphinx? Jeff (with girl in stereotypical ancient Egyptian garb): I gave all those sexy Egyptian minors a little something to look at!
Fighting Samurai. Don (holding katana): Jeff, a little help here? Jeff (with girl in kimono): Sorry, Don, I've got my hands full myself!
Briefly rescuing Joan of Arc. Joan: Merci, time pedophiles, you saved me! How can I ever repay you? Jeff: Have you ever thought about going blond?
At the bottom, the headline reads "Trump: `Thats Not How I Draw Teenage Breasts`"
Subscribe to @theonion.com
02.11.2025 23:38 — 👍 203 🔁 17 💬 2 📌 1
Studios Enter Bidding War Over Napkin Stephen King Wrote ‘Ghoul’ On
03.11.2025 17:00 — 👍 1005 🔁 103 💬 19 📌 12
Lida
01.11.2025 17:58 — 👍 5252 🔁 1304 💬 16 📌 4
Interesting how the terms
'grassroots group' and 'campaign group' are used in that snippet
that’s crazy dude. that is crazy
01.11.2025 01:16 — 👍 442 🔁 50 💬 5 📌 0
habby halloween
31.10.2025 23:01 — 👍 9679 🔁 1827 💬 55 📌 2
a Halloween treat from the REal Big Boys
www.youtube.com/watch?v=XCXZ...
Hell yeah, what a guy
31.10.2025 18:01 — 👍 1 🔁 0 💬 0 📌 0
Did he have the flame trousers??
31.10.2025 17:37 — 👍 1 🔁 0 💬 1 📌 0
Wow. I didn’t know that. I just, you’re telling me now for the first time.
30.10.2025 21:24 — 👍 1 🔁 0 💬 0 📌 0
Kamala Harris
30.10.2025 21:15 — 👍 62 🔁 0 💬 1 📌 0
Sorry sooz the algorithm has decided you like this now, time to get used to it and adapt your life accordingly.
29.10.2025 13:14 — 👍 2 🔁 0 💬 0 📌 0
Barry Windsor-Smith, “Beguiled” (1982/1995), pen and ink, watercolour/gouache, and coloured pencil. Intended for the cover of Epic Illustrated, but fatigue prompted BWS to abandon it half-done. Thirteen years later, BWS completed the work.
11.01.2025 13:10 — 👍 526 🔁 197 💬 4 📌 4
Mable #Pokemon
28.10.2025 15:57 — 👍 995 🔁 358 💬 5 📌 0
"Pacino is wonderful"
I am always saying this
25.10.2025 21:26 — 👍 2 🔁 1 💬 0 📌 0
gaming was dire in the 2000s...
23.10.2025 18:57 — 👍 1142 🔁 252 💬 53 📌 16
"The hills have eyes" is a cautionary tale about having some stuff
11.10.2025 22:34 — 👍 756 🔁 292 💬 4 📌 4
i've been playing a lot of third strike and i realize that the holes on the side could always be bigger
19.10.2025 15:57 — 👍 2105 🔁 296 💬 12 📌 0
ink drawing on grid paper of a knight wearing a very ornate helmet
[inktober day 15]
15.10.2025 14:59 — 👍 1651 🔁 194 💬 6 📌 0
?
14.10.2025 18:08 — 👍 1 🔁 0 💬 0 📌 0
It can chair a meeting, it can hold a press conference, it can stay calm in an interview
10.10.2025 19:26 — 👍 0 🔁 0 💬 0 📌 0