Lemonhail @lemonhail - Bluesky Profile

Meanwhile, “did women ruin the workplace?”

13.11.2025 00:13 — 👍 1897 🔁 502 💬 13 📌 6

Tweet. Character from my comic book.

12.11.2025 08:41 — 👍 2226 🔁 212 💬 12 📌 2

the new Steam Machine looks pretty kool

12.11.2025 18:59 — 👍 939 🔁 367 💬 8 📌 8

Yeah I agree

08.11.2025 11:26 — 👍 0 🔁 0 💬 0 📌 0

Mountain of shit is about to turn into gold, says king of shit mountain

06.11.2025 23:32 — 👍 7 🔁 0 💬 0 📌 0

In July 2023, The New York Times sent a notice to Common Crawl asking for the removal of previously scraped Times content. (In their lawsuit against OpenAI, the Times noted that Common Crawl includes “at least 16 million unique records of content” from Times websites.) The nonprofit seemed amenable to the request. In November of that year, a Times spokesperson, Charlie Stadtlander, told Business Insider: “We simply asked that our content be removed, and were pleased that Common Crawl complied.” But as I explored Common Crawl’s archives, I found that many Times articles appear to still be present. When I mentioned this to the Times, Stadtlander told me: “Our understanding from them is that they have deleted the majority of the Times’s content, and continue to work on full removal.”

The Danish Rights Alliance (DRA), an organization that represents publishers and other rights-holders in Denmark, told me about a similar interaction with Common Crawl. Thomas Heldrup, the organization’s head of content protection and enforcement, showed me a redacted email exchange with the nonprofit that began in July 2024, in which the DRA requested that its members’ content be removed from the archive. In December 2024, more than six months after the DRA had initially requested removal, Common Crawl’s attorney wrote: “I confirm that Common Crawl has initiated work to remove your members’ content from the data archive. Presently, approximately 50% of this content has been removed.” I spoke with other publishers who’d received similar messages from Common Crawl. One was told, after multiple follow-up emails, that removal was 50 percent, 70 percent, and then 80 percent complete. By writing code to browse the petabytes of data, I was able to see that large quantities of articles from the Times, the DRA, and these other publishers are still present in Common Crawl’s archives. Furthermore, the files are stored in a system that logs the modification times of every file. The foundation adds a new “crawl” to its archive every few weeks, each containing 1 billion to 4 billion webpages, and it has been publishing these regular installments since 2013. None of the content files in Common Crawl’s archives appears to have been modified since 2016, suggesting that no content has been removed in at least nine years.

In our first conversation, Skrenta told me that removal requests are “a pain in the ass” but insisted that the foundation complies with them. In our second conversation, Skrenta was more forthcoming. He said that Common Crawl is “making an earnest effort” to remove content but that the file format in which Common Crawl stores its archives is meant “to be immutable. You can’t delete anything from it.” (He did not answer my question about where the 50, 70, and 80 percent removal figures come from.) Yet the nonprofit appears to be concealing this from visitors to its website, where a search function, the only nontechnical tool for seeing what’s in Common Crawl’s archives, returns misleading results for certain domains. A search for nytimes.com in any crawl from 2013 through 2022 shows a “no captures” result, when in fact there are articles from NYTimes.com in most of these crawls. I also discovered more than 1,000 other domains that produce this incorrect “no captures” result for at least several of the crawls, and most of these domains belong to publishers, including the BBC, Reuters, The New Yorker, Wired, the Financial Times, The Washington Post, and, yes, The Atlantic. According to my research and Common Crawl’s own disclosures, the companies behind each of these publications have sent legal requests to the nonprofit. At least one publisher I spoke with told me that it had used this search tool and concluded that its content had been removed from Common Crawl’s archives.

Common Crawl says it complies with removal requests—while telling us they are “a pain in the ass”—but also is not actually removing the data in question.

04.11.2025 12:18 — 👍 140 🔁 34 💬 3 📌 4

The Nonprofit Feeding the Entire Internet to AI Companies
Common Crawl claims to provide a public benefit, but it lies to publishers about its activities.

NEW: Common Crawl, the massive archiver of the web, has gotten cozy with AI companies and is providing paywalled articles for training data. They’re also lying to publishers who have asked for material to be removed. “The robots are people too,” CC’s exec director told us when we asked about this.

04.11.2025 12:15 — 👍 851 🔁 502 💬 24 📌 89

Can't get over this wallpaper Gainax sold in the 90s.

03.11.2025 19:59 — 👍 494 🔁 153 💬 4 📌 3

A print copy of the Onion with comic panels from "Don and Jeff: Time Pedophiles". Clockwise from top left: Don and Jeff flee a T. Rex Don: How was I supposed to know the Cretaceous didn't have adolescent girls? Remodeling the Great Sphinx in Jeff's image Don: Jeff, what did you do the Sphinx?" Jeff (with girl in stereotypical ancient Egyptian garb): I gave all those sexy Egyptian minors a little something to look at! Fighting Samurai Don (holding katana): Jeff, a little help here?" Jeff (with girl in kimono): Sorry, Don, I've got my hands full myself! Briefly rescuing Joan of Arc Joan: Merci, time pedophiles you saved me! How can I ever repay you? Jeff: Have you ever thought about going blond? At the bottom, the headline reads "Trump: `Thats Not How I Draw Teenage Breasts`"

Subscribe to @theonion.com

02.11.2025 23:38 — 👍 203 🔁 17 💬 2 📌 1

Studios Enter Bidding War Over Napkin Stephen King Wrote ‘Ghoul’ On LOS ANGELES—Anticipating the project could be the biggest horror hit of the decade, film studios were reportedly locked in a bidding war Friday over a napkin Stephen King had written the word “Ghoul” ...

03.11.2025 17:00 — 👍 1005 🔁 103 💬 19 📌 12

Lida

01.11.2025 17:58 — 👍 5252 🔁 1304 💬 16 📌 4

Interesting how the terms 'grassroots group' and 'campaign group' are used in that snippet

01.11.2025 10:24 — 👍 9 🔁 0 💬 1 📌 0

that’s crazy dude. that is crazy

01.11.2025 01:16 — 👍 442 🔁 50 💬 5 📌 0

habby halloween

31.10.2025 23:01 — 👍 9679 🔁 1827 💬 55 📌 2

YouTube video by Real Big Boys
MICHAEL MYERS PLAYED HIS OWN SCARY MUSIC

a Halloween treat from the Real Big Boys
www.youtube.com/watch?v=XCXZ...

31.10.2025 16:37 — 👍 33 🔁 12 💬 1 📌 0

Hell yeah, what a guy

31.10.2025 18:01 — 👍 1 🔁 0 💬 0 📌 0

Did he have the flame trousers??

31.10.2025 17:37 — 👍 1 🔁 0 💬 1 📌 0

Wow. I didn’t know that. I just, you’re telling me now for the first time.

30.10.2025 21:24 — 👍 1 🔁 0 💬 0 📌 0

Kamala Harris

30.10.2025 21:15 — 👍 62 🔁 0 💬 1 📌 0

Sorry sooz the algorithm has decided you like this now, time to get used to it and adapt your life accordingly.

29.10.2025 13:14 — 👍 2 🔁 0 💬 0 📌 0

Barry Windsor-Smith, “Beguiled” (1982/1995), pen and ink, watercolour/gouache, and coloured pencil. Intended for the cover of Epic Illustrated, but fatigue prompted BWS to abandon it half-done. Thirteen years later, BWS completed the work.

11.01.2025 13:10 — 👍 526 🔁 197 💬 4 📌 4