DeBoris @phrozen10 - Bluesky Profile

Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models
High-quality training data has proven crucial for developing performant large language models (LLMs). However, commercial LLM providers disclose few, if any, details about the data used for training. ...

Want to know what training data has been memorized by models like GPT-4?

We propose information-guided probes, a method to uncover memorization evidence in *completely black-box* models, without requiring access to:
🙅‍♀️ Model weights
🙅‍♀️ Training data
🙅‍♀️ Token probabilities 🧵 (1/5)

21.03.2025 19:08 — 👍 97 🔁 27 💬 4 📌 8

ALT: a man with long hair is giving a thumbs up and saying he's gonna fix the economy.

This is definitely lowering the cost of eggs

28.02.2025 22:13 — 👍 3 🔁 0 💬 0 📌 0

My friends are organizing a new workshop bringing together NLP and CSS research with psychology:

First Workshop on Integrating NLP and Psychology to Study Social Interactions (NLPSI) at ICWSM 2025
nlpsi-workshop.github.io

Consider submitting a paper (long/short) or an extended abstract :)

24.02.2025 10:20 — 👍 2 🔁 2 💬 0 📌 0

Imagine if @aoc, @sanders.senate.gov, every Dem SIMULTANEOUSLY held Town Halls where they allowed grant and contract recipients to explain to the country what they do and why it's important.

Invite all media, including RW podcasters. @spaces. Flood the zone.

Call it a Day Of Transparency

11.02.2025 18:51 — 👍 96357 🔁 20133 💬 3741 📌 1640

“I, Too” by Langston Hughes

10.02.2025 17:01 — 👍 718 🔁 115 💬 7 📌 6

Alumni too! Call/email your schools and tell them that they have a legal and ethical responsibility to protect student data. I’m still paying OSU, and they should be safeguarding my data from third parties.

04.02.2025 14:36 — 👍 28 🔁 9 💬 1 📌 0

ALT: a close-up of a man with the words "trust your eyes" on the bottom

04.02.2025 12:04 — 👍 2 🔁 0 💬 1 📌 0

Is this based on fear of politicians cutting funding, or are some of these “scholars” being emboldened to show their true colors?

06.12.2024 12:42 — 👍 0 🔁 0 💬 0 📌 0

I firmly believe that there needs to be more conversation about ethical AI development and less about “SkyNet”-like scenarios.

30.11.2024 12:56 — 👍 0 🔁 0 💬 0 📌 0

A statistical approach to model evaluations
A research paper from Anthropic on how to apply statistics to improve language model evaluations

These seem like very common (statistical) sense recommendations for evals. Still, the title of the paper is somewhat confusing, as if these ideas were new. I would’ve suggested “Bringing a statistical approach…”

www.anthropic.com/research/sta...

22.11.2024 10:39 — 👍 39 🔁 2 💬 2 📌 0

I can already feel my timeline getting smarter and happier

21.11.2024 19:04 — 👍 2 🔁 0 💬 1 📌 0