Around ICML with loose evening plans and an interest in "public AI", Canadian sovereign AI, or anything related? Swing by the Internet Archive Canada between 5p and 7p lu.ma/7rjoaxts
16.07.2025 23:30
@nickmvincent.bsky.social: Studying people and computers (https://www.nickmvincent.com/). Blogging about data and steering AI (https://dataleverage.substack.com/)
Finally, I recently shared a preprint that relates deeply to the above ideas, on Collective Bargaining for Information: arxiv.org/abs/2506.10272. I have a blog post on this as well: dataleverage.substack.com/p/on-ai-driv...
24.06.2025 12:33
And we have a blog post on algorithmic collective action with multiple collectives! dataleverage.substack.com/p/algorithmi...
24.06.2025 12:33
These blog posts expand on attentional agency:
- genAI as ranking chunks of info: dataleverage.substack.com/p/google-and...
- utility of AI stems from people: dataleverage.substack.com/p/each-insta...
- connection to evals: dataleverage.substack.com/p/how-do-we-...
[FAccT-related link round-up]: It was great to present on measuring Attentional Agency with Zachary Wojtowicz at FAccT. Here's our paper on ACM DL: dl.acm.org/doi/10.1145/...
On Thursday, Aditya Karan will present on collective action at 10:57 (New Stage A): dl.acm.org/doi/10.1145/...
"Attentional agency": talk in New Stage B at FAccT, in the session right now!
24.06.2025 07:48
Off to FAccT; excited to see faces old and new!
21.06.2025 21:50
Another blog post: a link roundup on AI's impact on jobs and power concentration, another proposal for Collective Bargaining for Information, and some additional thoughts on the topic:
dataleverage.substack.com/p/on-ai-driv...
Do some aspects seem wrong? (In the next two posts, I get into how these ideas interact with reinforcement learning.)
27.05.2025 15:45
arxiv.org/abs/2405.14614
Follow-ups coming very soon (already drafted): I would love to discuss these ideas with folks. Is this all repetitive with past data labor/leverage work? Are some aspects obvious to you?
This has implications for Internet policy, for understanding where the value in AI comes from, and for thinking about why we might even consider a certain model to be "good"!
This first post leans heavily on recent work with Zachary Wojtowicz and Shrey Jain, to appear at the upcoming FAccT.
New data leverage post: "Google and TikTok rank bundles of information; ChatGPT ranks grains."
dataleverage.substack.com/p/google-and...
This will be post 1/3 in a series about viewing many AI products as all competing around the same task: ranking bundles or grains of records made by people.
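(Aside for technically minded readers: here is a toy sketch of the bundles-vs-grains distinction. The documents are made up and a bag-of-words cosine scorer stands in for real ranking systems; the point is only that a search engine scores whole records while a generative system effectively scores and recombines sentence-level grains.)

```python
import math
from collections import Counter

def bag(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical records made by people.
docs = {
    "alice/post": "Creators can bargain over training data. Data leverage gives them power.",
    "bob/wiki":   "Search engines rank documents for a query. Ranking is an old problem.",
    "carol/blog": "Chat systems stitch together sentences. The sentences come from many sources.",
}

query = bag("how do creators bargain over data")

# Bundle ranking (Google/TikTok style): score whole documents, so credit
# and traffic flow to a record and its author as one unit.
bundles = sorted(docs, key=lambda d: -cosine(query, bag(docs[d])))

# Grain ranking (ChatGPT style): score sentence-level grains across all
# documents, recombining people's records at a much finer granularity.
grains = [(d, s) for d, text in docs.items() for s in text.split(". ")]
grains.sort(key=lambda pair: -cosine(query, bag(pair[1])))

print("bundle ranking:", bundles)
print("top grain:", grains[0])
```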
Pre-print now on arxiv and to appear at FAccT 2025:
arxiv.org/abs/2505.00195
"Algorithmic Collective Action with Two Collectives --
Aditya Karan, Nicholas Vincent, Karrie Karahalios, Hari Sundaram"
Sharing a new paper (led by Aditya Karan):
There's growing interest in algorithmic collective action, where a "collective" acts through data to impact a recommender system, classifier, or other model.
But... what happens if two collectives act at the same time?
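(For intuition only; this is not the paper's actual setup. A hedged toy simulation: two collectives each plant a trigger token in a small slice of a text classifier's training data to steer its label, and we measure whether each still succeeds when both act at once. The trigger strategy and all numbers are my own illustrative assumptions.)

```python
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

random.seed(0)
WORDS = ["news", "sports", "music", "tech", "art", "food"]

def sample_post():
    return " ".join(random.choices(WORDS, k=8)), random.randint(0, 1)

def act(posts, trigger, target, fraction):
    """One collective appends its trigger token to a fraction of the
    training posts and relabels them with its target class."""
    out = list(posts)
    for i in random.sample(range(len(out)), int(len(out) * fraction)):
        out[i] = (out[i][0] + " " + trigger, target)
    return out

def success_rate(clf, vec, trigger, target, n=200):
    """How often fresh posts carrying the trigger get the target label."""
    probes = [" ".join(random.choices(WORDS, k=8)) + " " + trigger
              for _ in range(n)]
    return (clf.predict(vec.transform(probes)) == target).mean()

train = [sample_post() for _ in range(2000)]
# Collective A pushes trigger "aaa" toward label 1; B pushes "bbb" toward 0.
# Note B may relabel posts A already touched: that overlap is the interference.
train = act(train, "aaa", target=1, fraction=0.05)
train = act(train, "bbb", target=0, fraction=0.05)

texts, labels = zip(*train)
vec = CountVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(texts), list(labels))

print("collective A success:", success_rate(clf, vec, "aaa", 1))
print("collective B success:", success_rate(clf, vec, "bbb", 0))
```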
New early draft post: "Public AI, Data Appraisal, and Data Debates"
"A consortium of Public AI labs can substantially improve data pricing, which may also help to concretize debates about the ethics and legality of training practices."
dataleverage.substack.com/p/public-ai-...
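(The post is about institutions more than algorithms, but for readers who want "data appraisal" made concrete: one standard family of techniques prices a data point by its average marginal contribution to model quality, e.g. a Monte Carlo estimate of its Shapley value. A minimal sketch follows; the dataset, model, and sample counts are generic placeholders, not anything proposed in the post.)

```python
import random
from statistics import mean

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

random.seed(0)
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
train_idx = list(range(40))
test_X, test_y = X[40:], y[40:]

def utility(subset):
    """Model accuracy when trained only on `subset` of the data."""
    if len(set(y[i] for i in subset)) < 2:
        return 0.5  # can't fit a classifier on one class; use chance level
    clf = LogisticRegression(max_iter=1000)
    clf.fit([X[i] for i in subset], [y[i] for i in subset])
    return clf.score(test_X, test_y)

def shapley_estimate(i, rounds=50):
    """Monte Carlo Shapley value: average marginal contribution of
    point i over random orderings of the training set."""
    contribs = []
    for _ in range(rounds):
        order = random.sample(train_idx, len(train_idx))
        before = order[:order.index(i)]
        contribs.append(utility(before + [i]) - utility(before))
    return mean(contribs)

print("appraised value of point 0:", round(shapley_estimate(0), 4))
```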
"Algo decision-making systems are 'leviathans', harmful not for their arbitrariness or opacity, but for the systematicity of their decisions"
- @christinalu.bsky.social on the need for plural #AI model ontologies (sounds technical, but has big consequences for the human #commons)
www.combinationsmag.com/model-plural...
New Data Leverage newsletter post. It's about... data leverage (specifically, evaluation-focused bargaining) and products du jour (deep research, agents).
dataleverage.substack.com/p/evaluation...
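(One toy way to make "evaluation-focused bargaining" concrete; the items and the stand-in model below are invented for illustration. The idea: a collective that contributes benchmark items can estimate its leverage as the swing in a model's headline score if it withdraws those items.)

```python
# Toy illustration of evaluation-focused bargaining: a collective that
# contributes benchmark items measures how much a model's reported
# score depends on those items. All data here is made up.

def score(model, eval_set):
    """Fraction of eval items the model answers correctly."""
    return sum(model(q) == a for q, a in eval_set) / len(eval_set)

# Hypothetical eval items; the collective owns the second list.
public_items = [("2+2", "4"), ("capital of France", "Paris")]
collective_items = [("niche fact 1", "x"), ("niche fact 2", "y"),
                    ("niche fact 3", "z")]

# A stand-in "model" that only knows the public items.
knowledge = dict(public_items)
model = lambda q: knowledge.get(q, "?")

with_items = score(model, public_items + collective_items)
without_items = score(model, public_items)

# The collective's evaluation leverage: how much the headline number
# moves if it withholds (or withdraws) its items.
print(f"score with collective items:    {with_items:.2f}")
print(f"score without collective items: {without_items:.2f}")
print(f"leverage (score delta):         {without_items - with_items:.2f}")
```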
Here's my round-up as a markdown file: github.com/nickmvincent...
Here's the newsletter post, Tipping Points for Content Ecosystems: dataleverage.substack.com/p/tipping-po...
I have some new co-authored writing to share, along with a round-up of important articles for the "content ecosystems and AI" space.
I'm doing an experiment with microblogging directly to a GitHub repo that I can share across platforms...
Global Dialogues has launched at the Paris #AIActionSummit.
Watch @audreyt.org give the announcement via @projectsyndicate.bsky.social
youtu.be/XkwqYQL6V4A?... (starts at 02:47:30)
AI labs and tech companies should open-source their data protection techniques so that content creators can benefit from new and old advances in this space: dataleverage.substack.com/p/ai-labs-co...
31.01.2025 19:26
Given it seems clear that data protection technologies (such as the techniques OpenAI used to gather this evidence) will play a role in the near term, I put together another post with a simple proposal that could reduce some of the tension in the current paradigm.
31.01.2025 19:26
On Monday, I wrote a post on the live-by-the-sword, die-by-the-sword nature of the current data paradigm. On Wednesday, there was quite a development on this front: OpenAI came out with a statement that they have evidence that DeepSeek "used" OpenAI models in some fashion (this was faster than I expected!).
31.01.2025 19:26
Really appreciate all the AI lab data paradigm / hypocrisy discussion on the show! BTW, you might enjoy this academic-y newsletter post (dataleverage.substack.com/p/ai-labs-co...) in which I quote your recent tweet on the topic (and the prequel from Monday: dataleverage.substack.com/p/live-by-th...)
31.01.2025 19:22
For other kinds of benchmarks, influence is much more localized (a small set of data contributes directly to, e.g., factual history knowledge). So reasoning is highly collective (we've all contributed) but in theory still ablatable / subject to leverage and scaling.
28.01.2025 14:54
I don't disagree directly with these points! But I would basically add that, for reasoning, influence is widely distributed amongst training data (though I'd guess that, e.g., code and philosophy materials punch above their weight). Even for this, data scaling applies (more data -> better at a set of such examples).
28.01.2025 14:54
Maybe I created confusion in my first response: I'm not particularly attached to the compositor framing, and am definitely not trying to argue for a plagiarism framing. Rather, unlike with humans, it's much easier (I think!) to attribute the "reasoning breakpoint" to specific documents and efforts.
28.01.2025 14:30
(I say this is bordering on tautology because it's effectively true for any data-dependent system that I could "ablate" down to having only one training document, but I think it is relevant, as it's part of the point I want to appear *more* in public discussions of AI policy.)
28.01.2025 14:23
The bordering-on-tautological longer argument I'd make is: given enough resources, I'm confident a team could eventually do enough data ablations to remove this capability, and in doing so more accurately pinpoint the specific upstream human efforts.
28.01.2025 14:23
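(To close the loop on the ablation thread above, here is a toy, runnable caricature of that argument: one "upstream human effort" is the sole source of a capability, and group-level ablations surface it. The corpus, group names, and metric are all invented for illustration; a real version would mean expensive retraining runs.)

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy corpus: documents tagged by the "upstream human effort" they came
# from. Group "math-forum" is the only source of arithmetic examples.
corpus = {
    "math-forum": [("two plus two equals four", "math"),
                   ("seven times eight is fifty six", "math"),
                   ("the square of three is nine", "math")],
    "recipes":    [("whisk the eggs with sugar", "cooking"),
                   ("simmer the sauce gently", "cooking")],
    "reviews":    [("the film drags in the middle", "culture"),
                   ("the album rewards repeat listens", "culture")],
}
eval_set = [("the square of two is four", "math"),
            ("whisk the sauce gently", "cooking")]

def capability(train_groups):
    """Accuracy on the eval set when training only on `train_groups`."""
    rows = [ex for g in train_groups for ex in corpus[g]]
    texts, labels = zip(*rows)
    vec = TfidfVectorizer().fit(texts)
    clf = LogisticRegression(max_iter=1000).fit(vec.transform(texts), labels)
    preds = clf.predict(vec.transform([q for q, _ in eval_set]))
    return sum(p == a for p, (_, a) in zip(preds, eval_set)) / len(eval_set)

# Ablate one group at a time and watch which capability moves.
groups = list(corpus)
full = capability(groups)
for g in groups:
    ablated = capability([h for h in groups if h != g])
    print(f"remove {g:10s}: capability {full:.2f} -> {ablated:.2f}")
```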