We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers.
The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
10.07.2025 19:46 · 6906 · 3024 · 112 · 625
Update for those who've left the other app:
I'm now on the policy team at Model Evaluation and Threat Research (METR). Excited to be "doing AI policy" full-time.
07.03.2025 03:19 · 14 · 1 · 3 · 0
Why aren't our AI evaluations better? AFAICT a key reason is that the incentives around them are kinda bad.
In a new post, I explain how the standardized testing industry works and write about lessons it may have for the AI evals ecosystem.
open.substack.com/pub/contextw...
26.02.2025 04:35 · 4 · 1 · 0 · 0
This is perfect in its own way
31.01.2025 21:20 · 3 · 0 · 0 · 0
Natural minds and natural bodies are irreplaceable. Artificial minds are costless to replace. We might value artificial bodies more, since they aren't so disposable, at least in the brief period when they are still few and costly. Could be a good period to set stories in.
14.01.2025 01:12 · 4 · 0 · 0 · 0
YouTube video by TEMAC INDIA
TOYOTA AIR JET LOOMS JAT 810 JA4S-190 CM RUNNING AT 1200 RPM
When we optimize automation, we sometimes optimize *hard*. Like this automated loom working away at an inhuman 1200 RPM. Wild. youtu.be/WweMNDqDYhc?...
11.01.2025 02:10 · 4 · 0 · 1 · 0
In Vitalik's post he mentions resolving only the highest-volume markets, which I think would address this concern even more directly, but I'm less confident I understand that version.
24.12.2024 21:21 · 1 · 0 · 0 · 0
I dunno! Would be fun to find out
24.12.2024 21:17 · 1 · 0 · 0 · 0
I wouldn't say it was free, really. Like, if the creator would've needed to spend $1 in subsidies on a regular market, on each market that has a 90% chance of reversion they would need to offer $10 in subsidies to compensate, or whatever.
24.12.2024 21:17 · 1 · 0 · 1 · 0
Since the expected payouts on each market are much lower, you probably need big subsidies to compensate. And since you don't know ahead of time which markets you will resolve, you have to fund them all.
24.12.2024 21:01 · 1 · 0 · 1 · 0
You want traders to give you cheap but calibrated estimates for all the claims. The randomization reduces the expected size of payouts they'd receive for their bets, since each market only has a 10% chance of getting audited & resolved, but it preserves the incentive to bet their true probabilities.
24.12.2024 20:58 · 1 · 1 · 1 · 0
Let's take this to DMs :)
24.12.2024 19:45 · 1 · 0 · 1 · 0
So if you create a prediction market because you want information on a question, you can think of the market subsidy as the compensation you're paying folks for their information.
24.12.2024 18:54 · 1 · 0 · 2 · 0
Yeah. It's kinda subtle. With a subsidy, you're basically giving away money as an incentive. But you can increase liquidity without giving away money.
24.12.2024 18:51 · 1 · 0 · 1 · 0
You're thinking of liquidity, which is related but not the same. Subsidy here just means committing money to increase the payouts to whoever is right.
24.12.2024 18:42 · 1 · 0 · 1 · 0
What do you mean by "solve"?
You wanted information about all 100, so you subsidize markets on all of them, and traders can't tell ahead of time which ones will be resolved, so if your subsidies were big they are incentivized to trade on any/all of the markets that they have information about.
24.12.2024 18:27 · 1 · 0 · 2 · 0
Feel like they've made a lot of wild statements but I don't know if anybody has collected those in one place for easy reference.
22.12.2024 05:25 · 3 · 0 · 1 · 0
Is there a website/database out there that tracks what major AI company executives say about the future of AI?
22.12.2024 05:24 · 8 · 0 · 2 · 0
Transformers and other parallel sequence models like Mamba are in TC⁰. That implies they can't internally map (state₀, action₀, …, actionₜ) → stateₜ₊₁
But they can map (state₀, action₀, state₁, action₁, …, stateₜ, actionₜ) → stateₜ₊₁
Just reformulate the task!
18.12.2024 06:59 · 8 · 0 · 0 · 0
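A toy sketch of the reformulation above. The environment here (a mod-7 counter) and all names are made up for illustration; the point is only the data format: in the "hard" format the model must compose t transitions internally, while in the interleaved format every prediction is a single transition.

```python
def step(state, action):
    # One transition of a made-up toy environment.
    return (state + action) % 7

def rollout(state0, actions):
    # Ground-truth state sequence, applying actions one at a time.
    states = [state0]
    for a in actions:
        states.append(step(states[-1], a))
    return states

def hard_format(state0, actions):
    # (state_0, action_0, ..., action_t): predicting the final state
    # requires composing all t transitions internally.
    return [("s", state0)] + [("a", a) for a in actions]

def easy_format(state0, actions):
    # (state_0, action_0, state_1, action_1, ..., state_t, action_t):
    # every intermediate state is in the input, so each next-state
    # target is a single application of `step`.
    states = rollout(state0, actions)
    seq = []
    for s, a in zip(states, actions):
        seq += [("s", s), ("a", a)]
    return seq

actions = [3, 5, 2, 6]
print(rollout(1, actions)[-1])   # -> 3
print(easy_format(1, actions))
```
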
Atticus Geiger gave a take on when sparse autoencoders (SAEs) are/aren't what you should use. I basically agree with his recommendations. youtube.com/clip/UgkxKWI...
10.12.2024 22:28 · 6 · 0 · 0 · 0
These days, flow-based models are typically defined via (neural) differential equations, requiring numerical integration or simulation-free alternatives during training. This paper revisits autoregressive flows, using Transformer layers to define the sequence of flow transformations directly.
10.12.2024 17:08 · 3 · 0 · 0 · 0
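For reference, here is a minimal numpy sketch of a plain affine autoregressive flow. The `toy_params` conditioner below is a hypothetical stand-in for a learned (e.g., Transformer) parameterization, not the paper's actual architecture; the point is the triangular-Jacobian structure that makes the log-determinant cheap and inversion exact.

```python
import numpy as np

def toy_params(x_prefix):
    # Hypothetical stand-in for a learned conditioner: maps earlier
    # dimensions to a shift and log-scale for the current dimension.
    mu = 0.1 * x_prefix.sum()
    log_sigma = 0.05 * len(x_prefix)
    return mu, log_sigma

def forward(x):
    # x -> z. Dimension i depends only on x[:i], so the Jacobian is
    # triangular and log|det J| is a simple sum.
    z = np.empty_like(x)
    logdet = 0.0
    for i in range(len(x)):
        mu, log_sigma = toy_params(x[:i])
        z[i] = (x[i] - mu) * np.exp(-log_sigma)
        logdet += -log_sigma
    return z, logdet

def inverse(z):
    # z -> x, reconstructed sequentially (the autoregressive
    # direction is the slow one).
    x = np.empty_like(z)
    for i in range(len(z)):
        mu, log_sigma = toy_params(x[:i])
        x[i] = z[i] * np.exp(log_sigma) + mu
    return x

x = np.array([0.5, -1.0, 2.0])
z, logdet = forward(x)
assert np.allclose(inverse(z), x)  # exact invertibility
```
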
It isnβt super clear to me what the monthly pricing will be. Like, on the one hand in a competitive market I think the price of AI services will tend downward toward the marginal cost. But also there are only a few providers and constraints on supply. Not sure how it comes out on balance.
05.12.2024 17:11 · 2 · 0 · 1 · 0
It might be like that! If so I would expect an experiment like this to indicate that. :)
04.12.2024 06:10 · 0 · 0 · 1 · 0
Re: instruction-tuning and RLHF as "lobotomy"
I'm interested in experiments that look into how much finetuning can "roll back" a post-trained model to its base model perplexity on the original distribution.
Has anyone seen an experiment like this run?
04.12.2024 05:44 · 4 · 0 · 1 · 0
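A sketch of the comparison such an experiment would run. The per-token log-probs below are made-up placeholders, not measurements; in the real experiment they would come from the base checkpoint, the post-trained checkpoint, and the post-trained checkpoint after finetuning back on the pretraining distribution.

```python
import math

def perplexity(token_logprobs):
    # exp of the average negative log-likelihood per token.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs on held-out pretraining text:
base        = [-2.1, -1.8, -2.4, -2.0]  # base model
post        = [-2.6, -2.3, -2.9, -2.5]  # post-trained model (higher PPL)
rolled_back = [-2.2, -1.9, -2.5, -2.1]  # after finetuning back

# The question is how much of the perplexity gap the rollback recovers.
gap_before = perplexity(post) - perplexity(base)
gap_after = perplexity(rolled_back) - perplexity(base)
print(gap_after < gap_before)
```
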
Ah. Yeah I don't think there's anything special about services that brand themselves as "AI agents". What matters IMO is that it's opaquely doing expensive work on behalf of the client without human oversight.
For those, I think they might want to advertise their guarantees. Not certain, though.
02.12.2024 01:18 · 3 · 0 · 0 · 0
Can you say more? Not sure that I understand.
02.12.2024 00:37 · 1 · 0 · 1 · 0
"Provider pays" for failed automation services
If your AI works as well as you claim, why not make that a promise?
I've been wondering when it would make sense for "AI agent" services to offer money-back guarantees. Wrote a short post about this on a flight.
open.substack.com/pub/contextw...
01.12.2024 23:26 · 7 · 0 · 1 · 0
xkcd comic 386, with back and forth that goes:
"Are you going to bed?"
"I can't. This is important."
"What?"
"Someone is WRONG on the internet."
https://xkcd.com/386/
Neat thing about real-money prediction markets is that you can get paid for doing this.
30.11.2024 16:40 · 5 · 0 · 0 · 0
From prediction markets to info finance
h/t @vitalik.ca, though I believe the idea is borrowed from @robinhanson.bsky.social
vitalik.eth.limo/general/2024...
28.11.2024 00:54 · 2 · 0 · 0 · 0
A bit of clever mechanism design: prediction markets + randomized auditing.
If you have 100 verifiable claims you want information on but can only afford to check 10, fund markets on each. Later, use a randomized ordering of them to check the first 10. Resolve those to yes/no, refund the rest.
28.11.2024 00:54 · 5 · 0 · 3 · 0
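The mechanism in this post, plus the subsidy-scaling point from the replies earlier in the thread, can be sketched in a few lines. Everything here (claim names, the $1 base subsidy, the 10% audit rate) is illustrative, not from any real market.

```python
import random

def audit_plan(claims, n_audits, seed=None):
    # Fund markets on every claim; later, audit a uniformly random
    # subset. Audited markets resolve yes/no; the rest are refunded.
    rng = random.Random(seed)
    order = claims[:]
    rng.shuffle(order)
    resolved = order[:n_audits]   # checked and paid out
    refunded = order[n_audits:]   # stakes returned
    return resolved, refunded

def required_subsidy(base_subsidy, audit_probability):
    # Expected payout per market falls by the audit probability,
    # so the per-market subsidy must rise by its inverse to keep
    # incentives unchanged.
    return base_subsidy / audit_probability

claims = [f"claim_{i}" for i in range(100)]
resolved, refunded = audit_plan(claims, n_audits=10, seed=0)
print(len(resolved), len(refunded))     # 10 90
print(required_subsidy(1.0, 10 / 100))  # 10.0 (the $1 -> $10 example)
```

Traders can't predict which markets land in `resolved`, which is what preserves the incentive to bet true probabilities everywhere.
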
Independent AI researcher, creator of datasette.io and llm.datasette.io, building open source tools for data journalism, writing about a lot of stuff at https://simonwillison.net/
The world's leading venue for collaborative research in theoretical computer science. Follow us at http://YouTube.com/SimonsInstitute.
i like philos and computers and other things - building llms @ cohere
Research Assistant in the AI for Cyber Defence research centre at The Alan Turing Institute.
building Cursor @ Anysphere
open-source enthusiast and full stack developer
Senior Resident China Fellow at the Atlantic Council's DFRLab. PhD candidate at Georgetown focusing on comparative China-U.S.-EU data policy. Methodologically promiscuous. 🏳️‍🌈
Head of AI @ NormalComputing. Tweets on Math, AI, Chess, Probability, ML, Algorithms and Randomness. Author of tensorcookbook.com
ML Research @ Apple.
Understanding deep learning (generalization, calibration, diffusion, etc).
preetum.nakkiran.org
I run AI Plans, an AI Safety lab focused on solving AI Alignment before 2029.
For several weeks I used a stone for a pillow.
I once spent a quarter of my paycheck on cheese.
Ping me! DM me (not working atm due to totalitarian UK law)!
SurpassAI
Open Data Specialist @eleutherai.bsky.social & Digital Historian @eui-history.bsky.social. Co-founder of @datarescueproject.org and @sucho-org.bsky.social. Website: https://www.storytracer.com/
PhD @ltiatcmu.bsky.social
previously @eleutherai.bsky.social
lintang.sutawika.com
Like all the men of Babylon, I have been proconsul; like all, a slave; I have also known omnipotence, opprobrium, and prison.
very sane ai newsletter: verysane.ai
Technology specialist at the EU AI Office / AI Safety / Prev: University of Amsterdam, EleutherAI, BigScience
Thoughts & opinions are my own and do not necessarily represent my employer.
Probabilistic ML, Learning Theory, Philosophy of Science,
Complex Systems, Nonlinear Dynamics, Disinformation Intervention
YIMBY
Research MLE, CoreWeave + EleutherAI
xStabilityai | xMSFT | xAMZN
What I'm reading: https://dmarx.github.io/papers-feed/
I like tokens! Lead for OLMo data at @ai2.bsky.social (Dolma) w @kylelo.bsky.social. Open source is fun. Opinions are sampled from my own stochastic parrot
more at https://soldaini.net