By default, LLM agents with long action sequences use early steps to undermine your evaluation of later steps, which is a big alignment risk.
Our new paper mitigates this, keeps the ability for long-term planning, and doesn't assume you can detect the undermining strategy.
23.01.2025 15:47
Reducing unnecessary action *does* drive growth. We are all more productive when we achieve the same things with fewer inputs; wasting citizens' time makes the whole country less productive. Create slack in people's lives and watch what they create with it!
21.01.2025 14:25
Interesting analogy, because of course the Dreadnoughts were mostly militarily useless and were obsoleted by changing strategic considerations before they were ever deployed.
17.12.2024 09:24
I desperately want to know what experience made you try out this prompt. Who hurt you?
10.12.2024 17:48
Interesting. I guess I'm surprised that oil prices would have such a big effect on total fossil fuel CO2 emissions (presumably mostly coal over the period?). But maybe substitutability links them enough.
04.12.2024 20:29
Actually just zoomed in on the data viewer. It does look like 1973 is the break point. Still curious about why the effect was so persistent.
04.12.2024 12:46
Why did land use emissions shrink so much between 1960-70 and then stop shrinking?
Why did the oil price shock lead to sustained flat per capita fossil fuel emissions? It was short. Also it started after the trend breaks.
04.12.2024 12:44
I'm surprised that the per capita global emissions look like they are trending pretty flat from 1950ish, much earlier than I would have guessed. Presumably many people greatly increased their energy consumption after then? Do you know what is driving this?
03.12.2024 20:39
Updated! Keep 'em coming.
26.11.2024 09:07
@maosbot.bsky.social what do you think, do you belong on this list? I think most of your research isn't quite in this area but not sure how you self-identify on research focus at the moment.
25.11.2024 18:27
Weak signal perhaps, but you are one of two accounts on Twitter that I genuinely miss here. If you did make the leap that would be lovely :D
25.11.2024 14:38
Help me grow this starter pack for technical researchers working on AGI safety! go.bsky.app/D6P44sC Some flex, but aiming for mostly technical research rather than governance/strategy. Who am I missing?
25.11.2024 14:04
Agreed. I basically don't believe the result at all. Seems like its memetic strength is that it lets you feel well informed.
24.11.2024 00:57
You too! Just DMed you :D
22.11.2024 18:21
Strongly agree. On a cold winter day they are basically a pure comfort upgrade. Also great for hayfever.
22.11.2024 18:12
Every field that has tried to have a reproducibility crisis has managed to have one. That suggests that the way journals have done it for decades underinvests in finding critical flaws in papers, and that retractions are too rare and too late to depend on.
21.11.2024 10:36
I've seen at least a couple cases where a very high effort public review identified a significant flaw that the reviewers had missed. Losing that would be a real cost.
20.11.2024 20:53
Jakob N. Foerster - How To ML Paper
Little hat-tip to www.jakobfoerster.com/how-to-ml-pa... from Jakob Foerster and jsteinhardt.stat.berkeley.edu/blog/advice-... from Jacob Steinhardt who have excellent advice as well.
18.11.2024 20:09
timhunkin/illegal engineering
Entertaining essay about how the decline in practical engineering education has been devastating for *checks notes* professional criminal safe crackers. (Ok, mostly just a fun history of safe cracking.) www.timhunkin.com/94_illegal_e...
15.11.2024 17:14
And for readers! Twitter has been getting gradually more boring. Turns out this whole hyperlink thing is a big deal for the internet.
13.11.2024 16:02
Something I loved most about the internet in the 2000s was the idiosyncratic personal webpages that some people had put a crazy amount of time and effort into.
These pages must still exist, right? What are the best ones you know of?
13.11.2024 10:40
Co-founder at Asana and Good Ventures (a funding partner of Coefficient Giving). Meta delenda est. Strange looper.
Author of Bea Wolf, A City on Mars, and the comic SMBC
Website: www.smbc-comics.com
Patreon: https://www.patreon.com/ZachWeinersmith?ty=h
New book: http://www.acityonmars.com/
Economist and legal scholar turned AI researcher focused on AI alignment and governance. Prof of government and policy and computer science at Johns Hopkins where I run the Normativity Lab. Recruiting CS postdocs and PhD students. gillianhadfield.org
Co-founder & editor, Works in Progress. Writer, Scientific Discovery. Podcaster, Hard Drugs. Advisor, Coefficient Giving. // Previously at Our World in Data.
Newsletter: https://scientificdiscovery.dev
Podcast: https://harddrugs.worksinprogress.co
🏳️‍🌈
🌹 Labour MP for Earley and Woodley, covering Shinfield & Whitley
Member of the Treasury Select Committee
Chair of the APPG for Social Science and Policy
✉️ Please email if you'd like a reply: yuan.yang.mp@parliament.uk
Making AI safer at Google DeepMind
davidlindner.me
Research scientist at Google DeepMind. All opinions are my own.
https://turntrout.com
CEO of Coefficient Giving
P(A|B) = [P(A)*P(B|A)]/P(B), all the rest is commentary. Click to read Astral Codex Ten, by Scott Alexander, a […] [bridged from astralcodexten.com on the web: https://fed.brid.gy/web/astralcodexten.com ]
Founder of Our World in Data
Professor at the University of Oxford
Data to understand global problems and research to make progress against them.
What would we need to understand in order to design an amazing future? Ex DeepMind, OpenAI
Blog at thezvi.substack.com, this is a pure backup, same handle on Twitter.
Contra dance musician, parent, effective altruist, biosurveillant, software engineer, blogger.
www.jefftk.com
Trying for human-compatible humans