FleetingBits @fleetingbits - Bluesky Profile

22.07.2025 16:45 — 👍 7 🔁 2 💬 1 📌 0

22.06.2025 02:20 — 👍 2 🔁 0 💬 0 📌 0

kjv?

07.12.2024 00:58 — 👍 4 🔁 0 💬 1 📌 0

An example is state health insurance - if you move states, it feels complicated and confusing to fix your health insurance - no one wants this. But, hasn't changed in my lifetime.

07.12.2024 00:27 — 👍 3 🔁 0 💬 0 📌 0

And insurance has become the scapegoat (not even recently) - like over the last 20-30 years maybe.

07.12.2024 00:13 — 👍 2 🔁 0 💬 1 📌 0

I think people underestimate popular anger (whether or not it is justified - leaving that to another side) - there are a bunch of issues around healthcare that everyone seems to agree exist but don't seem to change (not even political stuff) - and I think that fuels the anger more

07.12.2024 00:13 — 👍 3 🔁 0 💬 1 📌 0

feels like the Apollo Research review of o1 was a bit adversarial - just getting that vibe from the description in the system card

07.12.2024 00:11 — 👍 1 🔁 0 💬 0 📌 0

YouTube video by Practical Engineering The Wild Story of the Taum Sauk Dam Failure

It's interesting how many disasters come from a collection of small failures - often because people are not sufficiently motivated to coordinate.

www.youtube.com/watch?v=zRM2...

06.12.2024 01:09 — 👍 4 🔁 0 💬 0 📌 0

x.com

Interesting thread on what social media rewards in academic articles. I think overbroad claims but, you know, you take what you can get.

x.com/0xredJ/statu...

05.12.2024 05:02 — 👍 2 🔁 0 💬 0 📌 0

hmm I feel it would be fun

05.12.2024 04:24 — 👍 1 🔁 0 💬 0 📌 0

inspiration

04.12.2024 02:09 — 👍 1 🔁 0 💬 0 📌 0

04.12.2024 02:06 — 👍 2 🔁 0 💬 1 📌 0

YouTube video by FAR․AI Buck Shlegeris - AI Control [Alignment Workshop]

Another interesting video - I think the idea that providers should have to stop deployment of their models if the models attempt to escape is reasonable.

Probably the starting point is actually a set of reporting requirements, but I digress...

04.12.2024 00:55 — 👍 3 🔁 0 💬 0 📌 0

I joke I joke lol

03.12.2024 19:48 — 👍 3 🔁 0 💬 1 📌 0

I think they are called GPUs ~~

03.12.2024 19:47 — 👍 3 🔁 0 💬 1 📌 0

x.com

lauren’s views on The Curve conference

x.com/typewriters/...

03.12.2024 19:20 — 👍 2 🔁 0 💬 0 📌 0

03.12.2024 05:51 — 👍 8 🔁 3 💬 0 📌 0

I can't really tell whether he is saying something other than -> "the lisp syntax generally looks like lisp lists and it is easy to parse those into trees using the tools of the language".

03.12.2024 00:46 — 👍 2 🔁 0 💬 1 📌 0

Richard Ngo – Reframing AGI Threat Models [Alignment Workshop] YouTube video by FAR․AI

claimed - AI misuse risk and AI misalignment risk are the same thing form a policy and technical perspective

03.12.2024 00:28 — 👍 2 🔁 0 💬 0 📌 0

I think changing the link behavior and encouraging people to go back to linking their substacks would really bring a lot of a academics back to twitter - he doesn't have good "monetization" for academics who don't want $ but want reputation with peers.

02.12.2024 20:36 — 👍 1 🔁 0 💬 0 📌 0

I feel like it hit a bunch of touchstones of the current twitter convo - a lot of which make Twitter more boring now?

But - just a few tweaks from Elon could take some of the wind out of Bluesky's sails...

02.12.2024 06:09 — 👍 3 🔁 0 💬 1 📌 0

I mean the goal of Anthropic & co is to figure out how to get a small number of extremely high quality labels, and then use synthetic data to get great coverage from them.

So, it is the "average data labeler's answer" but that average data labeler is an IMO winner.

02.12.2024 05:56 — 👍 2 🔁 0 💬 0 📌 0

I think some of this is aimed at "in domain" style questions - which are likely to be much closer to the preference labeled prompts.

I think he's making a valuable point there, aimed more at people who expect LLM answers to be magic to a very regular question.

02.12.2024 05:55 — 👍 3 🔁 0 💬 1 📌 0

going through more of the references - feels like "many such cases" type stuff

02.12.2024 05:11 — 👍 1 🔁 0 💬 0 📌 0

But - AI Safety would be much improved if the optimization was more around ||clearly communicating real findings|| rather than ||getting a great paper title||

02.12.2024 04:49 — 👍 2 🔁 0 💬 1 📌 0

I mean - all of this just shows how the OpenAI superalignment team really was asking the right questions - (1) how do we survive with bad labels? (2) how do we give labelers tools to make better labels? (3) how do we help models give answers that labelers can more easily discriminate between?

02.12.2024 04:48 — 👍 1 🔁 0 💬 1 📌 0

or, the data labelers don't check the 20 line code fragment that the model spit out

hmmm

02.12.2024 04:47 — 👍 0 🔁 0 💬 1 📌 0

And, then the finding ends up being something like "our data labelers don't check the references" so we get bad labels : X

uhh, ok

02.12.2024 04:44 — 👍 1 🔁 0 💬 1 📌 0

this paper is referenced, which should have examples of reward hacking and the authors are high quality authors (Ethan Perez!)

02.12.2024 04:44 — 👍 1 🔁 0 💬 1 📌 0

Reward Hacking in Reinforcement Learning Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards, without genuinely learning or completing the intended task....

like this blog post feels this way to me - very few real examples that I can inspect

lilianweng.github.io/posts/2024-1...

02.12.2024 04:43 — 👍 1 🔁 0 💬 1 📌 0

FleetingBits

Latest posts by fleetingbits.bsky.social on Bluesky

@fleetingbits is following 18 prominent accounts