FleetingBits's Avatar

FleetingBits

@fleetingbits.bsky.social

Are base models the dreams of an LLM?

240 Followers  |  18 Following  |  63 Posts  |  Joined: 26.11.2023  |  1.5499

Latest posts by fleetingbits.bsky.social on Bluesky

Post image 22.07.2025 16:45 โ€” ๐Ÿ‘ 5    ๐Ÿ” 2    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
Post image 22.06.2025 02:20 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

kjv?

07.12.2024 00:58 โ€” ๐Ÿ‘ 4    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

An example is state health insurance - if you move states, it feels complicated and confusing to fix your health insurance - no one wants this. But, hasn't changed in my lifetime.

07.12.2024 00:27 โ€” ๐Ÿ‘ 3    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

And insurance has become the scapegoat (not even recently) - like over the last 20-30 years maybe.

07.12.2024 00:13 โ€” ๐Ÿ‘ 2    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

I think people underestimate popular anger (whether or not it is justified - leaving that to another side) - there are a bunch of issues around healthcare that everyone seems to agree exist but don't seem to change (not even political stuff) - and I think that fuels the anger more

07.12.2024 00:13 โ€” ๐Ÿ‘ 3    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

feels like the Apollo Research review of o1 was a bit adversarial - just getting that vibe from the description in the system card

07.12.2024 00:11 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
The Wild Story of the Taum Sauk Dam Failure
YouTube video by Practical Engineering The Wild Story of the Taum Sauk Dam Failure

It's interesting how many disasters come from a collection of small failures - often because people are not sufficiently motivated to coordinate.

www.youtube.com/watch?v=zRM2...

06.12.2024 01:09 โ€” ๐Ÿ‘ 3    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
x.com

Interesting thread on what social media rewards in academic articles. I think overbroad claims but, you know, you take what you can get.

x.com/0xredJ/statu...

05.12.2024 05:02 โ€” ๐Ÿ‘ 2    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

hmm I feel it would be fun

05.12.2024 04:24 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
Post image

inspiration

04.12.2024 02:09 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
Post image 04.12.2024 02:06 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
Buck Shlegeris - AI Control  [Alignment Workshop]
YouTube video by FARโ€คAI Buck Shlegeris - AI Control [Alignment Workshop]

Another interesting video - I think the idea that providers should have to stop deployment of their models if the models attempt to escape is reasonable.

Probably the starting point is actually a set of reporting requirements, but I digress...

04.12.2024 00:55 โ€” ๐Ÿ‘ 3    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

I joke I joke lol

03.12.2024 19:48 โ€” ๐Ÿ‘ 3    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

I think they are called GPUs ~~

03.12.2024 19:47 โ€” ๐Ÿ‘ 3    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
x.com

laurenโ€™s views on The Curve conference

x.com/typewriters/...

03.12.2024 19:20 โ€” ๐Ÿ‘ 2    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0
Post image 03.12.2024 05:51 โ€” ๐Ÿ‘ 8    ๐Ÿ” 3    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

I can't really tell whether he is saying something other than -> "the lisp syntax generally looks like lisp lists and it is easy to parse those into trees using the tools of the language".

03.12.2024 00:46 โ€” ๐Ÿ‘ 2    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
Preview
Richard Ngo โ€“ Reframing AGI Threat Models [Alignment Workshop] YouTube video by FARโ€คAI

claimed - AI misuse risk and AI misalignment risk are the same thing form a policy and technical perspective

03.12.2024 00:28 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

I think changing the link behavior and encouraging people to go back to linking their substacks would really bring a lot of a academics back to twitter - he doesn't have good "monetization" for academics who don't want $ but want reputation with peers.

02.12.2024 20:36 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

I feel like it hit a bunch of touchstones of the current twitter convo - a lot of which make Twitter more boring now?

But - just a few tweaks from Elon could take some of the wind out of Bluesky's sails...

02.12.2024 06:09 โ€” ๐Ÿ‘ 3    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

I mean the goal of Anthropic & co is to figure out how to get a small number of extremely high quality labels, and then use synthetic data to get great coverage from them.

So, it is the "average data labeler's answer" but that average data labeler is an IMO winner.

02.12.2024 05:56 โ€” ๐Ÿ‘ 2    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

I think some of this is aimed at "in domain" style questions - which are likely to be much closer to the preference labeled prompts.

I think he's making a valuable point there, aimed more at people who expect LLM answers to be magic to a very regular question.

02.12.2024 05:55 โ€” ๐Ÿ‘ 3    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

going through more of the references - feels like "many such cases" type stuff

02.12.2024 05:11 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 0    ๐Ÿ“Œ 0

But - AI Safety would be much improved if the optimization was more around ||clearly communicating real findings|| rather than ||getting a great paper title||

02.12.2024 04:49 โ€” ๐Ÿ‘ 2    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

I mean - all of this just shows how the OpenAI superalignment team really was asking the right questions - (1) how do we survive with bad labels? (2) how do we give labelers tools to make better labels? (3) how do we help models give answers that labelers can more easily discriminate between?

02.12.2024 04:48 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
Post image

or, the data labelers don't check the 20 line code fragment that the model spit out

hmmm

02.12.2024 04:47 โ€” ๐Ÿ‘ 0    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
Post image

And, then the finding ends up being something like "our data labelers don't check the references" so we get bad labels : X

uhh, ok

02.12.2024 04:44 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
Post image

this paper is referenced, which should have examples of reward hacking and the authors are high quality authors (Ethan Perez!)

02.12.2024 04:44 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
Preview
Reward Hacking in Reinforcement Learning Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards, without genuinely learning or completing the intended task....

like this blog post feels this way to me - very few real examples that I can inspect

lilianweng.github.io/posts/2024-1...

02.12.2024 04:43 โ€” ๐Ÿ‘ 1    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0

@fleetingbits is following 18 prominent accounts