@fleetingbits.bsky.social
Are base models the dreams of an LLM?
kjv?
07.12.2024 00:58

An example is state health insurance - if you move states, it feels complicated and confusing to fix your health insurance - no one wants this. But it hasn't changed in my lifetime.
07.12.2024 00:27

And insurance has become the scapegoat (not even recently) - like over the last 20-30 years maybe.
07.12.2024 00:13

I think people underestimate popular anger (whether or not it is justified - leaving that aside) - there are a bunch of issues around healthcare that everyone seems to agree exist but don't seem to change (not even political stuff) - and I think that fuels the anger more
07.12.2024 00:13

feels like the Apollo Research review of o1 was a bit adversarial - just getting that vibe from the description in the system card
07.12.2024 00:11

It's interesting how many disasters come from a collection of small failures - often because people are not sufficiently motivated to coordinate.
www.youtube.com/watch?v=zRM2...
Interesting thread on what social media rewards in academic articles. I think it rewards overbroad claims but, you know, you take what you can get.
x.com/0xredJ/statu...
hmm I feel it would be fun
05.12.2024 04:24

inspiration
04.12.2024 02:09

Another interesting video - I think the idea that providers should have to stop deployment of their models if the models attempt to escape is reasonable.
Probably the starting point is actually a set of reporting requirements, but I digress...
I joke I joke lol
03.12.2024 19:48

I think they are called GPUs ~~
03.12.2024 19:47

lauren's views on The Curve conference
x.com/typewriters/...
I can't really tell whether he is saying something other than -> "the lisp syntax generally looks like lisp lists and it is easy to parse those into trees using the tools of the language".
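For what it's worth, here's the concrete version of that claim - a minimal s-expression reader (my own sketch, not from the linked post) showing how directly Lisp's surface syntax maps onto nested lists:

```python
# Minimal s-expression reader: Lisp source is already a nested-list
# notation, so parsing it into a tree takes a few lines of code.

def tokenize(src: str) -> list[str]:
    # Pad parentheses with spaces, then split on whitespace.
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens: list[str]):
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return tok  # an atom

# "(+ 1 (* 2 3))" becomes ['+', '1', ['*', '2', '3']]
print(parse(tokenize("(+ 1 (* 2 3))")))
```

The output is itself an ordinary nested list, which is the usual homoiconicity point: the parse tree of the program is data you can manipulate with the language's own list tools.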
03.12.2024 00:46

claimed - AI misuse risk and AI misalignment risk are the same thing from a policy and technical perspective
03.12.2024 00:28

I think changing the link behavior and encouraging people to go back to linking their substacks would really bring a lot of academics back to twitter - he doesn't have good "monetization" for academics who don't want $ but want reputation with peers.
02.12.2024 20:36

I feel like it hit a bunch of touchstones of the current twitter convo - a lot of which make Twitter more boring now?
But - just a few tweaks from Elon could take some of the wind out of Bluesky's sails...
I mean the goal of Anthropic & co is to figure out how to get a small number of extremely high quality labels, and then use synthetic data to get great coverage from them.
So, it is the "average data labeler's answer" but that average data labeler is an IMO winner.
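Roughly the pipeline I have in mind when I say that - a toy sketch in which everything (`GOLD`, `generate_variants`, `judge`) is a hypothetical placeholder, not anything Anthropic has described publicly:

```python
# Toy sketch of "few gold labels -> broad synthetic coverage".
# Every function here is a hypothetical placeholder, not a real API.

GOLD = [  # a handful of extremely high quality expert labels
    ("Prove there are infinitely many primes.",
     "Assume finitely many primes p1..pn and consider N = p1*...*pn + 1 ..."),
]

def generate_variants(prompt: str, n: int) -> list[str]:
    # Stand-in for an LLM paraphrasing / broadening the gold prompt.
    return [f"{prompt} (variant {i})" for i in range(n)]

def judge(gold_examples, prompt: str, a: str, b: str) -> str:
    # Stand-in for a judge model few-shot prompted with the gold
    # examples; here just a dummy rule (prefer the longer answer).
    return a if len(a) >= len(b) else b

synthetic_prefs = []
for gold_prompt, _gold_answer in GOLD:
    for variant in generate_variants(gold_prompt, n=50):
        a, b = "candidate answer A", "candidate answer B"  # model samples
        synthetic_prefs.append((variant, judge(GOLD, variant, a, b)))

# A few expert labels seed many preference pairs for reward-model training.
print(len(synthetic_prefs))  # 50
```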
I think some of this is aimed at "in domain" style questions - which are likely to be much closer to the preference labeled prompts.
I think he's making a valuable point there, aimed more at people who expect an LLM's answer to a very regular question to be magic.
going through more of the references - feels like "many such cases" type stuff
02.12.2024 05:11

But - AI Safety would be much improved if the optimization was more around ||clearly communicating real findings|| rather than ||getting a great paper title||
02.12.2024 04:49

I mean - all of this just shows how the OpenAI superalignment team really was asking the right questions - (1) how do we survive with bad labels? (2) how do we give labelers tools to make better labels? (3) how do we help models give answers that labelers can more easily discriminate between?
02.12.2024 04:48

or, the data labelers don't check the 20 line code fragment that the model spit out
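A concrete version of question (2) from the post above: hand labelers a harness that actually executes the fragment before they rate it. A minimal sketch of my own (not any lab's actual tooling):

```python
# Toy labeling harness: run the model's code fragment and show the
# labeler a pass/fail signal, instead of trusting that a 20-line
# snippet "looks right". Purely illustrative.
import os
import subprocess
import sys
import tempfile

def run_fragment(code: str, timeout: float = 5.0) -> tuple[bool, str]:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0, proc.stderr
    except subprocess.TimeoutExpired:
        return False, "timed out"
    finally:
        os.unlink(path)

ok, err = run_fragment("print(sum(range(10)))")
print("labeler sees:", "PASS" if ok else f"FAIL: {err}")
```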
hmmm
And then the finding ends up being something like "our data labelers don't check the references", so we get bad labels :X
uhh, ok
this paper is referenced - it should have examples of reward hacking, and the authors are high quality (Ethan Perez!)
02.12.2024 04:44

like this blog post feels this way to me - very few real examples that I can inspect
lilianweng.github.io/posts/2024-1...