The singularity is nearer-er
@haileystorm.bsky.social
Mother. Ex Controls Engineer. Software dev. AI enthusiast & tinkerer. Please stand by... (Migrating from X??)
21.12.2024 01:54 · **Using o3 image understanding as a key piece in computer control isn't cost-effective though; would want to improve smaller-model perf w/ that (and give o3 screenshots in certain situations).
One more caveat: humans require job training; early AGI will require some too
I don't think o3 is AGI*. But based on the benchmarks and experience with o1, I feel pretty confident a framework w/ either o3 or o4 + another frontier non-CoT LLM + other extant tools could be.
Assumes similar improvement in vision**
*Able, w/ enough compute, to do >50% of computer-based jobs
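To make the cost caveat concrete: the idea sketched in this thread is that a cheap vision model handles most screenshot-understanding steps, and an o3-class model is consulted only when needed. A minimal sketch of that routing; every function, model, and threshold below is a hypothetical placeholder, not anything from the thread:

```python
# Hypothetical sketch of the routing idea: a cheap vision model per step,
# escalating to an expensive reasoning model only when confidence is low.
# All names, models, and thresholds here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class StepResult:
    action: str        # e.g. "click(312, 448)" or "type('hello')"
    confidence: float  # small model's self-reported confidence, 0..1

def small_vision_model(screenshot: bytes, goal: str) -> StepResult:
    """Placeholder for a cheap multimodal model call."""
    raise NotImplementedError

def frontier_reasoner(screenshot: bytes, goal: str, history: list[str]) -> str:
    """Placeholder for an expensive o3-class call, used sparingly."""
    raise NotImplementedError

def control_step(screenshot: bytes, goal: str, history: list[str],
                 escalate_below: float = 0.6) -> str:
    result = small_vision_model(screenshot, goal)
    if result.confidence < escalate_below:
        # Only pay for the big model in hard/ambiguous situations.
        return frontier_reasoner(screenshot, goal, history)
    return result.action
```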
You know what, you're right, and I'm sorry.
I know I've been annoyed by your stances but this was obviously me being pretty dumb and unkind, and I'll delete my comment in a bit (make sure you don't miss this one before I orphan it by deleting too soon)
Ah, well, I just copy-pasted yours :) obviously the replacements for calculating Sigma will be an improvement anyway though.
08.12.2024 18:53 · Definitely worth trying other sizes, but on my machine w/ torch 2.4 (ROCm, 7900 XTX), yep!
08.12.2024 18:50 · Updated the gist with Eugene's o1-pro solution (which is similar to, but not quite as fast as, my solution #2, the fastest for the tensor sizes I tested).
08.12.2024 18:31 · I updated my gist to include your solution (the one visible in the shared chat): gist.github.com/HaileyStorm/...
Looks like it is an improvement, but it's slightly beaten out (at least for my test tensor sizes) by one of the o1 solutions I got... with a lot more effort.
Wow, this was a challenge! With some (OK, a painful hour of) guidance, I was able to get a couple of good solutions from o1 and QwQ. Largely down to improving the calculation of Sigma. Here's a gist with the three solutions, test run times, etc. Roughly 2.9x faster :)
gist.github.com/HaileyStorm/...
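The gist itself isn't reproduced here, but the typical shape of this kind of speedup is replacing a per-element loop with one batched matmul. A minimal sketch assuming Sigma is a covariance-style matrix over feature vectors; the real definition lives in the linked gist, so treat this as illustrative:

```python
# Hedged sketch of the kind of vectorization that typically yields ~3x wins.
# 'Sigma' is assumed to be a covariance-style matrix; see the gist for the
# actual computation being optimized.
import torch

def sigma_loop(x: torch.Tensor) -> torch.Tensor:
    """Naive accumulation of N outer products: (B, N, D) -> (B, D, D)."""
    B, N, D = x.shape
    sigma = torch.zeros(B, D, D, dtype=x.dtype, device=x.device)
    for i in range(N):
        v = x[:, i, :]                          # (B, D)
        sigma += v.unsqueeze(2) * v.unsqueeze(1)
    return sigma / N

def sigma_vectorized(x: torch.Tensor) -> torch.Tensor:
    """One batched matmul instead of N outer products."""
    return x.transpose(1, 2) @ x / x.shape[1]   # (B, D, N) @ (B, N, D)

x = torch.randn(8, 512, 64)
assert torch.allclose(sigma_loop(x), sigma_vectorized(x), atol=1e-5)
```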
Afraid I have to disagree. MMLU is a general-knowledge benchmark, for example, and it disagrees with you, as do my personal vibes (Llama 3.1 8B > Mistral 7B in almost every way).
Fact knowledge ofc has a density limit, as does intelligence, but I don't agree we'd reached either, esp back at Mistral 7B.
You bet! Appreciate your videos :)
08.12.2024 16:24 · I believe they've removed per-message limits, so it's down to context length. Currently 32k tokens for Plus and 128k for Pro.
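A quick way to sanity-check a conversation against those windows is to count tokens locally before sending. A minimal sketch using tiktoken; the o200k_base encoding is an assumption for recent OpenAI models, not something stated in the post:

```python
# Rough token budgeting against a 32k (Plus) or 128k (Pro) context window.
# o200k_base is assumed as the encoding for recent OpenAI models.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def fits_in_context(messages: list[str], window: int = 32_000) -> bool:
    total = sum(len(enc.encode(m)) for m in messages)
    return total <= window

print(fits_in_context(["hello world"] * 1000))           # vs the 32k window
print(fits_in_context(["hello world"] * 1000, 128_000))  # vs the 128k window
```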
08.12.2024 08:07 · I like Kyle Kabasares pretty well for physics & math. @academisfit.bsky.social
08.12.2024 07:55 · DETH, lulz
08.12.2024 03:27 · Sonnet is my go-to, my all-around pick. Especially for most coding problems.
o1-preview, and from what I've seen so far even more so full o1, handles certain challenging tasks Sonnet can't dream of solving.
I use it maybe 10% as much as Sonnet (but 4o would be fine for 85% of what I do with Sonnet).
When 90% of people retrospectively say "that was AGI," that system will still make silly mistakes no human would ever make.
AI intelligence is jagged, and fundamentally different from human intelligence. Don't judge models or predict the future based on silly failure cases.
Will be very interested to see how multimodal o1 handles these
02.12.2024 03:40 · Of course, if something is *really* bothering me I talk to both, and if it's timely, my therapist too (she's available for messages, but I largely stick to talking in person)
29.11.2024 18:15 · It kinda depends. By default Claude, but cgpt for personal but more, er, technical things, like how something might be interpreted, and ofc there's Advanced Voice, which is nice for some things. Also cgpt for non-therapy-type medical stuff.
Claude+o1 for wheel work but that's all code.
There are things I discuss with AI I don't discuss with my therapist ๐
29.11.2024 18:01 · I *know* he's brilliant, but there's not a single person in the AI sphere who rubs me the wrong way more.
29.11.2024 17:59 · I've verified it a little (music generation, expected token-pattern error rate & output quality after context-length increase during training)
26.11.2024 22:48 · I meant wall clock to the same loss, since you have to change your model config anyway
26.11.2024 22:08 · It's definitely slower wall clock. But while important, that's of course not the only metric :)
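For what it's worth, "wall clock to the same loss" just means timing each config until it first reaches a shared target loss, rather than comparing steps per second. A minimal sketch of that measurement; the train_step callable and step cap are illustrative:

```python
# Sketch of "wall clock to same loss": compare configs by elapsed time to a
# target loss, not by per-step speed. The training loop is illustrative.
import time

def time_to_loss(train_step, target_loss: float, max_steps: int = 100_000):
    """train_step() runs one optimization step and returns the current loss."""
    start = time.perf_counter()
    for step in range(max_steps):
        loss = train_step()
        if loss <= target_loss:
            return time.perf_counter() - start, step
    return None  # never reached the target

# Usage: run once per config and compare elapsed times, not steps/sec.
```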
26.11.2024 21:36 · Genuinely awesome
26.11.2024 17:52 · Vs RoPE, I rather like ALiBi.
I suspect its continuous bias would be advantageous in a token-free world too. Though I doubt it's a complete/ideal attention solution there.
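For context: ALiBi drops positional embeddings and instead adds a head-specific linear penalty to attention logits based on query-key distance; that's the "continuous bias" mentioned above, and it's what lets models extrapolate past the training context. A minimal sketch of the bias from the ALiBi paper; head count and shapes here are illustrative:

```python
# Minimal ALiBi sketch: a per-head linear distance penalty added to attention
# logits, in place of positional embeddings. Shapes/head count are illustrative.
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric slopes from the ALiBi paper (power-of-2 head counts):
    # 2^(-8/n), 2^(-16/n), ..., 2^(-8)
    start = 2 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    pos = torch.arange(seq_len)
    # dist[i, j] = j - i: negative (a penalty) for past keys;
    # causal masking handles positions j > i.
    dist = pos[None, :] - pos[:, None]
    return alibi_slopes(n_heads)[:, None, None] * dist  # (heads, q, k)

scores = torch.randn(8, 128, 128)     # (heads, queries, keys) logits
scores = scores + alibi_bias(8, 128)  # penalty grows linearly with distance
```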