The singularity is nearer-er
@haileystorm.bsky.social
Mother. Ex Controls Engineer. Software dev. AI enthusiast & tinkerer. Please stand by... (Migrating from X??)
21.12.2024 01:54 · **Using o3 image understanding as a key piece in computer control isn't cost-effective though; would want to improve smaller-model perf w/ that (and give o3 screenshots in certain situations).
One more caveat: humans require job training; early AGI will require some too
I don't think o3 is AGI*. But based on the benchmarks and experience with o1, I feel pretty confident a framework w/ either o3 or o4 + another frontier non-CoT LLM + other extant tools could be.
Assumes similar improvement in vision**
*Able, w/ enough compute, to do >50% of computer-based jobs
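To make the cost caveat concrete: the idea sketched in this thread is that a cheap vision model handles most screenshot-understanding steps, and an o3-class model is consulted only when needed. A minimal sketch of that routing; every function, model, and threshold below is a hypothetical placeholder, not anything from the thread:

```python
# Hypothetical sketch of the routing idea: a cheap vision model per step,
# escalating to an expensive reasoning model only when confidence is low.
# All names, models, and thresholds here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class StepResult:
    action: str        # e.g. "click(312, 448)" or "type('hello')"
    confidence: float  # small model's self-reported confidence, 0..1

def small_vision_model(screenshot: bytes, goal: str) -> StepResult:
    """Placeholder for a cheap multimodal model call."""
    raise NotImplementedError

def frontier_reasoner(screenshot: bytes, goal: str, history: list[str]) -> str:
    """Placeholder for an expensive o3-class call, used sparingly."""
    raise NotImplementedError

def control_step(screenshot: bytes, goal: str, history: list[str],
                 escalate_below: float = 0.6) -> str:
    result = small_vision_model(screenshot, goal)
    if result.confidence < escalate_below:
        # Only pay for the big model in hard/ambiguous situations.
        return frontier_reasoner(screenshot, goal, history)
    return result.action
```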
You know what, you're right, and I'm sorry.
I know I've been annoyed by your stances but this was obviously me being pretty dumb and unkind, and I'll delete my comment in a bit (make sure you don't miss this one before I orphan it by deleting too soon)
Ah, well, I just copy-pasted yours :) obviously the replacements for calculating Sigma will be an improvement anyway though.
08.12.2024 18:53 · Definitely worth trying other sizes, but on my machine w/ torch 2.4 (ROCm, 7900 XTX), yep!
08.12.2024 18:50 · Updated the gist with Eugene's o1-pro solution (which is similar to, but not quite as fast as, my solution #2, the fastest for the tensor sizes I tested).
08.12.2024 18:31 · I updated my gist to include your solution (the one visible in the shared chat): gist.github.com/HaileyStorm/...
Looks like it is an improvement, but it's slightly beaten out (at least for my test tensor sizes) by one of the o1 solutions I got... with a lot more effort.
Wow, this was a challenge! With some (OK, a painful hour of) guidance, I was able to get a couple of good solutions from o1 and QwQ. Largely down to improving the calculation of Sigma. Here's a gist with the three solutions, test run times, etc. Roughly 2.9x faster :)
gist.github.com/HaileyStorm/...
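The gist itself isn't reproduced here, but the typical shape of this kind of speedup is replacing a per-element loop with one batched matmul. A minimal sketch assuming Sigma is a covariance-style matrix over feature vectors; the real definition lives in the linked gist, so treat this as illustrative:

```python
# Hedged sketch of the kind of vectorization that typically yields ~3x wins.
# 'Sigma' is assumed to be a covariance-style matrix; see the gist for the
# actual computation being optimized.
import torch

def sigma_loop(x: torch.Tensor) -> torch.Tensor:
    """Naive accumulation of N outer products: (B, N, D) -> (B, D, D)."""
    B, N, D = x.shape
    sigma = torch.zeros(B, D, D, dtype=x.dtype, device=x.device)
    for i in range(N):
        v = x[:, i, :]                          # (B, D)
        sigma += v.unsqueeze(2) * v.unsqueeze(1)
    return sigma / N

def sigma_vectorized(x: torch.Tensor) -> torch.Tensor:
    """One batched matmul instead of N outer products."""
    return x.transpose(1, 2) @ x / x.shape[1]   # (B, D, N) @ (B, N, D)

x = torch.randn(8, 512, 64)
assert torch.allclose(sigma_loop(x), sigma_vectorized(x), atol=1e-5)
```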
Afraid I have to disagree. MMLU is a general-knowledge benchmark, for example, and it disagrees with you, as do my personal vibes (Llama 3.1 8B > Mistral 7B in almost every way).
Fact knowledge ofc has a density limit, as does intelligence, but I don't agree we'd reached either, esp back at Mistral 7B.
You bet! Appreciate your videos :)
08.12.2024 16:24 · I believe they've removed per-message limits, so it's down to context length. Currently 32k tokens for Plus and 128k for Pro.
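A quick way to sanity-check a conversation against those windows is to count tokens locally before sending. A minimal sketch using tiktoken; the o200k_base encoding is an assumption for recent OpenAI models, not something stated in the post:

```python
# Rough token budgeting against a 32k (Plus) or 128k (Pro) context window.
# o200k_base is assumed as the encoding for recent OpenAI models.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

def fits_in_context(messages: list[str], window: int = 32_000) -> bool:
    total = sum(len(enc.encode(m)) for m in messages)
    return total <= window

print(fits_in_context(["hello world"] * 1000))           # vs the 32k window
print(fits_in_context(["hello world"] * 1000, 128_000))  # vs the 128k window
```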
08.12.2024 08:07 · I like Kyle Kabasares pretty well for physics & math. @academisfit.bsky.social
08.12.2024 07:55 · DETH, lulz
08.12.2024 03:27 · Sonnet is my go-to, my all-around pick. Especially for most coding problems.
o1-preview, and from what I've seen so far even more so full o1, handles certain challenging tasks Sonnet can't dream of solving.
I use it maybe 10% as much as Sonnet (but 4o would be fine for 85% of what I do with Sonnet).
When 90% of people retrospectively say "that was AGI," that system will still make silly mistakes no human would ever make.
AI intelligence is jagged, and fundamentally different from human intelligence. Don't judge models or predict the future based on silly failure cases.
Will be very interested to see how multimodal o1 handles these
02.12.2024 03:40 · Of course, if something is *really* bothering me I talk to both, and if it's timely, my therapist too (she's available for messages, but I largely stick to talking in person)
29.11.2024 18:15 · It kinda depends. By default Claude, but cgpt for personal but more, er, technical things, like how something might be interpreted, and ofc there's Advanced Voice, which is nice for some things. Also cgpt for non-therapy-type medical stuff.
Claude+o1 for wheel work but that's all code.
There are things I discuss with AI I don't discuss with my therapist ๐
29.11.2024 18:01 · I *know* he's brilliant, but there's not a single person in the AI sphere who rubs me the wrong way more.
29.11.2024 17:59 · I've verified it a little (music generation, expected token-pattern error rate & output quality after context-length increase during training)
26.11.2024 22:48 · I meant wall clock to the same loss, since you have to change your model config anyway
26.11.2024 22:08 · It's definitely slower wall clock. But while important, that's of course not the only metric :)
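For what it's worth, "wall clock to the same loss" just means timing each config until it first reaches a shared target loss, rather than comparing steps per second. A minimal sketch of that measurement; the train_step callable and step cap are illustrative:

```python
# Sketch of "wall clock to same loss": compare configs by elapsed time to a
# target loss, not by per-step speed. The training loop is illustrative.
import time

def time_to_loss(train_step, target_loss: float, max_steps: int = 100_000):
    """train_step() runs one optimization step and returns the current loss."""
    start = time.perf_counter()
    for step in range(max_steps):
        loss = train_step()
        if loss <= target_loss:
            return time.perf_counter() - start, step
    return None  # never reached the target

# Usage: run once per config and compare elapsed times, not steps/sec.
```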
26.11.2024 21:36 · Genuinely awesome
26.11.2024 17:52 · Vs RoPE, I rather like ALiBi.
I suspect its continuous bias would be advantageous in a token-free world too. Though I doubt it's a complete/ideal attention solution there.
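For context: ALiBi drops positional embeddings and instead adds a head-specific linear penalty to attention logits based on query-key distance; that's the "continuous bias" mentioned above, and it's what lets models extrapolate past the training context. A minimal sketch of the bias from the ALiBi paper; head count and shapes here are illustrative:

```python
# Minimal ALiBi sketch: a per-head linear distance penalty added to attention
# logits, in place of positional embeddings. Shapes/head count are illustrative.
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric slopes from the ALiBi paper (power-of-2 head counts):
    # 2^(-8/n), 2^(-16/n), ..., 2^(-8)
    start = 2 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    pos = torch.arange(seq_len)
    # dist[i, j] = j - i: negative (a penalty) for past keys;
    # causal masking handles positions j > i.
    dist = pos[None, :] - pos[:, None]
    return alibi_slopes(n_heads)[:, None, None] * dist  # (heads, q, k)

scores = torch.randn(8, 128, 128)     # (heads, queries, keys) logits
scores = scores + alibi_bias(8, 128)  # penalty grows linearly with distance
```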