
@leventov.bsky.social

7 Followers  |  52 Following  |  28 Posts  |  Joined: 28.11.2024

Latest posts by leventov.bsky.social on Bluesky

X. It's what's happening. From breaking news and entertainment to sports and politics, get the full story with all the live commentary.

The reason for my skepticism is that I'm not sure xAI would give away Grok 3 and push it on x.com so aggressively if it cost an arm and a leg to run, as gpt-4.5 pricing indicates

01.03.2025 17:31 — 👍 0    🔁 0    💬 0    📌 0

I know there's no official info, of course. I'm following these rumors pretty closely, too. The compute FLOPs they have had could have been achieved on a ~2T model, no? I think Elon said they also used a ton of synthetically generated data, and many rollouts to find good solutions for RL

01.03.2025 17:26 — 👍 0    🔁 0    💬 1    📌 0
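The back-of-the-envelope behind "the compute FLOPs could have been achieved on a ~2T model" can be sketched with the standard C ≈ 6·N·D approximation for dense-transformer training compute. The parameter and token counts below are illustrative assumptions, not reported Grok 3 figures.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Common rule of thumb: total training compute C ~= 6 * N * D
    for a dense transformer with N parameters trained on D tokens."""
    return 6 * n_params * n_tokens

# Illustrative only: a ~2T-parameter model on ~20T tokens (assumed numbers)
# comes out on the order of 2.4e26 FLOPs.
print(f"{training_flops(2e12, 20e12):.1e}")  # 2.4e+26
```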

Source that Grok 3 is 10T? I'm very skeptical of that. Maybe they scaled training data substantially but parameters not *that* much.

01.03.2025 05:42 — 👍 1    🔁 0    💬 1    📌 0

Perhaps this laziness is an intentional nudge towards using reasoning models (which are not yet available, though - I mean reasoners based on 4.5)

27.02.2025 20:54 — 👍 0    🔁 0    💬 0    📌 0

We need an uncertainty knob similar to temperature

25.02.2025 14:30 — 👍 0    🔁 0    💬 1    📌 0
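For reference, the temperature knob the analogy leans on is a scalar that rescales logits before the softmax; the proposed "uncertainty knob" would be an analogous API-level scalar, but controlling expressed epistemic uncertainty rather than sampling randomness. A minimal sketch of the existing mechanism (assumed, illustrative):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Temperature-scaled softmax sampling: T -> 0 approaches greedy
    decoding, large T flattens the distribution toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the categorical distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1
```

At very low temperature the dominant logit is picked essentially always, which is the "knob turned to deterministic" end of the dial.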

They promised to open source prev gen after releasing the next. So we will know

21.02.2025 03:54 — 👍 0    🔁 0    💬 0    📌 0

And then if anything goes wrong or unexpected during the "wet" phase, a clueless wannabe would want to pull up a VLM with a camera and ask the model for "debug instructions". You can't do that "on Google".

06.02.2025 20:04 — 👍 0    🔁 0    💬 0    📌 0

You cannot get precise and detailed instructions for making a bomb or poison or other very dangerous stuff from things you can buy legitimately in just "a few clicks on Google". At a very minimum it's days of research, including how to gaslight vendors, how to prepare things, etc.

06.02.2025 20:02 — 👍 0    🔁 0    💬 1    📌 0

Analyzing the ethics and risks of autonomous agents is crucial. Thank you for your insightful work @mmitchell.bsky.social @evijit.io @sashamtl.bsky.social @giadapistilli.com

06.02.2025 15:58 — 👍 62    🔁 11    💬 3    📌 0

Bad take. Censorship of "recipes for ruin" is good. A blanket deontological rule "censorship is bad" doesn't work.

06.02.2025 19:41 — 👍 0    🔁 0    💬 1    📌 0


I guess in AI agent(cy) engineering, the equivalent transition will be towards method design and decomposition: dialogue? multi-role debate? argument tree? the data model the model is operating on top of? reward design for RL post-training/fine-tuning?

05.02.2025 01:07 — 👍 2    🔁 1    💬 1    📌 0

With DL and end-to-end training in CV, loss design became a more important skill than heuristic bricolage

05.02.2025 01:01 — 👍 1    🔁 0    💬 1    📌 0
Region Adjacency Graphs (RAGs) — skimage 0.25.1 documentation

(not my take) the current AI/agent(cy) engineering is much like pre-DL computer vision, when people tried to massage the problem around a few fairly rigid algos like SIFT. There was also a RAG back then: scikit-image.org/docs/stable/... and it also didn't work very well

05.02.2025 00:59 — 👍 2    🔁 1    💬 1    📌 0

Wrong, this is still a liberal hysteria

03.02.2025 20:49 — 👍 3    🔁 0    💬 0    📌 0

I have yet to regret shooting a request to undermind. It always finds something interesting. My requests are always of the form "who has done research in roughly this shape" (where I'm sure that someone did, but it's hard to find via Scholar)

03.02.2025 13:31 — 👍 1    🔁 0    💬 1    📌 0
You.com | AI for workplace productivity: Artificial intelligence designed for collaboration - with AI Agents that can research, solve problems, and create content for you and your team.

FWIW in my impression, none of the services in this category (Perplexity, You.com, etc.) live up to the "deep" label except undermind.ai so far. Didn't try PaperQA though.

03.02.2025 06:37 — 👍 1    🔁 0    💬 2    📌 0

Google's Deep Research is a total flop. I paid for a subscription to try it. Tried it on maybe 10 requests across a very broad range, from technical to cultural to philosophical. It spit out bland, often outright wrong slop every. single. time. Idk why you keep praising it

03.02.2025 06:35 — 👍 1    🔁 0    💬 0    📌 0

Zero information. Consistently candid Sama will say whatever the specific audience likes. I'm sure in different rooms he says the opposite of this

01.02.2025 16:10 — 👍 0    🔁 0    💬 0    📌 0
Exa: The Exa API retrieves the best, realtime data from the web to complement your AI

Claude is very hit-or-miss for perplexity-like questions. Same for everything else: ChatGPT, Gemini, exa.ai, You.com.

Meta search with all of them may be helpful, even if not fully automatable yet: if LLMs knew good answers to these searches they would not be so hit-or-miss to begin with.

20.01.2025 18:45 — 👍 0    🔁 0    💬 0    📌 0
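A minimal sketch of the meta-search idea: fan one query out to several backends in parallel and merge deduplicated results in provider order. The provider callables below are stand-ins; wrappers for the real services (Claude, ChatGPT, exa.ai, You.com) are assumed, not shown.

```python
from concurrent.futures import ThreadPoolExecutor

def meta_search(query, providers):
    """Fan a single query out to several search backends in parallel
    and merge deduplicated results. Each provider is any callable
    query -> list[str]; real service clients would be wrapped to fit."""
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        result_lists = list(pool.map(lambda p: p(query), providers))
    seen, merged = set(), []
    for results in result_lists:
        for r in results:
            if r not in seen:  # keep first occurrence, drop duplicates
                seen.add(r)
                merged.append(r)
    return merged
```

Usage with toy providers: `meta_search("q", [lambda q: ["a", "b"], lambda q: ["b", "c"]])` merges to `["a", "b", "c"]`.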

Maybe no hard distinction. It's a continuum between Sonnet and "reasoning" models.

08.01.2025 05:32 — 👍 0    🔁 0    💬 0    📌 0

Good design. "Native" tool calling makes LLM APIs much more complex than they should be. Using just a single "tool" - code execution - to rule them all is better.

01.01.2025 05:20 — 👍 1    🔁 0    💬 0    📌 0
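The "single tool" idea in miniature, as an assumed sketch rather than any real API: instead of N native tool schemas, the model emits code, the harness executes it, and the captured output goes back into the context. A real system would sandbox the execution; this sketch does not.

```python
import contextlib
import io

def run_tool_code(code: str) -> str:
    """Execute model-emitted Python and return its captured stdout.
    This is the whole tool-calling surface in the 'one tool' design:
    a single code-execution channel instead of per-tool schemas.
    WARNING: exec() here is unsandboxed; a real harness must isolate it."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # empty globals as minimal hygiene, not security
    return buf.getvalue()
```

Usage: `run_tool_code("print(2 + 2)")` returns `"4\n"`, which the harness would feed back to the model as the tool result.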

Does the book argue for expected utility/value based decision-making? "Radical Uncertainty" by @profjohnkay.bsky.social directly argues against that

26.12.2024 10:03 — 👍 1    🔁 0    💬 1    📌 0

I think o1/o3 should be better at this (I don't use them at the moment), but breaking the flow and waiting would be weird. An o1-capable coder with access to the context that constantly does some analysis in the background and makes insightful suggestions for me from time to time would be best

24.12.2024 05:43 — 👍 0    🔁 0    💬 1    📌 0

Similarly, when writing a somewhat long function that includes 2-3 copies of similar but not identical logic (e.g., loop bodies), LLMs are never capable of factoring those out to shorten the function overall.

24.12.2024 05:43 — 👍 0    🔁 0    💬 1    📌 0

I find it annoying that LLMs often tend to write their own functions for doing something instead of using the standard library or the "utils" that I've already created in my project.

24.12.2024 05:43 — 👍 0    🔁 0    💬 1    📌 0

Presumably, people on Mechanical Turk got 75%. However, I would argue that people on Mechanical Turk are self-selected for something like openness to new tasks and problems.

21.12.2024 06:43 — 👍 3    🔁 0    💬 0    📌 0

Ethan, do you cherry-pick the stuff you post on twitter/bsky? How many experiments do you do that never make it to your twitter in which none of the AIs do anything remarkable or badly misunderstand your intent?

16.12.2024 05:38 — 👍 1    🔁 0    💬 1    📌 0

Google just shadow banned my account, returning bullshit 403 to all requests. At least Anthropic and OpenAI don't have this BS

12.12.2024 13:54 — 👍 1    🔁 0    💬 1    📌 0

@leventov is following 19 prominent accounts