
Deedy

@deedydas.bsky.social

VC at Menlo Ventures. Formerly founding team Glean, Google Search. Cornell CS. Tweets about tech, immigration, India, fitness and search.

799 Followers  |  18 Following  |  274 Posts  |  Joined: 14.11.2024  |  ~2.06 posts/day

Latest posts by deedydas.bsky.social on Bluesky

Try it here: yiyan.baidu.com

16.03.2025 17:17 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image Post image

Baidu, the Google of China, just dropped two models today:
— ERNIE 4.5: beats GPT 4.5 for 1% of price
— Reasoning model X1: beats DeepSeek R1 for 50% of price.

China continues to build intelligence too cheap to meter. The AI price war is on.

16.03.2025 17:17 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

"Make it look like I was on a luxury five star hotel vacation"

Google Gemini really cooked with this one.

This is next gen photo editing.

14.03.2025 02:24 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

WOW the new Google Flash model is the first time ever that you can do targeted edits of pictures with English.

"Make the steak vegetarian"
"Make the bridge go away"
"Make the keyboard more colorful"

And my favorite
"Give the OpenAI logo more personality"

13.03.2025 06:06 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Preview
AI tools are spotting errors in research papers: inside a growing movement Nature - Study that hyped the toxicity of black plastic utensils inspires projects that use large language models to check papers.

Source: www.nature.com/articles/d4...

09.03.2025 03:40 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image

AI is now making cutting edge science better.

The Nature article reports that reasoning LLMs found errors in 1% of the 10,000 research papers analyzed, with a 35% false-positive rate, at $0.15-1/paper.

Anthropic founder's vision of "a country of geniuses in a data center" is happening.
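For a rough sense of the economics, here is how those quoted numbers combine. This is only a back-of-envelope sketch in Python, and it assumes the 1% flag rate and the 35% false-positive rate come from the same 10,000-paper run:

```python
# Back-of-envelope on the figures quoted above (assumption: the 1% flag rate
# and the 35% false-positive rate apply to the same 10,000-paper run).
papers = 10_000
flagged = int(papers * 0.01)        # ~100 papers flagged as containing errors
true_hits = flagged * (1 - 0.35)    # ~65 of those flags likely to be real

for cost_per_paper in (0.15, 1.00):
    total = papers * cost_per_paper
    print(f"${cost_per_paper:.2f}/paper -> ${total:,.0f} total, "
          f"~${total / true_hits:.0f} per real error surfaced")
```

So even at the high end, a genuine error surfaces for on the order of $150.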

09.03.2025 03:40 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 2    πŸ“Œ 0

Source: arxiv.org/pdf/2503.00735

07.03.2025 17:02 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image

HUGE: A new research paper shows how a 7B-param AI model (90%) can beat OpenAI o1 (80%) on the MIT Integration Bee.

LADDER:
— Generate variants of problem
— Solve, verify, use GRPO (DeepSeek) to learn
TTRL:
— Do 1 & 2 when you see a new problem

New form of test time compute scaling!
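Roughly, the loop looks like this. A minimal Python sketch of the idea as the post describes it, where `model`, `generate_variants`, `verify` and `grpo_update` are hypothetical stand-ins rather than the paper's actual code:

```python
# Sketch of the LADDER / TTRL idea described above (hypothetical helpers,
# not the paper's implementation).

def ladder_train(model, problems, n_variants=8, n_samples=4):
    """Learn from self-generated, verifiable variants of each problem."""
    for problem in problems:
        # 1. Generate progressively easier variants of the problem.
        for variant in generate_variants(model, problem, n=n_variants):
            # 2. Sample solutions and verify them (e.g. numerically for
            #    integrals), then reinforce with GRPO on verified reward.
            attempts = [model.solve(variant) for _ in range(n_samples)]
            rewards = [1.0 if verify(variant, a) else 0.0 for a in attempts]
            grpo_update(model, variant, attempts, rewards)
    return model

def ttrl_solve(model, new_problem):
    """Test-Time RL: run the same loop on variants of the unseen problem,
    then answer the original."""
    ladder_train(model, [new_problem])
    return model.solve(new_problem)
```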

07.03.2025 17:02 β€” πŸ‘ 5    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

There are two categories:
— Daytona, for general-purpose sort. The numbers above are Daytona.
— Indy, which is specific to 100-byte records with 10-byte keys. Not super useful in practice, though.

Link: sortbenchmark.org/
Google experiments on it: sortbenchmark.org/

03.03.2025 03:30 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image

How well can computers sort 1 trillion numbers?

SortBenchmark measures this for distributed systems.
— How fast? 134 s
— How cheap? $97
— How many in 1 minute? 370B numbers
— How much energy? ~59 kJ, or walking for 15 mins

Every software engineer should know this.
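Put in per-record terms, the figures look like this. A quick sketch that assumes "1 trillion numbers" means 10^12 records and that each quoted number is a separate record in its own category:

```python
# Per-record view of the figures above. Assumptions: "1 trillion numbers"
# = 1e12 records, and each quoted figure is a separate category record.
RECORDS = 1e12

fastest_s = 134
print(f"Speed record: ~{RECORDS / fastest_s / 1e9:.1f}B records/s")       # ~7.5B/s

cheapest_usd = 97
print(f"Cost record: ~${cheapest_usd / (RECORDS / 1e9):.3f} per billion records")

per_minute = 370e9
print(f"MinuteSort pace: ~{per_minute / 60 / 1e9:.1f}B records/s")        # ~6.2B/s

energy_j = 59e3
print(f"Energy record: ~{energy_j / (RECORDS / 1e9):.0f} J per billion records")
```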

03.03.2025 03:30 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
open-infra-index/202502OpenSourceWeek/day_6_one_more_thing_deepseekV3R1_inference_system_overview.md at main Β· deepseek-ai/open-infra-index Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation - deepseek-ai/open-infra-index

Source: github.com/deepseek-ai...

01.03.2025 05:07 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image

BREAKING: DeepSeek just let the world know they make $200M/yr at a 500%+ profit margin.

Revenue (/day): $562k
Cost (/day): $87k
Revenue (/yr): ~$205M

This is all while charging $2.19/M tokens on R1, ~25x cheaper than OpenAI o1.

If this were in the US, it would be a >$10B company.
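The annualization and the margin follow directly from the per-day figures. A rough check, assuming those per-day numbers hold year-round (which is the post's framing, not a guarantee):

```python
# Sanity check on the DeepSeek figures quoted above.
revenue_per_day = 562_000
cost_per_day = 87_000

annual_revenue = revenue_per_day * 365
margin_on_cost = (revenue_per_day - cost_per_day) / cost_per_day

print(f"Annualized revenue: ~${annual_revenue / 1e6:.0f}M")   # ~$205M
print(f"Profit margin on cost: ~{margin_on_cost:.0%}")        # ~546%
```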

01.03.2025 05:07 β€” πŸ‘ 9    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
Claude Artifact Try out Artifacts created by Claude users

Source: claude.site/artifacts/3...

26.02.2025 02:34 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image Post image

Claude's new GitHub "talk to your code" integration changes how engineers understand software.

Fork a repo.
Select a folder.
Ask it anything.
It even shows you what percentage of the context window each folder takes.


Here it visualizes yt-dlp's (YouTube downloader) flow:
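If you want a rough feel for that context-window percentage on your own checkout, a crude local estimate looks something like this. It assumes ~4 characters per token and a 200k-token window; Claude's actual accounting will differ:

```python
# Rough per-folder share of an assumed 200k-token context window.
from collections import Counter
from pathlib import Path

CONTEXT_WINDOW = 200_000   # assumed token budget
CHARS_PER_TOKEN = 4        # crude heuristic

def folder_token_share(repo_root: str) -> Counter:
    shares = Counter()
    root = Path(repo_root)
    for path in root.rglob("*"):
        if path.is_file():
            try:
                chars = len(path.read_text(errors="ignore"))
            except OSError:
                continue
            top = path.relative_to(root).parts[0]
            shares[top] += chars / CHARS_PER_TOKEN
    return shares

if __name__ == "__main__":
    for folder, tokens in folder_token_share(".").most_common():
        print(f"{folder}: {tokens / CONTEXT_WINDOW:.1%} of the window")
```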

26.02.2025 02:34 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
LLM Evolution and Comparison The 25 Most Important Large Language Models: A Comparative Analysis Large language models (LLMs) have rapidly advanced in recent years, transforming the field of artificial intelligence and natural language processing. This report presents a comparative analysis of the 25 most important LLMs, con...

Perplexity: www.perplexity.ai/search/read...
OpenAI: chatgpt.com/share/67a41...
Gemini: docs.google.com/document/d/...

15.02.2025 02:46 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image

I asked all 3 Deep Researches to "compare 25 LLMs in a table on 20 axes" to figure out which one was the best.

The winner was OpenAI.

It had the most detailed, high-quality and accurate answer, but you do pay $200/mo for it.

15.02.2025 02:46 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Source: academics.hamilton.edu/documents/t...

14.02.2025 02:47 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image

"The Mundanity of Excellence" [1989] is a timeless essay everyone ought to read in todays day and age.

Excellence is boring. It's making the same boring "correct" choice over and over again. You win by being consistent for longer.

Our short attention spans tend to forget that.

14.02.2025 02:47 β€” πŸ‘ 1    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Source: arxiv.org/pdf/2502.06807

(Check out the detailed code submissions and scoring in the appendix)

12.02.2025 16:30 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image

HUGE: OpenAI o3 scores 394 of 600 at the International Olympiad in Informatics (IOI) 2024, earning a Gold medal and placing 18th in the world.

The model was NOT contaminated with this data, and the 50-submission limit was respected.

We will likely see superhuman coding models this year.

12.02.2025 16:30 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

bbycroft.net/llm

12.02.2025 03:06 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Post image

Everyone should be using this website to understand the inside of an LLM.

I'm surprised more people don't know about it. Brendan Bycroft made this beautiful interactive visualization that shows exactly how each of an LLM's weights is used.

Here's a link:

12.02.2025 03:06 β€” πŸ‘ 5    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Preview
abazabaaaa's comment on "Deep Research is hands down the best research tool I've used—anyone else making the switch?" Explore this conversation and more from the ChatGPTPro community

Source: chatgpt.com/share/67a40...
Full step by step: www.reddit.com/r/ChatGPTPr...

11.02.2025 05:23 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image

New research shows that LLMs don't perform well on long context.

Perfect needle-in-the-haystack scores are easy: attention mechanisms can match the word. When you require 1 hop of reasoning, performance degrades quickly.

This is why guaranteeing correctness for agents is hard.
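To make the distinction concrete, here is a toy way to build both kinds of test prompt. Purely illustrative, not the paper's benchmark, and the facts and names are made up:

```python
# Toy contrast: literal needle-in-a-haystack vs a 1-hop variant where two
# facts buried far apart must be chained to answer.
import random

FILLER = "The sky was grey over the harbour that morning. "

def build_prompt(one_hop: bool, filler_sentences: int = 2_000) -> str:
    haystack = [FILLER] * filler_sentences
    if one_hop:
        # Two facts to join: alias -> person -> city.
        haystack.insert(random.randrange(len(haystack)),
                        "The engineer known as 'Kestrel' is Maria Ortiz. ")
        haystack.insert(random.randrange(len(haystack)),
                        "Maria Ortiz lives in Tallinn. ")
        question = "Which city does the engineer known as 'Kestrel' live in?"
    else:
        haystack.insert(random.randrange(len(haystack)),
                        "The magic number is 7421. ")
        question = "What is the magic number?"
    return "".join(haystack) + "\n\nQuestion: " + question

print(build_prompt(one_hop=True)[-120:])   # peek at the end of the prompt
```

The zero-hop version can be solved by surface matching on "magic number"; the 1-hop version cannot, which mirrors the degradation the post describes.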

10.02.2025 16:53 β€” πŸ‘ 4    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0
Dialogue at UTokyo GlobE #14: Mr. Sam Altman and Mr. Kevin Weil (CEO and CPO of Open AI)
On Monday, February 3, 2025, Dialogue at UTokyo GlobE #14 held an event with Mr. Sam Altman (CEO of OpenAI) and their CPO (Chief Product Officer), Mr. Kevin ...

Source: www.youtube.com/watch?v=8Lm...

09.02.2025 02:34 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image

Internal OpenAI models have improved to ~#50 in the world (3045 on Codeforces) and will hit #1 by the end of the year, Sam Altman said yesterday in Japan. That's up from o3's 2727 (#175)!

This is a monumental result en route to AGI.

09.02.2025 02:34 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 1    πŸ“Œ 0

Source: www.sergey.fyi/articles/ge...

06.02.2025 17:37 β€” πŸ‘ 0    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0
Post image Post image

PDF parsing is pretty much solved at scale now.

Gemini 2 Flash's $0.40/M output tokens and 1M-token context mean you can now parse roughly 6,000 PDF pages at near-perfect quality for $1.
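A back-of-envelope on that number, as a sketch only; the tokens-per-page figures are assumptions and image-heavy documents will cost more:

```python
# Rough pages-per-dollar estimate at the Gemini 2 Flash list prices quoted
# further down this page ($0.10/M input, $0.40/M output). Tokens-per-page
# figures are assumptions.
IN_PRICE = 0.10 / 1e6    # $ per input token
OUT_PRICE = 0.40 / 1e6   # $ per output token

def pages_per_dollar(in_tok_per_page=250, out_tok_per_page=400):
    cost_per_page = in_tok_per_page * IN_PRICE + out_tok_per_page * OUT_PRICE
    return 1 / cost_per_page

print(f"~{pages_per_dollar():,.0f} pages per dollar")   # ~5,400 on these assumptions
```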

06.02.2025 17:37 β€” πŸ‘ 2    πŸ” 0    πŸ’¬ 2    πŸ“Œ 0
Post image

Who has better Deep Research, Google or OpenAI?

Deep Research generates ~10-page reports in ~15 mins by scouring hundreds of websites. This could replace a lot of human work. I tried both so you don't have to.

The verdict: OpenAI is faster and better quality despite being more $$

06.02.2025 02:44 β€” πŸ‘ 5    πŸ” 0    πŸ’¬ 3    πŸ“Œ 0
Post image

Gemini's just launched their new Flash models. They are cheaper, better and have 8x the context of GPT 4o-mini!

Per million tokens (cached input / input / output):
Gemini 2 Flash Lite: $0.01875, $0.075, $0.30
Gemini 2 Flash: $0.025, $0.10, $0.40
GPT 4o-mini: $0.075, $0.15, $0.60
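For a concrete comparison, here is what a sample monthly workload would cost at those list prices. Illustrative only: the cached-input rate is ignored and the workload numbers are made up:

```python
# Cost of a hypothetical monthly workload at the prices listed above
# ($ per million input tokens, $ per million output tokens).
PRICES = {
    "Gemini 2 Flash Lite": (0.075, 0.30),
    "Gemini 2 Flash":      (0.10, 0.40),
    "GPT 4o-mini":         (0.15, 0.60),
}

def workload_cost(in_tokens, out_tokens, prices=PRICES):
    return {m: in_tokens / 1e6 * p_in + out_tokens / 1e6 * p_out
            for m, (p_in, p_out) in prices.items()}

# e.g. 50M input tokens and 10M output tokens per month:
for model, cost in workload_cost(50e6, 10e6).items():
    print(f"{model}: ${cost:.2f}/month")
```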

05.02.2025 16:39 β€” πŸ‘ 3    πŸ” 0    πŸ’¬ 0    πŸ“Œ 0

@deedydas is following 18 prominent accounts