@deedydas.bsky.social
VC at Menlo Ventures. Formerly founding team Glean, Google Search. Cornell CS. Tweets about tech, immigration, India, fitness and search.
Try it here: yiyan.baidu.com
16.03.2025 17:17
Baidu, the Google of China, just dropped two models today:
→ ERNIE 4.5: beats GPT 4.5 for 1% of price
→ Reasoning model X1: beats DeepSeek R1 for 50% of price
China continues to build intelligence too cheap to meter. The AI price war is on.
"Make it look like I was on a luxury five star hotel vacation"
Google Gemini really cooked with this one.
This is next gen photo editing.
WOW the new Google Flash model is the first time ever that you can do targeted edits of pictures with English.
"Make the steak vegetarian"
"Make the bridge go away"
"Make the keyboard more colorful"
And my favorite
"Give the OpenAI logo more personality"
Source: www.nature.com/articles/d4...
09.03.2025 03:40
AI is now making cutting edge science better.
Nature reports that reasoning LLMs found errors in 1% of the 10,000 research papers analyzed, with a 35% false-positive rate, at a cost of $0.15-$1 per paper.
Anthropic founder's view of "a country of geniuses in a data center" is happening.
Source: arxiv.org/pdf/2503.00735
07.03.2025 17:02
HUGE: New research paper shows how a 7B param AI model (90%) can beat OpenAI o1 (80%) on the MIT Integration Bee.
LADDER:
→ Generate variants of problem
→ Solve, verify, use GRPO (DeepSeek) to learn
TTRL:
→ Do steps 1 & 2 when you see a new problem
New form of test time compute scaling!
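A minimal sketch of that recipe as described above, not the paper's code: the variant generator, sampler, and verifier below are placeholder stubs standing in for LLM calls and a numeric integral checker, while the group-relative advantage is the GRPO piece.

```python
import random
from statistics import mean, pstdev

# Placeholder stubs (hypothetical): in the paper these would be LLM calls plus
# a numeric checker that verifies a candidate integral by differentiating it.
def generate_variants(problem, n):
    return [f"{problem} [easier variant {i}]" for i in range(n)]

def sample_solutions(problem, k):
    return [f"candidate solution {i}" for i in range(k)]

def verify(problem, solution):
    return random.random() < 0.5          # stand-in for a real pass/fail check

def grpo_advantages(rewards):
    # GRPO-style group-relative advantage: score each sample against its own group.
    mu, sigma = mean(rewards), pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

def ladder_step(hard_problem, n_variants=4, k_samples=4):
    for variant in generate_variants(hard_problem, n_variants):           # 1. easier variants
        solutions = sample_solutions(variant, k_samples)                   # 2. solve ...
        rewards = [1.0 if verify(variant, s) else 0.0 for s in solutions]  # ... and verify
        advantages = grpo_advantages(rewards)                              # 3. reinforce winners
        print(variant, advantages)   # a real run would do a policy-gradient update here

ladder_step("integrate x * exp(x) dx")   # TTRL = run this same loop on each new test problem
```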
There are two categories:
→ Daytona, for general-purpose sorting. The headline numbers below are Daytona.
→ Indy, which is specific to 100-byte records with 10-byte keys (see the sketch below).
Not super useful in practice though.
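To show the record format the Indy category fixes, here is a toy in-memory illustration of mine; real benchmark entries are distributed external sorts, not a one-machine list sort.

```python
import os

RECORD = 100   # fixed record size in bytes (Indy category)
KEY = 10       # first 10 bytes of each record are the sort key

def indy_sort(blob: bytes) -> bytes:
    # Split into fixed-width records and sort by the 10-byte key prefix.
    records = [blob[i:i + RECORD] for i in range(0, len(blob), RECORD)]
    records.sort(key=lambda r: r[:KEY])
    return b"".join(records)

data = os.urandom(RECORD * 1000)          # 1,000 random 100-byte records
out = indy_sort(data)
assert out[:KEY] <= out[RECORD:RECORD + KEY]   # keys now come out in order
```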
Link: sortbenchmark.org/
Google experiments on it: sortbenchmark.org/
How well can computers sort 1 trillion numbers?
SortBenchmark, a benchmark for distributed systems, measures this.
→ How fast? 134s
→ How cheap? $97
→ How many in 1 minute? 370B numbers
→ How much energy? ~59kJ, or walking for 15 mins
Every software engineer should know this.
Source: github.com/deepseek-ai...
01.03.2025 05:07
BREAKING: DeepSeek just let the world know they make $200M/yr at a 500%+ profit margin.
Revenue (/day): $562k
Cost (/day): $87k
Revenue (/yr): ~$205M
This is all while charging $2.19/M tokens on R1, ~25x less than OpenAI o1.
If this was in the US, this would be a >$10B company.
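The yearly figure and the margin follow directly from the daily numbers (my quick arithmetic check, not DeepSeek's accounting):

```python
revenue_per_day, cost_per_day = 562_000, 87_000
print(revenue_per_day * 365 / 1e6)                        # ~205 -> ~$205M/yr
print((revenue_per_day - cost_per_day) / cost_per_day)    # ~5.46 -> the "500%+" margin
```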
Source: claude.site/artifacts/3...
26.02.2025 02:34
Claude's new GitHub "talk to your code" integration changes how engineers understand software.
Fork a repo.
Select a folder.
Ask it anything.
It even shows you what percentage of the context window each folder takes.
Here it visualizes yt-dlp's (the YouTube downloader) flow:
Perplexity: www.perplexity.ai/search/read...
OpenAI: chatgpt.com/share/67a41...
Gemini: docs.google.com/document/d/...
I asked all 3 Deep Researches to "compare 25 LLMs in a table on 20 axes" to figure out which one was the best.
The winner was OpenAI.
It had the most detailed, high-quality and accurate answer, but you do pay $200/mo for it.
Source: academics.hamilton.edu/documents/t...
14.02.2025 02:47
"The Mundanity of Excellence" [1989] is a timeless essay everyone ought to read in today's day and age.
Excellence is boring. It's making the same boring "correct" choice over and over again. You win by being consistent for longer.
Our short attention spans tend to forget that.
Source: arxiv.org/pdf/2502.06807
(Check out the detailed code submissions and scoring in the appendix)
HUGE: OpenAI o3 scores 394 of 600 in the International Olympiad in Informatics (IOI) 2024, earning a Gold medal and placing 18th in the world.
The model was NOT contaminated with this data and the 50 submission limit was used.
We will likely see superhuman coding models this year.
12.02.2025 03:06
Everyone should be using this website to understand the inside of an LLM.
I'm surprised more people don't know about it. Brendan Bycroft made this beautiful interactive visualization showing exactly how each weight inside an LLM is used.
Here's a link: bbycroft.net/llm
Source: chatgpt.com/share/67a40...
Full step by step: www.reddit.com/r/ChatGPTPr...
New research shows that LLMs don't perform well on long context.
Perfect needle-in-the-haystack scores are easy: attention mechanisms can simply match the word. When you require even one hop of reasoning, performance degrades quickly.
This is why guaranteeing correctness for agents is hard.
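To make the distinction concrete, a hypothetical illustration of the two prompt types (mine, not the paper's exact benchmark):

```python
filler = "The sky was grey over the harbour that morning. " * 2000   # long distractor text

question = "What is the vault code?"

# 0-hop needle: the answer sentence shares words with the question, so attention can just match.
zero_hop = filler + "The vault code is 4172. " + filler

# 1-hop needle: two separated facts must be combined before the question can be answered.
one_hop = (filler + "The vault code is the same as Priya's locker number. "
           + filler + "Priya's locker number is 4172. " + filler)
```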
Source: www.youtube.com/watch?v=8Lm...
09.02.2025 02:34
Internal OpenAI models have improved to ~#50 in the world (3045 on Codeforces) and will hit #1 by the end of the year, Sam Altman said yesterday in Japan. That's up from o3's 2727 (~#175)!
This is a monumental result en route to AGI.
Source: www.sergey.fyi/articles/ge...
06.02.2025 17:37
PDF parsing is pretty much solved at scale now.
Gemini 2 Flash's $0.40/M tokens and 1M token context means you can now parse 6000 long PDFs at near perfect quality for $1
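Back-of-envelope on where a number in that ballpark comes from; the tokens-per-page figure is my assumption, not from the post:

```python
price_per_m_output = 0.40      # $ per 1M Gemini 2 Flash output tokens (the post's figure)
tokens_per_page = 400          # rough markdown output for one dense PDF page (assumption)
pages_per_dollar = 1_000_000 / price_per_m_output / tokens_per_page
print(pages_per_dollar)        # ~6,250 pages of PDF parsed per $1
```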
Who has better Deep Research, Google or OpenAI?
Deep research generates ~10 page reports in ~15mins by scouring 100s of websites. This could replace a lot of human work. I tried both so you don't have to.
The verdict: OpenAI is faster and better quality despite being more $$
Gemini's just launched their new Flash models. They are cheaper, better and have 8x the context of GPT 4o-mini!
Prices per million tokens (cached input, input, output):
Gemini 2 Flash Lite: $0.01875, $0.075, $0.30
Gemini 2 Flash: $0.025, $0.1, $0.40
GPT 4o-mini: $0.075, $0.15, $0.60
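As a concrete comparison, the cost of one hypothetical 100k-token-input / 10k-token-output call under each model (my arithmetic, using the figures above):

```python
# (cached input, input, output) in $ per 1M tokens, from the list above
prices = {
    "Gemini 2 Flash Lite": (0.01875, 0.075, 0.30),
    "Gemini 2 Flash":      (0.025,   0.10,  0.40),
    "GPT 4o-mini":         (0.075,   0.15,  0.60),
}
tokens_in, tokens_out = 100_000, 10_000   # hypothetical request, no cache hits
for model, (_, p_in, p_out) in prices.items():
    cost = tokens_in / 1e6 * p_in + tokens_out / 1e6 * p_out
    print(f"{model}: ${cost:.4f}")
# Gemini 2 Flash Lite: $0.0105, Gemini 2 Flash: $0.0140, GPT 4o-mini: $0.0210
```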