Robert Nishihara @robertnishihara

Speak at Ray Summit!

10.07.2025 06:20 — 👍 3 🔁 2 💬 0 📌 0

DuckDB goes distributed? DeepSeek’s smallpond takes on Big Data

DeepSeek is pushing DuckDB beyond its single-node roots with smallpond, a new, simple approach to distributed compute. But does it solve the scalability challenge, or introduce new trade-offs?

More details in this blog post. mehdio.substack.com/p/duckdb-goe...

04.03.2025 06:33 — 👍 0 🔁 0 💬 0 📌 0

DeepSeek released smallpond, a big data processing framework built on top of Ray.
- It targets high-performance data processing.
- It provides a high-level dataframe API.
- It targets petabyte-level scaling.

The challenges around training data prep only grow when you include multimodal data.

04.03.2025 06:33 — 👍 0 🔁 0 💬 1 📌 0

Amazon’s Exabyte-Scale Migration from Apache Spark to Ray on Amazon EC2 | Amazon Web Services

Large-scale, distributed compute framework migrations are not for the faint of heart. There are backwards-compatibility constraints to maintain, performance expectations to meet, scalability limits to...

Amazon published this only 4 months ago, but it feels like an eternity. It's one of the most impressive large-scale data processing migration efforts. It's rare to see companies truly achieve order-of-magnitude cost improvements (while simultaneously increasing scale).

aws.amazon.com/blogs/openso...

22.11.2024 03:00 — 👍 3 🔁 1 💬 0 📌 0

YouTube video by Anyscale

ChatGPT Creator John Schulman on OpenAI | Ray Summit 2023

Talked with John Schulman last year about the ChatGPT backstory and scaling laws 😍 John co-founded OpenAI and created ChatGPT. www.youtube.com/watch?v=6Ctv...

22.11.2024 00:50 — 👍 0 🔁 0 💬 0 📌 0

Fine-tuning LLMs for longer context and better RAG systems

Based on the popular “Needle In a Haystack” benchmark and RAG, we share our process of creating a problem-specific fine-tuning dataset to extend the context of models to build better RAG systems.

A good overview of the fundamentals of how to extend context windows for LLMs (if you care about RAG, you probably care about context lengths).

26.02.2024 06:35 — 👍 3 🔁 0 💬 0 📌 0
