Anders

@dataders.bsky.social

DX @ dbt

1,554 Followers  |  812 Following  |  77 Posts  |  Joined: 29.10.2024  |  1.5957

Latest posts by dataders.bsky.social on Bluesky

👋

03.04.2025 18:16 - 👍 1    🔁 0    💬 0    📌 0
Sippy · Cloudflare R2 docs Sippy is a data migration service that allows you to copy data from other cloud providers to R2 as the data is requested, without paying unnecessary cloud egress fees typically associated with moving ...

yeah the multi-cloud story is far from over. you can imagine that without egress costs, it might actually be performant to move data b/w AWS and Azure if the data centers are close enough.

Iceberg kinda makes DWH on Cloudflare R2 feasible given it has S3-compatible API. Sippy seems cool

01.04.2025 20:40 - 👍 2    🔁 0    💬 0    📌 0
Iceberg?? Give it a REST! The new abstraction that changes nothing... and everything

new rule: talk about Iceberg without mentioning Hive or ACID properties.

the vast majority of SQL users don't care (and shouldn't)! it's like explaining Dropbox starting w/ bringing up libfuse

If you're new to iceberg lmk if you get something from this!
roundup.getdbt.com/p/iceberg-gi...
#databs

01.04.2025 18:57 - 👍 6    🔁 2    💬 0    📌 1
EDBT/ICDT 2025 Joint Conference - 25th March - 28th March, 2025 - Barcelona, Spain

lots of juicy looking papers in the agenda for EDBT/ICDT 2025 happening this year in Barcelona! definitely going to be diving in more later today
edbticdt2025.upc.edu?contents=det...
#databs #edbc2025 #icdt2025

24.03.2025 16:33 - 👍 1    🔁 0    💬 0    📌 0
PEP 751 – A file format to record Python dependencies for installation reproducibility | peps.python.org This PEP proposes a new file format for specifying dependencies to enable reproducible installation in a Python environment. The format is designed to be human-readable and machine-generated. Installe...

I FINALLY asked for pronouncement on PEP 751 -- lock files for #Python : peps.python.org/pep-0751/ .

19.03.2025 22:04 - 👍 60    🔁 14    💬 0    📌 0

p.s. also TI[F]L what "serde" stands for after seeing the word for years 🤦

17.03.2025 14:44 - 👍 2    🔁 0    💬 0    📌 0

this is the clearest case for Arrow that I've ever seen. I love that it's high-level but also doesn't shy away from details when it's important.

this should be a #databs canon text imho. thanks @ianmcook.bsky.social !

17.03.2025 14:44 - 👍 7    🔁 1    💬 1    📌 0
Relational Playground An exploration of relational algebra. Compare SQL queries with relational algebra expressions along with intermediate results.

TIL about relationalplayground.com

makes relational algebra accessible to a SQL monkey like me. much better than in a dry textbook where it's normally found.

#databs

13.03.2025 15:28 - 👍 3    🔁 0    💬 0    📌 0
Iceberg Meetup Japan #1 : Iceberg and Databricks These are the slides used at Iceberg Meetup #1, held on February 21. They introduce the catalogs involved when using Databricks with Iceberg.

imho, the clearest case Databricks has made in public on their Iceberg future post-Tabular acquisition.

I agree with this vision of the future and it's nice to see DBRX sharing how they see themselves participating in it.

worth clicking through the deck! #databs
speakerdeck.com/databricksja...

03.03.2025 15:42 - 👍 3    🔁 0    💬 0    📌 0
GitHub - deepseek-ai/smallpond: A lightweight data processing framework built on DuckDB and 3FS.

11/11) p.s. forgot to link to the repo!
github.com/deepseek-ai/...

03.03.2025 15:31 - 👍 1    🔁 0    💬 0    📌 0

10) So sick that a smallpond pipeline returns a LogicalPlan representing a DAG where each node is a distinct data processing task.

Imagine if a dbt DAG resulted in a single logical plan that operates across multiple engines.

and then they can optimize the plan before execution as well! 🤯🤯🤯

03.03.2025 15:31 - 👍 1    🔁 0    💬 1    📌 0
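
To make the idea in post 10 concrete, here is a minimal sketch of a logical plan represented as a DAG of tasks that gets pruned before anything runs. It is not smallpond's API: `PlanNode`, `prune_unused`, `execution_order`, the task names, and the engine labels are all hypothetical, and the "optimization" is just dropping tasks that no requested output depends on.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class PlanNode:
    """One task in a hypothetical logical plan (not smallpond's real class)."""
    name: str
    engine: str  # illustrative label only, e.g. "duckdb" or "polars"
    inputs: list = field(default_factory=list)  # names of upstream tasks


def prune_unused(nodes, targets):
    """Toy optimization: keep only tasks that some target depends on."""
    by_name = {node.name: node for node in nodes}
    keep = set()
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        stack.extend(by_name[name].inputs)
    return [node for node in nodes if node.name in keep]


def execution_order(nodes):
    """Order tasks so that every input runs before the task consuming it."""
    graph = {node.name: set(node.inputs) for node in nodes}
    return list(TopologicalSorter(graph).static_order())


# A made-up pipeline: one branch feeds the final output, one is never used.
plan = [
    PlanNode("read_raw", "duckdb"),
    PlanNode("dedupe", "duckdb", ["read_raw"]),
    PlanNode("debug_sample", "polars", ["read_raw"]),
    PlanNode("sort", "polars", ["dedupe"]),
]

optimized = prune_unused(plan, targets=["sort"])
order = execution_order(optimized)
print(order)
```

Because the whole pipeline exists as data before execution, the unused `debug_sample` task can be removed up front, and each remaining node could in principle be handed to a different engine.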

9) Arrow is the unsung hero of this project (and arguably all innovation in data ecosystem).

it's what enables:
1. all this interchangeability of query engines
2. (likely) using duckdb in a distributed environment in the first place

03.03.2025 15:27 - 👍 2    🔁 0    💬 1    📌 0
One thing I found peculiar is that for the GraySort benchmark it dispatches to P... | Hacker News

8) this HN comment called out that smallpond supports using different query engines for different jobs (shuffling vs sorting).

Very bullish on this future of right tool for right job and making it as simple as a config

news.ycombinator.com/item?id=4323...

03.03.2025 15:23 - 👍 2    🔁 0    💬 1    📌 0

7)
TIRED: "big vs. small" & "distributed vs single-node"
WIRED: tactical deployment of single-node query engines within distributed frameworks.

another great example is Apache Comet, which plugs DataFusion into Spark to accelerate single-node operations, resulting in overall Spark performance speedups

03.03.2025 15:20 - 👍 2    🔁 0    💬 1    📌 0

6) Making smallpond 5 years ago would have been very difficult! but the emergence of lower-level, off-the-shelf components has greatly accelerated development.

Within the year, we'll see this new paradigm catch on. Future examples will probably use DataFusion, not DuckDB.

03.03.2025 15:18 - 👍 3    🔁 0    💬 2    📌 0

5) there's been previous discussion on DeepSeek's scrappiness and I think it shows here. They had a vision of what they wanted and rather than paying for software or forcing their vision into an existing tool, were able to ship exactly what they wanted

03.03.2025 15:12 - 👍 1    🔁 0    💬 1    📌 0

4) smallpond is a bespoke data processing framework using off-the-shelf, OSS components (ray, arrow, duckdb, polars).

Why didn't they use {TOOL}? My guesses
โŒ dbt: they wanted Python Dataframe API
โŒ Airflow: not as close to metal as Ray
โŒ pytorch or ray[data]: idk tbh

03.03.2025 15:10 โ€” ๐Ÿ‘ 2    ๐Ÿ” 0    ๐Ÿ’ฌ 1    ๐Ÿ“Œ 0
Preview
Scale Machine Learning & AI Computing | Ray by Anyscale Ray is an open source framework for managing, executing, and optimizing compute needs. Unify AI workloads with Ray by Anyscale. Try it for free today.

3) ray.io is the foundation of any training and inference infrastructure. I haven't had much exposure to Ray as a SQL monkey using DWHs, but it's just recently clicked for me how big of a deal it is

03.03.2025 15:01 - 👍 2    🔁 0    💬 1    📌 0

2) so cool to begin to see the data infra that supports training LLMs. Open weights is cool, and RAG makes sense, but as a former XGBooster turned "data engineer", seeing the data cleaning pipelines is what I've most wanted to see.

03.03.2025 14:58 - 👍 4    🔁 0    💬 1    📌 0

1) my top-level takeaways on DeepSeek's smallpond: a distributed data processing framework used for training LLMs #databs

03.03.2025 14:57 - 👍 8    🔁 2    💬 1    📌 0

dude -- so cool! one of my self-described superpowers is being very "plugged in" but, this doesn't happen without significant time and attention costs.

what you've made changes the game imho. now I need the same for all the Slacks & Discords I'm in.

19.02.2025 16:47 - 👍 1    🔁 0    💬 1    📌 0
Mark Zuckerberg
Around?

Facebook engineer
Yeah

Mark Zuckerberg
If you could buy one of either Instagram, Foursquare or Pinterest, which would you buy?

Mark Zuckerberg messages Facebook engineer

April 5, 2012

08.02.2025 20:23 - 👍 81    🔁 6    💬 1    📌 3

Just finished watching the dbt team's webinar introducing SDF. After seeing SDF in action, I have to admit that I am really looking forward to the future of the dbt engine. I was wondering when dbt was going to bring notable changes to the developer experience, and this might be it.

#databs

01.02.2025 16:00 - 👍 3    🔁 1    💬 0    📌 0

Building Query Compilers by Guido Moerkotte (695 pages)

Note: This is a repost of @emresevinc.bsky.social's post on X

Link: pi3.informatik.uni-mannheim.de/~moer/queryc...

26.01.2025 18:27 - 👍 7    🔁 1    💬 0    📌 0

Yeah that ADP chart hurt my friend y-axis so bad I think axisslaughter should be a punishable crime

26.01.2025 16:19 - 👍 1    🔁 0    💬 0    📌 0
Bernie Sanders meme labelled with Dave Connors's name: I am once again asking you to move up the stack and learn how database compilers work

post two! The key technologies behind SQL Comprehension

#databs, do not fear compiler concepts -- embrace them and the new world order they enable for us! read the great blog (& pretty diagrams!)

@daveconnors3.bsky.social, truly a masterpiece
docs.getdbt.com/blog/sql-com...

24.01.2025 18:22 - 👍 4    🔁 1    💬 0    📌 0

written by my esteemed colleague @joellab.es

23.01.2025 19:40 - 👍 2    🔁 0    💬 0    📌 0
post one of a new series kicking off today whose larger thrust is effectively: understanding SQL is not just a job for the database! Post 1 lays out the levels of understanding
docs.getdbt.com/blog/the-lev...
#databs

23.01.2025 19:39 - 👍 4    🔁 1    💬 1    📌 0

I haven't heard back, but I've heard from others that they've been accepted, so I'm operating under the assumption that my talk has not been accepted

14.01.2025 21:31 - 👍 0    🔁 0    💬 2    📌 0

Look forward to these every year. Always love Andy's candor and insight on our little corner of the world. #databs

04.01.2025 01:42 - 👍 6    🔁 0    💬 0    📌 0