03.04.2025 18:16
@dataders.bsky.social
DX @ dbt
03.04.2025 18:16
yeah the multi-cloud story is far from over. you can imagine that without egress costs, it might actually be performant to move data b/w AWS and Azure if the data centers are close enough.
Iceberg kinda makes DWH on Cloudflare R2 feasible given it has S3-compatible API. Sippy seems cool
new rule: talk about Iceberg without mentioning Hive or ACID properties.
the vast majority of SQL users don't care (and shouldn't)! it's like explaining Dropbox starting w/ bringing up libfuse
If you're new to iceberg lmk if you get something from this!
roundup.getdbt.com/p/iceberg-gi...
#databs
lots of juicy looking papers in the agenda for EDBT/ICDT 2025 happening this year in Barcelona! definitely going to be diving in more later today
edbticdt2025.upc.edu?contents=det...
#databs #edbc2025 #icdt2025
I FINALLY asked for pronouncement on PEP 751 -- lock files for #Python : peps.python.org/pep-0751/ .
19.03.2025 22:04
p.s. also TI[F]L what "serde" stands for after seeing the word for years 🤦
17.03.2025 14:44
this is the clearest case for Arrow that I've ever seen. I love that it's high-level but also doesn't shy away from details when it's important.
this should be a #databs canon text imho. thanks @ianmcook.bsky.social !
TIL about relationalplayground.com
makes relational algebra accessible to a SQL monkey like me. much better than in a dry textbook where it's normally found.
#databs
imho, the most clear case Databricks has made in public on their Iceberg future post-Tabular acquisition.
I agree with this vision of the future and it's nice to see DBRX sharing how they see themselves participating in it.
worth clicking through the deck! #databs
speakerdeck.com/databricksja...
11/11) p.s. forgot to link to the repo!
github.com/deepseek-ai/...
10) So sick that a smallpond pipeline returns a LogicalPlan representing a DAG where each node is a distinct data processing task.
Imagine if a dbt DAG resulted in a single logical plan that operates across multiple engines.
and then they can optimize the plan before execution as well! 🤯🤯🤯
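to make the "plan as a DAG" idea concrete, here's a minimal sketch in plain Python. It is not smallpond's API: `PlanNode`, `prune`, `execution_order`, and the task names are all hypothetical, and the "optimization" is just dropping tasks the final output doesn't depend on.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # stdlib, Python 3.9+


@dataclass(frozen=True)
class PlanNode:
    """One task in a toy logical plan (hypothetical, not smallpond's class)."""
    name: str
    engine: str          # which engine would run this task (illustrative only)
    inputs: tuple = ()   # names of upstream nodes


def prune(nodes, target):
    """Toy 'optimization': keep only the nodes that the target depends on."""
    by_name = {n.name: n for n in nodes}
    keep, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name not in keep:
            keep.add(name)
            stack.extend(by_name[name].inputs)
    return [n for n in nodes if n.name in keep]


def execution_order(nodes):
    """Topologically sort the plan so every task runs after its inputs."""
    graph = {n.name: set(n.inputs) for n in nodes}
    return list(TopologicalSorter(graph).static_order())


# Assumed example plan: a linear pipeline plus one branch nobody consumes.
PLAN = [
    PlanNode("read_raw", "arrow"),
    PlanNode("dedupe", "duckdb", ("read_raw",)),
    PlanNode("shuffle", "polars", ("dedupe",)),
    PlanNode("write_out", "arrow", ("shuffle",)),
    PlanNode("debug_sample", "duckdb", ("read_raw",)),
]

PRUNED = prune(PLAN, target="write_out")
ORDER = execution_order(PRUNED)
```

the point is only that once the whole pipeline is a data structure, you can rewrite it (here: drop `debug_sample`) before anything runs.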
9) Arrow is the unsung hero of this project (and arguably all innovation in data ecosystem).
it's what enables:
1. all this interchangeability of query engines
2. (likely) using duckdb in a distributed environment in the first place
8) this HN thread called out that smallpond supports using different query engines for different jobs (shuffling vs sorting).
Very bullish on this future of right tool for right job and making it as simple as a config
news.ycombinator.com/item?id=4323...
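a rough sketch of what "engine choice as config" could look like, in plain Python. Everything here is made up for illustration (`ENGINE_CONFIG`, `REGISTRY`, `run_job`, and the two stand-in "engines"); it doesn't reflect how smallpond actually wires this up.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical config: which engine handles which kind of job.
ENGINE_CONFIG: Dict[str, str] = {"shuffle": "engine_a", "sort": "engine_b"}


def shuffle_with_engine_a(rows: List[int], partitions: int = 2) -> List[List[int]]:
    """Stand-in for an engine that is good at repartitioning."""
    buckets: List[List[int]] = [[] for _ in range(partitions)]
    for value in rows:
        buckets[value % partitions].append(value)
    return buckets


def sort_with_engine_b(rows: List[int]) -> List[int]:
    """Stand-in for an engine that is good at sorting."""
    return sorted(rows)


# Registry keyed by (engine, job); swapping engines means editing the config only.
REGISTRY: Dict[Tuple[str, str], Callable] = {
    ("engine_a", "shuffle"): shuffle_with_engine_a,
    ("engine_b", "sort"): sort_with_engine_b,
}


def run_job(job: str, rows: List[int]):
    """Look up the configured engine for this job type and run it."""
    engine = ENGINE_CONFIG[job]
    return REGISTRY[(engine, job)](rows)


PARTITIONS = run_job("shuffle", [5, 2, 4, 1])
SORTED_PARTITIONS = [run_job("sort", part) for part in PARTITIONS]
```

the calling code never names an engine, so "right tool for the right job" really does come down to one dict.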
7)
TIRED: "big vs. small" & "distributed vs single-node"
WIRED: tactical deployment of single-node query engines within distributed frameworks.
another great example is Apache Comet, which plugs DataFusion into Spark to accelerate single-node operations, resulting in overall Spark performance speedups
6) Making smallpond 5 years ago would have been very difficult! but the emergence of lower-level, off-the-shelf components has greatly accelerated development time.
Within the year, we'll see this new paradigm catch on. Future examples will probably be using DataFusion, not DuckDB.
5) there's been previous discussion on DeepSeek's scrappiness and I think it shows here. They had a vision of what they wanted and rather than paying for software or forcing their vision into an existing tool, were able to ship exactly what they wanted
03.03.2025 15:12
4) smallpond is a bespoke data processing framework using off-the-shelf, OSS components (ray, arrow, duckdb, polars).
Why didn't they use {TOOL}? My guesses
- dbt: they wanted Python Dataframe API
- Airflow: not as close to metal as Ray
- pytorch or ray[data]: idk tbh
3) ray.io is the foundation of any training and inference infrastructure. I haven't had much exposure to Ray as a SQL monkey using DWHs, but it's just recently clicked for me how big of a deal it is
03.03.2025 15:01
2) so cool to begin to see the data infra that supports training LLMs. Open weights is cool, and RAG makes sense, but as a former XGBooster turned "data engineer", seeing the data cleaning pipelines is what I've most wanted to see.
03.03.2025 14:58
1) my top-level takeaways on DeepSeek's smallpond: a distributed data processing framework used for training LLMs #databs
03.03.2025 14:57
dude -- so cool! one of my self-described superpowers is being very "plugged in" but, this doesn't happen without significant time and attention costs.
what you've made changes the game imho. now I need the same for all the Slacks & Discords I'm in.
Mark Zuckerberg Around? Facebook engineer Yeah Mark Zuckerberg If you could buy one of either Instagram, Foursquare or Pinterest, which would you buy?
Mark Zuckerberg messages Facebook engineer
April 5, 2012
Just finished watching the webinar on introducing SDF by dbt team. After seeing SDF in action, I have to admit that I am really looking forward to the future of dbt engine. I was wondering when dbt was going to bring in notable changes to the developer experience and this might be it.
#databs
Building Query Compilers by Guido Moerkotte (695 pages)
Note: This is a repost of @emresevinc.bsky.social's post on X
Link: pi3.informatik.uni-mannheim.de/~moer/queryc...
Yeah that ADP chart hurt my friend the y-axis so bad I think axisslaughter should be a punishable crime
26.01.2025 16:19
Bernie Sanders meme labelled with Dave Connors's name: I am once again asking you to move up the stack and learn how database compilers work
post two! The key technologies behind SQL Comprehension
#databs, do not fear compiler concepts -- embrace them and the new world order they enable for us! read the great blog (& pretty diagrams!)
@daveconnors3.bsky.social, truly a masterpiece
docs.getdbt.com/blog/sql-com...
written by my esteemed colleague @joellab.es
23.01.2025 19:40
post one of a new series kicking off today whose larger thrust is effectively: understanding SQL, not just a job for the database! Post 1 lays out what the levels of understanding are
docs.getdbt.com/blog/the-lev...
#databs
my feeling is that I haven't heard back, but have heard from others that they've been accepted so I'm operating under assumption that my talk has not been accepted
14.01.2025 21:31
Look forward to these every year. Always love Andy's candor and insight on our little corner of the world. #databs
04.01.2025 01:42