How are the compile times?
18.11.2025 07:39
@xevix.bsky.social
Software Developer interested in data, web, languages. Silicon Valley/Tokyo. https://medium.com/@xevix https://github.com/xevix
The PyData Amsterdam 2025 keynote "Minus Three Tier: Data Architecture Turned Upside Down" by @hannes.muehleisen.org is out now.
www.youtube.com/watch?v=DxwD...
SQL Arena Planner Ranking (November 2025)
New database leaderboard from Yellowbrick ranks the quality of DBMS optimizer estimates and plans. They only evaluate TPC-H for now and report results for Postgres + DuckDB + MSSQL: sql-arena.com/components/p...
Repo: github.com/sql-arena/db...
LinkedIn Group: www.linkedin.com/groups/15775...
Today's Future Data Systems Seminar Speaker: Ian Cook (@ian.columnar.tech) will present @columnar.tech's work on Apache Arrow's database connectivity API (ADBC). ADBC is available in modern DBMSs. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
20.10.2025 11:38
Today's Future Data Systems Seminar Speaker: Will Manning (@willmanning.com) will present @spiraldb.com's Vortex file format. Vortex is now a @linuxfoundation.org project. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...
13.10.2025 11:10
Processing 100TB of CSV files on a single machine is insane, little over 1hr per query, even if on a powerful AWS instance. Question heavily the need for complex systems when this is what's possible now. Can't wait for full write-up. Incredible work.
duckdb.org/2025/10/09/b...
It's interesting the tradeoffs if the main goal is no operating cost and decent startup time. Definitely painful to develop on regularly but for a one and done this makes a lot of sense. I wonder if Rust compile times will come down further one day.
04.10.2025 22:21
Taking the DuckDB hoodie on a trip. Not exactly Amsterdam but I've heard they like columnar databases here too.
04.10.2025 12:06
I didn't quite make it in time for Hive filtering lazy list to speed up filtering Hive folder with many partitions, but will pick up again before next release w/ luck github.com/duckdb/duckd...
16.09.2025 16:25
Congrats to DuckDB team on LTS release w/ many great improvements! Hidden among them you can now use Hive filtering with read_blob, and SHOW TABLES FROM specific db w/o USE.
16.09.2025 16:25
DuckDB 1.4.0 is out! This is our first LTS release which comes with *one year of community support*. It also supports database encryption, the MERGE SQL statement and Iceberg writes.
For more details, read the announcement blog post at
duckdb.org/2025/09/16/a...
I tried loading eBird data (1.5B rows CSV ZIP) using DuckDB for fun, inspired by a ClickHouse blog post and a bit of curiosity. Both did well, DuckDB slightly faster querying and Parquet ingest, ClickHouse w/ native zip support, optimized for ingest and multitenancy. xevix.medium.com/ebird-in-duc...
02.09.2025 01:15
Thumbnail: Saving Private Hash Join
Vol:18 No:8 → Saving Private Hash Join
Authors: Laurens Kuiper, Paul Gross, Peter Boncz, Hannes Mühleisen
PDF: https://www.vldb.org/pvldb/vol18/p2748-kuiper.pdf
Is there too much duplicated effort in data tools? I sometimes wonder about this.
xevix.medium.com/data-tool-co...
Yeah, I don't think MS is interested in 3rd party devs so much.
26.08.2025 16:45
Unfortunately yes, I was already going to get one for something else and this put me over the edge. Maybe I'll also build a gaming rig one day in the distant future haha.
25.08.2025 22:12
Compiling DuckDB on Windows 11 (ARM) using UTM VM on macOS to debug Windows compile issues. It's a shame MSVC doesn't exist outside of Windows, MinGW/clang don't work the same and cross-compiling is tricky. Compiling takes 5-10 mins (instead of 1-2 mins native), but it works!
25.08.2025 21:30
Just the 1 day of data above is ~125GiB compressed, ~585GiB uncompressed. One month is about 3.75TiB compressed, or 17.5TiB uncompressed. It makes sense this dataset is so popular for testing and analysis, wow.
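A quick back-of-the-envelope check of the monthly estimate in the post above, as a minimal sketch. The 30-day month and the strict 1 TiB = 1024 GiB conversion are assumptions, not stated in the post; only the two daily figures come from it.

```python
# Scale the daily sizes quoted in the post to a month.
# Assumptions (not from the post): 30 days per month, 1 TiB = 1024 GiB.
GIB_PER_TIB = 1024
DAYS_PER_MONTH = 30  # assumed


def monthly_tib(daily_gib: float, days: int = DAYS_PER_MONTH) -> float:
    """Convert a daily size in GiB into a monthly size in TiB."""
    return daily_gib * days / GIB_PER_TIB


compressed = monthly_tib(125)    # ~125GiB/day compressed (from the post)
uncompressed = monthly_tib(585)  # ~585GiB/day uncompressed (from the post)

print(f"compressed:   {compressed:.2f} TiB/month")
print(f"uncompressed: {uncompressed:.2f} TiB/month")
```

With strict binary units the totals land slightly below the post's round figures, which match dividing by 1000 rather than 1024. The order of magnitude is the same either way.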
15.08.2025 03:19
Stretching DuckDB w/ Common Crawl, ~1.7B rows, ~300 parquet files. ~2-3s for single-column aggregations, ~2-3 mins to SUMMARIZE the data, peaking at ~12-14GB memory usage. Not exactly real-time, but the fact you can do this on a laptop with no server setups or Spark pipelines is still amazing.
15.08.2025 03:10
Uploaded a simplified query. Had to delete and repost since no edit button on the snippets site, sorry for the spam haha.
13.08.2025 06:08
Haha already posted in case someone benefits there too.
13.08.2025 05:13
Neat little hack to get Hive partition list in DuckDB, useful for an overview. Might be neat to have built-in. gist.github.com/xevix/04f33d...
12.08.2025 20:14
Automator using simple shell script to call sqlfluff. Added keyboard shortcut for the service too. Easier than making browser extensions for each browser, although unfortunately not cross-platform.
29.06.2025 19:46
Added an Automator quick action to run sqlfluff for formatting SQL in browser fields, used here in the DuckDB UI. Only needs sqlfluff, optionally configure rules. Would be cool to get built-in one day, but works for now.
29.06.2025 19:46
The pieces were already there, but progress is not always linear. Majority of cases now handled by [Mother]Duck[DB|Lake], Spark for extreme cases. Single-node compute 10 years from now is going to be mindblowing, but already exciting what we can do today.
26.06.2025 23:17
Apache Drill allowed storing metadata in an RDBMS, Iceberg scaling data, Arrow scaling columnar memory, Parquet columnar storage, Spark distributed compute, DuckDB single-node compute. DuckLake scales metadata and storage w/ compute on single node. MotherDuck distributes compute.
26.06.2025 23:17
ClickBench rankings as of June 2025.
Shots fired by @firebolthq.bsky.social with their new on-prem executable (www.firebolt.io/blog/introdu...). They have dethroned the Umbra system by The Germans™ at @tum.de in the ClickBench rankings: benchmark.clickhouse.com
24.06.2025 23:10
The code is about the quality you'd expect from a quick vibecode but should work. Parquets are fetched from S3 and cached on demand.
github.com/xevix/noaa-d3
Vibe coding NOAA GHCN weather visualization from scratch w/ Claude Code and DuckDB MCP. There's Evidence and other vis tools but I don't want a pre-cached set of data, I want it to query live. Cool that this can be put together w/o writing my own HTML/CSS, as a web backend dev
21.06.2025 22:58
Known issue, but happy that there's a workaround for now. github.com/duckdb/duckd...
21.06.2025 20:08