@xevix.bsky.social

Software Developer interested in data, web, languages. Silicon Valley/Tokyo. https://medium.com/@xevix https://github.com/xevix

70 Followers  |  33 Following  |  116 Posts  |  Joined: 02.01.2024  |  1.9063

Latest posts by xevix.bsky.social on Bluesky

How are the compile times?

18.11.2025 07:39 | 👍 0    🔁 0    💬 1    📌 0
KEYNOTE: Hannes Mühleisen - Data Architecture Turned Upside Down | PyData Amsterdam 2025
YouTube video by PyData

The PyData Amsterdam 2025 keynote “Minus Three Tier: Data Architecture Turned Upside Down” by @hannes.muehleisen.org is out now.

www.youtube.com/watch?v=DxwD...

31.10.2025 14:05 | 👍 24    🔁 3    💬 1    📌 1
SQL Arena Planner Ranking (November 2025)

New database leaderboard from Yellowbrick ranks the quality of DBMS optimizer estimates and plans. They only evaluate TPC-H for now and report results for Postgres + DuckDB + MSSQL: sql-arena.com/components/p...
Repo: github.com/sql-arena/db...
LinkedIn Group: www.linkedin.com/groups/15775...

03.11.2025 17:06 | 👍 14    🔁 3    💬 1    📌 0
[Future Data] Where We're Going, We Don't Need Rows: Columnar Data Connectivity with ADBC - Carnegie Mellon Database Group
ADBC (Arrow Database Connectivity) is Apache Arrow's answer to ODBC and JDBC:...

Today's Future Data Systems Seminar Speaker: Ian Cook (@ian.columnar.tech) will present @columnar.tech's work on Apache Arrow's database connectivity API (ADBC). ADBC is available in modern DBMSs. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...

20.10.2025 11:38 | 👍 16    🔁 8    💬 0    📌 1
[Future Data] Vortex: LLVM for File Formats - Carnegie Mellon Database Group
Apache Parquet revolutionized columnar storage after its initial release in 2013, but...

Today's Future Data Systems Seminar Speaker: Will Manning (@willmanning.com) will present @spiraldb.com's Vortex file format. Vortex is now a @linuxfoundation.org project. Zoom talk open to public at 4:30pm ET. YouTube video available after: db.cs.cmu.edu/events/futur...

13.10.2025 11:10 | 👍 4    🔁 4    💬 0    📌 0
Benchmark Results for DuckDB v1.4 LTS
DuckDB v1.4 LTS is both fast and scalable. In in-memory mode, it is the fastest system on ClickBench. In disk-based mode, it can run complex analytical queries on a dataset equivalent to 100 TB CSV fi...

Processing 100 TB of CSV files on a single machine is insane: a little over 1 hr per query, even if on a powerful AWS instance. It makes you heavily question the need for complex systems when this is what's possible now. Can't wait for the full write-up. Incredible work.

duckdb.org/2025/10/09/b...

10.10.2025 14:12 | 👍 8    🔁 1    💬 0    📌 0

It's interesting to see the tradeoffs when the main goal is no operating cost and decent startup time. Definitely painful to develop on regularly, but for a one-and-done this makes a lot of sense. I wonder if Rust compile times will come down further one day.

04.10.2025 22:21 | 👍 1    🔁 0    💬 0    📌 0

Taking the DuckDB hoodie on a trip. Not exactly Amsterdam, but I've heard they like columnar databases here too.

04.10.2025 12:06 | 👍 3    🔁 0    💬 0    📌 0
Push Hive filtering into Glob() by xevix · Pull Request #18518 · duckdb/duckdb
Summary: Addresses part of #7620 for local filesystem. Part 1 of the work split off from the original PR #18430. The next part will handle fallback to eager loading in case of Hive issues. Push down...

I didn't quite make it in time with the lazy-list Hive filtering, which speeds up filtering a Hive folder with many partitions, but will pick it up again before the next release w/ luck 🙇‍♂️ github.com/duckdb/duckd...

16.09.2025 16:25 | 👍 2    🔁 0    💬 0    📌 0

Congrats to DuckDB team on LTS release w/ many great improvements! Hidden among them you can now use Hive filtering with read_blob, and SHOW TABLES FROM specific db w/o USE.

16.09.2025 16:25 | 👍 2    🔁 0    💬 1    📌 0

📈 DuckDB 1.4.0 is out! This is our first LTS release which comes with *one year of community support*. It also supports database encryption, the MERGE SQL statement and Iceberg writes.

For more details, read the announcement blog post at
duckdb.org/2025/09/16/a...

16.09.2025 11:55 | 👍 53    🔁 22    💬 0    📌 3
eBird in DuckDB
I saw this post by the Clickhouse team which was doing a cool test of the eBird dataset from Cornell University, and wondered how DuckDB...

I tried loading eBird data (1.5B rows, CSV ZIP) using DuckDB for fun, inspired by a ClickHouse blog post and a bit of curiosity. Both did well: DuckDB was slightly faster at querying and Parquet ingest, while ClickHouse has native zip support and is optimized for ingest and multitenancy. xevix.medium.com/ebird-in-duc...

02.09.2025 01:15 | 👍 3    🔁 0    💬 0    📌 0
Thumbnail: Saving Private Hash Join

Vol:18 No:8 → Saving Private Hash Join
👥 Authors: Laurens Kuiper, Paul Gross, Peter Boncz, Hannes Mühleisen
📄 PDF: https://www.vldb.org/pvldb/vol18/p2748-kuiper.pdf

03.08.2025 06:00 | 👍 14    🔁 4    💬 0    📌 1
Data Tool Component Sharing
There are many partly overlapping tools in the data world, which is what inspired things like Calcite to have modular components for...

Is there too much duplicated effort in data tools? I sometimes wonder about this.

xevix.medium.com/data-tool-co...

29.08.2025 20:08 | 👍 0    🔁 0    💬 0    📌 0

Yeah, I don't think MS is interested in 3rd party devs so much.

26.08.2025 16:45 | 👍 0    🔁 0    💬 0    📌 0

Unfortunately yes, I was already going to get one for something else and this put me over the edge. Maybe I'll also build a gaming rig one day in the distant future haha.

25.08.2025 22:12 | 👍 0    🔁 0    💬 1    📌 0

Compiling DuckDB on Windows 11 (ARM) using a UTM VM on macOS to debug Windows compile issues. It's a shame MSVC doesn't exist outside of Windows; MinGW/clang don't work the same and cross-compiling is tricky. Compiling takes 5-10 mins (instead of 1-2 mins native), but it works 🎉!

25.08.2025 21:30 | 👍 3    🔁 0    💬 1    📌 0

Just the 1 day of data above is ~125GiB compressed, ~585GiB uncompressed. One month is about 3.75TiB compressed, or 17.5TiB uncompressed. It makes sense this dataset is so popular for testing and analysis, wow.

15.08.2025 03:19 | 👍 6    🔁 0    💬 0    📌 0
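
As a rough check of the monthly figures in the post above, here is a minimal Python sketch that scales the per-day sizes to a month. The 30-day month and all names in the code are assumptions for illustration, not something stated in the post. Strict binary conversion (1024 GiB per TiB) gives slightly smaller values; the post's ~3.75TiB and ~17.5TiB line up with dividing by 1000 instead.

```python
# Per-day sizes quoted in the post, in GiB.
COMPRESSED_GIB_PER_DAY = 125
UNCOMPRESSED_GIB_PER_DAY = 585

# Assumption: a 30-day month (the post does not say which month length it used).
DAYS_PER_MONTH = 30


def monthly_gib(gib_per_day: float, days: int = DAYS_PER_MONTH) -> float:
    """Scale a per-day size in GiB to a whole month."""
    return gib_per_day * days


def gib_to_tib(gib: float) -> float:
    """Strict binary conversion: 1 TiB = 1024 GiB."""
    return gib / 1024


compressed_month = monthly_gib(COMPRESSED_GIB_PER_DAY)      # 3750 GiB
uncompressed_month = monthly_gib(UNCOMPRESSED_GIB_PER_DAY)  # 17550 GiB

print(f"compressed:   {compressed_month} GiB = {gib_to_tib(compressed_month):.2f} TiB")
print(f"uncompressed: {uncompressed_month} GiB = {gib_to_tib(uncompressed_month):.2f} TiB")
```

Either way, the order of magnitude in the post holds: a single month lands in the multi-terabyte range compressed and well over ten terabytes uncompressed.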

Stretching DuckDB w/ Common Crawl, ~1.7B rows, ~300 parquet files. ~2-3s for single-column aggregations, ~2-3 mins to SUMMARIZE the data, peaking at ~12-14GB memory usage. Not exactly real-time, but the fact you can do this on a laptop with no server setups or Spark pipelines is still amazing.

15.08.2025 03:10 | 👍 44    🔁 9    💬 1    📌 1

Uploaded a simplified query. Had to delete and repost since no edit button on the snippets site, sorry for the spam haha.

13.08.2025 06:08 | 👍 1    🔁 0    💬 0    📌 0

Haha already posted in case someone benefits there too.

13.08.2025 05:13 | 👍 1    🔁 0    💬 1    📌 0

Neat little hack to get Hive partition list in DuckDB, useful for an overview. Might be neat to have built-in. gist.github.com/xevix/04f33d...

12.08.2025 20:14 | 👍 4    🔁 0    💬 1    📌 0

Automator using simple shell script to call sqlfluff. Added keyboard shortcut for the service too. Easier than making browser extensions for each browser, although unfortunately not cross-platform.

29.06.2025 19:46 | 👍 0    🔁 0    💬 0    📌 0

Added an Automator quick action to run sqlfluff for formatting SQL in browser fields, used here in the DuckDB UI. Only needs sqlfluff, optionally configure rules. Would be cool to get built-in one day, but works for now.

29.06.2025 19:46 | 👍 0    🔁 0    💬 1    📌 0

The pieces were already there, but progress is not always linear. Majority of cases now handled by [Mother]Duck[DB|Lake], Spark for extreme cases. Single-node compute 10 years from now is going to be mindblowing, but already exciting what we can do today.

26.06.2025 23:17 | 👍 0    🔁 0    💬 1    📌 0

Apache Drill allowed storing metadata in an RDBMS, Iceberg scaling data, Arrow scaling columnar memory, Parquet columnar storage, Spark distributed compute, DuckDB single-node compute. DuckLake scales metadata and storage w/ compute on single node. Motherduck distributes compute.

26.06.2025 23:17 | 👍 1    🔁 0    💬 1    📌 0
ClickBench rankings as of June 2025.

Shots fired by @firebolthq.bsky.social with their new on-prem executable (www.firebolt.io/blog/introdu...). They have dethroned the Umbra system by The Germans™ at @tum.de in the ClickBench rankings: benchmark.clickhouse.com

24.06.2025 23:10 | 👍 17    🔁 3    💬 3    📌 0
GitHub - xevix/noaa-d3: NOAA GHCN data visualization using d3.js and DuckDB

The code is about the quality you'd expect from a quick vibecode but should work. Parquets are fetched from S3 and cached on demand.

github.com/xevix/noaa-d3

24.06.2025 21:52 | 👍 0    🔁 0    💬 0    📌 0

Vibe coding NOAA GHCN weather visualization from scratch w/ Claude Code and DuckDB MCP. There's Evidence and other vis tools but I don't want a pre-cached set of data, I want it to query live. Cool that this can be put together w/o writing my own HTML/CSS, as a web backend dev 😅

21.06.2025 22:58 | 👍 1    🔁 0    💬 1    📌 0
Early directory tree pruning with hive-partitioned `parquet_scan` · duckdb duckdb · Discussion #7620
I occasionally work with 1TB+ hive-partitioned parquet datasets where I want to extract small 10GB chunks to process. An issue I encounter with duckdb is that despite of the WHERE constraints I use...

Known issue, but happy that there's a workaround for now. github.com/duckdb/duckd...

21.06.2025 20:08 | 👍 0    🔁 0    💬 0    📌 0
